When I wrote about good modules and bad modules, I mentioned that an indication of a “bad” module was when it was used to extract code for the sake of code extraction. This usually results in a module that is only being mixed into one class.
When I published the article, I had lots of support from people intimately familiar with the internals of many popular libraries like Active Record, agreeing that such extraction is not helpful and would eventually harm a project. I also had people rise up to defend their use of modules.
This gave me more time to reflect on my views and how I expressed them. I realized that while I casually mentioned some of the problem’s “readability,” “complexity,” and “maintainability,” I didn’t do much to explain why modules made this harder. I took my experience for granted, and readers deserve more.
Developers love borrowing concepts from other trades to describe their work. We especially enjoy to comparing ourselves to woodworkers. The phrase “sharp tools” brings to mind a chisel chopping out an oak mortise, a hatchet splitting a well seasoned timber, and a sawmill slicing a tree into boards. Programmers use the phrase “sharp tools” to refer to tradeoff of productivity and the bloodlust and gore that awaits the careless worker. I’m a sometimes woodworker and a full time programmer and I have used truly sharp tools for both.
“Sharp tools should exist” they will hear me screaming, as they try to drag me away from the computer.
There are two wood tools that I keep “shaving” sharp in my shop - chisels and plane blades. You probably know what a chisel is, but lots of people might be unfamiliar with a plane. It is designed to shave off bits of wood in impossibly small increments, the best can get wood shavings in 0.001 inch thicknesses. It makes more sense to see it, than to have it described:
The blade on this device is razor sharp (literally), and it has to be. You can’t see in the video, but this man, likely 180+ lbs, is putting his whole body weight to provide the forces necessary to separate the bonds and literally shave off a section of wood. If a tool is not sharp enough to shave arm hair, it’s going to be much harder to make whispy thin curls of wood shavings:
This is what sharp means to me. You can’t buy sharp, you have to create it and maintain it. Sharp is a constant state of maintenance and honing. These tools are a joy to use. The tool does it’s job remarkably well. A sharp tool is one that works with you, a dull tool constantly reminds you of its shortcomings and fights you every step along the way.
There is a fallacy common to “advanced” developers that tools designed to be “simple” or “easy” or “safe” are not sharp. They are “beginner’s” tools. Some devs see a large framework and think themselves above it. They see complex tasks wrapped up in simple APIs and ask “where is the blood?”. The association between effective devices and their carnage so interlinked that they cannot be separated.
While a jointer plane is a remarkably sharp device, I’ve never once cut myself while using it. This isn’t a statement of skill, but rather a comment on the design of the device. It does everything I will it to, and more. Its clever design means that this razor sharp edge is barely protruding from a safe surface. When using it your hands are on the opposite end side of the body. Here’s a view from the business end:
You can see the blade sticking through the “throat” of the plane. This device provides me with safety and a simple interface, it certainly isn’t “just for beginners”. This design is an evolution of planes over centuries of use. This safety does not diminish their usefulness to users of all skill levels. In fact, while a beginner might use this tool unaware of the full feature set of the tool, only a truly advanced user can appreciate the subtleties.
Just because there’s no blood doesn’t mean your tool isn’t sharp
Pain and difficulty of use do not indicate an “advanced” tool. Developers of all levels can appreciate and consume a good API. I’ve never met someone who complained because the documentation was too well thought out. I’ve never been upset about an error that was too helpful. A good tool guides and compliments the worker.
This isn’t to say that all sharp tools are also bloodless. On the woodworking forums there’s a number of posts reminding us that chisels can send you to the emergency room as fast as a power tool. Like a gun, don’t point the dangerous end at anything you’re attached to. Another tool I own is a “draw knife”. You use it by pointing the sharp end towards your body and pulling. It is probably the scariest thing in my shop:
This brings me to the second, somewhat contradictory fallacy of “advanced” developers - they think that if a tool is simple that it must inherently hiding painful secrets.
A common example here is an ORM. On the surface they can seem very simple; automating tedious tasks. If you’ve used one long enough you’ve been bitten by it. A recent example for me was bringing down a production database. How? I used the postgres int datatype instead of bigint for a primary key. I eventually created so many records that I ran out of numbers and the database started throwing errors. Migrating all the entries locked the database for a few hours:
Why did this happen? I used a DSL provided by my ORM for creating the schema and it wasn’t obvious that it was using int by default. I know what you’re thinking and no, it wasn’t Active Record. While the ORM was originally super fast in helping me define and migrate my schema, when I slipped with it, it wasn’t pleasant.
This was an extremely painful experience. As humans we learn from pain, we try to avoid it. The longer you’ve been programming the more adverse to specific pain points you become.
“Fool me once, shame on you, use an ORM again, shame on me.”
This type of mentality ends up with a massive case of NIH. The fallacy is equating the inherent pain caused to the ease of use with simplicity of the tool. The solution in the user’s mind is clear. If we make something more complex and explicit we wouldn’t have hit that pain. Either people will go to hand crafted artisanal SQL, or to a smaller home brew ORM. Unfortunately it’s usually too late when they realize their micro library has a SQL injection vulnerability, or another critical mistake. More blood.
If simplicity hides pain, and complexity also hides pain, what’s the solution?
To avoid pain…find where pain lives and take steps to avoid it.
While working in the shop I wear canvas apron and use a bench that is bolted to the floor. I clamp my wood securely to the table. My tool might slip, but it’s not compounded by additional failures.
In some cases you can modify a tool to your needs. For example you can use the drawknife for cutting corners on square stock. For this one off task you can attach bronze drawknife guides:
It won’t prevent all nicks and cuts, but it will provide an extra layer of safety when things do go wrong. It also helps us do this one task faster than we could without any protection at all. A good tool helps to guide our work.
In the case of the ORM, instead of giving up we can find the pain and fix it. For this case we can default foreign key columns to bigserial. This doesn’t make the ORM a “tool for beginners” it makes it a tool for everyone.
On a sidenote you can use pg:diagnose on Heroku to see if you’re using an integer primary key and are running out of numbers.
The difference between an “advanced” and a “beginner” programmer is not the tools we choose. The difference is that when a senior programmer experiences pain, they dig in with a retrospective. Why did we use that tool incorrectly? Was there documentation that explained that problem? Why not? Can we make this tool safer without sacrificing performance or usability? Yes? Let’s raise awareness by opening an issue. Fix it by opening a pull request. At the very least write some docs.
There are times when we experience pain even with good documentation and safeguards. It doesn’t mean bloodshed has to be avoided at all costs. It’s natural to recoil when we hit pain. Seasoned developers can learn from that pain and reach deep into their tools to make them better. Their work isn’t always selfless or charitable. It’s not always about “giving back”, if anything they are “fixing forwards”. The next time they’re tired and frustrated they’ll be glad they spent the extra effort making sure that pointy corner was rounded.
The beauty of open source is that we can all learn from each other’s mistakes. The tragedy is that so many people fail to realize they aren’t alone and don’t share their lessons. Some have never used a truly powerful “sharp” interface that is also effective and safe. Some have, but the experience was so seamless the presence of the tool wasn’t even noticed. When such an interface is missing, they don’t spend much time thinking of how to create it.
While using woodworking is fun for me, there is a mantra in the shop. Be present or be hurt. You might be the best worker in the world but if you go into the shop tired, upset, hungry, or drunk you’ll be sorry. The same is true for programmers. We shrug exhaustion and wear it as a badge of honor. We drink beers at happy hour then get paged. We push our bodies and minds to such lengths that frustration doesn’t even begin to describe our mental state. When we do this, we are all beginners. Either we only use tools when we’re inspired and working at 100%, or we start accepting that sharp tools need to be forgiving of our mistakes. A good tool can be appreciated by the elite, a great tool can be appreciated by everyone.
Support isn’t sexy, but it’s necessary. How open source software is supported is just as important as how well it works. Given the choice between building awesome new features or carefully reading and responding to 10 bug reports, which would you choose? Which is more important? When you think of Open Source maintainers what do you see? I see issues. I see dozens of open bug reports that haven’t been responded to in days. I see a pile of feature requests waiting to be worked on. Now when I open those issues, I see maintainers spending most of their time trying to get the information they need. “What version are you using?”, “was it working before”, “can you give me an example app”? Would you rather maintainers spend time asking for minute in-bug reports or fixing issues?
When I think of open source bug reports, I think of hospitals. A bug report is like a sick patient walking into an emergency room. No one knows exactly why they need to get better. They have self reported symptoms, but more information is needed. When someone goes into an ER, they don’t immediately go to a thoracic surgeon to get their blood pressure taken. Instead, they go to someone that records their vitals, take care of paperwork, and then makes sure the right doctor is assigned to the case. Hospitals call this process “triage”, from the French “separate out”. This person or team of people doesn’t need to know exactly what is wrong with you, they only need to know what information the doctor will need. Every minute someone spends on triage, is an extra minute someone else gets to dig into the root problem.
Triaging bugs is a necessary skill for any open source maintainers, whether they’re working on a newly minted library or helping out with a 10 year old framework. It’s also a skill that can be picked up relatively quickly without years of required programming knowledge. If you want to get started in Open Source but worried you’re not ready, triage can be a great first step.
First up, consider Open Source software you use. Think of the languages, and then think of any open source libraries you use. For example if you’re a Python programmer you might use django, the Requests library, and a whole lot of other packages. Pick a few that you think are very popular. These projects will have a larger existing team and you’ll get good experience from them, however you’ll also want to pick a few smaller projects. While you might get lost in the crowd with a larger project, 1 person can make a huge impact on a smaller one.
Once you’ve picked a few libraries, find their issue trackers. Try to think of a good trigger when you’ll be able to spend some time on open source. Maybe it’s every day in the middle of your morning coffee, or right after your once-a-week team standup, or maybe you can use the time right before your once-a-month community meet-up. I maintain an Open Source project that sends you issues to your inbox. Find a consistent schedule and pace that works with your life.
When you sit down to triage, make a goal. Maybe you want to find 5 issues to comment on. What exactly should you say? There are 3 states to an issue:
10) A bug is reported.
20) Extra information is gathered to reproduce the bug.
30) If the bug can’t be reproduced GOTO 20.
It’s your job to gather information and move towards a resolution. Let’s look at some common issue types and things you can do to help.
Issues have a shelf life. If they’re not actively being worked on, they can go bad. Maybe the original reporter found a work around, or maybe the bug resolved itself.
To find issues, sort by “updated at” if the issue tracker allows for it. Go through a few of them and leave a comment:
Can you confirm this is still an issue?.
If the issue is really old and someone already did that, you can push for the inactive issue to be closed:
There is no activity here for the last 2 months, let’s close this issue for now and re-open if needed.
An issue is only good as helpful as the people working on it. If the issue isn’t important enough for the original reporter to care about, there are likely more pressing problems that maintainers should focus on. Closing an issue isn’t a finality, if the problem comes up again, the issue can be re-opened or referenced by a new issue. Often if a thread goes on for too long it can be difficult to easily scan all the conversations, sometimes starting fresh is the best way to move forward.
When someone’s program was working but after a version update no longer works, it’s a regression. These are usually the easiest bugs to work on as a maintainer. Usually these bugs will say something like “it worked before” but not give specifics, which is where you can come in:
What version are you using now? What version were you using before when it was working?
Usually these reports will have some kind of reproduction instructions. Once you find the version numbers, try to reproduce the problem. Time box it to maybe 5 or 10 minutes so you don’t sink too much time into the problem. Take notes. If you run into any problems or questions when trying to reproduce the issue, write them down. 99% of reports will not include enough information to reproduce the bug. To the reporter it’s almost impossible to NOT hit the bug, they will give you wide sweeping brush strokes and expect you to hit the bug just as they did. In reality they’re probably hitting a stress case or made an assumption that most other people don’t. Your job is to find that assumption.
If you can’t reproduce the problem, you’ve got two options. Either you can report back with detailed information about how you tried to reproduce the issue and how it didn’t work. Hopefully this will jog the reporter’s memory and they will point out a crucial step you missed. The second takes more time from the reporter but in my experience it will save everyone time in the long run.
Can you please attach a small example app that reproduces the problem?
If the original reporter makes a new repo with the minimal artifacts required to reproduce the problem, you might still need to have some back and forth with them, but it will be much less. If they don’t want to make an example app, tell them about how you couldn’t reproduce and remind them that normally if a bug cannot be reproduced it cannot be fixed. People triaging and fixing open source bugs have limited time, the faster it takes them to reproduce the problem, the faster they can fix it. If a problem is important enough to you to report, it is important enough to report well, which means going the extra mile and including an example app.
If they already included an app, try to reproduce the problem and report back if it’s valid or not. When a maintainer sees the issue if they know someone else reproduced the problem, they’ll know they can reproduce as well. It might not sound like much, but it’s really helpful.
Trust but Verify
Reading bug reports is a bit like reading a story written by an unreliable narrator. They all mean well, but submitting bug reports is as much a skill as triaging bugs. Half of your job is education and coaching bug reporters on what a good bug report looks like. Likewise, it can be tempting when you see a report that looks like a “Usual Suspect” to declare the underlying problem code without verification.
When someone comes along to look into fixing a bug, we want them to have a consistent narrative. One thing that can help with getting this information is a good rapport with the person submitting the bug. You can thank them for their time and their report. If they have follow up questions about the triage process “why exactly do I have to make an example app after I gave you reproduction instructions?”. Remember that this might be the first issue they’ve ever opened. We’re all on the same team and working towards the same goals of making our open source software better. Humans are a large part of software and they all come with emotions. When everyone is level headed and respectful is when the most gets accomplished.
A PR is a bug report with a solution attached. Go through the same motions. Make sure you understand the problem, if there isn’t a linked issue ask for a reproduction case. In addition, see if the coding style matches the rest of the project. Does the PR come with tests or a CHANGELOG attached? What are base things that every new feature or bugfix to the project must have?
Not a Bug
Frequently people will open up bug reports that have nothing to do with the library or have very well documented fixes. Instead of replying “RTFM”, closing, and locking the issue, put yourself in their shoes. Maybe they saw the documentation but were confused. Maybe it could be documented somewhere else? It costs you nothing to be nice, and may earn a contributor in the future.
Here is an example from a comment I made in 2012. The behavior was unclear so I explained why the software worked as it did. I also linked to the relevant documentation and invited them for followup questions.
Sometimes issues are opened to ask questions “I want to do this " but i'm not sure how. This is a clear sign that documentation could be better. If you're looking for easy commits, look no further than writing documentation. The best stories are written by the people who experienced them. The best docs are written by users. If you don't have time to add docs, you can leave a note explaining where you think a doc change should have gone and ask the reporter to make the change. If your issue tracker is small, maybe open a new issue pointing out the documentation change. Often it's easier to just make the change.
Nothing to do
If you find a bug report that is active and has already been reproduced, you might not need to do anything. Great! Find another issue or take the time to read how they got from bug report to reproduction. Is there a consistent theme or phrase other maintainers use? Maybe they have their own techniques for triaging issues that you can use.
Historian - Code, culture, and conventions
Over time you’ll become more and more familiar with the projects you’re triaging. You’ll know which maintainers love fixing bugs, you’ll be able to predict which maintainer will ask to trim whitespace. Every project has their own conventions. For example, I’ve worked with projects that don’t accept any refactoring as they mess with the ability to blame. Some projects welcome feature proposals in issues, some don’t and ask for discussions to go to a mailing list. As you see more issues, you’ll see what the existing maintainers tell people who open issues. Over time you’ll notice an issue almost identical to a previous one. It’s good for you to provide context. If you know that another maintainer will ask for something, you can be pre-emptive and ask for it first.
Eventually you’ll start to see patterns in bugs, or documentation. While you’re only signing up to triage issues for now, it’s a very slippery slope to full blown bug fixing and code contribution. This is a good thing. If you already know what the other maintainers will say about an issue or a PR you open up, the process becomes a whole lot less scary.
Start at the Beginning
While this article is a bit long, the core ideas are simple. Maintainer time is valuable. Triaging issues takes a minute of your time and saves a minute of maintainers time. Triaging is a skill you can learn with common patterns that can be easily applied. If you’re still struggling with where to start I recommend going back to your list of libraries, find or add them to CodeTriage and subscribe.
Contributing to Open Source doesn’t have to be a full time job. If everyone chips in a little the contributions really add up. Your small contributions can make a big impact.
How do we make our programs faster? How do we make anything faster?
My first co-op job was working in packaging. They had a small Industrial Engineering library and let me read from it at work. The first book I read changed my life and my way of thinking about problem solving: The Goal by Eliyahu M. Goldratt. Just about every time I work on performance, it’s impossible for me to not make comparisons to The Goal in my head. In this post, we’ll look at some stories from the book and how they apply to performance tuning in programming.