Heroes and Juniors: Increasing Engineering Team Velocity
Or why organizational science is a powerful tool for building better software
Ten years ago, recruiting engineers was all about ninjas, rockstars, and 10xers. While many people made fun of those terms and the people who used them, almost no one questioned the underlying idea that engineers who could do the work of whole teams were desirable hires.
Few people in engineering feel that way today. The conversation around what a valuable engineer looks like has flipped. What were small, scrappy companies in 2010/2011 are now either dead or medium-sized companies with mounds of poorly documented, idiosyncratic technical debt: the apparent cost of ninjas and rockstars. We’ve learned from experience that faster does not mean maintainable. People no longer want heroes; they want senior engineers who mentor, share knowledge, balance work and life so that they don’t burn out, and build sustainable, maintainable software.
But this desire to forgo the rockstars in favor of a more mature and reliable (if a bit slower) workforce hasn’t shifted opinions about junior engineers, which has always struck me as funny. Who exactly are these great, supportive, mentoring senior engineers supposed to mentor if you aren’t hiring juniors?
The Plight of the Most Junior Senior Engineer
Some time ago I was looking at a system with one of my engineering teams. This system had been built to support three different templating languages, which sounds useful until you consider that Twig, Liquid, and Jinja2 differ very little in functionality. Supporting them all meant supporting the libraries needed to compile each one, plus all of their dependencies, which was quickly becoming unsustainable. It seemed clear that we needed to consolidate on a single templating language. The problem? Roughly 60 million customer templates would need to be migrated.
Any time there’s customer input, the pain of migration is inversely proportional to how much validation you’re able to do. As it turns out, all of our templating languages allowed a large degree of syntax variability. When your syntax wasn’t 100% correct, the compiler would often still render valid HTML based on its best guess at what you meant. This meant converting our templates to another templating language had a very long tail: there were a few common mistakes that accounted for a lot of cases, and then a whole bunch of one-off mistakes.
We eventually worked through enough of the inconsistencies to complete the migration with no negative impact on customers, but it took one of my senior engineers close to a full month to fine-tune the data conversion scripts to handle all the one-off errors. Meanwhile, a number of equally important engineering challenges were not getting done.
The question for me became: was it necessary to burn senior time on this? I didn’t think the senior engineer’s experience ended up shaving off that much time (sometimes data entry is just data entry), but we could have structured the work to teach a junior engineer lots of different skills: working with deltas, building command-line tools, debugging, and contributing back to open source libraries. Then, instead of a senior investing a month of time, the whole team would have invested a few hours here and there pairing with the junior. But we didn’t have anyone more junior on the team, because that organization’s policy was not to hire junior people. Leadership felt that senior people would get more done, so why waste headcount on juniors?
This had a couple of strange effects on the org. First, it meant we had a group of perfectly competent mid-career software engineers whose growth we were stunting because we insisted on calling them senior engineers when they weren’t. That put them under incredible pressure to perform, and it also made it impossible to hold the actual senior engineers accountable, because doing so would have meant judging people at the same level of the career ladder by different standards. Either the mid-career engineers got thrown under the bus, or the senior people slacked off on their full senior responsibilities… or both.
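One way to see that long tail is to run every template through the converter, bucket the failures by error, and sort by frequency. The Python sketch below is a hypothetical illustration, not the scripts used in this migration: it assumes the converter raises `ValueError` on input it cannot handle, and `triage_failures`, `toy_convert`, and the sample templates are invented for the example.

```python
from collections import Counter
from typing import Callable, Iterable


def triage_failures(templates: Iterable[str], convert: Callable[[str], str]) -> Counter:
    """Run every template through `convert` and count failures by error message.

    Assumes the (hypothetical) converter signals unconvertible input by raising ValueError.
    """
    failures: Counter = Counter()
    for source in templates:
        try:
            convert(source)
        except ValueError as err:
            failures[str(err)] += 1
    return failures


def toy_convert(source: str) -> str:
    """Hypothetical stand-in for a real converter: rejects two kinds of sloppy syntax."""
    if source.count("{{") != source.count("}}"):
        raise ValueError("unbalanced braces")
    if "{% endif %}" in source and "{% if" not in source:
        raise ValueError("stray endif")
    return source  # a real converter would rewrite the syntax here


templates = [
    "{{ name }",         # missing a closing brace
    "Hi {{ name }",      # the same common mistake again
    "{{ a }} {{ b }",    # and again
    "done {% endif %}",  # a one-off: endif with no matching if
    "{{ ok }}",          # well-formed, converts cleanly
]

report = triage_failures(templates, toy_convert)
for message, count in report.most_common():
    print(f"{count:>3}  {message}")
```

Working down a sorted report like this, the first few buckets cover most of the broken templates; everything after them is the long tail of one-offs that each need individual attention.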
On-Call for Life
The other big impact this had was that engineering was having trouble maintaining efficient on-call rotations. A good on-call rotation needs six people. You want to run two tracks: a primary on-call person who gets the pages and a secondary on-call person who gets paged if the primary fails to respond. But you also want to space things out so that people are not on-call too often. Six people on two tracks means that engineers spend about a third of their year on-call, and each person is primary for only one week in every six. This allows for long rest periods from the psychological stress of being on-call and deters burnout.
Sometimes people who have never been on-call assume that if the number of pages is low, on-call will be less stressful, and therefore that better engineering teams can run smaller rotations. This is wrong. The stress of on-call comes from not being able to disconnect from work, not from answering pages. When you are on-call you have to keep your work machine, and whatever device you get paged on, with you and accessible at all times. That means AT. ALL. TIMES. I once caught an engineer in the pool carrying his work machine over his head because he was on-call but didn’t want to miss out. That’s fucked up, but that’s what being on-call for a long period of time is like.
The further you get from six people in a rotation, the higher the risk of burnout. With five people on two tracks, each person spends 40% of the year on-call; with four, they spend half the year on-call. You can drop the secondary rotation (sometimes I do this if the staff shortage is temporary), but that’s a huge risk. Nothing can go wrong with the primary. They can’t sleep through a page. They can’t forget to charge their phone. They need to be perfect throughout their shift.
So at this organization, rotations were shared across multiple teams, or managers were trying to make rotations of three or four people work somehow. Why was this happening? Well, they had too many senior people, and keeping all of them together on the same team was making the teams harder to run. Too many cooks were in the kitchen, so the organization kept subdividing its teams into smaller and smaller, more specialized teams.
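The on-call percentages follow from simple arithmetic: with two tracks, two people are on-call every week, so in a rotation of N people each engineer carries 2/N of the year. The Python sketch below assumes one-week shifts and an evenly balanced rotation, which the numbers imply but the text does not spell out; `on_call_share` is a hypothetical helper, not part of any scheduling tool.

```python
from fractions import Fraction


def on_call_share(people: int, tracks: int = 2) -> Fraction:
    """Fraction of the year each engineer spends on-call.

    Assumes one-week shifts and a rotation that spreads the `tracks`
    weekly slots (e.g. primary + secondary) evenly across `people`.
    """
    if people < tracks:
        raise ValueError("a rotation needs at least one person per track")
    return Fraction(tracks, people)


for size in (6, 5, 4):
    share = on_call_share(size)
    # With weekly shifts, each person takes the primary slot once every `size` weeks.
    print(f"{size} people, two tracks: {float(share):.0%} on-call, primary every {size} weeks")

# Dropping the secondary track halves each person's load but leaves no backup.
print(f"6 people, one track: {float(on_call_share(6, tracks=1)):.0%} on-call")
```

Six people on two tracks works out to exactly one third of the year, and every person removed from the rotation pushes that share up quickly: 40% at five, 50% at four.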
Going Faster by Going Slower
Look, no matter how brilliant they are or how thoroughly you screen for “culture fit,” you can’t have a team full of tech leads. If you do nothing but hire senior engineers, the only way to maximize their output is to let them keep building new things. In any given software project there are different levels of tasks to be completed. Obviously in the beginning there are a lot of architectural decisions to be made, a lot of unknowns, and a lot of requirements to be discovered, but as the project continues the team will start to carve out well-scoped items of work that need to be done for things to keep moving. These include:
- Data collection and research defining the context of challenges
- Well-scoped implementation work that adds one more instance of a pattern already implemented in the project
- Tweaks to existing code after user feedback
All of these are tasks that senior engineers should be willing to do, but none of them will fill a senior engineer’s plate, nor will a senior person find them particularly challenging or engaging. On top of that, if your seniors are not delivering senior-level value, you are overpaying for technical talent.
There are points in an organization’s life cycle where it makes sense to over-represent senior talent. In the beginning, when the organization needs to build a lot of services very fast just to exist in the first place, hiring a few too many seniors can help get that done.
But at other points in an organization’s life cycle, hiring more senior people does not increase velocity; it actually diminishes it. The senior people create more things than the organization can maintain or support. People work longer and longer hours, spend more time on-call, and have to maintain expertise in more and more parts of an architecture that keeps getting more complex. They burn out, and the organization loses institutional knowledge.
One of the most valuable lessons from organizational science is that it doesn’t matter who people are; it matters how they are incentivized. Engineering organizations often discuss their struggles as a matter of individual character flaws and assume the way to solve them is through verbal appeals to “do the right thing” (however that is defined), when in fact the structure of the organization itself is pushing people to behave the way they do.
If an engineering organization doesn’t want heroes, if it wants senior engineers who tackle complex problems and mentor others, it has to actually incentivize that behavior. If no one does the more junior work, the project can’t move forward. If a senior engineer does the junior work, they lose their pathway to promotion. So either the junior work never gets done, or the seniors who do it also seek out opportunities to be heroes in order to stay on the path to promotion. Instead of a balanced team moving toward one goal, we have a team with underutilized resources in the form of senior engineers looking for extra work.
Transferring Knowledge
It can be scary to hire a junior person. In technology, situations can change fast. An open role where it felt safe to add someone with great growth potential can turn, in a matter of weeks, into a critical hire who needs to step up to complex challenges. I’ve definitely had situations where we chose the more senior of two mid-career candidates and were grateful for it when priorities changed unexpectedly two months later.
But when the alternative is backing yourself into a situation where you have one engineer per service, it makes sense to slow down and pay more attention to the overall health of your engineering organization.
If one engineer monopolizes a whole work stream, then no one else understands it, no one else can maintain it, and if the system is critical enough the engineer may be denied opportunities to do other things. The organization ends up with a knowledge transfer problem that stunts growth. We also don’t actually have a team in this scenario. We might have a few engineers organized under the same heading, but they’re not a team because they’re not actually building anything together.
I define junior engineers here pretty broadly. The term covers traditional new grads and entry-level hires as well as engineers with a couple of years of experience who have yet to operate a system at scale. How many junior engineers an organization should hire, and how junior they should be, depends on the organization’s maturity. When an organization is new and the vast majority of engineering work is building the first iteration of the core product, teams full of senior talent are not that big a deal. If you can hire them, do it. But as the work shifts from building the core product to maintaining and optimizing it, you want to start hiring more juniors, because juniors slow down the pace of new growth, and at that stage you want things to slow down.
As organizations get older they should lengthen their feedback periods so that they can study and adapt to the long-term second- and third-order effects of what they build. Larger, more complicated organizations naturally have to wait longer to see outcomes show up on the balance sheet (or whatever metric is relevant). There was a great blog post on the outcomes of growth hacking published last month that outlined how short-term optimizations can be cancelled out or completely inverted by long-term trends. Such situations are not relevant at two-year-old companies, because when you have nothing to begin with, any change at all is likely to be a long-term positive.
Junior engineers also force your senior engineers to understand what they know, which is very different from simply knowing it. If your senior engineers do not understand why they are good at what they are good at, their ability to grow technically at the same organization is capped. As people learn, they take on new responsibilities, but if they cannot transfer some of their knowledge to colleagues, they cannot hand off any of those responsibilities either. They keep taking on new responsibilities without ever being released from the old ones. The team becomes more and more dependent on them, and they have less and less time to devote to new learning. Eventually you either burn the person out or force them to leave for a new organization in order to shed their responsibilities and focus on growth opportunities.
Knowledge transfer isn’t something that just happens; it’s a skill. And like all skills, you only get good at it by doing it over and over again. Consider this: in 2013 Sun Microsystems published the conclusions of a seven-year study on the impact of mentorship, which found that the people doing the mentoring experienced benefits similar to those of the junior colleagues being mentored. 25% of mentees saw a positive change in their salary growth, and so did 28% of mentors. Retention for the juniors rose from 49% to 72%, and for the seniors from 49% to 69%. When senior-level people are able to grow junior colleagues, they can take on new responsibilities, build new skills, and find new challenges without leaving the organization, and ultimately everyone is better off.
So why aren’t you hiring junior engineers?