So Your Company Wants You to Start Interviewing

I finally have a reason to say: these opinions are mine own and not my employer’s ;) I’ve chosen to frame things with my current employer in mind because after three years the bulk of my interviewing experience and experiments have been with them, but really this is just my opinion on how technical interviews should be, not an accurate description of how USDS interviews.

I’ve gotten a couple of requests for this post. Technical interviews have become one of my favorite topics to wax poetic about over the last few years. Most people who either encounter me in person or frequent the same online communities as I do have been treated to the output of some of my experiments in different approaches and types of questions. Some have even become guinea pigs for them.

My fascination with this topic started when I interviewed for the job I have now. Because it was bad. It was very very bad (although in fairness the organization was barely a year old by that point and scaling up faster than sensible process could be built). It was so bad that my second week on the job I was asked to help fix it — my second week!

Here’s what happened: no one asked me any programming questions of any kind. It was pretty weird. I had what I could identify as a basic technical screen that asked me a lot of stuff about the Linux command line and TCP/IP, then I had another interview where I chatted about my resume, then I had an interview to assess my emotional intelligence where the interviewer threw out the script the organization used and we talked about legacy software and using computer vision to sort a backlog of scanned medical records (specifically how to design an algorithm to classify MRIs, X-Rays, and CAT scans).

It was pretty weird for a couple of different reasons. First of all, the vast majority of my resume talked about my experience with data infrastructure and no one ever validated that. Not one question about databases or API design or ETL pipelines or anything of the kind. Then there was the fact that the screening questions were heavily biased toward Linux and when I arrived all the computers I worked with were (at the time) Windows machines. And of course, they were hiring me as a software engineer and no one had looked at a single line of code I had written.

When I started work a few months later, I realized that internally the organization was just as frustrated with the process as I had been. I got pulled into the effort to design a better system, which scared the crap out of me. Sure, I had interviewed people before. I had hired people. I had fired people. Just not on the scale that USDS was looking for, not with the level of process and consistency that USDS expected.

I ended up learning a lot and it was tremendously rewarding to help make a tiny, fragile organization stronger. (It still is.)

TL;DR: Rules for Constructing an Interview

This will be kind of a long post. I’ll reference some theory, but I’d really like to be more specific and pragmatic than most reference guides on this topic usually are.

But essentially my philosophy about interviewing can be broken down into four basic rules:

1 — We ask questions that reveal who we are, what we value and what is exciting about our work.

2 — We ask questions that test skills we actually need, and those questions have been vetted for false correlations (for example, knowing the appropriate order of arguments for a command or function off the top of your head indicates a good memory, not necessarily superior programming ability).

3 — Our process is standardized in such a way that the candidate would get an equivalent experience from any of our interviewers, but allows enough flexibility for interviewers to dig into details specific to just that candidate and her answers.

4 — Our interviews are structured in such a way that different perspectives increase the strength of the signal, not the noise.

Rule 1: YOU Are Not Interviewing THEM

Interviews, especially technical ones, are bi-directional. You are trying to ask questions of candidates that help you figure out whether it’s worth hiring them, but they are also assessing whether your organization is a good fit for their needs. Before my last interview I had absolutely no intention of coming to USDS. I nearly missed out on my dream job because none of the interviews that came before that last one actually reflected what the work was. They asked me about technology that is common and popular but not used in government. They asked me about concepts too basic to reveal what their needs were. They did not verify my skills, which suggested to me they did not think my skills were particularly valuable. Then the last interviewer threw out the script and asked me for advice on what he was working on. In doing so he gave me a guided tour of his day-to-day. I found it fascinating and I was hooked.

Candidate experience is an under-appreciated and under-invested component of interviewing. Too often candidate experience is thought to be the responsibility of Recruiting or HR, but first and foremost as an interviewer you should be designing your interviewing strategy around giving every candidate, regardless of their level of talent, a great experience. Do not fall into the trap of thinking that a harder, more grueling interview is a better interview. If you are putting a candidate through hell because the job is stressful and you want to test their ability to handle that stress, the candidate should be able to tell that is the purpose. If you are putting a candidate through hell just to put them through hell, then what the candidate learns about your organization is that it is a place where assholes are allowed to thrive.

A hard interview isn’t always an unpleasant interview either. It’s all about purpose. That last interview for USDS forced me to talk about computer vision, something I had no experience implementing. Sure I understood the theory, I knew some of the toolsets, some of the research, but I could not really speak with authority on how one would implement such a solution because I never had.

Despite that I found this conversation fully engaging because it was obvious that what the question was testing had nothing to do with computer vision or even machine learning. The interviewer started out by giving me a broad picture of the problem. Every time I came up with a solution he would add some complication. Computer vision was where we ended up, but what the question was actually testing was my resilience and creativity. Since that was understood, the fact that the interview pushed me to the limits of my knowledge was not an unpleasant experience. I left that conversation feeling awesome.

But thinking about what interview questions say about your organization -vs- what their answers might say about your candidates is not something most interviewers spend a lot of time on.

Consider the following question (note that none of the example questions here are actual USDS interview questions, sorry):

The CISO at the Department of Technology is forbidding the use of Google Chrome on government computers bought and provided by his Department. His team are required to test and scan every software update before it’s allowed to be installed on a Federal network and he thinks that Chrome releases too many updates, overloading his staff and driving up cost. He would like to require all Department employees to use Internet Explorer so that his team need only vet one set of updates. How would you advise him?

So starting off, does this question accurately describe the challenges my organization faces? Unfortunately, yes. Although this specific situation has never happened, we have encountered the individual elements. Some agencies do turn off autoupdate on software like Chrome so that they can evaluate and approve every update themselves. Some agencies do require use of Internet Explorer (usually because they are using some obscure plugin only supported by IE). Cost is a critical factor in decision making at all agencies.

Does this question accurately reflect what we value? Yes. The wording of it suggests that we disapprove of the scenario and the reasoning leading up to it. We could have worded this question in a more neutral way to allow people to argue in favor of the CISO’s strategy, but instead we give them a little more information about who we are.

Does this question help us assess a candidate’s fit for our organization? Yes, in a couple of different ways. Some candidates may not think they can push back on the CISO’s plan in its entirety and may end up trying to figure out how to implement IE-only. Some candidates will try to dissuade the CISO diplomatically. Some much less diplomatically. We’ll end up learning not just about the candidate’s technical ability but her temperament, patience, ability to explain technical concepts to various audiences … all good data points for us in determining whether someone will succeed here.

But here’s the problem with this question: does this question reflect the work we want to do? No. Is it a problem that an engineer is going to get excited about solving? Probably not.

We might get a lot of good data from this question, but we ultimately give the candidate a negative impression of what it’s like to work for us. Like most organizations, we don’t want to invest time and energy interviewing people who do not accept our offer when made. If a candidate ends up thinking to herself “God this work sounds awful!” because we’re asking her things that don’t reflect our best work, then that’s exactly what will happen.

Let’s look at another hypothetical question:

An important application that processes visas is down. Embassies and consulates around the world are panicking as their visa operations grind to a halt. Meanwhile farmers in the southern states are facing a shortage of help with thousands of migrant workers unable to cross the border for the harvest. The application is written in Java with an Oracle database. It is hosted in a private government data center located somewhere in West Virginia. The servers are Windows NT 4.0. The outage appears to have started over the weekend. How would you begin diagnosing and resolving this issue?

Does this question accurately describe the challenges my organization faces? Yes, USDS was founded to handle situations like the one described above.

Does this question accurately reflect what we value? Yes. There’s no reason to include the background information about farmers and migrant workers except to tell the candidate that we do work that affects millions of real Americans.

Does this question help us assess a candidate’s fit for our organization? Yes.

Does this question reflect the work we want to do? Yes!

So the second question is probably better, but the first question may not be a bad thing to ask if it’s part of an overall strategy. In other words, if you have other questions that better represent what is compelling about your work and you want to gather data on the candidate’s tolerance for some of the tougher challenges … that’s a fair thing to test for.

Rule 2: Know What You’re Testing

Perhaps the biggest problem with technical interviews is that they may not test for the skills you’re actually hiring for. There are no neutral interview questions. All interview questions favor a certain type of experience, a certain type of person, or a certain stage of a candidate’s career. If you’re asking questions that match the type of people you are looking to hire then the bias is acceptable. If you’re asking questions that favor characteristics contrary to your needs then the interview will not be able to produce consistent and clear data. In the absence of good reliable signal, hiring decisions get made based on more pervasive and destructive biases (like gender, race, age, and other “culture fit” criteria).

Here are some common interview formats and the types of people they are biased towards:

Obviously, asking questions that are easier for new grads to answer is not a problem if what you want are bright young new grads. Similarly, if you are a Ruby shop, coding challenges that put non-Ruby programmers at a disadvantage are not necessarily a bad thing.

Something we have struggled with at USDS is how to accurately assess someone’s programming ability when the stack could be almost anything. I have pretty vivid memories of walking into USDS HQ one evening to find one of our more experienced interviewers flapping around in panic while obviously in the middle of a phone screen. Once I confirmed the phone was on mute I asked him what the problem was. “I told him he could do the exercise in any language and he chose Haskell! Do you know any Haskell?” (answer: a little, but only because Evie Borthwick talked my ear off about it for roughly two years)

We do our best with it, but it remains a challenge.

When I was planning out the interviewing strategy for TRU, I decided I was less interested in code writing and much more interested in code reading. I wanted to assess the candidate’s ability to take something completely foreign and start to break it apart. I ended up writing the following question:

ReturnMax(x,y) == CASE x > y -> x
                    [] y > x -> y
                    [] OTHER -> x

ReturnMin(x,y) == CASE x < y -> x
                    [] y < x -> y
                    [] OTHER -> y

MaxDepth(node) ==
    IF Len(node) = 0 THEN 0
    ELSE 1 + ReturnMax(MaxDepth(node[1]), MaxDepth(node[2]))

MinDepth(node) ==
    IF Len(node) = 0 THEN 0
    ELSE 1 + ReturnMin(MinDepth(node[1]), MinDepth(node[2]))

(* --algorithm test
variables
    input \in {<< <<>>, << <<>>, << <<>>, <<>> >> >> >>},
    max_depth := MaxDepth(input);
    min_depth := MinDepth(input);
    result := max_depth - min_depth <= 1;
    assert result = TRUE
end algorithm *)

We’d tell the candidate this was pseudocode, but it’s actually not (bonus points if you know what it is!). Some parts of the syntax should be familiar to programmers, while other parts are quite foreign. Unless a candidate recognizes the language right away they will need to guess what certain constructions mean, then build on those assumptions until a clear picture snaps into place. By far the hardest part is this line:

input \in {<< <<>>, << <<>>, << <<>>, <<>> >> >> >>}

Once you figure that out the rest comes quickly, but I’ve seen really experienced engineers struggle with it. (hint: try replacing each set of double angle brackets with plain old parentheses and see if that helps)
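For readers who would rather check their reading than take the hint on faith, here is one possible translation into Python. It is a sketch under a few assumptions: that each pair of double angle brackets is an ordered sequence (written here as a tuple), that an empty sequence is a leaf, and that the indices 1 and 2 in the original are one-based, so they become 0 and 1. The names `max_depth`, `min_depth`, and `TREE` are mine, and the built-in `max` and `min` stand in for `ReturnMax` and `ReturnMin`.

```python
# One possible reading of the interview snippet, not a definitive translation.
# Assumption: << ... >> is an ordered sequence, and <<>> is an empty leaf.

def max_depth(node):
    """Length of the longest path from this node down to an empty leaf."""
    if len(node) == 0:
        return 0
    # node[1] and node[2] in the original are one-based; Python is zero-based.
    return 1 + max(max_depth(node[0]), max_depth(node[1]))


def min_depth(node):
    """Length of the shortest path from this node down to an empty leaf."""
    if len(node) == 0:
        return 0
    return 1 + min(min_depth(node[0]), min_depth(node[1]))


# << <<>>, << <<>>, << <<>>, <<>> >> >> >> with parentheses swapped in.
TREE = ((), ((), ((), ())))

deepest = max_depth(TREE)
shallowest = min_depth(TREE)
result = deepest - shallowest <= 1

print(deepest, shallowest, result)
```

Read this way, the snippet asks whether a binary tree is balanced. Under these assumptions the depths for this particular input come out as 3 and 1, so `result` is `False`.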

Then there are inefficiencies left in the code specifically to see if the candidate picks up on them. This construction for example:

CASE x > y -> x
[] y > x -> y
[] OTHER -> x

It shouldn’t take a decent engineer too long to come to the conclusion that this is some kind of case statement and once they know that then it should be obvious that three clauses for ultimately two possibilities is completely unnecessary. But will they point that out? It’s one thing to be able to take in something completely foreign and make sense of it. It’s something else to suggest improvements on something when your understanding of it still feels tenuous.
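To see why the third clause is redundant, here is a small Python sketch that sets a literal three-clause version beside a two-clause one and checks that they agree. Both function names are hypothetical, and the two-clause form is only one of several reasonable simplifications.

```python
def return_max_three_clauses(x, y):
    # Mirrors the snippet: two strict comparisons plus a catch-all.
    if x > y:
        return x
    elif y > x:
        return y
    else:
        return x  # only reached when x == y


def return_max_two_clauses(x, y):
    # When x == y either value is correct, so the tie can share a branch.
    return x if x >= y else y


# Compare the two versions over a small grid of integers.
agree = all(
    return_max_three_clauses(x, y) == return_max_two_clauses(x, y)
    for x in range(-3, 4)
    for y in range(-3, 4)
)
print(agree)
```

The catch-all only ever fires on a tie, and on a tie both values are the same, which is the observation we hoped a candidate would voice.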

But ultimately that was exactly the skill we wanted. I wasn’t trying to write the hardest questions so that only the strongest would survive. I specifically wanted people who would not be scared by stuff they did not know, so that’s what we tested for.

And the bias inherent in this kind of question? Well, if you know what the language is you have an immediate advantage. But if you know what the language is you probably work in designing highly complex concurrent architecture. If we give the engineering team that builds AWS S3 a bit of an unfair advantage, that’s hardly a bad thing in my mind.

Rule 3: Develop a Pool

The worst thing an interview process can be is inconsistent. A good candidate should pass regardless of when she applied or who did the interview. A bad candidate should fail on her merits.

Easier said than done.

Standardization helps a lot here. Developing a pool of interview questions that interviewers can draw from means you can create templates to even out different personality types among your interviewers. We should only ask non-standard questions when there is a compelling reason to do so (for example the candidate has specialized experience not otherwise validated by standard questions) and only after a suitable number of standard questions have been answered.

When questions are standardized it becomes easier for the hiring manager or committee to compare candidates to each other. How strict you want to be here depends on how much hiring you need to do. If you are a small company with only a few positions open, you may want to develop what amounts to essentially a script (ask these questions in this order) for every interviewer to follow. If you are hiring regularly, you can give your interviewers a little more freedom. Over time you’ll see the same questions and different ways of answering them and get a sense of how candidates compare to one another. This is particularly true if you work for a company that does not interview for specific positions but rather wants a firehose of talent all the time.

The trick about standardization is figuring out the process of accepting new questions into the pool and retiring old questions. A new question should be workshopped with a group of interviewers, both to ensure it is clear and error free and to make sure it’s a question other people besides its writer are open to asking candidates. Then the question should be field tested two or three times. You field test an interview question by tacking it on to an otherwise complete interview.

Before accepting a question into the pool you should know the following things about it:

  • Runtime: how long do we expect the discussion around this question to take before the candidate has completely answered it?
  • Core Competencies: what skills is this question testing?
  • Stage: is this a prescreen question? Do we ask this on the first round or as part of a longer more intense session?
  • Scoring: what kinds of answers do we expect from the best candidates? What kinds of answers do we expect from inexperienced candidates? There is generally a gradient effect between those two poles. Documenting what types of insights candidates at various levels make helps mitigate the impact of a lot of destructive biases.
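One way to keep that checklist honest is to store it alongside the question itself. The sketch below is purely illustrative: `PoolQuestion`, its field names, and the sample values are all hypothetical placeholders, and the threshold of two field tests is my reading of "two or three times."

```python
from dataclasses import dataclass, field


@dataclass
class PoolQuestion:
    """Hypothetical record for one candidate question; names are illustrative."""
    prompt: str
    runtime_minutes: int = 0                                # Runtime
    core_competencies: list = field(default_factory=list)   # Core Competencies
    stage: str = ""                                         # Stage
    scoring: dict = field(default_factory=dict)             # level -> expected insights
    field_tests: int = 0                                    # times tacked onto an interview

    def missing(self):
        """Names of the checklist items that are still undocumented."""
        gaps = []
        if self.runtime_minutes <= 0:
            gaps.append("runtime")
        if not self.core_competencies:
            gaps.append("core competencies")
        if not self.stage:
            gaps.append("stage")
        if len(self.scoring) < 2:
            gaps.append("scoring")  # want at least the two poles described
        if self.field_tests < 2:
            gaps.append("field tests")  # assumption: at least two trial runs
        return gaps

    def ready_for_pool(self):
        return not self.missing()


# Placeholder values, invented only to exercise the checklist.
question = PoolQuestion(
    prompt="An important application that processes visas is down...",
    runtime_minutes=20,
    core_competencies=["incident diagnosis", "legacy systems"],
    stage="first round",
    scoring={"experienced": "placeholder notes", "inexperienced": "placeholder notes"},
    field_tests=1,
)
print(question.missing())
```

A question with gaps simply stays out of the pool until someone fills them in, which also makes it obvious what the workshop still needs to discuss.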

Retiring old questions is a bit more difficult. Obviously when the question is posted up on Glassdoor, it’s probably time for it to go. But beyond that … it depends. I tend towards kicking questions out of the pool when they start to become intellectual crutches for the interviewers. Interviewers have a habit of gravitating towards the same questions over and over again. They feel comfortable with them, they feel like they understand all possible answers. That’s great, but if they get to the point where they are no longer paying attention to the candidate’s answers because they’ve heard that exact answer a hundred times it becomes a problem. Suddenly we have to worry about whether the candidate actually answered the question at all or whether the candidate looked the part, hit the right notes and the interviewer assumed the rest of the answer based on those few signals. It’s also, going back to Rule 1, terrible candidate experience. You may think they don’t notice when the interviewer is on auto-pilot but trust me they do.

Rule 4: Interviewing as a Team Sport

One of my most profound experiences as an interviewer of engineers came while shadowing another interviewer. The interviewer I was shadowing had specifically requested that — instead of passively observing — I ask a question or two as part of the discussion. After the interview was over, my partner pulled me aside to apologize for the candidate’s behavior.

“What are you talking about?” I asked.

“Didn’t you notice how dismissive he was every time you asked a question?”

Truthfully, I hadn’t noticed until my partner pointed it out. Largely because as a woman in tech, I’m so used to being treated dismissively I’ve learned to ignore it to preserve my sanity. But he had noticed and because he had noticed we were able to give the hiring committee a more nuanced perspective on who that candidate was and how likely he was to take things seriously when the spotlight wasn’t on him.

Interviewing in teams of two to three people is a great way to mitigate all kinds of destructive bias. Two people can interpret the same comment in wildly different ways, and having both perspectives on the table helps the hiring manager calibrate individual pieces of feedback.

It’s important to highlight the difference between interviewing as a team, and having many people interview the same person. The advantage interviewing as a team gives is being able to verify and weight the same data points because different people with different backgrounds and levels of experience are assessing the same exact conversation. More interviews do not necessarily help you verify the data from previous interviews, they just create more data.

Here’s some fun research on this topic: In 2016, a research team from the UK hired hundreds of people on Mechanical Turk to rate sample responses from imaginary interview candidates. More eyes meant more accuracy. A year later Google released its own research from five years of internal data, revealing that its 12 rounds of interviews were largely irrelevant after interview four. The difference between these two experiments is that the UK team was putting the same interview question and answer pairs in front of hundreds of people, whereas the Google experiment was pooling more and more interviews.

More interviews did not create significantly better data, but more opinions on the same data trended towards greater accuracy.

Of course, scheduling an interview with one person can be hard enough. Interviewing teams are not always possible. What to do?

In a word: transcript.

USDS was the first time I’d worked for an organization that was trying to hire at scale. Over time we’ve moved our feedback format to as close to a transcript as possible. Not every engineer can type 60 words per minute while also thinking of appropriate follow up questions, but the closer the feedback is to a transcript the easier it is for other people to review and draw their own conclusions. I resisted it at first, but lately I’ve come to appreciate the value.

Especially because the use of transcripts may finally make it possible for us to reproduce the “auditioning behind a screen” issue.

This is an experiment I have yet to be able to run: I want to use transcripts to track how interview feedback is affected by gender, racial and age biases. If we can produce a good transcript and remove identifying details we can have a second interviewer assess the candidate’s performance without knowing either their identity or the identity of the first interviewer. I’m interested in whether we could identify a significant difference between how interviewers interpreted candidate responses when they didn’t know who they were. My gut instinct is yes, but alas I haven’t figured out a good way to run this experiment in a sustained fashion yet.
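For what it’s worth, the "remove identifying details" step could start as something very small. The sketch below is an assumption-heavy illustration, not a real de-identification tool: the function `redact`, the placeholder tokens, the sample text, and the short pronoun list are all hypothetical, and the identifying strings have to be listed by hand.

```python
import re

# Assumption: a short, incomplete list of gendered words to neutralize.
GENDERED_WORDS = ["he", "she", "him", "her", "his", "hers", "himself", "herself"]


def redact(transcript, identifiers):
    """Replace hand-listed identifying strings and gendered pronouns."""
    text = transcript
    # Longest identifiers first so a full name is replaced before a first name.
    for item in sorted(identifiers, key=len, reverse=True):
        text = re.sub(re.escape(item), "[REDACTED]", text, flags=re.IGNORECASE)
    # Word boundaries keep "he" from matching inside words such as "the".
    pronouns = r"\b(?:" + "|".join(GENDERED_WORDS) + r")\b"
    return re.sub(pronouns, "[PRONOUN]", text, flags=re.IGNORECASE)


# Invented sample text, used only to show the mechanics.
sample = (
    "Interviewer: How did Jane Doe handle the outage? "
    "Notes: She said her team at Acme rolled back."
)
blind = redact(sample, ["Jane Doe", "Jane", "Acme"])
print(blind)
```

Names and pronouns are the easy part. Project names, schools, and turns of phrase can give a candidate away just as surely, which is part of why I have not found a sustainable way to run this yet.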

Ah well. One day…

Author of Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones)