Can you apply SRE thinking to things that don’t involve computers?

  • Since candidates have no idea which keywords on a job posting will be critical, they produce 30- to 40-page resumes to cover as much ground as possible. Those long resumes then score higher because they contain more keywords, even if huge portions of the work experience are not relevant to the position.
  • Candidates are given a survey asking them to self-assess how their skills meet the requirements of the position. If you don’t give yourself the highest score on every metric you get rejected. Even if everyone else lies in the most horrible, blatant fashion and you are truly the most qualified candidate, because you didn’t give yourself 5 out of 5 mastery on every requirement, your application is dead.
  • Veterans get bonus points added to their score before candidates are ranked for hiring lists, but since everyone in the system is cheating by this point and no one has interviewed any of these candidates to verify their abilities yet, veterans with no experience are frequently ranked above qualified candidates. If enough veterans apply, qualified candidates might not even make the hiring list.
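To make the interaction of those three rules concrete, here is a minimal sketch of a ranking process like the one described above. Everything in it is an assumption for illustration: the keyword set, the bonus value, and the names `Application`, `keyword_score` and `legacy_rank` are hypothetical, not the actual federal scoring system.

```python
from dataclasses import dataclass

# Hypothetical posting keywords and bonus size, chosen only for illustration.
KEYWORDS = {"python", "linux", "sql", "kubernetes"}
VETERAN_BONUS = 10


@dataclass
class Application:
    name: str
    resume_text: str
    self_ratings: list[int]  # one 1-5 self-assessment per requirement
    is_veteran: bool = False


def keyword_score(resume_text: str) -> int:
    """Count every keyword occurrence, so longer resumes score higher."""
    words = resume_text.lower().split()
    return sum(1 for word in words if word in KEYWORDS)


def legacy_rank(applications: list[Application]) -> list[str]:
    """Rank applicants the way the bullets above describe."""
    ranked = []
    for app in applications:
        # Anything short of a perfect self-assessment is rejected outright.
        if any(rating < 5 for rating in app.self_ratings):
            continue
        score = keyword_score(app.resume_text)
        # The bonus is added before anyone has verified actual ability.
        if app.is_veteran:
            score += VETERAN_BONUS
        ranked.append((score, app.name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]


applications = [
    Application("honest_expert", "python linux sql kubernetes", [5, 4, 5]),
    Application("padder", "python " * 8, [5, 5, 5]),
    Application("concise", "python linux sql", [5, 5, 5]),
    Application("veteran", "forklift", [5, 5, 5], is_veteran=True),
]
ranking = legacy_rank(applications)
```

In this toy run the honest expert never reaches the list, the padded resume outranks the concise one, and the veteran with no relevant keywords lands on top.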

Principles of Reliability

The first step was to define how exactly SRE works beyond the technical description of tools. We came up with three focus areas:

  • Budgeting: What level of performance do we need from this system? What are we optimizing for and how much failure can we withstand before we lose value?
  • Escalation/Empowerment: If on-the-ground workers realize the system is producing a subpar result, are they empowered to fix it?
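The budgeting question borrows from the SRE idea of an error budget: pick a target, and the gap between that target and perfection is how much failure you can absorb. A rough sketch of that arithmetic follows. The function name, the 80% target and the idea of counting "hiring actions that produced no hire" as failures are assumptions made for this example, not figures from the project.

```python
def error_budget_remaining(slo_target: float, total: int, failures: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the fraction of attempts that must succeed, e.g. 0.8.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_failures = (1 - slo_target) * total
    return (allowed_failures - failures) / allowed_failures


# Hypothetical numbers: 50 hiring actions, target of 80% producing a hire.
half_spent = error_budget_remaining(0.8, total=50, failures=5)
overspent = error_budget_remaining(0.8, total=50, failures=15)
```

With these made-up numbers the budget allows about 10 failed hiring actions, so 5 failures leaves roughly half the budget and 15 overspends it by half.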

Mapping Vulnerabilities

With that in mind, I decided to approach the project design by running a series of thought experiments that focused not on how we would prevent a given problem, but on how we would spot it while it was still developing. We sat down, looked at our proposed pipeline structure and brainstormed a bunch of potential vulnerabilities mapped to each stage of the process.

Pre-Hiring

  • Agency designates someone who is not qualified as a subject matter expert (SME) to review and interview candidates.

Developing Job Listings

  • Agency uses language that scares off qualified candidates, particularly underrepresented groups
  • Agency fails to separate “requirements” from “nice to haves”, forcing qualified candidates out of the pipeline because they “don’t have all the keywords”

Resume Review

  • Bias: Reviewers approve/reject candidates based on gut feeling, inventing additional criteria to justify the decision. Candidates are rejected for listing older languages on their resume, working at certain companies, or attending college later in life or not at all.
  • Irrational escalation of standards: Reviewers judge candidates more or less strictly depending on the time of day, the candidate they reviewed before, or the perceived robustness of the pipeline (Note: is it better to fill an open position ASAP or leave the position open for a bit to find a better candidate?)

Interviewing

  • Certain interviewers are scheduled for more than their fair share of interviews, burning them out faster.
  • Interviewers write questions that do not test skills actually relevant to the position
  • Multiple interviews overrepresent some competencies and underrepresent (or fail to assess at all) other competencies
  • Interviewers fail to accurately transcribe the Q&A, making it difficult for the hiring manager to assess the interviewer’s feedback
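Two of the interviewing risks above, uneven interviewer load and lopsided competency coverage, are the kind of thing that can be watched for as it develops. Here is a minimal sketch of what such a check could look like. The schedule format, the 40% load threshold and both function names are hypothetical choices for this example, not something the pilots actually used.

```python
from collections import Counter


def overloaded_interviewers(schedule, max_share=0.4):
    """Return interviewers who hold more than max_share of all interviews.

    schedule is a list of (interviewer, competencies_assessed) pairs.
    """
    counts = Counter(interviewer for interviewer, _ in schedule)
    total = sum(counts.values())
    return sorted(name for name, n in counts.items() if n / total > max_share)


def coverage_gaps(schedule, required):
    """Return required competencies that no scheduled interview assesses."""
    covered = Counter(c for _, competencies in schedule for c in competencies)
    return sorted(c for c in required if covered[c] == 0)


# Hypothetical interview loop for one hiring action.
schedule = [
    ("ana", ["coding", "design"]),
    ("ana", ["coding"]),
    ("ana", ["coding"]),
    ("bo", ["design"]),
    ("cy", ["coding"]),
]
overloaded = overloaded_interviewers(schedule)
gaps = coverage_gaps(schedule, {"coding", "design", "communication"})
```

Here one interviewer carries three of the five interviews, coding is assessed four times, and communication is never assessed at all.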

Hiring Committee

  • Hiring Manager Discretion: How and why to overrule interview feedback?
  • Interviewer scores an interview one way but the tone of their written feedback suggests something else
  • Candidate scores high in technical interview but demonstrates behavioral issues (i.e., what to do with brilliant assholes).

CAP Theorem for Hiring

There were lots of challenges around developing a monitoring strategy. Most of our trouble wrapping our heads around things in the thought experiment stage came down to a simple truth: success is easy to define when running a website. Keep the damn thing online and available as often as it takes for it to make money. Others might word that concept more politely, with lip service to value to customers, but at most for-profit organizations, value to customers is only appreciated to the degree that it encourages customers to keep being customers. If competition is non-existent, SLOs get a lot more generous.

Defining Antipatterns

Monitoring used to be an easy thing to explain. For the longest time it felt self-evident: you are watching a system. Lately I’ve had my doubts. The closer I look at it, the more there seem to be schools of thought.

Empowerment

Normally the monitoring of a system is relatively easy and the empowerment of the people is hard. With this system the monitoring was hard and the empowerment ended up being easy. Agencies were desperate to hire good people and they hated the current process. We thought we were going to have to hold their hands through the change but all we had to do was get OPM to admit that our proposed pipeline was legal. One CIO looked at our documentation and told our team lead “No offense but I don’t need a pilot, I’m just going to do it.”

Outcomes

Government being government (i.e., slow), by the time the pilots really got moving I was ready to roll off and start work at Auth0. It wasn’t until several months later that I heard how all our marathon whiteboarding sessions had played out. The pilots (which tossed out self-assessments, had software engineers review resumes of software engineers and put candidates through a couple rounds of interviewing before ranking them) had produced a hiring list from which agencies had hired 12 people. TWELVE! In a system where hiring 0 people was the normal outcome. Women had also scored higher on average than men, which helped alleviate concerns that deviating from the normal procedure would invite discrimination lawsuits. In the end The Office of Performance and Personnel Management secured funding to expand the program federal-wide and hired the USDS team lead full time to oversee it.

Author of Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones)
