
The Senior Engineer Interview Rubric: A 50-Point Scoring Sheet

First Bridge Consulting · May 1, 2026 · 11 min read

Most senior-engineer interview loops fail in the same way: each interviewer scores against an unspoken bar, the debrief collapses into "vibes", and the offer goes out (or doesn't) based on whoever spoke last in the panel meeting. The fix isn't more interviews — it's a rubric specific enough that two trained interviewers, scoring the same candidate, land within 5 points of each other.

This is the rubric we use at First Bridge Consulting to evaluate senior software engineer candidates. It's 50 points, weighted across five dimensions, with named anti-patterns. Use it, modify it, throw out the parts you don't like — but don't run senior loops without one.

TL;DR

  • 5 dimensions, 10 points each, 50 points total. Hire bar = 35+. Strong-hire bar = 42+. Below 35 is a no-hire even if every panellist liked them.
  • Panel calibration matters more than question selection. Three trained interviewers asking the same questions outscore five untrained ones.
  • The rubric is public to the candidate — they get the dimensions before the loop. Hidden bars create lottery hires.
  • Weight judgement and pragmatism equally with coding ability. The senior bar is decision quality under ambiguity, not algorithm fluency.
  • Track inter-rater agreement quarterly. If your interviewers diverge by > 8 points on the same candidate, the rubric isn't being applied consistently — fix that before fixing the questions.

The 5 dimensions

1. Coding craft (10 pts)

Can they write production code that another senior would happily merge?

| Score | Anchor |
| --- | --- |
| 9–10 | Writes clean, idiomatic code in the target language. Picks correct data structures without prompting. Handles edge cases without being asked. Code reads like documentation. |
| 7–8 | Writes correct, working code with minor stylistic issues. Names are good. Edge cases require one prompt. |
| 5–6 | Solution works but is structurally untidy — repeated logic, vague names, missing error handling. Visible improvement when prompted. |
| 3–4 | Solution barely works. Struggles with language idioms. Doesn't catch obvious bugs in their own code. |
| 0–2 | Cannot produce working code in the time allotted. Persistent confusion about basic constructs. |

Common false-positive anti-pattern: candidate is fast at LeetCode-style problems but writes throwaway scripts (one-letter variables, no structure) — high score from a junior interviewer, low score from a senior. Score on craft, not speed.

2. System design judgement (10 pts)

Given an ambiguous problem, can they break it down into a defensible architecture?

| Score | Anchor |
| --- | --- |
| 9–10 | Asks 3+ scoping questions before drawing. Names trade-offs explicitly. Picks one path with stated reasons. Anticipates failure modes (what happens at 10×, 100× load, region failure). |
| 7–8 | Reasonable architecture, fewer trade-offs surfaced unprompted. Picks a path but is less explicit about why. |
| 5–6 | Draws a working diagram but treats every component as a "well-known building block". Limited reasoning about scale, consistency or failure. |
| 3–4 | Cargo-cults popular components without justification ("we'd use Kafka here"). Cannot explain when not to use them. |
| 0–2 | Can't structure the problem at all. Jumps to implementation detail before defining the interface. |

Common anti-pattern: candidate has memorised a canned "design Twitter" answer. Probe with "now make this real-time / global / stateless across regions" — it separates the cargo-culters from the genuine seniors fast.

3. Code review & technical communication (10 pts)

Show them a 60–80 line PR with seeded issues. How thoroughly and humanely do they review it?

Seeded issues should include: a real bug, a security smell, a performance pitfall, a naming issue, a missing test, and a non-issue (a stylistic choice that's defensible).
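
For illustration, here is a compressed Python sketch of what a seeded snippet might look like; the real PR should be 60–80 lines in your team's actual stack, and the endpoint, schema, and issue placement below are invented for this example. The interviewer's key lives in the comments and must be stripped from the candidate-facing copy.

```python
import sqlite3

# Interviewer key (strip before showing the candidate):
#   BUG          cache key ignores `active`, so filtered and unfiltered
#                calls return each other's cached results
#   SECURITY     SQL built by string interpolation (injection)
#   PERF         fetches every row, then filters in Python
#   NAMING       `d` says nothing about what it holds
#   MISSING TEST nothing exercises the active=False path
#   NON-ISSUE    early return instead of if/else is a defensible choice

_cache: dict[str, list] = {}

def get_users(conn: sqlite3.Connection, team: str, active: bool = True) -> list:
    d = _cache.get(team)                     # NAMING, and the cache BUG
    if d is not None:
        return d                             # NON-ISSUE
    query = f"SELECT * FROM users WHERE team = '{team}'"  # SECURITY
    rows = conn.execute(query).fetchall()
    result = [r for r in rows if bool(r[3]) == active]    # PERF
    _cache[team] = result
    return result
```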

| Score | Anchor |
| --- | --- |
| 9–10 | Catches the bug, the security smell, and the perf pitfall. Suggests improvements with reasons. Correctly leaves the non-issue alone or asks the author about it rather than rewriting. Tone is collaborative. |
| 7–8 | Catches most issues. May rewrite the non-issue without asking. Comments are constructive. |
| 5–6 | Catches surface-level issues. Misses the security or performance one. Comments lean dictatorial ("change this") without rationale. |
| 3–4 | Misses substantive issues. Focuses on nitpicks. |
| 0–2 | Doesn't engage with the substance of the change. Cannot articulate review priorities. |

This is the dimension hiring teams underweight most consistently. Senior engineers spend 25–40% of their week reviewing other people's code; if they can't do it well, your team's velocity gets capped by them.

4. Pragmatism & ambiguity handling (10 pts)

A senior engineer's value is decision quality when no one tells them what to do. Test it explicitly.

Format: present an open-ended scenario ("you've been asked to migrate this legacy service to a new database; the team that owns it is on parental leave; product wants it done in 6 weeks") and have them talk through their first week.

| Score | Anchor |
| --- | --- |
| 9–10 | Names what they would not do. Identifies the highest-risk unknown and how they'd de-risk it. Picks a small, reversible first step. Asks who they'd talk to and what artefact they'd produce by Friday. |
| 7–8 | Sensible plan with one or two assumptions left implicit. Knows the risk profile but doesn't sequence around it. |
| 5–6 | Generic plan ("I'd write a design doc"). Limited to engineering activities; doesn't engage with stakeholders or ambiguity. |
| 3–4 | Says they'd "wait for more requirements". Cannot decompose the problem into reversible steps. |
| 0–2 | Visibly stalls when there isn't a clear right answer. Asks for the spec. |

This is the senior bar. A mid-level engineer who scores 9 here is more senior than a "senior" who scores 4.

5. Behavioural & collaboration signal (10 pts)

Assess via two structured behavioural questions: a conflict story and a failure story.

For each, listen for:

  • Specificity — names, dates, real systems, real outcomes. Generic answers are red flags.
  • Self-awareness — what would they do differently? Do they take ownership without performative self-flagellation?
  • Other-people awareness — do they describe colleagues as humans with their own incentives, or as obstacles?
  • Outcome — did the situation actually resolve? What did the team learn?

| Score | Anchor |
| --- | --- |
| 9–10 | Concrete stories. Owns their part. Names other people's reasonable motivations. Articulates what they'd do differently. |
| 7–8 | Concrete stories with mild self-protective framing. Genuine learning. |
| 5–6 | Stories are abstract or rehearsed-sounding. Some self-awareness. |
| 3–4 | Blames external factors. "I haven't really had any conflicts" is a 3, not a 10. |
| 0–2 | Hostile framing. Describes colleagues as enemies. Or stories don't match the seniority claimed on the CV. |

How to run the panel

A 4-hour senior loop: five interviewers, then one hiring-manager debrief. No "culture-fit" round — culture-fit unmoored from the rubric is where bias gets laundered.

| Round | Duration | Interviewer | Dimensions scored |
| --- | --- | --- | --- |
| 1. Coding (pair-programmed, real problem from the team's domain) | 60 min | Senior IC | Coding craft + Pragmatism |
| 2. System design | 60 min | Staff+ IC | System design + Pragmatism |
| 3. Code review (live walkthrough of a seeded PR) | 45 min | Senior IC | Code review + Pragmatism |
| 4. Behavioural | 45 min | Hiring manager | Behavioural + Pragmatism |
| 5. Bar-raiser / cross-functional | 30 min | Senior from a different team | Any dimension; tie-breaker |

Pragmatism is scored in every round, not just one — because it shows up everywhere or nowhere.
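
One mechanical consequence: a dimension scored in several rounds must collapse to a single 0–10 number before the decision rule applies. The reduction isn't prescribed here, so this sketch assumes the median, which keeps one outlier round from dominating; round and dimension names come from the table above.

```python
from statistics import median

def dimension_scores(round_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Collapse per-round scores into one 0-10 score per dimension.

    round_scores maps round name -> {dimension: points}. A dimension that
    appears in several rounds (Pragmatism here) reduces to its median.
    """
    by_dimension: dict[str, list[float]] = {}
    for scores in round_scores.values():
        for dim, pts in scores.items():
            by_dimension.setdefault(dim, []).append(pts)
    return {dim: median(pts) for dim, pts in by_dimension.items()}

loop = {
    "coding":        {"Coding craft": 8, "Pragmatism": 7},
    "system design": {"System design": 7, "Pragmatism": 8},
    "code review":   {"Code review": 9, "Pragmatism": 6},
    "behavioural":   {"Behavioural": 8, "Pragmatism": 7},
}
final = dimension_scores(loop)
print(final, sum(final.values()))  # Pragmatism -> 7.0; total 39.0
```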

Calibration: how to make the rubric actually work

A rubric on paper is worth nothing. The work is calibrating the people using it.

Before any new interviewer joins the loop:

  1. They shadow two real interviews and score independently.
  2. Their scores are compared to the trained interviewers' scores. If they're > 2 points off on any dimension, they shadow another interview (a minimal version of this check is sketched after this list).
  3. They run a mock interview with a fake candidate (a current engineer playing the role); the panel scores together.
  4. Only then do they run a real interview.
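
Step 2's check is small enough to script. A minimal sketch, assuming each scorecard is a plain mapping of dimension to points (the names are illustrative):

```python
def shadow_gaps(shadow: dict[str, int], trained: dict[str, int]) -> dict[str, int]:
    """Absolute per-dimension gap between a shadowing interviewer's scores
    and a trained interviewer's scores for the same interview."""
    return {dim: abs(shadow[dim] - trained[dim]) for dim in trained}

gaps = shadow_gaps(
    shadow={"Coding craft": 5, "Pragmatism": 8},
    trained={"Coding craft": 8, "Pragmatism": 7},
)
needs_another_shadow = any(gap > 2 for gap in gaps.values())  # True: gap of 3
```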

Quarterly:

  1. Pull the scoring sheets from the last 30 interviews.
  2. For any interview where 2+ interviewers scored the same candidate, compute the spread per dimension (see the sketch after this list).
  3. If spread > 2 points on any dimension, run a calibration session with examples.
  4. If spread > 4 points across the board, your rubric isn't being used — retrain or scrap it.
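
A minimal sketch of step 2, assuming you can export one row per (candidate, interviewer, dimension, score) from your ATS; the row layout here is an assumption, not any real ATS schema:

```python
from collections import defaultdict

rows = [  # (candidate, interviewer, dimension, score) - illustrative data
    ("cand-17", "alice", "System design", 8),
    ("cand-17", "bob",   "System design", 5),
    ("cand-17", "alice", "Pragmatism",    7),
    ("cand-17", "bob",   "Pragmatism",    6),
]

def dimension_spreads(rows):
    """max - min score per (candidate, dimension), over every set of
    interviewers who scored the same candidate on the same dimension."""
    grouped = defaultdict(list)
    for candidate, _interviewer, dimension, score in rows:
        grouped[(candidate, dimension)].append(score)
    return {k: max(s) - min(s) for k, s in grouped.items() if len(s) >= 2}

for (candidate, dimension), spread in dimension_spreads(rows).items():
    if spread > 2:
        print(f"calibrate: {candidate} / {dimension} spread = {spread}")
```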

Hiring decision rule:

  • 35+ points + no dimension below 5 → hire.
  • 42+ points + no dimension below 6 → strong hire / level up.
  • < 35 points OR any dimension below 5 → no hire, even if you liked them.

The "any dimension below 5" rule is non-negotiable. A candidate who codes brilliantly but cannot review a PR humanely will tank a team.

Common rubric anti-patterns to avoid

  1. The "rockstar premium" loophole — interviewers say "they were a 4 on system design but a 10 on coding so net 7". Don't average across radical splits; pick the lowest sub-dimension that disqualifies them.
  2. The "but I liked them" override — once the rubric exists, the panel does not override it on vibes. If you're going to ignore the rubric, don't have one.
  3. Question rotation chaos — different interviewers ask different questions to the same candidate. Stick to a question library per round; rotate quarterly.
  4. Scoring at the end — interviewers fill in the rubric during the interview or within 15 minutes of the end, not at the panel debrief. Memory degrades fast.
  5. Hiring-manager-only veto — if the IC interviewers all score above bar but the hiring manager says no, the manager must justify in writing against the rubric. Otherwise it's bias laundered through authority.

What the rubric won't fix

  • A weak candidate funnel. The rubric filters; it does not source.
  • Misaligned levelling. If your "senior" is industry-junior, the rubric makes the gap visible but doesn't close it.
  • Pay-band misfits. Strong candidates score high here and decline because comp isn't competitive. Calibrate your bands separately.
  • Time-pressure decisions. The rubric needs ~4 hours of interview time. If you must hire in 24 hours, you're rolling dice.

Download the scorecard

A printable / fillable PDF of this rubric (with spaces for interviewer notes per dimension) is on the way. In the meantime, you can copy this Markdown into your ATS's interview-template field today.

FAQ

Should we share the rubric with candidates? Yes. Send the dimensions (not the questions) before the loop. Hidden bars create unfair surprises and unnecessary anxiety. Strong candidates appreciate the clarity; weak ones over-prepare, which the rubric surfaces just like any other failure.

How often should the rubric change? Annually, with quarterly tweaks. Don't churn it more than that — calibration takes a quarter to converge.

Does this work for staff/principal candidates? Add a sixth dimension: "scope and influence" (10 pts), assessed in a second behavioural round. The other five dimensions still apply but with stricter thresholds (35 → 42 minimum hire bar at staff+).

What about take-home tests? If you use one, it replaces Round 1 of the loop above. Be explicit about the time limit (≤ 4 hours), don't grade speed, and, at staff+ level, pay candidates for their time. Most candidates will decline an unpaid take-home in 2026 — and the ones who accept self-select for desperation, not seniority.

How do we avoid bias in behavioural scoring? Two interviewers in every behavioural round, scoring independently, comparing only after submission. The dimension where bias creeps in worst is "behavioural" — having two scorers reduces single-interviewer drift.


Hiring senior engineers and want help running the loop? First Bridge Consulting builds and runs interview loops for clients across SAP, .NET, React, Java, Shopify and DevOps stacks. Talk to us about hiring →
