Won't scoring make interviews feel robotic?

Only if you read the scorecard out like a form. Done well, the conversation stays natural and human; the scorecard is what you fill in afterwards from what you heard. It structures the decision, not the dialogue. Candidates rarely notice the scorecard exists; they notice a fair, focused interview.

What if interviewers' scores disagree a lot?

That's a feature, not a bug: disagreement points exactly where to dig. Often it means two interviewers probed different things, or one saw a strength or red flag the other missed. Discuss the specific criterion they split on, with evidence; that conversation is usually where the real decision gets made.


How to build an interview scorecard that produces fair, comparable decisions

Without a scorecard, interviews collapse into gut feeling and whoever spoke last. A simple, criteria-based scorecard makes hiring fairer, more defensible, and far easier to compare.


Finn Glas, Co-Founder + Engineering · March 6, 2026 · 3 min read


Key takeaways

Score against fixed criteria, not an overall vibe; vibe is where bias hides.

Derive the criteria from the intake's must-haves and success picture.

Interviewers score independently first and compare afterwards, which avoids groupthink.

Step by step

Take criteria from the intake

4 to 6, mapped to must-haves + success picture.

Write a 1 to 4 scale with anchors

Describe what each level looks like.

Score independently

Each interviewer rates before any group talk.

Compare + decide on evidence

Discuss the disagreements; decide from the scores.

1. Why a scorecard beats a gut verdict

A scorecard forces interviewers to rate a candidate against the specific things the role needs, instead of collapsing the whole interview into a single "yeah, I liked them". That single impression is exactly where bias and recency effects live: the confident talker, the person who reminds you of yourself, the strong finish that erases a weak middle. Scoring each criterion separately makes the decision legible, comparable across candidates, and defensible if it's ever questioned. In Germany it's also a core part of a bias-resistant process that complies with the AGG (Allgemeines Gleichbehandlungsgesetz, the General Equal Treatment Act).

Score evidence, not personality

Anchor every criterion to something the candidate did or demonstrated, not to traits like "culture fit" that quietly become "is like us". Evidence-based scoring is both fairer and more predictive, and it keeps the process defensible under the AGG.

2. Derive the criteria from the intake

Don't invent generic criteria. Take them straight from the must-haves and success picture you agreed on in the intake meeting; that's the whole point of having one. A scorecard with four to six criteria that actually map to the role (a specific skill, a relevant kind of experience, a way of working the team needs) beats a long list of vague traits like "communication" that everyone scores differently. Each criterion should be something you can gather real evidence for in the conversation, not a personality guess.
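To make the mapping concrete, here is a minimal Python sketch, assuming you record the intake outcome as a list of items. `IntakeItem`, `Criterion` and `validate_criteria` are hypothetical names for illustration, not part of KI BMS, and the intake data is made up. The check flags a scorecard with too few or too many criteria, or a criterion that traces back to nothing agreed in the intake.

```python
from dataclasses import dataclass


# Hypothetical record of one item agreed in the intake meeting.
@dataclass(frozen=True)
class IntakeItem:
    kind: str  # "must-have" or "success-picture"
    text: str


# Hypothetical scorecard criterion that points back at the intake item it comes from.
@dataclass(frozen=True)
class Criterion:
    name: str
    source: IntakeItem


def validate_criteria(criteria: list[Criterion], intake: list[IntakeItem]) -> list[str]:
    """Return a list of problems; an empty list means the criteria pass both checks."""
    problems = []
    if not 4 <= len(criteria) <= 6:
        problems.append(f"expected 4 to 6 criteria, got {len(criteria)}")
    for criterion in criteria:
        # A criterion is only valid if its source was actually agreed in the intake.
        if criterion.source not in intake:
            problems.append(f"'{criterion.name}' does not map to any intake item")
    return problems


# Illustrative intake for an engineering role (made-up example data).
intake = [
    IntakeItem("must-have", "Has led a production database migration"),
    IntakeItem("must-have", "Writes clear design docs"),
    IntakeItem("success-picture", "Owns on-call for the data platform within 3 months"),
    IntakeItem("success-picture", "Mentors a junior engineer"),
]
criteria = [
    Criterion("Migration experience", intake[0]),
    Criterion("Written design", intake[1]),
    Criterion("Operational ownership", intake[2]),
    Criterion("Mentoring", intake[3]),
]
print(validate_criteria(criteria, intake))  # []
```

Adding a vague trait such as `Criterion("Communication", IntakeItem("must-have", "Good communicator"))`, which nobody agreed in the intake, would be reported as unmapped rather than silently scored.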

3. Use a scale with behavioural anchors

A 1-to-4 scale works well (an even number of levels removes the safe middle and forces a lean one way or the other). But numbers alone drift, since your 3 isn't my 3. Add a short behavioural anchor to each level: what does a 2 look like versus a 4 on this criterion? Anchors turn a subjective number into a shared standard and make scores comparable across different interviewers. Capture a sentence of evidence next to each score ("rated 4: walked through a near-identical migration they led"), so the rating is grounded, not just asserted.
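A minimal Python sketch of an anchored score, assuming one criterion ("Migration experience") with hypothetical anchor wording; `Score`, `anchored` and `MIGRATION_ANCHORS` are illustrative names, not a KI BMS API. It shows the two rules from the paragraph above: a rating must sit on the 1-to-4 scale, and it cannot be recorded without a sentence of evidence.

```python
from dataclasses import dataclass

SCALE = range(1, 5)  # levels 1 to 4: an even count leaves no neutral middle

# Hypothetical anchors for one criterion; each level describes observable behaviour.
MIGRATION_ANCHORS = {
    1: "No hands-on migration experience; answers stay theoretical",
    2: "Helped on a migration; can list the steps but not the trade-offs",
    3: "Ran a migration end to end with some guidance",
    4: "Led a comparable migration and can explain the rollback and cut-over choices",
}


@dataclass(frozen=True)
class Score:
    criterion: str
    level: int
    evidence: str  # one sentence on what the candidate said or did

    def __post_init__(self) -> None:
        # Reject ratings outside the scale and ratings with no evidence behind them.
        if self.level not in SCALE:
            raise ValueError(f"level must be 1 to 4, got {self.level}")
        if not self.evidence.strip():
            raise ValueError("a score needs a sentence of evidence, not just a number")


def anchored(score: Score, anchors: dict[int, str]) -> str:
    """Show the score next to the shared anchor it claims to match, plus the evidence."""
    return f"{score.criterion}: {score.level} ({anchors[score.level]}). Evidence: {score.evidence}"


score = Score("Migration experience", 4, "Walked through a near-identical migration they led")
print(anchored(score, MIGRATION_ANCHORS))
```

Putting the check in `__post_init__` means a bare number cannot even be stored, which is a stricter, tool-enforced version of the habit described above.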

4. Score independently, then compare

The order matters. Each interviewer fills in their scorecard BEFORE the group discusses, because the moment a senior voice says "I loved them" out loud, everyone's scores quietly converge on it; that's groupthink erasing the value of multiple perspectives. Independent-first, discuss-second surfaces real disagreement, which is often the most useful signal you have. In KI BMS the scorecard lives on the application and feeds the candidate's fit picture, so the structured evaluation you designed is what actually drives the decision and stays on the record, rather than a hallway consensus nobody can reconstruct later.
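A minimal Python sketch of independent-first scoring, assuming a panel of named interviewers who each give one 1-to-4 level per criterion. `Panel`, `submit`, `reveal` and `disagreements` are hypothetical names, not the KI BMS API, and the 2-point spread threshold is an arbitrary assumption. Scores stay hidden until every interviewer has submitted; only then are the criteria with the widest spread surfaced as the agenda for the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    """Hypothetical panel for one candidate; nobody sees scores until everyone has submitted."""
    interviewers: set[str]
    _submitted: dict[str, dict[str, int]] = field(default_factory=dict, init=False, repr=False)

    def submit(self, interviewer: str, scores: dict[str, int]) -> None:
        if interviewer not in self.interviewers:
            raise ValueError(f"{interviewer} is not on this panel")
        if interviewer in self._submitted:
            raise ValueError(f"{interviewer} already submitted; scores are final")
        self._submitted[interviewer] = dict(scores)

    def reveal(self) -> dict[str, dict[str, int]]:
        # Refuse to show anything while any interviewer is still outstanding.
        missing = self.interviewers - set(self._submitted)
        if missing:
            raise RuntimeError(f"still waiting for: {', '.join(sorted(missing))}")
        return {name: dict(levels) for name, levels in self._submitted.items()}


def disagreements(scores: dict[str, dict[str, int]], threshold: int = 2) -> dict[str, dict[str, int]]:
    """Criteria whose highest and lowest score differ by at least `threshold` points."""
    by_criterion: dict[str, dict[str, int]] = {}
    for interviewer, levels in scores.items():
        for criterion, level in levels.items():
            by_criterion.setdefault(criterion, {})[interviewer] = level
    return {
        criterion: levels
        for criterion, levels in by_criterion.items()
        if max(levels.values()) - min(levels.values()) >= threshold
    }


panel = Panel({"Ana", "Ben", "Cem"})
panel.submit("Ana", {"Migration experience": 4, "Written design": 3})
panel.submit("Ben", {"Migration experience": 2, "Written design": 3})
panel.submit("Cem", {"Migration experience": 3, "Written design": 2})
print(disagreements(panel.reveal()))
# {'Migration experience': {'Ana': 4, 'Ben': 2, 'Cem': 3}}
```

The flagged criterion is where the group conversation should start: ask Ana and Ben for the evidence behind their 4 and 2 before anyone argues for an overall verdict.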

FAQ

Frequently asked


Try KI BMS

Free plan, no credit card. We host in Germany. You can export and delete everything self-serve.

Written by

Finn Glas

Co-Founder + Engineering

Finn is one of the Co-Founders. He owns the engineering side, the infrastructure, and most of the late-night fixes that ship before anyone notices.

finn.glas at aicuflow dot com · LinkedIn · Website