The Agreement Trap: Why Inter-Coder Reliability Isn’t Always What It Seems
We’re taught to aim for agreement. Often, through compromise.
In qualitative research, inter-coder reliability is often framed as a gold standard. An assurance that our thematic analysis is sound, valid, trustworthy. If multiple human coders arrive at the same codes, we’re told, then we must be onto something real.
But what if that agreement is giving us a false sense of clarity?
What if consensus isn’t always a sign of rigour, but sometimes a sign of shared assumptions, pressure to conform, or a bias hiding in plain sight?
Yeah, let’s unpack that.
The Appeal of Agreement
Getting two or more people to independently code the same set of qualitative data and arrive at the same categories feels like science. It looks like replication. It ticks boxes. But qualitative interpretation is messy. And the more complex or subjective the data, the more human coders will differ. Not so much because someone’s wrong, but because meaning isn’t always fixed.
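For readers who like to see the number behind the claim, here is a minimal sketch of how that agreement is usually quantified. The codes and excerpts are made up, and Cohen's kappa is used as the agreement statistic; it corrects raw percent agreement for the agreement you would expect by chance:

```python
# Toy illustration (hypothetical data): two coders label the same ten
# excerpts, and we compare simple percent agreement with Cohen's kappa,
# which discounts the agreement expected by chance alone.
from collections import Counter

coder_a = ["trust", "trust", "cost", "cost", "trust", "access", "cost", "trust", "access", "cost"]
coder_b = ["trust", "cost",  "cost", "cost", "trust", "trust",  "cost", "trust", "access", "cost"]

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    n = len(a)
    observed = percent_agreement(a, b)
    # Chance agreement, estimated from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

A kappa of around 0.7, as in this toy example, would typically be reported as substantial agreement. Yet the number says nothing about whether the shared coding frame behind it was sound.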
So how do we get high agreement? Well, we negotiate.
We adjust the codes. We clarify definitions. We collapse differences until we find something “everyone can agree on”, or a narrative that's easier to justify. But that process often smooths out nuance, silences outliers, and shapes the data to fit the team, not necessarily the truth. And none of that is usually intentional, either.
When Consensus Masks Bias
We don’t often talk about how agreement can reinforce shared bias. If everyone on the coding team has similar disciplinary backgrounds, or similar implicit assumptions about the topic, their coding frames may reflect that. Even the shared goal of answering the same research question can pull everyone toward the same reading.
High agreement might just mean everyone viewed it through the same lens.
And that lens might miss voices at the edges, interpretations that feel less familiar, or patterns that challenge the dominant narrative.
In this context, reproducibility becomes more valuable than agreement. Because it doesn’t rely on human consensus. It relies on process.
The Trade-off Between Richness and Consistency
This doesn’t mean interpretation is bad. Far from it. Qualitative research thrives on depth, context, and reflexivity. But when consistency is required (across large datasets, interdisciplinary teams, or longitudinal studies) human coding struggles to scale without losing its texture.
Manual coding offers richness. But automated, replicable tools offer structure, and often that’s what’s needed to surface unexpected themes without filtering them through personal bias. That’s not to say richness has to be compromised, either.
Leximancer’s Approach
Leximancer doesn’t ask multiple coders to align on meaning. Instead, it uses unsupervised machine learning to discover concepts based on word co-occurrence in the data. This means no preset thesauri, no human labels, no interpretive nudging. It’s a bottom-up approach.
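To make “word co-occurrence” concrete, here is a deliberately simplified sketch of the general idea, not Leximancer’s actual algorithm: count which words keep turning up in the same segments of text, using made-up example segments and a tiny stopword list.

```python
# Minimal sketch of co-occurrence-based concept discovery (illustrative only,
# not Leximancer's algorithm): words that repeatedly share a text segment are
# surfaced as candidate concepts, with no preset codebook or human labels.
from collections import Counter
from itertools import combinations

segments = [
    "patients reported long waiting times at the clinic",
    "staff shortages made waiting times worse for patients",
    "the online booking system reduced waiting for appointments",
    "patients praised the booking system and clinic staff",
]

stopwords = {"the", "at", "for", "and", "made", "a", "of"}

# Count how often each pair of words appears in the same segment.
pair_counts = Counter()
for seg in segments:
    words = sorted({w for w in seg.split() if w not in stopwords})
    pair_counts.update(combinations(words, 2))

# The strongest co-occurrences hint at emergent concepts,
# without anyone deciding the categories in advance.
for (w1, w2), count in pair_counts.most_common(6):
    print(f"{w1} <-> {w2}: {count}")
```

Even in this toy example, pairings like patients and waiting, or booking and system, start to suggest concepts that nobody defined in advance.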
It’s not trying to replace human interpretation. It’s offering a consistent, transparent starting point. Something you can share, replicate, and build on, without needing to sit in a room and argue over what counts as a theme. It takes language in its rawest form and shows you, in black and white, the relationships that actually appear in your data.
And that’s powerful.
Reproducibility as an Honest Foundation for Collaboration
In mixed-methods teams, cross-cultural projects, or multi-site studies, reproducibility levels the playing field. It gives every team member the same foundation, regardless of background or bias.
Instead of asking people to agree on the right interpretation, reproducible outputs ask: “What does the data consistently show, no matter who’s looking?”
From there, you can begin the work of interpretation, but now you’re standing on stable ground.
The Real Win: Finding What You Didn’t Already Know
Here’s what often gets lost in the obsession with agreement: the thrill of discovery.
When coding is done by humans working within an expected frame, it’s easy to find what you’re looking for. But that’s not where insight lives. The real value in qualitative analysis is finding what you didn’t expect. The themes that sit just outside your field of vision, the connections you wouldn’t have thought to trace.
Leximancer’s bottom-up approach creates room for those moments.
By surfacing concept relationships as they emerge from the data, rather than forcing them into a fixed coding scheme, you’re more likely to stumble across something new. And those surprises are often the most valuable part of the work you do.
Inter-coder reliability has its place. But it shouldn’t be the north star. Agreement is only as good as the assumptions behind it, and if we’re not careful, it becomes a marker of comfort rather than of truth.
Reproducibility doesn’t guarantee brilliance. But it offers something rare: a consistent, transparent foundation you can test, share, and return to. And in a research world increasingly shaped by AI and fast publishing, that might be the most honest place to begin.