the engineering challenge of identity matching: using LLMs to handle the long tail

Somewhere in your network there's a person you think of as one human. To your computer (or traditional CRM), they're five strangers until you specifically tell it otherwise.

The same person is andrecerqueira on GitHub, "André M. Cerqueira" on LinkedIn, a gmail address on a calendar invite, @andre_dev on X, and a phone number in your contacts. You know it's one person but nothing in the data says so - there's no shared key tying those five accounts together.

One of noticed's main jobs, and one of its largest engineering challenges, is figuring out which accounts belong to the same human. Most of that is relatively simple and can be done deterministically. What's hard is the long tail - and that's where an LLM comes in.

profiles and people

To noticed, a profile is one account - an identity atom: github:123, linkedin:jane, email:ana@stripe.com. A person is the human behind some set of them. Matching (and subsequently merging) is the act of deciding which profiles belong to the same person and clustering them.
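A minimal sketch of that vocabulary, assuming Python dataclasses. The Profile and Person names and the source:identifier key format are illustrative, borrowed from the examples above rather than from noticed's actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    """One account on one source: the identity atom."""
    source: str      # e.g. "github", "linkedin", "email"
    identifier: str  # e.g. "123", "jane", "ana@stripe.com"

    @property
    def key(self) -> str:
        return f"{self.source}:{self.identifier}"


@dataclass
class Person:
    """The human behind some set of profiles."""
    profiles: set = field(default_factory=set)

    def merge(self, other: "Person") -> "Person":
        # In this sketch, merging two people is the union of their profiles.
        return Person(self.profiles | other.profiles)


a = Person({Profile("github", "123")})
b = Person({Profile("linkedin", "jane")})
merged = a.merge(b)
print(sorted(p.key for p in merged.profiles))
```

Freezing Profile makes it hashable, so a Person can hold its profiles in a set and a merge cannot duplicate an account.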

the whole product thesis in one picture: scattered accounts in, one person out.

A lot of matching is not hard at all. If two accounts share a verified email or if a GitHub profile self-declares its own LinkedIn in its profile links, they're the same person. These are exact identifiers, and we resolve them deterministically with high confidence. A large chunk of any network snaps together this way.
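One common way to make profiles "snap together" on exact identifiers is a union-find pass. This is a sketch under assumptions, not a description of noticed's code: the input shape (a profile key mapped to the exact identifiers it carries) and the example keys are hypothetical.

```python
from collections import defaultdict


def cluster_by_exact_identifiers(profiles: dict) -> list:
    """Group profile keys that share at least one exact identifier.

    `profiles` maps a profile key to a set of exact identifiers, such as
    verified emails or self-declared links (hypothetical shape).
    """
    parent = {key: key for key in profiles}

    def find(key: str) -> str:
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    # Any two profiles holding the same identifier get linked.
    first_owner = {}
    for key, identifiers in profiles.items():
        for identifier in identifiers:
            if identifier in first_owner:
                union(key, first_owner[identifier])
            else:
                first_owner[identifier] = key

    clusters = defaultdict(set)
    for key in profiles:
        clusters[find(key)].add(key)
    return list(clusters.values())


profiles = {
    "github:andrecerqueira": {"email:andre@example.com", "linkedin:andre-cerqueira"},
    "linkedin:andre-cerqueira": {"linkedin:andre-cerqueira"},
    "calendar:invite-1": {"email:andre@example.com"},
    "x:andre_dev": set(),
}
clusters = cluster_by_exact_identifiers(profiles)
```

Here the GitHub, LinkedIn, and calendar profiles end up in one cluster, while x:andre_dev stays alone because nothing exact ties it to the others. That leftover is exactly the kind of profile the long tail is made of.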

the long tail is where it gets hard

Then there's everything else. These accounts have no shared email and no self-declared link. What we get instead are names that are almost the same, nicknames, work history, and other signals that hint at a match without settling it.

"André Cerqueira" on GitHub and "André M. Cerqueira" on LinkedIn - same person? Probably. "André Cerqueira" and "André Cerqueira", two different developers in the same city - definitely not. A support@ address next to a real employee at that company - a colleague, not the same person. "Mike" and "Michael". A name in Latin script and the same name with its accents stripped.

Rules fall off a cliff here. You can't write an if statement for "people change jobs" or "this @ is a role inbox, not a human." These cases need judgment and a bit of world knowledge - and there are a lot of them. This is the long tail, and it's most of the interesting work.

the cardinal sin and our approach

Before any of that, there's one rule that governs everything:

Merging two different people is far worse than missing a match.

A miss is cheap - the next sweep tries again, and the person just isn't linked yet. A false merge is more expensive: it fuses two humans' histories across every screen in the product, and untangling it is painful. So the entire system is built to be a bit of a coward towards merging accounts. Precision over recall, every time. When in doubt, don't.

Here's the path a single new identity walks.

exact identifiers short-circuit to a confirmed match. everything else - the long tail - goes to the LLM, and its answer is capped at 0.85.

First, blocking: a handful of cheap database lookups pull up to five plausible look-alikes by shared email, shared handle, same company, or similar name. This stage is recall-oriented and deliberately over-fetches, which widens the surface area for a true match to show up.
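A rough sketch of what a blocking pass like this could look like in memory. Everything here is an assumption for illustration: the Record fields, the "two shared name tokens" definition of a similar name, and ranking by number of shared signals. A real system would run these as database lookups.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

MAX_CANDIDATES = 5  # "up to five plausible look-alikes"


@dataclass(frozen=True)
class Record:
    key: str
    name: str
    email: Optional[str] = None
    handle: Optional[str] = None
    company: Optional[str] = None


def normalize_name(name: str) -> str:
    """Lowercase and strip diacritics so 'André' and 'Andre' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def block(subject: Record, pool: list) -> list:
    """Return up to MAX_CANDIDATES records that share any cheap signal."""
    subject_tokens = set(normalize_name(subject.name).split())
    scored = []
    for other in pool:
        if other.key == subject.key:
            continue
        hits = 0
        hits += bool(subject.email and subject.email == other.email)
        hits += bool(subject.handle and subject.handle == other.handle)
        hits += bool(subject.company and subject.company == other.company)
        # Assumption: "similar name" means at least two shared name tokens.
        other_tokens = set(normalize_name(other.name).split())
        hits += len(subject_tokens & other_tokens) >= 2
        if hits:
            scored.append((hits, other))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [record for _, record in scored[:MAX_CANDIDATES]]


subject = Record("github:andrecerqueira", "André Cerqueira", company="Acme")
pool = [
    Record("linkedin:andre-m", "Andre M. Cerqueira"),
    Record("email:support", "Support", company="Acme"),
    Record("x:someone", "Jane Doe"),
]
candidates = block(subject, pool)
```

Note that the support inbox comes back as a candidate purely because of the shared company. That is the over-fetching at work: blocking only nominates, and the later stages decide.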

Then the fork. If there's an exact identifier, we're in the head of the distribution: the match is confirmed and there's no need to call the model. If not, the candidates go to the adjudicator: a small, fast model (Claude Haiku 4.5) with tight rails and a set of worked examples, asked one question - "is any of these the same person as the subject? pick exactly one, or none, and say how sure you are." This is a bit of a simplification; the real prompt has been refined through past experience (see how we improve our AI workflows with noticed's lab skills).
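To make the adjudicator's contract concrete, here is a sketch of the request and the defensive parsing around it. The model call is replaced by a stub callable, and the JSON answer format, function names, and field names are assumptions; this is not noticed's actual prompt.

```python
import json
from typing import Callable


def build_prompt(subject: dict, candidates: list) -> str:
    lines = [
        "Is any of these the same person as the subject?",
        "Pick exactly one, or none, and say how sure you are.",
        'Answer as JSON: {"match": <candidate index or null>, "confidence": <0..1>}',
        f"Subject: {json.dumps(subject, ensure_ascii=False)}",
    ]
    for index, candidate in enumerate(candidates):
        lines.append(f"Candidate {index}: {json.dumps(candidate, ensure_ascii=False)}")
    return "\n".join(lines)


def adjudicate(subject: dict, candidates: list, ask_model: Callable[[str], str]) -> tuple:
    """Return (candidate index or None, confidence).

    Anything malformed is treated as "none": when in doubt, don't.
    """
    raw = ask_model(build_prompt(subject, candidates))
    try:
        answer = json.loads(raw)
        match = answer["match"]
        confidence = min(max(float(answer["confidence"]), 0.0), 1.0)
    except (ValueError, KeyError, TypeError):
        return None, 0.0
    if match is None:
        return None, confidence
    valid = (
        isinstance(match, int)
        and not isinstance(match, bool)
        and 0 <= match < len(candidates)
    )
    return (match, confidence) if valid else (None, 0.0)


subject = {"source": "github", "name": "André Cerqueira"}
candidates = [{"source": "linkedin", "name": "André M. Cerqueira"}]
result = adjudicate(subject, candidates, lambda prompt: '{"match": 0, "confidence": 0.8}')
```

The parsing is deliberately strict: a reply that isn't valid JSON, or that points at a candidate that doesn't exist, becomes a non-match rather than a best guess.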

Everything up to this point has had one goal: scoring a potential merge. The breakdown is simple:

if a noticed user connects 2 accounts → score is 1.0, no room for doubts.
if we can deterministically associate 2 accounts → reach score threshold and ask for human confirmation.
if we need to handle the long tail → every answer the LLM gives is capped at 0.85 (for now).

But why cap a model that's often right? Because "often right" is still guessing, and we refuse to auto-merge on a guess.

the machine can propose. only exact facts - or you - can merge.

The confidence score decides what happens next. At 0.92 and up, a match auto-merges - and because every algorithmic answer is capped at 0.85, the only things that ever reach 0.92 are exact identifiers. The 0.80 to 0.92 band becomes a "same person?" card we ask our users to confirm in-app. Below 0.80, we drop the proposal; it can be re-evaluated later, when there's a new and hopefully smarter version of the algorithm.
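The cap and the thresholds fit in a few lines. The numbers come straight from the text above; the function and constant names are made up for this sketch.

```python
LLM_CAP = 0.85        # ceiling for any model-produced score
AUTO_MERGE_AT = 0.92  # only exact identifiers can reach this
ASK_USER_AT = 0.80    # lower edge of the "same person?" card band


def final_score(raw: float, from_llm: bool) -> float:
    """Clamp model answers to the cap; leave exact-identifier scores alone."""
    return min(raw, LLM_CAP) if from_llm else raw


def next_step(score: float) -> str:
    if score >= AUTO_MERGE_AT:
        return "auto_merge"
    if score >= ASK_USER_AT:
        return "ask_user"
    return "drop"


# Even a very confident model answer lands on a confirmation card.
print(next_step(final_score(0.99, from_llm=True)))
# An exact identifier is the only path to an automatic merge.
print(next_step(final_score(1.0, from_llm=False)))
```

Because LLM_CAP sits below AUTO_MERGE_AT, no model answer can trigger an automatic merge, however sure the model claims to be. The guarantee comes from the arithmetic, not from trusting the prompt.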

when you say merge

When users confirm our hypothesis that 2 identifiers should be merged, we want the time and effort they spent to benefit the entire noticed network. If another user has this same person in their network and the 2 identifiers being matched are not private identifiers (like phone numbers, emails, and similar), we run a second algorithm that has the opposite job of the first.

When you assert that two profiles are the same person - you know things our data doesn't - we don't want a yes-man rubber-stamping it. So a separate, stronger model (Claude Sonnet 4.6) plays skeptic. Its only goal is to prove you wrong: it hunts for a hard contradiction - the two accounts declaring two different verified identities, an identifier that points somewhere else. If it finds one, it blocks. If it doesn't, it gets out of the way and lets this merge trickle down to the global network.
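The gate around the skeptic can be sketched as follows. The skeptic itself is a model call, so here it is a stub; the PRIVATE_SOURCES list, the function names, and the example keys are all assumptions for illustration.

```python
from typing import Callable, Optional

# Assumed list of private identifier sources; the text above names phone
# numbers and emails "and similar".
PRIVATE_SOURCES = {"phone", "email"}


def is_private(profile_key: str) -> bool:
    return profile_key.split(":", 1)[0] in PRIVATE_SOURCES


def propagate_to_network(
    left: str,
    right: str,
    find_contradiction: Callable[[str, str], Optional[str]],
) -> bool:
    """Decide whether a user-confirmed merge may spread to other users."""
    if is_private(left) or is_private(right):
        return False  # only merges between non-private identifiers are shared
    # The skeptic returns a reason string when it finds a hard contradiction.
    return find_contradiction(left, right) is None


def stub_skeptic(left: str, right: str) -> Optional[str]:
    # Stand-in for the model: flags one hard-coded conflicting pair.
    conflicts = {frozenset(("github:andrecerqueira", "linkedin:other-andre"))}
    if frozenset((left, right)) in conflicts:
        return "accounts declare different verified identities"
    return None


shared = propagate_to_network("github:andrecerqueira", "linkedin:andre-cerqueira", stub_skeptic)
blocked = propagate_to_network("github:andrecerqueira", "linkedin:other-andre", stub_skeptic)
```

The skeptic has a veto and nothing else: it can stop a merge from spreading, but it can never create one.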

As a last precaution and a means to keep improving the matching algo (and also because a false merge is the thing we most refuse to get wrong), we log everything and make every merge cheap to undo.

merges are logged and reversible, and a 'different people' decision sticks.

Every merge writes a snapshot before it happens, so undo restores the two people exactly as they were. And when you say two profiles are not the same, that sticks and we won't propose the merge again.
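A small in-memory sketch of snapshot-before-merge and sticky rejections. The MergeLog class and its method names are hypothetical, and a real implementation would persist this and handle chains of merges, which this sketch does not.

```python
class MergeLog:
    def __init__(self):
        self.people = {}       # person id -> set of profile keys
        self.snapshots = []    # one snapshot per merge, in order
        self.rejected = set()  # unordered pairs marked "different people"

    def reject(self, a: str, b: str) -> None:
        self.rejected.add(frozenset((a, b)))

    def can_propose(self, a: str, b: str) -> bool:
        return frozenset((a, b)) not in self.rejected

    def merge(self, keep: str, absorb: str) -> int:
        if not self.can_propose(keep, absorb):
            raise ValueError("pair was marked as different people")
        # Snapshot both people *before* touching anything.
        self.snapshots.append({
            keep: set(self.people[keep]),
            absorb: set(self.people[absorb]),
        })
        self.people[keep] |= self.people.pop(absorb)
        return len(self.snapshots) - 1

    def undo(self, snapshot_id: int) -> None:
        # Restore the two people exactly as they were.
        for person_id, profiles in self.snapshots[snapshot_id].items():
            self.people[person_id] = set(profiles)


log = MergeLog()
log.people = {"p1": {"github:andrecerqueira"}, "p2": {"linkedin:andre-cerqueira"}}
snapshot_id = log.merge("p1", "p2")
after_merge = {pid: set(keys) for pid, keys in log.people.items()}
log.undo(snapshot_id)
log.reject("p1", "p3")
```

Storing rejections as unordered pairs means "p1 is not p3" and "p3 is not p1" are the same decision, so the proposal stays suppressed whichever side a later sweep starts from.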

what keeps it honest

The long tail is adversarial, so we test against the adversary.

We keep a set of golden pairs - deliberately nasty ones: same name / different person, role inboxes like support@, bots like renovate[bot], nicknames, stripped diacritics, CJK names, two people who just share an employer. When we change a prompt or a threshold, we re-run the whole set and keep the change only if quality improves and false merges stay at zero (again, noticed's lab skills). The run fails on a single false merge, even if every other number got better.
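The acceptance rule can be sketched as a tiny harness. The pairs below are invented examples in the spirit of the list above, and "quality" is simplified to the count of correct merges, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenPair:
    left: str
    right: str
    same_person: bool


def evaluate(pairs: list, predict: Callable[[str, str], bool]) -> dict:
    """Count correct merges, false merges, and misses over the golden set."""
    true_merges = false_merges = misses = 0
    for pair in pairs:
        predicted = predict(pair.left, pair.right)
        if predicted and pair.same_person:
            true_merges += 1
        elif predicted and not pair.same_person:
            false_merges += 1
        elif pair.same_person:
            misses += 1
    return {"true_merges": true_merges, "false_merges": false_merges, "misses": misses}


def accept_change(baseline: dict, candidate: dict) -> bool:
    # A single false merge fails the run, whatever else improved.
    if candidate["false_merges"] > 0:
        return False
    return candidate["true_merges"] > baseline["true_merges"]


golden = [
    GoldenPair("Mike Ross", "Michael Ross", True),
    GoldenPair("André Cerqueira", "Andre Cerqueira", True),
    GoldenPair("support@acme.example", "Ana at Acme", False),
    GoldenPair("André Cerqueira (dev A)", "André Cerqueira (dev B)", False),
]
never_merge = evaluate(golden, lambda left, right: False)
always_merge = evaluate(golden, lambda left, right: True)
print(accept_change(never_merge, always_merge))
```

The always-merge predictor finds both true matches and still gets rejected, because it also fuses the two pairs that should stay apart.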

One guardrail that came straight out of a real case: the self-link. A GitHub user who declares their own LinkedIn is also telling us, by implication, which LinkedIn is not theirs. So when a name-only match tried to pair andrecerqueira with a different André's LinkedIn at 0.9, we suppressed it. The person's own declaration outranks a guess every time.
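As a sketch, the self-link guardrail is a single comparison. The function name, the shape of subject_links (source mapped to the identifier the subject declared for it), and the example identifiers are assumptions.

```python
def suppressed_by_self_link(subject_links: dict, candidate_key: str) -> bool:
    """True when the subject declares a different account on the candidate's source."""
    source, _, identifier = candidate_key.partition(":")
    declared = subject_links.get(source)
    return declared is not None and declared != identifier


# andrecerqueira's GitHub profile links to one specific LinkedIn account.
declared_links = {"linkedin": "andre-cerqueira"}

print(suppressed_by_self_link(declared_links, "linkedin:another-andre"))   # True
print(suppressed_by_self_link(declared_links, "linkedin:andre-cerqueira")) # False
print(suppressed_by_self_link(declared_links, "x:andre_dev"))              # False
```

The check only fires when the subject has declared something for that source. An undeclared source stays open to proposals, which still have to pass the cap and the thresholds.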

TL;DR

Exact identifiers handle the center of the distribution, deterministically and confidently. The LLM handles the long tail, but only as a proposal - capped, reviewed, and pointed at a benchmark that treats a false merge as unforgivable. A separate skeptic guards the merges you make by hand, and snapshots keep every merge reversible.

Entire startups have been built solely around perfecting identity matching and merging. The approach I described allows us to keep noticed's operating cost low and, in the process of building the system, to learn more about how humans handle matches and how this can eventually be handed off completely to a machine learning model. It also keeps our customers' data within the boundaries of noticed and within the scope of our own zero data retention (ZDR) agreements with model providers.

If you want to see it work on your own network, get early access at noticed.so.