Vera
Research Note / Hallucination Reduction

Deterministic identifiers for LLMs

Some citation failures are not retrieval failures. They are identifier failures: long random strings fracture into messy tokens, drift across model turns, and come back one character wrong. A better ID format can make failures easier to diagnose, and often makes the whole system more accurate too.

J.R. Wells
Vera Studio
Why citations disappear

Tokenizer view
UUID / fragmented
f1166652-812d-480d-8d11-5495e8b92e31
f 116 6652 - 812 d -480 d -8 d 11 -549 5 e 8 b 92 e 31
Irregular chunks. Irregular punctuation. Easy to transpose. Hard to notice when one symbol drifts.
DTID / atomic
263-037-515-764-525
263 - 037 - 515 - 764 - 525
Fixed-width triplets behave like clean atoms. The model sees a repeating pattern instead of noise.
01 / The failure mode

Some broken citations are copy errors, not search errors

In a retrieval system, it is natural to blame missing citations on bad ranking, bad embeddings, or missing source data. But in practice, some failures happen after retrieval succeeds. The model finds the right citation, carries its identifier through multiple tool calls and model turns, and then emits a string that is almost right.

Almost right is still broken. If the database expects one exact identifier and the model returns a nearby variant, the frontend cannot resolve it. The citation vanishes. The user sees a plain sentence instead of a linked, inspectable source. From the outside it looks like the model never found the source at all.

That distinction matters because it changes the solution. If the failure is in recall, you improve search. If the failure is in representation, you improve the identifier. And just as important, you make the failure legible enough that engineers can tell where the system went wrong instead of treating every broken citation like a retrieval mystery.

What the system sees

A UUID is globally useful, but locally hostile to language models

UUIDs are great for databases and distributed systems. They are terrible when the same string must be carried through natural-language generation, tool output, prompt stuffing, and final rendering.

404e-a104-7b9c-48d8-b9c2-1f8af2c1147e
One missing character. One swapped chunk. One extra hyphen. The citation is gone.
What the model needs

The identifier should have shape, rhythm, and obvious boundaries

An LLM-friendly identifier should be regular enough to survive repetition. The model should not have to memorize a pile of meaningless subfragments. It should carry a short sequence of clear units.

263-037-515-764-525
Five stable groups. Zero-padded. Easy to compare by eye. Easy to preserve across turns.
When an identifier becomes legible to the tokenizer, it becomes more durable inside the model.
02 / Tokenizers

The tokenizer is where the problem starts

LLMs do not read strings the way humans do. They read tokens: irregular chunks chosen by a learned tokenizer. Common words often map cleanly. Unfamiliar strings do not. A UUID tends to shatter into inconsistent pieces: one token for part of a hex group, another for a hyphen plus a letter, another for a trailing fragment that means nothing on its own.

That fragmentation creates two layers of difficulty. First, the model has more pieces to preserve. Second, those pieces do not correspond to semantic units. There is no human-like sense that one chunk belongs with the next. The string is just statistical debris.

A fixed pattern of three-digit groups changes that. The tokenizer tends to preserve each triplet as an atom. The resulting sequence is not shorter only in characters; it is cleaner in model-space.

01

Fixed width

Every group should look identical from the model's perspective. Zero padding matters because it removes visual ambiguity.

02

Stable separators

Hyphens or colons are fine. The key is a repeating rhythm the model can predict and preserve.

03

Enough entropy

Five groups of three digits gives 10^15 combinations: plenty for citation-scale identifiers without turning them back into noise.

04

Optional determinism

The same shape can be random or derived from a stable source value like a record id or SHA, depending on the use case.

03 / The format

A simple answer: five zero-padded triplets

The proposed format is intentionally plain. Instead of a 36-character UUID, use five groups of three digits: 263-037-515-764-525. This is long enough to avoid practical collisions at citation scale, but structured enough that a model can carry it through a conversation without constantly breaking it apart.

The deeper point is not the punctuation. Hyphens, colons, brackets, or wrapper syntax can vary from one surface to another. What matters is the underlying shape: a repeated sequence of compact, fixed-width atoms.

Once the shape is stable, the rest of the system gets easier to inspect. Human debugging gets easier. Prompt examples get easier. Render-time validation gets easier. If a model drifts, the drift is visible, which removes much of the guesswork about whether the failure lived in retrieval, transport, prompting, or rendering.

require "digest"

class DtidUtility
  # Five zero-padded, three-digit groups joined by hyphens,
  # e.g. "263-037-515-764-525".
  def self.generate_random
    5.times.map { format("%03d", rand(1000)) }.join("-")
  end

  # Deterministic variant: the same source value always yields the same DTID.
  def self.generate_from_sha(input)
    digest = Digest::SHA256.hexdigest(input.to_s)
    digits = digest.scan(/[0-9a-f]{3}/).first(5).map { |chunk| chunk.to_i(16) % 1000 }
    digits.map { |n| format("%03d", n) }.join("-")
  end
end
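The shape is also cheap to validate at render time. A minimal round-trip sketch, restated inline so it stands alone; the regex and the helper name are illustrative assumptions, not part of the utility above:

```ruby
require "digest"

# Hypothetical shape check: five zero-padded triplets joined by hyphens.
DTID_SHAPE = /\A\d{3}(?:-\d{3}){4}\z/

# Same derivation as the deterministic generator, restated inline.
def dtid_from(input)
  digest = Digest::SHA256.hexdigest(input.to_s)
  digest.scan(/[0-9a-f]{3}/).first(5)
        .map { |chunk| format("%03d", chunk.to_i(16) % 1000) }
        .join("-")
end

id = dtid_from("record-42")
raise "unexpected shape" unless id.match?(DTID_SHAPE)
# Determinism: the same source value always projects to the same alias.
raise "not deterministic" unless id == dtid_from("record-42")
```

A renderer can apply the same regex before hitting the database, turning silent citation loss into a logged, inspectable failure.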
04 / Benchmarks

In one benchmark, the error rate dropped substantially

A useful stress test is simple: give the model a large set of identifiers and ask it to sort them. The task forces the model to read, preserve, and re-emit every identifier. That makes transposition errors measurable.
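Scoring the task needs nothing fancy. A sketch of the measurement, as an assumed harness rather than the code behind the published numbers: an emitted identifier counts as valid only when it exactly matches one of the inputs.

```ruby
# Assumed scoring harness for the sort stress test: an emitted identifier
# is valid only if it exactly matches an id the model was given.
def score(given_ids, emitted_ids)
  valid = emitted_ids.count { |id| given_ids.include?(id) }
  { valid: valid, error_rate: 1.0 - valid.fdiv(emitted_ids.size) }
end

score(%w[263-037-515-764-525 263-037-515-764-526],
      %w[263-037-515-764-526 263-037-515-764-999])
# one of the two emitted ids resolves; the other is a transposition loss
```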

On UUIDs, errors show up quickly. On deterministic triplet identifiers, the exact same task became noticeably more stable in this test. That matters for accuracy, and also for debuggability: cleaner identifiers make it easier to understand what happened when a system does fail, especially on smaller or faster models where representation debt shows up sooner.

Model | UUID benchmark | DTID benchmark | Change
Gemini 2.5 Pro | 1.1% error / 989 valid | 0.0% error / 1000 valid | Best observed case: zero drift in this run
Gemini 2.5 Flash | 8.5% error / 915 valid | 1.2% error / 988 valid | About 7x fewer errors
Gemini 2.5 Flash Lite | 98.9% error / 11 valid | Improved materially, still unstable | Suggests representation still matters here
OpenAI 4.1 | 4.9% error / 951 valid | Not shown in final rerun | Suggests headroom even on stronger models

This can matter more in agentic systems than in single-turn chats

Single-turn prompting already stresses identifiers. Multi-step agents stress them far more. A citation id might move from retrieval output to tool result, from tool result to planner, from planner to model response, from model response to renderer. Every hop is a chance to mutate the string.

That is one reason production systems can look worse than toy benchmarks. The identifier is not merely generated once. It circulates. And when it circulates in a cleaner format, it becomes easier to spot whether the break happened in retrieval, in a tool handoff, or at final render time.

1. Search finds the right record
Retrieval succeeds and the system now has a correct internal citation id.
2. The id crosses model boundaries
Tool calls, reasoning traces, chain-of-thought summaries, and final prompts all repeat the identifier.
3. One microscopic drift breaks rendering
The model returns a string that looks almost right, but no longer resolves against the database.
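Step 3 is where exact-match lookups make the drift visible. A hypothetical render-time resolver, with the lookup table and names invented for illustration, that logs the miss instead of dropping the citation silently:

```ruby
# Hypothetical exact-match lookup: a near-miss identifier returns nil,
# and the miss is logged rather than silently swallowed by the renderer.
CITATIONS = { "263-037-515-764-525" => "source_record_42" }.freeze

def resolve_citation(id)
  record = CITATIONS[id]
  warn "citation drift: #{id.inspect} did not resolve" if record.nil?
  record
end

resolve_citation("263-037-515-764-525") # resolves
resolve_citation("263-037-515-764-526") # one digit off: nil, and logged
```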

Why this matters in production

133k
Total code citations observed in production output during one analysis pass.
34k
Document citations in the same review, crossing even more heterogeneous source types.
38%
Code citations that did not match a database identifier, despite the system intending to emit a resolvable citation.
~50%
Document-side mismatch severity described in the walkthrough: nearly half of citations failing to render.
05 / System design

Identifiers should be treated as part of the model interface

Traditional software treats identifiers as back-office implementation details. In LLM systems, that is no longer true. The identifier is now part of the model interface. It lives inside prompts. It gets copied by a generative model. It is parsed by a renderer. That means identifier design belongs next to prompt design, tool design, and output-schema design.

The right mental model is not only "make IDs unique." It is also "make IDs survivable." Uniqueness is table stakes. Survivability makes the system easier to reason about when something goes sideways.

That does not mean every system should abandon UUIDs globally. It means systems should be willing to introduce an LLM-facing identifier layer where exact string fidelity matters. You can keep UUIDs internally and project a deterministic, tokenizer-friendly alias outward.

Implementation pattern

Keep the alias close to the data model

Create a dedicated citation-identifier mapping table, generate the deterministic id once, validate uniqueness, and let prompts speak only in that alias. The model never needs to see the raw UUID.
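An in-memory sketch of that pattern; the class and method names (CitationAliases, alias_for, resolve) are my own assumptions, and a production version would back the two hashes with the mapping table described above:

```ruby
require "digest"

# Hypothetical in-memory stand-in for a citation-identifier mapping table:
# generate the alias once per record, check uniqueness, and let prompts
# speak only in the alias. The raw UUID never reaches the model.
class CitationAliases
  def initialize
    @uuid_to_dtid = {}
    @dtid_to_uuid = {}
  end

  def alias_for(uuid)
    @uuid_to_dtid[uuid] ||= begin
      dtid = derive(uuid)
      raise "alias collision for #{uuid}" if @dtid_to_uuid.key?(dtid)
      @dtid_to_uuid[dtid] = uuid
      dtid
    end
  end

  # Render-time lookup: exact alias in, internal UUID out (or nil).
  def resolve(dtid)
    @dtid_to_uuid[dtid]
  end

  private

  def derive(uuid)
    digest = Digest::SHA256.hexdigest(uuid)
    digest.scan(/[0-9a-f]{3}/).first(5)
          .map { |c| format("%03d", c.to_i(16) % 1000) }
          .join("-")
  end
end
```

Because the derivation is deterministic, regenerating the alias for the same record is idempotent, and the reverse hash gives the renderer its exact-match lookup.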

Operational advantage

The format is debuggable by humans too

When an identifier is visually structured, engineers can spot transpositions faster, compare examples faster, and write sanity checks that match the mental model of the system.

06 / Takeaway

Context engineering is not only about prompts. It is also about shapes.

LLM systems fail on small things that classical software barely notices: a separator, a token boundary, a repeated shape that the model can or cannot hold in working memory. Those details feel cosmetic until they become the difference between a working citation and an invisible one.

Deterministic identifiers are a narrow idea, but they point at a broader lesson. If a string has to survive inside a language model, its form matters. Human readability matters. Tokenizer regularity matters. Repetition matters. In agentic systems, representation is part of the architecture.