Some citation failures are not retrieval failures. They are identifier failures: long random strings fracture into messy tokens, drift across model turns, and come back one character wrong. A better ID format can make failures easier to diagnose, and often makes the whole system more accurate too.
In a retrieval system, it is natural to blame missing citations on bad ranking, bad embeddings, or missing source data. But in practice, some failures happen after retrieval succeeds. The model finds the right citation, carries its identifier through multiple tool calls and model turns, and then emits a string that is almost right.
Almost right is still broken. If the database expects one exact identifier and the model returns a nearby variant, the frontend cannot resolve it. The citation vanishes. The user sees a plain sentence instead of a linked, inspectable source. From the outside it looks like the model never found the source at all.
That distinction matters because it changes the solution. If the failure is in recall, you improve search. If the failure is in representation, you improve the identifier. And just as important, you make the failure legible enough that engineers can tell where the system went wrong instead of treating every broken citation like a retrieval mystery.
UUIDs are great for databases and distributed systems. They are terrible when the same string must be carried through natural-language generation, tool output, prompt stuffing, and final rendering.
An LLM-friendly identifier should be regular enough to survive repetition. The model should not have to memorize a pile of meaningless subfragments. It should carry a short sequence of clear units.
When an identifier becomes legible to the tokenizer, it becomes more durable inside the model.
LLMs do not read strings the way humans do. They read tokens: irregular chunks chosen by a learned tokenizer. Common words often map cleanly. Unfamiliar strings do not. A UUID tends to shatter into inconsistent pieces: one token for part of a hex group, another for a hyphen plus a letter, another for a trailing fragment that means nothing on its own.
That fragmentation creates two layers of difficulty. First, the model has more pieces to preserve. Second, those pieces do not correspond to semantic units. There is no human-like sense that one chunk belongs with the next. The string is just statistical debris.
A fixed pattern of three-digit groups changes that. The tokenizer tends to preserve each triplet as an atom. The resulting sequence is not shorter only in characters; it is cleaner in model-space.
Every group should look identical from the model's perspective. Zero padding matters because it removes ambiguity: 37 becomes 037, so every group is exactly three characters and no group is structurally special.
Hyphens or colons are fine. The key is a repeating rhythm the model can predict and preserve.
Five groups of three digits gives 1000^5 = 10^15 combinations: plenty for citation-scale identifiers without turning them back into noise.
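A quick back-of-the-envelope check makes the "plenty" claim concrete. Under the standard birthday approximation, the collision probability for n random identifiers in a space of size s is roughly n²/(2s); the corpus size below is a hypothetical, chosen for illustration:

```ruby
# Five groups of three digits: 1000^5 = 10^15 possible identifiers.
space = 1000**5

# Birthday approximation: P(collision) ~= n^2 / (2 * space).
n = 1_000_000 # hypothetical corpus of one million citations
p_collision = n.to_f**2 / (2 * space)
# Roughly a 0.05% chance of any collision across the whole corpus.
```

Even at a million citations, random assignment is comfortably safe; deterministic derivation (below) should still validate uniqueness at write time.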
The same shape can be random or derived from a stable source value like a record id or SHA, depending on the use case.
The proposed format is intentionally plain. Instead of a 36-character UUID, use five groups of three digits: 263-037-515-764-525. This is long enough to avoid practical collisions at citation scale, but structured enough that a model can carry it through a conversation without constantly breaking it apart.
The deeper point is not the punctuation. Hyphens, colons, brackets, or wrapper syntax can change by surface. What matters is the underlying shape: a repeated sequence of compact, fixed-width atoms.
Once the shape is stable, the rest of the system gets easier to inspect. Human debugging gets easier. Prompt examples get easier. Render-time validation gets easier. If a model drifts, the drift is visible, which removes much of the guesswork about whether the failure lived in retrieval, transport, prompting, or rendering.
```ruby
require "digest"

class DtidUtility
  # Random DTID: five zero-padded three-digit groups,
  # e.g. "263-037-515-764-525".
  def self.generate_random
    5.times.map { format("%03d", rand(1000)) }.join("-")
  end

  # Deterministic DTID derived from a stable source value such as a
  # record id: the same input always yields the same identifier.
  def self.generate_from_sha(input)
    digest = Digest::SHA256.hexdigest(input.to_s)
    digits = digest.scan(/[0-9a-f]{3}/).first(5).map { |chunk| chunk.to_i(16) % 1000 }
    digits.map { |n| format("%03d", n) }.join("-")
  end
end
```
A useful stress test is simple: give the model a large set of identifiers and ask it to sort them. The task forces the model to read, preserve, and re-emit every identifier. That makes transposition errors measurable.
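A minimal harness for this kind of check might look like the following sketch. The single corrupted digit stands in for a real model call, which is an assumption made here so the harness stays self-contained:

```ruby
# Generate a batch of triplet identifiers (DTIDs).
ids = Array.new(20) { Array.new(5) { format("%03d", rand(1000)) }.join("-") }

# Simulate the model re-emitting the list with one corrupted digit,
# standing in for the drift a real model produces under this test.
emitted = ids.dup
corrupted = ids[0].dup
corrupted[0] = ((corrupted[0].to_i + 1) % 10).to_s
emitted[0] = corrupted

# Each identifier that comes back changed is one measurable error.
errors = ids.zip(emitted).count { |original, returned| original != returned }
```

In a real benchmark, `emitted` would come from the model's sorted output, and the error count would be aggregated across many runs.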
On UUIDs, errors show up quickly. On deterministic triplet identifiers, the same task is noticeably more stable. That matters for accuracy, but also because cleaner identifiers make it easier to understand what happened when a system does fail, especially on smaller or faster models where representation debt shows up sooner.
| Model | UUID (error rate / valid of 1,000) | DTID (error rate / valid of 1,000) | Notes |
|---|---|---|---|
| Gemini 2.5 Pro | 1.1% error / 989 valid | 0.0% error / 1000 valid | Best observed case: zero drift in this run |
| Gemini 2.5 Flash | 8.5% error / 915 valid | 1.2% error / 988 valid | About 7x fewer errors |
| Gemini 2.5 Flash Lite | 98.9% error / 11 valid | Improved materially / still unstable | Suggests representation still matters here |
| OpenAI GPT-4.1 | 4.9% error / 951 valid | Not shown in final rerun | Suggests headroom even on stronger models |
Single-turn prompting already stresses identifiers. Multi-step agents stress them far more. A citation id might move from retrieval output to tool result, from tool result to planner, from planner to model response, from model response to renderer. Every hop is a chance to mutate the string.
That is one reason production systems can look worse than toy benchmarks. The identifier is not merely generated once. It circulates. And when it circulates in a cleaner format, it becomes easier to spot whether the break happened in retrieval, in a tool handoff, or at final render time.
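One way to exploit that legibility is to validate the identifier at every hop instead of only at render time, so a break is localized to the stage that introduced it. A sketch, with hypothetical hop names and a `check_hops` helper that is not part of any existing API:

```ruby
DTID = /\A\d{3}(?:-\d{3}){4}\z/

# Validate the identifier at every hop so a corrupted string is caught
# at the stage that broke it, not discovered at render time.
def check_hops(id_by_hop)
  id_by_hop.each do |hop, id|
    raise "DTID corrupted at #{hop}: #{id.inspect}" unless id.match?(DTID)
  end
end

check_hops(
  "retrieval"      => "263-037-515-764-525",
  "tool_result"    => "263-037-515-764-525",
  "model_response" => "263-037-515-764-525"
)
```

The same check is much harder to write tightly for UUIDs, where case, hyphenation, and length all vary across ecosystems.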
Traditional software treats identifiers as back-office implementation details. In LLM systems, that is no longer true. The identifier is now part of the model interface. It lives inside prompts. It gets copied by a generative model. It is parsed by a renderer. That means identifier design belongs next to prompt design, tool design, and output-schema design.
The right mental model is not only "make IDs unique." It is also "make IDs survivable." Uniqueness is table stakes. Survivability makes the system easier to reason about when something goes sideways.
That does not mean every system should abandon UUIDs globally. It means systems should be willing to introduce an LLM-facing identifier layer where exact string fidelity matters. You can keep UUIDs internally and project a deterministic, tokenizer-friendly alias outward.
Create a dedicated citation-identifier mapping table, generate the deterministic id once, validate uniqueness, and let prompts speak only in that alias. The model never needs to see the raw UUID.
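A minimal sketch of that alias layer, assuming an in-memory map where a real system would use a database table with a uniqueness constraint (the class and method names here are hypothetical):

```ruby
require "digest"

# Keep the UUID internal; expose only a deterministic DTID to prompts.
class CitationAliasTable
  def initialize
    @dtid_to_uuid = {}
  end

  # Derive the DTID once from the UUID, validate uniqueness, record it.
  def register(uuid)
    dtid = derive_dtid(uuid)
    existing = @dtid_to_uuid[dtid]
    raise "DTID collision: #{dtid}" if existing && existing != uuid
    @dtid_to_uuid[dtid] = uuid
    dtid
  end

  # The renderer resolves the model-emitted DTID back to the UUID.
  def resolve(dtid)
    @dtid_to_uuid[dtid]
  end

  private

  def derive_dtid(uuid)
    digest = Digest::SHA256.hexdigest(uuid)
    digest.scan(/[0-9a-f]{3}/).first(5)
          .map { |chunk| format("%03d", chunk.to_i(16) % 1000) }.join("-")
  end
end
```

Retrieval registers each citation at ingestion time; prompts and model output speak only DTIDs; the renderer calls `resolve` to get back the internal UUID.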
When an identifier is visually structured, engineers can spot transpositions faster, compare examples faster, and write sanity checks that match the mental model of the system.
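As a sketch of such a sanity check, a group-wise diff lines up with how engineers actually read the identifier (`dtid_diff` is a hypothetical helper, not an existing API):

```ruby
# Compare two DTIDs group by group; the fixed-width structure means
# the diff points at the exact group that drifted.
def dtid_diff(a, b)
  a.split("-").zip(b.split("-"))
   .each_with_index
   .select { |(ga, gb), _i| ga != gb }
   .map { |(ga, gb), i| "group #{i}: #{ga} != #{gb}" }
end

dtid_diff("263-037-515-764-525", "263-073-515-764-525")
# flags only the transposed second group
```

The equivalent diff on a UUID points at a character offset inside an undifferentiated hex blob, which tells the reader much less.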
LLM systems fail on small things that classical software barely notices: a separator, a token boundary, a repeated shape that the model can or cannot hold in working memory. Those details feel cosmetic until they become the difference between a working citation and an invisible one.
Deterministic identifiers are a narrow idea, but they point at a broader lesson. If a string has to survive inside a language model, its form matters. Human readability matters. Tokenizer regularity matters. Repetition matters. In agentic systems, representation is part of the architecture.