aarushi singh · cs'26 · ml research + engineering


should models always finish their sentences: the case for explicit hesitation

so the industry has spent three years optimizing for fluency, accidentally creating the world’s most confident liars. we treat hallucination as a bug to be patched with more data when it’s a structural byproduct of our objective functions.

we train models to minimize prediction error on a fixed string, which forces them to collapse the latent uncertainty of the world into a single, high-probability token.
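to make that concrete, here’s a tiny pytorch sketch, toy numbers and no real model, of what next-token cross-entropy actually rewards: an honestly uncertain distribution spread over several plausible continuations is penalized harder than one that piles all its mass on the single token the training string happens to contain.

```python
# a minimal sketch (plain pytorch, toy numbers): next-token training scores
# the model against exactly one "correct" continuation, so the loss is
# minimized by piling probability onto that token even when several
# continuations are equally plausible.
import torch
import torch.nn.functional as F

target = torch.tensor([3])  # the one token the fixed training string contains

# an honestly uncertain model: mass spread over three plausible tokens
uncertain_logits = torch.log(torch.tensor([[0.01, 0.01, 0.01, 0.32, 0.32, 0.32, 0.005, 0.005]]))

# an overconfident model: nearly all mass on the target token
confident_logits = torch.log(torch.tensor([[0.01, 0.01, 0.01, 0.93, 0.01, 0.01, 0.01, 0.01]]))

print(F.cross_entropy(uncertain_logits, target))  # ~1.14, penalized for spreading mass
print(F.cross_entropy(confident_logits, target))  # ~0.07, rewarded for collapsing it
```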

we reward completion. we punish silence. the rest follows.

but in human intelligence, the most critical layer is the interrupt. it’s the metacognitive pause, the moment where the system detects a low-confidence state and decides not to execute the next step.

the ghost in the hidden layers

if you probe the intermediate residual streams of a transformer, the "doubt" is very much already there. research into "internal hallucination triggers" shows that models often display high logit entropy (a flat probability distribution) right before they fabricate a fact.
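you can poke at this yourself. the sketch below is a rough probe, not the methodology of any particular paper: it loads gpt2 purely as a stand-in model and measures the entropy of the next-token distribution on an illustrative prompt. the only assumption is that a flatter distribution means the model is less sure.

```python
# a rough probe of the "doubt signal": entropy of the next-token distribution.
# gpt2 and the prompt are stand-ins; the point is just that the flatness of
# the distribution is already visible before any token gets committed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "the capital of the small island nation of tuvalu is"
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits              # shape [1, seq_len, vocab]

probs = torch.softmax(logits[0, -1], dim=-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
print(f"next-token entropy: {entropy.item():.2f} nats")  # flatter distribution -> larger value
```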

the model knows it is guessing. its attention heads are struggling to find a stable dependency, but the auto-regressive loop is a treadmill that just. won't. let. it. stop.

we have actively tuned these signals out. through RLHF, we penalize models for saying "i don't know" because human raters historically preferred a confident, slightly wrong answer over a "useless" admission of ignorance. we are literally training the pause out of the system.

scaling the stop signal

true “AGI” won't be defined by how many tokens it can generate per second, but by its capacity for selective classification: knowing when to answer and when to abstain. if we want a model that can perform heart surgery or manage a financial grid, we need to stop rewarding completion and start rewarding active abstention.
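selective classification is the old, well-studied framing for this. here’s a toy sketch with made-up numbers, just to show the trade-off we’d actually be optimizing: the model is allowed to abstain below a confidence threshold, and we score its error rate only over the answers it chose to give.

```python
# a toy sketch of selective classification: the model may abstain instead of
# answering, and risk is measured only over the answered subset. the
# confidences and "calibration" here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
confidence = rng.uniform(0.3, 1.0, size=1000)   # model's self-reported p(correct)
correct = rng.random(1000) < confidence          # pretend calibration is decent

for threshold in (0.0, 0.6, 0.8, 0.9):
    answered = confidence >= threshold           # below the bar -> abstain
    coverage = answered.mean()                   # fraction of questions answered
    risk = 1.0 - correct[answered].mean()        # error rate on answered subset
    print(f"threshold {threshold:.1f}: coverage {coverage:.2f}, risk {risk:.2f}")
```

the higher the abstention bar, the lower the coverage and the lower the risk. rewarding abstention just means choosing where on that curve a heart-surgery model is allowed to sit.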

while modern ai ides attempt to simulate this by wrapping models in agentic loops that use external scripts to check for terminal errors or index files, we could consider something deeper: moving toward architectures where hesitation is a first-class, “cannot be sidelined” primitive.

think of it like this: an explicit state (not a wrapper script) where the model can halt its own generation loop to request more compute (via chain-of-thought), a tool-use verification, or a human in the loop. the point is to move the pause button from the orchestration layer into the latent space.
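here’s one hypothetical shape for that, in plain python rather than any real framework’s api: the decoding loop itself can return a pause outcome when the next-token distribution looks too flat, instead of an external script inspecting the transcript after the fact. model_step, the entropy threshold, and the escalation choices are all stand-ins.

```python
# a hypothetical shape for the "explicit halt state": the generation loop owns
# a PAUSE outcome, rather than a wrapper deciding after the fact. model_step
# and the threshold are stand-ins, not any real library's API.
import math
from enum import Enum, auto

class Outcome(Enum):
    TOKEN_LIMIT = auto()
    PAUSE = auto()   # low-confidence state: stop and escalate
    DONE = auto()

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def generate(model_step, prompt_tokens, max_tokens=256, pause_threshold=3.0):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        probs, token, is_eos = model_step(tokens)   # stand-in for one decode step
        if entropy(probs) > pause_threshold:
            # hesitation as a first-class return value: the caller decides
            # whether to spend more compute, call a tool, or ask a human.
            return Outcome.PAUSE, tokens
        tokens.append(token)
        if is_eos:
            return Outcome.DONE, tokens
    return Outcome.TOKEN_LIMIT, tokens
```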

the unreliable perfection

a system that never says "i don't know" isn't intelligent; it's a fixed-function script. intelligence isn't just the ability to complete a pattern, it's the capacity to recognize when the pattern no longer applies.

until we make the invisible doubt visible, we’re building very fast systems that never learned how to stop.

and a system that cannot stop will always sound confident right up until it’s catastrophically wrong.