Almost every assistive communication tool shares one assumption: the user will notice they're struggling, decide to ask for help, and operate an interface to get it.
For most disabilities, that's a reasonable assumption. For the two conditions this series is about, it's backwards.
The conditions this series is about
Anomic aphasia is one of the most common aphasia subtypes. Grammar is intact, comprehension is intact, fluency is intact. But somewhere between "I know exactly what I mean" and "the word comes out of my mouth," the connection drops. What's left is circumlocution: describing the thing instead of naming it.
Example(What circumlocution sounds like)
"I need the — you know, the thing — it's round, you put food on it, it goes in the cupboard with the bowls — the... the..."
The speaker knows exactly what they mean. Grammar, intonation, and turn-taking are all intact. The single missing piece is the word "plate."
Early-stage Alzheimer's compounds this with a second failure: conversational thread loss. Not just losing words, but losing the thread of the conversation itself — what was just asked, what you yourself said two minutes ago, what topic you're even on.
Definition(Anomic aphasia)
Selective lexical retrieval failure with intact grammar, comprehension, and fluency. The breakdown is at the final step of word production — mapping a fully-formed concept to its phonological form. Result: semantic paraphrase and description in place of the target word.
Both deficits are common after stroke and traumatic brain injury. Critically for Alzheimer's, anomia is often one of the earliest reliable language symptoms, showing up before major episodic memory loss does.
Why current tools don't fit
Predictive text, AAC boards, voice assistants — all of them are query-initiated. You have to:
1. Notice you're failing to retrieve a word
2. Decide to seek help
3. Find and operate an interface
4. Formulate a request
Step 1 is the problem. For anomic aphasia, the moment of failure is often not salient to the person experiencing it — they're mid-sentence, mid-thought, often mid-social-interaction with the added pressure of someone waiting on them. For Alzheimer's, the conversational thread is already gone by the time anyone would think to ask "what was I talking about?"
The population that most needs retrieval support is the population least equipped to initiate a structured request for it, at the exact moment it would help.
Important(The core claim)
Any tool that requires the user to recognize their own failure and act on it is solving the wrong half of the problem. The hard part isn't retrieval assistance — it's knowing, without being told, that assistance is needed right now.
What WordBridge proposes instead
WordBridge flips the interaction model from reactive to ambient: a system that listens continuously, detects the signs of word-finding failure as they happen — hesitation, rising intonation, trailing off, semantic paraphrasing — and delivers ranked word candidates through a nearby speaker or personal audio device without being asked.
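To make "detects the signs of word-finding failure" a little more concrete, here is a minimal rule-based sketch that checks one short window of speech for the four signs named above. It assumes an upstream audio/ASR stage already produces per-window features; the `WindowFeatures` fields, the thresholds, and the two-signal voting rule are all hypothetical placeholders, and a real detector would more plausibly be learned from labeled data than hand-tuned.

```python
from dataclasses import dataclass

@dataclass
class WindowFeatures:
    """Hypothetical features for one short window of speech (assumed upstream)."""
    pause_ms: float        # silence at the end of the window, in milliseconds
    pitch_slope: float     # pitch change over the last phrase, semitones/second (+ = rising)
    energy_drop: float     # loudness drop since the phrase began, as a fraction 0..1
    paraphrase_cues: int   # count of cues such as "the thing", "you know", "it's like"

# Placeholder thresholds, not tuned on any data.
PAUSE_MS = 600.0
RISING_PITCH = 2.0
TRAILING_OFF = 0.5
MIN_SIGNALS = 2

def onset_signals(f: WindowFeatures) -> list[str]:
    """Return which of the four signs (hesitation, rising intonation,
    trailing off, semantic paraphrasing) are present in this window."""
    signals = []
    if f.pause_ms >= PAUSE_MS:
        signals.append("hesitation")
    if f.pitch_slope >= RISING_PITCH:
        signals.append("rising_intonation")
    if f.energy_drop >= TRAILING_OFF:
        signals.append("trailing_off")
    if f.paraphrase_cues > 0:
        signals.append("semantic_paraphrase")
    return signals

def is_retrieval_failure(f: WindowFeatures) -> bool:
    # Require at least two independent signs so that one ordinary pause
    # does not trigger an interjection on its own.
    return len(onset_signals(f)) >= MIN_SIGNALS

# "it goes in the cupboard with the bowls... the... the..."
stuck = WindowFeatures(pause_ms=900, pitch_slope=3.1, energy_drop=0.6, paraphrase_cues=2)
# An ordinary breath pause mid-sentence.
pause = WindowFeatures(pause_ms=700, pitch_slope=0.0, energy_drop=0.1, paraphrase_cues=0)

print(is_retrieval_failure(stuck), onset_signals(stuck))
# True ['hesitation', 'rising_intonation', 'trailing_off', 'semantic_paraphrase']
print(is_retrieval_failure(pause), onset_signals(pause))
# False ['hesitation']
```

Requiring at least two agreeing signs trades some recall for precision, which fits the concern that an ill-timed interjection can cost more than a missed one.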
This only works if the underlying model can do two things that are normally in tension:
- React in real time, at the sub-second granularity of actual speech (to catch hesitation as it starts)
- Hold enough context about the whole conversation to know what's actually being talked about (to suggest the right word, not just a word)
A single model under tight latency constraints can't do both — the context horizon needed for the second task is exactly what the first task's latency budget doesn't allow.
Intuition(Why one model can't do both)
A model fast enough to react within a few hundred milliseconds doesn't have time to re-read five minutes of conversation before responding. A model that can digest five minutes of context takes longer than a few hundred milliseconds to do it. Speed and context horizon trade off against each other inside a single forward pass — so WordBridge doesn't try to fit both into one model. It splits them into two models running on different clocks.
The dual-model split
This is where WordBridge leans on Thinking Machines Lab's interaction model architecture: a fast "frontend" model that processes audio in continuous ~200ms micro-turns and can listen and respond simultaneously, paired with an asynchronous background model that handles longer-horizon reasoning and streams results back into the live session (Thinking Machines Lab, VentureBeat).
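The "two models on different clocks" shape can be sketched with Python's `asyncio`: a fast loop that ticks roughly every 200ms and never waits on the slow one, and a background loop that runs at its own cadence and publishes results into shared state that the fast loop reads on its next tick. Both "models" here are trivial stand-in functions (`fast_step` and `slow_summarize` are hypothetical), and the shared-dict handoff is an illustrative assumption. This only mirrors the shape of the description above; it is not how Thinking Machines Lab implements it.

```python
import asyncio

TICK_S = 0.2        # ~200ms micro-turn for the fast loop
BACKGROUND_S = 1.0  # the slow loop's own cadence (placeholder)

def fast_step(chunk: str, context: dict) -> str:
    # Stand-in for the interaction model: reacts to the newest chunk using
    # whatever context the background loop has published so far.
    return f"heard {chunk!r} (topic={context.get('topic', 'unknown')})"

def slow_summarize(history: list[str]) -> dict:
    # Stand-in for the background model: digests the whole history.
    topic = "kitchen" if any("cupboard" in h for h in history) else "unknown"
    return {"topic": topic, "turns_seen": len(history)}

async def interaction_loop(chunks, history, context, log):
    for chunk in chunks:
        history.append(chunk)
        # Reads the latest published context; never awaits the background model.
        log.append(fast_step(chunk, context))
        await asyncio.sleep(TICK_S)

async def background_loop(history, context, stop):
    while True:
        # Summarize a snapshot in a worker thread so the event loop stays free.
        summary = await asyncio.to_thread(slow_summarize, list(history))
        context.update(summary)  # publish results back into the live session
        if stop.is_set():
            return  # one last summary after the session ends, then exit
        try:
            await asyncio.wait_for(stop.wait(), timeout=BACKGROUND_S)
        except asyncio.TimeoutError:
            pass

async def run_session(chunks):
    history, context, log = [], {}, []
    stop = asyncio.Event()
    background = asyncio.create_task(background_loop(history, context, stop))
    await interaction_loop(chunks, history, context, log)
    stop.set()
    await background
    return log, context

chunks = ["I need the", "the thing", "it goes in the cupboard", "the... the..."]
log, context = asyncio.run(run_session(chunks))
print(log[0])   # heard 'I need the' (topic=unknown)
print(context)  # {'topic': 'kitchen', 'turns_seen': 4}
```

The property that matters is that `interaction_loop` never awaits `slow_summarize`: a slow summary makes the context somewhat stale rather than making the fast loop late. The first log line shows this, since the fast loop answered before any summary existed.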
Mapped onto WordBridge:
| Layer | Job | Constraint it's built around |
|---|---|---|
| Interaction Model | Detect circumlocution onset in real time, deliver Top-3 word candidates via audio output | Sub-second latency — must not interrupt the speaker |
| Background Model | Maintain a rolling session summary, track topic drift, generate contextual anchors ("you were asking about your medication") | Needs the full conversation history — can't run on a 200ms budget |
Neither layer alone is sufficient. The interaction model has no room to think about the last five minutes; the background model has no way to act inside the current second.
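One way to picture where the two rows of the table meet at delivery time: the interaction layer has raw candidate scores from the live circumlocution, and it reranks them with topic terms from the background model's latest summary before speaking the Top-3. The post does not say how ranking works, so the additive topic bonus, the `related` association map, and the scores below are all made-up, hypothetical stand-ins.

```python
import heapq

TOPIC_BONUS = 0.25  # placeholder weight for agreeing with the session context

def top3(candidates: dict[str, float], topic_words: set[str],
         related: dict[str, set[str]]) -> list[str]:
    """Rerank raw candidate scores using session context; keep the best three.

    candidates:  word -> score from the live circumlocution alone
    topic_words: topic terms from the background model's latest summary
    related:     word -> associated terms (a crude stand-in for real semantics)
    """
    def score(word: str) -> float:
        on_topic = bool(related.get(word, set()) & topic_words)
        return candidates[word] + (TOPIC_BONUS if on_topic else 0.0)

    # nlargest returns the n highest-scoring words, best first.
    return heapq.nlargest(3, candidates, key=score)

# Hypothetical scores for "it's round, you put food on it": "round" alone
# nudges the live signal toward "frisbee".
raw = {"plate": 0.40, "frisbee": 0.45, "pan": 0.30, "coin": 0.20, "bowl": 0.35}
related = {"plate": {"kitchen", "food"}, "bowl": {"kitchen", "food"},
           "pan": {"kitchen"}, "frisbee": {"park"}, "coin": {"money"}}

print(top3(raw, set(), related))        # ['frisbee', 'plate', 'bowl']
print(top3(raw, {"kitchen"}, related))  # ['plate', 'bowl', 'pan']
```

Without context the best guess is a word; with the background model's topic it becomes the right word, which is the gap the dual-model split is meant to close.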
The part that worries me more than the modeling
A system that's always listening and occasionally interjects into someone's conversation is, by construction, a system that can get the timing wrong. For most users that's an annoyance. For someone mid-anxiety-spiral over their own aphasia, or an Alzheimer's patient already confused about where they are, an ill-timed interruption isn't neutral — it can make things worse.
WordBridge's answer is a physiological-state-aware tiered fallback that uses wearable heart rate (HR) and galvanic skin response (GSR) as a distress signal gating what the system is allowed to do:
| Tier | State | Behavior |
|---|---|---|
| 1 | Baseline HR/GSR | Full operation — word candidates + contextual anchors |
| 2 | Elevated HR/GSR (mild distress) | Suppress word suggestions; contextual anchors only |
| 3 | High distress + conversational collapse | Suppress everything; silent caregiver alert; observation only |
Warning(Tier 3 is a fail-safe, not a feature)
By the time the system reaches Tier 3, it has already stopped trying to help with retrieval or context — its only job is to get out of the way and quietly notify a caregiver. If Tier 3 triggers often in practice, that's a signal the earlier tiers are miscalibrated, not that Tier 3 is "working as intended."
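The tier table can be read as two small pieces: a classifier that maps physiological readings (relative to the user's own baseline) plus a conversational-collapse flag to a tier, and a delivery policy listing which outputs each tier permits. The post specifies only the three tiers and their behaviors; the z-score thresholds, the `Baseline` statistics, and the boolean `collapse` flag are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import IntEnum

class Tier(IntEnum):
    BASELINE = 1
    MILD_DISTRESS = 2
    HIGH_DISTRESS = 3

@dataclass
class Baseline:
    """The user's own resting statistics (assumed to be measured beforehand)."""
    hr_mean: float
    hr_std: float
    gsr_mean: float
    gsr_std: float

# Placeholder thresholds, in standard deviations above the user's baseline.
MILD_Z = 1.5
HIGH_Z = 3.0

def classify(hr: float, gsr: float, base: Baseline, collapse: bool) -> Tier:
    # Take the larger of the two deviations as the distress level.
    z = max((hr - base.hr_mean) / base.hr_std, (gsr - base.gsr_mean) / base.gsr_std)
    if z >= HIGH_Z and collapse:
        return Tier.HIGH_DISTRESS
    # Assumption: high distress without conversational collapse stays at Tier 2,
    # since the table leaves that case open.
    if z >= MILD_Z:
        return Tier.MILD_DISTRESS
    return Tier.BASELINE

# Delivery policy per tier, mirroring the table.
ALLOWED = {
    Tier.BASELINE: {"word_candidates", "contextual_anchors"},
    Tier.MILD_DISTRESS: {"contextual_anchors"},
    Tier.HIGH_DISTRESS: {"caregiver_alert"},  # silent alert only; nothing spoken to the user
}

base = Baseline(hr_mean=70.0, hr_std=5.0, gsr_mean=2.0, gsr_std=0.5)
print(classify(80.0, 2.0, base, collapse=False).name)           # MILD_DISTRESS
print(sorted(ALLOWED[classify(90.0, 2.0, base, collapse=True)]))  # ['caregiver_alert']
```

Measuring deviation against each user's own baseline, rather than fixed absolute values, is one way to keep a naturally high resting heart rate from reading as constant distress.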
The detail I find most interesting here is the division of authority: the background model decides whether to escalate (it has the context to distinguish real distress from noise), but the interaction model executes the delivery policy. Neither layer has both the authority to escalate and the real-time pressure that could cause a premature escalation. That's a structural safety property, not a tuned threshold, and it makes a useful small case study in how to split responsibility between a fast model and a slow one when the cost of a mistake isn't symmetric.
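The division of authority can also be expressed as an ownership rule in code: only the background side holds the object that can change the tier, while the interaction side receives a read-only view and applies the delivery policy. How the background model separates real distress from noise is not specified in the post, so this sketch assumes a simple persistence rule (a change only sticks when a full window of proposed tiers agrees), plus a symmetric rule for stepping back down. `TierWriter`, `TierReader`, `deliver`, and the window size are all hypothetical.

```python
from collections import deque

# Outputs each tier permits (1 = baseline, 2 = mild distress, 3 = high distress).
ALLOWED = {1: {"word_candidates", "contextual_anchors"},
           2: {"contextual_anchors"},
           3: {"caregiver_alert"}}

class TierReader:
    """Read-only view handed to the interaction layer."""
    def __init__(self, state: dict):
        self._state = state

    @property
    def tier(self) -> int:  # no setter: assigning to .tier raises AttributeError
        return self._state["tier"]

class TierWriter:
    """Held only by the background layer; the single place the tier can change."""
    def __init__(self, window: int = 3):
        self._state = {"tier": 1}
        self._recent = deque(maxlen=window)  # last `window` proposed tiers

    def reader(self) -> TierReader:
        return TierReader(self._state)

    def observe(self, proposed_tier: int) -> int:
        self._recent.append(proposed_tier)
        if len(self._recent) == self._recent.maxlen:
            lowest, highest = min(self._recent), max(self._recent)
            if lowest > self._state["tier"]:
                self._state["tier"] = lowest   # sustained escalation
            elif highest < self._state["tier"]:
                self._state["tier"] = highest  # sustained recovery
        return self._state["tier"]

def deliver(reader: TierReader, outputs: dict[str, str]) -> dict[str, str]:
    # Interaction layer: filter what it wanted to say by the current tier.
    # It can read the tier but has no method that changes it.
    allowed = ALLOWED[reader.tier]
    return {kind: text for kind, text in outputs.items() if kind in allowed}

writer = TierWriter(window=3)  # kept by the background layer
reader = writer.reader()       # the only handle the interaction layer gets
for proposed in (1, 3, 1, 2, 2, 2):  # one spike to 3, then sustained tier 2
    writer.observe(proposed)
print(reader.tier)  # 2
print(deliver(reader, {"word_candidates": "plate, bowl, pan",
                       "contextual_anchors": "You were asking about your medication."}))
# {'contextual_anchors': 'You were asking about your medication.'}
```

Python does not actually enforce this boundary; in a deployed system it would more plausibly be a process or API boundary, so that the fast path cannot escalate even by mistake.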
Note
This framing — a real-time interface where misalignment has a measurable, human cost, and where the fix requires coordination between two models with different time horizons — is also why this is being pitched as an alignment research testbed, not just an assistive device.
What's next
This post is the overview. The next few posts in this series are research notes — going deeper on the pieces above before any implementation starts:
- The actual literature on detecting circumlocution and lexical retrieval failure from speech — AphasiaBank, disfluency markers, what's been tried
- Dual-model streaming architectures in more depth — what the interaction/background split looks like elsewhere, and where it might break for this use case
- The safety framework as its own problem — tiered fallback design for always-on assistive systems, and what "getting it wrong" actually costs
- What's actually different about TML-Interaction-Small — a non-specialist walkthrough of the model powering the Interaction Model
- How would you know if it worked? — the evaluation design, baselines, and the benchmark gap
- Where does the data come from? — AphasiaBank, onset-label annotation plan, and honest corpus size estimates
Summary(Summary)
WordBridge is a proposal for a passive, always-listening communication aid for anomic aphasia and early Alzheimer's — conditions where the person most in need of help is least able to ask for it. It splits the problem across a fast interaction model (real-time detection and delivery) and a slow background model (context, memory, and safety tier decisions), with a physiological-signal-gated fallback system designed so that no single layer has both the authority and the time pressure to make a bad call alone.
Note(See it in action)
The interactive demo reconstructs both scenarios in real time — anomic aphasia and early Alzheimer's — with live candidate ranking, contextual anchors, and tier escalation. If you want the full research framing, the grant proposal covers the hypotheses, evaluation design, and dataset plan.