#research #proposal #aphasia #alzheimers #ai

WordBridge Research Proposal

The full research proposal for WordBridge — passive ambient AAC for anomic aphasia and early-stage Alzheimer's via TML's dual-model streaming architecture.

June 18, 2026 | gitcoder89431 | 9 min read
Note(Thinking Machines Lab · Research Proposal)

This is the full proposal as submitted. The research notes series covers each section in more depth — start with the intro or jump to any post from the series nav below. You can also watch a reconstructed session in the interactive demo.

Abstract

Anomic aphasia and early-stage Alzheimer's disease share two compounding communication deficits: lexical retrieval failure and progressive loss of conversational context. Existing assistive tools require explicit user-initiated queries — a structural mismatch for populations whose primary challenge is recognizing and articulating their own failure in real time. This proposal describes WordBridge, a passive ambient communication assistant that continuously monitors conversational audio, detects circumlocutory speech and hesitation markers without prompting, and delivers ranked lexical candidates and conversational anchors through a nearby speaker or personal audio device in real time.

WordBridge is designed expressly around TML's dual-model streaming architecture: the Interaction Model handles sub-utterance real-time detection and delivery; the Background Model maintains longitudinal session context and coordinates caregiver escalation. A physiological state-aware safety framework governs tiered intervention fallback, addressing TML's stated alignment research agenda for real-time interfaces operating with vulnerable populations.

1. Background and Significance

1.1 Anomic Aphasia

Aphasia affects approximately 2 million Americans, with 180,000 new cases annually following stroke, traumatic brain injury, or neurological disease. Anomic aphasia is the most prevalent subtype — characterized by selective lexical retrieval failure in the presence of intact grammar, fluency, and comprehension. The breakdown occurs at the final step of word production: mapping an intact conceptual representation to its phonological form. The result is circumlocutory speech — semantic paraphrases and descriptions produced in place of the unretrievable target word.

The psychosocial burden is well documented. Up to 44% of individuals with aphasia develop comorbid anxiety disorders, and quality-of-life scores are significantly lower than those of both healthy controls and non-aphasic stroke survivors. Social withdrawal is a consistent outcome, driven not by cognitive incapacity but by the compounding cost of public, repeated retrieval failure.

1.2 Alzheimer's Disease

Anomia is among the earliest and most diagnostically reliable language symptoms of Alzheimer's disease, preceding significant episodic memory loss in many patients. As the disease progresses, word-finding failure is compounded by conversational thread loss: the inability to maintain the context of an ongoing exchange. These two deficits co-occur and reinforce each other, creating a clinical profile that maps directly onto WordBridge's dual-component design.

1.3 The Gap in Current Tools

No existing AI assistive tool operates passively. Current approaches — predictive text, AAC boards, voice assistants — are query-initiated: the user must recognize their failure, decide to seek help, navigate an interface, and formulate a request. For both anomic aphasia and Alzheimer's, this model is clinically backwards. WordBridge eliminates this requirement entirely by shifting from reactive to ambient operation.

2. Hypotheses

H1 (Passive Detection): A continuously listening streaming model will detect circumlocutory speech onset and produce a correct Top-3 lexical candidate without explicit prompting, at accuracy non-inferior (within 5 percentage points) to explicit-query baselines on held-out natural AphasiaBank samples.

H2 (Temporal Advantage): TML's full-duplex sub-utterance streaming will produce a first correct lexical candidate at a significantly earlier timepoint — measured at 500ms, 1s, and 2s windows from circumlocution onset — compared to turn-based pipeline baselines.

H3 (Contextual Anchoring): The Background Model's rolling conversational memory will produce accurate topic anchors that reduce time-to-topic-recovery in simulated Alzheimer's-profile thread-loss scenarios, as rated by blinded speech-language pathologist evaluators.

H4 (Safety Fallback): Physiological state-aware tiered intervention will produce lower false-positive intervention rates than a state-agnostic baseline — including a nested comparison of context-reasoned (Background Model) vs. raw threshold-based tier transitions, to verify the Background Model's reasoning adds value over a simpler sensor trigger.
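
H1's non-inferiority criterion can be made concrete with a small sketch. The proposal does not name a statistical test, so the choice below (a one-sided 95% Wald interval on the difference of two proportions, treating the two systems' results as independent samples) is an assumption for illustration, as are the function names. A paired analysis may be more appropriate if both systems are scored on the same test items.

```python
import math


def noninferiority_lower_bound(correct_passive, n_passive,
                               correct_baseline, n_baseline, z=1.645):
    """Lower bound of a one-sided Wald interval on (passive - baseline) accuracy.

    z=1.645 corresponds to a one-sided 95% interval (an assumed choice).
    """
    p_passive = correct_passive / n_passive
    p_baseline = correct_baseline / n_baseline
    std_err = math.sqrt(
        p_passive * (1 - p_passive) / n_passive
        + p_baseline * (1 - p_baseline) / n_baseline
    )
    return (p_passive - p_baseline) - z * std_err


def is_noninferior(correct_passive, n_passive,
                   correct_baseline, n_baseline, margin=0.05):
    """True if the interval's lower bound stays above -margin (5 points in H1)."""
    lower = noninferiority_lower_bound(
        correct_passive, n_passive, correct_baseline, n_baseline
    )
    return lower > -margin
```

Under these assumptions, an observed 80% versus 82% on 400 samples per arm does not clear the 5-point margin, because the interval is wider than the margin allows. This suggests the size of the natural test set matters for H1.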

3. System Architecture

WordBridge is built on TML's dual-model framework — an Interaction Model for real-time perception and response, and a Background Model for asynchronous longitudinal reasoning — operating as two coordinated agents over a shared session context.

3.1 Interaction Model Layer

Continuously processes incoming audio via TML's full-duplex 200ms micro-turn stream. Circumlocution detection is triggered by prosodic markers (hesitation, rising intonation, trailing), semantic paraphrasing patterns, and disfluency signatures — without requiring the user to pause or initiate a query. Ranked Top-3 lexical candidates are delivered as brief, non-interruptive audio cues. The model continues refining candidates as the description develops, exploiting TML's simultaneous speech capability without interrupting the user's own speech.
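
As a rough illustration of detection without a user-initiated query, the sketch below scores a rolling window of micro-turn transcripts. It is a text-only toy, not the Interaction Model: the proposal's prosodic markers are not modeled, and the marker lists, weights, window size, and names (`MicroTurn`, `first_detection_ms`) are all hypothetical.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical marker lists; a real detector would also use prosody.
FILLED_PAUSES = {"um", "uh", "er", "hmm"}
PARAPHRASE_CUES = ("the thing", "you know", "what's it called", "it's like", "kind of")


@dataclass
class MicroTurn:
    start_ms: int  # start time of the 200 ms window
    text: str      # partial transcript for that window (may be empty)


def first_detection_ms(turns, window=10, min_score=3):
    """Return the start time of the first micro-turn at which the rolling
    score reaches min_score, or None if the detector never fires."""
    recent = deque(maxlen=window)  # 10 micro-turns is about 2 s of audio
    for turn in turns:
        recent.append(turn.text.lower())
        joined = " ".join(recent)
        tokens = re.findall(r"[a-z']+", joined)
        # One point per filled pause, two per paraphrase cue (assumed weights).
        score = sum(tok in FILLED_PAUSES for tok in tokens)
        score += 2 * sum(joined.count(cue) for cue in PARAPHRASE_CUES)
        if score >= min_score:
            return turn.start_ms
    return None
```

The point of the sketch is the control flow: the detector re-evaluates on every micro-turn and can fire mid-utterance, rather than waiting for the speaker to finish.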

The Interaction Model is instantiated on TML-Interaction-Small: a 276B-parameter mixture-of-experts model with 12B active parameters, operating in 200ms micro-turns. Published benchmarks: 0.40s response latency (FD-bench V1); 77.8 quality score on interruption and overlapping-speech handling (FD-bench V1.5). TML's internal TimeSpeak (64.7%) and CueSpeak (81.7%) benchmarks evaluate time-triggered and semantically-timed speech generation — the exact capability WordBridge relies on to deliver a candidate at the right moment during active speech, not just fast.

3.2 Background Model Layer

Operates asynchronously on accumulated session context. Responsibilities: rolling conversational summary; topic drift detection for Alzheimer's-profile thread loss; contextual anchor generation ("you were asking about your medication schedule"); longitudinal patient profile tracking vocabulary patterns and session-over-session changes; asynchronous caregiver dashboard with session summaries and anomaly flags.

The Background Model supplies the memory and reasoning that the Interaction Model's latency constraints prevent it from maintaining itself. Neither layer alone is sufficient.

3.3 Alignment and Safety Framework

A passive always-on model operating with a vulnerable population introduces alignment challenges that are clinically consequential if mishandled. WordBridge addresses this through a physiological state-aware tiered fallback:

| Tier | State | System Behavior |
|---|---|---|
| 1 | Baseline HR/GSR | Full ambient operation: lexical suggestions + contextual anchors |
| 2 | Elevated HR/GSR (mild distress) | Suppress lexical suggestions; contextual anchors only |
| 3 | High distress + conversational collapse | Suppress all suggestions; silent caregiver alert; observation only |

Important(Structural safety property)

The Background Model governs tier transitions — it has the longitudinal context to distinguish genuine distress from transient physiological noise (physical activity, excitement). The Interaction Model executes the delivery policy. No single model has both the authority to escalate and the real-time pressure that might cause premature escalation.
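
The split of authority described above can be sketched as two small pieces: a tier decision that belongs to the Background Model and a delivery policy the Interaction Model merely looks up. This is a minimal sketch under stated assumptions: the boolean inputs, the `benign_context` flag (standing in for "physical activity, excitement"), and all names are hypothetical, and the real tier decision would come from model reasoning rather than fixed rules.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    FULL_AMBIENT = 1
    ANCHORS_ONLY = 2
    OBSERVE_ONLY = 3


@dataclass(frozen=True)
class DeliveryPolicy:
    lexical_suggestions: bool
    contextual_anchors: bool
    caregiver_alert: bool


# Mirrors the tier table: what may be delivered in each tier.
POLICY = {
    Tier.FULL_AMBIENT: DeliveryPolicy(True, True, False),
    Tier.ANCHORS_ONLY: DeliveryPolicy(False, True, False),
    Tier.OBSERVE_ONLY: DeliveryPolicy(False, False, True),
}


def choose_tier(physio_elevated, high_distress,
                conversational_collapse, benign_context):
    """Background Model side: pick a tier from (assumed) boolean signals."""
    if high_distress and conversational_collapse:
        return Tier.OBSERVE_ONLY
    # Elevated readings explained by benign context do not escalate.
    if (physio_elevated or high_distress) and not benign_context:
        return Tier.ANCHORS_ONLY
    return Tier.FULL_AMBIENT


def allowed_outputs(tier):
    """Interaction Model side: look up the policy, never decide the tier."""
    policy = POLICY[tier]
    outputs = []
    if policy.lexical_suggestions:
        outputs.append("lexical")
    if policy.contextual_anchors:
        outputs.append("anchor")
    if policy.caregiver_alert:
        outputs.append("caregiver_alert")
    return outputs
```

Keeping `choose_tier` and `allowed_outputs` separate reflects the structural property: the component under real-time pressure only reads the current tier.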

This framework directly addresses TML's stated alignment research interest: a real-time interface where the cost of misalignment is measurable, the vulnerable population makes the stakes concrete, and the fallback hierarchy requires non-trivial coordination between two models with different time horizons. WordBridge is designed as a safety research testbed, not just a clinical tool.

4. Methods

4.1 Dataset Construction

Primary corpus: AphasiaBank (talkbank.org) — open-access, IRB-cleared transcripts and audio from standardized discourse tasks with a dedicated anomic aphasia cohort. AphasiaBank provides CHAT-format transcripts with disfluency markers but not sub-second circumlocution onset timestamps — these will be annotated, yielding an estimated 300–500 natural labeled samples. This onset-labeled corpus is itself a contribution: no existing dataset provides circumlocution events labeled at the temporal resolution H2 requires. Annotations will be contributed back to AphasiaBank.

Synthetic augmentation: ~1,000–1,200 samples generated by prompting a base LLM constrained to anomic speech profiles (intact grammar, semantic paraphrasing, hesitation injection). Alzheimer's conversational drift simulated via progressive topic substitution. All synthetic samples labeled and reported separately in all experiments.

Final dataset: ~1,500 samples. Natural AphasiaBank samples reserved exclusively for the test set.
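
One way to enforce the split rule is sketched below. It assumes each sample is a dict with a `source` field set to `"natural"` or `"synthetic"`, that the test set is natural-only, and that a small dev split is carved out of the synthetic data. None of these details are specified in the proposal.

```python
import random


def split_dataset(samples, dev_fraction=0.1, seed=0):
    """Route natural samples to test only; split synthetic into train/dev."""
    natural = [s for s in samples if s["source"] == "natural"]
    synthetic = [s for s in samples if s["source"] == "synthetic"]

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(synthetic)
    n_dev = int(len(synthetic) * dev_fraction)

    return {
        "train": synthetic[n_dev:],
        "dev": synthetic[:n_dev],
        "test": natural,
    }
```

Because the `source` label travels with every sample, results on synthetic and natural data can be reported separately, as the proposal requires.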

4.2 Experimental Design

| Baseline | What it tests |
|---|---|
| Explicit-query GPT-4o | H1 accuracy ceiling: active help, full utterance, can prompt for clarification |
| Whisper + LLM turn-based pipeline | H2 latency floor: status quo, full utterance before any inference |
| Semantic similarity retrieval | H1 without LLM reasoning: embedding nearest-neighbor over vocabulary |
| State-agnostic WordBridge | H4 calibration: full system, safety framework disabled |
| Threshold-based tier gating | H4 nested: raw HR/GSR thresholds, no Background Model reasoning |
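
The semantic similarity retrieval baseline amounts to ranking vocabulary items by cosine similarity to an embedding of the speaker's description. The sketch below uses tiny hand-made three-dimensional vectors in place of a real embedding model. The vectors, the vocabulary, and the function names are illustrative assumptions only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, vocab, k=3):
    """Return the k vocabulary words whose vectors are closest to the query."""
    ranked = sorted(vocab, key=lambda w: cosine(query_vec, vocab[w]), reverse=True)
    return ranked[:k]


# Toy feature axes: (kitchen, liquid, tool). Values are made up.
TOY_VOCAB = {
    "ladle": [1.0, 1.0, 1.0],
    "spoon": [1.0, 0.5, 1.0],
    "soup": [1.0, 1.0, 0.0],
    "hammer": [0.0, 0.0, 1.0],
}

# Stand-in embedding for "the thing you use to serve soup".
QUERY = [1.0, 1.0, 0.8]
```

With these toy vectors, `top_k(QUERY, TOY_VOCAB)` ranks "ladle" first. A real baseline would swap in learned embeddings and a much larger vocabulary, but the ranking step is the same.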

Primary outcomes: Top-1/Top-3 lexical retrieval accuracy; latency to first correct candidate at 500ms/1s/2s windows (cumulative distribution); contextual anchor accuracy rated by blinded SLP evaluators; false-positive intervention rate comparing threshold vs. context-reasoned tier transitions.

Secondary outcomes: Circumlocution tolerance curve (accuracy vs. elapsed time from onset); Background Model summary coherence under drift simulation; tier transition accuracy against ground-truth physiological labels.
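
Three of the primary outcomes reduce to simple counting, sketched here. The input shapes are assumptions: ranked candidate lists per sample, a per-sample latency in milliseconds from circumlocution onset to the first correct candidate (`None` if no correct candidate was ever produced), and boolean intervention labels.

```python
def top_k_accuracy(predictions, targets, k):
    """Fraction of samples whose target appears in the first k candidates."""
    hits = sum(target in ranked[:k] for ranked, target in zip(predictions, targets))
    return hits / len(targets)


def cumulative_hit_rate(latencies_ms, windows_ms=(500, 1000, 2000)):
    """Fraction of samples with a correct candidate by each window.

    A latency of None means no correct candidate was ever produced.
    """
    n = len(latencies_ms)
    return {
        w: sum(lat is not None and lat <= w for lat in latencies_ms) / n
        for w in windows_ms
    }


def false_positive_rate(intervened, should_intervene):
    """Interventions made when none was warranted, over all such cases."""
    false_pos = sum(i and not s for i, s in zip(intervened, should_intervene))
    negatives = sum(not s for s in should_intervene)
    return false_pos / negatives if negatives else 0.0
```

Reporting the hit rate at each window as a cumulative fraction lets a streaming system and a turn-based pipeline be compared on the same axis, even when the pipeline produces nothing until the utterance ends.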

4.3 Prototype Implementation

Continuous ambient audio input with real-time lexical candidates and contextual anchors delivered as non-interruptive audio through a nearby speaker or personal audio device. Wearable integration via BLE for HR/GSR ingestion. Web-based caregiver dashboard for session review and anomaly alerts. Full open-source release of dataset, evaluation pipeline, model configuration, and prototype.

5. Prior Work

This proposal extends Clarity (github.com/LEAF420/clarity-ai), a privacy-first on-device multimodal communication assistant for aphasia, autism, and social anxiety built with Gemma 3n. Clarity established the ethical framework, privacy architecture, and initial multimodal design. WordBridge advances on three axes: passive ambient operation replacing explicit-query interaction, TML's dual-model streaming architecture replacing turn-based inference, and a physiological state-aware safety framework replacing static confidence thresholds.

The closest architectural analog in the literature is ProAct (arXiv:2602.14048), which demonstrates that separating the decision to initiate from the execution of initiation — precisely the Background/Interaction Model split WordBridge uses — measurably improves both decision quality and interaction naturalness versus end-to-end approaches.

6. Expected Contributions

  1. A publicly released annotated circumlocution dataset derived from AphasiaBank with documented onset-timestamp labeling methodology — contributed back to AphasiaBank
  2. A quantitative benchmark for passive ambient circumlocution detection at sub-utterance latency — a task with no existing evaluation standard
  3. Empirical evidence on whether TML's dual-model architecture provides measurable clinical advantage over turn-based baselines for anomic lexical retrieval and Alzheimer's conversational anchoring
  4. A concrete alignment and safety framework for passive always-on models in vulnerable populations — tiered physiological fallback with an explicit nested comparison of context-reasoned vs. threshold-based tier transitions
  5. An open-source reference implementation of passive ambient AAC, suitable for clinical research and follow-on development