# WordBridge: Passive Ambient Lexical and Contextual Support for Anomic Aphasia and Early-Stage Alzheimer's Disease via Dual-Model Streaming Architecture

---

## Abstract

Anomic aphasia and early-stage Alzheimer's disease share two compounding communication deficits: lexical retrieval failure and progressive loss of conversational context. Existing assistive tools require explicit user-initiated queries — a structural mismatch for populations whose primary challenge is recognizing and articulating their own failure in real time. This proposal describes **WordBridge**, a passive ambient communication assistant that continuously monitors conversational audio, detects circumlocutory speech and hesitation markers without prompting, and delivers ranked lexical candidates and conversational anchors through a nearby speaker or personal audio device in real time.

WordBridge is designed expressly around TML's dual-model streaming architecture: the Interaction Model handles sub-utterance real-time detection and delivery; the Background Model maintains longitudinal session context, manages a patient profile, and coordinates caregiver escalation. A physiological state-aware safety framework — integrating wearable heart rate and galvanic skin response signals — governs tiered intervention fallback, addressing TML's stated alignment research agenda for real-time interfaces operating with vulnerable populations.

---

## 1. Background and Significance

### 1.1 Anomic Aphasia

Aphasia affects approximately 2 million Americans, with 180,000 new cases annually following stroke, traumatic brain injury, or neurological disease. Anomic aphasia is the most prevalent subtype — characterized by selective lexical retrieval failure in the presence of intact grammar, fluency, and comprehension. The breakdown occurs at the final step of word production: mapping an intact conceptual representation to its phonological form. The result is circumlocutory speech — semantic paraphrases, descriptions, and approximations produced in place of the unretrievable target word.

The psychosocial burden is severe and well-documented. Up to 44% of individuals with aphasia develop comorbid anxiety disorders, and quality of life scores are significantly lower than both healthy controls and non-aphasic stroke survivors. Social withdrawal is a consistent outcome — driven not by cognitive incapacity but by the compounding cost of public, repeated retrieval failure.

### 1.2 Alzheimer's Disease

Anomia is among the earliest and most diagnostically reliable language symptoms of Alzheimer's disease, preceding significant episodic memory loss in many patients. As the disease progresses, word-finding failure is compounded by a second deficit: conversational thread loss — the inability to maintain the context of an ongoing exchange. Patients lose track of what was asked, what they themselves said minutes prior, and what the current topic is. These two deficits co-occur and mutually reinforce their impact on communication quality, creating a clinical profile that maps directly onto WordBridge's dual-component design.

### 1.3 The Gap in Current Tools

To our knowledge, no existing AI assistive tool operates passively. Current approaches (predictive text, AAC boards, voice assistants) are query-initiated: the user must recognize their failure, decide to seek help, navigate an interface, and formulate a request. For both anomic aphasia and Alzheimer's, this model is clinically backwards. The population most in need of retrieval support is least positioned to initiate a structured query at the moment of failure. WordBridge removes this requirement by shifting from reactive to ambient operation.

---

## 2. Hypotheses

**H1 (Passive Detection):** A continuously listening streaming model will detect circumlocutory speech onset and produce a correct Top-3 lexical candidate without explicit user prompting, at accuracy non-inferior (within 5 percentage points) to explicit-query baselines on held-out natural AphasiaBank samples.

**H2 (Temporal Advantage):** TML's full-duplex sub-utterance streaming will produce a first correct lexical candidate at a significantly earlier timepoint — measured at 500ms, 1s, and 2s windows from circumlocution onset — compared to turn-based pipeline baselines. The 500ms window is the target for delivery during active speech; 2s is the outer bound of clinical utility.

**H3 (Contextual Anchoring):** The Background Model's rolling conversational memory will produce accurate topic anchors that reduce time-to-topic-recovery in simulated Alzheimer's-profile thread-loss scenarios, as rated by blinded speech-language pathologist evaluators.

**H4 (Safety Fallback):** Physiological state-aware tiered intervention — governed by wearable distress signals — will produce lower false-positive intervention rates during high-distress states compared to a state-agnostic baseline, without measurably degrading retrieval support during normal states. Critically, this comparison includes a nested test of context-reasoned tier transitions (Background Model) against raw HR/GSR threshold triggers, to verify that the Background Model's reasoning adds value over a simpler sensor threshold.
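H1's non-inferiority criterion (within 5 percentage points of the explicit-query baseline) can be made concrete with a small calculation. The sketch below is one possible reading, not the proposal's stated analysis plan: it assumes a one-sided Wald interval on the difference of two independent proportions and an `alpha` of 0.05, and the function name `noninferior` is hypothetical. Because both systems would be scored on the same held-out samples, a paired analysis may be more appropriate in practice.

```python
from math import sqrt
from statistics import NormalDist


def noninferior(passive_hits, passive_n, explicit_hits, explicit_n,
                margin=0.05, alpha=0.05):
    """Return (difference, lower_bound, is_noninferior) for Top-3 accuracy.

    Assumption: unpaired Wald interval; margin=0.05 mirrors H1's 5 points.
    """
    p_passive = passive_hits / passive_n
    p_explicit = explicit_hits / explicit_n
    diff = p_passive - p_explicit
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_passive * (1 - p_passive) / passive_n
              + p_explicit * (1 - p_explicit) / explicit_n)
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    lower = diff - z * se
    # Non-inferior when the lower bound stays above the negative margin.
    return diff, lower, lower > -margin


# Illustrative counts only, not results.
print(noninferior(270, 300, 270, 300))
```

With a few hundred natural samples, the interval is wide enough that even a small observed deficit can fail the criterion, which is worth considering when sizing the test set.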

---

## 3. System Architecture

WordBridge is built on TML's dual-model framework — an Interaction Model for real-time perception and response, and a Background Model for asynchronous longitudinal reasoning — operating as two coordinated agents over a shared session context.

### 3.1 Interaction Model Layer

The Interaction Model continuously processes incoming audio via TML's full-duplex 200ms micro-turn stream. At each tick, the model maintains a rolling hypothesis over the user's likely intended lexical target, updated as new audio arrives. Circumlocution detection is triggered by prosodic markers (hesitation, rising intonation, trailing off), semantic paraphrasing patterns, and disfluency signatures, without requiring the user to pause or initiate a query. Upon detection, ranked Top-3 lexical candidates are delivered as brief, non-interruptive audio cues. The model continues listening and refining candidates as the description develops, exploiting TML's simultaneous speech capability to update suggestions without interrupting the user's own speech.
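The per-tick loop described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not TML's API: the `Tick` format, the `RollingHypothesis` class, the decay factor, and the trigger threshold are all hypothetical, and the disfluency and candidate scores are assumed to come from upstream model outputs.

```python
from collections import defaultdict
from dataclasses import dataclass, field

TICK_MS = 200  # micro-turn length stated in the proposal


@dataclass
class Tick:
    """Evidence extracted from one 200 ms micro-turn (hypothetical format)."""
    disfluency: float        # 0..1 score for hesitation or trailing prosody
    candidate_scores: dict   # word -> evidence from the description so far


@dataclass
class RollingHypothesis:
    trigger_threshold: float = 1.0  # assumed; would be tuned on annotated onsets
    decay: float = 0.8              # assumed per-tick decay of disfluency evidence
    evidence: float = 0.0
    scores: dict = field(default_factory=lambda: defaultdict(float))
    elapsed_ms: int = 0

    def update(self, tick):
        """Fold in one tick; return Top-3 candidates once detection triggers."""
        self.elapsed_ms += TICK_MS
        self.evidence = self.evidence * self.decay + tick.disfluency
        for word, score in tick.candidate_scores.items():
            self.scores[word] += score
        if self.evidence < self.trigger_threshold or not self.scores:
            return None  # keep listening, deliver nothing yet
        # Highest accumulated score first; ties broken alphabetically.
        ranked = sorted(self.scores, key=lambda w: (-self.scores[w], w))
        return ranked[:3]
```

Because `update` returns a fresh ranking on every tick after the trigger, candidates can keep changing as the user's description develops, which matches the refinement behavior described above.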

The Interaction Model is instantiated on TML-Interaction-Small: a 276B-parameter mixture-of-experts model with 12B active parameters, operating in 200ms micro-turns. Published benchmarks confirm a 0.40s response latency (FD-bench V1) and a 77.8 quality score on interruption and overlapping-speech handling (FD-bench V1.5). TML's internal TimeSpeak benchmark (64.7% macro-accuracy) and CueSpeak benchmark (81.7%) evaluate time-triggered and semantically-timed speech generation — the exact capability WordBridge relies on to deliver a candidate at the right moment during active speech, not just fast.

### 3.2 Background Model Layer

The Background Model operates asynchronously on session context accumulated across the full conversation. Its responsibilities:

- **Rolling conversational summary** — maintains a continuously updated session digest
- **Topic drift detection** — monitors for Alzheimer's-profile conversational repetition and thread loss
- **Contextual anchor generation** — surfaces brief reorientation cues ("you were asking about your medication schedule") to the Interaction Model for delivery at appropriate moments
- **Longitudinal patient profile** — tracks vocabulary patterns, retrieval difficulty trends, and session-over-session changes
- **Caregiver dashboard** — asynchronously generates session summaries and anomaly flags for clinical review

The Background Model functions as the memory and reasoning layer that the Interaction Model's latency constraints prevent it from maintaining directly. Neither layer alone is sufficient: the Interaction Model lacks the context horizon for anchor generation; the Background Model lacks the temporal resolution for real-time detection.
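The rolling memory, repetition monitoring, and anchor generation listed above can be illustrated with a toy data structure. Everything here is a hypothetical stand-in: in the proposed system these judgments would come from the Background Model itself, whereas this sketch uses a bounded turn buffer, a Jaccard word-overlap test with an assumed 0.7 cut-off, and a topic label supplied by the caller.

```python
from collections import deque
from dataclasses import dataclass, field


def _tokens(text):
    """Lowercased word set with basic punctuation stripped."""
    return {t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")}


def _jaccard(a, b):
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class SessionContext:
    max_turns: int = 50            # assumed size of the rolling window
    repeat_threshold: float = 0.7  # assumed overlap cut-off for "repetition"
    current_topic: str = ""
    turns: deque = field(init=False)

    def __post_init__(self):
        self.turns = deque(maxlen=self.max_turns)

    def add_turn(self, speaker, text, topic=None):
        """Store a turn; return True if the user is repeating an earlier turn."""
        repeated = speaker == "user" and any(
            s == "user" and _jaccard(text, t) >= self.repeat_threshold
            for s, t in self.turns
        )
        self.turns.append((speaker, text))
        if topic:
            self.current_topic = topic
        return repeated

    def anchor(self):
        """Brief reorientation cue, or None if no topic is known yet."""
        if not self.current_topic:
            return None
        return f"You were asking about {self.current_topic}."
```

A repeated question would set the flag, and the anchor text would then be handed to the Interaction Model for delivery at an appropriate moment.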

TML identifies "agentic intelligence" as an essential next capability for interaction models. The Background Model is exactly this: an autonomous agent making tier escalation decisions, generating contextual interventions, and coordinating caregiver alerts — all without explicit user instruction. WordBridge is a concrete, testable instantiation of that agentic layer operating in a high-stakes real-world context.

### 3.3 Alignment and Safety Framework

A passive always-on model operating with a vulnerable population introduces alignment challenges that are non-trivial to resolve and clinically consequential if mishandled. A misfire for a neurotypical user is annoying; for an Alzheimer's patient mid-confusion or an aphasia speaker mid-anxiety spiral, an incorrectly timed intervention can actively worsen the episode.

WordBridge addresses this through a physiological state-aware tiered fallback, integrating wearable heart rate and galvanic skin response (GSR) signals as distress indicators:

| Tier | State | System Behavior |
|------|-------|-----------------|
| 1 | Baseline HR/GSR | Full ambient operation: passive lexical suggestions + contextual anchors |
| 2 | Elevated HR/GSR — mild distress | Suppress lexical suggestions; deliver contextual anchors only |
| 3 | High distress + conversational collapse | Suppress all suggestions; silent caregiver alert; observation only |

The Background Model governs tier transitions — it has the longitudinal context to distinguish genuine distress from transient physiological noise (elevated HR from physical activity, excitement, caffeine), while the Interaction Model executes the delivery policy. This division of responsibility is a structural safety property: no single model has both the authority to escalate and the real-time pressure that might cause premature escalation.
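The tier table and the division of responsibility can be written down as a small policy. This is a sketch under assumptions rather than the system's actual logic: the `Signals` fields, the three arousal labels, and the rule that high arousal without conversational collapse maps to Tier 2 are all hypothetical choices the table does not specify.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    FULL = 1          # lexical suggestions + contextual anchors
    ANCHORS_ONLY = 2  # lexical suggestions suppressed
    OBSERVE = 3       # all suggestions suppressed, caregiver alerted


@dataclass(frozen=True)
class Signals:
    arousal: str                   # "baseline", "elevated", or "high" (assumed labels)
    explained_by_context: bool     # e.g. recent physical activity (hypothetical flag)
    conversational_collapse: bool  # hypothetical Background Model judgment


def decide_tier(s):
    """Background Model side: choose the tier from sensors plus context."""
    if s.arousal == "high" and s.conversational_collapse:
        return Tier.OBSERVE
    if s.arousal == "baseline" or s.explained_by_context:
        return Tier.FULL
    return Tier.ANCHORS_ONLY  # assumption: covers elevated, and high without collapse


# Interaction Model side: what may be delivered in each tier.
POLICY = {
    Tier.FULL: {"lexical": True, "anchors": True, "caregiver_alert": False},
    Tier.ANCHORS_ONLY: {"lexical": False, "anchors": True, "caregiver_alert": False},
    Tier.OBSERVE: {"lexical": False, "anchors": False, "caregiver_alert": True},
}
```

Keeping `decide_tier` and `POLICY` separate mirrors the split described above: one component decides, the other only looks up what it is allowed to deliver.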

This framework directly addresses TML's stated alignment research interest: a real-time interface where the cost of misalignment is measurable, the vulnerable population makes the stakes concrete, and the fallback hierarchy requires non-trivial coordination between two models with different time horizons. WordBridge is designed as a safety research testbed, not just a clinical tool.

---

## 4. Methods

### 4.1 Dataset Construction

**Primary corpus:** AphasiaBank ([talkbank.org](https://talkbank.org)) — open-access, IRB-cleared transcripts and audio from standardized discourse tasks with a dedicated anomic aphasia cohort. AphasiaBank provides CHAT-format transcripts with disfluency markers but does not include sub-second circumlocution onset timestamps — these will be annotated, yielding an estimated 300–500 natural labeled samples. This onset-labeled corpus is itself a contribution: no existing dataset provides circumlocution events labeled at the temporal resolution H2 requires. Annotations will be contributed back to AphasiaBank under the existing data use agreement.

**Synthetic augmentation:** Synthetic circumlocutions will be generated by prompting a base LLM constrained to anomic speech profiles (intact grammar, semantic paraphrasing, hesitation injection). Alzheimer's conversational drift will be simulated via progressive topic substitution and working memory degradation patterns drawn from published discourse analysis literature. All synthetic samples will be clearly labeled and reported separately in all experiments.

**Final dataset:** ~1,500 samples across circumlocution retrieval and conversational anchor tasks. Natural AphasiaBank samples are reserved exclusively for the test set.
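The rule that natural samples appear only in the test set is easy to enforce in code. The record layout and split function below are a hypothetical sketch: the field names, the `dev` split for synthetic data, and the hash-based assignment are assumptions, not part of AphasiaBank or the proposal.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    sample_id: str
    source: str                     # "natural" (AphasiaBank) or "synthetic"
    task: str                       # "retrieval" or "anchor"
    target: str                     # target word or reference anchor text
    onset_ms: Optional[int] = None  # annotated circumlocution onset, if any


def assign_split(sample, dev_percent=10):
    """Natural samples always go to test; synthetic ones to train or dev."""
    if sample.source == "natural":
        return "test"
    if sample.source != "synthetic":
        raise ValueError(f"unknown source: {sample.source!r}")
    # Hash the id so the assignment is stable across runs and machines.
    digest = hashlib.sha256(sample.sample_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "dev" if bucket < dev_percent else "train"
```

Keeping `source` on every record also supports the requirement that synthetic and natural results be reported separately.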

### 4.2 Experimental Design

**Baselines:**

| Baseline | What it tests |
|----------|--------------|
| Explicit-query GPT-4o | H1 accuracy ceiling — active help, full utterance, can prompt for clarification |
| Whisper + LLM turn-based pipeline | H2 latency floor — status quo, transcribes full utterance then queries |
| Semantic similarity retrieval | H1 without LLM reasoning — embedding nearest-neighbor over vocabulary |
| State-agnostic WordBridge | H4 calibration — full system with safety framework disabled |
| Threshold-based tier gating | H4 nested comparison — raw HR/GSR thresholds, no Background Model reasoning |
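The semantic similarity baseline in the table amounts to a nearest-neighbor lookup over word embeddings. The sketch below shows the idea with cosine similarity in plain Python; the three-dimensional vectors are made-up toy values, and a real baseline would use embeddings from an actual model, which the proposal does not name.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, vocab, k=3):
    """Return the k vocabulary words whose embeddings are closest to the query."""
    ranked = sorted(vocab, key=lambda w: (-cosine(query_vec, vocab[w]), w))
    return ranked[:k]


# Toy embeddings for illustration only.
VOCAB = {
    "kettle": [0.9, 0.1, 0.0],
    "teapot": [0.8, 0.3, 0.0],
    "mug": [0.6, 0.6, 0.1],
    "umbrella": [0.0, 0.1, 0.9],
}

# Stand-in for the embedded description "the thing you boil water in".
print(top_k([1.0, 0.2, 0.0], VOCAB))
```

This baseline has no notion of timing or conversational context, which is what makes it a useful contrast for H1.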

**Primary outcome measures:**
- Top-1 and Top-3 lexical retrieval accuracy on held-out natural test set
- Latency to first correct candidate at 500ms, 1s, 2s streaming windows (cumulative distribution, not just mean)
- Contextual anchor accuracy rated by blinded speech-language pathologist evaluators
- False-positive intervention rate under simulated high-distress conditions (H4), comparing threshold-based vs. context-reasoned tier transitions
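The first two primary measures reduce to simple counting once per-sample results are logged. The following is a minimal sketch with hypothetical function names and input formats: `latencies_ms` holds the time from circumlocution onset to the first correct candidate, with `None` for samples where no correct candidate was ever produced.

```python
WINDOWS_MS = (500, 1000, 2000)  # the three streaming windows named in H2


def hit_rate_by_window(latencies_ms, windows=WINDOWS_MS):
    """Cumulative share of samples with a correct candidate by each window."""
    if not latencies_ms:
        raise ValueError("no samples")
    n = len(latencies_ms)
    return {
        w: sum(1 for lat in latencies_ms if lat is not None and lat <= w) / n
        for w in windows
    }


def top_k_accuracy(predictions, targets, k):
    """Share of samples whose target appears in the first k ranked candidates."""
    if not targets:
        raise ValueError("no samples")
    hits = sum(1 for ranked, target in zip(predictions, targets) if target in ranked[:k])
    return hits / len(targets)
```

Counting misses in the denominator keeps the cumulative curve comparable between systems that fail on different numbers of samples.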

**Secondary outcome measures:**
- Circumlocution tolerance curve — accuracy as a function of elapsed time from onset, revealing whether the model improves with context or plateaus early
- Background Model summary coherence under Alzheimer's drift simulation
- Tier transition accuracy against ground-truth physiological labels
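The circumlocution tolerance curve differs from the cumulative latency measure in that it scores whatever the system is suggesting at a given moment, so accuracy can fall as well as rise. A hedged sketch follows, assuming each run is logged as a target word plus time-ordered snapshots of the ranked candidates; the log format and function name are hypothetical.

```python
def tolerance_curve(runs, checkpoints_ms, k=3):
    """Top-k accuracy of the most recent candidate list at each checkpoint.

    runs: list of (target, snapshots), where snapshots is a list of
    (elapsed_ms, ranked_candidates) pairs sorted by elapsed_ms.
    """
    if not runs:
        raise ValueError("no runs")
    curve = {}
    for checkpoint in checkpoints_ms:
        hits = 0
        for target, snapshots in runs:
            latest = None
            for elapsed, candidates in snapshots:
                if elapsed > checkpoint:
                    break
                latest = candidates  # most recent list at or before checkpoint
            if latest is not None and target in latest[:k]:
                hits += 1
        curve[checkpoint] = hits / len(runs)
    return curve
```

A curve that rises and then drops would indicate the model drifting away from a correct early guess as more description arrives, which a cumulative measure would hide.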

### 4.3 Prototype Implementation

The prototype takes continuous ambient audio input and delivers real-time lexical candidates and contextual anchors as non-interruptive audio through a nearby speaker or personal audio device. Wearable integration uses BLE (Bluetooth Low Energy) for HR/GSR ingestion. A web-based caregiver dashboard supports session review and anomaly alerts. The dataset, evaluation pipeline, model configuration, and prototype will be released as open source.
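For the heart rate side of BLE ingestion, many wearables expose the standard GATT Heart Rate Measurement characteristic (UUID 0x2A37), whose first byte is a flags field: bit 0 selects an 8-bit or 16-bit little-endian heart rate value. The parser below is a sketch that assumes the wearable uses this standard characteristic; GSR has no comparable standard characteristic and would require a vendor-specific format, which is not shown. Optional trailing fields such as RR intervals are ignored.

```python
import struct


def parse_heart_rate(payload):
    """Extract beats per minute from a Heart Rate Measurement payload."""
    if len(payload) < 2:
        raise ValueError("payload too short")
    flags = payload[0]
    if flags & 0x01:
        # Bit 0 set: heart rate is a 16-bit little-endian unsigned integer.
        if len(payload) < 3:
            raise ValueError("payload too short for 16-bit heart rate")
        return struct.unpack_from("<H", payload, 1)[0]
    # Bit 0 clear: heart rate is a single unsigned byte.
    return payload[1]


print(parse_heart_rate(bytes([0x00, 72])))
```

The BLE connection and notification handling depend on the client library chosen and are left out of this sketch.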

---

## 5. Prior Work

This proposal extends **Clarity** ([github.com/LEAF420/clarity-ai](https://github.com/LEAF420/clarity-ai)), a privacy-first on-device multimodal communication assistant for aphasia, autism, and social anxiety built with Gemma 3n. Clarity established the ethical framework, privacy architecture, and initial multimodal design. WordBridge advances on three axes: passive ambient operation replacing explicit-query interaction, TML's dual-model streaming architecture replacing turn-based inference, and a physiological state-aware safety framework replacing static confidence thresholds.

The closest architectural analog in the literature is **ProAct** (proactive embodied social agents), which demonstrates that separating the decision to initiate from the execution of initiation — precisely the Background/Interaction Model split WordBridge uses — measurably improves both decision quality and interaction naturalness versus end-to-end approaches.

---

## 6. Expected Contributions

1. A publicly released annotated circumlocution dataset derived from AphasiaBank with documented onset-timestamp labeling methodology and synthetic augmentation — contributed back to AphasiaBank
2. A quantitative benchmark for passive ambient circumlocution detection at sub-utterance latency — a task with no existing evaluation standard
3. Empirical evidence on whether TML's dual-model architecture provides measurable clinical advantage over turn-based baselines for anomic lexical retrieval and Alzheimer's conversational anchoring
4. A concrete alignment and safety framework for passive always-on models in vulnerable populations — including a tiered physiological fallback hierarchy and dual-model escalation protocol with an explicit nested comparison of context-reasoned vs. threshold-based tier transitions
5. An open-source reference implementation of passive ambient AAC, suitable for clinical research and follow-on development

---

*Research notes and interactive demo: /blog/wordbridge-intro*
