# Copyright (c) 2026 DysVoxa.
# Speech Card Template for Dysarthric ASR Adaptation
# 
# This template documents speaker-specific dysarthria traits, phoneme confusions,
# and prosodic features used for speaker adaptation, which is expected to yield a
# 30-60% relative WER reduction.
# Fill in sections based on baseline ASR transcription analysis and SLP consultation.

## Personal Info
Speaker ID: [SPEAKER_ID or leave for auto-generation]
Recording Date: [YYYY-MM-DD]
Dysarthria Type: ataxic
Dysarthria Severity: [mild|moderate|severe] per PVDD score or SLP rating
Age: [years]
Gender: [M|F|Other]
Disease/Etiology: spinocerebellar ataxia
Disease Duration: [years since diagnosis]
Current Devices/Aids: [e.g., "none", "cane", "wheelchair", list speech apps/AAC]

## Common Error Patterns (Phoneme Confusion Matrix)

Confusion Rate Format: "intended_phoneme -> pronounced_phoneme (error_rate%)"

### Consonants
[List top consonant confusions observed from baseline ASR analysis]
Example patterns (ataxic dysarthria typically shows):
  r -> l (70%)
  t -> d (60%)
  s -> z (50%)
  p -> b (45%)
  k -> g (40%)

### Vowels
[Vowel substitutions or lengthening patterns]
Example patterns:
  /i/ -> /uh/ (40%)
  /eh/ -> /ae/ (35%)
  Irregular stress (30%)
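
The confusion lines above are regular enough to be read mechanically. A minimal Python sketch, assuming the `->` and `(NN%)` layout shown in the examples; the names `Confusion` and `parse_confusion` are hypothetical and not part of any existing tooling. Free-text notes that do not match the format return `None`.

```python
import re
from typing import NamedTuple, Optional


class Confusion(NamedTuple):
    intended: str
    pronounced: str
    rate: float  # fraction in [0, 1]


# Matches e.g. "r -> l (70%)" or "/i/ -> /uh/ (40%)"; the slashes are optional.
_PATTERN = re.compile(
    r"^\s*/?([^\s/]+)/?\s*->\s*/?([^\s/]+)/?\s*\((\d+(?:\.\d+)?)%\)\s*$"
)


def parse_confusion(line: str) -> Optional[Confusion]:
    """Parse one confusion line; return None if the line is not in the format."""
    match = _PATTERN.match(line)
    if match is None:
        return None
    intended, pronounced, percent = match.groups()
    return Confusion(intended, pronounced, float(percent) / 100.0)
```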

### Prosody / Suprasegmentals
Articulation Rate: [syllables/sec, typical 3-5 for normal, <2 for severe dysarthria]
Pause Duration: Variable / Irregular / [specific ranges in ms]
Voice Quality: Breathy / Harsh / Diplophonic / Tremor / [describe]
Stress Pattern: Irregular / Reduced / [describe]
Pitch Variation: Reduced / Monotone / Excessive / [describe]

## Pronunciation Dictionary

Map each commonly-confused word variant to its intended form.
Format: intended_word<TAB>pronounced_variant<TAB>frequency_estimate

Examples:
red	led	frequent (3x per recording)
table	dable	occasional (1-2x)
rapid	lapid	frequent
therapy	derapy	occasional
schedule	skedule	rare
brother	brudder	frequent
water	wader	frequent
pharmacy	family	occasional (whole-word substitution)

[Add more rows as discovered through ASR baseline transcription analysis]
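
The tab-separated rows above can be loaded with the standard `csv` module. A minimal sketch, assuming the three-column layout described in the format line; the function name `load_variants` is hypothetical. Blank lines and placeholder lines without a tab are skipped.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def load_variants(tsv_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map intended_word -> [(pronounced_variant, frequency_estimate), ...]."""
    variants = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and lines that lack a variant column.
        if len(row) < 2 or not row[0].strip():
            continue
        intended, variant = row[0].strip(), row[1].strip()
        frequency = row[2].strip() if len(row) > 2 else "unknown"
        variants[intended].append((variant, frequency))
    return dict(variants)
```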

## Sample Utterances for ASR Fine-Tuning

Record these phrases at a comfortable pace. Include broad phoneme coverage.
Format: "Intended Sentence" -> [Transcribe baseline ASR output if available, or leave for later]

### Digits & Simple Phrases
0. "one two three four five" -> [baseline ASR output]
1. "red table rapid" -> [baseline ASR output]
2. "brother water therapy" -> [baseline ASR output]
3. "schedule medication reminder" -> [baseline ASR output]
4. "open kitchen light" -> [baseline ASR output]

### Commands
5. "Turn on the kitchen light" -> [baseline ASR output]
6. "Call my therapy appointment" -> [baseline ASR output]
7. "Close the front door" -> [baseline ASR output]
8. "Please send a reminder" -> [baseline ASR output]

### Conversational
9. "The weather is clear this morning" -> [baseline ASR output]
10. "I am planning a short walk after breakfast" -> [baseline ASR output]
11. "Tomorrow I will call my family" -> [baseline ASR output]
12. "Place the blue notebook on the table" -> [baseline ASR output]

### Names & Proper Nouns (personalized)
13. "My name is [SPEAKER_NAME]" -> [baseline ASR output]
14. "My doctor is Dr. [DOCTOR_NAME]" -> [baseline ASR output]
15. "I live at [ADDRESS_SNIPPET]" -> [baseline ASR output]

[Record 10-20 total phrases for optimal fine-tuning data]

## Creation Instructions

1. **Baseline Transcription**: 
   - Record 5-10 minutes of read speech using "Sample Utterances" above
   - Run through baseline ASR (e.g., OpenAI Whisper)
   - Manually correct obvious confusions

2. **Phoneme Confusion Analysis**:
   - Use Montreal Forced Aligner (MFA) or SLP expertise to identify phoneme-level errors
   - For each error, note: [intended] → [spoken variant] and frequency
   - Fill "Common Error Patterns" section

3. **Pronunciation Dictionary**:
   - For the 10-20 most frequently confused words, list the pronounced variants
   - Pair with frequency estimate ("frequent", "occasional", "rare")

4. **Prosody Notes**:
   - Measure articulation rate from recordings (audio analysis tools)
   - Note pause patterns, voice quality changes, stress irregularities

5. **Validation**:
   - Share with SLP partner for clinical validation
   - Update as dysarthria progresses (esp. for progressive diseases like ALS)
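
For step 4, articulation rate is conventionally computed over speaking time only, with pauses excluded. A minimal sketch of that arithmetic, assuming syllable counts and pause durations have already been measured by hand or with an audio analysis tool; the function name `articulation_rate` is hypothetical.

```python
def articulation_rate(
    syllable_count: int, total_seconds: float, pause_seconds: float = 0.0
) -> float:
    """Return syllables per second of actual speech (pause time excluded)."""
    speech_seconds = total_seconds - pause_seconds
    if speech_seconds <= 0:
        raise ValueError("pause time must be shorter than the total duration")
    return syllable_count / speech_seconds


# Example: 18 syllables in a 10 s recording that contains 4 s of pauses.
rate = articulation_rate(18, total_seconds=10.0, pause_seconds=4.0)
print(f"{rate:.1f} syllables/sec")
```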

## Usage in ASR Pipeline

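### For Vosk/Kaldi (Lexicon-Based):
- Export pronunciation dictionary to custom.dict format, listing each pronounced variant as an alternate pronunciation of the intended word:
   ```
   red  R EH D
   red  L EH D
   table  T EY B AH L
   table  D EY B AH L
   ```
- Load into ASR backend via config
- Whisper is an end-to-end model with no pronunciation lexicon; adapt it through fine-tuning or correction prompts instead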
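
Writing the lexicon lines can be scripted. A minimal sketch, assuming one `word  PHONES` entry per line with a two-space separator as in the example above; the function name `format_lexicon` is hypothetical, and the phoneme sequences must come from a lexicon or an SLP rather than from this code.

```python
from typing import Iterable, Sequence, Tuple


def format_lexicon(entries: Iterable[Tuple[str, Sequence[str]]]) -> str:
    """Render (word, phonemes) pairs as lexicon lines, dropping exact duplicates."""
    lines = []
    seen = set()
    for word, phones in entries:
        line = f"{word.lower()}  {' '.join(p.upper() for p in phones)}"
        if line not in seen:  # keep the first occurrence, preserve order
            seen.add(line)
            lines.append(line)
    return "\n".join(lines) + "\n"


print(format_lexicon([("red", ["R", "EH", "D"]), ("led", ["L", "EH", "D"])]), end="")
```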

### For Fine-Tuning (Hugging Face Transformers / open-source Whisper):
- Pair Sample Utterances audio with corrected reference transcripts
- Train speaker-adapted model for 1-5 epochs
- Expected improvement: 30-60% relative WER reduction
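
To check the improvement on a given speaker, WER can be computed before and after adaptation. A minimal sketch using word-level edit distance, assuming whitespace tokenization and case-insensitive matching; the function names `wer` and `relative_wer_reduction` are hypothetical, and a production setup would likely normalize punctuation as well.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

For example, going from a baseline WER of 0.40 to an adapted WER of 0.20 is a 50% relative reduction.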

### For Correction Prompts:
- Reference "Common Error Patterns" in correction system prompt
- Example: "Speaker tends to produce r as l and t as d; resolve ambiguous words toward the intended meaning"
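
The prompt text can be generated from the recorded confusion rates. A minimal sketch, assuming confusions are available as `(intended, pronounced, rate)` tuples with rates in [0, 1]; the function name `build_correction_prompt` and the 0.4 cutoff are hypothetical choices, not part of the template.

```python
from typing import Iterable, Tuple


def build_correction_prompt(
    confusions: Iterable[Tuple[str, str, float]], min_rate: float = 0.4
) -> str:
    """Summarize frequent confusions as one instruction sentence."""
    # Keep confusions at or above the cutoff, most frequent first.
    frequent = sorted(
        (c for c in confusions if c[2] >= min_rate),
        key=lambda c: c[2],
        reverse=True,
    )
    if not frequent:
        return "No frequent phoneme confusions recorded for this speaker."
    pairs = ", ".join(f"{intended} as {pronounced}" for intended, pronounced, _ in frequent)
    return (
        f"Speaker often produces {pairs}; "
        "resolve ambiguous words toward the intended meaning."
    )
```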

## Review Schedule
- Initial creation: After first 10-20 min baseline recording
- Review: Every 3-6 months (or as dysarthria progresses)
- Update: When new error patterns emerge or severity changes
