# Copyright (c) 2026 DysVoxa.
# Speech Card Template for Dysarthric ASR Adaptation
#
# This template documents speaker-specific dysarthria traits, phoneme confusions,
# and prosodic features used for speaker adaptation, with an expected 30-60%
# relative WER reduction.
# Fill in sections based on baseline ASR transcription analysis and SLP consultation.

## Personal Info

Speaker ID: [SPEAKER_ID or leave for auto-generation]
Recording Date: [YYYY-MM-DD]
Dysarthria Type: ataxic
Dysarthria Severity: [mild|moderate|severe] per PVDD score or SLP rating
Age: [years]
Gender: [M|F|Other]
Disease/Etiology: spinocerebellar ataxia
Disease Duration: [years since diagnosis]
Current Devices/Aids: [e.g., "none", "cane", "wheelchair", list speech apps/AAC]

## Common Error Patterns (Phoneme Confusion Matrix)

Confusion Rate Format: "intended_phoneme -> pronounced_phoneme (error_rate%)"

### Consonants

[List top consonant confusions observed from baseline ASR analysis]
Example patterns (ataxic dysarthria typically shows):
r -> l (70%)
t -> d (60%)
s -> z (50%)
p -> b (45%)
k -> g (40%)

### Vowels

[Vowel substitutions or lengthening patterns]
Example patterns:
/i/ -> /uh/ (40%)
/eh/ -> /ae/ (35%)
Irregular stress (30%)

### Prosody / Suprasegmentals

Articulation Rate: [syllables/sec; typically 3-5 for unimpaired speech, <2 for severe dysarthria]
Pause Duration: Variable / Irregular / [specific ranges in ms]
Voice Quality: Breathy / Harsh / Diplophonic / Tremor / [describe]
Stress Pattern: Irregular / Reduced / [describe]
Pitch Variation: Reduced / Monotone / Excessive / [describe]

## Pronunciation Dictionary

Map each commonly confused word variant to its intended form.
Format: intended_word  pronounced_variant  frequency_estimate

Examples:
red        led        frequent (3x per recording)
table      dable      occasional (1-2x)
rapid      lapid      frequent
therapy    derapy     occasional
schedule   skedule    rare
brother    brudder    frequent
water      wader      frequent
pharmacy   family     occasional (semantic confusion)

[Add more rows as discovered through ASR baseline transcription analysis]

## Sample Utterances for ASR Fine-Tuning

Record these phrases at a comfortable pace. Include broad phoneme coverage.
Format: "Intended Sentence" -> [Transcribe baseline ASR output if available, or leave for later]

### Digits & Simple Phrases

0. "one two three four five" -> [baseline ASR output]
1. "red table rapid" -> [baseline ASR output]
2. "brother water therapy" -> [baseline ASR output]
3. "schedule medication reminder" -> [baseline ASR output]
4. "open kitchen light" -> [baseline ASR output]

### Commands

5. "Turn on the kitchen light" -> [baseline ASR output]
6. "Call my therapy appointment" -> [baseline ASR output]
7. "Close the front door" -> [baseline ASR output]
8. "Please send a reminder" -> [baseline ASR output]

### Conversational

9. "The weather is clear this morning" -> [baseline ASR output]
10. "I am planning a short walk after breakfast" -> [baseline ASR output]
11. "Tomorrow I will call my family" -> [baseline ASR output]
12. "Place the blue notebook on the table" -> [baseline ASR output]

### Names & Proper Nouns (personalized)

13. "My name is [SPEAKER_NAME]" -> [baseline ASR output]
14. "My doctor is Dr. [DOCTOR_NAME]" -> [baseline ASR output]
15. "I live at [ADDRESS_SNIPPET]" -> [baseline ASR output]

[Record 10-20 total phrases for optimal fine-tuning data]

## Creation Instructions

1. **Baseline Transcription**:
   - Record 5-10 minutes of read speech using "Sample Utterances" above
   - Run through baseline ASR (e.g., OpenAI Whisper)
   - Manually correct obvious confusions

2. **Phoneme Confusion Analysis**:
   - Use Montreal Forced Aligner (MFA) or SLP expertise to identify phoneme-level errors
   - For each error, note: [intended] → [spoken variant] and frequency
   - Fill in the "Common Error Patterns" section

3. **Pronunciation Dictionary**:
   - For the top 10-20 confusion words, create pronunciation variants
   - Pair each with a frequency estimate ("frequent", "occasional", "rare")

4. **Prosody Notes**:
   - Measure articulation rate from recordings (audio analysis tools)
   - Note pause patterns, voice quality changes, stress irregularities

5. **Validation**:
   - Share with SLP partner for clinical validation
   - Update as dysarthria progresses (esp. for progressive diseases like ALS)

## Usage in ASR Pipeline

### For Whisper/Vosk (Lexicon-Based):

- Export pronunciation dictionary to custom.dict format:
  ```
  red R EH D
  led L EH D
  dable D EY B AH L
  ```
- Load into ASR backend via config

### For Fine-Tuning (HuggingFace/OpenAI):

- Pair Sample Utterances audio with corrected reference transcripts
- Train speaker-adapted model for 1-5 epochs
- Expected improvement: 30-60% relative WER reduction

### For Correction Prompts:

- Reference "Common Error Patterns" in correction system prompt
- Example: "Speaker tends to confuse r<->l and t<->d; resolve ambiguous cases toward actual intended meaning"

## Review Schedule

- Initial creation: After first 10-20 min baseline recording
- Review: Every 3-6 months (or as dysarthria progresses)
- Update: When new error patterns emerge or severity changes
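
## Example: From Error Patterns to a Correction Prompt (sketch)

The "For Correction Prompts" usage above can be illustrated with a small script that reads lines in the card's confusion format ("intended_phoneme -> pronounced_phoneme (error_rate%)") and assembles a prompt sentence like the one quoted in that section. This is only a minimal sketch, not part of any DysVoxa tooling: the names `Confusion`, `parse_confusions`, `build_correction_prompt`, and the `min_rate_pct` threshold are hypothetical and chosen for illustration, and the parser assumes one pattern per line.

```python
import re
from typing import List, NamedTuple


class Confusion(NamedTuple):
    """One 'intended -> pronounced (N%)' entry from the card (hypothetical type)."""
    intended: str
    pronounced: str
    rate_pct: float


# Matches lines such as "r -> l (70%)" or "/i/ -> /uh/ (40%)".
# Optional slashes around phonemes are stripped; lines without "->" are ignored.
_LINE = re.compile(
    r"^\s*/?([^\s/]+)/?\s*->\s*/?([^\s/]+)/?\s*\((\d+(?:\.\d+)?)%\)\s*$"
)


def parse_confusions(card_text: str) -> List[Confusion]:
    """Collect every confusion line found in the given card text."""
    found = []
    for line in card_text.splitlines():
        m = _LINE.match(line)
        if m:
            found.append(Confusion(m.group(1), m.group(2), float(m.group(3))))
    return found


def build_correction_prompt(confusions: List[Confusion],
                            min_rate_pct: float = 50.0) -> str:
    """Build a prompt sentence from confusions at or above the threshold.

    The threshold is an assumption for this sketch; the card itself does not
    say which error rates are worth mentioning to the correction model.
    """
    frequent = [c for c in confusions if c.rate_pct >= min_rate_pct]
    if not frequent:
        return ""
    pairs = " and ".join(f"{c.intended}<->{c.pronounced}" for c in frequent)
    return (f"Speaker tends to confuse {pairs}; "
            "resolve ambiguous cases toward actual intended meaning")


if __name__ == "__main__":
    example_card = """
    r -> l (70%)
    t -> d (60%)
    s -> z (50%)
    /i/ -> /uh/ (40%)
    Irregular stress (30%)
    """
    patterns = parse_confusions(example_card)
    print(build_correction_prompt(patterns, min_rate_pct=60))
```

Entries without an arrow, such as "Irregular stress (30%)", are skipped on purpose because they describe a prosodic trait, not a phoneme substitution. Raising or lowering `min_rate_pct` controls how many confusions reach the prompt; a short prompt listing only the dominant patterns is likely easier for a correction model to apply, though the right cutoff would need to be checked per speaker.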