# Voice Mode Implementation Plan

## Phase 1: Clean State Machine

### Step 1: Rewrite state machine definition
- Remove all unnecessary complexity
- Clear state hierarchy
- Simple event handlers
- Proper tags on all states

### Step 2: Add test buttons to UI
- Button: "Skip to Listening" - sends START_LISTENING
- Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
- Button: "Simulate Silence" - sends SILENCE_TIMEOUT
- Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
- Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
- Display: Current state value and tags

## Phase 2: Fix Processing Logic

### Problem Analysis
Current issue: The processing effect is too complex and uses refs incorrectly.

### Solution
**Simple rule**: In processing state, check messages array:
1. If last message is NOT user with our transcript → submit
2. If last message IS user with our transcript AND second-to-last is assistant → play that assistant message
3. Otherwise → wait

**Implementation**:
```typescript
useEffect(() => {
  if (!state.hasTag('processing')) return;
  if (status !== 'ready') return;

  const transcript = state.context.transcript;
  if (!transcript) return;

  // Check last 2 messages
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];

  // Case 1: Need to submit user message
  if (!lastMsg || lastMsg.role !== 'user' || getText(lastMsg) !== transcript) {
    submitUserInput();
    return;
  }

  // Case 2: User message submitted, check for AI response
  if (secondLastMsg && secondLastMsg.role === 'assistant') {
    const aiMsg = secondLastMsg;
    // Only play if we haven't played this exact message in this session
    if (state.context.lastSpokenMessageId !== aiMsg.id) {
      const text = getText(aiMsg);
      send({ type: 'AI_RESPONSE_READY', messageId: aiMsg.id, text });
      playAudio(text, aiMsg.id);
    }
  }

  // Otherwise, still waiting for AI response
}, [messages, state, status]);
```

No refs needed! Just check the messages array directly.
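To make the three-case rule easy to unit test outside React, it could be pulled into a pure function. The sketch below is one possible shape, not the project's actual code: the `ChatMessage` interface, the `ProcessingAction` type, and the name `decideProcessingAction` are all hypothetical, and message text is read from a plain `text` field instead of the plan's `getText` helper.

```typescript
// Hypothetical minimal message shape; the real chat message type may differ.
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant';
  text: string;
}

// Hypothetical result type: what the processing effect should do next.
type ProcessingAction =
  | { kind: 'submit' }
  | { kind: 'play'; messageId: string; text: string }
  | { kind: 'wait' };

// Pure version of the three-case rule: no React, no refs, no side effects.
function decideProcessingAction(
  messages: ChatMessage[],
  transcript: string,
  lastSpokenMessageId: string | null,
): ProcessingAction {
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];

  // Case 1: the transcript is not yet the last message, so submit it.
  if (!lastMsg || lastMsg.role !== 'user' || lastMsg.text !== transcript) {
    return { kind: 'submit' };
  }

  // Case 2: play the assistant message before it, unless already spoken.
  if (
    secondLastMsg &&
    secondLastMsg.role === 'assistant' &&
    secondLastMsg.id !== lastSpokenMessageId
  ) {
    return { kind: 'play', messageId: secondLastMsg.id, text: secondLastMsg.text };
  }

  // Case 3: nothing to do yet.
  return { kind: 'wait' };
}
```

The effect would then only map the returned action onto `submitUserInput()`, `send(...)` and `playAudio(...)`. One thing a test of this function makes visible: if an assistant reply is appended after the user message, the last message is no longer a user message, so the rule as written returns `submit` again. Whether that can happen in practice depends on how `status` and the state machine gate the effect, which is worth checking during manual testing.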
## Phase 3: Clean Audio Management

### Step 1: Simplify audio cancellation
- Keep shouldCancelAudioRef
- Call stopAllAudio() when leaving canSkipAudio states
- playAudio() checks cancel flag at each await

### Step 2: Effect cleanup
- Remove submittingTranscriptRef completely
- Remove the "reset ref when leaving processing" effect
- Rely only on messages array state

## Phase 4: Testing with Playwright

### Test Script
```typescript
test('Voice mode conversation flow', async (agent) => {
  await agent.open('http://localhost:3000/chat');

  // Login first
  await agent.act('Log in with Bluesky');

  // Start voice mode
  await agent.act('Click "Start Voice Conversation"');
  await agent.check('Button shows "Generating speech..." or "Listening..."');

  // Skip initial greeting if playing
  const skipVisible = await agent.check('Skip button is visible', { optional: true });
  if (skipVisible) {
    await agent.act('Click Skip button');
  }
  await agent.check('Button shows "Listening... Start speaking"');

  // Simulate user speech
  await agent.act('Click "Simulate User Speech" test button');
  await agent.check('Button shows "Speaking..."');
  await agent.act('Click "Simulate Silence" test button');
  await agent.check('Button shows "Processing..."');

  // Wait for AI response
  await agent.wait(5000);
  await agent.check('AI message appears in chat');
  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');

  // Skip AI audio
  await agent.act('Click Skip button');
  await agent.check('Button shows "Listening... Start speaking"');

  // Second exchange
  await agent.act('Click "Simulate User Speech" test button');
  await agent.act('Click "Simulate Silence" test button');

  // Let AI audio play completely this time
  await agent.wait(10000);
  await agent.check('Button shows "Listening... Start speaking"');
});
```

## Phase 5: Validation

### Checklist
- [ ] State machine is serializable (can be visualized in Stately)
- [ ] No refs used in processing logic
- [ ] Latest message only plays once per session
- [ ] Skip works instantly in both aiGenerating and aiSpeaking
- [ ] Re-entering voice mode plays most recent AI message (if not already spoken)
- [ ] All test cases from PRD pass
- [ ] Playwright test passes

## Implementation Order
1. Add test buttons to UI (for manual testing)
2. Rewrite processing effect with simple messages array logic
3. Remove submittingTranscriptRef completely
4. Test manually with test buttons
5. Write Playwright test
6. Run and validate Playwright test
7. Clean up any remaining issues
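
As a companion to step 4 (manual testing with the test buttons), the flow those buttons drive can be written down as a plain transition table and checked without any UI. This is only a sketch under assumptions: `aiGenerating`, `aiSpeaking`, the `canSkipAudio` tag and the upper-case events `USER_STARTED_SPEAKING`, `SILENCE_TIMEOUT`, `AI_RESPONSE_READY` and `SKIP_AUDIO` come from the plan, while the state names `listening`, `userSpeaking` and `processing` and the events `AUDIO_STARTED` and `AUDIO_ENDED` are hypothetical placeholders. It does not use the real state machine library and leaves out `START_LISTENING`.

```typescript
// State names other than aiGenerating / aiSpeaking are assumptions.
type VoiceState =
  | 'listening'
  | 'userSpeaking'
  | 'processing'
  | 'aiGenerating'
  | 'aiSpeaking';

// AUDIO_STARTED and AUDIO_ENDED are hypothetical; the rest are named in the plan.
type VoiceEvent =
  | 'USER_STARTED_SPEAKING'
  | 'SILENCE_TIMEOUT'
  | 'AI_RESPONSE_READY'
  | 'SKIP_AUDIO'
  | 'AUDIO_STARTED'
  | 'AUDIO_ENDED';

// One row per state; events not listed for a state are ignored.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  listening: { USER_STARTED_SPEAKING: 'userSpeaking' },
  userSpeaking: { SILENCE_TIMEOUT: 'processing' },
  processing: { AI_RESPONSE_READY: 'aiGenerating' },
  aiGenerating: { AUDIO_STARTED: 'aiSpeaking', SKIP_AUDIO: 'listening' },
  aiSpeaking: { AUDIO_ENDED: 'listening', SKIP_AUDIO: 'listening' },
};

// Mirrors the canSkipAudio tag: skipping is only meaningful in these states.
function canSkipAudio(state: VoiceState): boolean {
  return state === 'aiGenerating' || state === 'aiSpeaking';
}

// Returns the next state, or the same state if the event does not apply.
function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

Walking the table with the same sequence as the test script (speech, silence, AI response, skip) gives a quick check that skip returns to listening from both audio states before the browser test is written.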