# Voice Mode Implementation Plan
## Phase 1: Clean State Machine
### Step 1: Rewrite state machine definition
- Remove all unnecessary complexity
- Clear state hierarchy
- Simple event handlers
- Proper tags on all states (see the sketch below)
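A minimal sketch of the target shape (XState v5 assumed; the `idle`/`listening`/`userSpeaking` state names and the `AUDIO_*` events are illustrative, while `processing`, `canSkipAudio`, `aiGenerating`, and `aiSpeaking` come from this plan):
```typescript
import { setup, assign } from 'xstate';

export const voiceMachine = setup({
  types: {
    context: {} as {
      transcript: string | null;
      lastSpokenMessageId: string | null;
    },
    events: {} as
      | { type: 'START_LISTENING' }
      | { type: 'USER_STARTED_SPEAKING' }
      | { type: 'SILENCE_TIMEOUT' }
      | { type: 'AI_RESPONSE_READY'; messageId: string; text: string }
      | { type: 'AUDIO_STARTED' }
      | { type: 'AUDIO_FINISHED' }
      | { type: 'SKIP_AUDIO' },
  },
}).createMachine({
  id: 'voiceMode',
  initial: 'idle',
  // transcript is assumed to be assigned by the speech-recognition
  // integration; that wiring is omitted from this sketch.
  context: { transcript: null, lastSpokenMessageId: null },
  states: {
    idle: { on: { START_LISTENING: 'listening' } },
    listening: {
      tags: ['listening'],
      on: { USER_STARTED_SPEAKING: 'userSpeaking' },
    },
    userSpeaking: {
      tags: ['listening'],
      on: { SILENCE_TIMEOUT: 'processing' },
    },
    processing: {
      tags: ['processing'],
      on: {
        AI_RESPONSE_READY: {
          target: 'aiGenerating',
          // Remember what we spoke so it only plays once per session
          actions: assign({
            lastSpokenMessageId: ({ event }) => event.messageId,
          }),
        },
      },
    },
    aiGenerating: {
      tags: ['canSkipAudio'],
      on: { AUDIO_STARTED: 'aiSpeaking', SKIP_AUDIO: 'listening' },
    },
    aiSpeaking: {
      tags: ['canSkipAudio'],
      on: { AUDIO_FINISHED: 'listening', SKIP_AUDIO: 'listening' },
    },
  },
});
```
No actions or guards with inline closures beyond `assign`, so the machine stays serializable and can be pasted into Stately for visualization (Phase 5 checklist).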
### Step 2: Add test buttons to UI
- Button: "Skip to Listening" - sends START_LISTENING
- Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
- Button: "Simulate Silence" - sends SILENCE_TIMEOUT
- Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
- Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
- Display: Current state value and tags (see the example panel below)
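A rough sketch of the panel (component name and prop types are illustrative; it assumes the `state`/`send` pair returned by `@xstate/react`'s `useMachine`):
```tsx
// Dev-only debug panel; drop it into the voice mode component.
// Prop types are simplified for the sketch.
function VoiceDebugPanel({
  state,
  send,
}: {
  state: { value: unknown; tags: Set<string> };
  send: (event: { type: string; [key: string]: unknown }) => void;
}) {
  return (
    <div>
      <button onClick={() => send({ type: 'START_LISTENING' })}>Skip to Listening</button>
      <button onClick={() => send({ type: 'USER_STARTED_SPEAKING' })}>Simulate User Speech</button>
      <button onClick={() => send({ type: 'SILENCE_TIMEOUT' })}>Simulate Silence</button>
      <button
        onClick={() =>
          send({ type: 'AI_RESPONSE_READY', messageId: 'test-1', text: 'Test response' })
        }
      >
        Simulate AI Response
      </button>
      <button onClick={() => send({ type: 'SKIP_AUDIO' })}>Skip Audio</button>
      {/* Current state value and tags, per the display requirement above */}
      <pre>{JSON.stringify({ state: state.value, tags: [...state.tags] }, null, 2)}</pre>
    </div>
  );
}
```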
## Phase 2: Fix Processing Logic
### Problem Analysis
Current issue: The processing effect is too complex and uses refs incorrectly.
### Solution
**Simple rule**: In the processing state, check the messages array (chronological, newest last):
1. If our transcript has not been submitted as a user message yet → submit
2. If the last message is an assistant reply and the message before it is our user transcript → play that assistant message
3. Otherwise (user message submitted, reply still pending) → wait
**Implementation**:
```typescript
useEffect(() => {
  if (!state.hasTag('processing')) return;
  if (status !== 'ready') return;

  const transcript = state.context.transcript;
  if (!transcript) return;

  // Check the last two messages
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];

  // Case 2 first: the assistant has replied to our transcript.
  // Checking this before Case 1 avoids resubmitting once the reply arrives.
  if (
    lastMsg?.role === 'assistant' &&
    secondLastMsg?.role === 'user' &&
    getText(secondLastMsg) === transcript
  ) {
    // Only play if we haven't played this exact message in this session
    if (state.context.lastSpokenMessageId !== lastMsg.id) {
      const text = getText(lastMsg);
      send({ type: 'AI_RESPONSE_READY', messageId: lastMsg.id, text });
      playAudio(text, lastMsg.id);
    }
    return;
  }

  // Case 1: our transcript hasn't been submitted as a user message yet
  if (!lastMsg || lastMsg.role !== 'user' || getText(lastMsg) !== transcript) {
    submitUserInput();
    return;
  }

  // Otherwise: user message submitted, still waiting for the AI response
}, [messages, state, status]);
```
No refs needed! Just check the messages array directly.
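The effect assumes a `getText` helper that flattens a message to plain text. A minimal version, assuming the AI SDK's parts-based `UIMessage` shape (adjust if the app stores content differently):
```typescript
// Assumed helper: joins a message's text parts into one string.
function getText(msg: { parts?: Array<{ type: string; text?: string }> }): string {
  return (msg.parts ?? [])
    .filter((part) => part.type === 'text' && typeof part.text === 'string')
    .map((part) => part.text)
    .join('');
}
```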
## Phase 3: Clean Audio Management
### Step 1: Simplify audio cancellation
- Keep shouldCancelAudioRef
- Call stopAllAudio() when leaving canSkipAudio states
- playAudio() checks the cancel flag after each await (see the sketch below)
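A sketch of the pattern inside the voice component (`fetchTtsAudio` and the `AUDIO_*` events are assumptions for illustration, not existing app code):
```typescript
// Inside the voice component (useRef/useEffect from React).
const shouldCancelAudioRef = useRef(false);
const currentAudioRef = useRef<HTMLAudioElement | null>(null);

function stopAllAudio() {
  shouldCancelAudioRef.current = true;
  currentAudioRef.current?.pause();
  currentAudioRef.current = null;
}

// messageId matches the Phase 2 call site; unused in this sketch.
async function playAudio(text: string, messageId: string) {
  shouldCancelAudioRef.current = false;
  const url = await fetchTtsAudio(text);     // assumed TTS helper
  if (shouldCancelAudioRef.current) return;  // re-check after every await
  const audio = new Audio(url);
  currentAudioRef.current = audio;
  await audio.play();                        // resolves once playback starts
  if (shouldCancelAudioRef.current) return;
  send({ type: 'AUDIO_STARTED' });           // e.g. aiGenerating -> aiSpeaking
  await new Promise<void>((resolve) => {
    audio.onended = () => resolve();
    audio.onpause = () => resolve();         // stopAllAudio() pauses, so this unblocks
  });
  if (!shouldCancelAudioRef.current) {
    send({ type: 'AUDIO_FINISHED' });        // back to listening
  }
}

// Cancel playback whenever the machine leaves a canSkipAudio state.
useEffect(() => {
  if (!state.hasTag('canSkipAudio')) {
    stopAllAudio();
  }
}, [state]);
```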
### Step 2: Effect cleanup
- Remove submittingTranscriptRef completely
- Remove the "reset ref when leaving processing" effect
- Rely only on messages array state
## Phase 4: Testing with Playwright
### Test Script
```typescript
// `agent` is assumed to be a custom Playwright fixture exposing the
// open/act/check/wait helpers used below (e.g. an AI-driven browser agent).
test('Voice mode conversation flow', async ({ agent }) => {
  await agent.open('http://localhost:3000/chat');

  // Login first
  await agent.act('Log in with Bluesky');

  // Start voice mode
  await agent.act('Click "Start Voice Conversation"');
  await agent.check('Button shows "Generating speech..." or "Listening..."');

  // Skip initial greeting if playing
  const skipVisible = await agent.check('Skip button is visible', { optional: true });
  if (skipVisible) {
    await agent.act('Click Skip button');
  }
  await agent.check('Button shows "Listening... Start speaking"');

  // Simulate user speech
  await agent.act('Click "Simulate User Speech" test button');
  await agent.check('Button shows "Speaking..."');
  await agent.act('Click "Simulate Silence" test button');
  await agent.check('Button shows "Processing..."');

  // Wait for AI response
  await agent.wait(5000);
  await agent.check('AI message appears in chat');
  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');

  // Skip AI audio
  await agent.act('Click Skip button');
  await agent.check('Button shows "Listening... Start speaking"');

  // Second exchange
  await agent.act('Click "Simulate User Speech" test button');
  await agent.act('Click "Simulate Silence" test button');

  // Let AI audio play completely this time
  await agent.wait(10000);
  await agent.check('Button shows "Listening... Start speaking"');
});
```
## Phase 5: Validation
### Checklist
- [ ] State machine is serializable (can be visualized in Stately)
- [ ] No refs used in processing logic
- [ ] Latest message only plays once per session
- [ ] Skip works instantly in both aiGenerating and aiSpeaking
- [ ] Re-entering voice mode plays most recent AI message (if not already spoken)
- [ ] All test cases from PRD pass
- [ ] Playwright test passes
## Implementation Order
1. Add test buttons to UI (for manual testing)
2. Rewrite processing effect with simple messages array logic
3. Remove submittingTranscriptRef completely
4. Test manually with test buttons
5. Write Playwright test
6. Run and validate Playwright test
7. Clean up any remaining issues