
Voice Mode Implementation Plan

Phase 1: Clean State Machine

Step 1: Rewrite state machine definition

  • Remove all unnecessary complexity
  • Clear state hierarchy
  • Simple event handlers
  • Proper tags on all states (see the sketch below)
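
A minimal sketch of what this could look like, assuming XState v5. The state, tag, and event names follow this plan; AUDIO_STARTED and AUDIO_FINISHED are assumed names for the text-to-speech lifecycle, and the action that writes the recognized transcript into context is elided:

import { setup, assign } from 'xstate';

export const voiceMachine = setup({
  types: {
    context: {} as { transcript: string | null; lastSpokenMessageId: string | null },
    events: {} as
      | { type: 'START_LISTENING' }
      | { type: 'USER_STARTED_SPEAKING' }
      | { type: 'SILENCE_TIMEOUT' }
      | { type: 'AI_RESPONSE_READY'; messageId: string; text: string }
      | { type: 'AUDIO_STARTED' }
      | { type: 'AUDIO_FINISHED' }
      | { type: 'SKIP_AUDIO' },
  },
}).createMachine({
  id: 'voice',
  context: { transcript: null, lastSpokenMessageId: null },
  initial: 'idle',
  states: {
    idle: { on: { START_LISTENING: 'listening' } },
    listening: { tags: 'listening', on: { USER_STARTED_SPEAKING: 'userSpeaking' } },
    userSpeaking: { tags: 'userSpeaking', on: { SILENCE_TIMEOUT: 'processing' } },
    processing: {
      tags: 'processing',
      on: {
        AI_RESPONSE_READY: {
          target: 'aiGenerating',
          // Record the message as spoken so it only plays once per session
          actions: assign({ lastSpokenMessageId: ({ event }) => event.messageId }),
        },
      },
    },
    aiGenerating: {
      tags: 'canSkipAudio',
      on: { AUDIO_STARTED: 'aiSpeaking', SKIP_AUDIO: 'listening' },
    },
    aiSpeaking: {
      tags: 'canSkipAudio',
      on: { AUDIO_FINISHED: 'listening', SKIP_AUDIO: 'listening' },
    },
  },
});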

Step 2: Add test buttons to UI

  • Button: "Skip to Listening" - sends START_LISTENING
  • Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
  • Button: "Simulate Silence" - sends SILENCE_TIMEOUT
  • Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
  • Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
  • Display: Current state value and tags (sketched below)
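
A sketch of these dev-only controls, assuming the machine above and @xstate/react. Plain JSX is used here, so swap in the app's own button components; useMachine also spawns a fresh actor, so in the real component these should send to the existing voice machine instance instead:

import { useMachine } from '@xstate/react';
import { voiceMachine } from './voiceMachine'; // hypothetical module path

function VoiceModeDevControls() {
  const [state, send] = useMachine(voiceMachine);

  return (
    <div>
      <button onClick={() => send({ type: 'START_LISTENING' })}>Skip to Listening</button>
      <button onClick={() => send({ type: 'USER_STARTED_SPEAKING' })}>Simulate User Speech</button>
      <button onClick={() => send({ type: 'SILENCE_TIMEOUT' })}>Simulate Silence</button>
      <button onClick={() => send({ type: 'AI_RESPONSE_READY', messageId: 'test-1', text: 'Test response' })}>
        Simulate AI Response
      </button>
      <button onClick={() => send({ type: 'SKIP_AUDIO' })}>Skip Audio</button>
      {/* Display the current state value and tags */}
      <pre>{JSON.stringify(state.value)} | tags: {[...state.tags].join(', ')}</pre>
    </div>
  );
}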

Phase 2: Fix Processing Logic

Problem Analysis

Current issue: the processing effect is too complex and tracks submission state through refs, which can fall out of sync with the messages array.

Solution

Simple rule: In the processing state, check the messages array (the assistant reply is appended after our user message, so it is the last message once it arrives):

  1. If our transcript has not been appended as a user message yet → submit
  2. If our user message has been appended and an assistant message follows it → play that assistant message
  3. Otherwise (our user message is still the last one) → wait for the AI response
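
For concreteness, the three message-array shapes this can see and the rule each one triggers:

  • [..., assistant(greeting)] → rule 1: submit the transcript
  • [..., user(transcript)] → rule 3: wait for the AI response
  • [..., user(transcript), assistant(reply)] → rule 2: play the reply (if not already spoken)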

Implementation:

useEffect(() => {
  if (!state.hasTag('processing')) return;
  if (status !== 'ready') return;

  const transcript = state.context.transcript;
  if (!transcript) return;

  // Check the last 2 messages
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];
  const isOurUserMsg = (msg) => msg?.role === 'user' && getText(msg) === transcript;

  // Case 1: our transcript hasn't been appended as a user message yet
  if (!isOurUserMsg(lastMsg) && !isOurUserMsg(secondLastMsg)) {
    submitUserInput();
    return;
  }

  // Case 2: transcript submitted and the assistant has replied after it
  if (lastMsg.role === 'assistant' && isOurUserMsg(secondLastMsg)) {
    // Only play if we haven't played this exact message in this session
    if (state.context.lastSpokenMessageId !== lastMsg.id) {
      const text = getText(lastMsg);
      send({ type: 'AI_RESPONSE_READY', messageId: lastMsg.id, text });
      playAudio(text, lastMsg.id);
    }
  }
  // Case 3: our user message is still the last one, so keep waiting for the AI
}, [messages, state, status, send, submitUserInput, playAudio]);

No refs needed! Just check the messages array directly.

Phase 3: Clean Audio Management

Step 1: Simplify audio cancellation

  • Keep shouldCancelAudioRef
  • Call stopAllAudio() when leaving canSkipAudio states
  • playAudio() checks the cancel flag after each await (see the sketch below)
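
A sketch of that check-after-each-await pattern; generateSpeech and playAudioElement are assumed helper names for the TTS request and HTMLAudioElement playback, not existing APIs:

async function playAudio(text: string, messageId: string) {
  shouldCancelAudioRef.current = false;

  const audioUrl = await generateSpeech(text); // assumed TTS helper
  if (shouldCancelAudioRef.current) return; // skipped while speech was generating

  await playAudioElement(audioUrl); // assumed helper; resolves when playback ends
  if (shouldCancelAudioRef.current) return; // skipped mid-playback

  send({ type: 'AUDIO_FINISHED' }); // messageId kept in the signature for logging
}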

Step 2: Effect cleanup

  • Remove submittingTranscriptRef completely
  • Remove the "reset ref when leaving processing" effect
  • Rely only on messages array state

Phase 4: Testing with Playwright

Test Script

// Agent-style test sketch: the agent.* helpers assume a natural-language
// testing harness layered over Playwright, not the raw Playwright API.
test('Voice mode conversation flow', async (agent) => {
  await agent.open('http://localhost:3000/chat');

  // Login first
  await agent.act('Log in with Bluesky');

  // Start voice mode
  await agent.act('Click "Start Voice Conversation"');
  await agent.check('Button shows "Generating speech..." or "Listening..."');

  // Skip initial greeting if playing
  const skipVisible = await agent.check('Skip button is visible', { optional: true });
  if (skipVisible) {
    await agent.act('Click Skip button');
  }
  await agent.check('Button shows "Listening... Start speaking"');

  // Simulate user speech
  await agent.act('Click "Simulate User Speech" test button');
  await agent.check('Button shows "Speaking..."');

  await agent.act('Click "Simulate Silence" test button');
  await agent.check('Button shows "Processing..."');

  // Wait for AI response
  await agent.wait(5000);
  await agent.check('AI message appears in chat');
  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');

  // Skip AI audio
  await agent.act('Click Skip button');
  await agent.check('Button shows "Listening... Start speaking"');

  // Second exchange
  await agent.act('Click "Simulate User Speech" test button');
  await agent.act('Click "Simulate Silence" test button');

  // Let AI audio play completely this time
  await agent.wait(10000);
  await agent.check('Button shows "Listening... Start speaking"');
});

Phase 5: Validation

Checklist

  • State machine is serializable (can be visualized in Stately)
  • No refs used in processing logic
  • Latest message only plays once per session
  • Skip works instantly in both aiGenerating and aiSpeaking
  • Re-entering voice mode plays most recent AI message (if not already spoken)
  • All test cases from PRD pass
  • Playwright test passes

Implementation Order

  1. Add test buttons to UI (for manual testing)
  2. Rewrite processing effect with simple messages array logic
  3. Remove submittingTranscriptRef completely
  4. Test manually with test buttons
  5. Write Playwright test
  6. Run and validate Playwright test
  7. Clean up any remaining issues