Voice Mode PRD
User Flows
Flow 1: Starting Voice Conversation (No Previous Messages)
- User clicks "Start Voice Conversation" button
- System enters listening mode
- Button shows "Listening... Start speaking"
- Microphone indicator appears
Flow 2: Starting Voice Conversation (With Previous AI Message)
- User clicks "Start Voice Conversation" button
- System checks for most recent AI message
- If found and not already spoken in this session:
  - System generates and plays TTS for that message
  - Button shows "Generating speech..." then "AI is speaking..."
  - Skip button appears
- After audio finishes OR user clicks skip:
  - System enters listening mode
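The check in Flow 2 can be sketched as a small pure function. This is a minimal illustration, not the actual implementation: the `ChatMessage` shape, the `spokenIds` set, and the name `pickMessageToSpeak` are all hypothetical, and it assumes "most recent AI message" means the last assistant message in the conversation.

```typescript
// Hypothetical message shape; the PRD does not define one.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

// Returns the most recent assistant message if it has not yet been spoken
// in this session. Returns null when there is nothing to play, in which
// case the caller would go straight to listening mode.
function pickMessageToSpeak(
  messages: readonly ChatMessage[],
  spokenIds: ReadonlySet<string>
): ChatMessage | null {
  // Search from the end so older assistant messages are never considered.
  const latest = [...messages].reverse().find((m) => m.role === "assistant");
  if (latest === undefined || spokenIds.has(latest.id)) {
    return null;
  }
  return latest;
}
```

Because only the latest assistant message is ever a candidate, an older unspoken message is never replayed, which matches the "Latest Message Only" rule.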
Flow 3: User Speaks
- User speaks (while in listening state)
- System detects speech, button shows "Speaking..."
- System receives interim transcripts (updates display)
- System receives finalized phrases (appends to transcript)
- After each finalized phrase, 3-second silence timer starts
- Button shows countdown: "Speaking... (auto-submits in 2.1s)"
- If user continues speaking, timer resets
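The silence timer and countdown label in Flow 3 can be sketched as follows. This is an illustrative sketch under assumptions: the class name `SilenceCountdown` and its methods are hypothetical, timestamps are passed in explicitly rather than read from a real clock, and the timer restarts only on finalized phrases (the PRD does not say whether interim transcripts also reset it).

```typescript
// Silence window from Flow 3 (3 seconds).
const SILENCE_TIMEOUT_MS = 3000;

// Hypothetical tracker. All timestamps are milliseconds from any
// monotonic clock supplied by the caller.
class SilenceCountdown {
  private lastPhraseAt: number | null = null;

  // Call on every finalized phrase; this starts or restarts the window.
  onFinalizedPhrase(now: number): void {
    this.lastPhraseAt = now;
  }

  // Milliseconds left before auto-submit, or null if no phrase
  // has been finalized yet.
  remainingMs(now: number): number | null {
    if (this.lastPhraseAt === null) return null;
    return Math.max(0, SILENCE_TIMEOUT_MS - (now - this.lastPhraseAt));
  }

  // True once the full window has elapsed with no new finalized phrase.
  shouldSubmit(now: number): boolean {
    return this.remainingMs(now) === 0;
  }

  // Button text in the format shown in Flow 3.
  label(now: number): string {
    const remaining = this.remainingMs(now);
    if (remaining === null) return "Speaking...";
    return `Speaking... (auto-submits in ${(remaining / 1000).toFixed(1)}s)`;
  }
}
```

Passing `now` as an argument keeps the logic independent of real timers, so the countdown can be checked without waiting three seconds.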
Flow 4: Submit and AI Response
- After 3 seconds of silence, transcript is submitted
- Button shows "Processing..."
- User message appears in chat
- AI streams response (appears in chat)
- When streaming completes:
  - System generates TTS for AI response
  - Button shows "Generating speech..."
- When TTS is ready, system plays the audio:
  - Button shows "AI is speaking..."
  - Skip button appears
- After audio finishes OR user clicks skip:
  - System returns to listening mode
Flow 5: Skipping AI Audio
- While AI is generating or speaking (button shows "Generating speech..." or "AI is speaking..."), the Skip button is visible
- User clicks Skip
- Audio stops immediately
- System enters listening mode
- Button shows "Listening... Start speaking"
Flow 6: Exiting Voice Mode
- User clicks voice button (at any time)
- System stops all audio
- System closes microphone connection
- Returns to text mode
- Button shows "Start Voice Conversation"
Critical Rules
- Latest Message Only: AI ONLY plays the most recent assistant message. Never re-play old messages.
- Skip Always Works: Skip button must IMMEDIATELY stop audio and return to listening.
- One Message Per Turn: Each user speech -> one submission -> one AI response -> one audio playback.
- Clean State: Every state transition should cancel any incompatible ongoing operations.
State Machine
text
└─ TOGGLE_VOICE_MODE → voice.idle
voice.idle
├─ Check for latest AI message not yet spoken
│ ├─ If found → Send AI_RESPONSE_READY → voice.aiGenerating
│ └─ If not found → Send START_LISTENING → voice.listening
└─ TOGGLE_VOICE_MODE → text
voice.listening
├─ USER_STARTED_SPEAKING → voice.userSpeaking
├─ TRANSCRIPT_UPDATE → (update context.input for display)
└─ TOGGLE_VOICE_MODE → text
voice.userSpeaking
├─ FINALIZED_PHRASE → voice.timingOut (starts 3s timer)
├─ TRANSCRIPT_UPDATE → (update context.input for display)
└─ TOGGLE_VOICE_MODE → text
voice.timingOut
├─ FINALIZED_PHRASE → voice.timingOut (restart 3s timer)
├─ TRANSCRIPT_UPDATE → (update context.input for display)
├─ SILENCE_TIMEOUT → voice.processing
└─ TOGGLE_VOICE_MODE → text
voice.processing
├─ (Effect: submit if not submitted, wait for AI response)
├─ When AI response ready → Send AI_RESPONSE_READY → voice.aiGenerating
└─ TOGGLE_VOICE_MODE → text
voice.aiGenerating
├─ TTS_PLAYING → voice.aiSpeaking
├─ SKIP_AUDIO → voice.listening
└─ TOGGLE_VOICE_MODE → text
voice.aiSpeaking
├─ TTS_FINISHED → voice.listening
├─ SKIP_AUDIO → voice.listening
└─ TOGGLE_VOICE_MODE → text
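The state machine above can be sketched as a plain transition table. The state and event names are taken from the diagram; everything else is an assumption made for illustration: the `transitions` table, the `transition` function, and the choice to ignore events a state does not list. `TRANSCRIPT_UPDATE` is left out because it only updates context and never changes state.

```typescript
type VoiceState =
  | "text"
  | "voice.idle"
  | "voice.listening"
  | "voice.userSpeaking"
  | "voice.timingOut"
  | "voice.processing"
  | "voice.aiGenerating"
  | "voice.aiSpeaking";

type VoiceEvent =
  | "TOGGLE_VOICE_MODE"
  | "START_LISTENING"
  | "AI_RESPONSE_READY"
  | "USER_STARTED_SPEAKING"
  | "FINALIZED_PHRASE"
  | "SILENCE_TIMEOUT"
  | "TTS_PLAYING"
  | "TTS_FINISHED"
  | "SKIP_AUDIO";

// One row per state; each row lists only the events that state handles.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  "text": { TOGGLE_VOICE_MODE: "voice.idle" },
  "voice.idle": {
    AI_RESPONSE_READY: "voice.aiGenerating",
    START_LISTENING: "voice.listening",
    TOGGLE_VOICE_MODE: "text",
  },
  "voice.listening": {
    USER_STARTED_SPEAKING: "voice.userSpeaking",
    TOGGLE_VOICE_MODE: "text",
  },
  "voice.userSpeaking": {
    FINALIZED_PHRASE: "voice.timingOut",
    TOGGLE_VOICE_MODE: "text",
  },
  "voice.timingOut": {
    FINALIZED_PHRASE: "voice.timingOut", // re-entry restarts the 3s timer
    SILENCE_TIMEOUT: "voice.processing",
    TOGGLE_VOICE_MODE: "text",
  },
  "voice.processing": {
    AI_RESPONSE_READY: "voice.aiGenerating",
    TOGGLE_VOICE_MODE: "text",
  },
  "voice.aiGenerating": {
    TTS_PLAYING: "voice.aiSpeaking",
    SKIP_AUDIO: "voice.listening",
    TOGGLE_VOICE_MODE: "text",
  },
  "voice.aiSpeaking": {
    TTS_FINISHED: "voice.listening",
    SKIP_AUDIO: "voice.listening",
    TOGGLE_VOICE_MODE: "text",
  },
};

// Assumption: an event that the current state does not list is ignored.
function transition(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

For example, `transition("voice.aiGenerating", "SKIP_AUDIO")` yields `"voice.listening"`, and every voice state maps `TOGGLE_VOICE_MODE` back to `"text"`, which mirrors Flow 6.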
Test Cases
Test 1: Basic Conversation
- Click "Start Voice Conversation"
- Skip initial greeting
- Say "Hello"
- Wait for AI response
- Let AI audio play completely
- Say "How are you?"
- Skip AI audio
- Say "Goodbye"
Expected: 3 exchanges, AI only plays latest message each time
Test 2: Multiple Skips
- Start voice mode
- Skip greeting immediately
- Say "Test one"
- Skip AI response immediately
- Say "Test two"
- Skip AI response immediately
Expected: All skips work instantly, no audio bleeding
Test 3: Re-entering Voice Mode
- Start voice mode
- Say "Hello"
- Let AI respond
- Exit voice mode (click button again)
- Re-enter voice mode
Expected: AI reads the most recent message (its last response)
Test 4: Long Speech
- Start voice mode
- Skip greeting
- Say a long sentence with multiple pauses, each shorter than 3 seconds
- Wait for final 3s timeout
Expected: All speech is captured in one transcript