# Voice Mode Implementation Plan
## Phase 1: Clean State Machine
### Step 1: Rewrite state machine definition
- Remove all unnecessary complexity
- Clear state hierarchy
- Simple event handlers
- Proper tags on all states (see the sketch below)
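A minimal sketch of the target shape (XState v5 assumed; the `idle`/`listening`/`userSpeaking` state names and the `AUDIO_*` events are illustrative, while `processing`, `canSkipAudio`, `aiGenerating`, and `aiSpeaking` come from this plan):
```typescript
import { setup, assign } from 'xstate';

export const voiceMachine = setup({
  types: {
    context: {} as {
      transcript: string | null;
      lastSpokenMessageId: string | null;
    },
    events: {} as
      | { type: 'START_LISTENING' }
      | { type: 'USER_STARTED_SPEAKING' }
      | { type: 'SILENCE_TIMEOUT' }
      | { type: 'AI_RESPONSE_READY'; messageId: string; text: string }
      | { type: 'AUDIO_STARTED' }
      | { type: 'AUDIO_FINISHED' }
      | { type: 'SKIP_AUDIO' },
  },
}).createMachine({
  id: 'voiceMode',
  initial: 'idle',
  // transcript is assumed to be assigned by the speech-recognition
  // integration; that wiring is omitted from this sketch.
  context: { transcript: null, lastSpokenMessageId: null },
  states: {
    idle: { on: { START_LISTENING: 'listening' } },
    listening: {
      tags: ['listening'],
      on: { USER_STARTED_SPEAKING: 'userSpeaking' },
    },
    userSpeaking: {
      tags: ['listening'],
      on: { SILENCE_TIMEOUT: 'processing' },
    },
    processing: {
      tags: ['processing'],
      on: {
        AI_RESPONSE_READY: {
          target: 'aiGenerating',
          // Remember what we spoke so it only plays once per session
          actions: assign({
            lastSpokenMessageId: ({ event }) => event.messageId,
          }),
        },
      },
    },
    aiGenerating: {
      tags: ['canSkipAudio'],
      on: { AUDIO_STARTED: 'aiSpeaking', SKIP_AUDIO: 'listening' },
    },
    aiSpeaking: {
      tags: ['canSkipAudio'],
      on: { AUDIO_FINISHED: 'listening', SKIP_AUDIO: 'listening' },
    },
  },
});
```
No actions or guards with inline closures beyond `assign`, so the machine stays serializable and can be pasted into Stately for visualization (Phase 5 checklist).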
### Step 2: Add test buttons to UI
- Button: "Skip to Listening" - sends START_LISTENING
- Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
- Button: "Simulate Silence" - sends SILENCE_TIMEOUT
- Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
- Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
- Display: Current state value and tags (see the example panel below)
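A rough sketch of the panel (component name and prop types are illustrative; it assumes the `state`/`send` pair returned by `@xstate/react`'s `useMachine`):
```tsx
// Dev-only debug panel; drop it into the voice mode component.
// Prop types are simplified for the sketch.
function VoiceDebugPanel({
  state,
  send,
}: {
  state: { value: unknown; tags: Set<string> };
  send: (event: { type: string; [key: string]: unknown }) => void;
}) {
  return (
    <div>
      <button onClick={() => send({ type: 'START_LISTENING' })}>Skip to Listening</button>
      <button onClick={() => send({ type: 'USER_STARTED_SPEAKING' })}>Simulate User Speech</button>
      <button onClick={() => send({ type: 'SILENCE_TIMEOUT' })}>Simulate Silence</button>
      <button
        onClick={() =>
          send({ type: 'AI_RESPONSE_READY', messageId: 'test-1', text: 'Test response' })
        }
      >
        Simulate AI Response
      </button>
      <button onClick={() => send({ type: 'SKIP_AUDIO' })}>Skip Audio</button>
      {/* Current state value and tags, per the display requirement above */}
      <pre>{JSON.stringify({ state: state.value, tags: [...state.tags] }, null, 2)}</pre>
    </div>
  );
}
```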
## Phase 2: Fix Processing Logic
### Problem Analysis
Current issue: The processing effect is too complex and uses refs incorrectly.
### Solution
**Simple rule**: In the processing state, check the messages array (chronological, newest last):
1. If our transcript has not been submitted as a user message yet → submit
2. If the last message is an assistant reply and the message before it is our user transcript → play that assistant message
3. Otherwise (user message submitted, reply still pending) → wait
**Implementation**:
```typescript
useEffect(() => {
  if (!state.hasTag('processing')) return;
  if (status !== 'ready') return;

  const transcript = state.context.transcript;
  if (!transcript) return;

  // Check the last two messages
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];

  // Case 2 first: the assistant has replied to our transcript.
  // Checking this before Case 1 avoids resubmitting once the reply arrives.
  if (
    lastMsg?.role === 'assistant' &&
    secondLastMsg?.role === 'user' &&
    getText(secondLastMsg) === transcript
  ) {
    // Only play if we haven't played this exact message in this session
    if (state.context.lastSpokenMessageId !== lastMsg.id) {
      const text = getText(lastMsg);
      send({ type: 'AI_RESPONSE_READY', messageId: lastMsg.id, text });
      playAudio(text, lastMsg.id);
    }
    return;
  }

  // Case 1: our transcript hasn't been submitted as a user message yet
  if (!lastMsg || lastMsg.role !== 'user' || getText(lastMsg) !== transcript) {
    submitUserInput();
    return;
  }

  // Otherwise: user message submitted, still waiting for the AI response
}, [messages, state, status]);
```
No refs needed! Just check the messages array directly.
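The effect assumes a `getText` helper that flattens a message to plain text. A minimal version, assuming the AI SDK's parts-based `UIMessage` shape (adjust if the app stores content differently):
```typescript
// Assumed helper: joins a message's text parts into one string.
function getText(msg: { parts?: Array<{ type: string; text?: string }> }): string {
  return (msg.parts ?? [])
    .filter((part) => part.type === 'text' && typeof part.text === 'string')
    .map((part) => part.text)
    .join('');
}
```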
## Phase 3: Clean Audio Management
### Step 1: Simplify audio cancellation
- Keep shouldCancelAudioRef
- Call stopAllAudio() when leaving canSkipAudio states
- playAudio() checks the cancel flag after each await (see the sketch below)
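A sketch of the pattern inside the voice component (`fetchTtsAudio` and the `AUDIO_*` events are assumptions for illustration, not existing app code):
```typescript
// Inside the voice component (useRef/useEffect from React).
const shouldCancelAudioRef = useRef(false);
const currentAudioRef = useRef<HTMLAudioElement | null>(null);

function stopAllAudio() {
  shouldCancelAudioRef.current = true;
  currentAudioRef.current?.pause();
  currentAudioRef.current = null;
}

// messageId matches the Phase 2 call site; unused in this sketch.
async function playAudio(text: string, messageId: string) {
  shouldCancelAudioRef.current = false;
  const url = await fetchTtsAudio(text);     // assumed TTS helper
  if (shouldCancelAudioRef.current) return;  // re-check after every await
  const audio = new Audio(url);
  currentAudioRef.current = audio;
  await audio.play();                        // resolves once playback starts
  if (shouldCancelAudioRef.current) return;
  send({ type: 'AUDIO_STARTED' });           // e.g. aiGenerating -> aiSpeaking
  await new Promise<void>((resolve) => {
    audio.onended = () => resolve();
    audio.onpause = () => resolve();         // stopAllAudio() pauses, so this unblocks
  });
  if (!shouldCancelAudioRef.current) {
    send({ type: 'AUDIO_FINISHED' });        // back to listening
  }
}

// Cancel playback whenever the machine leaves a canSkipAudio state.
useEffect(() => {
  if (!state.hasTag('canSkipAudio')) {
    stopAllAudio();
  }
}, [state]);
```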
### Step 2: Effect cleanup
- Remove submittingTranscriptRef completely
- Remove the "reset ref when leaving processing" effect
- Rely only on messages array state
## Phase 4: Testing with Playwright
### Test Script
```typescript
// `agent` is assumed to be a custom Playwright fixture exposing the
// open/act/check/wait helpers used below (e.g. an AI-driven browser agent).
test('Voice mode conversation flow', async ({ agent }) => {
  await agent.open('http://localhost:3000/chat');

  // Login first
  await agent.act('Log in with Bluesky');

  // Start voice mode
  await agent.act('Click "Start Voice Conversation"');
  await agent.check('Button shows "Generating speech..." or "Listening..."');

  // Skip initial greeting if playing
  const skipVisible = await agent.check('Skip button is visible', { optional: true });
  if (skipVisible) {
    await agent.act('Click Skip button');
  }
  await agent.check('Button shows "Listening... Start speaking"');

  // Simulate user speech
  await agent.act('Click "Simulate User Speech" test button');
  await agent.check('Button shows "Speaking..."');
  await agent.act('Click "Simulate Silence" test button');
  await agent.check('Button shows "Processing..."');

  // Wait for AI response
  await agent.wait(5000);
  await agent.check('AI message appears in chat');
  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');

  // Skip AI audio
  await agent.act('Click Skip button');
  await agent.check('Button shows "Listening... Start speaking"');

  // Second exchange
  await agent.act('Click "Simulate User Speech" test button');
  await agent.act('Click "Simulate Silence" test button');

  // Let AI audio play completely this time
  await agent.wait(10000);
  await agent.check('Button shows "Listening... Start speaking"');
});
```
## Phase 5: Validation
### Checklist
- [ ] State machine is serializable (can be visualized in Stately)
- [ ] No refs used in processing logic
- [ ] Latest message only plays once per session
- [ ] Skip works instantly in both aiGenerating and aiSpeaking
- [ ] Re-entering voice mode plays most recent AI message (if not already spoken)
- [ ] All test cases from PRD pass
- [ ] Playwright test passes
## Implementation Order
1. Add test buttons to UI (for manual testing)
2. Rewrite processing effect with simple messages array logic
3. Remove submittingTranscriptRef completely
4. Test manually with test buttons
5. Write Playwright test
6. Run and validate Playwright test
7. Clean up any remaining issues