Voice Mode Implementation Plan
Phase 1: Clean State Machine
Step 1: Rewrite state machine definition
- Remove all unnecessary complexity
- Clear state hierarchy
- Simple event handlers
- Proper tags on all states (see the machine sketch after this list)
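A minimal sketch of what this machine could look like, assuming XState v5. Only the `processing` and `canSkipAudio` tags, the `aiGenerating`/`aiSpeaking` states, and the events named later in this plan come from the plan itself; the remaining state names (`idle`, `listening`, `userSpeaking`) and the `AUDIO_STARTED`/`AUDIO_FINISHED` events are illustrative, not confirmed names from the codebase:

```typescript
import { assign, createMachine } from 'xstate';

export const voiceMachine = createMachine({
  id: 'voice',
  types: {} as {
    context: { transcript: string; lastSpokenMessageId: string | null };
    events:
      | { type: 'START_LISTENING' }
      | { type: 'USER_STARTED_SPEAKING' }
      | { type: 'SILENCE_TIMEOUT' }
      | { type: 'AI_RESPONSE_READY'; messageId: string; text: string }
      | { type: 'AUDIO_STARTED' }
      | { type: 'AUDIO_FINISHED' }
      | { type: 'SKIP_AUDIO' };
  },
  context: { transcript: '', lastSpokenMessageId: null },
  initial: 'idle',
  states: {
    idle: {
      tags: ['idle'],
      on: { START_LISTENING: 'listening' },
    },
    listening: {
      tags: ['listening'],
      on: { USER_STARTED_SPEAKING: 'userSpeaking' },
    },
    userSpeaking: {
      tags: ['listening'],
      // transcript is written to context by the speech-recognition layer (not shown here)
      on: { SILENCE_TIMEOUT: 'processing' },
    },
    processing: {
      tags: ['processing'],
      on: {
        AI_RESPONSE_READY: {
          target: 'aiGenerating',
          // Remember which message we are about to speak, so it only plays once
          actions: assign({
            lastSpokenMessageId: ({ event }) => event.messageId,
          }),
        },
      },
    },
    aiGenerating: {
      tags: ['canSkipAudio'],
      on: {
        AUDIO_STARTED: 'aiSpeaking',
        SKIP_AUDIO: 'listening',
      },
    },
    aiSpeaking: {
      tags: ['canSkipAudio'],
      on: {
        AUDIO_FINISHED: 'listening',
        SKIP_AUDIO: 'listening',
      },
    },
  },
});
```

Because the machine uses only plain states, tags, and `assign`, it stays serializable and can be pasted into Stately for visualization (see the Phase 5 checklist).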
Step 2: Add test buttons to UI
- Button: "Skip to Listening" - sends START_LISTENING
- Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
- Button: "Simulate Silence" - sends SILENCE_TIMEOUT
- Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
- Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
- Display: Current state value and tags (see the component sketch after this list)
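A minimal sketch of these test controls, assuming the machine above and that the parent component passes down the running actor's snapshot and `send` function; component and prop names are illustrative:

```tsx
import type { EventFrom, SnapshotFrom } from 'xstate';
import type { voiceMachine } from './voiceMachine';

type Props = {
  state: SnapshotFrom<typeof voiceMachine>;
  send: (event: EventFrom<typeof voiceMachine>) => void;
};

export function VoiceTestControls({ state, send }: Props) {
  return (
    <div>
      <button onClick={() => send({ type: 'START_LISTENING' })}>Skip to Listening</button>
      <button onClick={() => send({ type: 'USER_STARTED_SPEAKING' })}>Simulate User Speech</button>
      <button onClick={() => send({ type: 'SILENCE_TIMEOUT' })}>Simulate Silence</button>
      <button
        onClick={() =>
          send({ type: 'AI_RESPONSE_READY', messageId: 'test-message', text: 'This is a test response.' })
        }
      >
        Simulate AI Response
      </button>
      <button onClick={() => send({ type: 'SKIP_AUDIO' })}>Skip Audio</button>

      {/* Current state value and tags, for debugging */}
      <pre>{JSON.stringify({ value: state.value, tags: [...state.tags] }, null, 2)}</pre>
    </div>
  );
}
```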
Phase 2: Fix Processing Logic
Problem Analysis
Current issue: The processing effect is too complex and uses refs incorrectly.
Solution
Simple rule: while in the processing state, check the messages array:
- If the last message is NOT a user message with our transcript → submit
- If the last message IS a user message with our transcript AND the second-to-last is an assistant message → play that assistant message
- Otherwise → wait
Implementation:
```typescript
useEffect(() => {
  if (!state.hasTag('processing')) return;
  if (status !== 'ready') return;

  const transcript = state.context.transcript;
  if (!transcript) return;

  // Check the last two messages
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];

  // Case 1: Need to submit the user message
  if (!lastMsg || lastMsg.role !== 'user' || getText(lastMsg) !== transcript) {
    submitUserInput();
    return;
  }

  // Case 2: User message submitted, check for an AI response
  if (secondLastMsg && secondLastMsg.role === 'assistant') {
    const aiMsg = secondLastMsg;
    // Only play if we haven't played this exact message in this session
    if (state.context.lastSpokenMessageId !== aiMsg.id) {
      const text = getText(aiMsg);
      send({ type: 'AI_RESPONSE_READY', messageId: aiMsg.id, text });
      playAudio(text, aiMsg.id);
    }
  }

  // Otherwise, still waiting for the AI response
}, [messages, state, status]);
```
No refs needed! Just check the messages array directly.
Phase 3: Clean Audio Management
Step 1: Simplify audio cancellation
- Keep shouldCancelAudioRef
- Call stopAllAudio() when leaving canSkipAudio states
- playAudio() checks the cancel flag at each await (see the hook sketch after this list)
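A minimal sketch of this cancellation pattern as a hook fragment; `synthesizeSpeech`, `useVoiceAudio`, and the `AUDIO_FINISHED` event are assumptions standing in for whatever the app actually uses, not confirmed names:

```typescript
import { useEffect, useRef } from 'react';

// Stand-in for the app's actual TTS call (assumption).
declare function synthesizeSpeech(text: string): Promise<string>;

type VoiceSnapshot = { hasTag: (tag: string) => boolean };
type SendFn = (event: { type: 'AUDIO_FINISHED' }) => void;

export function useVoiceAudio(state: VoiceSnapshot, send: SendFn) {
  const shouldCancelAudioRef = useRef(false);
  const currentAudioRef = useRef<HTMLAudioElement | null>(null);

  function stopAllAudio() {
    shouldCancelAudioRef.current = true;
    currentAudioRef.current?.pause();
    currentAudioRef.current = null;
  }

  // messageId kept to match the Phase 2 call signature playAudio(text, aiMsg.id)
  async function playAudio(text: string, messageId: string) {
    shouldCancelAudioRef.current = false;

    // Await #1: generate or fetch the speech audio, then re-check the cancel flag.
    const audioUrl = await synthesizeSpeech(text);
    if (shouldCancelAudioRef.current) return;

    // Await #2: play it to completion, then re-check the cancel flag.
    const audio = new Audio(audioUrl);
    currentAudioRef.current = audio;
    await new Promise<void>((resolve) => {
      audio.onended = () => resolve();
      audio.onerror = () => resolve();
      void audio.play();
    });
    if (shouldCancelAudioRef.current) return;

    // Tell the machine playback finished (AUDIO_FINISHED is an assumed event name).
    send({ type: 'AUDIO_FINISHED' });
  }

  // Stop playback the moment the machine leaves any canSkipAudio state.
  useEffect(() => {
    if (!state.hasTag('canSkipAudio')) {
      stopAllAudio();
    }
  }, [state]);

  return { playAudio, stopAllAudio };
}
```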
Step 2: Effect cleanup
- Remove submittingTranscriptRef completely
- Remove the "reset ref when leaving processing" effect
- Rely only on messages array state
Phase 4: Testing with Playwright
Test Script
```typescript
test('Voice mode conversation flow', async (agent) => {
  await agent.open('http://localhost:3000/chat');

  // Login first
  await agent.act('Log in with Bluesky');

  // Start voice mode
  await agent.act('Click "Start Voice Conversation"');
  await agent.check('Button shows "Generating speech..." or "Listening..."');

  // Skip the initial greeting if it is playing
  const skipVisible = await agent.check('Skip button is visible', { optional: true });
  if (skipVisible) {
    await agent.act('Click Skip button');
  }
  await agent.check('Button shows "Listening... Start speaking"');

  // Simulate user speech
  await agent.act('Click "Simulate User Speech" test button');
  await agent.check('Button shows "Speaking..."');
  await agent.act('Click "Simulate Silence" test button');
  await agent.check('Button shows "Processing..."');

  // Wait for the AI response
  await agent.wait(5000);
  await agent.check('AI message appears in chat');
  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');

  // Skip the AI audio
  await agent.act('Click Skip button');
  await agent.check('Button shows "Listening... Start speaking"');

  // Second exchange
  await agent.act('Click "Simulate User Speech" test button');
  await agent.act('Click "Simulate Silence" test button');

  // Let the AI audio play completely this time
  await agent.wait(10000);
  await agent.check('Button shows "Listening... Start speaking"');
});
```
Phase 5: Validation
Checklist
- State machine is serializable (can be visualized in Stately)
- No refs used in processing logic
- Latest message only plays once per session
- Skip works instantly in both aiGenerating and aiSpeaking
- Re-entering voice mode plays most recent AI message (if not already spoken)
- All test cases from PRD pass
- Playwright test passes
Implementation Order
1. Add test buttons to the UI (for manual testing)
2. Rewrite the processing effect with the simple messages-array logic
3. Remove submittingTranscriptRef completely
4. Test manually with the test buttons
5. Write the Playwright test
6. Run and validate the Playwright test
7. Clean up any remaining issues