# Voice Mode Implementation Plan
## Phase 1: Clean State Machine
### Step 1: Rewrite state machine definition
- Remove all unnecessary complexity
- Clear state hierarchy
- Simple event handlers
- Proper tags on all states (see the sketch below)
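As a rough sketch of the shape Step 1 is aiming for, assuming XState v5: the five events and the `processing`/`canSkipAudio` tags come from this plan, while the state names, the `AUDIO_STARTED`/`AUDIO_ENDED` events, and the payload shapes are illustrative assumptions.

```typescript
import { assign, createMachine } from 'xstate';

// Events named in this plan, plus assumed AUDIO_STARTED/AUDIO_ENDED
// completion events; payload shapes are assumptions.
type VoiceEvent =
  | { type: 'START_LISTENING' }
  | { type: 'USER_STARTED_SPEAKING' }
  | { type: 'SILENCE_TIMEOUT'; transcript: string }
  | { type: 'AI_RESPONSE_READY'; messageId: string; text: string }
  | { type: 'SKIP_AUDIO' }
  | { type: 'AUDIO_STARTED' }
  | { type: 'AUDIO_ENDED' };

export const voiceMachine = createMachine({
  id: 'voice',
  types: {} as {
    context: { transcript: string; lastSpokenMessageId: string | null };
    events: VoiceEvent;
  },
  context: { transcript: '', lastSpokenMessageId: null },
  initial: 'idle',
  states: {
    idle: {
      on: { START_LISTENING: 'listening' },
    },
    listening: {
      tags: ['listening'],
      on: { USER_STARTED_SPEAKING: 'userSpeaking' },
    },
    userSpeaking: {
      on: {
        // Capture the transcript as we leave the speaking state
        SILENCE_TIMEOUT: {
          target: 'processing',
          actions: assign({ transcript: ({ event }) => event.transcript }),
        },
      },
    },
    processing: {
      tags: ['processing'],
      on: {
        // Remember the message id so it is only spoken once per session
        AI_RESPONSE_READY: {
          target: 'aiGenerating',
          actions: assign({ lastSpokenMessageId: ({ event }) => event.messageId }),
        },
      },
    },
    aiGenerating: {
      tags: ['canSkipAudio'],
      on: { AUDIO_STARTED: 'aiSpeaking', SKIP_AUDIO: 'listening' },
    },
    aiSpeaking: {
      tags: ['canSkipAudio'],
      on: { AUDIO_ENDED: 'listening', SKIP_AUDIO: 'listening' },
    },
  },
});
```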
### Step 2: Add test buttons to UI
- Button: "Skip to Listening" - sends START_LISTENING
- Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
- Button: "Simulate Silence" - sends SILENCE_TIMEOUT
- Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
- Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
- Display: Current state value and tags (see the sketch below)
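A minimal sketch of how those buttons could wire into the machine; the component name, the `useMachine` wiring, and the test payloads are illustrative, and the event shapes assume the sketch above.

```tsx
import { useMachine } from '@xstate/react';
import { voiceMachine } from './voiceMachine';

// Dev-only panel: each button fires one machine event, and the <pre>
// block displays the current state value and tags for manual checks.
export function VoiceTestPanel() {
  const [state, send] = useMachine(voiceMachine);
  return (
    <div>
      <button onClick={() => send({ type: 'START_LISTENING' })}>Skip to Listening</button>
      <button onClick={() => send({ type: 'USER_STARTED_SPEAKING' })}>Simulate User Speech</button>
      <button onClick={() => send({ type: 'SILENCE_TIMEOUT', transcript: 'test utterance' })}>
        Simulate Silence
      </button>
      <button
        onClick={() =>
          send({ type: 'AI_RESPONSE_READY', messageId: 'test-msg-1', text: 'Test response' })
        }
      >
        Simulate AI Response
      </button>
      <button onClick={() => send({ type: 'SKIP_AUDIO' })}>Skip Audio</button>
      <pre>{JSON.stringify({ state: state.value, tags: [...state.tags] })}</pre>
    </div>
  );
}
```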
## Phase 2: Fix Processing Logic
### Problem Analysis
Current issue: The processing effect is too complex and uses refs incorrectly.
### Solution
|
|
**Simple rule**: In the processing state, check the messages array:

1. If the thread does not yet end with our transcript (or an assistant reply to it) → submit
2. If the last message is an assistant reply and the second-to-last is our user transcript → play that assistant message
3. Otherwise (our transcript is the last message) → wait for the AI response
**Implementation**:
```typescript
useEffect(() => {
  if (!state.hasTag('processing')) return;
  if (status !== 'ready') return;

  const transcript = state.context.transcript;
  if (!transcript) return;

  // Check the last 2 messages
  const lastMsg = messages[messages.length - 1];
  const secondLastMsg = messages[messages.length - 2];

  const lastIsOurTranscript =
    lastMsg?.role === 'user' && getText(lastMsg) === transcript;
  const lastIsReplyToOurTranscript =
    lastMsg?.role === 'assistant' &&
    secondLastMsg?.role === 'user' &&
    getText(secondLastMsg) === transcript;

  // Case 1: Transcript is not in the thread yet -> submit the user message
  if (!lastIsOurTranscript && !lastIsReplyToOurTranscript) {
    submitUserInput();
    return;
  }

  // Case 2: User message submitted and the AI reply has arrived
  if (lastIsReplyToOurTranscript) {
    const aiMsg = lastMsg;

    // Only play if we haven't played this exact message in this session
    if (state.context.lastSpokenMessageId !== aiMsg.id) {
      const text = getText(aiMsg);
      send({ type: 'AI_RESPONSE_READY', messageId: aiMsg.id, text });
      playAudio(text, aiMsg.id);
    }
    return;
  }

  // Otherwise: our transcript is the last message, still waiting for the AI response
}, [messages, state, status]);
```
No refs needed! Just check the messages array directly.
## Phase 3: Clean Audio Management
### Step 1: Simplify audio cancellation
- Keep shouldCancelAudioRef
- Call stopAllAudio() when leaving canSkipAudio states
- playAudio() checks the cancel flag at each await (see the sketch below)
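A sketch of that cancellation pattern; `synthesizeSpeech`, `playBuffer`, and the `AUDIO_ENDED` completion event are assumed stand-ins for the real TTS and playback helpers.

```typescript
declare function synthesizeSpeech(text: string): Promise<ArrayBuffer>;
declare function playBuffer(audio: ArrayBuffer): Promise<void>;
declare function send(event: { type: 'AUDIO_ENDED'; messageId: string }): void;

const shouldCancelAudioRef = { current: false };

// Called when leaving canSkipAudio states: flips the flag so any
// in-flight playAudio() call bails out at its next checkpoint.
function stopAllAudio() {
  shouldCancelAudioRef.current = true;
}

async function playAudio(text: string, messageId: string) {
  shouldCancelAudioRef.current = false;

  const audio = await synthesizeSpeech(text); // await #1: TTS generation
  if (shouldCancelAudioRef.current) return;

  await playBuffer(audio); // await #2: playback
  if (shouldCancelAudioRef.current) return;

  send({ type: 'AUDIO_ENDED', messageId }); // assumed completion event
}
```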
### Step 2: Effect cleanup
- Remove submittingTranscriptRef completely
- Remove the "reset ref when leaving processing" effect
- Rely only on the messages array state
## Phase 4: Testing with Playwright
### Test Script
```typescript
test('Voice mode conversation flow', async (agent) => {
  await agent.open('http://localhost:3000/chat');

  // Login first
  await agent.act('Log in with Bluesky');

  // Start voice mode
  await agent.act('Click "Start Voice Conversation"');
  await agent.check('Button shows "Generating speech..." or "Listening..."');

  // Skip initial greeting if playing
  const skipVisible = await agent.check('Skip button is visible', { optional: true });
  if (skipVisible) {
    await agent.act('Click Skip button');
  }
  await agent.check('Button shows "Listening... Start speaking"');

  // Simulate user speech
  await agent.act('Click "Simulate User Speech" test button');
  await agent.check('Button shows "Speaking..."');

  await agent.act('Click "Simulate Silence" test button');
  await agent.check('Button shows "Processing..."');

  // Wait for AI response
  await agent.wait(5000);
  await agent.check('AI message appears in chat');
  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');

  // Skip AI audio
  await agent.act('Click Skip button');
  await agent.check('Button shows "Listening... Start speaking"');

  // Second exchange
  await agent.act('Click "Simulate User Speech" test button');
  await agent.act('Click "Simulate Silence" test button');

  // Let AI audio play completely this time
  await agent.wait(10000);
  await agent.check('Button shows "Listening... Start speaking"');
});
```
## Phase 5: Validation
### Checklist
- [ ] State machine is serializable (can be visualized in Stately; see the sketch below)
- [ ] No refs used in processing logic
- [ ] Latest message only plays once per session
- [ ] Skip works instantly in both aiGenerating and aiSpeaking
- [ ] Re-entering voice mode plays the most recent AI message (if not already spoken)
- [ ] All test cases from the PRD pass
- [ ] Playwright test passes
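On the serializability item: the config stays plain data (and Stately-renderable) when implementations are referenced by name rather than inlined. A sketch using XState v5 `setup()`; the machine and action names here are illustrative.

```typescript
import { setup } from 'xstate';

export const skippableAudioMachine = setup({
  actions: {
    // Implementations live here; the config below refers to them by
    // name, so the config itself stays serializable data.
    stopAllAudio: () => {
      /* stop any in-flight TTS or playback */
    },
  },
}).createMachine({
  id: 'skippableAudio',
  initial: 'aiSpeaking',
  states: {
    aiSpeaking: {
      tags: ['canSkipAudio'],
      exit: 'stopAllAudio', // string reference, not an inline closure
      on: { SKIP_AUDIO: 'listening' },
    },
    listening: {},
  },
});
```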
## Implementation Order
1. Add test buttons to UI (for manual testing)
2. Rewrite the processing effect with simple messages-array logic
3. Remove submittingTranscriptRef completely
4. Test manually with the test buttons
5. Write the Playwright test
6. Run and validate the Playwright test
7. Clean up any remaining issues