feat: Improve UI layout and navigation

- Increase logo size (48x48 desktop, 56x56 mobile) for better visibility
- Add logo as favicon
- Add logo to mobile header
- Move user menu to navigation bars (sidebar on desktop, bottom bar on mobile)
- Fix desktop chat layout - container structure prevents voice controls cutoff
- Fix mobile bottom bar - use icon-only ActionIcons instead of truncated text buttons
- Hide Create Node/New Conversation buttons on mobile to save header space
- Make fixed header and voice controls work properly with containers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 14:43:11 +00:00
parent 0b632a31eb
commit f0284ef813
74 changed files with 6996 additions and 629 deletions
--- a/docs/voice-mode-implementation-plan.md
+++ b/docs/voice-mode-implementation-plan.md
@@ -0,0 +1,144 @@
+# Voice Mode Implementation Plan
+
+## Phase 1: Clean State Machine
+
+### Step 1: Rewrite state machine definition
+- Remove all unnecessary complexity
+- Clear state hierarchy
+- Simple event handlers
+- Proper tags on all states
+
+### Step 2: Add test buttons to UI
+- Button: "Skip to Listening" - sends START_LISTENING
+- Button: "Simulate User Speech" - sends USER_STARTED_SPEAKING
+- Button: "Simulate Silence" - sends SILENCE_TIMEOUT
+- Button: "Simulate AI Response" - sends AI_RESPONSE_READY with test data
+- Button: "Skip Audio" - sends SKIP_AUDIO (already exists)
+- Display: Current state value and tags
+
+## Phase 2: Fix Processing Logic
+
+### Problem Analysis
+Current issue: The processing effect is too complex and uses refs incorrectly.
+
+### Solution
+**Simple rule**: In processing state, check the last two entries of the messages array:
+1. If the last message is an assistant reply to the user message with our transcript, and it has not been spoken yet → play that assistant message
+2. If the last message IS the user message with our transcript → wait
+3. Otherwise → submit
+
+**Implementation**:
+```typescript
+useEffect(() => {
+  if (!state.hasTag('processing')) return;
+  if (status !== 'ready') return;
+
+  const transcript = state.context.transcript;
+  if (!transcript) return;
+
+  // Check last 2 messages
+  const lastMsg = messages[messages.length - 1];
+  const secondLastMsg = messages[messages.length - 2];
+
+  // Case 1: AI has replied to our transcript; play the reply once per session
+  const isReplyToUs =
+    lastMsg?.role === 'assistant' &&
+    secondLastMsg?.role === 'user' &&
+    getText(secondLastMsg) === transcript;
+  if (isReplyToUs && state.context.lastSpokenMessageId !== lastMsg.id) {
+    const text = getText(lastMsg);
+    send({ type: 'AI_RESPONSE_READY', messageId: lastMsg.id, text });
+    playAudio(text, lastMsg.id);
+    return;
+  }
+
+  // Case 2: our message is already submitted; still waiting for AI response
+  if (lastMsg?.role === 'user' && getText(lastMsg) === transcript) return;
+
+  // Case 3: transcript not submitted yet
+  // (also covers a repeated transcript whose earlier reply was already spoken)
+  submitUserInput();
+}, [messages, state, status]);
+```
+
+No refs needed! Just check the messages array directly.
+
+## Phase 3: Clean Audio Management
+
+### Step 1: Simplify audio cancellation
+- Keep shouldCancelAudioRef
+- Call stopAllAudio() when leaving canSkipAudio states
+- playAudio() checks cancel flag at each await
+
+### Step 2: Effect cleanup
+- Remove submittingTranscriptRef completely
+- Remove the "reset ref when leaving processing" effect
+- Rely only on messages array state
+
+## Phase 4: Testing with Playwright
+
+### Test Script
+```typescript
+test('Voice mode conversation flow', async (agent) => {
+  await agent.open('http://localhost:3000/chat');
+
+  // Login first
+  await agent.act('Log in with Bluesky');
+
+  // Start voice mode
+  await agent.act('Click "Start Voice Conversation"');
+  await agent.check('Button shows "Generating speech..." or "Listening..."');
+
+  // Skip initial greeting if playing
+  const skipVisible = await agent.check('Skip button is visible', { optional: true });
+  if (skipVisible) {
+    await agent.act('Click Skip button');
+  }
+  await agent.check('Button shows "Listening... Start speaking"');
+
+  // Simulate user speech
+  await agent.act('Click "Simulate User Speech" test button');
+  await agent.check('Button shows "Speaking..."');
+
+  await agent.act('Click "Simulate Silence" test button');
+  await agent.check('Button shows "Processing..."');
+
+  // Wait for AI response
+  await agent.wait(5000);
+  await agent.check('AI message appears in chat');
+  await agent.check('Button shows "Generating speech..." or "AI is speaking..."');
+
+  // Skip AI audio
+  await agent.act('Click Skip button');
+  await agent.check('Button shows "Listening... Start speaking"');
+
+  // Second exchange
+  await agent.act('Click "Simulate User Speech" test button');
+  await agent.act('Click "Simulate Silence" test button');
+
+  // Let AI audio play completely this time
+  await agent.wait(10000);
+  await agent.check('Button shows "Listening... Start speaking"');
+});
+```
+
+## Phase 5: Validation
+
+### Checklist
+- [ ] State machine is serializable (can be visualized in Stately)
+- [ ] No refs used in processing logic
+- [ ] Latest message only plays once per session
+- [ ] Skip works instantly in both aiGenerating and aiSpeaking
+- [ ] Re-entering voice mode plays most recent AI message (if not already spoken)
+- [ ] All test cases from PRD pass
+- [ ] Playwright test passes
+
+## Implementation Order
+
+1. Add test buttons to UI (for manual testing)
+2. Rewrite processing effect with simple messages array logic
+3. Remove submittingTranscriptRef completely
+4. Test manually with test buttons
+5. Write Playwright test
+6. Run and validate Playwright test
+7. Clean up any remaining issues
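
The audio cancellation described in Phase 3, Step 1 ("playAudio() checks cancel flag at each await") could look roughly like the sketch below. It is only an illustration under stated assumptions: `CancelFlag` stands in for `shouldCancelAudioRef`, and `AudioSteps`, `generateSpeech`, `play`, and the `PlayResult` return value are hypothetical names that do not come from the plan. The real `playAudio` takes a message ID and talks to an actual speech service.

```typescript
// Hypothetical stand-in for shouldCancelAudioRef: a mutable cancel flag.
type CancelFlag = { current: boolean };

// Hypothetical async steps, injected so the sketch stays self-contained.
type AudioSteps = {
  generateSpeech: (text: string) => Promise<string>; // resolves to an audio handle
  play: (audio: string) => Promise<void>;
};

type PlayResult = 'played' | 'cancelled';

// Re-check the cancel flag after every await so a skip takes effect
// at the next suspension point instead of after the whole pipeline.
// The caller is assumed to reset cancel.current to false before a new playback.
async function playAudio(
  text: string,
  cancel: CancelFlag,
  steps: AudioSteps,
): Promise<PlayResult> {
  const audio = await steps.generateSpeech(text);
  if (cancel.current) return 'cancelled'; // skipped while generating

  await steps.play(audio);
  if (cancel.current) return 'cancelled'; // skipped while speaking

  return 'played';
}

// Called when leaving a canSkipAudio state.
function stopAllAudio(cancel: CancelFlag): void {
  cancel.current = true;
}
```

Passing the flag and the async steps in as parameters keeps the function testable without a browser. In the component they would presumably come from a ref and from the existing audio helpers.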