docs: Add comprehensive implementation plans for all todo items

Created detailed markdown plans for all items in todo.md:

1. 01-playwright-scaffolding.md - Base Playwright infrastructure
2. 02-magnitude-tests-comprehensive.md - Complete test coverage
3. 03-stream-ai-to-deepgram-tts.md - TTS latency optimization
4. 04-fix-galaxy-node-clicking.md - Galaxy navigation bugs
5. 05-dark-light-mode-theme.md - Dark/light mode with dynamic favicons
6. 06-fix-double-border-desktop.md - UI polish
7. 07-delete-backup-files.md - Code cleanup
8. 08-ai-transition-to-edit.md - Intelligent node creation flow
9. 09-umap-minimum-nodes-analysis.md - Technical analysis

Each plan includes:

- Detailed problem analysis
- Proposed solutions with code examples
- Manual Playwright MCP testing strategy
- Magnitude test specifications
- Implementation steps
- Success criteria

Ready to implement in sequence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
# Plan: Stream AI Output to Deepgram for Faster TTS Synthesis

**Priority:** MEDIUM
**Dependencies:** None
**Affects:** Voice interaction latency, user experience

## Overview

Currently, the app waits for the complete AI response before sending it to Deepgram for TTS. This creates a laggy experience. By streaming the AI output directly to Deepgram as it's generated, we can start playing audio much faster and create a more responsive voice interaction.

## Current Implementation

### Current Flow (SLOW)

```
User speaks → Deepgram transcribe → Send to AI
      ↓
Wait for full response (3-10s)
      ↓
Send complete text to Deepgram TTS
      ↓
Wait for audio generation (1-3s)
      ↓
Play audio
```

**Total latency:** 4-13 seconds before first audio plays

## Proposed Implementation

### New Flow (FAST)

```
User speaks → Deepgram transcribe → Stream to AI
      ↓
Stream chunks to Deepgram TTS
      ↓ (chunks arrive)
Play audio chunks immediately
```

**Total latency:** 1-2 seconds before first audio plays
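The gain is easy to see with a back-of-envelope model. The numbers below are midpoints of the estimates above, not measurements:

```typescript
// Illustrative latency model for the two pipelines.
interface Timings {
  aiFirstChunkMs: number;   // time until the AI emits its first text delta
  aiFullResponseMs: number; // time until the AI finishes the whole response
  ttsFirstAudioMs: number;  // TTS time from text arriving to first audio bytes
  ttsFullAudioMs: number;   // TTS time to synthesize the complete text
}

// Batch: wait for the whole response, then synthesize all of it.
function batchTimeToFirstAudio(t: Timings): number {
  return t.aiFullResponseMs + t.ttsFullAudioMs;
}

// Streaming: first audio depends only on the first AI chunk + first TTS chunk.
function streamingTimeToFirstAudio(t: Timings): number {
  return t.aiFirstChunkMs + t.ttsFirstAudioMs;
}

const estimate: Timings = {
  aiFirstChunkMs: 500,
  aiFullResponseMs: 6000, // mid-range of the 3-10s estimate
  ttsFirstAudioMs: 300,
  ttsFullAudioMs: 2000,   // mid-range of the 1-3s estimate
};

console.log(batchTimeToFirstAudio(estimate));     // 8000
console.log(streamingTimeToFirstAudio(estimate)); // 800
```

The key point: streaming decouples time-to-first-audio from total response length, so long answers benefit the most.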
## Technical Approach

### 1. Modify AI SDK Integration

The client currently uses `useChat` from the Vercel AI SDK, and the route waits on the completed stream:

```typescript
// Current (app/api/chat/route.ts)
const result = await streamText({
  model: google('gemini-2.0-flash-exp'),
  messages,
  system: systemPrompt,
});

return result.toDataStreamResponse();
```

Need to add TTS streaming:

```typescript
// New approach
const result = streamText({
  model: google('gemini-2.0-flash-exp'),
  messages,
  system: systemPrompt,
  async onChunk({ chunk }) {
    // Stream each chunk to Deepgram TTS
    if (chunk.type === 'text-delta') {
      await streamToDeepgram(chunk.textDelta);
    }
  },
});

return result.toDataStreamResponse();
```
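One practical wrinkle with `onChunk`: `text-delta` chunks are token-sized fragments, and forwarding each one to TTS individually can hurt prosody. A small buffering helper can coalesce deltas into clause-sized pieces first (a sketch; the boundary regex and length threshold are illustrative choices, not Deepgram requirements):

```typescript
// Coalesce token-sized text deltas into clause-sized chunks before TTS.
class SentenceChunker {
  private buffer = '';

  constructor(
    private emit: (chunk: string) => void,
    private maxLen = 200, // force a flush even without punctuation
  ) {}

  push(delta: string): void {
    this.buffer += delta;
    // Flush on sentence/clause boundaries so TTS gets natural phrases.
    let boundary: number;
    while ((boundary = this.buffer.search(/[.!?;:]\s/)) !== -1) {
      this.emit(this.buffer.slice(0, boundary + 1));
      this.buffer = this.buffer.slice(boundary + 2);
    }
    if (this.buffer.length >= this.maxLen) {
      this.emit(this.buffer);
      this.buffer = '';
    }
  }

  flush(): void {
    // Emit whatever remains when the AI stream finishes.
    if (this.buffer.trim()) this.emit(this.buffer);
    this.buffer = '';
  }
}

const out: string[] = [];
const chunker = new SentenceChunker((c) => out.push(c));
for (const delta of ['Hel', 'lo there', '. How', ' are you', '?']) {
  chunker.push(delta);
}
chunker.flush();
console.log(out); // ['Hello there.', 'How are you?']
```

`streamToDeepgram` would then be called from the `emit` callback instead of directly from `onChunk`.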

### 2. Create Deepgram TTS Streaming Service

#### `lib/deepgram-tts-stream.ts`

Note: this class drives playback through the Web Audio API, so it must run in the browser, not inside an API route.

```typescript
import { createClient, LiveClient } from '@deepgram/sdk';

export class DeepgramTTSStream {
  private client: LiveClient;
  private audioQueue: Uint8Array[] = [];
  private isPlaying = false;

  constructor(apiKey: string) {
    const deepgram = createClient(apiKey);
    this.client = deepgram.speak.live({
      model: 'aura-asteria-en',
      encoding: 'linear16',
      sample_rate: 24000,
    });

    this.client.on('data', (data: Buffer) => {
      this.audioQueue.push(new Uint8Array(data));
      this.playNextChunk();
    });
  }

  async streamText(text: string) {
    // Send text chunk to Deepgram for synthesis
    this.client.send(text);
  }

  async flush() {
    // Signal end of text stream
    this.client.close();
  }

  private async playNextChunk() {
    if (this.isPlaying || this.audioQueue.length === 0) return;

    this.isPlaying = true;
    const chunk = this.audioQueue.shift()!;

    // Play audio chunk using Web Audio API
    await this.playAudioChunk(chunk);

    this.isPlaying = false;
    this.playNextChunk(); // Play next if available
  }

  private async playAudioChunk(chunk: Uint8Array) {
    const audioContext = new AudioContext({ sampleRate: 24000 });
    const audioBuffer = audioContext.createBuffer(
      1, // mono
      chunk.length / 2, // 16-bit samples
      24000
    );

    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < chunk.length / 2; i++) {
      // Decode 16-bit little-endian PCM to float32, reinterpreting as signed
      let sample = chunk[i * 2] | (chunk[i * 2 + 1] << 8);
      if (sample >= 0x8000) sample -= 0x10000;
      channelData[i] = sample / 32768.0;
    }

    const source = audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(audioContext.destination);

    return new Promise((resolve) => {
      source.onended = resolve;
      source.start();
    });
  }
}
```
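The inner PCM loop is the easiest part of this class to get wrong: linear16 samples are little-endian and signed, so the high byte must be sign-extended before scaling. A self-contained version of just that conversion:

```typescript
// Decode 16-bit little-endian signed PCM bytes to float32 samples in [-1, 1).
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const out = new Float32Array(bytes.length / 2);
  for (let i = 0; i < out.length; i++) {
    let sample = bytes[i * 2] | (bytes[i * 2 + 1] << 8);
    if (sample >= 0x8000) sample -= 0x10000; // reinterpret as signed
    out[i] = sample / 32768;
  }
  return out;
}

// 0x0000 → 0, 0x7FFF → ~0.99997, 0x8000 → -1
const samples = pcm16ToFloat32(new Uint8Array([0x00, 0x00, 0xff, 0x7f, 0x00, 0x80]));
console.log(Array.from(samples));
```

Without the sign reinterpretation, negative samples decode as loud positive values and playback becomes harsh static.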

### 3. Create Server-Sent Events (SSE) Endpoint for TTS

#### `app/api/chat-with-tts/route.ts`

```typescript
import { DeepgramTTSStream } from '@/lib/deepgram-tts-stream';
import { streamText } from 'ai';
import { google } from '@ai-sdk/google';

export async function POST(request: Request) {
  const { messages } = await request.json();

  // Create a TransformStream for SSE
  const stream = new TransformStream();
  const writer = stream.writable.getWriter();
  const encoder = new TextEncoder();

  // Start streaming AI response
  (async () => {
    const ttsStream = new DeepgramTTSStream(process.env.DEEPGRAM_API_KEY!);

    try {
      const result = streamText({
        model: google('gemini-2.0-flash-exp'),
        messages,
        async onChunk({ chunk }) {
          if (chunk.type === 'text-delta') {
            // Send text to client
            await writer.write(
              encoder.encode(`data: ${JSON.stringify({ text: chunk.textDelta })}\n\n`)
            );

            // Stream to Deepgram TTS
            await ttsStream.streamText(chunk.textDelta);
          }
        },
      });

      await result.text; // Wait for completion
      await ttsStream.flush();

      await writer.write(encoder.encode('data: [DONE]\n\n'));
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      await writer.write(
        encoder.encode(`data: ${JSON.stringify({ error: message })}\n\n`)
      );
    } finally {
      await writer.close();
    }
  })();

  return new Response(stream.readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}
```

### 4. Update Client to Consume SSE with TTS

#### `components/ChatInterface.tsx`

```typescript
const [isTTSEnabled, setIsTTSEnabled] = useState(false);
const ttsStreamRef = useRef<DeepgramTTSStream | null>(null);

async function sendMessageWithTTS(message: string) {
  const response = await fetch('/api/chat-with-tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages: [...messages, { role: 'user', content: message }] }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  // Initialize TTS stream. The constructor needs a client-safe key or
  // short-lived token -- never expose the server API key to the browser.
  if (isTTSEnabled) {
    ttsStreamRef.current = new DeepgramTTSStream(deepgramClientToken);
  }

  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          if (ttsStreamRef.current) {
            await ttsStreamRef.current.flush();
          }
          break;
        }

        try {
          const parsed = JSON.parse(data);
          if (parsed.text) {
            fullText += parsed.text;
            // Update UI with incremental text
            setMessages((prev) => {
              const last = prev[prev.length - 1];
              if (last && last.role === 'assistant') {
                return [...prev.slice(0, -1), { ...last, content: fullText }];
              }
              return [...prev, { role: 'assistant', content: fullText }];
            });

            // Stream to TTS
            if (ttsStreamRef.current) {
              await ttsStreamRef.current.streamText(parsed.text);
            }
          }
        } catch (e) {
          console.error('Failed to parse SSE data:', e);
        }
      }
    }
  }
}
```
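One caveat for the loop above: a single `reader.read()` can end mid-line, so splitting each chunk on `\n` in isolation will occasionally tear a `data:` frame in half and fail the `JSON.parse`. A small stateful parser that carries the trailing partial line between reads avoids this (a sketch):

```typescript
// Incremental SSE "data:" line parser that tolerates frames split across reads.
class SSELineParser {
  private pending = '';

  // Feed one decoded network chunk; returns the complete data payloads found.
  feed(chunk: string): string[] {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    this.pending = lines.pop() ?? ''; // keep the (possibly partial) last line
    return lines
      .filter((line) => line.startsWith('data: '))
      .map((line) => line.slice(6));
  }
}

const parser = new SSELineParser();
const events: string[] = [];
// Simulate a frame torn across two reads.
for (const chunk of ['data: {"text":"Hel', 'lo"}\n\ndata: [DONE]\n\n']) {
  events.push(...parser.feed(chunk));
}
console.log(events); // ['{"text":"Hello"}', '[DONE]']
```

In the component, `parser.feed(decoder.decode(value))` would replace the per-chunk `split('\n')`.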

## Alternative: Use Deepgram's Native Streaming TTS

Deepgram has a WebSocket-based streaming TTS API that's even more efficient:

```typescript
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

const connection = deepgram.speak.live({
  model: 'aura-asteria-en',
  encoding: 'linear16',
  sample_rate: 24000,
});

connection.on('open', () => {
  console.log('TTS connection established');
});

connection.on('data', (audioData: Buffer) => {
  // Play audio chunk immediately
  playAudioBuffer(audioData);
});

// As AI chunks arrive, send to Deepgram
aiStream.on('text-delta', (text) => {
  connection.send(text);
});

// When AI completes
aiStream.on('finish', () => {
  connection.close();
});
```

## Implementation Steps

1. **Research Deepgram TTS Streaming API**
   - Review docs: https://developers.deepgram.com/docs/tts-streaming
   - Test WebSocket connection manually
   - Understand audio format and buffering

2. **Create TTS streaming service**
   - `lib/deepgram-tts-stream.ts`
   - Implement audio queue and playback
   - Handle reconnection and errors

3. **Modify API route for streaming**
   - Create `/api/chat-with-tts` route
   - Implement SSE response
   - Connect AI stream to TTS stream

4. **Update client components**
   - Add TTS toggle in UI
   - Implement SSE consumption
   - Connect to audio playback

5. **Test with Playwright MCP**
   - Enable TTS
   - Send message
   - Verify audio starts playing quickly (< 2s)
   - Verify audio quality
   - Test error handling (network drop, TTS failure)

6. **Add Magnitude test**

   ```typescript
   test('TTS streams audio with low latency', async (agent) => {
     await agent.open('http://localhost:3000/chat');
     await agent.act('Enable TTS in settings');
     await agent.act('Send message "Hello"');

     await agent.check('Audio starts playing within 2 seconds');
     await agent.check('Audio continues as AI generates response');
     await agent.check('Audio completes without gaps');
   });
   ```

## Performance Targets

- **Time to first audio:** < 2 seconds (vs current 4-13s)
- **Perceived latency:** Near real-time streaming
- **Audio quality:** No degradation from current implementation
- **Reliability:** Graceful fallback if streaming fails
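The reliability target can be met by racing the streaming path against a timeout and reverting to batch synthesis on any failure. A sketch, where `speakStreaming` and `speakBatch` are hypothetical stand-ins for the two paths:

```typescript
// Try streaming TTS first; fall back to batch synthesis on error or timeout.
async function speakWithFallback(
  text: string,
  speakStreaming: (t: string) => Promise<void>,
  speakBatch: (t: string) => Promise<void>,
  timeoutMs = 3000,
): Promise<'streaming' | 'batch'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('TTS stream timeout')), timeoutMs);
  });
  try {
    await Promise.race([speakStreaming(text), timeout]);
    return 'streaming';
  } catch {
    await speakBatch(text); // degraded but working path
    return 'batch';
  } finally {
    clearTimeout(timer); // avoid a stray rejection after success
  }
}

(async () => {
  const mode = await speakWithFallback(
    'hello',
    async () => { throw new Error('stream unavailable'); }, // simulated failure
    async () => { /* pretend batch synthesis succeeded */ },
    50,
  );
  console.log(mode); // 'batch'
})();
```

Clearing the timer in `finally` matters: otherwise a late timeout rejection after a successful stream becomes an unhandled rejection.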
## Success Criteria

- ✅ TTS audio starts playing within 2 seconds of AI response beginning
- ✅ Audio streams continuously as AI generates text
- ✅ No perceptible gaps or stuttering in audio playback
- ✅ Graceful fallback to batch TTS if streaming fails
- ✅ Playwright MCP manual test passes
- ✅ Magnitude test passes
- ✅ No regression in audio quality

## Files to Create

1. `lib/deepgram-tts-stream.ts` - TTS streaming service
2. `app/api/chat-with-tts/route.ts` - SSE endpoint for TTS
3. `tests/playwright/tts-streaming.spec.ts` - Manual Playwright test
4. `tests/magnitude/tts-streaming.mag.ts` - Magnitude test

## Files to Update

1. `components/ChatInterface.tsx` - Add TTS streaming consumption
2. `app/theme.ts` - Add TTS toggle styling if needed