# Plan: Stream AI Output to Deepgram for Faster TTS Synthesis

**Priority:** MEDIUM

**Dependencies:** None

**Affects:** Voice interaction latency, user experience

## Overview

Currently, the app waits for the complete AI response before sending it to Deepgram for TTS. This creates a laggy experience. By streaming the AI output directly to Deepgram as it's generated, we can start playing audio much faster and create a more responsive voice interaction.

## Current Implementation

### Current Flow (SLOW)

```
User speaks → Deepgram transcribe → Send to AI
        ↓
Wait for full response (3-10s)
        ↓
Send complete text to Deepgram TTS
        ↓
Wait for audio generation (1-3s)
        ↓
Play audio
```

**Total latency:** 4-13 seconds before first audio plays

## Proposed Implementation

### New Flow (FAST)

```
User speaks → Deepgram transcribe → Stream to AI
        ↓
Stream chunks to Deepgram TTS
        ↓ (chunks arrive)
Play audio chunks immediately
```

**Total latency:** 1-2 seconds before first audio plays

## Technical Approach

### 1. Modify AI SDK Integration

The client currently uses `useChat` from the Vercel AI SDK, and TTS only runs once the completion has fully arrived:

```typescript
// Current (app/api/chat/route.ts)
const result = await streamText({
  model: google('gemini-2.0-flash-exp'),
  messages,
  system: systemPrompt,
});

return result.toDataStreamResponse();
```

Need to add TTS streaming:

```typescript
// New approach
const result = streamText({
  model: google('gemini-2.0-flash-exp'),
  messages,
  system: systemPrompt,
  async onChunk({ chunk }) {
    // Stream each text delta to Deepgram TTS as it arrives
    if (chunk.type === 'text-delta') {
      await streamToDeepgram(chunk.textDelta);
    }
  },
});

return result.toDataStreamResponse();
```
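
`streamToDeepgram` above doesn't exist yet; a minimal sketch follows, assuming a module-level `speak.live` connection. It buffers deltas until a clause boundary before sending, since phrase-sized inputs tend to synthesize with better prosody than single tokens. The helper name, the buffering heuristic, and the 120-character cap are all assumptions:

```typescript
import { createClient } from '@deepgram/sdk';

// Module-level connection for illustration; real code would manage its
// lifecycle (reconnects, one connection per response, etc.).
const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
const connection = deepgram.speak.live({
  model: 'aura-asteria-en',
  encoding: 'linear16',
  sample_rate: 24000,
});

let buffer = '';

// Accumulate deltas; flush to Deepgram at punctuation or a length cap so it
// receives phrases rather than individual tokens.
export async function streamToDeepgram(textDelta: string) {
  buffer += textDelta;
  if (/[.!?,;:]\s*$/.test(buffer) || buffer.length > 120) {
    connection.send(buffer); // same send() call used by the service class below
    buffer = '';
  }
}
```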

### 2. Create Deepgram TTS Streaming Service

#### `lib/deepgram-tts-stream.ts`

```typescript
import { createClient, LiveClient } from '@deepgram/sdk';

// NOTE: playback here uses the Web Audio API, which only exists in the
// browser. If this class is instantiated server-side (as in section 3),
// the audio chunks must be forwarded to the client instead of played.
export class DeepgramTTSStream {
  // deepgram.speak.live() may have a dedicated client type depending on
  // SDK version; LiveClient is used loosely here.
  private client: LiveClient;
  private audioQueue: Uint8Array[] = [];
  private isPlaying = false;
  // One shared AudioContext; creating one per chunk is wasteful.
  private audioContext = new AudioContext({ sampleRate: 24000 });

  constructor(apiKey: string) {
    const deepgram = createClient(apiKey);
    this.client = deepgram.speak.live({
      model: 'aura-asteria-en',
      encoding: 'linear16',
      sample_rate: 24000,
    });

    this.client.on('data', (data: Buffer) => {
      this.audioQueue.push(new Uint8Array(data));
      this.playNextChunk();
    });
  }

  async streamText(text: string) {
    // Send a text chunk to Deepgram for synthesis
    this.client.send(text);
  }

  async flush() {
    // Signal end of the text stream; already-queued audio keeps playing
    this.client.close();
  }

  private async playNextChunk() {
    if (this.isPlaying || this.audioQueue.length === 0) return;

    this.isPlaying = true;
    const chunk = this.audioQueue.shift()!;

    // Play the chunk, then drain the rest of the queue
    await this.playAudioChunk(chunk);

    this.isPlaying = false;
    this.playNextChunk();
  }

  private async playAudioChunk(chunk: Uint8Array) {
    const sampleCount = chunk.length / 2; // 16-bit samples
    const audioBuffer = this.audioContext.createBuffer(1 /* mono */, sampleCount, 24000);

    // Convert little-endian *signed* 16-bit PCM to float32. A DataView
    // handles the sign bit; a plain bitwise OR of the two bytes would yield
    // unsigned values and badly distorted audio.
    const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < sampleCount; i++) {
      channelData[i] = view.getInt16(i * 2, true) / 32768.0;
    }

    const source = this.audioContext.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(this.audioContext.destination);

    return new Promise<void>((resolve) => {
      source.onended = () => resolve();
      source.start();
    });
  }
}
```
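
Usage from a browser component might look like the following; `fetchTtsToken` is a hypothetical endpoint that returns a short-lived Deepgram key, since the real key must not ship to the browser:

```typescript
// Hypothetical usage; fetchTtsToken() is an assumed helper, not existing code.
const tts = new DeepgramTTSStream(await fetchTtsToken());

await tts.streamText('Hello, ');
await tts.streamText('world.');
await tts.flush(); // end of input; already-queued audio finishes playing
```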

### 3. Create Server-Sent Events (SSE) Endpoint for TTS

#### `app/api/chat-with-tts/route.ts`

```typescript
import { DeepgramTTSStream } from '@/lib/deepgram-tts-stream';
import { streamText } from 'ai';
import { google } from '@ai-sdk/google';

export async function POST(request: Request) {
  const { messages } = await request.json();

  // Create a TransformStream for SSE
  const stream = new TransformStream();
  const writer = stream.writable.getWriter();
  const encoder = new TextEncoder();

  // Start streaming the AI response without blocking the Response below
  (async () => {
    // CAVEAT: DeepgramTTSStream plays audio via the Web Audio API, which
    // does not exist on the server. See the forwarding sketch below for one
    // way to get the audio to the browser from here.
    const ttsStream = new DeepgramTTSStream(process.env.DEEPGRAM_API_KEY!);

    try {
      const result = streamText({
        model: google('gemini-2.0-flash-exp'),
        messages,
        async onChunk({ chunk }) {
          if (chunk.type === 'text-delta') {
            // Send the text delta to the client
            await writer.write(
              encoder.encode(`data: ${JSON.stringify({ text: chunk.textDelta })}\n\n`)
            );

            // Stream to Deepgram TTS
            await ttsStream.streamText(chunk.textDelta);
          }
        },
      });

      await result.text; // Wait for completion
      await ttsStream.flush();

      await writer.write(encoder.encode('data: [DONE]\n\n'));
    } catch (error) {
      // `error` is unknown in TypeScript; narrow before reading .message
      const message = error instanceof Error ? error.message : String(error);
      await writer.write(encoder.encode(`data: ${JSON.stringify({ error: message })}\n\n`));
    } finally {
      await writer.close();
    }
  })();

  return new Response(stream.readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  });
}
```
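
As written, the audio Deepgram returns stays on the server and is never delivered to the browser. One way to close that gap, sketched under the assumption that `DeepgramTTSStream` grows a hypothetical `onAudio` hook exposing its raw `data` events, is to relay chunks over the same SSE stream (placed inside the async block above, before `streamText` runs):

```typescript
// Hypothetical onAudio() hook: surfaces the chunks from the class's 'data'
// listener instead of playing them, so the route can relay them as SSE events.
ttsStream.onAudio(async (audio: Uint8Array) => {
  await writer.write(
    encoder.encode(
      `data: ${JSON.stringify({ audio: Buffer.from(audio).toString('base64') })}\n\n`
    )
  );
});
```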

### 4. Update Client to Consume SSE with TTS

#### `components/ChatInterface.tsx`

```typescript
const [isTTSEnabled, setIsTTSEnabled] = useState(false);
const ttsStreamRef = useRef<DeepgramTTSStream | null>(null);

async function sendMessageWithTTS(message: string) {
  const response = await fetch('/api/chat-with-tts', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages: [...messages, { role: 'user', content: message }] }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  // Initialize the TTS stream. The constructor requires an API key;
  // fetchTtsToken() is the hypothetical short-lived-key endpoint from the
  // usage sketch in section 2, since the real key can't live in the browser.
  if (isTTSEnabled) {
    ttsStreamRef.current = new DeepgramTTSStream(await fetchTtsToken());
  }

  let fullText = '';
  let finished = false;

  while (!finished) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across reads.
    // (A production parser should also buffer partial SSE lines.)
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const data = line.slice(6);
      if (data === '[DONE]') {
        if (ttsStreamRef.current) {
          await ttsStreamRef.current.flush();
        }
        finished = true; // a bare break would only exit the inner loop
        break;
      }

      try {
        const parsed = JSON.parse(data);
        if (parsed.text) {
          fullText += parsed.text;
          // Update UI with incremental text
          setMessages((prev) => {
            const last = prev[prev.length - 1];
            if (last && last.role === 'assistant') {
              return [...prev.slice(0, -1), { ...last, content: fullText }];
            }
            return [...prev, { role: 'assistant', content: fullText }];
          });

          // Stream to TTS
          if (ttsStreamRef.current) {
            await ttsStreamRef.current.streamText(parsed.text);
          }
        }
      } catch (e) {
        console.error('Failed to parse SSE data:', e);
      }
    }
  }
}
```
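
The success criteria below call for a graceful fallback to batch TTS when streaming fails. A minimal shape for that, assuming a `synthesizeBatch` wrapper around the current non-streaming TTS call and a plain `sendMessagePlain` chat path (both hypothetical names):

```typescript
// Hedged sketch: prefer streaming TTS, fall back to the existing batch path.
async function speakWithFallback(message: string) {
  try {
    await sendMessageWithTTS(message);
  } catch (err) {
    console.warn('Streaming TTS failed, falling back to batch TTS:', err);
    const fullText = await sendMessagePlain(message); // existing non-TTS chat call
    await synthesizeBatch(fullText); // existing batch Deepgram TTS wrapper
  }
}
```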

## Alternative: Use Deepgram's Native Streaming TTS

Deepgram's WebSocket-based streaming TTS API (the same `speak.live` interface the service class above wraps) can also be driven directly from the AI stream, which is even more efficient:

```typescript
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

const connection = deepgram.speak.live({
  model: 'aura-asteria-en',
  encoding: 'linear16',
  sample_rate: 24000,
});

connection.on('open', () => {
  console.log('TTS connection established');
});

connection.on('data', (audioData: Buffer) => {
  // Play the audio chunk immediately (playAudioBuffer as defined by the
  // client playback layer)
  playAudioBuffer(audioData);
});

// As AI chunks arrive, send them to Deepgram.
// (Pseudocode: streamText results are consumed via onChunk or async
// iteration, not an EventEmitter; adapt these handlers accordingly.)
aiStream.on('text-delta', (text) => {
  connection.send(text);
});

// When the AI completes
aiStream.on('finish', () => {
  connection.close();
});
```

## Implementation Steps

1. **Research Deepgram TTS Streaming API**
   - Review docs: https://developers.deepgram.com/docs/tts-streaming
   - Test the WebSocket connection manually
   - Understand audio format and buffering

2. **Create TTS streaming service**
   - `lib/deepgram-tts-stream.ts`
   - Implement audio queue and playback
   - Handle reconnection and errors

3. **Modify API route for streaming**
   - Create `/api/chat-with-tts` route
   - Implement SSE response
   - Connect AI stream to TTS stream

4. **Update client components**
   - Add TTS toggle in UI
   - Implement SSE consumption
   - Connect to audio playback

5. **Test with Playwright MCP** (see the spec sketch after this list)
   - Enable TTS
   - Send message
   - Verify audio starts playing quickly (< 2s)
   - Verify audio quality
   - Test error handling (network drop, TTS failure)

6. **Add Magnitude test**

   ```typescript
   test('TTS streams audio with low latency', async (agent) => {
     await agent.open('http://localhost:3000/chat');
     await agent.act('Enable TTS in settings');
     await agent.act('Send message "Hello"');

     await agent.check('Audio starts playing within 2 seconds');
     await agent.check('Audio continues as AI generates response');
     await agent.check('Audio completes without gaps');
   });
   ```
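
For step 5, the manual Playwright spec might take the shape below. The accessible-switch selector and the `data-tts-playing` attribute are assumptions about the UI, not hooks that exist today:

```typescript
import { test, expect } from '@playwright/test';

test('TTS audio starts within 2 seconds of sending a message', async ({ page }) => {
  await page.goto('http://localhost:3000/chat');

  // Assumed selector: the TTS toggle is exposed as an accessible switch.
  await page.getByRole('switch', { name: /tts/i }).click();

  const t0 = Date.now();
  await page.getByRole('textbox').fill('Hello');
  await page.keyboard.press('Enter');

  // Assumed hook: the app sets data-tts-playing="true" when playback begins.
  await page.waitForSelector('[data-tts-playing="true"]', { timeout: 2000 });
  expect(Date.now() - t0).toBeLessThan(2000);
});
```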

## Performance Targets

- **Time to first audio:** < 2 seconds (vs current 4-13s; see the measurement sketch below)
- **Perceived latency:** Near real-time streaming
- **Audio quality:** No degradation from current implementation
- **Reliability:** Graceful fallback if streaming fails
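
A quick way to verify the time-to-first-audio target during manual testing, using the standard Performance API; the mark names and the idea of instrumenting `playAudioChunk` are ours:

```typescript
// Mark when the user's message is sent...
performance.mark('tts:send');

// ...and again when the first chunk starts playing (e.g. at the top of the
// first playAudioChunk() call in DeepgramTTSStream):
performance.mark('tts:first-audio');

performance.measure('time-to-first-audio', 'tts:send', 'tts:first-audio');
const [measure] = performance.getEntriesByName('time-to-first-audio');
console.log(`time to first audio: ${measure.duration.toFixed(0)} ms`); // target < 2000
```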

## Success Criteria

- ✅ TTS audio starts playing within 2 seconds of the AI response beginning
- ✅ Audio streams continuously as the AI generates text
- ✅ No perceptible gaps or stuttering in audio playback
- ✅ Graceful fallback to batch TTS if streaming fails
- ✅ Playwright MCP manual test passes
- ✅ Magnitude test passes
- ✅ No regression in audio quality

## Files to Create

1. `lib/deepgram-tts-stream.ts` - TTS streaming service
2. `app/api/chat-with-tts/route.ts` - SSE endpoint for TTS
3. `tests/playwright/tts-streaming.spec.ts` - Manual Playwright test
4. `tests/magnitude/tts-streaming.mag.ts` - Magnitude test

## Files to Update

1. `components/ChatInterface.tsx` - Add TTS streaming consumption
2. `app/theme.ts` - Add TTS toggle styling if needed