# Plan: Fix Grapheme Computation (Text Splitting)

**Priority:** HIGH - Blocking production node creation

## Current Implementation (Broken)

### Problems Identified

1. **Line 113**: Uses character length instead of grapheme length:
   ```typescript
   testText = testText.substring(0, Math.floor(testText.length * 0.9));
   ```
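   `.length` and `substring` work in UTF-16 code units, not graphemes. With emojis or other multi-code-unit characters, the cut does not track the grapheme count and can land in the middle of a character (e.g. half of a surrogate pair).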

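2. **Variable URL lengths**: The link suffix (`\n\nRead more: ` plus the detail URL) is 72-112 chars depending on the environment's base URL (the counts below assume a 30-char node ID):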
   - `http://localhost:3000`: 72 chars
   - `https://ponderants.app`: 73 chars
   - `https://www.ponderants.com`: 77 chars
   - `https://ponderants-dev-preview-abc123.vercel.app`: 99 chars

3. **Pre-calculates limit**: Computes `linkGraphemes` once with the current URL, but doesn't budget for worst-case overhead (e.g. two-digit thread indicators).
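Problem 1 in miniature: `.length` and `substring` count UTF-16 code units, so an emoji such as 👋 counts as 2 and a cut can land inside it. A minimal sketch; it uses the built-in `Intl.Segmenter` (assumed available: Node 16+, TypeScript lib `es2022`) as a stand-in for `RichText.graphemeLength`, and `graphemeCount` is a hypothetical helper.

```typescript
// Assumption: Intl.Segmenter is available (Node 16+, TS lib "es2022").
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Count user-perceived characters (grapheme clusters).
function graphemeCount(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

const sample = "Hi 👋🌍"; // 5 graphemes; each emoji is 2 UTF-16 code units

const codeUnits = sample.length;         // 7
const graphemes = graphemeCount(sample); // 5

// Cutting by code units can land inside a surrogate pair:
// substring(0, 4) keeps "Hi " plus only the high surrogate of 👋.
const cut = sample.substring(0, 4);
const endsWithLoneSurrogate = /[\uD800-\uDBFF]$/.test(cut);

console.log({ codeUnits, graphemes, endsWithLoneSurrogate });
```

A shrink factor applied to `.length` therefore says little about how many graphemes remain, and the resulting chunk may contain a broken character.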

## Correct Algorithm

### Step 1: Calculate overhead for each post type

```typescript
const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(nodeId)}`;
const linkSuffix = `\n\nRead more: ${detailUrl}`;
const linkGraphemes = getGraphemeLength(linkSuffix);

// Thread indicator: "(N/Total) " where both N and Total can be 1-99
// Worst case: "(99/99) " = 8 characters; reserve 9 for one grapheme of slack
const threadIndicatorGraphemes = 9;

// Safety buffer for small miscounts (e.g. if later RichText processing changes the counted length)
const safetyBuffer = 5;
```
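A sketch that builds the Step 1 `linkSuffix` for each base URL from problem 2 and counts it. The node ID is a hypothetical 30-character placeholder (the length that reproduces the 72/73/77/99 figures), and `Intl.Segmenter` stands in for `RichText.graphemeLength`; for plain ASCII text the two should agree. It also measures the worst-case thread indicator.

```typescript
// Assumption: Intl.Segmenter stands in for RichText.graphemeLength here.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const getGraphemeLength = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

// Hypothetical 30-character node ID (real IDs may differ in length).
const nodeId = "n".repeat(30);

// Same construction as Step 1.
function linkSuffixFor(baseUrl: string, id: string): string {
  const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(id)}`;
  return `\n\nRead more: ${detailUrl}`;
}

const baseUrls = [
  "http://localhost:3000",
  "https://ponderants.app",
  "https://www.ponderants.com",
  "https://ponderants-dev-preview-abc123.vercel.app",
];

const linkGraphemesByUrl = baseUrls.map((url) =>
  getGraphemeLength(linkSuffixFor(url, nodeId)),
);

// Widest indicator for threads of up to 99 posts.
const worstIndicator = getGraphemeLength("(99/99) ");

console.log(linkGraphemesByUrl, worstIndicator);
```

The suffixes come out at 72, 73, 77 and 99 graphemes, and `"(99/99) "` at 8, so the plan's 9-grapheme reservation leaves one grapheme of slack.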

### Step 2: Calculate max graphemes for each post type

```typescript
const firstPostMaxGraphemes = 300 - linkGraphemes - safetyBuffer;
const threadPostMaxGraphemes = 300 - threadIndicatorGraphemes - safetyBuffer;
```
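Plugging the link lengths from problem 2 into these formulas gives concrete budgets. A sketch; `computeBudgets` is a hypothetical helper whose defaults mirror the plan's 9-grapheme indicator and 5-grapheme buffer.

```typescript
const POST_LIMIT = 300; // Bluesky post limit, in graphemes

interface Budgets {
  firstPostMaxGraphemes: number;
  threadPostMaxGraphemes: number;
}

// Hypothetical helper: per-post text budgets after subtracting overhead.
function computeBudgets(
  linkGraphemes: number,
  threadIndicatorGraphemes = 9,
  safetyBuffer = 5,
): Budgets {
  const firstPostMaxGraphemes = POST_LIMIT - linkGraphemes - safetyBuffer;
  const threadPostMaxGraphemes = POST_LIMIT - threadIndicatorGraphemes - safetyBuffer;
  // A budget below 1 would make splitting impossible, so fail fast.
  if (firstPostMaxGraphemes < 1 || threadPostMaxGraphemes < 1) {
    throw new Error(`Overhead leaves no room for text (link: ${linkGraphemes})`);
  }
  return { firstPostMaxGraphemes, threadPostMaxGraphemes };
}

const local = computeBudgets(72);   // localhost link suffix
const preview = computeBudgets(99); // Vercel preview link suffix
console.log(local, preview);
```

With these inputs the preview URL costs 27 graphemes of first-post text (196 vs 223), while thread posts keep 286 in every environment.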

### Step 3: Split fullText by GRAPHEME count

```typescript
import { RichText } from '@atproto/api';

// Grapheme count as Bluesky measures it.
function getGraphemeLength(text: string): number {
  return new RichText({ text }).graphemeLength;
}

// Grapheme clusters for slicing (Intl.Segmenter: Node 16+, TS lib "es2022").
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
function toGraphemes(text: string): string[] {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

function splitByGraphemes(text: string, firstMax: number, otherMax: number): string[] {
  if (firstMax < 1 || otherMax < 1) {
    throw new Error('Grapheme budgets must be positive');
  }

  const chunks: string[] = [];
  let remainingText = text.trim();
  let isFirst = true;

  while (remainingText.length > 0) {
    const maxGraphemes = isFirst ? firstMax : otherMax;

    if (getGraphemeLength(remainingText) <= maxGraphemes) {
      // Rest of text fits in one chunk
      chunks.push(remainingText);
      break;
    }

    // Take at most maxGraphemes whole graphemes, so an emoji or
    // surrogate pair is never cut in half.
    const graphemes = toGraphemes(remainingText).slice(0, maxGraphemes);
    // If the two segmenters ever disagree, drop graphemes until RichText agrees.
    while (graphemes.length > 1 && getGraphemeLength(graphemes.join('')) > maxGraphemes) {
      graphemes.pop();
    }
    let chunk = graphemes.join('');

    // Prefer splitting at the last space, if it falls in the second half.
    const lastSpace = chunk.lastIndexOf(' ');
    if (lastSpace > chunk.length * 0.5) {
      chunk = chunk.substring(0, lastSpace);
    }

    // `chunk` is a non-empty prefix of remainingText, so the loop always advances.
    chunks.push(chunk.trim());
    remainingText = remainingText.substring(chunk.length).trim();
    isFirst = false;
  }

  return chunks;
}
```

### Step 4: Build posts with proper grapheme validation

```typescript
const chunks = splitByGraphemes(fullText, firstPostMaxGraphemes, threadPostMaxGraphemes);

for (let i = 0; i < chunks.length; i++) {
  const isFirstPost = i === 0;
  let postText = chunks[i];

  // Add thread indicator if needed
  if (chunks.length > 1 && !isFirstPost) {
    postText = `(${i + 1}/${chunks.length}) ${postText}`;
  }

  // Add link to first post
  if (isFirstPost) {
    postText += linkSuffix;
  }

  // Final validation
  const finalGraphemes = getGraphemeLength(postText);
  if (finalGraphemes > 300) {
    console.error(`[POST /api/nodes] Post ${i + 1} exceeds limit: ${finalGraphemes} graphemes`);
    console.error(`[POST /api/nodes] Content: ${postText.substring(0, 100)}...`);
    throw new Error(`Post exceeds 300 grapheme limit: ${finalGraphemes}`);
  }

  // Continue with post creation...
}
```
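Step 1 reserves 9 graphemes for `(N/Total) `, which holds only while a thread has at most 99 posts: 100 posts need `(100/100) `, which is 10 graphemes. A sketch of a guard using the same format string as the loop above; `threadIndicator` and `assertIndicatorFits` are hypothetical names, and the check relies on the indicator being ASCII so `.length` equals its grapheme count.

```typescript
const THREAD_INDICATOR_BUDGET = 9; // graphemes reserved per thread post in Step 1

// Same format as the post-building loop: "(N/Total) ".
function threadIndicator(index: number, total: number): string {
  return `(${index + 1}/${total}) `;
}

// The last post has the widest N, so checking it covers the whole thread.
function assertIndicatorFits(total: number): void {
  const widest = threadIndicator(total - 1, total);
  if (widest.length > THREAD_INDICATOR_BUDGET) {
    throw new Error(
      `Thread of ${total} posts needs ${widest.length} graphemes for "${widest.trim()}"`,
    );
  }
}

const example = threadIndicator(9, 15); // "(10/15) "
assertIndicatorFits(15); // "(15/15) " is 8 graphemes
assertIndicatorFits(99); // "(99/99) " is 8 graphemes
```

Calling the guard once with `chunks.length` before building posts would surface an over-long thread as a clear error instead of a post that fails the final 300-grapheme check.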

## Implementation Steps

1. **Extract constants at the top**
   - Calculate `linkGraphemes` from actual URL
   - Define `threadIndicatorGraphemes = 9` (the worst case `(99/99) ` is 8, plus 1 of slack)
   - Define `safetyBuffer = 5`

2. **Fix splitIntoChunks function**
   - Replace character-based substring with grapheme-aware splitting
   - Use RichText.graphemeLength for all length checks
   - When shrinking text, size and cut the chunk in graphemes, never by raw `.length`, so no grapheme cluster is split

3. **Add comprehensive logging**
   - Log chunk grapheme counts before adding overhead
   - Log final post grapheme counts
   - Log URL used and its grapheme length

4. **Test edge cases**
   - Long Vercel preview URLs (100+ chars)
   - Text with emojis and multi-byte characters
   - Text that needs 10+ chunks (thread indicators "(10/15)")
   - Text exactly at boundaries
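For the logging step, a sketch of the lines the plan asks for: the URL and its grapheme length, each chunk before overhead, and each final post. The `[POST /api/nodes]` prefix matches the existing error logs in Step 4; `describePosts` is a hypothetical helper that returns strings instead of calling `console.log` so it can be tested, and `Intl.Segmenter` stands in for `RichText.graphemeLength`.

```typescript
// Assumption: Intl.Segmenter stands in for RichText.graphemeLength.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const getGraphemeLength = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

const PREFIX = "[POST /api/nodes]";

// Hypothetical helper: one line for the URL, one per chunk (before overhead),
// and one per final post (after the thread indicator or link is added).
function describePosts(detailUrl: string, chunks: string[], posts: string[]): string[] {
  const lines = [`${PREFIX} URL ${detailUrl} (${getGraphemeLength(detailUrl)} graphemes)`];
  chunks.forEach((chunk, i) => {
    lines.push(`${PREFIX} chunk ${i + 1}/${chunks.length}: ${getGraphemeLength(chunk)} graphemes`);
  });
  posts.forEach((post, i) => {
    lines.push(
      `${PREFIX} post ${i + 1}/${posts.length}: ${getGraphemeLength(post)} graphemes (${post.length} UTF-16 units)`,
    );
  });
  return lines;
}

const url = "http://localhost:3000/galaxy/abc";
const lines = describePosts(url, ["Hi 👋"], [`Hi 👋\n\nRead more: ${url}`]);
lines.forEach((line) => console.log(line));
```

Logging both the grapheme count and the UTF-16 length makes emoji-related discrepancies visible at a glance.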

## Files to Modify

- `app/api/nodes/route.ts` - Replace `splitIntoChunks()` function

## Test Cases

### Test Case 1: Short text (fits in one post)
**Input:**
- Title: "Test"
- Body: "Short content"
- Expected: 1 post with link

### Test Case 2: Long text (needs splitting)
**Input:**
- Title: "Long Article"
- Body: 500 graphemes of text
- Expected: 2-3 posts, first with link, others with thread indicators

### Test Case 3: Text with emojis
**Input:**
- Title: "🎉 Celebration"
- Body: "Hello 👋 World 🌍" repeated to 400 graphemes
- Expected: Correct grapheme counting (emojis = 1 grapheme each)
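One way to build the Test Case 3 body to exactly 400 graphemes. A sketch; `repeatToGraphemes` is a hypothetical test helper and `Intl.Segmenter` stands in for `RichText`.

```typescript
// Assumption: Intl.Segmenter is available (Node 16+, TS lib "es2022").
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const toGraphemes = (text: string): string[] =>
  Array.from(segmenter.segment(text), (s) => s.segment);

// Repeat `unit` (joined by spaces) and cut to exactly `target` graphemes,
// cutting on grapheme boundaries so no emoji is split.
function repeatToGraphemes(unit: string, target: number): string {
  const perUnit = toGraphemes(unit).length + 1; // +1 for the joining space
  const copies = Math.ceil(target / perUnit) + 1;
  const long = new Array<string>(copies).fill(unit).join(" ");
  return toGraphemes(long).slice(0, target).join("");
}

const body = repeatToGraphemes("Hello 👋 World 🌍", 400);
console.log(toGraphemes(body).length, body.length);
```

The body is 400 graphemes but 450 UTF-16 code units (50 emojis at 2 units each), which is exactly the gap the old `.length`-based code mishandled.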

### Test Case 4: Vercel preview URL
**Input:**
- NEXT_PUBLIC_APP_URL: `https://ponderants-git-development-abc123.vercel.app`
- Expected: First-post budget shrinks to fit the longer (~100-char) link suffix; no post exceeds 300 graphemes

### Test Case 5: Exactly at boundary
**Input:**
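- Text of exactly `firstPostMaxGraphemes` graphemes, i.e. 300 minus the link suffix minus the 5-grapheme safety buffer (295 graphemes including the link)
- Expected: 1 post, no error; one more grapheme should produce 2 posts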
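With the Step 2 budgets, the largest text that stays in a single post is `300 - linkGraphemes - 5` graphemes. A sketch that builds inputs on either side of that line; `fitsInFirstPost` is a hypothetical check, the 72-grapheme link is the localhost figure from problem 2, and `Intl.Segmenter` stands in for `RichText`.

```typescript
// Assumption: Intl.Segmenter stands in for RichText.graphemeLength.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const getGraphemeLength = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

const POST_LIMIT = 300;
const SAFETY_BUFFER = 5;
// Assumption: localhost link suffix from problem 2 (30-char node ID).
const linkGraphemes = 72;
const firstPostMaxGraphemes = POST_LIMIT - linkGraphemes - SAFETY_BUFFER; // 223

// Hypothetical check mirroring Step 3: the text stays in one chunk
// only if its grapheme count is within the first-post budget.
const fitsInFirstPost = (text: string): boolean =>
  getGraphemeLength(text) <= firstPostMaxGraphemes;

// Emoji push .length to 446 even though the grapheme count is only 223.
const atBoundary = "🌍".repeat(firstPostMaxGraphemes);
const overByOne = atBoundary + "🌍";

console.log(fitsInFirstPost(atBoundary), fitsInFirstPost(overByOne)); // true false
```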

## Validation

After implementation, verify:
1. No posts exceed 300 graphemes
2. Splitting happens at word boundaries when possible
3. All chunks account for thread indicators
4. First post always includes detail URL
5. Works with emoji and multi-byte characters
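Checks 1, 4 and 5 can be automated over the final post texts. A sketch; `validatePosts` and `hasLoneSurrogate` are hypothetical helpers, the detail URL is passed in as a parameter, an unpaired surrogate is treated as evidence that a split cut through a character, and `Intl.Segmenter` stands in for `RichText`.

```typescript
// Assumption: Intl.Segmenter stands in for RichText.graphemeLength.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const getGraphemeLength = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

// True if the string contains a high surrogate without a following low
// surrogate, or a low surrogate without a preceding high surrogate.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return true;
      i++; // skip the matching low surrogate
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      return true;
    }
  }
  return false;
}

// Returns a list of problems; an empty list means the posts pass.
function validatePosts(posts: string[], detailUrl: string, limit = 300): string[] {
  const problems: string[] = [];
  posts.forEach((post, i) => {
    const n = getGraphemeLength(post);
    if (n > limit) problems.push(`post ${i + 1}: ${n} graphemes > ${limit}`);
    if (hasLoneSurrogate(post)) problems.push(`post ${i + 1}: broken surrogate pair`);
  });
  if (posts.length === 0 || !posts[0].includes(detailUrl)) {
    problems.push("first post is missing the detail URL");
  }
  return problems;
}

const url = "https://ponderants.app/galaxy/abc";
console.log(validatePosts([`Hello 👋\n\nRead more: ${url}`, "(2/2) more text"], url));
```

Check 2 (word boundaries) depends on the input text and is easier to spot-check by eye in the logged chunks.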