# Plan: Fix Grapheme Computation (Text Splitting)

**Priority:** HIGH - Blocking production node creation

## Current Implementation (Broken)

### Problems Identified

1. **Line 113**: Uses character length instead of grapheme length:
   ```typescript
   testText = testText.substring(0, Math.floor(testText.length * 0.9));
   ```
   `length` counts UTF-16 code units, not graphemes, so with emojis or other multi-byte chars the 10% shrink does not track the grapheme count and the loop may not converge on the limit.

2. **Variable URL lengths**: URL can be 72-112 chars depending on environment:
   - `http://localhost:3000`: 72 chars
   - `https://ponderants.app`: 73 chars
   - `https://www.ponderants.com`: 77 chars
   - `https://ponderants-dev-preview-abc123.vercel.app`: 99 chars

3. **Pre-calculates limit**: Computes `linkGraphemes` once with the current URL, but doesn't account for the worst case.

## Correct Algorithm

### Step 1: Calculate overhead for each post type

```typescript
const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(nodeId)}`;
const linkSuffix = `\n\nRead more: ${detailUrl}`;
const linkGraphemes = getGraphemeLength(linkSuffix);

// Thread indicator: "(N/Total) " where both N and Total can be 1-99
// Worst case: "(99/99) " = 8 characters; 9 leaves one grapheme of slack
const threadIndicatorGraphemes = 9;

// Safety buffer to account for RichText facet detection potentially adding chars
const safetyBuffer = 5;
```

### Step 2: Calculate max graphemes for each post type

```typescript
const firstPostMaxGraphemes = 300 - linkGraphemes - safetyBuffer;
const threadPostMaxGraphemes = 300 - threadIndicatorGraphemes - safetyBuffer;
```

### Step 3: Split fullText by GRAPHEME count

```typescript
import { RichText } from '@atproto/api';

// getGraphemeLength is the existing helper used in Step 1
// (assumed here to return RichText.graphemeLength for the given text).
function splitByGraphemes(text: string, firstMax: number, otherMax: number): string[] {
  // A non-positive budget (e.g. a very long URL) would otherwise produce
  // empty chunks and loop forever, so fail fast instead.
  if (firstMax <= 0 || otherMax <= 0) {
    throw new Error(`Invalid grapheme budget: first=${firstMax}, other=${otherMax}`);
  }

  const chunks: string[] = [];
  let remainingText = text.trim();
  let isFirst = true;

  while (remainingText.length > 0) {
    const maxGraphemes = isFirst ? firstMax : otherMax;
    const rt = new RichText({ text: remainingText });

    if (rt.graphemeLength <= maxGraphemes) {
      // Rest of text fits in one chunk
      chunks.push(remainingText);
      break;
    }

    // Need to split - find the split point
    let testText = remainingText;

    // Shrink the candidate step by step until it fits
    while (getGraphemeLength(testText) > maxGraphemes) {
      // Find last word boundary before current position
      const lastSpace = testText.lastIndexOf(' ');

      if (lastSpace > testText.length * 0.5) {
        // Good word boundary found
        testText = testText.substring(0, lastSpace);
      } else {
        // No good word boundary - shrink by grapheme-aware amount
        // Take (maxGraphemes / currentGraphemes) * currentLength
        const currentGraphemes = getGraphemeLength(testText);
        const ratio = maxGraphemes / currentGraphemes;
        // 0.95 for safety; never shrink to an empty string
        const newLength = Math.max(1, Math.floor(testText.length * ratio * 0.95));
        testText = testText.substring(0, newLength);
      }
    }

    chunks.push(testText.trim());
    // testText is still an untrimmed prefix of remainingText here
    remainingText = remainingText.substring(testText.length).trim();
    isFirst = false;
  }

  return chunks;
}
```

### Step 4: Build posts with proper grapheme validation

```typescript
const chunks = splitByGraphemes(fullText, firstPostMaxGraphemes, threadPostMaxGraphemes);

for (let i = 0; i < chunks.length; i++) {
  const isFirstPost = i === 0;
  let postText = chunks[i];

  // Add thread indicator if needed
  if (chunks.length > 1 && !isFirstPost) {
    postText = `(${i + 1}/${chunks.length}) ${postText}`;
  }

  // Add link to first post
  if (isFirstPost) {
    postText += linkSuffix;
  }

  // Final validation
  const finalGraphemes = getGraphemeLength(postText);
  if (finalGraphemes > 300) {
    console.error(`[POST /api/nodes] Post ${i + 1} exceeds limit: ${finalGraphemes} graphemes`);
    console.error(`[POST /api/nodes] Content: ${postText.substring(0, 100)}...`);
    throw new Error(`Post exceeds 300 grapheme limit: ${finalGraphemes}`);
  }

  // Continue with post creation...
}
```

## Implementation Steps

1. **Extract constants at the top**
   - Calculate `linkGraphemes` from actual URL
   - Define `threadIndicatorGraphemes = 9` (covers the worst case `"(99/99) "`)
   - Define `safetyBuffer = 5`

2. **Fix splitIntoChunks function**
   - Replace character-based substring with grapheme-aware splitting
   - Use RichText.graphemeLength for all length checks
   - When shrinking text, calculate ratio based on graphemes, not chars

3. **Add comprehensive logging**
   - Log chunk grapheme counts before adding overhead
   - Log final post grapheme counts
   - Log URL used and its grapheme length

4. **Test edge cases**
   - Long Vercel preview URLs (100+ chars)
   - Text with emojis and multi-byte characters
   - Text that needs 10+ chunks (thread indicators "(10/15)")
   - Text exactly at boundaries

## Files to Modify

- `app/api/nodes/route.ts` - Replace `splitIntoChunks()` function

## Test Cases

### Test Case 1: Short text (fits in one post)

**Input:**
- Title: "Test"
- Body: "Short content"
- Expected: 1 post with link

### Test Case 2: Long text (needs splitting)

**Input:**
- Title: "Long Article"
- Body: 500 graphemes of text
- Expected: 2-3 posts, first with link, others with thread indicators

### Test Case 3: Text with emojis

**Input:**
- Title: "🎉 Celebration"
- Body: "Hello 👋 World 🌍" repeated to 400 graphemes
- Expected: Correct grapheme counting (emojis = 1 grapheme each)

### Test Case 4: Vercel preview URL

**Input:**
- NEXT_PUBLIC_APP_URL: `https://ponderants-git-development-abc123.vercel.app`
- Expected: First-post limit accounts for the ~100 char URL length

### Test Case 5: Exactly at boundary

**Input:**
- Text that's exactly 300 graphemes including link
- Expected: 1 post, no error

## Validation

After implementation, verify:

1. No posts exceed 300 graphemes
2. Splitting happens at word boundaries when possible
3. All chunks account for thread indicators
4. First post always includes detail URL
5. Works with emoji and multi-byte characters
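
## Appendix: Grapheme-Boundary Splitting Sketch

The ratio-based shrink in Step 3 still cuts at UTF-16 offsets, so it can land inside an emoji sequence. As a possible alternative for checking the validation items on limits, word boundaries, and emoji handling, the sketch below splits on grapheme boundaries directly. It is not the implementation in `route.ts`. The names `toGraphemes`, `countGraphemes`, and `splitOnGraphemeBoundaries` are hypothetical, and it assumes a runtime with `Intl.Segmenter` (Node 16+ with the ES2022 Intl typings), whose counts may differ slightly from `RichText.graphemeLength` on unusual input.

```typescript
// Assumption: Intl.Segmenter is available (Node 16+, TS lib "es2022.intl").
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

/** Split text into grapheme clusters (user-perceived characters). */
function toGraphemes(text: string): string[] {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

/** Hypothetical stand-in for the plan's getGraphemeLength helper. */
function countGraphemes(text: string): number {
  return toGraphemes(text).length;
}

/** Split text so chunk 0 has at most firstMax graphemes and later chunks at most otherMax. */
function splitOnGraphemeBoundaries(text: string, firstMax: number, otherMax: number): string[] {
  if (firstMax < 1 || otherMax < 1) {
    throw new Error(`Grapheme budgets must be positive: ${firstMax}, ${otherMax}`);
  }

  const chunks: string[] = [];
  let rest = toGraphemes(text.trim());

  while (rest.length > 0) {
    const max = chunks.length === 0 ? firstMax : otherMax;
    if (rest.length <= max) {
      chunks.push(rest.join(''));
      break;
    }

    // Prefer the last space in the second half of the window;
    // if there is none, hard-cut after exactly `max` graphemes.
    let cut = max;
    for (let i = max; i > Math.floor(max / 2); i--) {
      if (rest[i] === ' ') {
        cut = i;
        break;
      }
    }

    chunks.push(rest.slice(0, cut).join('').trim());
    // Re-segment after trimming so the next window starts on real content.
    rest = toGraphemes(rest.slice(cut).join('').trim());
  }

  return chunks;
}

// Usage: a 15-grapheme string with an 8-grapheme budget splits at the space.
const example = splitOnGraphemeBoundaries('aaa bbb ccc ddd', 8, 8);
console.log(example); // [ 'aaa bbb', 'ccc ddd' ]
```

Working on an array of grapheme clusters means every cut index is a grapheme boundary by construction, so no shrink loop or safety factor is needed for the split itself. The `safetyBuffer` from Step 1 would still apply to the budgets passed in.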