app/plans/fix-grapheme-splitting.md
Albert d8a975122f feat: Fix grapheme splitting and add automatic UMAP calculation
Critical fixes for core functionality:

1. Fixed grapheme-aware text splitting (app/api/nodes/route.ts)
   - Changed character-based substring to grapheme-ratio calculation
   - Now properly handles emojis and multi-byte characters
   - Prevents posts from exceeding the 300-grapheme Bluesky limit
   - Added comprehensive logging for debugging

2. Automatic UMAP coordinate calculation (app/api/nodes/route.ts)
   - Triggers /api/calculate-graph automatically after node creation
   - Only when user has 3+ nodes with embeddings (UMAP minimum)
   - Non-blocking background process
   - Eliminates need for manual "Calculate Graph" button
   - Galaxy visualization ready on first visit

3. Simplified galaxy route (app/api/galaxy/route.ts)
   - Removed auto-trigger logic (now handled on insertion)
   - Simply returns existing coordinates
   - More efficient, no redundant calculations

4. Added idempotency (app/api/calculate-graph/route.ts)
   - Safe to call multiple times
   - Returns early if all nodes already have coordinates
   - Better logging for debugging

Implementation plans documented in /plans directory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 20:19:20 +00:00


Plan: Fix Grapheme Computation (Text Splitting)

Priority: HIGH - Blocking production node creation

Current Implementation (Broken)

Problems Identified

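  1. Line 113: Uses character (UTF-16 code unit) length instead of grapheme length:

    testText = testText.substring(0, Math.floor(testText.length * 0.9));

    With emojis or other characters that span several UTF-16 code units, the shrink step works in code units while the limit is in graphemes, and substring can cut through a surrogate pair or an emoji sequence. As a result the loop does not reliably converge on a valid split.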

  2. Variable URL lengths: the link can be 72-112 chars depending on the environment:

    • http://localhost:3000: 72 chars
    • https://ponderants.app: 73 chars
    • https://www.ponderants.com: 77 chars
    • https://ponderants-dev-preview-abc123.vercel.app: 99 chars
  3. Pre-calculated limit: computes linkGraphemes once from the current URL but doesn't account for the worst case.
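To make problem 1 concrete, the sketch below compares UTF-16 `.length` with a grapheme count on the kind of string used in Test Case 3. It uses `Intl.Segmenter` (built into modern Node and browsers) as a stand-in for `RichText.graphemeLength`; that substitution is an assumption of the sketch, not what route.ts does. The emoji are written as `\u{...}` escapes so the exact code points are unambiguous.

```typescript
// Stand-in grapheme counter (assumption: Intl.Segmenter rather than RichText).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function countGraphemes(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

// "Hello 👋 World 🌍": each emoji is one grapheme but two UTF-16 code units.
const sample = "Hello \u{1F44B} World \u{1F30D}";

console.log(sample.length); // 17 code units
console.log(countGraphemes(sample)); // 15 graphemes

// Cutting by code-unit index can stop halfway through an emoji:
// index 6 is the high surrogate of 👋, so this keeps a lone surrogate.
const cut = sample.substring(0, 7);
const lastCode = cut.charCodeAt(cut.length - 1);
const endsInLoneHighSurrogate = lastCode >= 0xd800 && lastCode <= 0xdbff;
console.log(endsInLoneHighSurrogate); // true
```

Any shrink rule expressed in `.length` units inherits both problems: the step size is not proportional to the grapheme overshoot, and the cut point is not guaranteed to be a grapheme boundary.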

Correct Algorithm

Step 1: Calculate overhead for each post type

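import { RichText } from '@atproto/api';

// Grapheme counter used throughout (define it only if route.ts doesn't already have one)
const getGraphemeLength = (text: string): number => new RichText({ text }).graphemeLength;

// baseUrl and nodeId come from the route handler
const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(nodeId)}`;
const linkSuffix = `\n\nRead more: ${detailUrl}`;
const linkGraphemes = getGraphemeLength(linkSuffix);

// Thread indicator: "(N/Total) " where both N and Total can be 1-99
// Worst case: "(99/99) " = 8 graphemes; reserve 9 for one grapheme of slack
const threadIndicatorGraphemes = 9;

// Safety buffer to account for RichText facet detection potentially adding chars
const safetyBuffer = 5;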
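As a cross-check on the figures under Problems Identified, the sketch below computes `linkGraphemes` the same way Step 1 does for each listed base URL. The 30-character node ID is a hypothetical placeholder and `Intl.Segmenter` stands in for `getGraphemeLength`. With that ID length the results come out to 72, 73, 77 and 99, which suggests the listed figures describe the whole `linkSuffix` rather than the bare base URL.

```typescript
// Stand-in for getGraphemeLength (assumption: Intl.Segmenter, not RichText).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const getGraphemeLength = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

// Same construction as Step 1.
function linkSuffixFor(baseUrl: string, nodeId: string): string {
  const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(nodeId)}`;
  return `\n\nRead more: ${detailUrl}`;
}

const exampleNodeId = "x".repeat(30); // hypothetical 30-char node ID

const baseUrls = [
  "http://localhost:3000",
  "https://ponderants.app",
  "https://www.ponderants.com",
  "https://ponderants-dev-preview-abc123.vercel.app",
];

// "\n\n" counts as two graphemes; the rest of the suffix is plain ASCII.
const results = baseUrls.map((baseUrl) =>
  getGraphemeLength(linkSuffixFor(baseUrl, exampleNodeId)),
);
console.log(results); // [72, 73, 77, 99]
```

Because every environment shares the fixed 21-grapheme scaffolding ("\n\nRead more: " plus "/galaxy/"), the spread between environments comes entirely from the base URL and the node ID.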

Step 2: Calculate max graphemes for each post type

const firstPostMaxGraphemes = 300 - linkGraphemes - safetyBuffer;
const threadPostMaxGraphemes = 300 - threadIndicatorGraphemes - safetyBuffer;
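Worked through with the Step 1 constants, the budgets for the shortest and longest example links look like this. The inputs 72 and 99 are the link lengths listed under Problems Identified; `postBudgets` is a hypothetical helper name used only for illustration.

```typescript
// Constants mirroring Step 1 (threadIndicatorGraphemes = 9, safetyBuffer = 5).
const BLUESKY_LIMIT = 300;
const THREAD_INDICATOR_GRAPHEMES = 9;
const SAFETY_BUFFER = 5;

function postBudgets(linkGraphemes: number): { firstPostMax: number; threadPostMax: number } {
  return {
    // First post: body + link suffix must fit, minus the safety buffer.
    firstPostMax: BLUESKY_LIMIT - linkGraphemes - SAFETY_BUFFER,
    // Later posts: body + "(N/Total) " must fit, minus the safety buffer.
    threadPostMax: BLUESKY_LIMIT - THREAD_INDICATOR_GRAPHEMES - SAFETY_BUFFER,
  };
}

console.log(postBudgets(72)); // { firstPostMax: 223, threadPostMax: 286 }
console.log(postBudgets(99)); // { firstPostMax: 196, threadPostMax: 286 }
```

Only the first-post budget depends on the environment; under these constants every later post gets 286 graphemes of body text.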

Step 3: Split fullText by GRAPHEME count

function splitByGraphemes(text: string, firstMax: number, otherMax: number): string[] {
  const chunks: string[] = [];
  let remainingText = text;
  let isFirst = true;

  while (remainingText.length > 0) {
    const maxGraphemes = isFirst ? firstMax : otherMax;
    const rt = new RichText({ text: remainingText });

    if (rt.graphemeLength <= maxGraphemes) {
      // Rest of text fits in one chunk
      chunks.push(remainingText);
      break;
    }

    // Need to split - find the split point
    let testText = remainingText;

    // Shrink until the text fits (iterative trimming, not a binary search)
    while (getGraphemeLength(testText) > maxGraphemes) {
      // Find last word boundary before current position
      const lastSpace = testText.lastIndexOf(' ');
      if (lastSpace > testText.length * 0.5) {
        // Good word boundary found
        testText = testText.substring(0, lastSpace);
      } else {
        // No good word boundary - shrink by grapheme-aware amount
        // Take (maxGraphemes / currentGraphemes) * currentLength
        const currentGraphemes = getGraphemeLength(testText);
        const ratio = maxGraphemes / currentGraphemes;
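        let newLength = Math.floor(testText.length * ratio * 0.95); // 0.95 for safety
        // Don't cut between the two halves of a surrogate pair (e.g. inside an emoji)
        const lastCode = testText.charCodeAt(newLength - 1);
        if (lastCode >= 0xd800 && lastCode <= 0xdbff) newLength -= 1;
        testText = testText.substring(0, newLength);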
      }
    }

    chunks.push(testText.trim());
    remainingText = remainingText.substring(testText.length).trim();
    isFirst = false;
  }

  return chunks;
}
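An alternative worth considering, sketched here rather than taken from the plan: segment the text into graphemes once and cut on grapheme indices. This removes the ratio approximation and any chance of splitting a surrogate pair. `splitAtGraphemeBoundaries` is a hypothetical name, `Intl.Segmenter` stands in for `RichText.graphemeLength` (the two are assumed to agree on ordinary text), and the half-window rule for preferring a space only approximates the plan's `lastSpace > testText.length * 0.5` check.

```typescript
// Stand-in grapheme segmenter (assumption: Intl.Segmenter, not RichText).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function splitAtGraphemeBoundaries(text: string, firstMax: number, otherMax: number): string[] {
  if (firstMax < 1 || otherMax < 1) throw new Error("budgets must be positive");

  // Segment once; every index below is a grapheme index, so no cut can land
  // inside an emoji or a surrogate pair.
  const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);
  const chunks: string[] = [];
  let start = 0;

  while (start < graphemes.length) {
    const maxGraphemes = chunks.length === 0 ? firstMax : otherMax;
    let end = Math.min(start + maxGraphemes, graphemes.length);

    if (end < graphemes.length) {
      // Prefer the last space in the second half of the window; if there is
      // none, hard-cut at exactly maxGraphemes.
      for (let i = end - 1; i > start + maxGraphemes / 2; i--) {
        if (graphemes[i] === " ") {
          end = i;
          break;
        }
      }
    }

    // The space at the cut starts the next window and is trimmed off there.
    const chunk = graphemes.slice(start, end).join("").trim();
    if (chunk.length > 0) chunks.push(chunk);
    start = end;
  }

  return chunks;
}
```

Each chunk is at most `maxGraphemes` graphemes before trimming, so it can only shrink afterwards, and every loop iteration advances `start`, so the loop always terminates.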

Step 4: Build posts with proper grapheme validation

const chunks = splitByGraphemes(fullText, firstPostMaxGraphemes, threadPostMaxGraphemes);

for (let i = 0; i < chunks.length; i++) {
  const isFirstPost = i === 0;
  let postText = chunks[i];

  // Add thread indicator if needed
  if (chunks.length > 1 && !isFirstPost) {
    postText = `(${i + 1}/${chunks.length}) ${postText}`;
  }

  // Add link to first post
  if (isFirstPost) {
    postText += linkSuffix;
  }

  // Final validation
  const finalGraphemes = getGraphemeLength(postText);
  if (finalGraphemes > 300) {
    console.error(`[POST /api/nodes] Post ${i + 1} exceeds limit: ${finalGraphemes} graphemes`);
    console.error(`[POST /api/nodes] Content: ${postText.substring(0, 100)}...`);
    throw new Error(`Post exceeds 300 grapheme limit: ${finalGraphemes}`);
  }

  // Continue with post creation...
}
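The Step 4 loop can also be written as a pure function, which makes it unit-testable without posting anything. This is a sketch: `buildPosts` is a hypothetical name, the link suffix below is a made-up localhost example, and `Intl.Segmenter` stands in for `getGraphemeLength`.

```typescript
// Stand-in grapheme counter (assumption: Intl.Segmenter, not RichText).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const getGraphemeLength = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

const BLUESKY_LIMIT = 300;

// Hypothetical pure version of the Step 4 loop: decorate chunks, then validate.
function buildPosts(chunks: string[], linkSuffix: string): string[] {
  return chunks.map((chunk, i) => {
    // Same decoration rules as Step 4: link on the first post,
    // "(N/Total) " on every later post.
    const postText = i === 0 ? chunk + linkSuffix : `(${i + 1}/${chunks.length}) ${chunk}`;

    const finalGraphemes = getGraphemeLength(postText);
    if (finalGraphemes > BLUESKY_LIMIT) {
      throw new Error(`Post ${i + 1} exceeds ${BLUESKY_LIMIT} grapheme limit: ${finalGraphemes}`);
    }
    return postText;
  });
}

// Hypothetical link suffix for illustration.
const exampleLink = "\n\nRead more: http://localhost:3000/galaxy/abc";
const posts = buildPosts(["First part", "Second part"], exampleLink);
console.log(posts[1]); // (2/2) Second part
```

Keeping decoration and validation separate from the Bluesky API call means the test cases can assert on plain strings.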

Implementation Steps

  1. Extract constants at the top

    • Calculate linkGraphemes from actual URL
    • Define threadIndicatorGraphemes = 9 (worst case "(99/99) " is 8, plus 1 of slack)
    • Define safetyBuffer = 5
  2. Fix splitIntoChunks function

    • Replace character-based substring with grapheme-aware splitting
    • Use RichText.graphemeLength for all length checks
    • When shrinking text, calculate ratio based on graphemes, not chars
  3. Add comprehensive logging

    • Log chunk grapheme counts before adding overhead
    • Log final post grapheme counts
    • Log URL used and its grapheme length
  4. Test edge cases

    • Long Vercel preview URLs (100+ chars)
    • Text with emojis and multi-byte characters
    • Text that needs 10+ chunks (thread indicators "(10/15)")
    • Text exactly at boundaries
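For the "10+ chunks" edge case, a quick sketch can confirm that the reserved 9 graphemes cover every "(N/Total) " prefix Step 4 produces while Total stays at or below 99, which is the plan's stated assumption. The helper names are hypothetical. The prefixes are pure ASCII, so `.length` equals the grapheme count here.

```typescript
// Width of the "(N/Total) " prefix; ASCII only, so code units == graphemes.
function indicatorWidth(n: number, total: number): number {
  return `(${n}/${total}) `.length;
}

// Widest prefix over all threads up to maxTotal posts. Post 1 never gets an
// indicator in Step 4, so n starts at 2.
function widestIndicator(maxTotal: number): number {
  let widest = 0;
  for (let total = 2; total <= maxTotal; total++) {
    for (let n = 2; n <= total; n++) {
      widest = Math.max(widest, indicatorWidth(n, total));
    }
  }
  return widest;
}

console.log(widestIndicator(15)); // 8, e.g. "(10/15) "
console.log(widestIndicator(99)); // 8, e.g. "(99/99) "
console.log(indicatorWidth(100, 100)); // 10, exceeds the 9 reserved
```

The last line shows where the assumption breaks: a thread of 100 or more posts would need a wider reservation.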

Files to Modify

  • app/api/nodes/route.ts - Replace splitIntoChunks() function

Test Cases

Test Case 1: Short text (fits in one post)

Input:

  • Title: "Test"
  • Body: "Short content"
  • Expected: 1 post with link

Test Case 2: Long text (needs splitting)

Input:

  • Title: "Long Article"
  • Body: 500 graphemes of text
  • Expected: 2-3 posts, first with link, others with thread indicators

Test Case 3: Text with emojis

Input:

  • Title: "🎉 Celebration"
  • Body: "Hello 👋 World 🌍" repeated to 400 graphemes
  • Expected: Correct grapheme counting (emojis = 1 grapheme each)

Test Case 4: Vercel preview URL

Input:

  • NEXT_PUBLIC_APP_URL: https://ponderants-git-development-abc123.vercel.app
  • Expected: linkGraphemes reflects the ~100-char link, and no post exceeds 300 graphemes

Test Case 5: Exactly at boundary

Input:

  • Body text exactly firstPostMaxGraphemes long (300 minus the link suffix minus the 5-grapheme safety buffer)
  • Expected: 1 post, no error
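For Test Cases 3 and 5 it helps to build bodies with an exact grapheme count. In this sketch `repeatToGraphemes` is a hypothetical helper and `Intl.Segmenter` again stands in for `RichText.graphemeLength`. The boundary example assumes a 72-grapheme link (the localhost figure), which under Step 2 gives a first-post budget of 300 - 72 - 5 = 223 graphemes.

```typescript
// Stand-in grapheme counter (assumption: Intl.Segmenter, not RichText).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const countGraphemes = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

// Repeat `unit`, then truncate to exactly `target` graphemes.
function repeatToGraphemes(unit: string, target: number): string {
  const unitGraphemes = countGraphemes(unit);
  if (unitGraphemes === 0) throw new Error("unit must be non-empty");
  const repeated = unit.repeat(Math.ceil(target / unitGraphemes));
  // Slice by grapheme, not by code unit, so no emoji is cut in half.
  return Array.from(segmenter.segment(repeated), (s) => s.segment)
    .slice(0, target)
    .join("");
}

// Test Case 3 body: 400 graphemes, but more UTF-16 code units.
const emojiBody = repeatToGraphemes("Hello \u{1F44B} World \u{1F30D} ", 400);
console.log(countGraphemes(emojiBody), emojiBody.length); // 400 450

// Test Case 5 body: exactly the first-post budget for a 72-grapheme link.
const boundaryBody = repeatToGraphemes("word ", 223);
console.log(countGraphemes(boundaryBody)); // 223
```

The 50-unit gap between 400 graphemes and 450 code units is exactly the error the old `.length`-based shrink step could not see.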

Validation

After implementation, verify:

  1. No posts exceed 300 graphemes
  2. Splitting happens at word boundaries when possible
  3. All chunks account for thread indicators
  4. First post always includes detail URL
  5. Works with emoji and multi-byte characters
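Most of this checklist can be automated. The sketch below covers items 1, 3 and 4 directly and approximates item 5 by flagging lone surrogates (a half-cut emoji). Item 2 (word boundaries) still needs inspection against the test-case fixtures. `validatePosts` and `hasLoneSurrogate` are hypothetical helpers, and `Intl.Segmenter` stands in for `RichText.graphemeLength`.

```typescript
// Stand-in grapheme counter (assumption: Intl.Segmenter, not RichText).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const countGraphemes = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

// True if a high surrogate lacks its low half or a low surrogate appears alone.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair, skip the low half
        continue;
      }
      return true;
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true;
  }
  return false;
}

function validatePosts(posts: string[], detailUrl: string): string[] {
  const problems: string[] = [];
  posts.forEach((post, i) => {
    const graphemes = countGraphemes(post);
    if (graphemes > 300) problems.push(`post ${i + 1}: ${graphemes} graphemes`); // item 1
    if (i > 0 && !post.startsWith(`(${i + 1}/${posts.length}) `)) {
      problems.push(`post ${i + 1}: missing thread indicator`); // item 3
    }
    if (hasLoneSurrogate(post)) problems.push(`post ${i + 1}: split emoji`); // item 5
  });
  if (posts.length > 0 && !posts[0].includes(detailUrl)) {
    problems.push("post 1: missing detail URL"); // item 4
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one makes it easier to see every failure in one test run.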