Critical fixes for core functionality:

1. Fixed grapheme-aware text splitting (app/api/nodes/route.ts)
   - Changed character-based substring to grapheme-ratio calculation
   - Now properly handles emojis and multi-byte characters
   - Prevents posts from exceeding 300 grapheme Bluesky limit
   - Added comprehensive logging for debugging

2. Automatic UMAP coordinate calculation (app/api/nodes/route.ts)
   - Triggers /api/calculate-graph automatically after node creation
   - Only when user has 3+ nodes with embeddings (UMAP minimum)
   - Non-blocking background process
   - Eliminates need for manual "Calculate Graph" button
   - Galaxy visualization ready on first visit

3. Simplified galaxy route (app/api/galaxy/route.ts)
   - Removed auto-trigger logic (now handled on insertion)
   - Simply returns existing coordinates
   - More efficient, no redundant calculations

4. Added idempotency (app/api/calculate-graph/route.ts)
   - Safe to call multiple times
   - Returns early if all nodes already have coordinates
   - Better logging for debugging

Implementation plans documented in /plans directory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Plan: Fix Grapheme Computation (Text Splitting)
Priority: HIGH - Blocking production node creation
Current Implementation (Broken)
Problems Identified
- Line 113: Uses character length instead of grapheme length:
  testText = testText.substring(0, Math.floor(testText.length * 0.9));
  With emojis or multi-byte chars, the character length does not track the grapheme count, so this shrink step cannot converge properly on the limit.
- Variable URL lengths: URL can be 72-112 chars depending on environment:
  - http://localhost:3000: 72 chars
  - https://ponderants.app: 73 chars
  - https://www.ponderants.com: 77 chars
  - https://ponderants-dev-preview-abc123.vercel.app: 99 chars
- Pre-calculates limit: Computes linkGraphemes once with the current URL, but doesn't account for the worst-case URL length
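The gap between character length and grapheme length can be shown with a minimal sketch. It assumes the built-in Intl.Segmenter as a stand-in for the plan's getGraphemeLength helper; countGraphemes is a hypothetical name, not part of the codebase.

```typescript
// Assumption: Intl.Segmenter stands in for the RichText-based grapheme count.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

function countGraphemes(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

const wave = "\u{1F44B}"; // waving hand emoji, stored as a surrogate pair
const accented = "e\u0301"; // "e" followed by a combining acute accent

console.log(wave.length, countGraphemes(wave)); // 2 1
console.log(accented.length, countGraphemes(accented)); // 2 1

// Cutting by UTF-16 index can stop halfway through a surrogate pair.
const cut = wave.substring(0, 1);
console.log(cut.length, cut === wave); // 1 false
```

A 10% cut of `length` therefore removes an unpredictable number of graphemes and may leave half a character behind.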
Correct Algorithm
Step 1: Calculate overhead for each post type
const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(nodeId)}`;
const linkSuffix = `\n\nRead more: ${detailUrl}`;
const linkGraphemes = getGraphemeLength(linkSuffix);
// Thread indicator: "(N/Total) " where both N and Total can be 1-99
// Worst case: "(99/99) " = 8 characters; 9 leaves one grapheme of slack
const threadIndicatorGraphemes = 9;
// Safety buffer to account for RichText facet detection potentially adding chars
const safetyBuffer = 5;
Step 2: Calculate max graphemes for each post type
const firstPostMaxGraphemes = 300 - linkGraphemes - safetyBuffer;
const threadPostMaxGraphemes = 300 - threadIndicatorGraphemes - safetyBuffer;
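Steps 1 and 2 can be folded into one helper so the budgets are recomputed per environment. This is only a sketch: computeBudgets, the Budgets type, and the node ID are hypothetical, and Intl.Segmenter stands in for getGraphemeLength.

```typescript
// Assumption: Intl.Segmenter replaces the plan's getGraphemeLength helper.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

const POST_LIMIT = 300;
const THREAD_INDICATOR_GRAPHEMES = 9;
const SAFETY_BUFFER = 5;

interface Budgets {
  linkGraphemes: number;
  firstPostMax: number;
  threadPostMax: number;
}

// Hypothetical helper: derives both budgets from the URL actually in use.
function computeBudgets(baseUrl: string, nodeId: string): Budgets {
  const detailUrl = `${baseUrl}/galaxy/${encodeURIComponent(nodeId)}`;
  const linkSuffix = `\n\nRead more: ${detailUrl}`;
  const linkGraphemes = countGraphemes(linkSuffix);
  return {
    linkGraphemes,
    firstPostMax: POST_LIMIT - linkGraphemes - SAFETY_BUFFER,
    threadPostMax: POST_LIMIT - THREAD_INDICATOR_GRAPHEMES - SAFETY_BUFFER,
  };
}

// "node:abc123" is a made-up ID used only for illustration.
const budgets = computeBudgets("http://localhost:3000", "node:abc123");
console.log(budgets);
```

Because the first-post budget shrinks as the URL grows, a long preview URL leaves less room for text than localhost does.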
Step 3: Split fullText by GRAPHEME count
function splitByGraphemes(text: string, firstMax: number, otherMax: number): string[] {
  const chunks: string[] = [];
  let remainingText = text;
  let isFirst = true;

  while (remainingText.length > 0) {
    const maxGraphemes = isFirst ? firstMax : otherMax;
    const rt = new RichText({ text: remainingText });
    if (rt.graphemeLength <= maxGraphemes) {
      // Rest of text fits in one chunk
      chunks.push(remainingText);
      break;
    }

    // Need to split - find the split point
    let testText = remainingText;

    // Shrink the candidate until it fits the grapheme budget
    while (getGraphemeLength(testText) > maxGraphemes) {
      // Find last word boundary before current position
      const lastSpace = testText.lastIndexOf(' ');
      if (lastSpace > testText.length * 0.5) {
        // Good word boundary found
        testText = testText.substring(0, lastSpace);
      } else {
        // No good word boundary - shrink by grapheme-aware amount
        // Take (maxGraphemes / currentGraphemes) * currentLength
        const currentGraphemes = getGraphemeLength(testText);
        const ratio = maxGraphemes / currentGraphemes;
        let newLength = Math.floor(testText.length * ratio * 0.95); // 0.95 for safety
        // substring() indexes UTF-16 code units: back off one unit if the cut
        // would fall between the two halves of a surrogate pair
        const lastUnit = testText.charCodeAt(newLength - 1);
        if (lastUnit >= 0xd800 && lastUnit <= 0xdbff) newLength -= 1;
        testText = testText.substring(0, newLength);
      }
    }

    // An empty candidate means the budget is too small to make progress;
    // fail instead of looping forever
    if (testText.length === 0) {
      throw new Error(`Cannot split text: budget of ${maxGraphemes} graphemes is too small`);
    }

    chunks.push(testText.trim());
    remainingText = remainingText.substring(testText.length).trim();
    isFirst = false;
  }
  return chunks;
}
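The ratio-based shrink above still cuts by UTF-16 index, so a cut can land inside a multi-code-point cluster such as a ZWJ emoji sequence. One possible alternative, sketched here under the assumption that Intl.Segmenter counts graphemes closely enough to RichText, is to split on cluster boundaries directly. The name splitOnGraphemeBoundaries is hypothetical and this is not the plan's algorithm.

```typescript
// Assumption: Intl.Segmenter approximates the RichText grapheme count.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

function splitOnGraphemeBoundaries(text: string, firstMax: number, otherMax: number): string[] {
  if (firstMax <= 0 || otherMax <= 0) {
    throw new Error("Grapheme budgets must be positive");
  }
  const chunks: string[] = [];
  let remaining = text.trim();
  let isFirst = true;

  while (remaining.length > 0) {
    const max = isFirst ? firstMax : otherMax;
    // Each array entry is one user-perceived character (grapheme cluster).
    const clusters = Array.from(segmenter.segment(remaining), (s) => s.segment);
    if (clusters.length <= max) {
      chunks.push(remaining);
      break;
    }
    // Prefer the last space in the second half of the window; else hard cut.
    let cut = max;
    for (let i = max; i > Math.floor(max / 2); i--) {
      if (clusters[i] === " ") {
        cut = i;
        break;
      }
    }
    chunks.push(clusters.slice(0, cut).join("").trim());
    remaining = clusters.slice(cut).join("").trim();
    isFirst = false;
  }
  return chunks;
}

const sizes = splitOnGraphemeBoundaries("\u{1F44B}".repeat(25), 10, 10).map(
  (chunk) => Array.from(segmenter.segment(chunk)).length,
);
console.log(sizes); // [ 10, 10, 5 ]
```

The sketch re-segments the remainder for every chunk, which is simple but quadratic in the worst case; segmenting once and tracking an index would avoid that at the cost of more bookkeeping.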
Step 4: Build posts with proper grapheme validation
const chunks = splitByGraphemes(fullText, firstPostMaxGraphemes, threadPostMaxGraphemes);

for (let i = 0; i < chunks.length; i++) {
  const isFirstPost = i === 0;
  let postText = chunks[i];

  // Add thread indicator if needed
  if (chunks.length > 1 && !isFirstPost) {
    postText = `(${i + 1}/${chunks.length}) ${postText}`;
  }

  // Add link to first post
  if (isFirstPost) {
    postText += linkSuffix;
  }

  // Final validation
  const finalGraphemes = getGraphemeLength(postText);
  if (finalGraphemes > 300) {
    console.error(`[POST /api/nodes] Post ${i + 1} exceeds limit: ${finalGraphemes} graphemes`);
    console.error(`[POST /api/nodes] Content: ${postText.substring(0, 100)}...`);
    throw new Error(`Post exceeds 300 grapheme limit: ${finalGraphemes}`);
  }

  // Continue with post creation...
}
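The fixed thread-indicator budget only holds while a thread stays under 100 posts. A small sketch, using hypothetical helper names, measures the indicator for a given total instead of assuming it:

```typescript
// Hypothetical helpers; the indicator format matches the "(N/Total) " prefix above.
function threadIndicator(index: number, total: number): string {
  return `(${index}/${total}) `;
}

// The indicator is ASCII only, so string length equals grapheme count here.
function worstCaseIndicatorLength(total: number): number {
  return threadIndicator(total, total).length;
}

console.log(worstCaseIndicatorLength(9)); // 6
console.log(worstCaseIndicatorLength(99)); // 8
console.log(worstCaseIndicatorLength(100)); // 10
```

With a budget of 9, a thread of 100 or more posts would eat one grapheme of the safety buffer per post, so the final validation step remains the real guard.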
Implementation Steps
- Extract constants at the top
  - Calculate linkGraphemes from actual URL
  - Define threadIndicatorGraphemes = 9 (worst case)
  - Define safetyBuffer = 5
- Fix splitIntoChunks function
  - Replace character-based substring with grapheme-aware splitting
  - Use RichText.graphemeLength for all length checks
  - When shrinking text, calculate ratio based on graphemes, not chars
- Add comprehensive logging
  - Log chunk grapheme counts before adding overhead
  - Log final post grapheme counts
  - Log URL used and its grapheme length
- Test edge cases
  - Long Vercel preview URLs (100+ chars)
  - Text with emojis and multi-byte characters
  - Text that needs 10+ chunks (thread indicators "(10/15)")
  - Text exactly at boundaries
Files to Modify
app/api/nodes/route.ts - Replace splitIntoChunks() function
Test Cases
Test Case 1: Short text (fits in one post)
Input:
- Title: "Test"
- Body: "Short content"
- Expected: 1 post with link
Test Case 2: Long text (needs splitting)
Input:
- Title: "Long Article"
- Body: 500 graphemes of text
- Expected: 2-3 posts, first with link, others with thread indicators
Test Case 3: Text with emojis
Input:
- Title: "🎉 Celebration"
- Body: "Hello 👋 World 🌍" repeated to 400 graphemes
- Expected: Correct grapheme counting (emojis = 1 grapheme each)
Test Case 4: Vercel preview URL
Input:
- NEXT_PUBLIC_APP_URL: https://ponderants-git-development-abc123.vercel.app
- Expected: URL accounts for ~100 char length
Test Case 5: Exactly at boundary
Input:
- Text that's exactly 300 graphemes including link
- Expected: 1 post, no error
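Test inputs with an exact grapheme count, as in Test Cases 2, 3, and 5, are easier to build with a small generator. The sketch below is one way to do it; repeatToGraphemes is a hypothetical helper and Intl.Segmenter is assumed as the grapheme counter.

```typescript
// Assumption: Intl.Segmenter is used to count and cut grapheme clusters.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Hypothetical test helper: cycles through the pattern's clusters until the
// target count is reached. Assumes the pattern's last and first clusters do
// not merge when placed side by side (true for the pattern used here).
function repeatToGraphemes(pattern: string, target: number): string {
  const clusters = Array.from(segmenter.segment(pattern), (s) => s.segment);
  if (clusters.length === 0) {
    throw new Error("pattern must not be empty");
  }
  const out: string[] = [];
  for (let i = 0; i < target; i++) {
    out.push(clusters[i % clusters.length] ?? "");
  }
  return out.join("");
}

const emojiBody = repeatToGraphemes("Hello \u{1F44B} World \u{1F30D} ", 400);
console.log(Array.from(segmenter.segment(emojiBody)).length); // 400
console.log(emojiBody.length > 400); // true, emojis take two code units each
```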
Validation
After implementation, verify:
- No posts exceed 300 graphemes
- Splitting happens at word boundaries when possible
- All chunks account for thread indicators
- First post always includes detail URL
- Works with emoji and multi-byte characters
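Part of the checklist above can be automated. The following sketch collects violations for a built thread rather than throwing on the first one; validateThread, the Violation type, and the example URL are hypothetical, and Intl.Segmenter again stands in for the RichText grapheme count.

```typescript
// Assumption: Intl.Segmenter approximates the RichText grapheme count.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const countGraphemes = (text: string): number =>
  Array.from(segmenter.segment(text)).length;

interface Violation {
  index: number;
  reason: string;
}

// Hypothetical checker covering the limit, the link, and the thread indicators.
function validateThread(posts: string[], detailUrl: string, limit = 300): Violation[] {
  const violations: Violation[] = [];
  const first = posts[0];
  if (first === undefined || !first.includes(detailUrl)) {
    violations.push({ index: 0, reason: "first post is missing the detail URL" });
  }
  posts.forEach((post, index) => {
    const graphemes = countGraphemes(post);
    if (graphemes > limit) {
      violations.push({ index, reason: `${graphemes} graphemes exceeds ${limit}` });
    }
    const indicator = `(${index + 1}/${posts.length}) `;
    if (index > 0 && !post.startsWith(indicator)) {
      violations.push({ index, reason: `missing thread indicator ${indicator.trim()}` });
    }
  });
  return violations;
}

const url = "https://example.com/galaxy/abc"; // made-up detail URL
const posts = [`Intro text\n\nRead more: ${url}`, "(2/2) Remaining text"];
console.log(validateThread(posts, url)); // []
```

Word-boundary splitting is not covered here, since checking it needs the original text as well as the posts.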