Files
app/scripts/recalculate-all-coords.ts
Albert 0c4934cf70 fix: Recalculate ALL nodes for UMAP instead of incremental
Fixed critical bug where nodes 4+ wouldn't get 3D coordinates because
UMAP manifold learning requires seeing the complete dataset together.

Root Cause:
- Previous code only calculated coords for nodes WHERE coords_3d = NONE
- When creating nodes 4-5, only those 2 nodes were passed to UMAP
- UMAP requires minimum 3 points to define a manifold
- Result: "Not enough nodes to map (2/3)" error

Why Full Recalculation is Necessary:
- UMAP is a non-linear manifold learning algorithm
- It creates relative coordinates, not absolute positions
- Each UMAP run produces different coordinate systems
- No "fixed origin" exists - positions are only meaningful relative to each other
- Adding new data changes the manifold structure

Changes:
- Updated /app/api/calculate-graph/route.ts:
  * Removed "AND coords_3d = NONE" filter from query
  * Now fetches ALL nodes with embeddings every time
  * Recalculates entire graph when triggered
  * Updated comments and logging to reflect full recalculation

- Created docs/umap-recalculation-strategy.md:
  * Comprehensive explanation of UMAP manifold learning
  * Why incremental calculation doesn't work
  * Trade-offs of full recalculation approach
  * Performance characteristics (<100 nodes: <1.5s)
  * Future optimization strategies for scale

- Added scripts/recalculate-all-coords.ts:
  * Emergency script to manually fix production database
  * Successfully recalculated all 5 nodes in production

UX Impact:
The thought galaxy now "reorganizes" when adding new nodes - existing
nodes will shift slightly. This is actually a feature, showing the
evolving structure of your knowledge graph as it grows.

Performance:
Full recalculation is O(n²) but acceptable for <100 nodes:
- 3 nodes: ~50ms
- 10 nodes: ~200ms
- 50 nodes: ~800ms
- 100 nodes: ~1.5s

For Ponderants MVP, this is perfectly acceptable. Future optimizations
documented if we reach 1000+ nodes per user.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 01:15:27 +00:00

92 lines
3.0 KiB
TypeScript

import Surreal from 'surrealdb';
import { UMAP } from 'umap-js';
/**
* Recalculate 3D coordinates for ALL nodes
*
* This script fixes the issue where new nodes don't get coordinates
* because UMAP needs to see the full dataset to properly position points.
*
* Usage: tsx scripts/recalculate-all-coords.ts
*/
async function recalculateAllCoordinates() {
const db = new Surreal();
try {
// Connect to production database
const dbUrl = process.env.SURREALDB_URL || 'wss://ponderants-prod-06d6iecp19qj3bvmv2o0r5j50o.aws-usw2.surreal.cloud/rpc';
const dbNs = process.env.SURREALDB_NS || 'ponderants';
const dbName = process.env.SURREALDB_DB || 'production';
const dbUser = process.env.SURREALDB_USER || 'root';
const dbPass = process.env.SURREALDB_PASS;
if (!dbPass) {
throw new Error('SURREALDB_PASS environment variable is required');
}
console.log(`Connecting to ${dbUrl}...`);
await db.connect(dbUrl);
await db.signin({ username: dbUser, password: dbPass });
await db.use({ namespace: dbNs, database: dbName });
console.log('✓ Connected to database');
// Fetch ALL nodes with embeddings (not just those without coords)
console.log('Fetching all nodes with embeddings...');
const results = await db.query<[Array<{ id: string; embedding: number[] }>]>(
'SELECT id, title, embedding FROM node WHERE embedding != NONE'
);
const nodes = results[0] || [];
console.log(`Found ${nodes.length} nodes with embeddings`);
if (nodes.length === 0) {
console.log('No nodes with embeddings found');
return;
}
if (nodes.length < 3) {
console.error(`ERROR: Need at least 3 nodes for UMAP, found ${nodes.length}`);
return;
}
// Run UMAP on ALL nodes together
const embeddings = nodes.map((n) => n.embedding);
console.log('Running UMAP dimensionality reduction...');
console.log(`- Input: ${nodes.length} nodes with ${embeddings[0].length}-dimensional embeddings`);
console.log(`- Output: 3D coordinates`);
const umap = new UMAP({
nComponents: 3,
nNeighbors: Math.min(15, nodes.length - 1), // nNeighbors must be < sample size
minDist: 0.1,
spread: 1.0,
});
const coords_3d_array = await umap.fitAsync(embeddings);
console.log('✓ UMAP projection complete');
// Update ALL nodes with their new 3D coords
console.log('Updating nodes with new coordinates...');
for (let i = 0; i < nodes.length; i++) {
const node = nodes[i];
const coords = coords_3d_array[i];
await db.merge(node.id, {
coords_3d: [coords[0], coords[1], coords[2]],
});
console.log(` ✓ Updated ${node.id}: [${coords[0].toFixed(3)}, ${coords[1].toFixed(3)}, ${coords[2].toFixed(3)}]`);
}
console.log(`\n✅ Successfully updated ${nodes.length} nodes with 3D coordinates`);
} catch (error) {
console.error('❌ Error:', error);
process.exit(1);
} finally {
await db.close();
}
}
recalculateAllCoordinates();