# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SkyTalk API is an AI-powered backend service for conversational idea exploration and knowledge synthesis. It interviews users to explore ideas and synthesizes conversations into a structured, semantically-linked knowledge base using the Zettelkasten method.

## Core Development Commands

### Environment Setup

```bash
# Install dependencies using uv package manager
uv pip install -r requirements.txt

# Run the API server (async FastAPI)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Code Quality

```bash
# Format code with black (mandatory)
black .

# Type checking
mypy .

# Linting
ruff check .
```

## Architecture & Implementation Standards

### Three-Layer Architecture

1. **API Layer**: FastAPI RPC-style endpoints (`/sessions/start`, `/sessions/message`)
2. **Services Layer**: Orchestration logic with LangChain LCEL
3. **Data Layer**: SQLModel (SQLite) + ChromaDB (embedded, persisted)

### Critical Implementation Rules

**Database Models**:
- SQLModel is the single source of truth for all database schemas
- ChromaDB runs in embedded mode, persisted to disk
- All database operations must be async

**AI Integration**:
- **Interviewing**: Use `gemini-2.5-flash-latest` (optimized for speed)
- **Synthesis/Linking**: Use `gemini-2.5-pro-latest` (optimized for reasoning)
- **Embeddings**: Use `models/text-embedding-004`
- **Structured Output**: Always use `.with_structured_output()` with Pydantic models for data extraction; never parse raw text

**Code Standards**:
- Maximum 400 lines per Python file
- Full type hints required (Pydantic V2 for API, SQLModel for DB)
- All I/O operations must use async/await
- Configuration via environment variables with `pydantic-settings`
- Use `HTTPException` for client errors

### Project Structure

```
api/
├── app/
│   ├── api/              # FastAPI endpoints (RPC-style)
│   ├── services/         # Business logic, LangChain agents
│   │   ├── interviewer.py
│   │   ├── synthesizer.py
│   │   └── vector.py
│   ├── data/             # Repositories and database models
│   │   ├── models/       # SQLModel definitions
│   │   └── repositories/
│   ├── core/             # Configuration, prompts
│   └── main.py           # FastAPI app initialization
├── requirements.txt      # Dependencies managed via uv
└── .env                  # Environment variables
```

### Key Implementation Patterns

**LangChain LCEL Pipeline**:

```python
# with_structured_output() already returns an OutputModel instance, so no parser step follows
chain = prompt | llm.with_structured_output(OutputModel)
result = await chain.ainvoke({"input": data})
```

**Async Database Operations**:

```python
# get_db_session() is a hypothetical async context manager yielding an AsyncSession
async def get_session(session_id: str) -> Session | None:
    async with get_db_session() as db:
        return await db.get(Session, session_id)  # None if no such row
```

**Background Task for Synthesis**:

```python
background_tasks.add_task(synthesize_session, session_id)
```

### Overall Style

- NO sycophancy: push back or suggest alternative routes when doing so would improve the project
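
### Checking the File-Length Limit

The 400-line limit listed under Code Standards is straightforward to check mechanically. The following is a minimal sketch using only the standard library; the names `MAX_LINES`, `count_lines`, and `find_oversized_files` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Limit taken from the Code Standards rule; the constant name is hypothetical.
MAX_LINES = 400


def count_lines(path: Path) -> int:
    """Return the number of lines in a text file."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def find_oversized_files(
    root: Path, max_lines: int = MAX_LINES
) -> list[tuple[Path, int]]:
    """Return (path, line_count) for each .py file under root that exceeds max_lines."""
    oversized: list[tuple[Path, int]] = []
    # sorted() keeps the report order stable across runs.
    for path in sorted(root.rglob("*.py")):
        lines = count_lines(path)
        if lines > max_lines:
            oversized.append((path, lines))
    return oversized
```

Calling `find_oversized_files(Path("app"))` from the `api/` directory would list any Python file that has grown past the limit, which could serve as a signal to split the module before adding more code to it.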