Introduction

Codexa is a powerful CLI tool that ingests your codebase and allows you to ask questions about it using Retrieval-Augmented Generation (RAG).

Features

  • πŸ”’ Privacy-First: All data processing happens locally by default
  • ⚑ Fast & Efficient: Local embeddings and optimized vector search
  • πŸ€– Multiple LLM Support: Works with Groq (cloud)
  • πŸ’Ύ Local Storage: SQLite database for embeddings and context
  • 🎯 Smart Chunking: Intelligent code splitting with configurable overlap
  • πŸ“Š Streaming Output: Real-time response streaming for better UX
  • 🎨 Multiple File Types: Supports TypeScript, JavaScript, Python, Go, Rust, Java, and more
  • 🧠 Smart Configuration: Automatically detects project languages and optimizes config
  • πŸ›‘οΈ Intelligent Filtering: Automatically excludes binaries, large files, and build artifacts
  • βš™οΈ Highly Configurable: Fine-tune chunking, retrieval, and model parameters
  • πŸš€ Zero Setup: Works out of the box with sensible defaults

[!WARNING] Codebase Size Limitation: Codexa is optimized for small to medium-sized codebases. It currently supports projects with up to 200 files and 20,000 chunks. For larger codebases, consider using more restrictive includeGlobs patterns to focus on specific directories or file types.
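
For example, narrowing includeGlobs to a single package keeps the file count well under the limit. The snippet below is a minimal sketch; the patterns are illustrative, and the fast-glob package stands in for whatever file-discovery mechanism Codexa uses internally:

```ts
import fg from "fast-glob";

// Illustrative patterns only -- point these at the directories you actually
// want indexed (e.g. one package of a monorepo) rather than the whole repo.
const includeGlobs = ["packages/core/src/**/*.{ts,tsx}"];
const excludeGlobs = ["**/node_modules/**", "**/dist/**", "**/*.test.ts"];

// Resolve the patterns and report how many files would be ingested.
const files = await fg(includeGlobs, { ignore: excludeGlobs });
console.log(`${files.length} files selected for ingestion (aim to stay under ~200)`);
```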

How It Works

Codexa uses Retrieval-Augmented Generation (RAG) to answer questions about your codebase:

1. Ingestion Phase

When you run codexa ingest:

  1. File Discovery: Scans your repository using glob patterns (includeGlobs/excludeGlobs)
  2. Smart Filtering: Automatically excludes binaries, large files (>5MB), and build artifacts
  3. Code Chunking: Splits files into manageable chunks with configurable overlap
  4. Embedding Generation: Creates vector embeddings for each chunk using local transformers
  5. Storage: Stores chunks and embeddings in a SQLite database (.codexa/index.db)
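
The chunking and embedding steps above can be approximated in a few lines. The sketch below uses a simple line-based splitter and the @xenova/transformers package as the local embedder; the chunk size, overlap, and model name are illustrative assumptions, not Codexa's actual defaults:

```ts
import { pipeline } from "@xenova/transformers";

// Step 3 -- code chunking: split a file into overlapping, line-based chunks.
function chunkFile(source: string, chunkSize = 60, overlap = 10): string[] {
  const lines = source.split("\n");
  const chunks: string[] = [];
  for (let start = 0; start < lines.length; start += chunkSize - overlap) {
    chunks.push(lines.slice(start, start + chunkSize).join("\n"));
    if (start + chunkSize >= lines.length) break; // last chunk reached the end
  }
  return chunks;
}

// Step 4 -- embedding generation: one vector per chunk from a small local model.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
  const vectors: number[][] = [];
  for (const chunk of chunks) {
    const output = await extractor(chunk, { pooling: "mean", normalize: true });
    vectors.push(Array.from(output.data as Float32Array));
  }
  return vectors;
}
```

Step 5 then writes each (chunk, vector) pair into the SQLite index at .codexa/index.db; the exact table schema is internal to Codexa.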

2. Query Phase

When you run codexa ask:

  1. Question Embedding: Converts your question into a vector embedding
  2. Vector Search: Finds the most similar code chunks using cosine similarity
  3. Context Retrieval: Selects top-K most relevant chunks as context
  4. LLM Generation: Sends question + context to your configured LLM
  5. Response: Returns an answer grounded in your actual codebase
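
A minimal sketch of steps 2 through 4 follows. The embed and generate parameters stand in for the local embedder and the Groq client, and the top-K value and prompt format are assumptions for illustration:

```ts
interface StoredChunk { path: string; text: string; embedding: number[]; }

// Step 2 -- vector search: cosine similarity between question and chunk vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 3 -- context retrieval: rank every chunk and keep the top-K.
function topK(query: number[], chunks: StoredChunk[], k = 8): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((ranked) => ranked.chunk);
}

// Step 4 -- LLM generation: question + retrieved chunks become the prompt.
async function answer(
  question: string,
  chunks: StoredChunk[],
  embed: (text: string) => Promise<number[]>,    // local embedder
  generate: (prompt: string) => Promise<string>  // wrapper around the Groq API
): Promise<string> {
  const queryVector = await embed(question);
  const context = topK(queryVector, chunks)
    .map((c) => `// ${c.path}\n${c.text}`)
    .join("\n\n");
  return generate(`Answer using only this code:\n\n${context}\n\nQuestion: ${question}`);
}
```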

Benefits

  • Privacy: All processing happens locally by default
  • Speed: Local embeddings and vector search are very fast
  • Accuracy: Answers are based on your actual code, not generic responses
  • Context-Aware: Understands relationships across your codebase

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   User Query    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Embedding      │────▢│   Vector     β”‚
β”‚  Generation     β”‚     β”‚   Search     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   SQLite DB     │────▢│   Context    β”‚
β”‚   (Chunks +     β”‚     β”‚   Retrieval  β”‚
β”‚   Embeddings)   β”‚     β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
                               β–Ό
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚   LLM        β”‚
                        β”‚   (Groq)     β”‚
                        β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚   Answer     β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components:

  • Chunker: Splits code files into semantic chunks
  • Embedder: Generates vector embeddings (local transformers)
  • Retriever: Finds relevant chunks using vector similarity
  • LLM Client: Generates answers (Groq cloud)
  • Database: SQLite for storing chunks and embeddings