Introduction

Codexa is a powerful CLI tool that ingests your codebase and allows you to ask questions about it using Retrieval-Augmented Generation (RAG).

Features

  • πŸ”’ Privacy-First: All data processing happens locally by default
  • ⚑ Fast & Efficient: Local embeddings and optimized vector search
  • πŸ€– Multiple LLM Support: Works with Groq (cloud)
  • πŸ’Ύ Local Storage: SQLite database for embeddings and context
  • 🎯 Smart Chunking: Intelligent code splitting with configurable overlap
  • πŸ“Š Streaming Output: Real-time response streaming for better UX
  • 🎨 Multiple File Types: Supports TypeScript, JavaScript, Python, Go, Rust, Java, and more
  • 🧠 Smart Configuration: Automatically detects project languages and optimizes config
  • πŸ›‘οΈ Intelligent Filtering: Automatically excludes binaries, large files, and build artifacts
  • βš™οΈ Highly Configurable: Fine-tune chunking, retrieval, and model parameters
  • πŸš€ Zero Setup: Works out of the box with sensible defaults

[!WARNING] Codebase Size Limitation: Codexa is optimized for small to medium-sized codebases. It currently supports projects with up to 200 files and 20,000 chunks. For larger codebases, consider using more restrictive includeGlobs patterns to focus on specific directories or file types.
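
For example, narrowing includeGlobs to a single package keeps the file count well under the limit. The snippet below is a minimal sketch; the patterns are illustrative, and the fast-glob package stands in for whatever file-discovery mechanism Codexa uses internally:

```ts
import fg from "fast-glob";

// Illustrative patterns only -- point these at the directories you actually
// want indexed (e.g. one package of a monorepo) rather than the whole repo.
const includeGlobs = ["packages/core/src/**/*.{ts,tsx}"];
const excludeGlobs = ["**/node_modules/**", "**/dist/**", "**/*.test.ts"];

// Resolve the patterns and report how many files would be ingested.
const files = await fg(includeGlobs, { ignore: excludeGlobs });
console.log(`${files.length} files selected for ingestion (aim to stay under ~200)`);
```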

How It Works

Codexa uses Retrieval-Augmented Generation (RAG) to answer questions about your codebase:

1. Ingestion Phase

When you run codexa ingest:

  1. File Discovery: Scans your repository using glob patterns (includeGlobs/excludeGlobs)
  2. Smart Filtering: Automatically excludes binaries, large files (>5MB), and build artifacts
  3. Code Chunking: Splits files into manageable chunks with configurable overlap
  4. Embedding Generation: Creates vector embeddings for each chunk using local transformers
  5. Storage: Stores chunks and embeddings in a SQLite database (.codexa/index.db)
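
The chunking and embedding steps above can be approximated in a few lines. The sketch below uses a simple line-based splitter and the @xenova/transformers package as the local embedder; the chunk size, overlap, and model name are illustrative assumptions, not Codexa's actual defaults:

```ts
import { pipeline } from "@xenova/transformers";

// Step 3 -- code chunking: split a file into overlapping, line-based chunks.
function chunkFile(source: string, chunkSize = 60, overlap = 10): string[] {
  const lines = source.split("\n");
  const chunks: string[] = [];
  for (let start = 0; start < lines.length; start += chunkSize - overlap) {
    chunks.push(lines.slice(start, start + chunkSize).join("\n"));
    if (start + chunkSize >= lines.length) break; // last chunk reached the end
  }
  return chunks;
}

// Step 4 -- embedding generation: one vector per chunk from a small local model.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
  const vectors: number[][] = [];
  for (const chunk of chunks) {
    const output = await extractor(chunk, { pooling: "mean", normalize: true });
    vectors.push(Array.from(output.data as Float32Array));
  }
  return vectors;
}
```

Step 5 then writes each (chunk, vector) pair into the SQLite index at .codexa/index.db; the exact table schema is internal to Codexa.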

2. Query Phase

When you run codexa ask:

  1. Question Embedding: Converts your question into a vector embedding
  2. Vector Search: Finds the most similar code chunks using cosine similarity
  3. Context Retrieval: Selects top-K most relevant chunks as context
  4. LLM Generation: Sends question + context to your configured LLM
  5. Response: Returns an answer grounded in your actual codebase
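
A minimal sketch of steps 2 through 4 follows. The embed and generate parameters stand in for the local embedder and the Groq client, and the top-K value and prompt format are assumptions for illustration:

```ts
interface StoredChunk { path: string; text: string; embedding: number[]; }

// Step 2 -- vector search: cosine similarity between question and chunk vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 3 -- context retrieval: rank every chunk and keep the top-K.
function topK(query: number[], chunks: StoredChunk[], k = 8): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((ranked) => ranked.chunk);
}

// Step 4 -- LLM generation: question + retrieved chunks become the prompt.
async function answer(
  question: string,
  chunks: StoredChunk[],
  embed: (text: string) => Promise<number[]>,    // local embedder
  generate: (prompt: string) => Promise<string>  // wrapper around the Groq API
): Promise<string> {
  const queryVector = await embed(question);
  const context = topK(queryVector, chunks)
    .map((c) => `// ${c.path}\n${c.text}`)
    .join("\n\n");
  return generate(`Answer using only this code:\n\n${context}\n\nQuestion: ${question}`);
}
```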

Benefits

  • Privacy: All processing happens locally by default
  • Speed: Local embeddings and vector search are very fast
  • Accuracy: Answers are based on your actual code, not generic responses
  • Context-Aware: Understands relationships across your codebase

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   User Query    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Embedding      │────▢│   Vector     β”‚
β”‚  Generation     β”‚     β”‚   Search     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   SQLite DB     │────▢│   Context    β”‚
β”‚   (Chunks +     β”‚     β”‚   Retrieval  β”‚
β”‚   Embeddings)   β”‚     β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
                               β–Ό
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚   LLM        β”‚
                        β”‚   (Groq)     β”‚
                        β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚   Answer     β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Components:

  • Chunker: Splits code files into semantic chunks
  • Embedder: Generates vector embeddings (local transformers)
  • Retriever: Finds relevant chunks using vector similarity
  • LLM Client: Generates answers (Groq cloud)
  • Database: SQLite for storing chunks and embeddings