Auto-Indexing - Consuelo Docs

How Auto-Indexing Works

When you upload a file to Consuelo, whether through the Files page in the UI or through the GraphQL API, the system automatically checks whether the file can be indexed for semantic search.

The Process

1. File Upload: You upload a file through the UI or API. The file is stored in S3 (or local storage for self-hosted instances).
2. Type Check: The system checks the file extension. Only these text-based formats are indexed: .pdf, .doc, .docx, .txt, .md, .csv, .html.
3. Collection Lookup: The system finds or creates the workspace’s default collection.
4. Text Extraction: The file content is read from storage and text is extracted based on the file type.
5. Chunking & Embedding: The text is split into ~500-token chunks and each chunk gets a vector embedding.
6. Storage: Chunks and embeddings are stored in the knowledge base, linked to the original file.
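
To make the chunking step more concrete, here is a minimal Python sketch of splitting extracted text into chunks of at most 500 tokens. It is an illustration only, not Consuelo's implementation: the function name `chunk_text` is hypothetical, and a "token" is approximated by a whitespace-separated word because the docs do not describe the actual tokenizer.

```python
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: a token is approximated by a whitespace-separated word.
    The tokenizer Consuelo actually uses is not documented here.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# A 1200-word document yields two full chunks and one partial chunk.
sample = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_text(sample)
print([len(chunk.split()) for chunk in chunks])  # [500, 500, 200]
```

In the real pipeline each chunk would then be passed to an embedding model; that step is omitted because the model is not specified.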

What Gets Indexed

Scenario	Indexed?
Upload a PDF through the Files page	Yes
Upload a Word doc through the API	Yes
Upload a `.txt` file attached to a Person record	Yes
Upload a PNG screenshot	No (stored but not indexed)
Upload a ZIP archive	No (stored but not indexed)
Upload a video file	No (stored but not indexed)
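
The scenarios above follow from the extension check alone. The Python sketch below reproduces that decision using the extension list from the Type Check step. The function name `is_indexable` and the case-insensitive matching are assumptions made for illustration, not documented behavior.

```python
from pathlib import Path

# Extensions listed in the Type Check step.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".html"}


def is_indexable(filename: str) -> bool:
    """Return True if the file's extension is in the indexable set."""
    # Assumption: matching ignores case, so "Report.PDF" counts as a PDF.
    return Path(filename).suffix.lower() in INDEXABLE_EXTENSIONS


for name in ["report.pdf", "notes.txt", "screenshot.png", "backup.zip", "demo.mp4"]:
    status = "indexed" if is_indexable(name) else "stored but not indexed"
    print(f"{name}: {status}")
```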

Re-Indexing

If you update a file’s content, you can re-index it by calling the indexFileInKnowledgeBase mutation via the GraphQL API. This replaces the old chunks with new ones from the updated content.
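
The important property of re-indexing is that it replaces chunks rather than appending to them. The Python sketch below models that behavior with a hypothetical in-memory dictionary standing in for the knowledge base. The names `knowledge_base` and `index_file` are illustrative and do not correspond to anything in the Consuelo API.

```python
# Hypothetical in-memory stand-in for the knowledge base: file ID -> chunks.
knowledge_base: dict[str, list[str]] = {}


def index_file(file_id: str, chunks: list[str]) -> int:
    """Store chunks for a file, discarding any chunks stored for it before."""
    knowledge_base[file_id] = list(chunks)
    return len(knowledge_base[file_id])


# First indexing run produces three chunks.
index_file("file-1", ["old intro", "old details", "old summary"])

# Re-indexing after the content changed leaves only the new chunks.
count = index_file("file-1", ["new intro", "new details"])
print(count)  # 2
print(knowledge_base["file-1"])
```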

Manual Indexing

For files that weren’t auto-indexed (or to index into a specific collection), use the GraphQL API:

mutation {
  indexFileInKnowledgeBase(input: {
    fileId: "your-file-id"
    collectionId: "target-collection-id"
  }) {
    chunkCount
  }
}

This is useful when you want to organize files into specific collections rather than the default one.
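
If you want to send this mutation from a script, the following Python sketch builds the HTTP request using only the standard library. Several details are assumptions not confirmed by this page: the endpoint URL (`https://consuelo.example/graphql`), the bearer-token `Authorization` header, and the helper names `build_index_mutation` and `build_request`. Check your instance's API settings for the real values.

```python
import json
import urllib.request


def build_index_mutation(file_id: str, collection_id: str) -> str:
    """Return the indexFileInKnowledgeBase mutation as a GraphQL string."""
    # json.dumps produces a double-quoted, escaped literal, which should
    # also be a valid GraphQL string for ordinary ID values.
    return (
        "mutation { indexFileInKnowledgeBase(input: { "
        f"fileId: {json.dumps(file_id)} "
        f"collectionId: {json.dumps(collection_id)} "
        "}) { chunkCount } }"
    )


def build_request(endpoint: str, token: str, file_id: str,
                  collection_id: str) -> urllib.request.Request:
    """Wrap the mutation in a JSON POST request (nothing is sent yet)."""
    body = json.dumps({"query": build_index_mutation(file_id, collection_id)})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_request(
    "https://consuelo.example/graphql",  # hypothetical endpoint
    "YOUR_API_TOKEN",
    "your-file-id",
    "target-collection-id",
)
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(request)` and decode the JSON response; a successful response would be expected to include `chunkCount` under `data.indexFileInKnowledgeBase`.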
