How Auto-Indexing Works

When you upload a file to Consuelo, whether through the Files page in the UI or through the GraphQL API, the system automatically checks whether the file can be indexed for semantic search.

The Process

  1. File Upload — You upload a file through the UI or API. The file is stored in S3 (or local storage for self-hosted instances).
  2. Type Check — The system checks the file extension. Only text-based formats are indexed: .pdf, .doc, .docx, .txt, .md, .csv, .html.
  3. Collection Lookup — The system finds or creates the workspace’s default collection.
  4. Text Extraction — The file content is read from storage and text is extracted based on the file type.
  5. Chunking & Embedding — The text is split into ~500-token chunks and each chunk gets a vector embedding.
  6. Storage — Chunks and embeddings are stored in the knowledge base, linked to the original file.
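
The chunking in step 5 can be illustrated with a minimal sketch. It assumes whitespace-separated words as a stand-in for tokens; the tokenizer Consuelo actually uses is not documented here, so real chunk boundaries will differ. The function name `chunk_text` is hypothetical, and the embedding step is left out because the embedding model is not specified.

```python
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Assumption: one whitespace-delimited word counts as one token.
    A real tokenizer would usually produce more tokens per word.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# A 1200-word document is split into two full chunks and one partial chunk.
sample = "word " * 1200
chunks = chunk_text(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each resulting chunk would then be passed to an embedding model and stored alongside its vector, linked back to the original file.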

What Gets Indexed

Scenario | Indexed?
Upload a PDF through the Files page | Yes
Upload a Word doc through the API | Yes
Upload a .txt file attached to a Person record | Yes
Upload a PNG screenshot | No (stored but not indexed)
Upload a ZIP archive | No (stored but not indexed)
Upload a video file | No (stored but not indexed)
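
The scenarios above follow from the extension check in the Type Check step. Here is a rough sketch of such a check, using the extension list from this page. The function name `is_indexable` is hypothetical, and treating extensions case-insensitively is an assumption, since the documentation does not say how the real check handles case.

```python
from pathlib import PurePosixPath

# Extensions listed in the Type Check step of the process.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".html"}


def is_indexable(filename: str) -> bool:
    """Return True if the file's final extension is in the indexable set.

    Assumption: the comparison ignores case (".PDF" is treated like ".pdf").
    """
    return PurePosixPath(filename).suffix.lower() in INDEXABLE_EXTENSIONS


for name in ["report.pdf", "notes.txt", "screenshot.png", "backup.zip"]:
    status = "indexed" if is_indexable(name) else "stored but not indexed"
    print(name, "->", status)
```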

Re-Indexing

If you update a file’s content, you can re-index it by calling the indexFileInKnowledgeBase mutation via the GraphQL API. This replaces the old chunks with new ones from the updated content.
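
To make the replace behaviour concrete, here is a toy in-memory model. It is not Consuelo's implementation: the `store` dictionary and the `index_file` function are hypothetical stand-ins that only illustrate that re-indexing overwrites a file's chunks rather than appending to them.

```python
# Toy stand-in for the knowledge base: file ID -> list of chunk strings.
store: dict[str, list[str]] = {}


def index_file(file_id: str, chunks: list[str]) -> int:
    """Index a file, discarding any chunks previously stored for it."""
    store[file_id] = list(chunks)  # assignment, not extend: old chunks are dropped
    return len(store[file_id])


# First indexing pass produces three chunks.
index_file("file-1", ["old chunk A", "old chunk B", "old chunk C"])

# After the file content changes, re-indexing leaves only the new chunks.
count = index_file("file-1", ["new chunk A", "new chunk B"])
print(count, store["file-1"])
```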

Manual Indexing

For files that weren’t auto-indexed (or to index into a specific collection), use the GraphQL API:
mutation {
  indexFileInKnowledgeBase(input: {
    fileId: "your-file-id"
    collectionId: "target-collection-id"
  }) {
    chunkCount
  }
}
This is useful when you want to organize files into specific collections rather than the default one.
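
The mutation above can also be sent from a script. The sketch below builds the HTTP request using only the Python standard library. The endpoint URL, the bearer-token `Authorization` header, and the helper name `build_index_request` are assumptions for illustration; check your instance's API settings for the actual endpoint and authentication scheme.

```python
import json
import urllib.request


def build_index_request(
    endpoint: str, token: str, file_id: str, collection_id: str
) -> urllib.request.Request:
    """Build a POST request carrying the indexFileInKnowledgeBase mutation."""
    # json.dumps yields a quoted, escaped string literal, which is also a
    # valid GraphQL string for ordinary ASCII IDs.
    query = (
        "mutation { indexFileInKnowledgeBase(input: { "
        f"fileId: {json.dumps(file_id)} "
        f"collectionId: {json.dumps(collection_id)} "
        "}) { chunkCount } }"
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_index_request(
    "https://consuelo.example.com/graphql",  # hypothetical endpoint
    "YOUR_API_TOKEN",  # placeholder token
    "your-file-id",
    "target-collection-id",
)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), then read chunkCount from the JSON response.
```

Building the request separately from sending it makes it easy to inspect the payload before any network call is made.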