How Auto-Indexing Works
When you upload a file to Consuelo, whether through the Files page in the UI or through the GraphQL API, the system automatically checks whether the file can be indexed for semantic search.
The Process
- File Upload: You upload a file through the UI or API. The file is stored in S3 (or local storage for self-hosted instances).
- Type Check: The system checks the file extension. Only text-based formats are indexed: .pdf, .doc, .docx, .txt, .md, .csv, .html.
- Collection Lookup: The system finds or creates the workspace’s default collection.
- Text Extraction: The file content is read from storage, and text is extracted based on the file type.
- Chunking & Embedding: The text is split into ~500-token chunks, and each chunk gets a vector embedding.
- Storage: Chunks and embeddings are stored in the knowledge base, linked to the original file.
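The chunking step can be pictured with a minimal Python sketch. It is illustrative only and not Consuelo's actual implementation: the function name chunk_text is hypothetical, and whitespace-separated words stand in for tokens, since the tokenizer Consuelo uses is not documented here.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 500) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Assumption: whitespace-separated words approximate tokens. A real
    tokenizer would produce different chunk boundaries.
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# A 1,200-word document yields chunks of 500, 500, and 200 words.
chunks = chunk_text("word " * 1200)
print([len(chunk.split()) for chunk in chunks])
```

In the real pipeline each chunk would then be passed to an embedding model and stored alongside its vector; that part is omitted here because the model and storage layer are not specified.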
What Gets Indexed
| Scenario | Indexed? |
|---|---|
| Upload a PDF through the Files page | Yes |
| Upload a Word doc through the API | Yes |
| Upload a .txt file attached to a Person record | Yes |
| Upload a PNG screenshot | No (stored but not indexed) |
| Upload a ZIP archive | No (stored but not indexed) |
| Upload a video file | No (stored but not indexed) |
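The outcomes in the table follow from the extension check alone. A small Python sketch of that check, using the extension list given for the Type Check step, might look like the following. The function name is_indexable is hypothetical, and case-insensitive matching is an assumption, since the documentation does not say how uppercase extensions are treated.

```python
from pathlib import Path

# Extensions listed as indexable in the Type Check step.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".html"}


def is_indexable(filename: str) -> bool:
    """Return True if the file's extension is in the indexable set.

    Assumption: the comparison ignores case, so "REPORT.PDF" qualifies.
    """
    return Path(filename).suffix.lower() in INDEXABLE_EXTENSIONS


for name in ["report.pdf", "notes.txt", "screenshot.png", "archive.zip"]:
    status = "indexed" if is_indexable(name) else "stored but not indexed"
    print(f"{name}: {status}")
```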
Re-Indexing
If you update a file’s content, you can re-index it by calling the indexFileInKnowledgeBase mutation via the GraphQL API. This replaces the old chunks with new ones from the updated content.
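As a rough illustration, the request body for such a call could be built in Python as shown below. Only the mutation name indexFileInKnowledgeBase comes from this page. The fileId argument, its ID! type, the absence of a selection set, and the helper name build_reindex_payload are all assumptions, so check the GraphQL schema for the real signature before using it.

```python
import json

# Assumption: the mutation takes a single `fileId: ID!` argument and
# returns a scalar. The actual arguments and return type may differ.
REINDEX_MUTATION = """
mutation ReindexFile($fileId: ID!) {
  indexFileInKnowledgeBase(fileId: $fileId)
}
"""


def build_reindex_payload(file_id: str) -> str:
    """Serialize a standard GraphQL-over-HTTP JSON body for the mutation."""
    return json.dumps(
        {"query": REINDEX_MUTATION, "variables": {"fileId": file_id}}
    )


# "file-123" is a placeholder ID, not a real file.
print(build_reindex_payload("file-123"))
```

The resulting JSON string would be sent as a POST body to your workspace's GraphQL endpoint with your usual authentication headers; neither is shown because this page does not document them.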