How It Works

The Knowledge Base is Consuelo’s semantic search layer. Instead of matching keywords, it compares the meaning of your query with the meaning of your content, so it finds relevant results even when the exact words don’t match.

The Indexing Pipeline

When a file is uploaded to Consuelo, the following happens automatically:
  1. Text Extraction — Consuelo reads the file content. Supported formats: PDF, Word (.doc/.docx), plain text, Markdown, CSV, and HTML.
  2. Chunking — The extracted text is split into chunks of approximately 500 tokens each, with a 50-token overlap between chunks. This overlap ensures that context isn’t lost at chunk boundaries.
  3. Embedding — Each chunk is converted into a 1536-dimensional vector using OpenAI’s text-embedding-3-small model. This vector captures the semantic meaning of the text.
  4. Storage — The chunks and their embeddings are stored in PostgreSQL using the pgvector extension, with an HNSW index for fast approximate nearest-neighbor search.
  5. Collection Assignment — The chunks are assigned to the workspace’s default collection (or a specific collection if you specify one via the API).
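
To make the chunking step concrete, here is a minimal sketch of overlapping windows using the 500-token size and 50-token overlap described above. The function name `chunk_tokens` is hypothetical, and the placeholder token list stands in for real tokenizer output; Consuelo's actual tokenizer and boundary handling are not documented here.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Placeholder tokens (assumption): a real pipeline would use the model's tokenizer.
tokens = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [500, 500, 300]
```

With 1,200 tokens the windows start at positions 0, 450, and 900, so the last 50 tokens of each chunk reappear at the start of the next one.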

Searching

When you (or an AI agent) search the knowledge base:
  1. Your query is converted into a 1536-dimensional vector using the same embedding model
  2. pgvector finds the chunks whose vectors are most similar (cosine similarity)
  3. Results are returned ranked by similarity score (0 to 1, where 1 is a perfect match)
  4. Only results above the minimum similarity threshold (default: 0.7) are returned
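
The search steps above can be sketched in plain Python. This is an exact, brute-force illustration of cosine ranking with a minimum-similarity cutoff, not Consuelo's implementation: in production the comparison runs inside PostgreSQL, where pgvector's `<=>` operator returns cosine distance (similarity is 1 minus that distance) and the HNSW index makes the lookup approximate. The names `cosine_similarity` and `search`, the 3-dimensional toy vectors, and the inclusive threshold are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    chunks: Dict[str, Sequence[float]],
    min_similarity: float = 0.7,
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Score every chunk, drop those below the threshold, and return the best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunks.items()]
    # Treating the threshold as inclusive is an assumption.
    hits = [(chunk_id, score) for chunk_id, score in scored if score >= min_similarity]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]


# Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.
toy_chunks = {
    "pricing": [1.0, 0.0, 0.0],
    "discounts": [0.8, 0.6, 0.0],
    "security": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.0, 0.0], toy_chunks)
print([chunk_id for chunk_id, _ in results])  # ['pricing', 'discounts']
```

The "security" chunk scores 0 against the query and is filtered out by the 0.7 threshold; lowering `min_similarity` trades precision for recall.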

Collections

Collections are workspace-scoped groupings of knowledge chunks. They let you organize indexed content by topic, team, or purpose.

Default Collection

Each workspace gets a default collection, created automatically the first time a file is indexed. Files that are indexed automatically on upload go into this collection unless you specify a different one via the API.

Custom Collections

You can create custom collections via the GraphQL API to organize knowledge:
  • Sales Playbook — battle cards, competitive intel, pricing guides
  • Product Knowledge — feature docs, release notes, technical specs
  • Onboarding — training materials, process docs, team handbook
  • Industry Research — market reports, analyst briefings, case studies

Collection Operations

| Operation | GraphQL | Description |
| --- | --- | --- |
| List collections | knowledgeCollections query | See all collections and their chunk counts |
| Create collection | createKnowledgeCollection mutation | Create a new named collection |
| Delete collection | deleteKnowledgeCollection mutation | Remove a collection and all its chunks |
| Index file | indexFileInKnowledgeBase mutation | Index a file into a specific collection |
| Search | knowledgeSearch query | Search across all or specific collections |
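
As an illustration of how one of these operations might be called, the sketch below builds a GraphQL request body for `knowledgeSearch`. Only the operation name comes from the table above. The argument names (`query`, `collectionId`, `minSimilarity`), the selected fields (`content`, `similarity`), and the helper `build_search_request` are hypothetical, so check the actual schema before relying on them. Sending the request is left out because the endpoint URL and authentication scheme are not covered here.

```python
import json
from typing import Any, Dict, Optional

# HYPOTHETICAL schema: argument and field names are assumptions, not the documented API.
SEARCH_QUERY = """
query Search($query: String!, $collectionId: ID, $minSimilarity: Float) {
  knowledgeSearch(query: $query, collectionId: $collectionId, minSimilarity: $minSimilarity) {
    content
    similarity
  }
}
"""


def build_search_request(
    query: str,
    collection_id: Optional[str] = None,
    min_similarity: float = 0.7,
) -> Dict[str, Any]:
    """Return a standard GraphQL-over-HTTP body: a query document plus its variables."""
    variables: Dict[str, Any] = {"query": query, "minSimilarity": min_similarity}
    if collection_id is not None:
        # Omitting the collection is assumed to mean "search all collections".
        variables["collectionId"] = collection_id
    return {"query": SEARCH_QUERY, "variables": variables}


body = json.dumps(build_search_request("pricing objections"))
```

Passing values through `variables` rather than interpolating them into the query string keeps user-supplied search text from altering the query document.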

For AI Agents

The knowledge base is designed to be the primary way AI agents access your team’s sales content. An agent connected via the GraphQL API can:
  1. Search for context before a call — “What do we know about Acme Corp’s pricing concerns?”
  2. Find relevant scripts — “What’s our objection handling for ‘we already have a solution’?”
  3. Access methodology — “What are the MEDDIC qualification criteria for enterprise deals?”
  4. Retrieve competitive intel — “How do we compare to Competitor X on security features?”

The agent doesn’t need to know which file contains the answer. It searches by meaning and gets the most relevant chunks back.
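
One way an agent might use the returned chunks is to fold them into a single context string before a call. The sketch below assumes each search hit arrives as a `(text, similarity)` pair. That shape, the helper name `build_context`, the character budget, and the sample hits are all invented for illustration.

```python
from typing import List, Tuple


def build_context(hits: List[Tuple[str, float]], max_chars: int = 2000) -> str:
    """Join the highest-scoring chunks, best first, stopping once the size budget is reached."""
    parts: List[str] = []
    used = 0
    for text, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        entry = f"[{score:.2f}] {text}"
        if used + len(entry) > max_chars:
            break  # the next chunk would exceed the budget
        parts.append(entry)
        used += len(entry) + 1  # +1 accounts for the newline separator
    return "\n".join(parts)


# Invented sample hits; real ones would come from a knowledge base search.
hits = [
    ("Acme asked for volume discounts in Q2.", 0.82),
    ("Acme's CFO flagged per-seat pricing as a concern.", 0.91),
]
context = build_context(hits)
print(context)
```

Keeping the similarity score next to each chunk lets the agent weigh stronger matches more heavily when composing its answer.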

Supported File Types

| File Type | Extension | Text Extraction |
| --- | --- | --- |
| PDF | .pdf | Full text + page-level extraction |
| Microsoft Word | .doc, .docx | Full text extraction |
| Plain Text | .txt | Direct read |
| Markdown | .md | Direct read |
| CSV | .csv | Direct read |
| HTML | .html | Direct read |
| Images | .png, .jpg, etc. | Not indexed (stored only) |
| Video | .mp4, etc. | Not indexed (stored only) |
| Archives | .zip | Not indexed (stored only) |

OCR is not currently supported, so image-based PDFs cannot be indexed. If your PDF contains scanned images instead of selectable text, the content won’t be extracted. We recommend using text-based PDFs for best results.
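
A client could check a file before uploading to see whether it will be indexed or only stored. The sketch below encodes the extensions listed as extractable above. The helper name `is_indexable` and the case-insensitive matching are assumptions. A scanned PDF would still pass this check even though no text can be extracted from it.

```python
from pathlib import Path

# Extensions the table above lists as supporting text extraction.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".html"}


def is_indexable(filename: str) -> bool:
    """True if the file's extension is one of the text-extractable types."""
    # Lowercasing the suffix (case-insensitive matching) is an assumption.
    return Path(filename).suffix.lower() in INDEXABLE_EXTENSIONS


print(is_indexable("Playbook.PDF"))  # True
print(is_indexable("demo.mp4"))      # False
```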