Knowledge Base - Consuelo Docs

How It Works

The Knowledge Base is Consuelo’s semantic search layer. It goes beyond simple keyword matching — it understands the meaning of your content and finds relevant results even when the exact words don’t match.

The Indexing Pipeline

When a file is uploaded to Consuelo, the following happens automatically:

1. Text Extraction: Consuelo reads the file content. Supported formats: PDF, Word (.doc/.docx), plain text, Markdown, CSV, and HTML.
2. Chunking: The extracted text is split into chunks of approximately 500 tokens each, with a 50-token overlap between consecutive chunks. This overlap ensures that context isn’t lost at chunk boundaries.
3. Embedding: Each chunk is converted into a 1536-dimensional vector using OpenAI’s text-embedding-3-small model. This vector captures the semantic meaning of the text.
4. Storage: The chunks and their embeddings are stored in PostgreSQL using the pgvector extension, with an HNSW index for fast approximate nearest-neighbor search.
5. Collection Assignment: The chunks are assigned to the workspace’s default collection (or to a specific collection if you specify one via the API).
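
The chunking step can be illustrated with a minimal sketch. This is not Consuelo's implementation: the function name `chunk_tokens` is hypothetical, and placeholder strings stand in for real tokenizer output. Only the 500-token size and 50-token overlap come from the description above.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each chunk starts this many tokens after the previous one
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Placeholder tokens (assumption): a real pipeline would use a model tokenizer.
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
```

With 1,200 tokens this produces three chunks, and the last 50 tokens of each chunk reappear as the first 50 tokens of the next one, which is how a sentence that straddles a boundary stays intact in at least one chunk.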

Searching

When you (or an AI agent) search the knowledge base:

1. Your query is converted into a 1536-dimensional vector using the same embedding model as the indexed chunks.
2. pgvector finds the chunks whose vectors are most similar to the query vector, measured by cosine similarity.
3. Results are ranked by similarity score (0 to 1, where 1 is a perfect match).
4. Only results above the minimum similarity threshold (default: 0.7) are returned.
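
The ranking and threshold logic can be sketched in a few lines. This example deliberately swaps pgvector's HNSW index for a brute-force scan over an in-memory list, and uses 3-dimensional toy vectors instead of 1536-dimensional embeddings. The function names, the `limit` parameter, and the inclusive threshold comparison are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    min_similarity: float = 0.7,  # default threshold from the text above
    limit: int = 5,               # hypothetical cap on the number of results
) -> List[Tuple[float, str]]:
    """Score every chunk, drop those under the threshold, return best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    # Assumption: a score exactly equal to the threshold is kept.
    hits = [hit for hit in scored if hit[0] >= min_similarity]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


# Toy data (assumption): real vectors would come from the embedding model.
indexed_chunks = [
    ("pricing guide", [0.9, 0.1, 0.0]),
    ("security overview", [0.6, 0.8, 0.0]),
    ("team handbook", [0.0, 1.0, 0.0]),
]
results = search([1.0, 0.0, 0.0], indexed_chunks)
```

Here only "pricing guide" clears the 0.7 threshold. The other two chunks score roughly 0.6 and 0.0, so they are filtered out rather than returned as weak matches.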

Collections

Collections are workspace-scoped groupings of knowledge chunks. They let you organize indexed content by topic, team, or purpose.

Default Collection

Every workspace has a default collection, which is created automatically the first time a file is indexed. All auto-indexed files go into this collection.

Custom Collections

You can create custom collections via the GraphQL API to organize knowledge:

Sales Playbook — battle cards, competitive intel, pricing guides
Product Knowledge — feature docs, release notes, technical specs
Onboarding — training materials, process docs, team handbook
Industry Research — market reports, analyst briefings, case studies

Collection Operations

| Operation | GraphQL | Description |
| --- | --- | --- |
| List collections | `knowledgeCollections` query | See all collections and their chunk counts |
| Create collection | `createKnowledgeCollection` mutation | Create a new named collection |
| Delete collection | `deleteKnowledgeCollection` mutation | Remove a collection and all its chunks |
| Index file | `indexFileInKnowledgeBase` mutation | Index a file into a specific collection |
| Search | `knowledgeSearch` query | Search across all or specific collections |
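
As a rough illustration of calling one of these operations, the sketch below builds a JSON request body for `knowledgeSearch` in the usual GraphQL-over-HTTP shape (`query` plus `variables`). Only the operation name comes from the table above. The argument names (`query`, `collectionIds`, `minSimilarity`), the selected fields (`content`, `similarity`), and the helper `build_search_request` are hypothetical placeholders, so check the actual schema before relying on them.

```python
import json
from typing import Any, Dict, List, Optional

# `knowledgeSearch` is the documented operation name. Every argument and
# field in this query string is an assumption made for illustration only.
SEARCH_QUERY = """
query SearchKnowledge($query: String!, $collectionIds: [ID!], $minSimilarity: Float) {
  knowledgeSearch(query: $query, collectionIds: $collectionIds, minSimilarity: $minSimilarity) {
    content
    similarity
  }
}
"""


def build_search_request(
    query: str,
    collection_ids: Optional[List[str]] = None,  # None is meant as "all collections" (assumed)
    min_similarity: float = 0.7,
) -> Dict[str, Any]:
    """Return a GraphQL-over-HTTP request body as a plain dict."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "query": query,
            "collectionIds": collection_ids,
            "minSimilarity": min_similarity,
        },
    }


# Serialize the body; sending it requires your workspace's endpoint and credentials.
body = json.dumps(build_search_request("What do we know about Acme Corp's pricing concerns?"))
```

Keeping the search text in `variables` rather than interpolating it into the query string avoids quoting problems when a question contains apostrophes or quotation marks.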

For AI Agents

The knowledge base is designed to be the primary way AI agents access your team’s sales content. An agent connected via the GraphQL API can:

Search for context before a call — “What do we know about Acme Corp’s pricing concerns?”
Find relevant scripts — “What’s our objection handling for ‘we already have a solution’?”
Access methodology — “What are the MEDDIC qualification criteria for enterprise deals?”
Retrieve competitive intel — “How do we compare to Competitor X on security features?”

The agent doesn’t need to know which file contains the answer. It searches by meaning and gets the most relevant chunks back.

Supported File Types

| File Type | Extension | Text Extraction |
| --- | --- | --- |
| PDF | `.pdf` | Full text + page-level extraction |
| Microsoft Word | `.doc`, `.docx` | Full text extraction |
| Plain Text | `.txt` | Direct read |
| Markdown | `.md` | Direct read |
| CSV | `.csv` | Direct read |
| HTML | `.html` | Direct read |
| Images | `.png`, `.jpg`, etc. | Not indexed (stored only) |
| Video | `.mp4`, etc. | Not indexed (stored only) |
| Archives | `.zip` | Not indexed (stored only) |

OCR is not currently supported, so image-based PDFs cannot be indexed. If your PDF contains scanned images instead of selectable text, the content won’t be extracted. We recommend using text-based PDFs for best results.
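
If you want to check on the client side whether an upload will be indexed, a small helper along these lines may be useful. The extension set mirrors the supported file types listed above. The helper name `is_indexable` is hypothetical, and Consuelo may detect file types differently (for example by content rather than extension).

```python
from pathlib import Path

# Extensions listed as indexable in the supported file types table.
INDEXABLE_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".html"}


def is_indexable(filename: str) -> bool:
    """Return True if the file extension is one the knowledge base extracts text from."""
    return Path(filename).suffix.lower() in INDEXABLE_EXTENSIONS


uploads = ["Pricing Guide.PDF", "battle-card.docx", "demo.mp4", "logo.png", "notes.md"]
will_be_indexed = [name for name in uploads if is_indexable(name)]
```

Note that a scanned PDF passes this check but still yields no text, because the extension alone cannot tell whether the file contains selectable text.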
