Technical Architecture · AnswerVault

System layers

AnswerVault is a layered platform. Documents move up through ingestion into the knowledge layer; queries move down from the interface layer through the RAG agent and back.

Interface layer

Web app, CLI, Microsoft Teams app, Slack app, REST API. One knowledge layer sits behind every channel; your team picks the surface that fits the moment.

RAG agent

Retrieval-augmented generation: query parsing, context retrieval from the knowledge graph and vector store, and grounded response generation with citations.

The agent is read-only and has no tool-calling agency. It cannot access files, execute commands, run transactions, or modify any system. Its only capability is to return text answers with citations to documents you have already connected.
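
As a rough illustration of that flow, the sketch below assembles a grounded prompt from already-retrieved chunks and returns an answer together with its citations. The names Chunk, GroundedAnswer, buildGroundedPrompt, and answer, and the injected generate function, are hypothetical and are not AnswerVault's actual API. The point is only that the step takes text in and returns text plus references to connected documents.

```typescript
// Hypothetical types for illustration; not AnswerVault's real API.
interface Chunk {
  documentId: string;
  title: string;
  text: string;
}

interface GroundedAnswer {
  text: string;
  citations: { marker: number; documentId: string; title: string }[];
}

// Number each retrieved chunk so the model can cite it as [1], [2], ...
function buildGroundedPrompt(question: string, chunks: Chunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.title}\n${c.text}`)
    .join("\n\n");
  return [
    "Answer using only the numbered sources below and cite them as [n].",
    "If the sources do not contain the answer, say so.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// The generation model is injected as a plain text-in, text-out function,
// so this step has no way to call tools or modify any system.
function answer(
  question: string,
  chunks: Chunk[],
  generate: (prompt: string) => string,
): GroundedAnswer {
  const text = generate(buildGroundedPrompt(question, chunks));
  // Keep only citations that the answer text actually refers to.
  const citations = chunks
    .map((c, i) => ({ marker: i + 1, documentId: c.documentId, title: c.title }))
    .filter((c) => text.includes(`[${c.marker}]`));
  return { text, citations };
}
```

Passing the model in as a function keeps the sketch testable with a stub and makes the read-only boundary visible in the type signature.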

Knowledge graph + vector store

A PostgreSQL-backed graph database holds documents, entities, and relationships. A vector database holds dense embeddings for semantic search. Retrieval traverses both: relationship-aware where it matters, semantic where it helps.
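
One way to picture retrieval that uses both stores is a two-step lookup: semantic search picks seed chunks, then the graph adds directly related items. The sketch below uses in-memory stand-ins; IndexedChunk, topK, retrieve, and the edges map are assumptions made for illustration and say nothing about how AnswerVault actually stores or ranks data.

```typescript
// Hypothetical in-memory stand-ins for the vector store and the graph.
type Vector = number[];

interface IndexedChunk {
  id: string;
  embedding: Vector;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Step 1: semantic seeds. This is an exact scan; a real vector database
// would normally use an approximate index instead.
function topK(query: Vector, chunks: IndexedChunk[], k: number): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.id);
}

// Step 2: relationship-aware expansion, adding one-hop graph neighbours.
function retrieve(
  query: Vector,
  chunks: IndexedChunk[],
  edges: Map<string, string[]>,
  k: number,
): string[] {
  const seeds = topK(query, chunks, k);
  const result = new Set<string>(seeds);
  for (const id of seeds) {
    for (const neighbour of edges.get(id) ?? []) result.add(neighbour);
  }
  return Array.from(result);
}
```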

Ingestion pipeline

Documents from connected sources are parsed, chunked, embedded, and graphed. Updates are incremental: when a SharePoint policy changes, only the affected chunks are re-indexed.
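
A common way to find the affected chunks is to compare content hashes between index runs. The sketch below assumes that approach; the source does not state how AnswerVault detects changes. The functions fnv1a and planReindex and the ReindexPlan shape are hypothetical, and the small non-cryptographic hash is only a stand-in to keep the example dependency-free.

```typescript
// FNV-1a-style hash over UTF-16 code units, used only as a compact
// stand-in; a real pipeline would more likely use a cryptographic digest.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

interface ReindexPlan {
  toEmbed: string[]; // chunk texts that are new or changed
  toDelete: string[]; // hashes of chunks that no longer exist
}

// Compare hashes stored at the last index run with the chunks of the
// updated document. Unchanged chunks appear in neither list.
function planReindex(previousHashes: Set<string>, newChunks: string[]): ReindexPlan {
  const currentHashes = new Set<string>();
  const toEmbed: string[] = [];
  for (const chunk of newChunks) {
    const hash = fnv1a(chunk);
    if (!previousHashes.has(hash) && !currentHashes.has(hash)) toEmbed.push(chunk);
    currentHashes.add(hash);
  }
  const toDelete = Array.from(previousHashes).filter((h) => !currentHashes.has(h));
  return { toEmbed, toDelete };
}
```

With three chunks where only the middle one is edited, the plan contains one chunk to embed and one stale hash to delete; the other two chunks are left alone.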

Connectors

OAuth-based connections to SharePoint Online, Google Drive, and Confluence. Dropbox, AWS S3, Azure Blob Storage, and Google Cloud Storage are on the roadmap.

Core technologies

RAG architecture

Retrieval-augmented generation combines semantic search with large language models to produce grounded answers. Capabilities: query understanding and intent detection, semantic similarity search, context-aware response generation, source citation, and multi-document synthesis.

Knowledge graph

Documents, entities, and relationships are stored in a PostgreSQL-backed graph database. Capabilities: document relationship mapping, entity extraction, hierarchical structures, cross-document connections, and path-based reasoning.
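
Path-based reasoning can be illustrated with a plain breadth-first search that finds how two items are connected. This is a generic sketch over a hypothetical adjacency list (Graph, findPath, and the node names in the usage note are made up), not a description of the queries AnswerVault runs against its graph database.

```typescript
// Hypothetical adjacency list: node id -> ids of directly related nodes.
type Graph = Map<string, string[]>;

// Breadth-first search returning one shortest path from start to goal,
// or null when the goal cannot be reached from the start.
function findPath(graph: Graph, start: string, goal: string): string[] | null {
  const cameFrom = new Map<string, string | null>([[start, null]]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    if (node === goal) {
      // Walk the predecessor links back to the start.
      const path: string[] = [];
      let cur: string | null = node;
      while (cur !== null) {
        path.unshift(cur);
        cur = cameFrom.get(cur) ?? null;
      }
      return path;
    }
    for (const next of graph.get(node) ?? []) {
      if (!cameFrom.has(next)) {
        cameFrom.set(next, node);
        queue.push(next);
      }
    }
  }
  return null;
}
```

For example, with edges "Travel policy" to "Expense policy" and "Expense policy" to "Finance team", findPath returns the three-node chain linking the travel policy to the finance team.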

Vector database

Dense, high-dimensional embeddings enable semantic search beyond keyword matching. Capabilities: approximate nearest-neighbour search, hybrid (vector + keyword) retrieval, multi-modal embeddings, and real-time indexing.
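
Hybrid retrieval needs some rule for merging a vector ranking with a keyword ranking. Reciprocal rank fusion is one widely used option and is shown here purely as an example; the source does not say which fusion method AnswerVault uses. The function name and the constant k = 60 are assumptions.

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank)
// to a document's score, with rank starting at 1. k = 60 is a commonly
// used default and is an assumption here.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Usage: "b" is ranked well by both lists, so it comes out on top.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
// fused is ["b", "a", "d", "c"]
```

Because the method only looks at ranks, it avoids having to normalise similarity scores that come from two different scoring systems.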

Managed AI

AnswerVault is retrieval-augmented, not training-based. We do not train, fine-tune, or otherwise alter foundation models, on your data or on anyone else's. Every answer is grounded at query time in the documents you've connected; nothing is absorbed into model weights.

You don't bring API keys or LLM subscriptions. AnswerVault selects, configures, and operates the AI inference layer for you: embedding models, retrieval models, and the generation model that produces final answers. Inference runs inside your selected data residency region (EU, UK, or US). Models are kept current, and you get upgrades automatically. See the security page for the full AI data-handling position.

Processing pipeline

Multi-format document support (PDF, Word, PowerPoint, Confluence pages, and others), intelligent chunking, metadata extraction, incremental updates, and quality validation.
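
As a baseline for what chunking involves, the sketch below splits text into fixed-size word windows with overlap. This is a deliberately simple stand-in and is not AnswerVault's chunking logic, which the source describes only as intelligent; chunkWords and its parameters are hypothetical.

```typescript
// Split text into chunks of at most `size` words, where consecutive
// chunks share `overlap` words so context is not cut at a boundary.
function chunkWords(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this chunk reached the end
  }
  return chunks;
}

// Usage: seven words, windows of three, one word shared between windows.
const demoChunks = chunkWords("a b c d e f g", 3, 1);
// demoChunks is ["a b c", "c d e", "e f g"]
```

A structure-aware chunker would instead split on headings, paragraphs, or tables, but the overlap idea carries over.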

Security and compliance

Security is built into every layer of the architecture. This includes end-to-end encryption (TLS 1.3 in transit, AES-256-GCM at rest), OAuth 2.0 authentication, role-based access control, data residency selection, audit logging, GDPR and ISO 27001 alignment, zero-trust internal networking, and dedicated deployments for organisations that need physical isolation.

Technologies in use: TLS 1.3, OAuth 2.0, JWT, AES-256-GCM, RBAC, mTLS, SAML 2.0, SCIM.
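
In a retrieval system, one plausible place to apply role-based access control is between retrieval and generation, so that text a user may not read never reaches the model. The sketch below assumes that design; the source does not specify where AnswerVault enforces RBAC, and the RetrievedChunk and User shapes, field names, and role names are hypothetical.

```typescript
// Hypothetical shapes; role names and fields are illustrative only.
interface RetrievedChunk {
  documentId: string;
  allowedRoles: string[]; // roles permitted to read the source document
  text: string;
}

interface User {
  id: string;
  roles: string[];
}

// Drop every chunk the user holds no matching role for, before any
// text is passed to the generation model.
function filterByRole(user: User, chunks: RetrievedChunk[]): RetrievedChunk[] {
  const roles = new Set<string>(user.roles);
  return chunks.filter((c) => c.allowedRoles.some((r) => roles.has(r)));
}
```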

The full security position is on the security page. The deeper position on data sovereignty and the CLOUD Act is in the sovereignty topic guide.

Technology stack

Backend & APIs

Node.js · TypeScript · Express · REST API · GraphQL

Databases

PostgreSQL · Redis

AI & LLMs

GPT-4 · GPT-5 · text-embedding-3

Infrastructure

AWS (primary) · Azure (AI inference via Azure OpenAI, Teams app) · sovereign tier on customer-specified non-US providers · Docker · ECS Fargate · Terraform