Syncs
What a sync does
When you connect a source and click Save & sync, AnswerVault starts a sync job. The job:
- Downloads the documents in the scope you picked, using the OAuth token from the connector
- Parses each document into structured text (PDF text extraction, Office document parsing, Confluence storage-format conversion)
- Chunks the text into retrieval-sized pieces, preserving headings and structure
- Embeds each chunk with an embedding model hosted in your data residency region and stores the vectors in the vector store
- Adds entities, relationships, and document structure to the knowledge graph
Content becomes queryable while the sync runs. You don't have to wait for the whole job to finish before asking questions, because chunks become available as they're processed.
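The "chunks the text into retrieval-sized pieces, preserving headings" step can be pictured with a small sketch. It assumes Markdown input, a hypothetical `MAX_CHARS` limit, and a hypothetical `Chunk` record that carries the enclosing heading path. AnswerVault's actual chunker, size limits, and data model are not documented here.

```python
import re
from dataclasses import dataclass

# Hypothetical size limit, in characters; AnswerVault's real chunk size isn't documented.
MAX_CHARS = 400

# Matches Markdown ATX headings such as "## Install".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    heading_path: list[str]  # enclosing headings, outermost first
    text: str


def chunk_markdown(doc: str, max_chars: int = MAX_CHARS) -> list[Chunk]:
    """Split Markdown into chunks that never span two sections."""
    chunks: list[Chunk] = []
    path: list[str] = []  # headings currently in scope
    buf: list[str] = []   # lines of the chunk being built

    def flush() -> None:
        text = "\n".join(buf).strip()
        if text:
            # Copy the path so later headings don't rewrite this chunk's context.
            chunks.append(Chunk(list(path), text))
        buf.clear()

    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # a new heading closes the previous section's chunk
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
            continue
        # Start a new chunk if this line would push the current one past the limit.
        if buf and len("\n".join(buf + [line])) > max_chars:
            flush()
        buf.append(line)
    flush()
    return chunks
```

Keeping the heading path on each chunk means a chunk retrieved on its own still carries its section context, which is the point of preserving structure during chunking.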
What gets indexed
- Document content. PDFs, Word documents, PowerPoint, Excel, Confluence pages, plain text, Markdown.
- Document structure. Headings, sections, lists, tables (where the source preserves them).
- Metadata. Title, author, last modified date, source path.
- Links. Inline links between documents in the same source, used for relationship-aware retrieval.
Comments and discussion threads are not indexed today.
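One simple way inline links could feed relationship-aware retrieval is one-hop expansion: after ranking, pull in the documents that the top hits link to. This is an illustrative assumption, not a description of AnswerVault's retriever. `expand_with_links` and its inputs are hypothetical.

```python
def expand_with_links(hits: list[str], links: dict[str, set[str]], limit: int) -> list[str]:
    """Append documents linked from the top hits, after the hits themselves.

    hits:  document ids ranked by relevance (hypothetical retriever output)
    links: document id -> ids of documents it links to in the same source
    """
    result = list(dict.fromkeys(hits))  # drop duplicate hits, keep ranking order
    for doc in list(result):  # iterate over a copy: follow links one hop only
        for linked in sorted(links.get(doc, set())):  # sorted for a stable order
            if linked not in result:
                result.append(linked)
    return result[:limit]
```

Linked documents are appended after the original hits, so the relevance ranking is never displaced by a link.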
How long it takes
The first sync against a new source is the longest because every document is new. After that, syncs are incremental — only changed documents are reprocessed.
- A typical SharePoint document library or Confluence space. A few minutes.
- Large tenants (tens of thousands of documents). Tens of minutes to a few hours for the first sync, then incremental from there.
- A single edited document. Re-indexed within a few minutes of the next incremental sync.
Sync timings depend on document size, parsing complexity (large PDFs are slower than Markdown), and overall queue load. You can use the product while a sync is running.
Managing syncs
The Syncs page in the app lists every sync job in your tenant. From there you can:
- See running, completed, and failed jobs with their start time, source, and document counts
- Inspect the per-document outcome for a finished job (skipped, indexed, error)
- Trigger a manual re-sync if you've added documents to the source and don't want to wait for the next incremental run
Failed syncs surface the underlying error. Most failures come from revoked or expired tokens (re-authorise the connector) or provider-side rate limits (the job retries automatically).
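Automatic retries on provider rate limits are commonly implemented as exponential backoff with jitter. The sketch below assumes a hypothetical `RateLimited` exception and arbitrary retry limits. AnswerVault's actual retry policy is not specified on this page.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error for a provider rate-limit response (e.g. HTTP 429)."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on rate limits, waiting longer after each attempt.

    Any other exception (for example, a revoked-token error) propagates
    immediately, because retrying won't fix it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the job fail and surface the error
            # Exponential backoff with full jitter: wait up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The split mirrors the two failure causes above: rate limits are transient and worth retrying, while token problems need you to re-authorise the connector.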
Incremental updates
After the first sync, AnswerVault watches each source for changes. New documents are added, edited documents are re-chunked and re-embedded, and deleted documents are removed from the index. Permission changes on the source are picked up too: if you lose access to a document, the next sync drops it from the index.
How quickly changes are picked up depends on the provider's API. SharePoint and Google Drive send change notifications, so updates arrive within minutes. Confluence Cloud is polled, so changes appear after the next poll.
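The add, re-index, and remove decisions described above amount to a diff between the index and the source's current state. A minimal sketch, assuming each document exposes a hypothetical change marker (`version`) and a readability flag. How AnswerVault detects changes, and whether it diffs full listings or applies individual change notifications, isn't documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocState:
    version: str    # hypothetical change marker, e.g. an ETag or modified timestamp
    readable: bool  # can the connected account still read this document?


def plan_incremental_sync(
    indexed: dict[str, str], source: dict[str, DocState]
) -> dict[str, set[str]]:
    """Work out what an incremental sync must change in the index.

    indexed: document id -> version currently in the index
    source:  document id -> state the provider reports now
    """
    # Documents the account can no longer read are treated as gone.
    visible = {doc_id: s.version for doc_id, s in source.items() if s.readable}
    return {
        "add": visible.keys() - indexed.keys(),
        "reindex": {d for d in visible.keys() & indexed.keys() if visible[d] != indexed[d]},
        "remove": indexed.keys() - visible.keys(),  # deleted, or access lost
    }
```

Unchanged documents fall into none of the three sets, which is why incremental syncs are much faster than the first one.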
Removing content
To remove content from the knowledge base, you have three options:
- Narrow the connector scope. Re-open the picker on the connection and untick folders or spaces. The next sync drops the content from the index.
- Disconnect the source entirely. The connection's content is removed from the index and the OAuth token is destroyed.
- Delete or restrict at the source. Deleting a document or removing access to it in the provider is picked up on the next sync.
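When you narrow the scope, the next sync has to find indexed documents that now sit outside every selected folder or space. A sketch, assuming each document is keyed by its source path (one of the metadata fields AnswerVault indexes) and that the scope is a set of folder paths. The path format and the `out_of_scope` helper are hypothetical.

```python
def out_of_scope(indexed_paths: set[str], scope: set[str]) -> set[str]:
    """Return indexed source paths that no longer fall under any selected folder.

    indexed_paths: source paths of documents currently in the index
    scope:         folder paths still ticked in the picker (hypothetical format)
    """

    def in_scope(path: str) -> bool:
        # Compare whole path segments so "/Eng" doesn't match "/Engineering".
        return any(
            path == root or path.startswith(root.rstrip("/") + "/") for root in scope
        )

    return {p for p in indexed_paths if not in_scope(p)}
```

Matching on whole path segments avoids a subtle bug where a plain prefix check would keep documents from a sibling folder whose name happens to start with the same letters.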