Syncs

What a sync does

When you connect a source and click Save & sync, AnswerVault starts a sync job. The job:

Downloads the documents in the scope you picked, using the OAuth token from the connector
Parses each document into structured text (PDF text extraction, Office document parsing, Confluence storage-format conversion)
Chunks the text into retrieval-sized pieces, preserving headings and structure
Embeds each chunk into the vector store using an embedding model in your residency region
Graphs entities, relationships, and document structure into the knowledge graph
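
The chunking step can be pictured with a small sketch. This is not AnswerVault's actual chunker; it is a minimal illustration that assumes Markdown-style headings, blank-line-separated blocks, and a hypothetical max_chars size limit. The names Chunk and chunk_markdown are invented for this example. It shows how each chunk can keep the headings it sits under.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading_path: list[str]  # headings enclosing this chunk, outermost first
    text: str


def chunk_markdown(text: str, max_chars: int = 800) -> list[Chunk]:
    """Split blank-line-separated Markdown into chunks that keep their heading path.

    Illustrative only: max_chars is an assumed size limit, not a product setting.
    """
    chunks: list[Chunk] = []
    path: list[str] = []    # current stack of headings
    buffer: list[str] = []  # paragraphs collected for the chunk being built

    def flush() -> None:
        # Emit the buffered paragraphs as one chunk under the current headings.
        if buffer:
            chunks.append(Chunk(list(path), "\n\n".join(buffer)))
            buffer.clear()

    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # A heading closes the current chunk and updates the heading stack.
            flush()
            level = len(block) - len(block.lstrip("#"))
            del path[level - 1:]
            path.append(block.lstrip("#").strip())
            continue
        # Start a new chunk if this paragraph would push the chunk past the limit.
        if buffer and sum(len(b) for b in buffer) + len(block) > max_chars:
            flush()
        buffer.append(block)

    flush()
    return chunks
```

Keeping the heading path with each chunk means a retrieved passage can still be traced back to the section it came from.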

You don't have to wait for the whole job to finish before asking questions: chunks become queryable as they're processed. Once the sync finishes, all content in the scope you picked is queryable.

What gets indexed

Document content. PDFs, Word documents, PowerPoint, Excel, Confluence pages, plain text, Markdown.
Document structure. Headings, sections, lists, tables (where the source preserves them).
Metadata. Title, author, last modified date, source path.
Links. Inline links between documents in the same source, used for relationship-aware retrieval.

Comments and discussion threads are not indexed today.

How long it takes

The first sync against a new source is the longest because every document is new. After that, syncs are incremental — only changed documents are reprocessed.

A typical SharePoint document library or Confluence space. A few minutes.
Large tenants (tens of thousands of documents). Tens of minutes to a few hours for the first sync, then incremental from there.
A single edited document. Re-indexed within a few minutes of the next incremental sync.

Sync timings depend on document size, parsing complexity (large PDFs are slower than Markdown), and overall queue load. You can use the product while a sync is running.

Managing syncs

The Syncs page in the app lists every sync job in your tenant. From there you can:

See running, completed, and failed jobs with their start time, source, and document counts
Inspect the per-document outcome for a finished job (indexed, skipped, or error)
Trigger a manual re-sync if you've added documents to the source and don't want to wait for the next incremental run

Failed syncs surface the underlying error. Most failures come from revoked or expired tokens (re-authorise the connector) or provider-side rate limits (the job retries automatically).
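
Automatic retry on rate limits usually follows a backoff pattern. The sketch below is a generic illustration, not AnswerVault's retry logic: RateLimited is a hypothetical exception standing in for a provider's rate-limit response, and max_attempts and base_delay are assumed values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimited, waiting longer after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with full jitter: wait a random time
            # between 0 and base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The random jitter spreads retries out so that many jobs hitting the same limit do not all retry at the same moment.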

Incremental updates

After the first sync, AnswerVault polls each source for changes. New documents are added, edited documents are re-chunked and re-embedded, and deleted documents are removed from the index. Permission changes on the source are picked up too — if you lose access to a document, the next sync drops it from the index.
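
The add, re-index, and remove decisions described above amount to comparing two snapshots. The following is a minimal sketch under stated assumptions, not the product's implementation: both snapshots are modelled as a mapping from a document ID to a version marker (for example a last-modified timestamp), and the source snapshot is assumed to contain only documents the connector can still read. SyncPlan and plan_incremental_sync are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    add: list[str] = field(default_factory=list)      # new at the source
    reindex: list[str] = field(default_factory=list)  # edited since last sync
    remove: list[str] = field(default_factory=list)   # deleted or no longer readable


def plan_incremental_sync(indexed: dict[str, str], source: dict[str, str]) -> SyncPlan:
    """Compare the indexed snapshot with the current source snapshot.

    Both arguments map a document ID to a version marker. A document that has
    been deleted, or that the connector has lost access to, is simply absent
    from `source`, so both cases end up in `remove`.
    """
    plan = SyncPlan()
    for doc_id, version in source.items():
        if doc_id not in indexed:
            plan.add.append(doc_id)
        elif indexed[doc_id] != version:
            plan.reindex.append(doc_id)
    plan.remove = [doc_id for doc_id in indexed if doc_id not in source]
    return plan


# Usage: "b" was edited, "c" disappeared, "d" is new.
plan = plan_incremental_sync(
    indexed={"a": "v1", "b": "v1", "c": "v1"},
    source={"a": "v1", "b": "v2", "d": "v1"},
)
```

Treating lost access the same way as deletion is what makes a permission change on the source drop the document from the index on the next sync.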

How quickly changes are picked up depends on the provider's API. SharePoint and Google Drive support change notifications, so updates appear within minutes. Confluence Cloud is polled instead, so changes there appear after the next poll.

Removing content

To remove content from the knowledge base, you have three options:

Narrow the connector scope. Re-open the picker on the connection and untick folders or spaces. The next sync drops the content from the index.
Disconnect the source entirely. The connection's content is removed from the index and the OAuth token is destroyed.
Delete or restrict at the source. Standard provider-side deletion or permission change flows through on the next sync.