Multimodal Document
Retrieval API

Products

API for developers, Platform for teams

Polyvia API

Multimodal Document Retrieval API
— for Developers of AI Agents

Python: $ pip install polyvia

TypeScript: $ npm install polyvia

Agent Skills: $ npx skills add polyvia-ai/skills

MCP: $ claude mcp add --transport http polyvia https://app.polyvia.ai/mcp --header "Authorization: Bearer poly_<your-key>"

# pip install polyvia
import os

from polyvia import Polyvia

# Assumption: the key is read from an environment variable named POLYVIA_API_KEY.
client = Polyvia(api_key=os.environ["POLYVIA_API_KEY"])

# Ingest & query scoped to a group
client.ingest.batch(["q4.pdf", "10k.pdf"], group="Q4 Earnings")
answer = client.query(
    "Compare EBITDA across all filings",
    group="Q4 Earnings",
).answer

Polyvia Platform

Search and Exploration over multimodal docs
— for Knowledge Workers in Enterprises

Interactive Exploration of Multimodal Knowledge Ontology

Use Cases

How teams put Polyvia to work

Data-Room Due Diligence

Surface every revenue, churn, and customer-concentration fact across a target's decks and statements.

Cross-Filing KPI Comparison

Compare a single metric across 500+ counterparty filings in seconds.
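A minimal sketch of how this use case might be wrapped with the SDK calls shown in the quickstart. Only `client.query(..., group=...)` and its `.answer` attribute come from that snippet; the helper name `compare_metric` and the prompt wording are illustrative assumptions. The client is passed in as a parameter, so the helper can be tried with a stand-in object.

```python
from typing import Any


def compare_metric(client: Any, metric: str, group: str) -> str:
    """Ask one cross-document question about a single metric within a group.

    `client` is expected to expose query(question, group=...) returning an
    object with an `.answer` attribute, as in the quickstart snippet.
    """
    # Hypothetical prompt template, modelled on the quickstart question.
    question = f"Compare {metric} across all filings"
    return client.query(question, group=group).answer
```

With a real client this would be called as `compare_metric(client, "EBITDA", "Q4 Earnings")`.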

Counterparty Credit Monitoring

Flag covenant breaches and exposure shifts across 100+ borrower reports automatically.

Image-Based Claim Processing

Extract damage type, severity, and location from claim photos; auto-route to adjusters.
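The routing step could look roughly like the sketch below. The field names, the 1 to 5 severity scale, and the queue names are all hypothetical; the page does not describe the actual extraction schema or routing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimExtraction:
    """Hypothetical shape of the fields extracted from one claim photo."""
    damage_type: str  # e.g. "water", "collision"
    severity: int     # assumed scale: 1 (minor) to 5 (total loss)
    location: str     # e.g. "rear bumper"


def route_claim(claim: ClaimExtraction) -> str:
    """Pick an adjuster queue from the extracted fields (illustrative rules)."""
    if not 1 <= claim.severity <= 5:
        raise ValueError(f"severity out of range: {claim.severity}")
    if claim.severity >= 4:
        return "senior-adjuster"
    if claim.damage_type == "water":
        return "property-specialist"
    return "standard-queue"
```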

Why Polyvia

Advances over current solutions

Fast at scale

Fast over 100+ multimodal docs

Agentic, file-by-file search (Claude Code, Cowork) is too slow past ~100 multimodal docs; at scale you still need retrieval. Polyvia answers in under 200 ms over 100K+ files.

File-by-file: minutes

Polyvia: <200ms
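To illustrate why retrieval scales better than opening each file in turn, here is a toy comparison between a per-query scan and a prebuilt inverted index. It is a generic textbook sketch on made-up sample text, not Polyvia's implementation, and all names in it are illustrative.

```python
from collections import defaultdict


def scan_files(docs: dict[str, str], term: str) -> set[str]:
    """File-by-file search: touches every document on every query."""
    return {name for name, text in docs.items() if term in text.lower().split()}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """One-time pass that maps each token to the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for token in text.lower().split():
            index[token].add(name)
    return index


def lookup(index: dict[str, set[str]], term: str) -> set[str]:
    """Indexed search: one dictionary lookup, independent of corpus size."""
    return set(index.get(term, set()))


# Made-up sample corpus for illustration only.
docs = {
    "q4.pdf": "EBITDA rose in Q4",
    "10k.pdf": "Annual EBITDA and revenue",
    "memo.txt": "Headcount plan",
}
index = build_index(docs)
```

Both `scan_files(docs, "ebitda")` and `lookup(index, "ebitda")` return the same documents; the difference is that the scan's cost grows with the number of files on every query, while the index pays that cost once at build time.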

End-to-end

No need for extractors or PDF parsers

One end-to-end solution for large-scale multimodal document retrieval — not a stack of vendors (Reducto, LlamaIndex) to stitch together.

VLM Visual Extractor → Knowledge Ontology → Self-Improving Retrieval Agent

Enterprise & secure

Data never leaves your systems

On-prem deployment of Polyvia agents inside your own cloud / VPC, via the Polyvia for Enterprise offering — your documents stay within your perimeter.

[Diagram: encrypted link between your VPC / on-prem deployment and Polyvia]

What Polyvia Can Do

Production-ready from day one

Audit-ready answers

Every answer traced back to source

Which segments show the fastest growth?

Cloud services led growth (+42%). cite: 10-K p.42

10-K · p.42 ¶3 · 99.8%

Cloud services led growth at +42%, driven by enterprise contract expansion.

99.8% citation coverage

Built for scale

From 5 files to 100K+ documents

Throughput · last 24h: 42K / hr

Documents indexed

Query latency: sub-200ms

Facts per corpus

Extraction confidence

Integrations

Works with your stack

AWS S3

Snowflake

Google Drive

SharePoint

CRM

ERP

Notion

Dropbox

Slack

Cursor

Claude

Codex

Multimodal Ingest

Every unstructured, visual and multimodal input

Visual Document Intelligence

Charts

Complex tables

Infographics

Slides & reports

Scans, Invoices & Handwriting

Pictures

Standard

Text

Multimodal Document Intelligence

Audio & Transcripts

Video (coming soon)

Molecular & Chemical (coming soon)

CAD & Drawings (coming soon)

Geospatial & Heatmaps (coming soon)

Read · See · Listen

Pinpoint precision in every format

Read: Documents

Cited to the paragraph

Q4 revenue rose to $24.8B, up 14% year-over-year.

Cloud services led growth at +42%. Consumer hardware was flat.

Net retention reached 118%, the highest in eight quarters…

cite · 10-K · p.42

Supports: PDF, DOCX, MD, TXT

See: Slides

Read the chart, not the caption

cite · deck p.7 · chart 2

Supports: PPTX, PDF

See: Images

Bounding-box precision

cite · fig 4 · (140,16)→(230,104)

Supports: PNG, JPG, WEBP
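A downstream consumer might want to turn a bounding-box citation like the one above into numbers. The sketch below parses the `(x0,y0)→(x1,y1)` pattern from the example string. It assumes the two pairs are opposite corners in pixel coordinates, which the page does not state, and `parse_box` is an illustrative helper rather than part of any SDK.

```python
import re
from typing import NamedTuple


class BoundingBox(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


# Matches "(140,16)→(230,104)" with optional spaces around the numbers and arrow.
_BOX = re.compile(r"\((\d+),\s*(\d+)\)\s*→\s*\((\d+),\s*(\d+)\)")


def parse_box(citation: str) -> BoundingBox:
    """Extract the '(x0,y0)→(x1,y1)' corner pair from a citation string."""
    match = _BOX.search(citation)
    if match is None:
        raise ValueError(f"no bounding box in {citation!r}")
    return BoundingBox(*(int(g) for g in match.groups()))


box = parse_box("cite · fig 4 · (140,16)→(230,104)")
print(box.width, box.height)  # 90 88
```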

Listen: Audio

Cited to the second, not the recording

00:00 · 02:14 · 02:31 · 05:00

cite · 02:14 → 02:31 · “…retention reached 118%”

Supports: WAV, MP3, M4A
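A time-span citation such as `02:14 → 02:31` can be converted to second offsets, for example to seek an audio player. This is a small sketch assuming the `MM:SS → MM:SS` layout shown in the example; the helper names are illustrative.

```python
def to_seconds(stamp: str) -> int:
    """Convert 'MM:SS' to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_span(span: str) -> tuple[int, int]:
    """Split 'MM:SS → MM:SS' into (start, end) in seconds."""
    start, end = (part.strip() for part in span.split("→"))
    return to_seconds(start), to_seconds(end)


start, end = parse_span("02:14 → 02:31")
print(start, end, end - start)  # 134 151 17
```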

End-to-End Pipeline for Multimodal Document Intelligence

Polyvia Engine

VLM Visual Extractor

SOTA visual document extractor and parser. Fine-tuned VLM-OCR pipeline for the hardest visual and multimodal inputs. Extracts actual data points — not 300-token descriptions.

VLM · OCR · fine-tuned

Multimodal Knowledge Ontology

Knowledge graph for large-scale visual file search. Disambiguates extracted facts into unique entities; connects them across the corpus. Single source of truth, cross-document reasoning across 100K+ files.

graph · entity-linking · 100K+ files
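The idea of disambiguating extracted facts into unique entities and connecting them across documents can be sketched with a toy alias table. This is a generic illustration on made-up data, not Polyvia's ontology; the alias table, entity IDs, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical alias table; a real system would infer these links.
ALIASES = {
    "acme corp": "ACME",
    "acme corporation": "ACME",
    "acme": "ACME",
}


def canonical(name: str) -> str:
    """Map a surface form to a canonical entity ID (falls back to the name)."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())


def link_facts(facts: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (document, entity mention, fact) triples by canonical entity."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for doc, mention, fact in facts:
        graph[canonical(mention)].append((doc, fact))
    return dict(graph)


# Made-up sample facts for illustration only.
facts = [
    ("a.pdf", "Acme Corp", "revenue up"),
    ("b.pdf", "ACME Corporation", "retention up"),
    ("c.pptx", "Globex", "hardware flat"),
]
graph = link_facts(facts)
```

After linking, `graph["ACME"]` holds facts from both `a.pdf` and `b.pdf`, which is what makes a cross-document question answerable from one place.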

Self-Improving Retrieval Agent

Query decomposition + iterative retrieval + LLM-as-a-judge. Sub-200ms graph search across 100K+ documents. Every answer grounded in visual citations. Learns which retrievals lead to successful generations.

sub-200ms · cited · self-improving
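The combination of query decomposition, iterative retrieval, and a judge can be sketched as a simple control loop. The version below is a generic outline with pluggable callables, not Polyvia's agent: `decompose`, `retrieve`, and `judge` are hypothetical stand-ins (the judge would be an LLM call in practice), and the self-improving part, learning from which retrievals succeed, is not modelled.

```python
from typing import Callable


def answer_query(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str, int], list[str]],
    judge: Callable[[str, list[str]], bool],
    max_rounds: int = 3,
) -> dict[str, list[str]]:
    """Split a question into sub-questions, then retry retrieval for each
    sub-question until the judge accepts the evidence or rounds run out."""
    evidence: dict[str, list[str]] = {}
    for sub in decompose(question):
        passages: list[str] = []
        for round_no in range(1, max_rounds + 1):
            # The round number lets the retriever widen its search on retries.
            passages = retrieve(sub, round_no)
            if judge(sub, passages):
                break
        evidence[sub] = passages
    return evidence
```

Passing the three steps in as functions keeps the loop testable with simple stubs before any model or index is wired in.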

Start building with Polyvia

Get access to Polyvia and stay up to date. Enter your email to join.

We're rolling out access weekly.
