Data Sources

Vera combines multiple sources for each enrichment to maximize data quality and coverage.

Sources

Website Scraping (`website`)

Extracts structured data directly from company websites: name, description, contact info, social links, services, and location from page content, meta tags, and schema markup.

Priority 1 — First-party, most authoritative source. Limited by site quality and content availability.

LinkedIn (`linkedin`)

Extracts company data from LinkedIn pages: industry classification, employee count, headquarters, description, founded year, and website URL.

Priority 2 — Standardized data from a verified platform. Requires the company to maintain their LinkedIn page.

AI Inference (`ai`)

AI models analyze all collected data to infer the value proposition, target customers, revenue model, business stage, competitors, and keywords.

Priority 3 — Fills gaps with strategic insights. Lower confidence than direct sources; requires sufficient input data.

Enrichment Pipeline

WEBSITE SCRAPING    Domain -> HTML -> structured data
LINKEDIN            Find company page -> extract fields
AI ANALYSIS         All data -> infer missing fields + insights
MERGE & VALIDATE    Combine -> resolve conflicts -> confidence score

When sources conflict, website data wins over LinkedIn, which wins over AI inference.
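
The precedence rule above can be sketched as a per-field merge. This is a minimal illustration, not Vera's actual implementation: the `merge_fields` function, the shape of the `collected` dictionary, and the sample values are all assumptions made for the example. Only the source keys (`website`, `linkedin`, `ai`) and their ordering come from this page.

```python
from typing import Any

# Source priority as stated in the text: website > linkedin > ai.
SOURCE_PRIORITY = ["website", "linkedin", "ai"]


def merge_fields(by_source: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """For each field, keep the value from the highest-priority source that has one.

    Hypothetical helper: input maps source name -> {field: value}.
    """
    merged: dict[str, dict[str, Any]] = {}
    for source in SOURCE_PRIORITY:
        for field, value in by_source.get(source, {}).items():
            if value is None or field in merged:
                continue  # skip missing values and fields already filled
            merged[field] = {"value": value, "source": source}
    return merged


# Made-up sample data for illustration only.
collected = {
    "website": {"name": "Acme Corp", "description": None},
    "linkedin": {"name": "ACME", "description": "Industrial tools", "industry": "Manufacturing"},
    "ai": {"industry": "Retail", "valueProposition": "Durable tools for contractors"},
}
result = merge_fields(collected)
```

In this sketch `name` comes from the website, `description` falls back to LinkedIn because the website value is missing, and the AI-inferred `industry` is discarded in favor of the LinkedIn value.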

Confidence Scoring

The confidence score (0-100) reflects data certainty:

Score	Quality	Meaning
90-100	Excellent	Multiple sources confirmed
70-89	Good	Reliable primary source data
50-69	Moderate	Some fields may be inferred
Below 50	Low	Limited data, use with caution

Higher confidence: active website with rich content, maintained LinkedIn page, cross-source agreement, schema.org markup present.

Lower confidence: minimal website, no LinkedIn, conflicting sources, heavy AI inference.
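
On the client side, the score bands from the table above can be mapped to their quality labels with a small helper. The function name `quality_band` is hypothetical; only the thresholds and labels come from the table.

```python
def quality_band(score: int) -> str:
    """Map a 0-100 confidence score to the quality label from the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"confidence score out of range: {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Moderate"
    return "Low"  # below 50: limited data, use with caution
```

A caller might, for example, route anything that is not `"Excellent"` or `"Good"` to manual review before using it.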

Data Freshness

Trigger	What Happens
First enrichment	Full collection from all sources
Re-enrichment	Fresh collection, replaces old data
Scheduled refresh	Low-confidence companies re-enriched automatically

The `enrichedAt` timestamp shows when the data was collected. Re-enrich every 30-90 days to keep data current.
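
A staleness check against the 30-90 day guidance might look like the sketch below. It assumes `enrichedAt` is an ISO 8601 string (this page does not specify the format), and `needs_reenrichment` is a hypothetical helper, not part of any Vera SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_reenrichment(
    enriched_at: str,
    max_age_days: int = 90,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the record is older than max_age_days.

    Assumes enriched_at is ISO 8601, e.g. "2024-01-01T00:00:00Z".
    """
    # Older Python versions do not accept a trailing "Z", so normalize it.
    collected = datetime.fromisoformat(enriched_at.replace("Z", "+00:00"))
    if collected.tzinfo is None:
        collected = collected.replace(tzinfo=timezone.utc)  # assume UTC if naive
    current = now or datetime.now(timezone.utc)
    return current - collected > timedelta(days=max_age_days)
```

Passing `max_age_days=30` gives the stricter end of the recommended range; the `now` parameter exists only to make the check deterministic in tests.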

Field Availability

Field	Availability	Primary Source
`name`	99%+	Website
`description`	90%+	Website/LinkedIn
`industry`	85%+	LinkedIn
`employeeCount`	75%+	LinkedIn
`location`	80%+	LinkedIn/Website
`socialProfiles`	70%+	Website
`valueProposition`	85%+	AI
`competitors`	70%+	AI
`fundingStatus`	40%+	AI/LinkedIn

Fields return null when information isn't publicly available or can't be reliably inferred.
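
Because any field can come back as null, client code should check for missing values before using them. The sketch below assumes the enriched company has been parsed into a Python dictionary (JSON `null` becomes `None`). The `missing_fields` helper and the sample record are illustrative only.

```python
from typing import Any


def missing_fields(company: dict[str, Any], required: list[str]) -> list[str]:
    """List the required fields that are absent or null in the record."""
    return [field for field in required if company.get(field) is None]


# Made-up sample record for illustration only.
company = {
    "name": "Acme Corp",
    "industry": "Manufacturing",
    "employeeCount": None,
    "fundingStatus": None,
}
gaps = missing_fields(company, ["name", "employeeCount", "fundingStatus"])
```

Here `gaps` lists `employeeCount` and `fundingStatus`, which a caller could use to decide whether to re-enrich later or fall back to another data source.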

Privacy & Compliance

Public data only — no personal employee data or private financials
Compliant with GDPR and CCPA
Use enriched data only for legitimate business purposes

Next Steps

Company Schema — All available fields
Enrichment Guide — How to enrich companies
