
Data Sources

Vera combines multiple sources for each enrichment to maximize data quality and coverage.


Sources

Website Scraping (website)

Extracts structured data directly from company websites, including name, description, contact info, social links, services, and location, by parsing page content, meta tags, and schema markup.

Priority 1 — First-party, most authoritative source. Limited by site quality and content availability.

LinkedIn (linkedin)

Extracts company data from LinkedIn pages: industry classification, employee count, headquarters, description, founded year, and website URL.

Priority 2 — Standardized data from a verified platform. Requires the company to maintain their LinkedIn page.

AI Inference (ai)

AI models analyze all collected data to infer: value proposition, target customers, revenue model, business stage, competitors, and keywords.

Priority 3 — Fills gaps with strategic insights. Lower confidence than direct sources; requires sufficient input data.


Enrichment Pipeline

1. WEBSITE SCRAPING    Domain -> HTML -> structured data
2. LINKEDIN            Find company page -> extract fields
3. AI ANALYSIS         All data -> infer missing fields + insights
4. MERGE & VALIDATE    Combine -> resolve conflicts -> confidence score

When sources conflict, website data wins over LinkedIn, which wins over AI inference.
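The priority rule can be illustrated with a short merge sketch. This is not Vera's implementation: the `Source` names mirror the source identifiers on this page, but `resolveField` and `mergeRecords` are hypothetical helpers, and the sketch assumes a `null` value means "no data" rather than a conflict, so a lower-priority source can fill the gap.

```typescript
// Source identifiers as named on this page.
type Source = "website" | "linkedin" | "ai";

// Lower number = higher priority: website beats LinkedIn, LinkedIn beats AI.
const PRIORITY: Record<Source, number> = { website: 1, linkedin: 2, ai: 3 };

// One candidate value for a field, tagged with the source it came from.
interface Candidate<T> {
  source: Source;
  value: T | null;
}

// Hypothetical helper: pick the non-null value from the highest-priority source.
function resolveField<T>(candidates: Candidate<T>[]): Candidate<T> | null {
  const present = candidates.filter((c) => c.value !== null);
  if (present.length === 0) return null;
  return present.reduce((best, c) =>
    PRIORITY[c.source] < PRIORITY[best.source] ? c : best,
  );
}

// Hypothetical helper: merge per-source records field by field.
function mergeRecords(
  records: Partial<Record<Source, Record<string, unknown>>>,
): Record<string, unknown> {
  // Collect every field name that any source reported.
  const fields = new Set<string>();
  for (const rec of Object.values(records)) {
    if (rec) Object.keys(rec).forEach((k) => fields.add(k));
  }

  const merged: Record<string, unknown> = {};
  for (const field of fields) {
    const candidates: Candidate<unknown>[] = [];
    const entries = Object.entries(records) as [Source, Record<string, unknown> | undefined][];
    for (const [source, rec] of entries) {
      if (rec && field in rec) candidates.push({ source, value: rec[field] ?? null });
    }
    // Fields no source could supply stay null.
    merged[field] = resolveField(candidates)?.value ?? null;
  }
  return merged;
}

// Usage: website name wins; LinkedIn fills the missing description.
const merged = mergeRecords({
  website: { name: "Acme Corp", description: null },
  linkedin: { name: "Acme Corporation", description: "Industrial widgets", industry: "Manufacturing" },
  ai: { industry: "Hardware", valueProposition: "Affordable widgets" },
});
console.log(merged);
```

The sketch covers only conflict resolution; validation and confidence scoring (step 4 of the pipeline) are out of scope here.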


Confidence Scoring

The confidence score (0-100) reflects data certainty:

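| Score    | Quality   | Meaning                        |
|----------|-----------|--------------------------------|
| 90-100   | Excellent | Multiple sources confirmed     |
| 70-89    | Good      | Reliable primary source data   |
| 50-69    | Moderate  | Some fields may be inferred    |
| Below 50 | Low       | Limited data, use with caution |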
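If client code needs to act on these bands, a small mapping function is one option. The function name `confidenceQuality` is hypothetical; the thresholds come straight from the table above, and rejecting out-of-range scores is an assumption about how a client might guard its input.

```typescript
type Quality = "Excellent" | "Good" | "Moderate" | "Low";

// Hypothetical helper: map a 0-100 confidence score to the documented quality band.
function confidenceQuality(score: number): Quality {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`confidence score must be 0-100, got ${score}`);
  }
  if (score >= 90) return "Excellent";
  if (score >= 70) return "Good";
  if (score >= 50) return "Moderate";
  return "Low"; // below 50: limited data, use with caution
}

console.log(confidenceQuality(72)); // "Good"
```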

Factors that raise confidence: an active website with rich content, a maintained LinkedIn page, agreement across sources, and schema.org markup.

Factors that lower confidence: a minimal website, no LinkedIn page, conflicting sources, and heavy reliance on AI inference.


Data Freshness

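| Trigger           | What Happens                                        |
|-------------------|-----------------------------------------------------|
| First enrichment  | Full collection from all sources                    |
| Re-enrichment     | Fresh collection, replaces old data                 |
| Scheduled refresh | Low-confidence companies re-enriched automatically  |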

The enrichedAt timestamp shows when data was collected. Re-enrich every 30-90 days to keep data current.
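A client can use `enrichedAt` to decide when to request re-enrichment. The sketch below assumes `enrichedAt` is an ISO 8601 string and uses 90 days (the upper end of the recommended window) as the default cutoff; `needsReEnrichment` is a hypothetical helper, not part of the Vera API.

```typescript
// Hypothetical record shape: only the field this check needs.
interface EnrichedCompany {
  enrichedAt: string; // assumed ISO 8601, e.g. "2024-01-01T00:00:00Z"
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns true when the record is older than maxAgeDays.
function needsReEnrichment(
  company: EnrichedCompany,
  now: Date = new Date(),
  maxAgeDays = 90,
): boolean {
  const enrichedMs = Date.parse(company.enrichedAt);
  // An unparseable timestamp is treated as stale so it gets refreshed.
  if (Number.isNaN(enrichedMs)) return true;
  const ageDays = (now.getTime() - enrichedMs) / DAY_MS;
  return ageDays > maxAgeDays;
}

// Usage: a record enriched 60 days ago is stale under a 30-day policy.
const record = { enrichedAt: "2024-01-01T00:00:00Z" };
console.log(needsReEnrichment(record, new Date("2024-03-01T00:00:00Z"), 30)); // true
```

Low-confidence records are already refreshed on a schedule, so a check like this mainly matters for keeping high-confidence records current.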


Field Availability

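| Field            | Availability | Primary Source   |
|------------------|--------------|------------------|
| name             | 99%+         | Website          |
| description      | 90%+         | Website/LinkedIn |
| industry         | 85%+         | LinkedIn         |
| employeeCount    | 75%+         | LinkedIn         |
| location         | 80%+         | LinkedIn/Website |
| socialProfiles   | 70%+         | Website          |
| valueProposition | 85%+         | AI               |
| competitors      | 70%+         | AI               |
| fundingStatus    | 40%+         | AI/LinkedIn      |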

Fields return null when information isn't publicly available or can't be reliably inferred.
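Because any field can be null, client code should not assume a value is present. This sketch uses field names from the table above, but the `CompanyProfile` shape and its types are assumptions (for example, `employeeCount` is typed as a string in case it arrives as a range), and `describeCompany` and `missingFields` are hypothetical helpers.

```typescript
// Hypothetical subset of an enriched record; types are assumptions.
interface CompanyProfile {
  name: string | null;
  industry: string | null;
  employeeCount: string | null; // may be a range such as "51-200"
  fundingStatus: string | null;
}

// Build a display string with explicit fallbacks for null fields.
function describeCompany(c: CompanyProfile): string {
  const name = c.name ?? "Unknown company";
  const industry = c.industry ?? "industry unknown";
  const size = c.employeeCount !== null ? `${c.employeeCount} employees` : "size unknown";
  return `${name} (${industry}, ${size})`;
}

// List which fields came back null, e.g. to decide whether to re-enrich later.
function missingFields(c: CompanyProfile): (keyof CompanyProfile)[] {
  return (Object.keys(c) as (keyof CompanyProfile)[]).filter((k) => c[k] === null);
}

const profile: CompanyProfile = {
  name: "Acme",
  industry: null,
  employeeCount: "51-200",
  fundingStatus: null,
};
console.log(describeCompany(profile)); // "Acme (industry unknown, 51-200 employees)"
console.log(missingFields(profile)); // ["industry", "fundingStatus"]
```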


Privacy & Compliance

  • Public data only — no personal employee data or private financials
  • Compliant with GDPR and CCPA
  • Use enriched data only for legitimate business purposes
