Data Sources
Vera combines multiple sources for each enrichment to maximize data quality and coverage.
Sources
Website Scraping (website)
Extracts structured data directly from company websites: name, description, contact info, social links, services, and location from page content, meta tags, and schema markup.
Priority 1 — First-party, most authoritative source. Limited by site quality and content availability.
LinkedIn (linkedin)
Extracts company data from LinkedIn pages: industry classification, employee count, headquarters, description, founded year, and website URL.
Priority 2 — Standardized data from a verified platform. Requires the company to maintain their LinkedIn page.
AI Inference (ai)
AI models analyze all collected data to infer value proposition, target customers, revenue model, business stage, competitors, and keywords.
Priority 3 — Fills gaps with strategic insights. Lower confidence than direct sources; requires sufficient input data.
Enrichment Pipeline
1. WEBSITE SCRAPING: Domain -> HTML -> structured data
2. LINKEDIN: Find company page -> extract fields
3. AI ANALYSIS: All data -> infer missing fields + insights
4. MERGE & VALIDATE: Combine -> resolve conflicts -> confidence score
When sources conflict, website data wins over LinkedIn, which wins over AI inference.
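The precedence rule above can be sketched as a per-field merge. This is an illustration only, not Vera's actual merge step: the helper name `merge_by_priority` and the record shapes are hypothetical, and treating a null value as "no data from this source" is an assumption. Only the source keys (website, linkedin, ai) and their ordering come from this page.

```python
from typing import Any

# Source keys in priority order, as listed on this page (website > linkedin > ai).
SOURCE_PRIORITY = ["website", "linkedin", "ai"]


def merge_by_priority(records: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge per-source records, keeping the highest-priority non-null value.

    `records` maps a source key to the fields that source returned.
    Hypothetical helper for illustration only.
    """
    merged: dict[str, Any] = {}
    # Walk sources from lowest to highest priority so later writes win.
    for source in reversed(SOURCE_PRIORITY):
        for field, value in records.get(source, {}).items():
            if value is not None:  # assumption: None means "no data"
                merged[field] = value
    return merged


# Made-up sample data: the website name wins, while the missing website
# description falls back to LinkedIn rather than to AI inference.
example = merge_by_priority({
    "website": {"name": "Acme Inc.", "description": None},
    "linkedin": {"name": "ACME", "description": "Industrial tools", "employeeCount": 120},
    "ai": {"description": "Tooling vendor", "valueProposition": "Durable tools"},
})
```

A lower-priority source still contributes any field the higher-priority sources left empty, which matches the role of AI inference as a gap filler.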
Confidence Scoring
The confidence score (0-100) reflects data certainty:
| Score | Quality | Meaning |
|---|---|---|
| 90-100 | Excellent | Multiple sources confirmed |
| 70-89 | Good | Reliable primary source data |
| 50-69 | Moderate | Some fields may be inferred |
| Below 50 | Low | Limited data, use with caution |
Higher confidence: active website with rich content, maintained LinkedIn page, cross-source agreement, schema.org markup present.
Lower confidence: minimal website, no LinkedIn, conflicting sources, heavy AI inference.
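To act on the score in client code, the bands in the table above can be mapped to their quality labels. The function name `quality_label` is hypothetical, and rejecting out-of-range scores is an assumption; the thresholds are taken directly from the table.

```python
def quality_label(score: int) -> str:
    """Map a 0-100 confidence score to the quality band from the table."""
    if not 0 <= score <= 100:
        # Assumption: scores outside 0-100 are treated as invalid input.
        raise ValueError(f"confidence score out of range: {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Moderate"
    return "Low"
```

A caller might, for example, send only "Low" records to manual review before using them.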
Data Freshness
| Trigger | What Happens |
|---|---|
| First enrichment | Full collection from all sources |
| Re-enrichment | Fresh collection, replaces old data |
| Scheduled refresh | Low-confidence companies re-enriched automatically |
The enrichedAt timestamp shows when data was collected. Re-enrich every 30-90 days to keep data current.
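A staleness check against enrichedAt could look like the sketch below. It assumes the timestamp is an ISO 8601 string (this page does not specify the format) and that timestamps without a time zone are in UTC; the function name `needs_reenrichment` and the `max_age_days` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_reenrichment(
    enriched_at: str,
    max_age_days: int = 90,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the record is older than `max_age_days`.

    Assumes `enriched_at` is an ISO 8601 string such as
    "2024-01-01T00:00:00Z"; the real format may differ.
    """
    # Older Python versions do not parse a trailing "Z", so normalize it.
    collected = datetime.fromisoformat(enriched_at.replace("Z", "+00:00"))
    if collected.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        collected = collected.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current - collected > timedelta(days=max_age_days)
```

Setting `max_age_days` to 30 gives the stricter end of the recommended 30-90 day window.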
Field Availability
| Field | Availability | Primary Source |
|---|---|---|
| name | 99%+ | Website |
| description | 90%+ | Website/LinkedIn |
| industry | 85%+ | LinkedIn |
| employeeCount | 75%+ | LinkedIn |
| location | 80%+ | LinkedIn/Website |
| socialProfiles | 70%+ | Website |
| valueProposition | 85%+ | AI |
| competitors | 70%+ | AI |
| fundingStatus | 40%+ | AI/LinkedIn |
Fields return null when information isn't publicly available or can't be reliably inferred.
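Because any field can be null, client code should check for missing values before relying on them. The sketch below assumes an enriched company arrives as a plain dictionary keyed by the field names in the table; the helper name `missing_fields` and the sample record are hypothetical.

```python
from typing import Any


def missing_fields(company: dict[str, Any], required: list[str]) -> list[str]:
    """Return the required fields that are absent or null in the record."""
    return [field for field in required if company.get(field) is None]


# Made-up record for illustration; fundingStatus is null here, which the
# table above suggests is common for that field.
company = {
    "name": "Acme Inc.",
    "industry": "Manufacturing",
    "employeeCount": None,
    "fundingStatus": None,
}

gaps = missing_fields(company, ["name", "industry", "employeeCount", "competitors"])
```

Here `gaps` lists both the null `employeeCount` and the absent `competitors` key, so the caller can decide whether to re-enrich or proceed without them.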
Privacy & Compliance
- Public data only — no personal employee data or private financials
- Compliant with GDPR and CCPA
- Use enriched data only for legitimate business purposes
Next Steps
- Company Schema — All available fields
- Enrichment Guide — How to enrich companies