2020.7 - Data Release Log

Overview

Short absence on the release log but major changes on the way. New pipelines have been established to make more consistent data improvements. Classification is now operating. Normalization of fields such as those pertaining to addresses or urls is running consistently with more plans in the works.

Record Counts

The following are the updated record counts for each vertical:

  • Business data decreased by 1.6 million to 105.5 million
  • People data increased by 1.1 million to 13.8 million
  • Product data increased by 28.3 million to 213.5 million
  • Property data increased by 34.3 million to 142.4 million

New Sources

The following data types have received new sources:

  • 1 Product data source
  • 3 Property data sources

Schema Changes

  • taxonomy field is now live
  • Taxonomy level fields for more granular queries is now live as well
  • domains field now useable in lieu of sourceURLs for more generic source queries

Fixes

  • Removed improperly gathered data from CarGurus
  • Handled issues within People data where address values were invalid
  • VRBO data that was poorly collected removed
  • Updated 8 Business data sources, 3 Product data sources, and 3 Property data sources