2020.2 - Data Release Log

Overview

In February we removed a large quantity of erroneous data generated by our web crawler. Most of the removed data came from one particular source.

Record Counts

The following are the updated record counts for each vertical:

  • Business data increased by 4.2 million to 97.7 million
  • People data stayed at 11.7 million
  • Product data decreased by 29.3 million to 156.3 million
  • Property data increased by 15.6 million to 86.4 million
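
For readers who want to reconcile these figures against the prior release, the previous totals can be derived by subtracting each change from its new total. The sketch below is only an illustration: the dictionary name and layout are assumptions, and the figures are copied from the list above (in millions of records).

```python
from decimal import Decimal

# (change, new total) in millions of records, copied from the list above.
# The name RELEASE_2020_2 and this layout are illustrative assumptions.
RELEASE_2020_2 = {
    "business": (Decimal("4.2"), Decimal("97.7")),
    "people": (Decimal("0"), Decimal("11.7")),
    "product": (Decimal("-29.3"), Decimal("156.3")),
    "property": (Decimal("15.6"), Decimal("86.4")),
}


def previous_counts(release):
    """Derive each vertical's prior total as new total minus change."""
    return {
        vertical: total - change
        for vertical, (change, total) in release.items()
    }


if __name__ == "__main__":
    for vertical, count in previous_counts(RELEASE_2020_2).items():
        print(f"{vertical}: {count} million before this release")
```

Decimal is used instead of float so that the one-decimal figures subtract exactly.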

New Sources

The following data types have received new sources:

  • 2 Business data sources
  • 2 Product data sources
  • 1 Property data source

Schema Changes

  • asins field is now a single value
  • Legacy fields such as crawlResultFiles and automatedCollection have been removed
  • Additional paymentTypes have been added as well as acceptable hours
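
Consumers holding records from earlier releases may want to bring them in line with the asins and legacy-field changes. The following is a minimal sketch, not an official migration: it assumes records are plain dictionaries and that asins was previously a list of strings, neither of which this log confirms. The function name normalize_record is hypothetical.

```python
# Legacy fields named in this release log as removed.
LEGACY_FIELDS = ("crawlResultFiles", "automatedCollection")


def normalize_record(record):
    """Return a copy of record shaped like the 2020.2 schema.

    Assumption: older records stored `asins` as a list of strings.
    If so, the first non-empty entry is kept as the single value.
    """
    out = {k: v for k, v in record.items() if k not in LEGACY_FIELDS}
    if "asins" in out and isinstance(out["asins"], (list, tuple)):
        # Collapse the assumed legacy list form to a single value (or None).
        out["asins"] = next((a for a in out["asins"] if a), None)
    return out


if __name__ == "__main__":
    old = {"asins": ["B000TEST01", "B000TEST02"], "crawlResultFiles": []}
    print(normalize_record(old))
```

Keeping the first entry is an arbitrary choice for illustration; a consumer that needs every value should preserve the original list elsewhere before normalizing.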

Fixes

  • Postal codes are being collected more accurately as issues regarding our
  • Just under 40 million product records, and some business records, were removed because the collected data was erroneous
  • Updated 1 Business data source and 3 Product data sources