2019.8 - Data Release Log

Overview

This month focused heavily on revisions to the schema and ensuring the conversions necessary for those revisions ran smoothly.

Record Counts

The following are the updated record counts for each vertical:

  • Business data increased by 3.1 million to 90.1 million
  • People data increased by 0.1 million to 9.5 million
  • Product data increased by 5.2 million to 129.3 million
  • Property data increased by 7.2 million to 32.2 million

New Sources

The following data types have received new sources:

  • 3 Business data sources
  • 1 Product data source

Schema Changes

  • We're still continuing our transition from crawlResultFiles to crawlIDs.
  • The Product data received several new fields relating to upc's and ean's. upca, upce, ean13, ean8, and gtins were all created in an effort to assist with searching for particular products. The original fields of upc and ean will continue to be supported until they can be properly deprecated.
  • dateSeen for several nested JSON object fields will be set to a single value depicting the latest date that object was seen by our crawlers. In the future the name of this field will be changed to dateUpdated
  • Business and Property data contain a new field called geoLocation which combines latitude and longitude into a single value. This will be used in the future for queries involving relative distances.

Fixes

  • Categories and features were being populated with bad seller ranking data, removal is currently in-progress.
  • The name field in our Product data would occasionally include "Details about" which is not part of the product.
  • Updated 1 Business data source, 4 Property data sources, and 2 Product data sources