2019.7 - Data Release Log

Overview

This is our first data release log. Going forward, it will track changes to the data, including updated record counts, newly added sources, schema changes, and any issues that have been fixed.

Record Counts

The following are the updated record counts for each vertical:

  • Business data increased by 3.5 million to 87 million
  • People data increased by 1.1 million to 9.4 million
  • Product data increased by 1.1 million to 124.1 million
  • Property data increased by 1.2 million to 25 million
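Each line above gives both the new total and the increase, so the counts from before this release can be recovered by subtraction. The short Python sketch below does this with Decimal to avoid floating-point rounding. Because the published figures are themselves rounded, the derived numbers are approximate.

```python
from decimal import Decimal

# Figures from the Record Counts list, in millions of records:
# (current total, increase in this release). Strings keep Decimal exact.
RECORD_COUNTS = {
    "Business": (Decimal("87"), Decimal("3.5")),
    "People": (Decimal("9.4"), Decimal("1.1")),
    "Product": (Decimal("124.1"), Decimal("1.1")),
    "Property": (Decimal("25"), Decimal("1.2")),
}


def implied_previous_counts(counts):
    """Return each vertical's implied prior count: current total minus increase."""
    return {name: current - increase for name, (current, increase) in counts.items()}


if __name__ == "__main__":
    previous = implied_previous_counts(RECORD_COUNTS)
    for name, count in previous.items():
        print(f"{name}: {count} million before this release")
    total_now = sum(current for current, _ in RECORD_COUNTS.values())
    total_added = sum(increase for _, increase in RECORD_COUNTS.values())
    print(f"All verticals: {total_added} million added, {total_now} million total")
```

Run as a script, it reports 83.5, 8.3, 123.0, and 23.8 million records for Business, People, Product, and Property before this release, and 6.9 million records added across all verticals for a combined 245.5 million.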

New Sources

The following data types have received new sources:

  • 2 Business data sources
  • 1 Property data source
  • 2 Product data sources

Schema Changes

  • People data has a new field, linkedinURL, which holds a single LinkedIn URL for each person.
  • Product data has a new field, primaryImageURLs, which tracks the single main product image from each of various domains.
  • We are continuing our transition from crawlResultFiles to crawlIDs.
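For consumers of the data, the Python sketch below shows one way to read the new fields and to cope with records that may carry either crawl field during the transition. Only the field names linkedinURL, primaryImageURLs, crawlIDs, and crawlResultFiles come from this log. The JSON layout, the value types (a string for linkedinURL, lists for the others), the example values, and the helper functions are all hypothetical.

```python
import json

# Hypothetical records: the field names come from this log, but the JSON
# layout, value types, and example values are assumptions for illustration.
PEOPLE_RECORD = json.loads("""
{"name": "Jane Doe",
 "linkedinURL": "https://www.linkedin.com/in/jane-doe-example",
 "crawlIDs": ["crawl-001"]}
""")

PRODUCT_RECORD = json.loads("""
{"name": "Example Widget",
 "primaryImageURLs": ["https://shop-a.example/widget.jpg",
                      "https://shop-b.example/widget-main.png"],
 "crawlResultFiles": ["results-2019-06.json"]}
""")


def linkedin_url(person):
    """linkedinURL holds a single value, so return a string or None."""
    return person.get("linkedinURL")


def primary_image_urls(product):
    """Assumed: primaryImageURLs is a list with one main image per domain."""
    return list(product.get("primaryImageURLs", []))


def crawl_references(record):
    """Prefer the newer crawlIDs field; fall back to crawlResultFiles."""
    if record.get("crawlIDs"):
        return ("crawlIDs", record["crawlIDs"])
    return ("crawlResultFiles", record.get("crawlResultFiles", []))


if __name__ == "__main__":
    print(linkedin_url(PEOPLE_RECORD))
    print(primary_image_urls(PRODUCT_RECORD))
    print(crawl_references(PEOPLE_RECORD))
    print(crawl_references(PRODUCT_RECORD))
```

Preferring crawlIDs and falling back to crawlResultFiles lets the same reading code handle records from either side of the transition.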

Fixes

  • The dateUpdated field on Property records was being updated erroneously whenever the data was modified or observed. This has been fixed: going forward, dateUpdated will only change when new source data is received.
  • URLs pointing to spam ads, notably those containing "doubleclick", have been removed.
  • Another URL issue involved API redirects, where a "citymediagrid" URL was stored in place of the actual URL where the data was held. These redirects have been removed and, where possible, replaced with the actual URL containing the data.
  • Updated 2 Property data sources, 2 Product data sources, and 1 Business data source.
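The dateUpdated and URL fixes amount to simple rules, sketched below in Python. This is an illustration of the behavior described above, not our actual pipeline code. The dateUpdated field and the "doubleclick" and "citymediagrid" markers come from this log. The change-type labels, the dict-based records, and the resolve_redirect lookup are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical helpers: the dateUpdated field and the "doubleclick" and
# "citymediagrid" markers come from this log; everything else is assumed.
SPAM_MARKERS = ("doubleclick",)
REDIRECT_MARKERS = ("citymediagrid",)


def apply_change(record, change_type, now=None):
    """Set dateUpdated only when new source data arrives.

    Other change types (for example a data modification or an observation)
    leave dateUpdated as it was.
    """
    if change_type == "new_source_data":
        record["dateUpdated"] = now or datetime.now(timezone.utc)
    return record


def clean_urls(urls, resolve_redirect):
    """Drop spam-ad URLs and replace redirect URLs with the real URL when known.

    resolve_redirect is a hypothetical callable that returns the actual URL
    for a redirect, or None when no replacement can be found.
    """
    cleaned = []
    for url in urls:
        lowered = url.lower()
        if any(marker in lowered for marker in SPAM_MARKERS):
            continue  # spam ad: removed outright
        if any(marker in lowered for marker in REDIRECT_MARKERS):
            actual = resolve_redirect(url)
            if actual:
                cleaned.append(actual)  # replaced with the URL holding the data
            continue  # with no known replacement, the redirect is just removed
        cleaned.append(url)
    return cleaned
```

For example, with a lookup dict mapping each known redirect to its real URL, passing the dict's get method as resolve_redirect keeps ordinary URLs, swaps known redirects for their targets, and drops both spam-ad URLs and unresolvable redirects.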