2019.7 - Data Release Log
over 5 years ago by Mark Hess
Overview
This is the first release log. Its goal is to track changes to the data: record count updates, new sources added, changes to our schemas, and any issues or fixes addressed.
Record Counts
The following are the updated record counts for each vertical (in millions):
- Business data increased by 3.5 million to 87 million
- People data increased by 1.1 million to 9.4 million
- Product data increased by 1.1 million to 124.1 million
- Property data increased by 1.2 million to 25 million
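For readers who want the implied totals before this release, the arithmetic can be sketched as below. The figures are copied from the list above; the `counts` dictionary and the `previous_and_growth` helper are illustrative names, not part of any published API, and the results are only as precise as the rounded numbers in this log.

```python
# Counts are in millions, copied from the list above as (increase, new total).
counts = {
    "Business": (3.5, 87.0),
    "People": (1.1, 9.4),
    "Product": (1.1, 124.1),
    "Property": (1.2, 25.0),
}


def previous_and_growth(increase, total):
    """Return (previous total, percent growth), both rounded to one decimal."""
    previous = total - increase
    return round(previous, 1), round(100 * increase / previous, 1)


# Maps each vertical to its implied prior total and relative growth.
summary = {name: previous_and_growth(*values) for name, values in counts.items()}

for name, (previous, growth) in summary.items():
    print(f"{name}: {previous} million before this release, up {growth}%")
```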
New Sources
The following data types have received new sources:
- 2 Business data sources
- 1 Property data source
- 2 Product data sources
Schema Changes
- Our People data received a new field, linkedinURL, which contains a single value of a LinkedIn URL for a given person.
- primaryImageURLs has been added to our Product data. This will track the single main image for a list of various domains.
- We're still continuing our transition from crawlResultFiles to crawlIDs.
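While the crawlResultFiles to crawlIDs transition is in progress, a consumer may see records carrying either field. The following is a minimal sketch of tolerant reading code, assuming records arrive as parsed JSON dictionaries and that both crawl fields hold lists of strings when present. Those shapes, and the helper names `crawl_references` and `linkedin_url`, are assumptions for illustration and are not confirmed by this log.

```python
from typing import Any, Dict, List, Optional


def crawl_references(record: Dict[str, Any]) -> List[str]:
    """Prefer crawlIDs; fall back to crawlResultFiles on records not yet migrated."""
    # Assumption: each field, when present, is a list of strings.
    return list(record.get("crawlIDs") or record.get("crawlResultFiles") or [])


def linkedin_url(record: Dict[str, Any]) -> Optional[str]:
    """Return the single linkedinURL value, or None when the field is absent."""
    value = record.get("linkedinURL")
    return value if isinstance(value, str) and value else None


# Hypothetical records, used only to exercise the helpers.
migrated = {"crawlIDs": ["id-1"], "linkedinURL": "https://www.linkedin.com/in/example"}
legacy = {"crawlResultFiles": ["file-a", "file-b"]}

print(crawl_references(migrated), linkedin_url(migrated))
print(crawl_references(legacy), linkedin_url(legacy))
```

Treating the new field as optional keeps the same code working on records that predate this release.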
Fixes
- Our Property data was being erroneously marked as updated during any data modification or observation. This has been changed: going forward, records will only receive an update to dateUpdated when new source data is received.
- Some URLs contained spam ads, most notably those with "doubleclick" in the URL. They have been removed.
- Another URL issue involved API redirects, in which "citymediagrid" was being used as the URL in place of the actual URL where the data was held. Those redirects have been removed and, when possible, replaced with the actual URL containing the data.
- Updated 2 Property data sources, 2 Product data sources, and 1 Business data source
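Anyone holding a copy of the data from before this release may want to apply a similar cleanup locally. The sketch below drops URLs containing the two markers named in the fixes above. It uses a plain case-insensitive substring match, which is an assumption about how such URLs can be recognized and not a description of the cleanup that was actually run; the sample URLs and the names `BLOCKED_MARKERS`, `is_suspect_url`, and `clean_urls` are hypothetical.

```python
# Markers taken from the fixes listed above.
BLOCKED_MARKERS = ("doubleclick", "citymediagrid")


def is_suspect_url(url: str) -> bool:
    """True when the URL contains one of the markers (case-insensitive)."""
    lowered = url.lower()
    return any(marker in lowered for marker in BLOCKED_MARKERS)


def clean_urls(urls):
    """Return the URLs that contain none of the blocked markers, in order."""
    return [url for url in urls if not is_suspect_url(url)]


# Hypothetical sample values for illustration only.
sample = [
    "https://example.com/listing/123",
    "https://ad.doubleclick.net/click?id=1",
    "https://api.citymediagrid.example/redirect?to=abc",
]
cleaned = clean_urls(sample)
print(cleaned)
```

Note that this only removes suspect entries. Recovering the actual destination behind a redirect, as described in the fix, would require data this sketch does not have.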