2019.8 - Data Release Log
about 5 years ago by Mark Hess
Overview
This month focused heavily on revisions to the schema and ensuring the conversions necessary for those revisions ran smoothly.
Record Counts
The following are the updated record counts for each vertical:
- Business data increased by 3.1 million to 90.1 million
- People data increased by 0.1 million to 9.5 million
- Product data increased by 5.2 million to 129.3 million
- Property data increased by 7.2 million to 32.2 million
New Sources
The following data types have received new sources:
- 3 Business data sources
- 1 Product data source
Schema Changes
- We're still continuing our transition from
crawlResultFiles
tocrawlIDs
. - The Product data received several new fields relating to upc's and ean's.
upca
,upce
,ean13
,ean8
, andgtins
were all created in an effort to assist with searching for particular products. The original fields ofupc
andean
will continue to be supported until they can be properly deprecated. dateSeen
for several nested JSON object fields will be set to a single value depicting the latest date that object was seen by our crawlers. In the future the name of this field will be changed todateUpdated
- Business and Property data contain a new field called
geoLocation
which combineslatitude
andlongitude
into a single value. This will be used in the future for queries involving relative distances.
Fixes
- Categories and features were being populated with bad seller ranking data, removal is currently in-progress.
- The name field in our Product data would occasionally include "Details about" which is not part of the product.
- Updated 1 Business data source, 4 Property data sources, and 2 Product data sources