How Product Records Are Merged
For each record we collect, we generate one or more keys. Each key is based on a different unique identifier available in the record's data. If we see a different record with one or more of the same key values, we merge the two records.
For example, we may generate a product record like this when crawling a web page:
{
  "name": "Tectronics CD Player",
  "manufacturer": "Tectronics",
  "manufacturerNumber": "A342",
  "upc": "014045125963"
}
This record will generate the following keys:
"keys": [
  "Tectronics/A342",
  "014045125963"
]
Let's say we then crawl another web page for the same product and generate this data:
{
  "name": "Tectronics CD Player",
  "colors": [
    "black",
    "red"
  ],
  "upc": "014045125963"
}
This record will generate the following keys:
"keys": [
  "014045125963"
]
When we import both of these records into our product database, the records will merge, since they share at least one key value. The resulting record will be:
{
  "name": "Tectronics CD Player",
  "colors": [
    "black",
    "red"
  ],
  "manufacturer": "Tectronics",
  "manufacturerNumber": "A342",
  "keys": [
    "Tectronics/A342",
    "014045125963"
  ],
  "upc": "014045125963"
}
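As a rough illustration of the merge described above, the following Python sketch combines two records when their key lists overlap. The function name merge_records and the conflict rule (an existing scalar value is kept, list values are unioned) are assumptions made for this example, not a description of the actual import code.

```python
def merge_records(existing, incoming):
    """Merge two product records if they share at least one key.

    Hypothetical helper: returns the merged record, or None when the
    records have no key in common and therefore should not be merged.
    """
    shared = set(existing.get("keys", [])) & set(incoming.get("keys", []))
    if not shared:
        return None

    merged = dict(existing)
    for field, value in incoming.items():
        if field not in merged:
            # Field only present on the incoming record: copy it over.
            merged[field] = value
        elif isinstance(merged[field], list) and isinstance(value, list):
            # Union list fields (such as "keys"), preserving order.
            merged[field] = merged[field] + [v for v in value if v not in merged[field]]
        # Assumption: on a scalar conflict the existing value is kept.
    return merged


first = {
    "name": "Tectronics CD Player",
    "manufacturer": "Tectronics",
    "manufacturerNumber": "A342",
    "upc": "014045125963",
    "keys": ["Tectronics/A342", "014045125963"],
}
second = {
    "name": "Tectronics CD Player",
    "colors": ["black", "red"],
    "upc": "014045125963",
    "keys": ["014045125963"],
}

merged = merge_records(first, second)
```

Here the two records share the UPC key, so merged carries the colors from the second record alongside the manufacturer fields and both keys from the first.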
Product records use the following fields to generate keys:
brand
ean
isbn
manufacturer
manufacturerNumber
upc
vin
websiteIDs
brand and manufacturer are always used in conjunction with manufacturerNumber.
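The field list above can be sketched as a small key generator in Python. This is a minimal sketch under stated assumptions: the function name generate_keys is hypothetical, the "name/number" format is inferred from the example keys on this page, single-value identifiers (ean, isbn, upc, vin) are assumed to be used verbatim as upc is in the example, and websiteIDs is assumed to be a list of strings that each become a key.

```python
def generate_keys(record):
    """Return merge keys for a product record (illustrative only)."""
    keys = []

    # brand and manufacturer only produce a key together with manufacturerNumber.
    number = record.get("manufacturerNumber")
    if number:
        for field in ("brand", "manufacturer"):
            name = record.get(field)
            if name:
                key = f"{name}/{number}"
                if key not in keys:
                    keys.append(key)

    # Assumption: these identifiers are used as-is, like upc in the example.
    for field in ("ean", "isbn", "upc", "vin"):
        value = record.get(field)
        if value and value not in keys:
            keys.append(value)

    # Assumption: websiteIDs is a list of strings, one key per entry.
    for value in record.get("websiteIDs", []):
        if value not in keys:
            keys.append(value)

    return keys


crawled = {
    "name": "Tectronics CD Player",
    "manufacturer": "Tectronics",
    "manufacturerNumber": "A342",
    "upc": "014045125963",
}
keys = generate_keys(crawled)
```

A record with only a name and a upc would yield a single key under these assumptions, which is what lets it match the fuller record through the shared UPC.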