How Property Records Are Merged
For each record we collect, we generate one or more keys for the record. Each key value is based on a different unique identifier available in the record's data. If we see another record with one or more of the same key values, we merge the two records.
For example, we may generate a property record like this when crawling a web page:
{
"address": "123 Anywhere St",
"city": "Austin",
"province": "TX",
"country": "US",
"numBedroom": 3,
"numBathroom": 3
}
This record will generate the following keys:
"keys": [
"US/TX/Austin/123AnywhereSt"
]
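The address key shown above could be derived roughly as follows. This is a minimal sketch inferred from that single example, not Datafiniti's documented algorithm: it assumes the key joins country, province, city, and address with `/` in that order and removes all whitespace from each component. The function name `address_key` is hypothetical.

```python
import re
from typing import Optional


def address_key(record: dict) -> Optional[str]:
    """Build a hypothetical address key such as 'US/TX/Austin/123AnywhereSt'.

    Assumption: components are joined with '/' in country/province/city/address
    order, and all whitespace is removed from each component.
    """
    parts = [record.get(field) for field in ("country", "province", "city", "address")]
    if not all(parts):
        return None  # an address key needs every component to be present
    return "/".join(re.sub(r"\s+", "", str(part)) for part in parts)


record = {
    "address": "123 Anywhere St",
    "city": "Austin",
    "province": "TX",
    "country": "US",
    "numBedroom": 3,
    "numBathroom": 3,
}
print(address_key(record))  # US/TX/Austin/123AnywhereSt
```

Returning `None` when a component is missing keeps incomplete addresses from producing partial keys that could wrongly match unrelated properties.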
Let's say we then crawl another web page for the same property and generate this data:
{
"address": "123 Anywhere St",
"city": "Austin",
"province": "TX",
"country": "US",
"neighborhoods": [
"Rolling Hills"
]
}
This record will generate the same keys value as the previous record, so the two records will be merged. The resulting record will be:
{
"address": "123 Anywhere St",
"city": "Austin",
"province": "TX",
"country": "US",
"neighborhoods": [
"Rolling Hills"
],
"numBathroom": 3,
"numBedroom": 3
}
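The merge rule can be sketched as follows: two records merge when their key sets share at least one value, and the merged record carries the union of both records' fields. This is an illustrative simplification, not Datafiniti's implementation; in particular, when both records set the same field this sketch just keeps the existing value, whereas real conflicts are resolved by the appended and overwritten rules described in the next section. The names `should_merge` and `merge_records` are hypothetical.

```python
from typing import Any


def should_merge(keys_a: list[str], keys_b: list[str]) -> bool:
    """Two records merge when they share at least one key value."""
    return bool(set(keys_a) & set(keys_b))


def merge_records(existing: dict[str, Any], incoming: dict[str, Any]) -> dict[str, Any]:
    """Combine both records' fields into a new dict.

    Simplification: if both records set the same field, the existing value is kept.
    """
    merged = dict(incoming)
    merged.update(existing)  # existing values take precedence on overlap
    return merged


first = {"address": "123 Anywhere St", "city": "Austin", "province": "TX",
         "country": "US", "numBedroom": 3, "numBathroom": 3}
second = {"address": "123 Anywhere St", "city": "Austin", "province": "TX",
          "country": "US", "neighborhoods": ["Rolling Hills"]}
key = "US/TX/Austin/123AnywhereSt"
if should_merge([key], [key]):
    merged = merge_records(first, second)
    print(sorted(merged))
    # ['address', 'city', 'country', 'neighborhoods', 'numBathroom', 'numBedroom', 'province']
```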
Property records use the following fields to generate keys:
address
city
country
province
taxID
mlsNumber
taxID and mlsNumber are used in conjunction with province.
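The source states only that taxID and mlsNumber are combined with province; it does not give the resulting key format. The sketch below uses a hypothetical `province/field/value` layout purely for illustration, presumably reflecting that a tax ID or MLS number is only meaningful within its province. The function `province_scoped_keys` and the sample values `A123` and `99` are made up.

```python
from typing import Any


def province_scoped_keys(record: dict[str, Any]) -> list[str]:
    """Build hypothetical taxID/mlsNumber keys, each scoped to the record's province.

    The 'province/field/value' layout is an illustrative assumption only.
    """
    province = record.get("province")
    if not province:
        return []  # taxID and mlsNumber are only used together with province
    keys = []
    for field in ("taxID", "mlsNumber"):
        value = record.get(field)
        if value:
            keys.append(f"{province}/{field}/{value}")
    return keys


print(province_scoped_keys({"province": "TX", "mlsNumber": "A123"}))  # ['TX/mlsNumber/A123']
print(province_scoped_keys({"mlsNumber": "A123"}))  # []
```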
What Happens When Datafiniti Finds Conflicting Data
Datafiniti is always looking for the most up-to-date property data. In doing so, we might find new or updated data from the source of a property listing. Sometimes this data differs from the existing record in Datafiniti's database. In this case, a data validator determines which data gets appended and which gets overwritten. This is usually determined by the nature of the top-level schema field.
Appended Fields
Appended fields serve as a history of data about the property itself. When Datafiniti finds a conflict showing that the source has updated information about the property, we append this data as a new array entry to the following schema fields:
brokers
deposits
descriptions
domains
features
fees
imageURLs
languagesSpoken
leasingTerms
managedBy
parking
paymentTypes
people
phones
prices
propertyTaxes
reviews
statuses
Example:
When Datafiniti's scraper detects a change in the price of the 123 Anywhere St record, Datafiniti appends the new price, along with its isSold sale status, to the prices array.
Before the change:
{
"address": "123 Anywhere St",
"country": "US",
"dateAdded": "2022-04-26T05:45:22Z",
"dateUpdated": "2022-06-25T23:59:53Z",
"prices": [
{
"amountMax": 389000,
"amountMin": 389000,
"availability": "true",
"currency": "USD",
"dateSeen": [
"2022-04-26T05:45:22Z"
],
"isSold": "false"
}
]
}
After the change:
{
"address": "123 Anywhere St",
"country": "US",
"dateAdded": "2022-04-26T05:45:22Z",
"dateUpdated": "2022-07-30T05:45:22Z",
"prices": [
{
"amountMax": 389000,
"amountMin": 389000,
"availability": "true",
"currency": "USD",
"dateSeen": [
"2022-07-30T05:45:22Z"
],
"isSold": "false"
},
{
"amountMax": 500000,
"amountMin": 500000,
"availability": "true",
"currency": "USD",
"dateSeen": [
"2022-07-30T05:45:22Z"
],
"isSold": "false"
}
]
}
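A minimal sketch of the append behavior for the fields listed above, assuming an incoming entry is appended only when an identical entry is not already in the array. Exact equality is a simplifying assumption; Datafiniti's validator may compare entries differently, and this sketch leaves existing entries untouched. The `dateSeen` values are omitted from the sample prices so equality stays simple. `APPENDED_FIELDS` and `append_fields` are hypothetical names.

```python
import copy
from typing import Any

# Array fields that accumulate history (taken from the list above).
APPENDED_FIELDS = {
    "brokers", "deposits", "descriptions", "domains", "features", "fees",
    "imageURLs", "languagesSpoken", "leasingTerms", "managedBy", "parking",
    "paymentTypes", "people", "phones", "prices", "propertyTaxes", "reviews",
    "statuses",
}


def append_fields(existing: dict[str, Any], incoming: dict[str, Any]) -> bool:
    """Append new entries from `incoming` into `existing` in place.

    Returns True if anything was appended. Entries already present (by equality)
    are skipped; this de-duplication rule is an assumption.
    """
    changed = False
    for field in sorted(APPENDED_FIELDS.intersection(incoming)):
        history = existing.setdefault(field, [])
        for entry in incoming[field]:
            if entry not in history:
                history.append(copy.deepcopy(entry))
                changed = True
    return changed


existing = {
    "address": "123 Anywhere St",
    "prices": [{"amountMax": 389000, "amountMin": 389000, "currency": "USD", "isSold": "false"}],
}
incoming = {
    "address": "123 Anywhere St",
    "prices": [{"amountMax": 500000, "amountMin": 500000, "currency": "USD", "isSold": "false"}],
}
print(append_fields(existing, incoming))  # True
print([p["amountMax"] for p in existing["prices"]])  # [389000, 500000]
```

Non-array fields such as address are ignored here; they fall under the merge and overwrite rules instead.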
dateUpdated
Any change, whether appended or overwritten, updates the record's dateUpdated field to show when the change took place.
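The dateUpdated rule can be sketched as a comparison of the record before and after a merge, ignoring dateUpdated itself. This is an illustrative assumption about how the rule might be applied, not Datafiniti's code; the timestamp format mirrors the samples in this article, and the bedroom change in the usage example is hypothetical. `stamp_if_changed` is a made-up name.

```python
from datetime import datetime, timezone
from typing import Any


def _without_date_updated(record: dict[str, Any]) -> dict[str, Any]:
    """Copy of the record minus dateUpdated, so the stamp itself is not compared."""
    return {k: v for k, v in record.items() if k != "dateUpdated"}


def stamp_if_changed(before: dict[str, Any], after: dict[str, Any], now: datetime) -> bool:
    """Set after['dateUpdated'] to `now` (UTC, ISO 8601 with 'Z') if anything changed."""
    if _without_date_updated(before) == _without_date_updated(after):
        return False
    after["dateUpdated"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return True


before = {"address": "123 Anywhere St", "numBedroom": 3, "dateUpdated": "2022-06-25T23:59:53Z"}
after = dict(before, numBedroom=4)
stamp_if_changed(before, after, datetime(2022, 7, 30, 5, 45, 22, tzinfo=timezone.utc))
print(after["dateUpdated"])  # 2022-07-30T05:45:22Z
```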
Overwritten Fields
Overwritten fields serve as a quick, convenient source of data about the property's current state; their values change over time instead of accumulating history. Each value is determined by the most recent scrape of the record's source. When Datafiniti determines that the source data is correct, we update the following fields where possible:
dateUpdated
floorSizeValue
listingName
lotSizeValue
mostRecentStatus
mostRecentStatusDate
mostRecentStatusFirstDateSeen
numBathroom
numBedroom
numFloor
numPeople
numUnit
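A minimal sketch of the overwrite rule for the fields listed above, assuming the validator has already accepted the latest scrape. "Where possible" is interpreted here, as an assumption, to mean that a field is only overwritten when the latest scrape actually supplies a non-null value. `OVERWRITTEN_FIELDS` and `overwrite_fields` are hypothetical names, and the sample values are illustrative.

```python
from typing import Any

# Fields replaced by the most recent scrape (taken from the list above).
OVERWRITTEN_FIELDS = (
    "dateUpdated", "floorSizeValue", "listingName", "lotSizeValue",
    "mostRecentStatus", "mostRecentStatusDate", "mostRecentStatusFirstDateSeen",
    "numBathroom", "numBedroom", "numFloor", "numPeople", "numUnit",
)


def overwrite_fields(existing: dict[str, Any], latest: dict[str, Any]) -> list[str]:
    """Overwrite listed fields in place with values from the latest scrape.

    Only non-null incoming values are applied (assumed meaning of "where possible").
    Returns the names of fields whose value actually changed.
    """
    changed = []
    for field in OVERWRITTEN_FIELDS:
        value = latest.get(field)
        if value is not None and existing.get(field) != value:
            existing[field] = value
            changed.append(field)
    return changed


existing = {"listingName": "123 Anywhere St", "numBedroom": 3, "numBathroom": 3}
latest = {"numBedroom": 4, "numBathroom": None}
print(overwrite_fields(existing, latest))  # ['numBedroom']
print(existing)  # {'listingName': '123 Anywhere St', 'numBedroom': 4, 'numBathroom': 3}
```

Unlike the appended fields, no history is kept: the previous numBedroom value is simply replaced.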