Property Data with Python and JSON
For this guide, we're going to assume you're interested in using Datafiniti's property data to do some marketing analysis on homes in the US. Let's say you're a data scientist who's been tasked with the following:
- Collect data on homes.
- Sort the data by state.
- Find which states have the most expensive homes.
Your environment and data needs:
- You're working with Python.
- You want to work with JSON data.
Here are the steps we'll take:
Note that we are using Python 3 for the examples below.
1. Install the requests module for Python
In your terminal, run the following to install the requests module for Python:
pip3 install requests
2. Get your API token
The next thing you'll need is your API token. The API token lets you authenticate with the Datafiniti API and tells it who you are, what you have access to, and so on. Without it, you can't use the API.
To get your API token, go to the Datafiniti Web Portal (https://portal.datafiniti.co), log in, and click on your account name at the top-right. From there, you'll see a link to the "My Account" page, which will take you to a page showing your token. Your API token will be a long string of letters and numbers. Copy the API token and store it somewhere you can easily reference.
For security reasons, your API token will be automatically changed whenever you change your password.
For the rest of this document, we'll use AAAXXXXXXXXXXXX as a substitute example for your actual API token when showing example API calls.
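Rather than pasting the token into every script, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name DATAFINITI_API_TOKEN and the two helper functions are assumptions made for this example, not something the API requires.

```python
import os

# Hypothetical variable name chosen for this example; the API does not require it.
TOKEN_ENV_VAR = 'DATAFINITI_API_TOKEN'

def load_api_token(environ=os.environ):
    """Return the API token from the environment, or raise if it is missing."""
    token = environ.get(TOKEN_ENV_VAR, '').strip()
    if not token:
        raise RuntimeError('Set ' + TOKEN_ENV_VAR + ' to your Datafiniti API token')
    return token

def build_headers(token):
    """Build the Bearer-token headers used for the API calls in this guide."""
    return {
        'Authorization': 'Bearer ' + token,
        'Content-Type': 'application/json',
    }
```

With this in place, a script could call build_headers(load_api_token()) instead of hard-coding the token, which also keeps the token out of version control.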
3. Run your first search
The first thing we'll do is run a test search to get a sense of what sort of data is available. Eventually we'll refine our search so that we get back the most relevant data.
Since we want homes in the US, let's try a simple search that will just give us online listings for US-based properties.
Write the following code in your code editor (replace the dummy API token with your real API token):
# Illustrates an API call to Datafiniti's property data for US homes.
import requests
import urllib.parse
import json

# Set your API parameters here.
API_token = 'AAAXXXXXXXXXXXX'
format = 'JSON'
query = 'country:US'
num_records = 1
download = False

request_headers = {
    'Authorization': 'Bearer ' + API_token,
    'Content-Type': 'application/json',
}
request_data = {
    'query': query,
    'format': format,
    'num_records': num_records,
    'download': download
}

# Make the API call.
r = requests.post('https://api.datafiniti.co/v4/properties/search', json=request_data, headers=request_headers)

# Do something with the response.
if r.status_code == 200:
    print(r.content)
else:
    print('Request failed')
You should get a response similar to this:
{
  "num_found": 7983205,
  "total_cost": 1,
  "records": [
    {
      "address": "711 Kent Ave",
      "brokers": [
        {
          "agent": "Raj Singh",
          "company": "YOUR REALTY INC.",
          "dateSeen": [
            "2016-06-06T18:09:28Z"
          ]
        }
      ],
      "city": "Catonsville",
      "country": "US",
      "dateAdded": "2016-06-06T18:09:28Z",
      "features": [
        {
          "key": "Air Conditioning",
          "value": [
            "Heat Pumps"
          ]
        },
        {
          "key": "Sewer Type",
          "value": [
            "Public"
          ]
        }
      ],
      "latitude": "39.284462",
      "listingName": "711 Kent Ave, Catonsville, Md 21228",
      "longitude": "-76.734069",
      "lotSizeValue": 0.16,
      "lotSizeUnit": "Acres",
      "mlsNumber": "BC9677283",
      "numBathroom": 2,
      "numBedroom": 4,
      "postalCode": "21228",
      "prices": [
        {
          "amountMax": 199900,
          "amountMin": 199900,
          "currency": "USD",
          "dateSeen": [
            "2016-08-08T00:00:00Z",
            "2016-08-03T00:00:00Z"
          ],
          "isSale": "false"
        },
        {
          "amountMax": 212000,
          "amountMin": 212000,
          "currency": "USD",
          "dateSeen": [
            "2016-06-06T00:00:00Z"
          ],
          "isSale": "false"
        }
      ],
      "propertyTaxes": [
        {
          "amount": 3195,
          "currency": "USD",
          "dateSeen": [
            "2016-06-06T18:09:28Z"
          ]
        }
      ],
      "propertyType": "Single Family Dwelling",
      "province": "MD",
      "statuses": [
        {
          "dateSeen": [
            "2016-08-09T09:16:10Z"
          ],
          "isUnderContract": "false",
          "type": "For Sale"
        }
      ],
      "id": "AV9WzHyO_RWkykBuv11F"
    }
  ]
}
Let's break down each of the parameters we sent in our request:
API Call Component | Description |
---|---|
"query": "country:US" | query tells the API what you want to search. In this case, you're telling the API you want to search by country. Any property in the US will be returned. |
"num_records": 1 | num_records tells the API how many records to return in its response. In this case, you just want to see 1 matching record. |
Now let's walk through the response the API returned:
Response Field | Description |
---|---|
"num_found" | The total number of available records in the database that match your query. If you end up downloading the entire data set, this is how many records you'll use. |
"total_cost" | The number of credits this request has cost you. Property records only cost 1 credit per record. |
"records" | The first available matches to your query. If there are no matches, this field will be empty. Within each record returned, you'll see multiple fields shown. This is the data for each record. |
Within the records field, you'll see a single property returned with multiple fields and the values associated with that property. The JSON response will show all fields that have a value. It won't show any fields that don't have a value.
Each property record will have multiple fields associated with it. You can see a full list of available fields in our Property Data Schema.
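Because fields without a value are left out of a record, code that reads a response should not assume every key is present. As a minimal sketch, the hypothetical helper below summarizes a parsed response (for example, the result of r.json()) and uses .get() so a missing field yields None instead of raising a KeyError. The sample dictionary is a trimmed copy of the response shown above.

```python
def summarize_response(response):
    """Return a small summary of a parsed search response (a dict)."""
    records = response.get('records', [])
    first = records[0] if records else {}
    return {
        'num_found': response.get('num_found', 0),
        'total_cost': response.get('total_cost', 0),
        'num_returned': len(records),
        # Fields without a value are omitted, so list what is actually present.
        'first_record_fields': sorted(first.keys()),
        # .get() returns None when the field is absent from the record.
        'first_record_bedrooms': first.get('numBedroom'),
    }

# Trimmed copy of the sample response shown earlier in this guide.
sample = {
    'num_found': 7983205,
    'total_cost': 1,
    'records': [{'city': 'Catonsville', 'province': 'MD', 'numBedroom': 4}],
}
print(summarize_response(sample))
```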
4. Refine your search
If you think about the original query we made, you'll realize we didn't really specify we only wanted homes for sale. There are several other types of properties (e.g., commercial, rentals) that may also be in the data. Since we only want homes for sale, we should narrow our search appropriately. Modify your code to look like this:
# Illustrates an API call to Datafiniti's property data for US single-family homes.
import requests
import urllib.parse
import json

# Set your API parameters here.
API_token = 'AAAXXXXXXXXXXXX'
format = 'JSON'
query = 'country:US AND propertyType:"Single Family Dwelling"'
num_records = 10
download = False

request_headers = {
    'Authorization': 'Bearer ' + API_token,
    'Content-Type': 'application/json',
}
request_data = {
    'query': query,
    'format': format,
    'num_records': num_records,
    'download': download
}

# Make the API call.
r = requests.post('https://api.datafiniti.co/v4/properties/search', json=request_data, headers=request_headers)

# Do something with the response.
if r.status_code == 200:
    print(r.content)
else:
    print('Request failed')
This code is different in a couple of ways:
- It adds AND propertyType:"Single Family Dwelling" to narrow down results to just US single-family homes.
- It changes num_records = 1 to num_records = 10 so we can look at more sample matches.
Datafiniti lets you construct very refined boolean queries. If you wanted to do more complicated searches, you could use OR operations, negation, and more.
If you would like to narrow your search to exact matches only, place the search term in quotation marks.
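Longer queries are easier to keep readable when they are assembled from small pieces. The helpers below are hypothetical and only concatenate strings. They use the field:value, AND, and quoted-term forms shown in this guide, plus OR. Grouping the OR clauses in parentheses is an assumption of this sketch, so check the query documentation before relying on it.

```python
def exact(field, value):
    """Build a field:"value" clause; the quotation marks request an exact match."""
    return field + ':"' + value + '"'

def any_of(clauses):
    """Join clauses with OR. The surrounding parentheses are an assumption."""
    return '(' + ' OR '.join(clauses) + ')'

def all_of(clauses):
    """Join clauses with AND."""
    return ' AND '.join(clauses)

query = all_of([
    'country:US',
    exact('propertyType', 'Single Family Dwelling'),
    any_of(['province:MD', 'province:VA']),
])
print(query)
# country:US AND propertyType:"Single Family Dwelling" AND (province:MD OR province:VA)
```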
5. Initiate a download of the data
Once we like what we see from the sample matches, it's time to download a larger data set! To do this, we're going to update our code a fair bit (an explanation follows):
# Illustrates an API call to Datafiniti's property data that downloads the matching records.
import requests
import urllib.request
import json
import time

# Set your API parameters here.
API_token = 'AAAXXXXXXXXXXXX'
format = 'JSON'
query = 'country:US AND propertyType:"Single Family Dwelling"'
num_records = 50
download = True

request_headers = {
    'Authorization': 'Bearer ' + API_token,
    'Content-Type': 'application/json',
}
request_data = {
    'query': query,
    'format': format,
    'num_records': num_records,
    'download': download
}

# Make the API call.
r = requests.post('https://api.datafiniti.co/v4/properties/search', json=request_data, headers=request_headers)

# Do something with the response.
if r.status_code == 200:
    request_response = r.json()
    print(request_response)

    # Keep checking the request status until the download has completed
    download_id = request_response['id']
    download_status = request_response['status']
    while download_status != 'completed':
        time.sleep(5)
        download_r = requests.get('https://api.datafiniti.co/v4/downloads/' + str(download_id), headers=request_headers)
        download_response = download_r.json()
        download_status = download_response['status']
        print('Records downloaded: ' + str(download_response['num_downloaded']))

    # Once the download has completed, get the list of links to the result files and download each file
    if download_status == 'completed':
        result_list = download_response['results']
        i = 1
        for result in result_list:
            filename = str(download_id) + '_' + str(i) + '.' + format
            urllib.request.urlretrieve(result, filename)
            print('File: ' + str(i) + ' out of ' + str(len(result_list)) + ' saved: ' + filename)
            i += 1
else:
    print('Request failed')
    print(r)
A couple of things to pay attention to in the above code:
- We changed num_records from 10 to 50. This will download the first 50 matching records. If we wanted to download all matching records, we would remove num_records; omitting it tells the API to default to all available records.
- We changed download from False to True.
If num_records is not specified, ALL of the records matching the query will be downloaded.
Since we've handled multiple steps of the download process in this code, we won't go into the details here, but we do recommend you familiarize yourself with those steps. Check them out in our Property Data with Postman and CSV guide.
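The polling loop in the script above keeps running until the status becomes completed, so it would wait indefinitely if a download never reached that state. As a sketch only, the hypothetical helper below wraps the same check with an upper bound on the waiting time. The default intervals are arbitrary choices for this example, and get_status stands for whatever callable fetches and parses the download status.

```python
import time

def wait_for_download(get_status, poll_seconds=5, max_wait_seconds=600, sleep=time.sleep):
    """Poll get_status() until it returns a response whose status is 'completed'.

    get_status is any callable returning the parsed download response (a dict).
    Raises TimeoutError if the download is still not completed after
    max_wait_seconds of waiting.
    """
    waited = 0
    while True:
        response = get_status()
        if response.get('status') == 'completed':
            return response
        if waited >= max_wait_seconds:
            raise TimeoutError('Download not completed after ' + str(waited) + ' seconds')
        sleep(poll_seconds)
        waited += poll_seconds
```

In the download script, get_status could be a small function that calls requests.get on the downloads URL with the same headers and returns download_r.json(). The returned dictionary then holds the results list to fetch.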
When using the API, you will not receive any warning if you are going past your monthly record limit. Keep track of how many records you have left by checking your account. You are responsible for any overage fees if you go past your monthly limit.
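One way to guard against overages is to run the query with download set to False first, read num_found from the response, and cap num_records before starting the download. The helper below is hypothetical. In particular, records_remaining is a number you would look up on your account page and pass in yourself; this sketch does not fetch it from the API.

```python
def capped_num_records(num_found, records_remaining, wanted=None):
    """Return how many records to request without exceeding the remaining allowance.

    num_found         -- 'num_found' from a preview search (download = False)
    records_remaining -- records left this month, read from your account page
    wanted            -- optional limit of your own; None means as many as allowed
    """
    if records_remaining <= 0:
        raise ValueError('No records left in the monthly allowance')
    limit = min(num_found, records_remaining)
    if wanted is not None:
        limit = min(limit, wanted)
    return limit

# Example values: num_found is taken from the sample response in step 3,
# and the remaining allowance is made up for illustration.
print(capped_num_records(num_found=7983205, records_remaining=10000, wanted=50))  # 50
```

Keep in mind that the preview search itself returns records and therefore also uses credits, as the total_cost field shows.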
6. Parse the JSON data
The download code will save one or more result files to your project folder.
Each result file is a text file rather than a single JSON object: every line in the file is its own JSON object. We format the data this way because most programming languages don't handle parsing an entire data set as one JSON object very well with their standard library calls.
We'll need to parse the file into an array of JSON objects. We can use code similar to this to handle the parsing:
import json

# Set the location of your file here
filename = 'xxxx_x.txt'

# Each line of the file is one JSON object, so parse the file line by line.
records = []
with open(filename) as myFile:
    for line in myFile:
        if line.strip():  # skip blank lines
            records.append(json.loads(line))

for record in records:
    # Edit these lines to do more with the data
    print(json.dumps(record, indent=4, sort_keys=True))
You can edit the code in the for loop above to do whatever you'd like with the data, such as storing it in a database or writing it out to your console.
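To get back to the original task of finding which states have the most expensive homes, the parsed records can be grouped by state. The sketch below is one possible approach, not the only one. It assumes each record carries the province and prices fields seen in the sample response, takes the amountMax of the most recently seen price as a home's price, and averages those per state. The helper names are made up for this example, and the single sample record is trimmed from the response in step 3.

```python
from collections import defaultdict

def latest_price(record):
    """Return amountMax from the price entry seen most recently, or None."""
    prices = [p for p in record.get('prices', []) if 'amountMax' in p and p.get('dateSeen')]
    if not prices:
        return None
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    newest = max(prices, key=lambda p: max(p['dateSeen']))
    return newest['amountMax']

def average_price_by_state(records):
    """Return (state, average price) pairs, most expensive state first."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of prices, count]
    for record in records:
        price = latest_price(record)
        state = record.get('province')
        if price is None or state is None:
            continue  # skip records missing a price or a state
        totals[state][0] += price
        totals[state][1] += 1
    averages = [(state, total / count) for state, (total, count) in totals.items()]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)

# One record trimmed from the sample response; in practice, pass the parsed records list.
records = [{
    'province': 'MD',
    'prices': [
        {'amountMax': 199900, 'dateSeen': ['2016-08-08T00:00:00Z', '2016-08-03T00:00:00Z']},
        {'amountMax': 212000, 'dateSeen': ['2016-06-06T00:00:00Z']},
    ],
}]
print(average_price_by_state(records))  # [('MD', 199900.0)]
```

An average is sensitive to a few very expensive listings, so a median per state may be a better summary for some analyses.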
7. Using parsed data
For this example, we'll switch to Datafiniti's Product Database and use the following query:
query = 'name:Nike React Flyknit'
Now we'll create a Python script that parses the API response with json.loads, as in step 6, and picks out specific fields:
# Illustrates an API call to Datafiniti's Product Database.
import requests
import urllib.parse
import json

# Set your API parameters here.
API_token = 'API_key_here'
format = 'JSON'
query = 'name:Nike React Flyknit'
num_records = 1
download = False

request_headers = {
    'Authorization': 'Bearer ' + API_token,
    'Content-Type': 'application/json',
}
request_data = {
    'query': query,
    'format': format,
    'num_records': num_records,
    'download': download
}

# Make the API call.
r = requests.post('https://api.datafiniti.co/v4/products/search', json=request_data, headers=request_headers)

# Do something with the response.
if r.status_code == 200:
    print(r.content)

    # Parse the JSON response to select specific data fields.
    # This runs only on success, so a failed request is never parsed.
    parsedR = json.loads(r.content)
    record = parsedR["records"][0]

    # Set price to the mostRecentPriceAmount field.
    price = record["mostRecentPriceAmount"]
    print("Your most recent price is: " + str(price))

    # Set color to the colors in the data record.
    # str() keeps the print working whether colors is a string or a list.
    color = record["colors"]
    print("Your color is: " + str(color))

    # Set where it is sold from.
    soldFrom = record["mostRecentPriceSourceURL"]
    print("sold here: " + str(soldFrom))
else:
    print('Request failed')