Business Data with cURL and CSV
For this guide, we're going to assume you're interested in using Datafiniti's business data to do some marketing analysis on hotels. Let's say you're a data scientist that's been tasked with the following:
- Collect data on hotels.
- Sort the data by state.
- Find which states have the most hotels.
Your environment and data needs:
- You're working with cURL.
- You want to work with CSV data.
Here are the steps we'll take:
1. Open your terminal
If you want to use cURL to access the Datafiniti API, we're assuming you have access to a standard, Linux-based terminal. Open a terminal session to get started.
2. Get your API token
The next thing you'll need is your API token. The API token lets you authenticate with Datafiniti API and tells it who you are, what you have access to, and so on. Without it, you can't use the API.
To get your API token, go the Datafiniti Web Portal (https://portal.datafiniti.co), login, and click on your account name and the top-right. From there, you'll see a link to the "My Account" page, which will take you to a page showing your token. Your API token will be a long string of letters and numbers. Copy the API token or store it somewhere you can easily reference.
For the rest of this document, we'll use
AAAXXXXXXXXXXXX
as a substitute example for your actual API token when showing example API calls.
3. Run your first search
The first thing we'll do is do a test search that will give us a sense for what sort of data might be available. Eventually we'll refine our search so that we get back the most relevant data.
Since we want hotels, let's try a simple search that will just give us online listings for hotels.
Enter the following into your terminal (replace the dummy API token with your real API token):
curl --user AAAXXXXXXXXXXXX: 'https://api.datafiniti.co/v3/data/businesses?view=businesses_all&format=JSON&q=categories:hotels&records=1&download=false'
You should get a response similar to this (although it may not look as pretty in your terminal):
{
"estimated total": 139666,
"records": [
{
"address": "7030 Amin Dr",
"categories": [
"Hotels"
],
"city": "Chattanooga",
"country": "US",
"dateAdded": "2016-11-04T23:50:14Z",
"dateUpdated": "2016-11-04T23:50:14Z",
"descriptions": [
{
"dateSeen": [
"2016-11-09T19:26:53Z"
],
"sourceURLs": [
"http://www.hotels.com/ho141351/?locale=en_US&pos=HCOM_US"
],
"value": "No-frills hotel in Hamilton Place with health club"
}
],
"features": [
{
"key": "Services",
"value": [
"24-hour front desk, Dry cleaning/laundry service, Laundry facilities, Free newspapers in lobby"
]
},
{
"key": "Nearby Attractions",
"value": [
"[In Hamilton Place, Hamilton Place Mall (1.1 mi / 1.7 km), Dragon Dreams Museum (2.7 mi / 4.4 km), Tennessee Valley Railroad Museum (3.6 mi / 5.8 km), Concord Golf Club (3.8 mi / 6 km), Brown Acres Golf Course (4 mi / 6.5 km)]"
]
}
],
"imageURLs": [
"https://exp.cdn-hotels.com/hotels/1000000/120000/117400/117332/117332_95_n.jpg",
"https://exp.cdn-hotels.com/hotels/1000000/120000/117400/117332/117332_107_n.jpg"
],
"keys": [
"us/tn/chattanooga/7030amindr/1104338646"
],
"latitude": "35.04281",
"longitude": "-85.158",
"name": "Mainstay Suites Chattanooga",
"numRoom": 77,
"phones": [
"8004916126"
],
"postalCode": "37421",
"province": "TN",
"reviews": [
{
"date": "2013-11-11T00:00:00Z",
"dateAdded": "2016-11-04T23:50:14Z",
"dateSeen": [
"2016-11-06T00:00:00Z",
"2016-08-12T00:00:00Z"
],
"rating": 2,
"sourceURLs": [
"https://www.hotels.com/hotel/141351/reviews%20/"
],
"text": "Hotel was ok. Room not as up to date as I expected.",
"title": "Was ok",
"username": "A Traveler"
},
{
"date": "2014-11-03T00:00:00Z",
"dateAdded": "2016-11-04T23:50:14Z",
"dateSeen": [
"2016-08-09T00:00:00Z",
"2016-08-27T00:00:00Z",
"2016-07-18T00:00:00Z"
],
"rating": 3,
"sourceURLs": [
"https://www.hotels.com/hotel/141351/reviews%20/"
],
"text": "It was adequate for our one night stay there. Staff was very friendly and the room was clean but not very big. I would recommend to someone for a very short stay.",
"username": "scott"
}
],
"sourceURLs": [
"http://www.hotels.com/ho141351/?locale=en_US&pos=HCOM_US",
"https://www.hotels.com/hotel/141351/reviews%20/"
],
"id": "AVwcsllU_7pvs4fzx-yW"
}
]
Let's break down what the API call is all about:
API Call Component | Description |
---|---|
https:// | This is the communication protocol used by the API. It's the same one used when you visit a secure website. |
AAAXXXXXXXXXXXX:@ | Your API token. You're telling the API who you are so it can respond to your request. |
api.datafiniti.co | The location of the API. |
/v3 | You're telling the API which version to use. v3 is our most recent and current API version. |
/data | You're telling the API you're interested in querying data. |
/businesses | Specifically, you're interested in business data. |
view=businesses_all | The view tells the API in which fields you want your response. businesses_all will show all available fields in a record. |
format=JSON | The format tells the API which data format you want to see. You can set it to JSON or CSV . |
q=categories:hotels | The q tells the API what query you want to use. In this case, you're telling the API you want to search by categories . Any business that has hotels listed in its categories field will be returned. |
records=1 | The records tells the API how many records to return in the its response. In this case, you just want to see 1 matching record. |
download=false | The download tells the API if you want to initiate a download request or not. Setting it to false means you don't, so it will show the matching records immediately in the response. |
Now let's dive through the response the API returned:
Response Field | Description |
---|---|
"estimated_total" | The total number of available records in the database that match your query. If you end up downloading the entire data set, this is how many records you'll use. |
"records" | The first available matches to your query. For most queries, you'll see 1 to 10 example records. If there are no matches, this field will be empty. Within each record returned, you'll see multiple fields shown. This is the data for each record. |
Within the records
field, you'll see a single business returned with multiple fields and their values associated with that hotel. The JSON response will show all fields that have a value. It won't show any fields that don't have a value.
Each business record will have multiple fields associated with it. You can see a full list of available fields in our Business Data Schema.
4. Refine your search
If you think about the original query we made, you'll realize we didn't really specify which geography we're interested in. Since we only want US hotels, we should narrow our search appropriately.
We'll need to refine our search to make sure we're only getting hotels. To do that, we can add additional filters to the q
parameter to narrow down the results. For example:
https://AAAXXXXXXXXXXXX:@api.datafiniti.co/v3/data/businesses?view=businesses_all&format=JSON&q=categories:hotels AND country:US&records=10&download=false
This API call is different in a couple ways:
- It adds
AND country:US
to narrow down results to just US hotels. - It changes
records=1
torecords=10
so we can look at more sample matches.
Datafiniti lets you construct very refined boolean queries. If you wanted to do more complicated searches, you could OR operations, negation, and more.
You can use the following cURL command to run this API call:
curl --user AAAXXXXXXXXXXXX: 'https://api.datafiniti.co/v3/data/businesses?view=businesses_all&format=JSON&q=categories:hotels+AND+country:US&records=10&download=false'
5. Initiate a full download of the data
Once we like what we see from the sample matches, it's time to download the entire data set! To do this, we're going to make three changes to our API call:
- Change
format=JSON
toformat=CSV
. We'll want to view the files in CSV format, so it will be easier to look at. - Remove
&records=10
. - Change
businesses_all
tobusinesses_all_nested
. This will nest fields like categories and features into a single cell (instead splitting them across multiple rows and columns). - Change
download=false
todownload=true
.
After those changes, the cURL command looks like:
curl --user AAAXXXXXXXXXXXX: 'https://api.datafiniti.co/v3/data/businesses?view=businesses_all_nested&format=CSV&q=categories:hotels+AND+country:US&download=true'
After running this command, you'll see something like:
[
{
"id": 6073,
"records": "",
"query": "categories:hotels AND country:US",
"dataType": "businesses",
"dataFormat": "csv",
"accountId": "AAAXXXXXXXXXXXX",
"planId": "free",
"view": "businesses_all_nested",
"async": 1,
"status": "STARTED",
"error": "",
"numDownloaded": 0,
"numFound": 95378,
"numRequested": 95378,
"date_started": "2017-1-11 15:52:42",
"date_completed": "2017-1-11 15:52:43"
}
]
Here's what these fields mean:
Response Field | Description |
---|---|
id | This is a unique identifier for the request. |
records | |
query | The query you ran. |
dataType | The data type you queried. |
dataFormat | The data format you requested. |
accountId | Your API token. |
planId | The Datafiniti plan level you're using. |
view | The view you chose. This determines which fields are shown in the data. |
async | Should always be set to 1 . This is an alias for the download parameter. |
status | The current progress of the request. As the download request is running, this will be set to STARTED . When it's done, it will be set to COMPLETED . |
error | An error message in case anything went wrong with the download. Hopefully this is blank! |
numDownloaded | The number of records that have been downloaded so far. |
numFound | The number of total records in Datafiniti that matched your query. |
numRequested | The number of records you requested to download. |
date_started | The date and time the download started. |
date_completed | The date and time the download finished. |
6. Monitor the status of the download
As the download request runs, you can check on its status by using this command:
https://AAAXXXXXXXXXXXX:@api.datafiniti.co/v3/requests/XXXX
curl --user AAAXXXXXXXXXXXX: 'https://api.datafiniti.co/v3/requests/XXXX'
You'll want to replace XXXX
with the id
value for your request. If you keep running the above command, you'll see numDownloaded
gradually increase.
7. Download the result file(s)
Once the download finishes, the /requests/
call will show COMPLETED
for status
. You can then get links to the data files by running this command:
curl --user AAAXXXXXXXXXXXX: 'https://api.datafiniti.co/v3/results/XXXX'
(Again, replace XXXX
with the actual request id.) The response will look similar to:
[
{
"id": 120138,
"url": "http://datafiniti-downloads-qa.s3.amazonaws.com/AAAXXXXXXXXXXXX/6073_1.csv?AWSAccessKeyId=AKIAIXQMCWHOZB3O35SA&Signature=2YtBsW9xY8CZrWDECcdLzyx4Jlk%3D&Expires=1484754763",
"request_id": 6073,
"downloaded": 0,
"imported": 0,
"date_created": "2017-1-11"
}
]
If you've requested a lot of records (i.e., over 10,000), you may see more than 1 result object shown.
To download the result files, copy each url
value run a command like:
curl 'http://datafiniti-downloads-qa.s3.amazonaws.com/AAAXXXXXXXXXXXX/6073_1.csv?AWSAccessKeyId=AKIAIXQMCWHOZB3O35SA&Signature=2YtBsW9xY8CZrWDECcdLzyx4Jlk%3D&Expires=1484754763' > output.csv
You'll probably want to rename output.csv
to something specific to this request.
8. Open the result file(s) in Excel
Navigate to the file you downloaded and open it. Since it's a CSV file, it should open in Excel automatically. It will look something like:

9. Analyze the results
Using Excel, we can easily count the total number of hotels on a state-by-state basis. The province
column gives us where each hotel is located, so we can use it to tally up the numbers.
Updated almost 8 years ago