Business Data with PHP and CSV
For this guide, we're going to assume you're interested in using Datafiniti's business data to do some marketing analysis on hotels. Let's say you're a data scientist that's been tasked with the following:
- Collect data on hotels.
- Sort the data by state.
- Find which states have the most hotels.
Your environment and data needs:
- You're working with PHP.
- You want to work with CSV data.
Here are the steps we'll take:
1. Open a code editor
If you want to use PHP to access the Datafiniti API, we're assuming you'll be using a standard code editor to write your PHP code. Open your code editor to get started.
2. Get your API token
The next thing you'll need is your API token. The API token lets you authenticate with Datafiniti API and tells it who you are, what you have access to, and so on. Without it, you can't use the API.
To get your API token, go the Datafiniti Web Portal (https://portal.datafiniti.co), login, and click your settings in the left navigation bar. From there, you'll see a page showing your token. Your API token will be a long string of letters and numbers. Copy the API token or store it somewhere you can easily reference.
For security reasons, your API token will be automatically changed whenever you change your password.
For the rest of this document, we'll use
AAAXXXXXXXXXXXX
as a substitute example for your actual API token when showing example API calls.
.
3. Run your first search
The first thing we'll do is do a test search that will give us a sense for what sort of data might be available. Eventually we'll refine our search so that we get back the most relevant data.
Since we want hotels, let's try a simple search that will just give us online listings for hotels.
Write the following code in your code editor (replace the dummy API token with your real API token):
<?php
/*
Illustrates an API call to Datafiniti's Business Database for hotels.
*/
$url = 'https://api.datafiniti.co/v4/businesses/search';
// Set your API parameters here.
$APIToken = 'AAAXXXXXXXXXXXX';
$format = 'JSON';
$query = 'categories:hotels';
$num_records = 1;
$download = false;
$request_body = array(
'query' => $query,
'format' => $format,
'num_records' => $num_records,
'download' => $download
);
$options = array(
'http' => array (
'header' => "Authorization: Bearer " . $APIToken . "\r\n" .
"Content-Type: application/json\r\n",
'method' => 'POST',
'content' => json_encode($request_body)
)
);
$context = stream_context_create($options);
$result = file_get_contents($url, false, $context);
if ($result === FALSE) {
echo "Error.";
} else {
var_dump($result);
}
?>
You should get a response similar to this:
Array
(
[num_found] => 139666
[total_cost] => 1
[records] => Array
(
[0] => Array
(
[address] => 7030 Amin Dr
[categories] => Array
(
[0] => Hotels
)
[city] => Chattanooga
[country] => US
[dateAdded] => 2016-11-04T23:50:14Z
[dateUpdated] => 2016-11-04T23:50:14Z
[descriptions] => Array
(
[0] => Array
(
[dateSeen] => Array
(
[0] => 2016-11-09T19:26:53Z
)
[sourceURLs] => Array
(
[0] => http://www.hotels.com/ho141351/?locale=en_US&pos=HCOM_US
)
[value] => No-frills hotel in Hamilton Place with health club
)
)
[features] => Array
(
[0] => Array
(
[key] => Services
[value] => Array
(
[0] => 24-hour front desk, Dry cleaning/laundry service, Laundry facilities, Free newspapers in lobby
)
)
[1] => Array
(
[key] => Nearby Attractions
[value] => Array
(
[0] => [In Hamilton Place, Hamilton Place Mall (1.1 mi / 1.7 km), Dragon Dreams Museum (2.7 mi / 4.4 km), Tennessee Valley Railroad Museum (3.6 mi / 5.8 km), Concord Golf Club (3.8 mi / 6 km), Brown Acres Golf Course (4 mi / 6.5 km)]
)
)
)
[imageURLs] => Array
(
[0] => https://exp.cdn-hotels.com/hotels/1000000/120000/117400/117332/117332_95_n.jpg
[1] => https://exp.cdn-hotels.com/hotels/1000000/120000/117400/117332/117332_107_n.jpg
)
[keys] => Array
(
[0] => us/tn/chattanooga/7030amindr/1104338646
)
[latitude] => 35.04281
[longitude] => -85.158
[name] => Mainstay Suites Chattanooga
[numRoom] => 77
[phones] => Array
(
[0] => 8004916126
)
[postalCode] => 37421
[province] => TN
[reviews] => Array
(
[0] => Array
(
[date] => 2013-11-11T00:00:00Z
[dateAdded] => 2016-11-04T23:50:14Z
[dateSeen] => Array
(
[0] => 2016-11-06T00:00:00Z
[1] => 2016-08-12T00:00:00Z
)
[rating] => 2
[sourceURLs] => Array
(
[0] => https://www.hotels.com/hotel/141351/reviews%20/
)
[text] => Hotel was ok. Room not as up to date as I expected.
[title] => Was ok
[username] => A Traveler
)
[1] => Array
(
[date] => 2014-11-03T00:00:00Z
[dateAdded] => 2016-11-04T23:50:14Z
[dateSeen] => Array
(
[0] => 2016-08-09T00:00:00Z
[1] => 2016-08-27T00:00:00Z
)
[rating] => 3
[sourceURLs] => Array
(
[0] => https://www.hotels.com/hotel/141351/reviews%20/
)
[text] => It was adequate for our one night stay there. Staff was very friendly and the room was clean but not very big. I would recommend to someone for a very short stay.
[username] => scott
)
)
[sourceURLs] => Array
(
[0] => http://www.hotels.com/ho141351/?locale=en_US&pos=HCOM_US
[1] => https://www.hotels.com/hotel/141351/reviews%20/
)
[id] => AVwcsllU_7pvs4fzx-yW
)
)
)
Let's break down each of the parameters we sent in our request:
API Call Component | Description |
---|---|
"query": "categories:hotels" | query tells the API what you want to search. In this case, you're telling the API you want to search by categories . Any business that has hotels listed in its categories field will be returned. |
"num_records": 1 | num_records tells the API how many records to return in its response. In this case, you just want to see 1 matching record. |
Now let's dive through the response the API returned:
Response Field | Description |
---|---|
"num_found" | The total number of available records in the database that match your query. If you end up downloading the entire data set, this is how many records you'll use. |
"total_cost" | The number of credits this request has cost you. Business records only cost 1 credit per record. |
"records" | The first available matches to your query. If there are no matches, this field will be empty. Within each record returned, you'll see multiple fields shown. This is the data for each record. |
Within the records
field, you'll see a single business returned with multiple fields and the values associated with that business. The JSON response will show all fields that have a value. It won't show any fields that don't have a value.
Each business record will have multiple fields associated with it. You can see a full list of available fields in our Business Data Schema.
4. Refine your search
If you think about the original query we made, you'll realize we didn't really specify which geography we're interested in. Since we only want US hotels, we should narrow our search appropriately. Modify your code to look like this:
<?php
/*
Illustrates an API call to Datafiniti's Business Database for hotels.
*/
$url = 'https://api.datafiniti.co/v4/businesses/search';
// Set your API parameters here.
$APIToken = 'AAAXXXXXXXXXXXX';
$format = 'JSON';
$query = 'categories:hotels AND country:US';
$num_records = 10;
$download = false;
$request_body = array(
'query' => $query,
'format' => $format,
'num_records' => $num_records,
'download' => $download
);
$options = array(
'http' => array (
'header' => "Authorization: Bearer " . $APIToken . "\r\n" .
"Content-Type: application/json\r\n",
'method' => 'POST',
'content' => json_encode($request_body)
)
);
$context = stream_context_create($options);
$result = file_get_contents($url, false, $context);
if ($result === FALSE) {
echo "Error.";
} else {
var_dump($result);
}
?>
This code is different in a couple ways:
- It adds
AND country:US
to narrow down results to just US hotels. - It sets
num_records
to 10 so we can look at more sample matches.
Datafiniti lets you construct very refined boolean queries. If you wanted to do more complicated searches, you could use OR operations, negation, and more.
5. Initiate a full download of the data
Once we like what we see from the sample matches, it's time to download a larger data set! To do this, we're going to update our code a fair bit (an explanation follows):
When using the API, you will not receive any warning if you are going past your monthly record limit. Keep a track on how many records you have left by checking your account. You are responsible for any overage fees if you go past your monthly limit.
<?php
$request_url = 'https://api.datafiniti.co/v4/businesses/search';
// Set your API parameters here.
$API_token = 'AAAXXXXXXXX';
$view_name = 'business_basic';
$format = 'CSV';
$query = 'categories:hotels AND country:US';
$num_records = 50;
$download = true;
$request_body = array(
'query' => $query,
'num_records' => $num_records,
'view' => $view_name,
'format' => $format,
'download' => $download
);
$request_options = array(
'http' => array (
'header' => "Authorization: Bearer " . $API_token . "\r\n" .
"Content-Type: application/json\r\n",
'method' => 'POST',
'content' => json_encode($request_body)
)
);
// Make the initial API call.
echo "Making initial request...\n";
print_r($request_options);
$request_context = stream_context_create($request_options);
$request_response = file_get_contents($request_url, false, $request_context);
if ($request_response === FALSE) {
echo "Error.";
var_dump($request_response);
} else {
echo "Download request successful:\n";
$request_response_array = json_decode($request_response,true);
print_r($request_response_array);
$download_id = $request_response_array['id'];
$download_url = 'https://api.datafiniti.co/v4/downloads/' . $download_id;
$download_options = array(
'http' => array (
'header' => "Authorization: Bearer " . $API_token . "\r\n",
'method' => 'GET'
)
);
$download_status = 'running';
while ($download_status != 'completed') {
sleep(10);
echo "Checking on status: " . $download_url . "\n";
$download_context = stream_context_create($download_options);
$download_response = file_get_contents($download_url, false, $download_context);
$download_response_array = json_decode($download_response,true);
$download_status = $download_response_array['status'];
echo "Records downloaded: " . $download_response_array['num_downloaded'] . "\n";
}
if ($download_status == 'completed') {
$result_list = $download_response_array['results'];
for ($i = 0; $i < count($result_list); $i++) {
$result_link = $result_list[$i];
$filename = $download_id . "_" . $i . "." . $format;
file_put_contents($filename, fopen($result_link, 'r'));
echo "File " . ($i+1) . " out of " . count($result_list) . " saved: " . $filename . "\n";
}
}
}
?>
A few things to pay attention to in the above code:
- We changed
format
fromJSON
toCSV
. We'll want to view the files in CSV format, so it will be easier to look at. - We change
num_records
from10
to50
. This will download the first 50 matching records. If we wanted to download all matching records, we would removenum_records
. - We set
view
tobusiness_basic
. This will show a limited set of fields and nest fields like categories and features into a single cell (instead splitting them across multiple rows and columns). - We changed
download
fromfalse
totrue
.
If num_records is not specified, ALL of the records matching the query will be downloaded.
Since we've handled multiple steps of the download process in this code, we won't go into the details here, but we do recommend you familiarize yourself with those steps. Checking them out in our Business Data with Postman and CSV guide.
6. Open the result file(s) in Excel
The download code will save one or more result files to your project folder. Open one of those files in Excel. It will look something like:
7. Analyze the results
Using Excel, we can easily count the total number of hotels on a state-by-state basis. The province
column gives us where each hotel is located, so we can use it to tally up the numbers.
Updated about 1 year ago