An Easy Way for Developers to Clean Up Address Data
Address information is one of the most commonly collected forms of data for companies across the world. It is also data that can easily be collected and stored in inaccurate or incomplete form.
Street names might be misspelled. Zip Codes could be left out when addresses are entered. Multiple customers could have the same name, creating uncertainty about which addresses map to which people. These are just some of the data quality errors that could appear in address information.
Fortunately, there is an easy way for developers to clean up address data without purchasing complex data quality tools or earning a Ph.D. in data engineering. That solution is the TomTom Online Search API, which provides a structured geocoding call that can clean up address data. It also returns accurate latitude and longitude information that can take the place of unformatted raw data and pin each address to a precise location.
This article explains how to get started with the TomTom Online Search API for address validation and data cleanup. We’ll discuss the benefits of geocoding and properly formatting your address data, then walk through a sample address cleanup program that leverages the structured geocoding API call from TomTom.
The Benefits of Geocoding and Structuring Your Address Data
As described above, it’s not uncommon when requesting data via an online form to run into issues with the validity of that data, and customer address data is no exception. Misspellings, incorrect capitalization and missing fields can cause obvious problems. Imagine a company that reads from this database to send a mailing to every customer who provided an address. The company wants each piece to reach the correct destination, but misspelled addresses could result in delivery failures. Structuring address data properly provides an essential fix for this issue.
With the Online Search product from TomTom, an organization can pass the address data it collected into the structured geocoding API call. If the input contains enough relevant detail to narrow down the search, the response contains the actual, properly formatted address, which allows the mailing to go out without any issues. In addition, the address is geocoded, so the organization can store the latitude and longitude along with the associated address in its database.
Geocoding provides several benefits, many of which stem from an organization’s ability to analyze its customer base on a geographical level. Instead of simply staring at lines of address data on a page, analysts can look at a map of locations that may provide valuable insight into the market. Maybe the business is more successful in some parts of a city than others, and mapping data can help it discover where to focus its efforts. Or maybe it has cornered the market in one portion of a city, but there are neighborhoods with critical similarities where the market remains untapped. Geocoding can help analyze customer data in both of these cases.
Getting Started with the TomTom Online Search API
The first step towards utilizing structured geocoding from TomTom is to get set up with the TomTom Online Search API. Visiting the TomTom For Developers website and registering will bring you to a dashboard where you can select the option to add a new application. Providing an application name and selecting the Online Search product will provide you with an API key for use with the Online Search API.
After you have received your API key, you are ready to develop an application that uses the structured geocoding method to clean up and geocode addresses. An invaluable resource throughout development is the online documentation for the Online Search product, available on the TomTom For Developers website.
A Simple Java Implementation
Let’s take a look at a sample program that uses the structured geocoding API call from TomTom Online Search. To demonstrate the capability of this API call, I’ve developed a simple Java implementation in which a CSV input file acts as our unformatted address database and a CSV output file acts as our formatted address database. We read each address from the input file, line by line, and provide it as input to the structured geocoding request. Each address in the input file contains misspellings, missing fields, or both. No latitude and longitude are stored for any address in the input file.
Below is the listing for FormatAddresses.java. Class constants are set up for the API key, the CSV field separator (a comma), the new line separator (for use in writing to the output file), the headers for the output file, and the input and output filenames with their paths. For the sake of simplicity, I have written the functionality for the sample program right in the main method for the class, so we can simply run the program from our IDE.
The first thing we do in the main method is to create and open our output file and write our headers to the first line. Once we’ve appended the new line character at the end of that line, we are ready to write formatted addresses to the output, which simulates a formatted address database table. The next step is to read in a line from the input file and split it on the comma delimiter into a string array, in which each position holds one field from the line. After this, we pass the string array to the constructor for UnformattedAddress.java, which organizes the unformatted fields into attributes of a Java object that can then be used to build the HTTP GET request (our API call).
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// JSON classes: assumed to be json-simple (org.json.simple), which matches
// the JSONParser and ParseException names used below.
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class FormatAddresses {
    private static final String API_KEY = "YOUR-API-KEY-GOES-HERE";
    private static final String SEPARATOR = ",";
    private static final String NEW_LINE_SEPARATOR = "\n";
    private static final String HEADERS = "streetNumber, streetName, municipality, countryTertiarySubdivision, countrySecondarySubdivision, countrySubdivision, postalCode";
    private static final String INPUT_FILENAME = "C:/documents/Addresses.csv";
    private static final String OUTPUT_FILENAME = "C:/documents/FormattedAddresses.csv";

    public static void main(String[] args) {
        // try-with-resources closes the writer and the reader, even on failure
        try (FileWriter fileWriter = new FileWriter(OUTPUT_FILENAME);
             BufferedReader br = new BufferedReader(new FileReader(INPUT_FILENAME))) {
            // Header row of the simulated "clean" database table
            fileWriter.append(HEADERS);
            fileWriter.append(NEW_LINE_SEPARATOR);

            String line;
            while ((line = br.readLine()) != null) {
                // One array position per CSV field of the unformatted address
                String[] unformattedAddressArray = line.split(SEPARATOR);
                UnformattedAddress unformattedAddress = new UnformattedAddress(unformattedAddressArray);

                // Build the structured geocoding request URL
                URL url = new URL("https://api.tomtom.com/search/2/structuredGeocode.JSON"
                        + "?key=" + URLEncoder.encode(API_KEY, "UTF-8")
                        + "&countryCode=" + URLEncoder.encode("US", "UTF-8")
                        + "&streetNumber=" + URLEncoder.encode(unformattedAddress.getStreetNumber(), "UTF-8")
                        + "&streetName=" + URLEncoder.encode(unformattedAddress.getStreetName(), "UTF-8")
                        + "&municipality=" + URLEncoder.encode(unformattedAddress.getMunicipality(), "UTF-8")
                        + "&countrySecondarySubdivision=" + URLEncoder.encode(unformattedAddress.getCountrySecondarySubdivision(), "UTF-8")
                        + "&countrySubdivision=" + URLEncoder.encode(unformattedAddress.getCountrySubdivision(), "UTF-8")
                        + "&postalCode=" + URLEncoder.encode(unformattedAddress.getPostalCode(), "UTF-8"));

                // Send the GET request and read the JSON response body
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                InputStream in = new BufferedInputStream(conn.getInputStream());
                String output = readStream(in);

                JSONObject jsonObject = (JSONObject) new JSONParser().parse(output);
                JSONArray results = (JSONArray) jsonObject.get("results");
                // JSONArray is untyped, so each element is cast individually
                for (Object item : results) {
                    JSONObject result = (JSONObject) item;
                    JSONObject address = (JSONObject) result.get("address");
                    JSONObject position = (JSONObject) result.get("position");
                    FormattedAddress formattedAddress = new FormattedAddress(address, position);
                    writeFormattedAddress(formattedAddress, fileWriter);
                }
            }
            // Flush once, after every input line has been processed
            fileWriter.flush();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    /** remainder of class omitted */
}
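The listing hands each split CSV line to UnformattedAddress, whose source is not shown. A minimal sketch of what such a class might look like follows. The column order (street number, street name, municipality, county, state, postal code) is an assumption and not taken from the original input file. Missing columns become empty strings so that URLEncoder.encode never receives null.

```java
/**
 * Hypothetical sketch of the UnformattedAddress class used by FormatAddresses.
 * The column order is an assumption; adjust the indexes to match your CSV file.
 */
class UnformattedAddress {
    private final String streetNumber;
    private final String streetName;
    private final String municipality;
    private final String countrySecondarySubdivision;
    private final String countrySubdivision;
    private final String postalCode;

    public UnformattedAddress(String[] fields) {
        this.streetNumber = field(fields, 0);
        this.streetName = field(fields, 1);
        this.municipality = field(fields, 2);
        this.countrySecondarySubdivision = field(fields, 3);
        this.countrySubdivision = field(fields, 4);
        this.postalCode = field(fields, 5);
    }

    /** Returns the trimmed value at index, or "" when the column is missing. */
    private static String field(String[] fields, int index) {
        if (fields == null || index >= fields.length || fields[index] == null) {
            return "";
        }
        return fields[index].trim();
    }

    public String getStreetNumber() { return streetNumber; }
    public String getStreetName() { return streetName; }
    public String getMunicipality() { return municipality; }
    public String getCountrySecondarySubdivision() { return countrySecondarySubdivision; }
    public String getCountrySubdivision() { return countrySubdivision; }
    public String getPostalCode() { return postalCode; }
}
```

Note that String.split drops trailing empty fields, so a line that ends in commas yields a shorter array; the bounds check in field covers that case.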
Once we have instantiated our unformatted address object, we can create the URL that we will use for the request. As the documentation for the API call shows, several parameters are required to perform the structured geocoding call, and many more are optional.
Those that are required include the following, which we provide when we build our URL object:
- Base URL: api.tomtom.com/search/
- Version Number: 2
- Response format: I have chosen JSON for this particular example. JSONP, JS and XML are also valid options.
- Country Code: In my example, I’ll be using US for the country code.
- API Key: You can insert the API key provided to you when you added your application through the TomTom developer dashboard.
The following are the optional request parameters I provided in my particular example:
- Street Number: The street number of the address, if provided in the input file, will be added to the API request. If not, then I will pass an empty string to the call.
- Street Name
- Municipality: City or town, if provided.
- Country Secondary Subdivision: County, if provided.
- Country Subdivision: The state in which the address is located.
- Postal Code: Zip Code, if provided.
As shown in the code above, the URL for the request is built using the above parameters. (For more information on additional request parameters that may be leveraged, please visit the documentation.)
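The string concatenation in the listing sends every parameter, even when its value is empty. One possible alternative, sketched below, builds the query from an ordered map and leaves out blank values. The class and method names (StructuredGeocodeUrl, build) are illustrative and not part of any TomTom SDK; the endpoint and parameter names are the ones used in the listing. Whether the service treats an absent parameter differently from an empty one is not something this article establishes, so treat the skipping behavior as an assumption to verify.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Illustrative helper (hypothetical name) for building the request URL. */
final class StructuredGeocodeUrl {
    // Same endpoint string as the FormatAddresses listing
    private static final String ENDPOINT =
            "https://api.tomtom.com/search/2/structuredGeocode.JSON";

    private StructuredGeocodeUrl() { }

    /** Appends key and countryCode, then every optional parameter that has a value. */
    static String build(String apiKey, String countryCode, Map<String, String> optionalParams) {
        StringBuilder url = new StringBuilder(ENDPOINT);
        url.append("?key=").append(encode(apiKey));
        url.append("&countryCode=").append(encode(countryCode));
        for (Map.Entry<String, String> param : optionalParams.entrySet()) {
            String value = param.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // skip fields the input record did not provide
            }
            url.append('&').append(param.getKey()).append('=').append(encode(value.trim()));
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL easier to compare against the documentation while debugging.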
The next step in my example is to send the HTTP request, parse the JSON response and build a formatted address object, which we then use to write to our output file (the simulated clean database). In the FormatAddresses listing above, this happens in the lines that open the HttpURLConnection, parse the response with JSONParser and loop over the results array.
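The FormatAddresses listing calls a readStream helper that sits in the omitted remainder of the class. A minimal sketch of such a helper is shown here, assuming it simply drains the response body into a UTF-8 string; the wrapper class name StreamUtil is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical home for the readStream helper referenced by FormatAddresses. */
final class StreamUtil {
    private StreamUtil() { }

    /** Reads the whole stream as UTF-8 text and closes it afterwards. */
    static String readStream(InputStream in) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int read;
            // read returns -1 once the end of the stream is reached
            while ((read = reader.read(buffer)) != -1) {
                body.append(buffer, 0, read);
            }
        }
        return body.toString();
    }
}
```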
Take the following input, for example, taken from the first line in my input file:
- Street Number: 4
- Street Name: Yawkey
- Municipality: Boston
- Postal Code: 2215
This is, of course, the address for Fenway Park. Yet it is incomplete. After providing these fields from the input to the request for the appropriate parameters, I am met with the following JSON response:
{
  "summary": {
    "query": "4 02215 yawkey boston",
    "queryType": "NON_NEAR",
    "queryTime": 42,
    "numResults": 1,
    "offset": 0,
    "totalResults": 1,
    "fuzzyLevel": 1,
    "geoBias": {
      "lat": 42.34528579930702,
      "lon": -71.10655
    }
  },
  "results": [
    {
      "type": "Point Address",
      "id": "US/PAD/p0/2532244",
      "score": 11.1,
      "dist": 670.4458521397014,
      "address": {
        "streetNumber": "4",
        "streetName": "Yawkey Way, Jersey St",
        "municipalitySubdivision": "Boston, Fenway",
        "municipality": "Boston, Boston University, Kenmore",
        "countrySecondarySubdivision": "Suffolk",
        "countryTertiarySubdivision": "Boston",
        "countrySubdivision": "MA",
        "postalCode": "02215",
        "extendedPostalCode": "022159103",
        "countryCode": "US",
        "country": "United States Of America",
        "countryCodeISO3": "USA",
        "freeformAddress": "4 Yawkey Way, Boston, MA 02215",
        "countrySubdivisionName": "Massachusetts"
      },
      "position": {
        "lat": 42.34679,
        "lon": -71.09865
      },
      "viewport": {
        "topLeftPoint": {
          "lat": 42.34769,
          "lon": -71.09987
        },
        "btmRightPoint": {
          "lat": 42.34589,
          "lon": -71.09743
        }
      },
      "entryPoints": [
        {
          "type": "main",
          "position": {
            "lat": 42.34671,
            "lon": -71.09889
          }
        }
      ]
    }
  ]
}
Because I am only interested in saving certain fields to my output “database,” I retrieve only the address and position objects found at the top level of each entry in the results array. As you can see, that provides full address data as well as a freeform address field, in addition to the geocoding data (latitude and longitude) for Fenway Park. I simply write this address and position data to my output file and move on to the next record in the input file. Another field that may be of interest for geocoding is the entryPoints array provided for each result. It gives the precise location of the main entryway, which could be invaluable depending on what one is hoping to achieve with this data.
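FormattedAddress and writeFormattedAddress also belong to the code that the article omits. The sketch below shows one way the address and position objects could be turned into a CSV line. It accepts plain Map arguments so that it runs without a JSON library; if the parser in use is json-simple (an assumption), its JSONObject extends HashMap and could be passed in directly. The column choice, including the appended latitude and longitude, is illustrative rather than the author's actual output format.

```java
import java.util.Map;

/** Hypothetical sketch of the FormattedAddress class used by FormatAddresses. */
final class FormattedAddress {
    // Address columns in the same order as the HEADERS constant in the listing
    private static final String[] ADDRESS_FIELDS = {
        "streetNumber", "streetName", "municipality",
        "countryTertiarySubdivision", "countrySecondarySubdivision",
        "countrySubdivision", "postalCode"
    };

    private final Map<?, ?> address;
    private final Map<?, ?> position;

    FormattedAddress(Map<?, ?> address, Map<?, ?> position) {
        this.address = address;
        this.position = position;
    }

    /** Returns the quoted address columns followed by latitude and longitude. */
    String toCsvLine() {
        StringBuilder line = new StringBuilder();
        for (String field : ADDRESS_FIELDS) {
            line.append(quote(address.get(field))).append(',');
        }
        line.append(position.get("lat")).append(',').append(position.get("lon"));
        return line.toString();
    }

    /** Quotes a value, since fields such as "Yawkey Way, Jersey St" contain commas. */
    private static String quote(Object value) {
        String text = value == null ? "" : value.toString();
        return '"' + text.replace("\"", "\"\"") + '"';
    }
}
```

With a class like this, writeFormattedAddress could simply append toCsvLine() and the new line separator to the FileWriter. The header row would then also need latitude and longitude columns to match.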
Conclusion
Unformatted address data is a common challenge. The structured geocoding request in TomTom Online Search offers an easy-to-use way to clean address data and build a database of geocoded locations, all through simple HTTP GET requests.
This article originally appeared on developer.tomtom.com/blog. The original author is Scott Fitzpatrick.