Yours truly is participating in the local search conference of the season this week: Kelsey’s Leading in Local has been a staple of social/local/mobile for years, and we’re thrilled to return as a speaker and sponsor. Stay tuned for a few product announcements!
(or, A Cautionary Tale of Geocoding)
So in this new era of open data, data is free, right? If you’ve never tried using open data, it’s harder than you might think.
For the Tableau Customer Conference last month, we thought it would be fun to show off some data that was relevant to the conference location: Washington, D.C.
Our original idea was to associate D.C. restaurant health code violation scores with buildings, to give a simple and reasonable sense of which buildings might not have the best food facilities. Building outlines, or “footprints,” are available from the D.C. government and OpenStreetMap (OSM), tagged with restaurant names. Footprints should not be confused with parcel boundaries, which are tied to property lines. OSM data can be obtained many ways; the easiest might be Metro Extracts.
Enter data quality challenges. Geocoding is complicated and “free” data isn’t ever really free.
When we downloaded the restaurant health code violation data, it quickly became apparent that the geographic component of the data suffered from a fundamental problem that kept us from spatially linking the latitude and longitude of restaurants with the building outline data: the geocoder used to obtain the coordinates placed about half of the restaurants on the street, and the rest somewhere within the property (i.e., parcel) boundary. This is a challenge because we wanted to tie each restaurant location to a building polygon, and a point in the middle of the street falls inside no polygon at all.
Before I explain further, I need to provide some terminology disambiguation. Within Tableau, geocoding means displaying data geographically. However, in common geospatial parlance, geocoding has a very specific meaning: a geocoder is a tool that is used to derive latitude and longitude from a human-readable street address. In short, geocoding makes address data geographic, but unless you understand the assumptions made in the geocoding process, your geocoding may not be useful.
While we are talking terminology, a composite geocoder is a geocoder that will try to find a point to assign to an address, but if it fails, will fall back on whatever part of the address it understands. For example, if you provide an invalid address in Bismarck, North Dakota, the composite geocoder will return a point in the center of Bismarck, North Dakota. If you give it a nonsensical street name and spell Bismarck like “Bismark”, it may return a point in the center of North Dakota. Rather than raising a flag, it gives you an answer, albeit a less accurate one. Failing all else, your geocoder will return a value of NULL, which a mapping client may interpret as the latitude and longitude coordinates (0,0), aka Null Island.
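To make the fallback behavior concrete, here is a minimal sketch of a composite geocoder. The lookup tables, coordinates, and function names are all made up for illustration; a real geocoder works against a full address database. The point is that the function reports how precise its answer is, so a city-level or state-level fallback can be flagged instead of silently accepted.

```python
# Hypothetical lookup tables standing in for a real address database.
# All coordinates are rough, illustrative values.
ADDRESS_POINTS = {("100 main st", "bismarck", "nd"): (46.8080, -100.7840)}
CITY_CENTERS = {("bismarck", "nd"): (46.8083, -100.7837)}
STATE_CENTERS = {"nd": (47.5, -100.5)}


def composite_geocode(street, city, state):
    """Return (lat, lon, precision), falling back to coarser matches."""
    street, city, state = street.lower(), city.lower(), state.lower()
    if (street, city, state) in ADDRESS_POINTS:
        lat, lon = ADDRESS_POINTS[(street, city, state)]
        return (lat, lon, "address")
    if (city, state) in CITY_CENTERS:
        lat, lon = CITY_CENTERS[(city, state)]
        return (lat, lon, "city")
    if state in STATE_CENTERS:
        lat, lon = STATE_CENTERS[state]
        return (lat, lon, "state")
    # Nothing matched: return NULL rather than a made-up point.
    return (None, None, "no_match")


def is_null_island(lat, lon):
    """True for missing coordinates or the (0, 0) point a client may draw."""
    return lat is None or lon is None or (lat == 0 and lon == 0)
```

Filtering on the precision value (for example, keeping only `"address"` matches) is one way to catch the less accurate answers before they reach a map.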
Finally, rooftop geocoding returns a latitude and longitude that will land on the building that matches the address. Not all geocoders are designed to do this – for example, it is unnecessary or even disadvantageous for a geocoder built for navigation and routing to resolve addresses to rooftops. It just matters that you get there, not whether it’s a valid address. Most so-called rooftop geocoders match the parcel, but may drop the point in the parking lot rather than on the building. With a little extra geospatial wizardry, you can assign a parcel-level geocode to a building footprint. However, a geocode to the middle of the street tells you nothing about which building or parcel the point belongs to.
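The difference between a point on a building and a point in the street can be sketched with a plain point-in-polygon test. This is only an illustration using the standard ray casting approach, with invented footprints in arbitrary planar units; it is not the method used for the data described here, and real footprints would come from a source such as the D.C. government or OpenStreetMap.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting test: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_footprint(point, footprints):
    """Return the name of the first footprint containing the point, or None."""
    for name, polygon in footprints.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None


# Hypothetical footprints: two square buildings with a street between them.
FOOTPRINTS = {
    "cafe": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "diner": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
```

A point at `(5, 5)` lands in the cafe footprint, while a point at `(15, -5)`, standing in for a street geocode, matches nothing and comes back as `None`.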
Returning to our restaurant health code violation scores, it appears the data was geocoded with a composite geocoder that took a first pass through a rooftop geocoder and, failing that, assigned coordinates according to the address range of a street, and so on. We would have had to invest significant time in data cleaning, re-geocoding, and manually placing points to assign all restaurant locations to a building footprint.
Instead we scrapped our efforts, and found other data. Sometimes life is too short for data cleaning. (Ed note: please read our companion post about the viz we did build).
The question of how to build the most precise geocoder isn’t an easy one, but it is one we think about a lot. Every solution requires an enormous amount of information about address and building configurations on the ground, and clearly it doesn’t make sense to drop one point at a time to generate hundreds of thousands of locations for civic engagement or business intelligence. In sum, the details of data matter, and dealing with them is less glamorous and more important than almost everything else you will do when creating a (geographic) data visualization.
At last month’s Tableau Customer Conference in Washington D.C., we ran a hands-on mapping session that showed how to create dual axis maps while retaining two measures, build custom regions, create a viz with open data, and use additional data and services from Urban Mapping, Tableau’s official map provider, through Mapfluence, our mapping platform.
Because Mapfluence has a direct connection into Tableau that does an end-around WMS, high resolution imagery can be seamlessly integrated into Tableau, tiling only the portions of the image you need as a base layer under your viz. Any data viz guru will tell you that unnecessary levels of detail clutter your presentation and obscure your message, which is why it is better to avoid a complicated base map where a simpler one will do. Nevertheless, there are cases where high resolution imagery provides valuable context for your viz. For example, if you want to show parcel boundaries or building outlines in their spatial context, the benefits are obvious:
The above map is rendered by Mapfluence. It can display, customize and symbolize the features in the variety of ways you would style a filled map without any limitations on rendering boundaries. When you draw your filled map directly onto the base map, you are free to use all of your dimensions to visualize data.
With a little geospatial wizardry, you are not limited to the dimensions and measures associated with the geometries you are drawing. Using Mapfluence or geospatial software like QGIS, you can aggregate point level data to your custom geographic boundaries to create new dimensions and measures.
For the Tableau Customer Conference, we thought it would be fun to show off some data that was relevant to Washington D.C. After stumbling when trying to use open data describing restaurant health code violations (be sure to read the companion cautionary tale of open data), we found the District of Columbia produces good quality, up-to-date data on crime incidents since 2011.
Building The Viz
The source data contains location information (latitude-longitude pairs) and several associated attributes (crime type, description, etc). To maintain privacy, incidents are geocoded to the nearest block instead of the actual address, so on a map the points appear as a gridded mass of dots.
You could generate a kernel density estimate based on the point distribution (i.e., a heat map). However, this masks the aggregation at block level and could give your audience false impressions about the patterns in your data. Furthermore, we want to get a sense of overall crime rates throughout the city. Kernel density estimates are more useful for hot spot analyses of particular types of crime, and when you have an actual, non-aggregated location. When looking at crime rates overall, or any complicated and varied phenomenon, kernel densities are difficult to interpret:
When you aggregate points by census block, you are presenting the underlying data at its appropriate level of aggregation. As we can see, patterns emerge.
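As a rough sketch of that aggregation step, suppose each incident record already carries the ID of the census block it was assigned to. The `block_id` and `offense` field names and the sample records are assumptions for illustration, not the actual schema of the D.C. data.

```python
from collections import Counter

# Hypothetical incident records; field names are illustrative only.
INCIDENTS = [
    {"block_id": "B1", "offense": "THEFT"},
    {"block_id": "B1", "offense": "THEFT"},
    {"block_id": "B1", "offense": "ROBBERY"},
    {"block_id": "B2", "offense": "BURGLARY"},
]


def count_by_block(records):
    """Total number of incidents per census block."""
    return Counter(r["block_id"] for r in records)


def offense_breakdown(records, block_id):
    """Incident counts by offense type within one block."""
    return Counter(r["offense"] for r in records if r["block_id"] == block_id)
```

The per-block totals drive the map, and the per-block breakdown is the kind of detail that can sit behind a tooltip or a linked chart.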
While the high resolution imagery allows us to see context block by block, it also complicates the viz. To show both, I varied the transparency with the number of incidents per block instead of using a color ramp. That way, you can see where the greatest number of incidents were reported, but you also see as much of the underlying imagery as possible for the additional context of roads, buildings, and other geographic markers. Finally, we wanted to introduce another dimension to the analysis. Because Mapfluence contains over 10,000 on-demand variables in our data catalog, we decided to overlay line and station information for the DC Metro subway system.
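One simple way to turn incident counts into transparency is a linear scale from fully transparent at zero to fully opaque at the busiest block. The linear mapping and the function below are assumptions for illustration; the exact scaling used in the viz is not spelled out here.

```python
def count_to_opacity(count, max_count, min_opacity=0.0, max_opacity=1.0):
    """Map an incident count to an opacity between min and max, linearly."""
    if max_count <= 0:
        return min_opacity
    # Clamp the share to [0, 1] so out-of-range counts stay valid.
    share = min(max(count / max_count, 0.0), 1.0)
    return min_opacity + share * (max_opacity - min_opacity)
```

Blocks with no incidents let the imagery show through completely, and the block with the most incidents is drawn fully opaque.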
To overcome some of the limitations in how Tableau deals with geographic polygons, we render and serve the census blocks from Mapfluence. This is effective because Mapfluence is designed as a web-based GIS, and geographic analysis and representation are second nature to us. However, it is not ideal, as Tableau users like to play with all the data they can.
Working with Leigh Fonseca of Fonseca Data Science, we came to a very subtle but compelling solution: leave the heavy geo-lifting to Mapfluence, and allow Tableau to act as the reporting tool. In this way, the polygons act as a proxy for the underlying points that are available in the workbook, a la tooltip functionality! Importing data into Tableau allows users to create a dashboard on top of a custom base map that you can filter according to the underlying points.
Click the image above to explore the dashboard. You can click individual blocks or select an area to see a breakdown of the crime types compared to crime citywide. Crime definitions are explained here.
Although we do not purport to be criminologists, this viz highlights data that would be impossible to see in a spreadsheet. Here are some things we observed:
- Theft of property accounts for a disproportionate share of incidents in the highest-density area, near the Columbia Heights Metro Station (14th Street NW and Irving): 533 out of 554 reported incidents.
- Car break-ins represent a greater proportion of incidents on the outskirts of DC than in the city center.
- There are approximately half as many burglaries in Northwest as in Southeast or Northeast.
- Relatively few crime incidents were reported on the Washington DC Mall compared to the city overall between January 2011 and August 2013: 45 thefts, plus 22 car break-ins, 6 robberies, 5 car thefts, 3 assaults with weapons, and 2 sex crimes.
There are plenty of additional ways to drill down into this data for the inquisitive data nerds out there. You could normalize by population per census block to see which crimes occur more frequently where the residential population is higher or lower. You could focus on visualizing the spatial patterns of a particular type of crime or set of crimes. Or you could visualize the changes in crime patterns over time using a time slider.
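The population normalization mentioned above could look something like this. The function name and the per-1,000-residents convention are our own choices for the sketch; the main design point is that blocks with no residents (parks, for instance) need special handling rather than a division by zero.

```python
def incidents_per_1000(incident_count, population):
    """Incidents per 1,000 residents, or None when nobody lives in the block."""
    if population <= 0:
        return None
    return 1000.0 * incident_count / population
```

For example, a block with 50 incidents and 2,000 residents works out to 25 incidents per 1,000 residents, while a block with no residential population returns `None` and can be styled separately on the map.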
At Urban Mapping we’re excited about Tableau, and we’re excited about mapping in Tableau, and we suspect you are as well. We’ve taken a moment to showcase high resolution imagery in Tableau, and the potential for using data + mapping in Mapfluence to build dashboards in Tableau for exploring geographic phenomena. Please let us know what you think and be sure to learn more about our enhanced mapping solutions for Tableau.
Why were we so enthusiastic to be at the Tableau Customer Conference this year? Because we are the proud providers of the basemaps and geographic data layers displayed with each map viz in Tableau, and it is exciting to see how much the Tableau community loves maps. Our main message this year was this: Mapfluence has a rich set of features and parameters that you can access with minor tweaks to your map card.
For our advanced mapping session, we partnered with Leigh Fonseca of Fonseca Data Science. Leigh has a great track record with Tableau, and has a knack for solving tricky data viz problems. Leigh helped us pinpoint the challenges users most often run into when mapping in Tableau, and we pondered how we could solve them using our powerful platform for data + maps.
Here are some examples of what we came up with:
1. Create “layers” on your maps. Leigh demonstrated some best practices for parameterizing metrics on your map, and demonstrated how you can layer your metrics via a simple radio button.
2. Limit the map to your area of interest without losing a dimension. If your data focuses on a particular region, like a US state, it’s distracting and irrelevant to show neighboring regions. One clever solution to this problem is to turn off the base map and filter a filled map to show just the regions you want. This is called a dual axis map. The name refers to the fact that as with a dual axis chart, you are scaling one layer of your data with one axis, in this case the geographic coordinates of your area of interest, and the other layer of your data with another, in this case whatever filled map or mark you want to show on the map.
If you go this route, you may quickly realize the drawback, which is that you used up a dimension of your visualization for showing your base map. Now you only have one dimension left for your data. The alternative is to modify your map card so that it only shows the area of interest. That way you can use all your dimensions for data. Check out HackYourMap.com and stay tuned for step-by-step tutorials on how you can make a map like this one:
By modifying the configuration file that controls your map card, you can limit the extent of the map, change border and fill colors, change the transparency of a particular layer, and add additional overlays:
3. Bring new geographic boundaries into Tableau. We understand there is a lot of demand for this within the Tableau community. Urban Mapping sources and licenses all sorts of geographic boundaries as part of our product offerings, including census and administrative boundaries from countries worldwide, proprietary marketing boundaries, transportation and freight systems, and neighborhoods worldwide. We can embed boundaries too complex to display in Tableau directly into the base map you see in Tableau so that they are also available for reference. Where the scale is not prohibitive, we can also help you create custom geocodes based on any public boundaries or based on the sales or regulatory territories you use in-house. For example, we can provide you with DMA boundaries licensed from Nielsen that you can use to visualize your sales and marketing data:
4. Use satellite imagery as your base map. While our licensing agreements prevent us from distributing them freely, we can serve high resolution satellite imagery directly into Tableau. This is particularly useful for certain types of operations analytics, where you need to be able to see the land use, buildings, or roads underneath your map viz. For the example below, we used three years of Washington DC crime data to calculate the number of incidents per census block, and then varied the transparency on a scale of zero to 889. Additional details about the incidents are shown as points with tooltips in Tableau, and the data can be combined in a dashboard with other charts and graphs.
6. Custom sales, marketing, or regulatory regions. One thing we’ve heard over and over from Tableau customers is that they would like to be able to draw or create regions that are customized to the use case of their business or organization. We have a tool we think you’ll really like in the works, but in the meantime, Leigh developed a dashboard that calculates metrics based on custom groupings of countries.
We put together a tool, which we hope to release soon, for generating center points for each of those regions, so that you can color your map and crunch your numbers in Tableau, and then attach tooltip details and charts to your imported geocodes. Just click the points, label them, and email yourself the file so that you can import it into Tableau.
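If you would rather compute a center point than click one, the standard area-weighted centroid formula is one option. This is a generic sketch and not how our tool works; it treats coordinates as planar, which is only an approximation for latitude and longitude, and for oddly shaped regions the centroid can fall outside the region.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("polygon has zero area")
    return (cx / (3 * area2), cy / (3 * area2))
```

For a rectangle with corners at `(0, 0)`, `(4, 0)`, `(4, 2)`, and `(0, 2)`, this returns the expected midpoint `(2.0, 1.0)`.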
Due to popular demand, we’ve also included the presentation
Day 2 of the Tableau Customer Conference! Yesterday at our session we demystified the mapping connection between Tableau and Urban Mapping, provided some concrete examples of how to hack your map, and explained how we can help you get custom boundaries and data. Stay tuned for more details and step-by-step tutorials for all our demos.
If you’re at the conference, come stop by, and don’t forget to RSVP for our party tonight!