Posts Tagged ‘polygon’

How is Local Search Like Storm Tracking?

Wednesday, September 19th, 2007

News from the National Weather Service that is sure to get (geo)data-wonks excited…

According to the National Oceanic and Atmospheric Administration (parent agency of the NWS), severe weather warnings are currently issued in the following manner:

…The NWS currently issues and disseminates warnings for tornado, severe thunderstorm, flood and marine hazards using geopolitical boundaries.

As of 1 October 2007, this system will change to something new:

Storm-Based Warnings (threat-based polygon warnings), are essential to effectively warn for severe weather. Storm-Based Warnings show the specific meteorological or hydrological threat area and are not restricted to geopolitical boundaries. By focusing on the true threat area, warning polygons will improve NWS warning accuracy and quality…

You may want to ask Umibot “what’s the big deal?” Some graphics from the Storm-Based Warnings (NB: press release to follow on 10/01/07) illustrate this:

On the left, the county is used as the unit of measure–this means if a predicted storm path touches a county boundary, the entire county will receive an alert. This is especially cumbersome in some Western states, where counties can be extremely large. Deploying emergency resources (first responders, food, supplies, etc.) and alarming the public when not necessary could prove an expensive proposition.

The image on the right highlights the new approach. “Threat-based polygons” might sound menacing, but they are no different from what the NWS currently uses, with one key exception: the granularity has changed such that the unit of measure is now the municipal boundary.

From UMI’s perspective, what is interesting to note is that NOAA prediction accuracy did not drive the Storm-Based Warnings program–there are meteorological (and related) advances that help officials understand patterns of severe weather, but those advances are independent of how the data are presented. Because prediction science has become more accurate, a smaller unit of measure (i.e., municipal area) can be used. From this perspective one could say predictions were ‘hiding’ behind the larger unit of measure (i.e., county).

Umibot likes these kinds of stories because they play directly into his (or her?) sweet spot–the design of data. And this was the focus of a talk Ian gave last year on the very subject.

The analogy for local search is clear–data should drive the use case of an application. If one is going to offer an application that allows for (say) mobile search, will a user have the granularity that is needed to have a meaningful experience? An example here is “restaurants in San Francisco”–mobile means you are, well, mobile, on the go, and a city is (probably) not a meaningful geo-constraint. Something more granular, like a 2-mile radius (if the device is location-aware), a cross street, or a neighborhood will likely be more satisfying.
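For the curious, here is a rough sketch of what a “2-mile radius” constraint could look like in practice. It is only an illustration, not how any particular search product works: the listing names and coordinates are made up, the function names are our own, and the Earth radius is the usual approximate mean value.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(listings, lat, lon, radius_miles=2.0):
    """Return names of listings within radius_miles of (lat, lon), nearest first."""
    hits = []
    for name, l_lat, l_lon in listings:
        d = haversine_miles(lat, lon, l_lat, l_lon)
        if d <= radius_miles:
            hits.append((d, name))
    return [name for d, name in sorted(hits)]


# Hypothetical listings: (name, latitude, longitude)
LISTINGS = [
    ("Cafe near the device", 37.7760, -122.3960),
    ("Cafe across town", 37.7800, -122.5000),
]

# Hypothetical device position somewhere in San Francisco
nearby = within_radius(LISTINGS, 37.775429, -122.397314)
print(nearby)
```

Both cafes are “in San Francisco,” but only the first one survives the radius filter, which is the point: the constraint follows the user rather than the city limits.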

The Centroid Gap, or the Death of the ZIP Code?

Thursday, June 7th, 2007

Several weeks ago we posted a few thoughts about the death of the ZIP code. There’s a lot more to say from the geo-perspective on local search, and here’s some more fodder…

To give any data a geographic context, it must be spatially referenced to the Earth. Geographic information systems (GIS) serve as a means of referencing this information. Within the context of local search, addresses, city boundaries, postal codes or other geographic data must be ‘translated’ from human terms (690 Fifth Street, San Francisco) to latitude and longitude, i.e., machine terms (37.775429, -122.397314). This geocoding process allows databases to recognize human-language requests. To geospatially reference (say) a postal code, one would expect that area to be spatially defined. When a user searches for (say) “coffee in 94107,” the ZIP code should serve as the geographic constraint, with the search running within this polygon. Correct?

Wrong! A variety of reasons are to blame for why the logical doesn’t happen: most obviously, ZIP codes were defined as letter carrier routes. They were not meant to serve any other purpose. As such, the ZIP may not even conform to what you expect–one side of a street, one floor of a multi-story building or one-half of a block may not fall within the postal code you expect. In fact, many parties that claim to use a ZIP code database actually obtain this information from a sister governmental agency, and these boundaries are stylized representations of the USPS data.

More to the point, these stylized boundaries are likely not used. Instead of associating (say) 50 latitude/longitude points to define the postal code boundary, technical optimization says one point is sufficient. The analogy here is reducing a novel to a word–in the context of local search, granularity matters, and using the mathematical center of a polygon serves to distort and misinform a user’s search. In practice, the centroid is used because a single point is cheaper to store and compare against than the actual shape. Reducing the contours and nuances of a small area to a point, often with a radius drawn around it, effectively makes all postal codes look like circles. Gaps and overlaps are formed, further distorting the expected reality for a user.

Graphically this can be represented with the ZIP code boundary and a circle (with the centroid serving as its center). The circle includes area that is not shared with the postal code area and vice versa. A user searching in this ZIP will therefore not see all the relevant listings, and may see some that do not belong. Some will argue this is a technology issue, but from the above example, it is clearly more a matter of mindset–getting product managers to think about the how and why of data will go a long way.
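To make the gap concrete, here is a small sketch that compares the two approaches on a toy shape. Everything in it is an assumption for illustration: the L-shaped “ZIP” polygon, the flat x/y grid (instead of real latitude/longitude), the three listings, and the choice of a circle with the same area as the polygon. The centroid uses the standard shoelace formula and the containment test is a plain ray-casting check; real systems would more likely lean on a GIS library.

```python
import math

# Hypothetical L-shaped "ZIP code" on a flat grid, vertices in order
ZIP_POLYGON = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

# Hypothetical listings: name -> (x, y)
LISTINGS = {
    "A (far end of the ZIP)": (3.8, 0.5),
    "B (outside the ZIP)": (2.0, 2.0),
    "C (corner of the ZIP)": (0.5, 0.5),
}


def area_and_centroid(polygon):
    """Signed area and area-weighted centroid via the shoelace formula."""
    twice_area = cx = cy = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    area = twice_area / 2.0
    return area, (cx / (6.0 * area), cy / (6.0 * area))


def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


area, centroid = area_and_centroid(ZIP_POLYGON)
radius = math.sqrt(abs(area) / math.pi)  # circle with the same area as the polygon

polygon_hits = {n for n, (x, y) in LISTINGS.items() if point_in_polygon(x, y, ZIP_POLYGON)}
circle_hits = {n for n, (x, y) in LISTINGS.items()
               if math.hypot(x - centroid[0], y - centroid[1]) <= radius}

missed = polygon_hits - circle_hits  # in the ZIP, but dropped by the circle
extra = circle_hits - polygon_hits   # returned by the circle, but not in the ZIP
print("missed:", missed, "extra:", extra)
```

In this toy case the circle drops listing A, which really is in the ZIP, and returns listing B, which is not. As a bonus, the centroid of the L-shape does not even fall inside the polygon it is supposed to represent.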