Demonstration Of Spatial Search Algorithm. » My Blogging

# Demonstration of Spatial Search Algorithm.

The red point is invented by the polygon, but it’s not actually inside it. It has an even number. So, the underlying primitive you need is the ability to detect intersections. That’s an important thing. Next one, polygon area. This one we demonstrated very, very fast, even on very complicated polygons. And this is because the algorithm is actually quite straightforward.

What you do is you pick a point. It can be inside the polygon or not, doesn’t really matter. You can just take the origin, for example, then what you do is you take every line segment and make a triangle between that line segment and that origin point. And you calculate the area of the triangle. And you go all around the polygon, summing those areas. But notice the fact that when the line segment inverts, the area should be negative. And you end up subtracting those red shaded areas and you end up with the actual final total.

So, it’s quite a nice simple algorithm that works surprisingly fast. In geographic coordinates, there was an alternative version of this that makes use of the polar coordinates. And I mentioned to you, in the beginning, the fact that we can use vector algebra. And the main thing is the ability to detect these intersections, and when you use this equation. I’m not going to go into more detail on this one, you can have a look at the paper. But it basically forms as well similar reasons to the other one. So, when we actually want the area, how do we do that?

Well, I mentioned we’ve got this tree structure of shells and holes and shells and holes. So, of course, you calculate the sum of all the shells and subtract all the holes to get it with the query like we see on the left. And it works really well. Next one, polygon intersection. Didn’t manage to get this into the demo, but polygon intersection is something that gets used by many of the other algorithms. As I mentioned with the area one needs it as well, and the distance one needs it as well. And the point in a polygon needs it.

So, quite a few of them need intersections. So, how does intersection work? Well, here’s one algorithm called monitoring chain sweep. What you do is you have sorted your line segments from left to right, top to bottom, and then you can just sweep across. And you basically decide if a point is in between any two line segments. And whenever you do this, you can actually see when an intersection occurs.

When the intersection occurs, you create a new point at that intersection and then continue the sweep. And this allows you to work out the complete set of all intersecting points between any two bags of line segments, which of course could be two polygons. And of course, we have repeated this algorithm in geographic coordinates. This one has one subtle challenge, and that is since the great circle is defined by that vector, it’s going all around the planet. Any two points on that circle should define a line segment. But, is it this way around the planet or that way around the planet? So, that does mean that these algorithms are sensitive to the possibility that your polygon actually envelops the Earth, which is rarely the case.

So, we’ve mostly assumed that it’s the shortest distance between those two points. That is the real line segment. This is an important point here about the fact that the shortest distance, when projected, is actually a long curve. So, it was very important to be able to take this into account. We have to switch to great circles for everything. And this affected quite a few of our algorithms. So, that monitoring suite intersection thing needed to be treated slightly differently. And that was done in geographic coordinates successfully.

So moving on to the second last algorithm, and that is distance. This one I’m going to go into slightly more detail because as I mentioned in the demo, it was very slow last year, and we had to do an optimization. So, in polar coordinates, the distance is actually not using what we would normally have done in Cartesian, we’d use Pythagoras. Here we’re going to use a polar equation. In this case, it’s arc times the arc tan of the cross products and the dot products. And this actually works very well, and it’s very, very fast, and it’s equivalent.

How do we query this? Well, this is a slow query. If you look on the right there, I’ve got three polygons, and yet I’m seeing more than three distances. And that’s because the polygon around Gothenburg has this small hole in it, and it’s actually calculated that as well. So, I mentioned that in the demo. If you look at the query over here, what am I doing? I’m doing two large match statements. I’m doing a big match that says, “Find the OSM relation.” That’s the grouping constructed represents the entire polygon, traverse down the polygon structure with a star.

See where it says the polygon structure star, that means I’m going to traverse down that whole tree and find all of the sub polygons, and I’m going to do it for the other polygon as well. So, this query I’m showing you right now is all pairs of polygons. And I’m then going to do the distance calculation at the end, and then you’re going to end up with all of these distances. So, one of the optimizations we did was to look at the shells only. That’s this query. But look what I’ve done right now, I’ve taken away the star. So, I’m actually only traversing the polygon structure to find the polygon, just to know that one exists.

So, the actual polygon structure above there, I’m not using at all. Instead, if we go down to the second half of the query where we call the function polygon show. I’m passing into the polygon to show only the original relation, not the polygon itself. Inside that function, we again traverse the tree and we find the single largest outer shell and we return that only. So, I’m simplifying the polygon from a multi polygon to a simple polygon. And then, I do the distance calculation. And that’s what you see on the right. That’s what people intended when they ask for distance. And it is faster because it did fewer calculations. But this wasn’t the main optimization. The main optimization is this one. And I did describe this when I did the demo briefly.