A Simple trick to Speed up SQL Intersect

Raphaël Léger
3 min readSep 10, 2021

An initial request taking up to a few minutes

Let’s assume you set up a geospatial database and one of your table has millions of points.

Let’s say you want to get the points that are within a given area, for instance the following region of France, a polygon itself made of ~400 points:

You would write the following SQL statement:

Classic way of querying geospatial data using ST_INTERSECTS (ex: for Postgis or Athena)

I benchmarked this request execution on a 1.000.000 entries database and it took: 1 minute and 15 seconds

And that is, with a polygon made of ~400 points. Now, you may be dealing with polygons made of thousands of points and you may have millions of points in your table. The request would then take several minutes.

Several minutes for a single request, that is definitely way too long, the request is very slow.

The way to make the request take a few seconds

Let’s try to perform the same request except that, instead of using your area, you will be using the bounding box of the area:

Here is what the SQL request would look like:

Intersection between the points and the bbox

I benchmarked this request execution on the exact same 1.000.000 entries database I used above and it took: ~2 seconds

How come there is such a different execution time? Well, checking whether a point intersects a rectangle can be done trivially with a math formula using pen and paper! Whereas checking if a point intersects a polygon involves way more computation efforts as you need to take the lines between every point into account and deal with convexity and concavity.

But wait… this request also returns points that are outside of the area.
Yes, some points are inside the bounding box but not in the area. We do not want those points.

And here comes the magic, let’s merge the two requests together to obtain the final request:

faster request with ST_INTERSECT

I benchmarked this request execution on the exact same 1.000.000 entries database I used above and it took: ~3 seconds

That’s it! By adding this second intersection operation to the initial request, the execution time is actually drastically reduced.

Why is SQL faster with this new condition: when you check multiple WHERE conditions with the AND keyword, SQL performs the first check then the second one if and only if the first one is true, meaning the request actually checks whether points intersect the area only after they are sure to be within the bounding box, which means the polygonal intersection is performed over way less points than in the initial request, that performs it for every single point in the database.

By the way, if this article helped you, don’t forget to clap & subscribe! 🙂
This helps me know that taking the time to write the article was worth it!

--

--