Day 2 of COVID-19 social distancing. All of my meetings and travel for the upcoming week are cancelled, my calendar is now strangely empty, and I could probably make use of this extra time to do something wildly productive.
Instead, I have randomly dived into this particular rabbit hole:
Read the Wiki page for the real details, but in summary: take a set of points called “seeds” (or sites). For each seed, the corresponding Voronoi cell is the set of points that are closer to that seed than to any of the other seeds. That set will be a polygon around the seed (and every point in that polygon will be closer to its own seed than to the seed of any other polygon).
An example Voronoi diagram:
Let’s say we are writing a map generator for a “world consisting of land and sea grid squares” game. We want to generate random blobs of land and sea that seem plausibly arranged. Voronoi diagrams are one way to do that – create some random Voronoi cells and choose a subset of them to be land and the rest to be sea. You can imagine how that would work using the above picture and visualizing some of those polygons as being land and the rest sea. In practice you would probably want more individual cells (i.e., finer-grained), so that as they clump together when randomly selected they look less obviously like the Voronoi polygons that they are. You might also still need to do additional post-processing on the results; you can go down another rabbit hole googling this topic. The Voronoi idea just gets you close as a starting point. But I digress… so: back to the Voronoi computation.
The “gold standard” algorithm for generating a Voronoi diagram appears to be Fortune’s algorithm.
And of course the right answer to most questions of “how to implement something in python” is “google it; because five people have probably already done it” and that is definitely the case here.
But there’s time to kill and this seems like a fun project so I decided to play with these diagrams. I have not (yet) tried implementing Fortune’s algorithm. It is complicated and requires study. Instead I plowed ahead with the somewhat-obvious “brute force” approach:
Let G be a grid
    Each G[x, y] being a point a.k.a "grid square"
Let S be a list of points
    Each S[i] being the (x, y) point for the "seed" of a cell

For each point, pt, in grid G:
    For each site, s, in list S:
        compute distance between pt and s
    assign pt to the closest Voronoi site
This works. It is not fast.
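A minimal Python sketch of the pseudocode above. The names `dsq` and `brute_force_voronoi`, and the choice of a dict keyed by `(x, y)`, are illustrative assumptions rather than the original code. It compares squared distances, which orders the sites exactly as true distances would.

```python
def dsq(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return dx * dx + dy * dy


def brute_force_voronoi(width, height, sites):
    """Map every grid point (x, y) to the index of its closest site.

    Ties go to the lowest site index, because min() returns the
    first minimal element it encounters.
    """
    labels = {}
    for x in range(width):
        for y in range(height):
            pt = (x, y)
            # Check this point against every site: the O(points * sites) part.
            labels[pt] = min(range(len(sites)), key=lambda i: dsq(pt, sites[i]))
    return labels


labels = brute_force_voronoi(3, 3, [(0, 0), (2, 2)])
print(labels[(1, 0)], labels[(2, 1)])  # → 0 1
```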
This brute-force algorithm has order n² performance characteristics, where n is the number of Voronoi cells. That analysis depends on exactly how you define n and how you think about the parameters, in particular how many Voronoi sites will be seeded into a grid of a given size.
In the “world map” example, this analysis definitely holds because if you choose a larger map you will still want the same average Voronoi polygon size, just more of them. The alternative (same number of Voronoi cells, just let them grow larger as the map grows larger) makes little sense in this example. So: Bigger grid, more sites. Thus, a larger grid means both more points to evaluate, and more Voronoi sites to evaluate them against, leading to the n² performance characteristic, which gets slow very quickly as the grids get larger.
As I started coding (and testing!) I discovered some interesting problems I haven’t seen written up anywhere (though perhaps my google-fu was just weak).
In a quantized grid of discrete x, y (integer) points, there are configurations where a given grid point could be a member of two different Voronoi cells. Here’s a very simple/obvious example of that:
  2   ?   b   B
  1   a   ?   b
y=0   A   a   ?
    x=0   1   2
In this diagram, upper-case ‘A’ at coordinate (0, 0) is one Voronoi site, and upper-case ‘B’ at coordinate (2, 2) is another. The two lower-case ‘a’ points are at (Euclidean) distance 1 from A and clearly belong in its Voronoi cell, just as the two ‘b’ points belong to B.
But the three points marked with ‘?’ are equidistant from both A and B. How should they be assigned?
a b B       b b B       a b B
a a b  OR   a b b  OR   a b b  OR ...
A a a       A a b       A a a
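The ambiguous points in the 3×3 example can be listed mechanically. This is only a sketch: `equidistant_points` is a made-up helper name, and it flags any point whose two smallest squared distances are equal.

```python
def equidistant_points(width, height, sites):
    """Return grid points whose two nearest sites are exactly tied."""
    ties = []
    for y in range(height):
        for x in range(width):
            # Squared distances from (x, y) to every site, smallest first.
            d = sorted((x - sx) ** 2 + (y - sy) ** 2 for sx, sy in sites)
            if len(d) > 1 and d[0] == d[1]:
                ties.append((x, y))
    return ties


print(equidistant_points(3, 3, [(0, 0), (2, 2)]))
# → [(2, 0), (1, 1), (0, 2)]
```

Those are exactly the three ‘?’ squares on the anti-diagonal between A and B.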
“Who cares?” is a perfectly reasonable response to this question. But it came up for me when I wrote an alternate implementation and made test cases to compare the results of the two. The results did not compare equal, not only in simple cases like these but also in some surprisingly complicated cases. This prompted me to study the issue a bit deeper.
Fortune’s algorithm sidesteps the issue because it solves for the vertices of polygons. There is an exact solution for those equations, independent of any grid considerations. In the above examples each of the two Voronoi cells is a triangle, with its vertices at the obvious corners. The edges of the polygons are lines with continuous slope/intercept equations, and they uniquely and precisely define the Voronoi diagram. It is only when we take that continuous domain and map it (“sample” it) onto a grid that we get the aliasing (“jaggies”) problem. You can take yourself down the Nyquist-Shannon sampling theorem google rabbit hole if you want to at this point. 🙂
So, for example:
This depicts the true Voronoi polygons, each with three vertices (color-coded circles, shared vertices colored half-and-half). It also shows the integer grid imposed on top of this continuous figure. This is just another way of showing the aliasing problem caused by sampling the continuous Voronoi polygon solution onto the finite (blue) grid.
These trivial examples don’t seem very important, but when I tried writing a better/faster but still “brute force” algorithm I ran into more problems.
Growing Circles Algorithm
The next idea is a step up from brute-force: grow each Voronoi cell outward cell-by-cell, doing them all “in parallel” (conceptually, via algorithm logic not “real” parallelism). The idea is that grid points near their “obvious” Voronoi seed can be assigned to that cell without needing to be checked against other cells that are obviously too far away.
Let Dsq(P1, P2) (“squared distance”) between two points P1 and P2 be defined as:
dX = P1.x - P2.x
dY = P1.y - P2.y
Dsq = (dX*dX) + (dY*dY)
This is just a standard Euclidean distance calculation with the final sqrt step omitted. That step is unnecessary for this algorithm, so I leave it out (faster/simpler).
Next define a “ring” Rs(N) of squares around a Voronoi site, defined as the set of points that are squared-distance N away from Voronoi site S.
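The two definitions above translate almost directly into Python. This is a sketch under assumptions: points are `(x, y)` tuples, the grid is `width` × `height`, and `ring` is a hypothetical name for Rs(N). It scans the whole grid, so it illustrates the definition rather than an efficient way to build rings.

```python
def dsq(p1, p2):
    """Squared Euclidean distance: the usual formula without the sqrt."""
    dx = p1[0] - p2[0]
    dy = p1[1] - p2[1]
    return dx * dx + dy * dy


def ring(site, n, width, height):
    """Rs(N): the grid points at squared distance exactly n from site."""
    return {
        (x, y)
        for x in range(width)
        for y in range(height)
        if dsq((x, y), site) == n
    }


print(sorted(ring((0, 0), 1, 3, 3)))  # → [(0, 1), (1, 0)]
print(ring((0, 0), 3, 3, 3))          # → set()
```

The second call comes back empty because 3 is not a sum of two squares, which is why many N values produce no ring at all.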
This diagram shows two Voronoi cells and their surrounding grid points marked with the Rs values. There is a separate Rs function for each Voronoi site; in the diagram the results are color-coded to match the corresponding sites.
For any given grid point, the lower Rs value is the one that matters, and assigns that point to the corresponding Voronoi site. The diagram shows higher values in (ironically) smaller font for some of the points where the cells begin to touch.
Starting with N=1 the algorithm builds out these “rings” (they are fragments of rings) from each Voronoi site. Look at the grid point at the top directly above B, labeled 9 in green (Rb N-value) and 10 in red (Ra N-value). When we create the Rs(9) rings for all the Voronoi sites, that point will get labeled with a 9 for the B Rs(N) function. Even though we can see later it will be right at the boundary of A and B, since we have discovered it while working on all the 9’s (and below), we can safely assign it to B, because we know by definition that if no other Voronoi site has this in its list of 9 (or below) then the closest it could be to any other site is 10 or more. We can “claim” it for green in this example, without ever having computed how far away it is from red.
Each time a point is “claimed”, the unclaimed perimeter squares of that point become candidates to be evaluated at the next N value. Said differently, each Voronoi site has an “active perimeter”: all the unclaimed points immediately neighboring the points that have been claimed in prior iterations. It is entirely possible for a single point to end up in the active perimeter of more than one Voronoi cell at a time, which will happen as the growing blobs around each cell start to approach each other. Whichever cell gets there first (in terms of the squared-distance N value being iterated) will win that point.
Thus the key idea is that most points never have to be evaluated against more than one, or maybe just a few (more on this in a bit), Voronoi sites. Instead of computing the squared distance from every point to every Voronoi site, we only compute the distances for the active perimeter of each Voronoi cell separately, and let the cells grow outwards unmolested until they begin to touch each other. Before they touch each other there is no chance that some other Voronoi cell would be closer; if it were, then the point in question would already be in its active perimeter.
The simplest way to understand this algorithm is to imagine N going up by one each iteration, but there are ways to optimize that to go up to the “next” N value that matters. Perhaps more on that later.
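Here is one way the growing idea could be sketched in Python. It is an illustration, not the original implementation. It rests on three assumptions: the perimeter of a claimed point is its 8 surrounding squares, a heap stands in for stepping N (so it jumps straight to the next N that matters), and ties go to the lowest site index, which is an arbitrary choice.

```python
import heapq


def dsq(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return dx * dx + dy * dy


def grow_voronoi(width, height, sites):
    """Grow every cell outward from its site in order of squared distance.

    Returns a dict mapping (x, y) to a site index. A point is claimed by
    the first site whose active perimeter reaches it.
    """
    owner = {}
    # Heap entries are (squared distance, site index, point).
    heap = [(0, i, site) for i, site in enumerate(sites)]
    heapq.heapify(heap)
    while heap:
        n, i, pt = heapq.heappop(heap)
        if pt in owner:
            continue  # some perimeter already claimed this point
        owner[pt] = i
        x, y = pt
        # The unclaimed 8-neighbours join site i's active perimeter.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                q = (x + dx, y + dy)
                if q in owner or not (0 <= q[0] < width and 0 <= q[1] < height):
                    continue
                heapq.heappush(heap, (dsq(q, sites[i]), i, q))
    return owner


cells = grow_voronoi(3, 3, [(0, 0), (2, 2)])
print(cells[(1, 0)], cells[(2, 1)])  # → 0 1
```

Each point is only ever measured against the sites whose perimeters actually reach it, which is where the savings over brute force come from.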
Of course there’s still the same equidistant point problem. Equidistant points show up any time two Voronoi cells are directly on a horizontal, vertical, or diagonal line with each other AND there is an odd number of points along that line between them. The simple 3×3 red/green example makes this somewhat obvious, as the “middle” grid points have to be divided in half between the two competing Voronoi cells. When there are an even number of grid points along one of these “straight” lines, the red/green border falls between two points so there is no ambiguity. It only falls into the middle of a grid point when the total number of grid points along this connecting line is odd.
However, there are other combinations that lead to equidistant points. The fact that each Dsq value computed is (by definition) the sum of two squares means we can look for this programmatically, by finding integer values that can be the sum of two (or more) pairs of squares, that is:
N = (a*a) + (b*b) = (c*c) + (d*d)
for some combination of integers a, b, c, and d, excluding the trivial case where the pair (a, b) is identical to (c, d) or (d, c). So what are some examples of these numbers? Well, as with most of those types of questions, the Online Encyclopedia of Integer Sequences has the answer, or at least part of it: Sequence A007692:
The first few numbers that meet this criterion are 50, 65, 85, 125, 130, 145. However, I say “part of” the answer because that sequence excludes the case where one of the terms is allowed to be zero. I couldn’t find an actual OEIS sequence corresponding to “allow zeros” in the above sequence, but any Pythagorean triple will form one of these ambiguous contenders with one of the terms being allowed to be zero, because:
    If a**2 + b**2 == c**2
    then a**2 + b**2 == c**2 + (0)**2
    and (a, b), (c, 0) will be a pair of two different pairs
    whose squares will sum to the same integer (c**2)
So for Pythagorean triple 3, 4, 5 we see that 25 is another such value, with the corresponding pairs being (3, 4) and (5, 0).
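These values are easy to enumerate by brute force. The sketch below uses a hypothetical helper, `multi_square_sums`, that collects every pair 0 ≤ a ≤ b and keeps the sums reached by more than one pair. Passing `allow_zero=False` restricts it to nonzero terms, matching the OEIS sequence.

```python
from collections import defaultdict
from math import isqrt


def multi_square_sums(limit, allow_zero=True):
    """Integers <= limit that equal a*a + b*b (a <= b) in 2+ distinct ways."""
    reps = defaultdict(list)
    start = 0 if allow_zero else 1
    for a in range(start, isqrt(limit) + 1):
        # b runs from a upward so each unordered pair is counted once.
        for b in range(a, isqrt(limit - a * a) + 1):
            reps[a * a + b * b].append((a, b))
    return {n: pairs for n, pairs in sorted(reps.items()) if len(pairs) > 1}


print(multi_square_sums(50))
# → {25: [(0, 5), (3, 4)], 50: [(1, 7), (5, 5)]}
print(list(multi_square_sums(150, allow_zero=False)))
# → [50, 65, 85, 125, 130, 145]
```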
Let’s take a look at 50: it is 25+25 and also 49+1. That means it should happen on a grid point that is deltaX=5, deltaY=5 from one Voronoi cell, and deltaX=7, deltaY=1 from another. Drawing that, with the square-distance values, results in:
This illustrates three different types of equidistant grid points. The one labeled 10 (in both red and green) is at delta (1, 3) from A and delta (3, 1) from B. This is the simplest scenario for creating equidistant squares – when they have the same “shape” of their distance from their corresponding Voronoi site.
The one labeled 25 is a Pythagorean example; it is (3, 4) from A and (5, 0) from B.
Finally, Square F, which has square-distance of 50 to both A and B, is an example of something else entirely – it is (5, 5) from A and (7, 1) from B.
Things get weird (and very hard to draw) as the numbers get larger. I ran a test case with these parameters:
# three Voronoi sites
S = [ (19, 0), (13, 15), (18, 7) ]
and then computed the squared-distance for every point in a 20×16 grid. The result for the lower left (starting at 0, 0) region looks like this, stopping at N=324:
269b 244b 221b 200b 181b 164b 148c 125c
290b 265b 242b 221b 202b 178c 153c 130c
313b 288b 265b 241c 212c 185c 160c 137c
.    313b 281c 250c 221c 194c 169c 146c
.    .    290a 257a 226a 197a 170a 145a
.    324a 289a 256a 225a 196a 169a 144a
The lower case letters ‘a’, ‘b’, and ‘c’ correspond to the three Voronoi cells at A=(19, 0), B=(13, 15) and C=(18, 7). Their origins in this table are off to the right (and up); here is the full chart in smaller font:
The square at (1, 1), which hasn’t been computed yet in the above, has squared-distance value 325 from both A and C:
A=(19, 0), C=(18, 7), P=(1, 1)

dxA = (19 - 1) = 18
dyA = (0 - 1) = -1
Na = dxA**2 + dyA**2 = 325

dxC = (18 - 1) = 17
dyC = (7 - 1) = 6
Nc = dxC**2 + dyC**2 = 325
So as already belabored to death, the point at (1, 1) has to be randomly assigned to A or C and it “doesn’t matter”. Or so we think.
Say we assign the point to A. Notice that this “cuts off” C, so there are no more active perimeter points for C after choosing to give this point to A.
Now let the calculation run up to 359 and look again, after assigning that point to A:
269b 244b 221b 200b 181b 164b 148c 125c
290b 265b 242b 221b 202b 178c 153c 130c
313b 288b 265b 241c 212c 185c 160c 137c
338b 313b 281c 250c 221c 194c 169c 146c
.    325a 290a 257a 226a 197a 170a 145a
.    324a 289a 256a 225a 196a 169a 144a
Now the problem: the point at (0, 1) is distance 362 from A but is 360 from … (wait for it) … C!
Oooops, we aren’t even checking C any more because it got “cut off” when we arbitrarily chose to assign (1, 1) to A.
I stumbled into this problem by writing test code to generate random Voronoi sites, then running my “grow the blobs” algorithm and testing it against the brute-force implementation, with allowance as necessary for squares that might be equidistant from two different Voronoi cells. The test code very quickly generated test cases like this one where the arbitrary choice for an earlier equidistant point turned out to have implications on later choices and could cause a point to be assigned to the wrong Voronoi cell.
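A tie-tolerant check along those lines might look like the sketch below. The helper `wrong_assignments` is hypothetical and is not the original test code. Instead of comparing two label maps directly, it accepts any assignment whose site is at the minimum squared distance, so equidistant points pass whichever way they were assigned.

```python
def dsq(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return dx * dx + dy * dy


def wrong_assignments(labels, sites):
    """Return (point, assigned Dsq, best Dsq) for every mis-assigned point.

    labels maps (x, y) -> site index. A point tied between several
    sites passes if it was given to any one of them.
    """
    bad = []
    for pt, i in sorted(labels.items()):
        best = min(dsq(pt, s) for s in sites)
        if dsq(pt, sites[i]) != best:
            bad.append((pt, dsq(pt, sites[i]), best))
    return bad


sites = [(19, 0), (13, 15), (18, 7)]
# Hand-written labels mimicking the failure above: both points given to A.
labels = {(1, 1): 0, (0, 1): 0}
print(wrong_assignments(labels, sites))
# → [((0, 1), 362, 360)]
```

The tie at (1, 1) is accepted, while (0, 1) is flagged because C is closer (360) than A (362).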
I spent quite a bit of time trying to understand this problem to see if there was any efficient way to fix it. The first thing that comes to mind is a heuristic that says “never close off the perimeter of a Voronoi cell if you don’t have to” which, in this particular case, would argue for keeping C alive and cause the correct result to come about.
I implemented that heuristic and it fixes this case, but it does not seem to fix the general case and I haven’t done enough digging into it yet to figure out why. I then tried generalizing the heuristic to assign the equidistant point to the Voronoi cell with the fewest active perimeter squares in the 3×3 neighborhood. This too does not address all cases (nor did a 5×5 neighborhood). I’m not yet sure whether there is a generalized (and practical) algorithm that solves this.
I note that even with the original pure brute-force algorithm, there are bugs here depending on how you look at it. The brute force algorithm faces the same decision regarding the (1, 1) “325” point – assign it to A, or C? If the brute force algorithm assigns it to A, it will at least still find the correct C assignment for the (0, 1) point – 360 distance to C and 362 distance to A, so it will assign it to C. But in that case the brute force algorithm will have created a non-contiguous Voronoi cell, which seems like a bug (indeed this makes me wonder if some of the bug reports I saw on some other implementations when I was googling are related to this problem).
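The non-contiguity can be checked with a flood fill over one cell. In this sketch the brute-force pass breaks ties toward the lowest site index, so (1, 1) goes to A as in the scenario just described. “Contiguous” is assumed to mean 8-connected, and `components` is a made-up helper name.

```python
def dsq(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return dx * dx + dy * dy


def brute_force(width, height, sites):
    """Closest site index per grid point; ties go to the lowest index."""
    return {
        (x, y): min(range(len(sites)), key=lambda i: dsq((x, y), sites[i]))
        for x in range(width)
        for y in range(height)
    }


def components(labels, site_index):
    """Count 8-connected components among points labelled site_index."""
    todo = {pt for pt, i in labels.items() if i == site_index}
    count = 0
    while todo:
        count += 1
        stack = [todo.pop()]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    q = (x + dx, y + dy)
                    if q in todo:
                        todo.remove(q)
                        stack.append(q)
    return count


sites = [(19, 0), (13, 15), (18, 7)]
labels = brute_force(20, 16, sites)
print(labels[(1, 1)], labels[(0, 1)])  # → 0 2
print(components(labels, 2) > 1)       # → True
```

With (1, 1) given to A, the point (0, 1) still belongs to C but has no C-labelled neighbor, so cell C comes out in more than one piece.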
Well, it’s getting late and it’s time for a Quarantini beverage … so I’m stopping here and posting this and hoping to come back to it again with more details once I have done more work on this aliasing problem and its implications.