Perfect 2-D Hashing

Author: Han de Bruijn
Date: 2006 November

This is actually a re-write of an article posted in 'sci.math' as the (would be) thread 'Inverse Perfect Hashing'
Imagine a rectangular region which is placed quite arbitrarily inside a rectangular grid. It is questioned which gridpoints are inside this rectangular region.
A picture says more than a thousand words:

Forward Hash

Formulation of the problem

How to obtain a list of all grid points which are inside the rectangle, where each grid point occurs only once in the list. And yes, we want our algorithm to be as fast as possible.

Solution of the problem

The rectangle is transformed into a rectangle which coincides with a cartesian coordinate system at two of its edges (i.e. the top-left position becomes the origin in graphics coordinates). This is a translation combined with a rotation. Each grid point is transformed into the translated/rotated coordinates, giving (x,y) floating point values, instead of integer values:

When rounding the floating point values of the transformed interior points to integer values, memory locations may be obtained, in principle.

Such a method is commonly called hashing.

It can be demonstrated (: make a drawing) that at most two collisions can happen when rounding these (x,y) coordinates: just try to bring two grid points together in one grid cell. You will see that they can only fit in if their mutual distance is smaller than √2, the diagonal of the cell:

But two different grid points can oly give rise to the same round off values - giving the midpoint of the grid cell - if they fit into the same grid cell. The big trick is now that all lengths may be multiplied with this factor √2 (preferrably even somewhat larger for safety):

It is clear that all collisions have disappeared then: it's impossible to bring two grid points with mutual distance greater than √2 together into one 1x1 grid cell and round them to the same midpoint. With other words: we have a perfect hash (though maybe it doesn't look like it, not yet):

But if these values are rounded  . . . Look ! Here we are:

Therefore what we need is a table which is √2 times √2 = twice as big as the minimal size which would be needed for storing all grid point values "without gaps". The advantage of the overhead by a factor 2 is that an adress is attached to a grid point immediately, without any need for re-hashing (due to a collision) or searching.

Inverse Hash

Such a "Perfect Hash" can always be inverted. This is evident, because if there is a one-to-one mapping of a grid point onto an adress then such a mapping can always be inverted to a mapping of an adress onto a grid point.

The procedure is as follows. Traverse all adresses in our twice-too-big table; and apply the inverse of all transformations in reverse order. Inverse rounding in the first place, corresponding to plus or minus 1/2 with both coordinates, resulting in a little square with four vertices. Then dividing all coordinates by √2 instead of a multiply. Rotate over minus the angle, perform a translation backwards. The end-result of this (floating point) procedure is the little square with, yes or no, a grid point in it. Iff the grid point is within that little square, then it can be found by rounding the coordinates of the center of the square.

This center corresponds with the end-result by rounding, in the Forward Hash procedure. Rounding the center gives whole-number coordinates, hence a possible grid point. Next check if the candidate grid point is within that little square as well. (This is not necessarily true: approximately half of the grid points will be outside and approximately half of the grid points will be inside the little square.) At last check if the candidate grid point is within the large skew rectangle we started with. In this manner, the Inverse Hashing procedure enables us to find all grid points corresponding to adresses which can be filled with Forward Hashing.

Download

The above has been coded into a computer program. The main program produces the pictures as captured and displayed in this web page. It is accompanied, though, with a separate module (Delphi 'unit') called 'wiskunde' (Dutch for 'mathematics') and the above theory is embodied in this module. The source code, together with a (Microsoft Windows) executable, is packed into a downloadable ZIP file. Unpack in a spearate map and eventually execute.

CG application

But look what we have obtained now ! Because grid points can be interpreted as pixels - with Computer Graphics applications - Inverse Hashing is actually a recursion-free Flood Fill procedure of arbitrary rectangles in an image. The overhead measured in computation time is at most twice the theoretical minimum, that is: iff each pixel could be found immediately, which in general is not the case.

Number Theory

Key reference is the following article in the 'sci.math' thread ' 3^m/2^n => r '. The IMHO correct statement is that any area equal to = 1 , in the 2-D grid of numerators and denominators of fractions, contains at least zero and at most four fractions. This means that, for a given r and ε , there must be a search up to numerators or denominators with order of magnitude ( r / ε ) , if r is the (irrational) number to be approximated and ε is the error (tolerance) in the approximation.

Generalization

In three dimensional space, Perfect Hashing is accomplished by 3-D rotation, translation and enlarging the rectangular box with a factor √3 in each direction.

Disclaimers

Anything free comes without referee :-(
My English may be better than your Dutch.