Friday, September 14, 2012

Relational database and data mining algorithms

Some algorithms may appear prohibitively expensive to perform computationally.  One of such examples is finding for each data point in table A the nearest K neighboring data points in table B (shown in the figure below). From the surface, it involves calculating the distances between 2 trillion pairs of data points (1 million times 2 million).

However, with database tricks such as binary-tree index, we can make the above algorithm very efficient. Thus, it is helpful for algorithm guys like mathematicians to understand some database technologies.  With those database tricks, we can implement algorithms that may be hard to do otherwise.

No comments: