Saving this .pdf here.
The relational join operator is a very memory-intensive and even computationally-intensive operation. Though real-life databases can be in the TB range, there are a number of applications of smaller, memory-only databases that could feasibly fit in the 4GB or 8GB of smaller GPUs.
Its a well known fact that relational-joins (and joins-of-joins) can be parallelized. Database programmers meticulously perform planning-algorithms to optimize this important operation and parallelize it across cores or even systems. Seeing research into a natural GPU application warms my heart at least!
GPUs are well known to parallelize and improve upon sorting algorithms (see embarrassingly parallel solutions like Bitonic Sort… but also GPU-specific / SIMD-designed sorting algorithms like MergePath). One of the most common ways to perform a relational join is to sort both sets of data on the relational-join, and then linearly scan through both relations matching up (left.blah == right.blah). This paper seems to take this approach and measures how good GPUs are at this. (At least, for data that does fit in the GPU RAM).
There’s also “Hash-Join”, which is investigated in this paper as well.