I’m extremely excited about the results of “Understanding the Efficiency of Ray Traversal on GPUs” and the related work by NVIDIA on ray traversal. In a programming way, of course.
There’s an interesting paradigm shift here. We are moving from a strongly geometric grid model, where one thread is launched per ray and the grid dimensions define the work, to one where persistent threads run small kernels (or actually large kernels, due to the way CUDA code is currently linked) and grab their own jobs. What’s interesting about this shift is that it is exactly how PS3 developers have been writing SPU job systems on Cell for years. Admittedly, the underlying hardware is radically different (massive hardware threading and wide SIMD vs. no hardware threading and more conventional SIMD), but the same simple primitive still applies: a resident kernel that uses an atomic increment to grab work from a shared job list. I have no idea where this programming model will end up, but it definitely looks like it is converging.
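The job-grabbing loop itself is tiny. Below is a minimal CPU-side analogue in C++, assuming `std::thread` workers stand in for resident GPU threads (or SPU kernels) and `std::atomic::fetch_add` stands in for CUDA’s `atomicAdd` on a global counter. The names `process_job`, `persistent_worker`, `run_persistent` and the `batch_size` parameter are hypothetical, chosen for illustration; this is a sketch of the pattern, not the paper’s code.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real work (e.g. traversing one ray).
static int process_job(std::size_t i) { return static_cast<int>((i * i) % 97); }

// A persistent worker stays resident and keeps claiming work until the list is empty.
// fetch_add returns the old counter value, so each call hands this worker a unique
// range of batch_size job indices (on a GPU: atomicAdd on a global counter).
static void persistent_worker(std::atomic<std::size_t>& next_job, std::size_t job_count,
                              std::size_t batch_size, std::vector<int>& results) {
    for (;;) {
        const std::size_t begin = next_job.fetch_add(batch_size, std::memory_order_relaxed);
        if (begin >= job_count) return;  // shared job list exhausted
        const std::size_t end = std::min(begin + batch_size, job_count);
        for (std::size_t job = begin; job < end; ++job)
            results[job] = process_job(job);  // each index is owned by exactly one worker
    }
}

// Launch a fixed number of workers (not one per job) and let them pull jobs themselves.
std::vector<int> run_persistent(std::size_t job_count, unsigned worker_count,
                                std::size_t batch_size) {
    if (batch_size == 0) batch_size = 1;  // a zero batch would never make progress
    std::atomic<std::size_t> next_job{0};
    std::vector<int> results(job_count, -1);
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w)
        workers.emplace_back(persistent_worker, std::ref(next_job), job_count, batch_size,
                             std::ref(results));
    for (std::thread& t : workers) t.join();  // join makes every result visible here
    return results;
}
```

With `batch_size` set to 1 this is the plain atomic increment described above. A larger batch amortizes the atomic over several jobs, which is roughly what happens when one lane fetches work for a whole warp.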
(Atomic increment only requires CUDA compute capability 1.1, so even your one-year-old laptop with an NVIDIA mobile chipset can probably run this sort of code. Of course it’s nicer with the warp vote primitives, which need compute capability 1.2 or higher, but you can emulate these through shared memory, so there’s no need to go bargain hunting for a GTX 260 just yet.)
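The shared-memory emulation of a vote is just as small. The sketch below is again a C++ analogue under stated assumptions: each `std::thread` plays one lane of a warp, a `std::atomic<int>` plays a `__shared__` int flag, and joining the threads plays the barrier (for example `__syncthreads()`) that has to separate the writes from the read. `emulated_any` and `emulated_all` are hypothetical names; the hardware intrinsics they mimic are CUDA’s `__any()` and `__all()` (spelled `__any_sync()` and `__all_sync()` in newer CUDA versions).

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Emulated "any" vote: clear a shared flag, then every lane whose predicate is true
// writes 1. All writers store the same value, so their order does not matter.
bool emulated_any(const std::vector<bool>& lane_predicates) {
    std::atomic<int> shared_flag{0};  // stands in for a __shared__ int
    std::vector<std::thread> lanes;
    for (bool pred : lane_predicates)
        lanes.emplace_back([&shared_flag, pred] {
            if (pred) shared_flag.store(1, std::memory_order_relaxed);
        });
    for (std::thread& t : lanes) t.join();  // barrier: all writes finish before the read
    return shared_flag.load(std::memory_order_relaxed) != 0;
}

// Emulated "all" vote: set the flag, then any lane whose predicate is false clears it.
bool emulated_all(const std::vector<bool>& lane_predicates) {
    std::atomic<int> shared_flag{1};  // stands in for a __shared__ int
    std::vector<std::thread> lanes;
    for (bool pred : lane_predicates)
        lanes.emplace_back([&shared_flag, pred] {
            if (!pred) shared_flag.store(0, std::memory_order_relaxed);
        });
    for (std::thread& t : lanes) t.join();  // barrier: all writes finish before the read
    return shared_flag.load(std::memory_order_relaxed) != 0;
}
```

On the GPU every lane would read the flag back after the barrier, so all lanes see the same answer, which is the property a vote provides. The cost compared with the intrinsic is one shared-memory slot per warp and the extra synchronization.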