Focus on lowering algorithmic complexity and don’t perform other optimizations until their usefulness is demonstrated by thorough profiling: since I completely agree with this maxim, I feel obliged to explain why my latest lightcuts update seems to contradict it.
First of all, let’s recap: in order to select an optimal subset of the available lights per sample, a lightcuts implementation needs to maintain them organized into a binary tree. The leaves of this tree are the actual lights, while intermediate nodes represent clusters of lights. A cluster of lights is a sort of imaginary light whose position, intensity and orientation is somehow representative of all its children.
The original paper spends only a paragraph on tree building and related issues. The point is that building the light tree is trivial, but building it efficiently is far less trivial. Miroslav Mikšík, in his paper dealing with implementation issues regarding lightcuts, notes that a naive algorithm may have a O(n³) complexity; he devised a technique to reduce such complexity to O(nlogn).
Keep in mind that a robust implementation should be able to deal at least with several thousands of lights, and should allow venturing into the millions! This is one of those cases where complexity does matter.
Right now, though, I’ve implemented a simple O(n²) scheme, taking advantage of the nice heap mini-library available in Blender. I’ve spent some time, on the other hand, ensuring that a single memory allocation is required for the entire tree, instead of one for each node. This is possible because the number of nodes in the tree is predictable in advance. This reduces fragmentation, is cache friendly and, since it’s possible to use offsets instead of pointers, its memory footprint is the same on 64 bits machines instead of increasing because of the doubled pointer size.
Next, I want to move on to the other parts of the algorithm, in order to have a working prototype fairly soon. It probably won’t support a large number of lights, nor oriented lights (spot) which are probably going to be added later on, as they require special treatment. But at least it will be possible to validate the overall architecture of the project, and people will be able to start playing with it.
For this purpose, the current algorithm has both acceptable performance (I’m not going to debug with large number of lights at the beginning, anyway) and is easier to debug.