It’s a pretty common scenario to have an array of some structs, where you frequently iterate through the array using only one field in the struct. This is a cache nightmare – memory is loaded into cache sequentially by prefetching, meaning you’re wasting all that bandwidth loading all the other fields of the struct that you’re not interested in. A far more efficient way is to have each “field” be in a separate array, and do all the careful maintenance of this set of arrays.
Of course, that’s an annoying way to program, and accident prone. A much better way would be if the compiler could do this magically for you, by pulling the struct apart behind the scenes and storing it as such. In this way cache would be used much more effectively, memory bandwidth usage reduced substantially, and everyone made happier.
There are of course issues with this, because modifying the length of the array would be more expensive with multiple arrays behind the scenes, and it may be difficult to determine automagically which approach is better [in programs which have multiple distinct access patterns to that array]. Still, it seems like it should be an option – there’s plenty of cases where such an optimisation could be made, and yield significant improvements.
Now if only I knew the intimate ins and outs of the gcc 4.x source. D’oh. :)