If anything is to demonstrate why I'm unsuited to blog-writing, it's the fact that I've been planning to write this post for about five years, and haven't done so because it's trivial!
When doing auto-resizing contiguous storage (e.g. C++'s std::vector), the normal growth mechanism is resizing by a multiplicative factor - e.g. doubling the size each time. My boss of five years ago pointed out that doubling is probably a bad factor. Imagine you have some kind of zero-overhead allocator, and this is the only allocation going on in your address space. The first allocation takes 1 unit, then 2 units, 4 units, etc. The next allocation takes 2^n units, but all the allocations so far sum to only (2^n)-1 units. In other words, the new allocation must be placed in a fresh part of the address space, rather than recycling previously-used-and-now-free space. This can be viewed as rather wasteful (as those previous allocations are unlikely to be returned to the OS).
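To make that concrete, here's a little sketch (my illustration, not part of the original argument) that tracks, under a doubling policy, how much previously-freed space is available when each new request arrives:

```cpp
#include <cstdio>

// Sketch: with a doubling policy, the blocks freed so far (1 + 2 + ... +
// 2^(n-2) = 2^(n-1) - 1 units) never add up to the next request (2^n units),
// so every resize has to land in fresh address space. Even counting the
// still-live block, everything so far is only 2^n - 1 units.
int main() {
    long long freed = 0;   // total units freed so far (the old, smaller blocks)
    long long live = 1;    // size of the block currently in use
    for (int step = 0; step < 10; ++step) {
        long long request = live * 2;
        printf("request %4lld units, freed so far %4lld -> %s\n",
               request, freed,
               request <= freed ? "fits in recycled space" : "needs fresh space");
        freed += live;     // the old block is released once the copy is done
        live = request;
    }
    return 0;
}
```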
This got me thinking: what is the optimal resizing factor in this rather simplified view of the world? We continue to assume multiplicative scaling. The array first takes 1 unit, and the first resize takes x units. The next resize can't go into the 1-unit space, so we tack x^2 on the end. However, we may be able to fit the allocation after that (x^3) into the freed (1+x). Then we free the x^2, and x^4 starts where it was (spilling over into fresh space beyond it). x^5 goes after that, and again we can hope to put the x^6 into the space previously occupied by the x^3 and x^4.
What x is appropriate? In both cases, the equation boils down to 1 + x = x^3. As I am lazy, I fed this into Wolfram Alpha rather than solve it myself. The solution by radicals looks ugly, so suffice it to say the result is around 1.32. This is one of those cases where rounding to, say, 4/3 would be pretty pessimal, as the allocations would then no longer fall into the pattern we're aiming for. Given the overhead of allocation, you're probably better off with a growth factor of 1.3 or maybe even 1.25.
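The root and the fit pattern are easy to check numerically; here's a quick sketch (bisection bounds and iteration count picked arbitrarily):

```cpp
#include <cstdio>
#include <cmath>

// Sketch: find the real root of 1 + x = x^3 by bisection, then check the
// reuse pattern: x^3 just fits in 1 + x, and x^6 just fits in x^3 + x^4,
// whereas rounding up to 4/3 overshoots the space it needs to fit into.
int main() {
    double lo = 1.0, hi = 2.0;
    for (int i = 0; i < 60; ++i) {
        double mid = 0.5 * (lo + hi);
        (mid * mid * mid < 1.0 + mid ? lo : hi) = mid;
    }
    double x = 0.5 * (lo + hi);
    printf("root of 1 + x = x^3: %.6f\n", x);                  // ~1.324718
    printf("x^3 = %.6f vs 1 + x = %.6f\n", pow(x, 3), 1 + x);  // equal
    printf("x^6 = %.6f vs x^3 + x^4 = %.6f\n",
           pow(x, 6), pow(x, 3) + pow(x, 4));                  // also equal
    printf("(4/3)^3 = %.6f vs 1 + 4/3 = %.6f (doesn't fit)\n",
           pow(4.0 / 3.0, 3), 1.0 + 4.0 / 3.0);
    return 0;
}
```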
At a scaling factor like that, you're going to be spending a lot of time copying data. The reallocation overhead is only a constant factor compared to getting the array size right in the first place, but that constant will be big.
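To put a rough number on that constant: if every resize copies the whole array, the total copying works out at roughly 1/(x-1) copies per element, so about 3 per element at x ≈ 1.32 against 1 at x = 2. A quick simulation (the element count and the rounding of capacities are arbitrary choices of mine) bears that out:

```cpp
#include <cstdio>

// Sketch: count the element copies made while appending n elements one at a
// time, for a few growth factors. The ratio copies/n is roughly 1/(x - 1).
int main() {
    const double factors[] = {2.0, 1.5, 1.32, 1.25};
    const long long n = 10000000;
    for (double x : factors) {
        long long capacity = 1, copies = 0;
        for (long long size = 0; size < n; ++size) {
            if (size == capacity) {                        // full: grow and copy
                copies += size;
                capacity = (long long)(capacity * x) + 1;  // round up so we always grow
            }
        }
        printf("factor %.2f: %.2f copies per element (1/(x-1) = %.2f)\n",
               x, (double)copies / n, 1.0 / (x - 1.0));
    }
    return 0;
}
```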
What if we're willing to wait an extra step before we can start recycling memory? The real root of 1 + x + x^2 = x^4 is just over 1.46, which is somewhat better. And if we're willing to wait more and more steps, what is the limiting factor if we wish to eventually be able to fit the newly allocated array into some previously allocated memory? Why, the golden ratio, of course! (The limit is not a factor of 2, as it might naively appear, since you have to keep the previous allocation around until the copy is complete.)
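Here's the same bisection trick applied to the general pattern - I'm reading the k-step equation as 1 + x + ... + x^k = x^(k+2), which matches the two cases above - and the roots do indeed creep up towards the golden ratio:

```cpp
#include <cstdio>
#include <cmath>

// Sketch: solve 1 + x + ... + x^k = x^(k+2) by bisection for increasing k
// (k = 1 gives 1 + x = x^3, k = 2 gives 1 + x + x^2 = x^4). The roots
// approach the golden ratio, (1 + sqrt(5)) / 2 ~ 1.618, from below.
static double root(int k) {
    double lo = 1.0, hi = 2.0;
    for (int i = 0; i < 60; ++i) {
        double x = 0.5 * (lo + hi), lhs = 0.0;
        for (int j = 0; j <= k; ++j) lhs += pow(x, j);
        (pow(x, k + 2) < lhs ? lo : hi) = x;
    }
    return 0.5 * (lo + hi);
}

int main() {
    for (int k = 1; k <= 20; ++k)
        printf("k = %2d: x = %.6f\n", k, root(k));
    printf("golden ratio: %.6f\n", 0.5 * (1.0 + sqrt(5.0)));
    return 0;
}
```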
All of this is somewhat silly and academic - in a real environment a growth factor of 2 is probably pretty reasonable in terms of avoiding unnecessary copying. On the other hand, from this we see that a growth factor of 1.5 might not be an unreasonable trade-off, in terms of reusing existing memory if you've got an appropriate allocator. A fun little toy.
Posted 2011-04-12.