The following code prints the Fibonacci sequence. You’ve probably seen it before. It’s one of the simplest and most well-known examples of a sliding window operation – where the next value depends on the preceding two (or more) values. While almost all programs do not calculate the Fibonacci sequence, many do contain similar sliding-window algorithms. And code that uses the reduce(_:_:)
/ reduce(into:_:)
methods is usually doing a similar thing, too.
var previous = UIntXL(0)
var current = UIntXL(1)
print(previous)
print(current)
while true {
let next = previous + current
previous = current
current = next
print(next)
}
Short of using a different algorithm entirely (there are much smarter ways to calculate Fibonacci numbers), you’d think it’d be optimally fast – I mean, how can the above example get any better, really?
Well, by doing this:
var previous = UIntXL(0)
var current = UIntXL(1)
print(previous)
print(current)
while true {
previous += current
swap(&previous, ¤t)
print(next)
}
Sometimes this version is significantly faster (e.g. 30% in a recent example, and I’ve seen up to 14x in some of my own, similar cases).
Admittedly, sometimes it’s merely as fast – sometimes the Swift compiler optimises the first version into the second for us. But the compiler’s optimiser is unreliable and unpredictable. So if performance is important to you, it’s best to use swap
explicitly rather than hang your hopes on the compiler.
What does swap
do?
Its purpose is pretty obvious – it swaps the contents of two variables. Logically it’s equivalent to1:
func swap(_ a: inout T, _ b: inout T) {
let tmp = a
a = b
b = tmp
}
But that’s not how it’s actually implemented. The implementation starts here, though the pertinent details are hidden behind compiler built-ins. Suffice to know that it boils down to simply swapping the raw bytes.
Why is that better?
It avoids unnecessary work: temporary object initialisation and deinitialisation, memory allocation (and frees), reference-counting, etc. That can be from the more efficient swap itself, and/or from mutating one of the values in place rather than creating a temporary third value.
It can also prevent unnecessary copy-on-write behaviour, if you’re mixing the swap in amongst other mutations on value types that implement reference semantics2 (like the standard Array
, Set
, etc). That usually saves time but can also, in some relatively rare circumstances, save memory too (by not leaving duplicate copies of internal buffers in memory indefinitely).
It also works for ~Copyable
types without any extra effort.
Is it always optimal?
For actually swapping two values in RAM, yes.
Well, maybe. Mostly? It turns out – after I initially published this post – that it’s a bit more complicated than that. Sometimes an alternative method is slightly faster: the “tuple swap”. e.g.:
var previous = UIntXL(0)
var current = UIntXL(1)
print(previous)
print(current)
while true {
previous += current
(previous, current) = (current, previous)
print(next)
}
In some benchmarks that’s faster, in some it’s slower3. It seems to always be faster than using a named temporary, just like swap
, although I don’t believe that’s guaranteed behaviour. Furthermore, it relies on the compiler successfully recognising that particular expression of the swap intent and optimising it appropriately. Plus, using swap
is shorter, clearer, and less error-prone. So I think an explicit call to swap
is still the best option.
In any case, there are sometimes faster methods which eliminate the need to actually swap the bytes around. e.g. using a pointer to toggle between the two values. To my knowledge, Swift doesn’t provide any way to actually do that for value types, since Swift tries to pretend that pointers don’t exist for them. 😔
Hypothetically the optimiser could perform that optimisation for any code like the above examples, and others, although I’ve never seen it do that.
☝️ swap
has a cousin, swapAt
, on MutableCollection
s. It’s even less known than swap
, but it’s potentially even more frequently useful in general code. Its purpose is to likewise ensure that swapping two elements within the same collection is as efficient as possible, irrespective of how the compiler is feeling that day.
But, bizarrely, its implementation does not use swap
or otherwise contain the same guaranteed optimisations. So it might not be the fastest way to swap two elements. 😕
It might be an unfortunate consequence of Swift’s exclusive access checking – specifically its lack of support for non-overlapping access within a collection – because if you do try to use swap
yourself on the elements of a collection, the compiler refuses it:
❌ Overlapping accesses to 'x', but modification requires exclusive access; consider calling MutableCollection.swapAt(_:_:)
- Ignoring support for ~Copyable types, which can be done but requires a more complicated implementation than we need to worry about here. ↩︎
- Many value types in Swift are not actually just simple sequences of bytes. They often contain pointers to potentially-shared reference objects (e.g. classes or actors), or raw memory buffers. When you copy the value – which includes simply assigning it to a new variable – there can be a lot of work involved in incrementing reference counts for these now-shared internal objects & buffers (which also means time spent decrementing those counts some time later).
And that’s assuming a type that uses copy-on-write (or similar) optimisations – otherwise, you’re going to have to copy all those internal memory buffers as well, which can be tremendously expensive.
Even mere reference counting operations are actually pretty expensive – tens to hundreds of instructions each, typically. Usually in Swift we don’t notice them because the optimiser works very hard to actually remove them, as much as possible. ↩︎ - I put together some benchmarks in researching & writing this post, with the intent of using them as specific examples of the performance differences. Unfortunately they are apparently too simple, and the optimiser is generally able to optimise the named temporary method into using
swap
or the “tuple swap” method (even though in real-world code it usually fails to do so, in my experience). ↩︎