Swift makes it relatively easy to format numbers as byte counts, with appropriate suffixes to indicate units and generally sensible auto-selection of scale factors. e.g.:
1234.formatted(.byteCount(style: .decimal)) // "1 kB"
This is just a small subset of Swift’s `FormatStyle`-based formatting capabilities, with which I have a bit of a love-hate relationship even when they’re working correctly.
Alas, they don’t always work correctly; some of these formatters contain egregious bugs.
In particular, `ByteCountFormatStyle` pretends to support multiple numeric bases – decimal and binary – but it doesn’t, because what it renders for binary is:
1234.formatted(.byteCount(style: .binary)) // "1 kB"
Note how it still uses decimal units, “kB”. Decimal is not binary. I mean, duh, right? But apparently Apple don’t know this.
☝️ Binary prefix abbreviations are always two characters, the second always “i”. In this case, “KiB” [1]. They’re easy to remember because they use the same first letter as their decimal cousins, e.g. “Ti” & “T” for tebi and tera. Their derivation is really simple – take the first two letters of the decimal cousin, e.g. “te”, and the first two letters from “binary”, “bi” – voilà, “tebi”. To abbreviate, just drop the middle letters and use titlecase.
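For comparison, here is a minimal sketch of what correctly labelled binary output could look like. `iecByteCount` is a hypothetical helper of my own naming, not a Foundation API, and it makes simplifying assumptions: two fixed decimal places, no localisation, and no special handling of values that round up to 1024 of a unit.

```swift
import Foundation

/// Hypothetical helper (not part of Foundation): scales a byte count
/// in base 1024 and labels it with the matching IEC binary prefix.
func iecByteCount(_ bytes: Int64) -> String {
    // Same first letters as k, M, G, T, P, E, each followed by "i".
    let units = ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    var value = Double(bytes)
    var unitIndex = -1  // -1 means "plain bytes", no prefix applied yet

    // Divide by 1024 until the value fits the current unit.
    while abs(value) >= 1024 && unitIndex < units.count - 1 {
        value /= 1024
        unitIndex += 1
    }

    if unitIndex < 0 {
        return bytes == 1 ? "1 byte" : "\(bytes) bytes"
    }
    return String(format: "%.2f", value) + " " + units[unitIndex]
}

print(iecByteCount(1234))       // 1.21 KiB
print(iecByteCount(1_000_000))  // 976.56 KiB
```

The only point of the sketch is that the unit label follows the base used for the division: divide by 1024, label with “Ki”, “Mi”, and so on.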
So what?
While the above example is tolerable because, given the rounding applied, the numeric result (“1”) is technically correct in both bases, see what happens when you use larger values:
Input number | decimal | binary |
---|---|---|
1,000 | 1 kB | 1,000 bytes |
1,024 | 1 kB | 1 kB |
1,000,000 | 1 MB | 977 kB |
1,048,576 | 1 MB | 1 MB |
1,000,000,000 | 1 GB | 953.7 MB |
1,073,741,824 | 1.07 GB | 1 GB |
1,000,000,000,000 | 1 TB | 931.32 GB |
1,099,511,627,776 | 1.1 TB | 1 TB |
1,000,000,000,000,000 | 1 PB | 909.49 TB |
1,125,899,906,842,624 | 1.13 PB | 1 PB |
Once you get up to non-trivial byte counts, you start to get different results even with the heavy rounding that’s applied by default. By the time you’re talking about GBs the error is about 7%, and it reaches roughly 10% at TBs. And the error just gets proportionately larger as the input number increases. Wikipedia has a neat little table showing this.
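The growth of that error is simple arithmetic: each step up in prefix multiplies the gap between the two bases by another 1024/1000. A small sketch, in which the function name `binaryExcessPercent` is my own and purely illustrative:

```swift
import Foundation

/// How much larger (in percent) a binary unit is than the decimal unit
/// with the same first letter. `power` is 1 for k/Ki, 2 for M/Mi, etc.
func binaryExcessPercent(power: Int) -> Double {
    let ratio = pow(1024.0 / 1000.0, Double(power))
    return (ratio - 1) * 100
}

let names = ["kB vs KiB", "MB vs MiB", "GB vs GiB", "TB vs TiB", "PB vs PiB"]
for (index, name) in names.enumerated() {
    let percent = binaryExcessPercent(power: index + 1)
    print(name + ": " + String(format: "%.1f", percent) + "%")
}
// kB vs KiB: 2.4%
// MB vs MiB: 4.9%
// GB vs GiB: 7.4%
// TB vs TiB: 10.0%
// PB vs PiB: 12.6%
```

These percentages are how far off a reader will be if they take a binary-scaled number at face value because it carries a decimal label.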
Okay, but that’s just `binary`, what about `file` and `memory`?
Those are just aliases to `decimal` and `binary`, respectively. So `memory` is right out.
You might think `file` is still okay, because it essentially means `decimal`. But that’s only currently. The entire point of its existence is to allow its behaviour to change over time, following whatever convention Apple believes is most appropriate for files. In the past that was in fact `binary`, as noted previously (pre-Snow Leopard). It might be `binary` again in future. So it’s dangerous to use since you can neither know what units it’s going to use nor whether Apple will have fixed their formatters by the time it switches off of `decimal`.
But context will save us!
Probably not. Go look at file sizes in the macOS Finder. Can you tell whether they’re actual decimal (as they appear) or binary?
They’re decimal, but they used to be binary, up until Mac OS X Snow Leopard (10.6). And the Finder used the same (decimal) unit abbreviations throughout. So at least it’s using the correct units now – almost by accident – but it created a confused transition and history.
Worse and more presently pertinent, not everything Finder-like uses decimal, nor the correct units when it doesn’t, e.g. Synology’s products, not to mention countless websites. That creates a lot of confusion which bleeds across into even well-behaved software.
Point being, you can’t rely on context to help, because context includes things you can’t control, like other software. And by getting the units wrong you’re making it worse for everyone, not just yourself.
Why is this still happening?
Going back twenty years, this kind of error – confusing decimal with binary w.r.t. unit prefixes – was both the norm and somewhat excusable – binary prefixes were only formally standardised upon in 1999, so it’s only to be expected that it will take some time for everyone to adopt them.
But it’s been a quarter of a century already. There is absolutely no excuse anymore for getting this wrong.
[1] If you’re paying attention you’ll have noticed “k” vs “K”, i.e. the difference in case. This is not arbitrary – it’s because units have to be distinct to be functional, and “K” is the abbreviation for Kelvin (and apparently wins because it’s one of the seven base physical units, from which pretty much all other physical units are derived).
“Ki” is fine because it’s a distinct prefix (and “i” is not a valid abbreviation by itself so there’s no potential confusion about whether it means “Kelvin • i” or “kibi”).