So, last Sunday was La Trobe’s Open Day. Rob & I volunteered (or were possibly conscripted… hard to tell) to present our project on the day. I figured it’d just be a good way to earn brownie points while doing work I should be doing anyway. :D
As it turns out, because we were up on the 3rd floor we didn’t get that much traffic, since many people weren’t aware that there was anything above the ground floor. That worked well enough for me – I was able to spend most of the time coding (well… when I wasn’t chatting or hunting down free food :D ). I, er, “encouraged” :) Rob to speak with people in most cases. Mainly because I was coding, but also because he sounds more enthusiastic about the project. Plus, I was a bit embarrassed by some of the attention we were given, particularly by Song who kept dragging people in from everywhere to show them our project. I think she got a little bit excited. :)
Anyway, I had known the AVR compiler – we’re using ImageCraft 7, because Rob says so although he can’t remember why :) – would probably be pretty basic. I was partially surprised. Some things I had expected trouble with weren’t actually that bad, and other things I’d never thought about were. Like macros… it basically doesn’t support them. It has a supposedly-C89 preprocessor, supposedly ANSI-conformant, which couldn’t handle most of my macros. I’ve been forced to manually expand many of them, which of course makes them less efficient and possibly erroneous. Grrr.
I also found that there’s no special support for ## in front of __VA_ARGS__, i.e. the GNU-style trick where the preceding comma gets swallowed when no variadic arguments are given. So my logging macros – used only in a few hundred places – didn’t work, because not all of them have parameters to go with their format string. Great. I tried a lot of different workarounds, without much success. I was further hampered by the preprocessor’s inability to expand a define to “, NULL” (without the quotes). It would have been a nasty hack, but a simple one to add – just append some define, say STUPID_AVR_COMPILER, to the end of any log line which didn’t have any parameters. On a real compiler that define would expand to nothing, while on the AVR it could expand to a NULL parameter, which would satisfy the preprocessor’s need for an argument without really doing anything, aside from wasting space and execution time.
Anyway, I spent a few hours on this, at the end of which I just added a NULL parameter to all the necessary macro invocations just to be done with it. When I got home that night I finally went through and redid everything by splitting the macros into ones with arguments and ones without. That took another hour or so, but at least now it all works and doesn’t too badly violate my sense of code aesthetics.
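For the curious, the split looks roughly like the sketch below. The names (log_buf, LOG_MSG, LOG_FMT) are made up for illustration and the real macros differ. It assumes the preprocessor copes with __VA_ARGS__ as long as at least one argument is actually supplied, and it writes into a buffer with snprintf purely so it can be tried on a PC.

```c
#include <stdio.h>

/* Hypothetical names throughout: log_buf, LOG_MSG and LOG_FMT are
   illustrative, not the project's real macros. */
char log_buf[128];

/* Message with no parameters: nothing variadic, so there is no
   trailing comma for the preprocessor to trip over. */
#define LOG_MSG(msg) \
    snprintf(log_buf, sizeof log_buf, "%s:%d: %s", __FILE__, __LINE__, (msg))

/* Message with at least one parameter: __VA_ARGS__ is never empty,
   so the ## comma-swallowing extension is not needed. */
#define LOG_FMT(fmt, ...) \
    snprintf(log_buf, sizeof log_buf, "%s:%d: " fmt, __FILE__, __LINE__, __VA_ARGS__)
```

So LOG_MSG("boot") covers the parameterless case and LOG_FMT("x=%d y=%d", x, y) covers the rest. The cost is having to pick the right macro at each call site, which is tedious but at least fails loudly at compile time if you get it wrong.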
There were other minor issues – I’d forgotten to account for the fact that some of the files had never been compiled before, since they were AVR-only, so there were a few stupid typos in those… but they were trivial to fix. After a few hours, I was able to compile all the files… and found that the binary didn’t fit. It wanted over 11K of SRAM; we have of course just 8K. D’oh.
I was perplexed at first… I had no idea what was using all that SRAM; I’d chosen conservative values for the various buffer sizes, since I knew the whole shebang worked fine with tiny buffers – I designed & tested it as such – which should have left me with more than 7K of SRAM free.
After poking around for a while, I randomly tried the compiler option to store string literals in Flash only… and of course that was it… at that point we had about 10K or so of strings – nearly all of them logging messages, which by default are copied out of Flash into SRAM. Since the SRAM and Flash are accessed using separate instructions, they have [potentially] overlapping address spaces, and thus you need to know explicitly which you’re using. So now that the strings are only in Flash I have to go through and rewrite everything which uses a string to pull it from Flash first… that’ll waste space in SRAM for buffers, but it’s better than not fitting at all.
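The "pull it from Flash first" step can be sketched as below. flash_read_byte is a stand-in I've invented for whatever the toolchain actually provides for reading program memory (here it is an ordinary load so the code runs on a PC), and flash_strcpy is likewise a hypothetical helper, not something from the compiler's library.

```c
#include <stddef.h>

/* Stand-in for the toolchain's program-memory read (hypothetical).
   On the AVR this would have to use the Flash-specific load; on a PC
   it is just a normal memory access. */
char flash_read_byte(const char *flash_addr)
{
    return *flash_addr;
}

/* Copy a NUL-terminated string from "Flash" into an SRAM buffer,
   truncating if it does not fit. Always NUL-terminates when
   dst_size > 0. Returns the number of characters copied, not
   counting the terminator. */
size_t flash_strcpy(char *dst, size_t dst_size, const char *flash_src)
{
    size_t i = 0;

    if (dst_size == 0) {
        return 0;
    }
    while (i + 1 < dst_size) {
        char c = flash_read_byte(flash_src + i);
        if (c == '\0') {
            break;
        }
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

One small shared buffer reused for every string keeps the SRAM cost fixed, at the price of the copy on each use.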
Once I did that, we had our first successful compile. Unfortunately, the “utilisation” was 93%. Since there’s no explanation anywhere of what utilisation means in this context, we had to fiddle a whole lot to deduce that it refers to Flash use. In doing so I also found that there seems to be a limit in the compiler which prevents it from using the full 256K of Flash – it seemed to be indicating it was using only 64K! I’ll have to look into that at some point.
Anyway, Rob wasn’t too impressed with that. I don’t see what his problem is; surely you can write the control system for a plane in just 18K (or 9K if my hunch is correct) of Flash, right? :D That’s at most 9K (4.5K) instructions, btw, to better put that in perspective.
Clearly that wasn’t good enough. But I wasn’t too surprised; I hadn’t optimised for code size in any way, only SRAM usage and correctness (of course), so I knew there was plenty of room for improvement.
At that point we also tested the code compression & optimisation in the compiler… while I probably wouldn’t trust those anyway, given how dodgy the compiler is, they did reduce the usage to 75% or so.
The biggest single change I was able to make was by removing the logging strings. Normally each log invocation would output the file name (full path with the AVR compiler, which is annoying) and line, and the message itself along with any parameters. By reducing this to just file name and line for non-debug builds, I was able to get the utilisation down to 66% (49% compressed & optimised). I was able to squeeze out another 3% by changing the error type from int to char; since it’s an 8-bit micro where ints are 16 bit, it was having to include at least two extra instructions for every err exchange, of which there are hundreds. Impressive how such a trivial change can make such a relatively big difference. There are many other places where I’ve used ints as a generic type where I could get away with chars, which I’ll probably update some time in the future if we need further space savings.
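A minimal sketch of both changes, with invented names (err_t, log_line, LOG_EVENT, DEBUG_BUILD) standing in for the real ones. The idea is that in a non-debug build the macro never references its message argument, so the string literal should not end up in the binary at all. I've used signed char rather than plain char in the sketch because whether plain char is signed is up to the compiler.

```c
#include <stdio.h>

/* Hypothetical error type: 8 bits wide, so it fits a single register
   on an 8-bit micro instead of the pair a 16-bit int needs. */
typedef signed char err_t;

/* Hypothetical log sink; a buffer is used so this runs on a PC. */
char log_line[96];

#ifdef DEBUG_BUILD
/* Debug builds: file, line and the full message. */
#define LOG_EVENT(msg) \
    snprintf(log_line, sizeof log_line, "%s:%d: %s", __FILE__, __LINE__, (msg))
#else
/* Non-debug builds: file and line only. The msg argument is dropped
   during preprocessing, so its text is never compiled in. */
#define LOG_EVENT(msg) \
    snprintf(log_line, sizeof log_line, "%s:%d", __FILE__, __LINE__)
#endif
```

The file name and line are still enough to look the message up in the source afterwards, which is the whole trick.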
I’m disappointed though at the efficiency with which some of my macros were implemented by the compiler… a few that are used a fair bit are for reading ints (8, 16 and 32 bits) out of a character array. This really should be only a few instructions to load data from SRAM into registers – could even be done in a loop for the 32-bit integers – but the compiler is much much dumber than that… it ends up generating a good dozen or more instructions for every byte. That means reading a 32-bit integer requires about 60 instructions. I might revisit this later and see if I can’t wipe that out completely by writing my own assembly to do the load, but my first workaround was to try reducing the complexity of my macros. Many of those instructions were wasted computing offsets and whatnot, since the macro invoked an address-calculating macro once for each byte, which was expanded out literally by the preprocessor and not optimised by the compiler. So I tried a simple pointer cast and dereference, which compiles to a fixed 16 instructions or so per read, regardless of size (or scaling less than linearly, anyway).
It saved instructions, but of course I have no idea what endianness the AVR is. Well, that’s a nonsensical statement, since the AVR is an 8-bit micro and thus has no concept of endianness. I should say, what endianness the compiler has. I’m hoping it’s little endian, because although little endian does suck, it is of course what Microsoft assumed when defining FAT. There’s no way I can think of to test that short of trying it on the AVR itself, and of course it’s not in what little documentation the compiler has… grr.
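For reference, here is a rough sketch of the two things in play: byte-by-byte little-endian reads that give the same answer whatever the compiler does, and a tiny probe that reports how the compiler lays out a 16-bit value in memory. The function names are invented for illustration, and the sketch assumes the compiler ships a stdint.h (otherwise substitute its own 16- and 32-bit unsigned types). The probe still has to be run on the target to tell you anything about the target, but it is only a few lines to drop into a test build.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value (as FAT stores them) from a byte
   array, independent of the compiler's own byte order. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Same for 32 bits. Each byte is widened before shifting so the
   shift is not done in a 16-bit int on small targets. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the compiler stores the low byte of a multi-byte
   integer at the lowest address (little endian), 0 otherwise. */
int compiler_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x02;
}
```

If the probe returns 1 on the target, the pointer-cast shortcut reads FAT fields correctly. If it returns 0, the cast version needs a byte swap, or the shift-based readers have to stay.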
Anyway, 63% is good enough for now. Rob seems to think he’ll use only a fraction of that, but I’m not so convinced… we’ll see, in any case.