atoi and Trillions of Whales

27/12/2021

When the International Whaling Commission banned the hunting of blue whales in 1966, there were roughly three hundred times fewer blue whales than in the days before whaling. Perhaps that sounds like a big number to you, or perhaps it doesn't, but at the time of the ban there were fewer than one thousand blue whales in the ocean. The largest animal known to have ever existed was close to extinction.

When the numbers dip so low it's easy to see that whaling must stop. Harder to imagine is what it might have been like to be anti-whaling when pods of thousands still roamed the oceans. I expect that to whalers it must have seemed like a hopelessly petty and pessimistic viewpoint. And above all, illogical, because why would whalers destroy their own industry and livelihoods by over-fishing?

Not only that, but it must have felt like an argument not in good faith. After all, I doubt there were many whalers who actually enjoyed the fact that whales had to be killed. And while I'm sure there were some whalers who had convinced themselves in some twisted way that killing whales was actually a good thing, I believe most will have seen it as a sacrifice that was necessary for the job. After all, they had to make a living like everyone else.

But whatever they felt, the truth remains: whalers did destroy their own industry, and whaling was banned. The lies whalers told, and taught to each other, were indeed just lies in the end. And now, pods of thousands of whales are something none of us can ever hope to see in our lifetimes.

"Broad on both bows, at the distance of some two or three miles, and forming a great semicircle, embracing one half of the level horizon, a continuous chain of whale-jets were up-playing and sparkling in the noon-day air. Unlike the straight perpendicular twin-jets of the Right Whale, which, dividing at top, fall over in two branches, like the cleft drooping boughs of a willow, the single forward-slanting spout of the Sperm Whale presents a thick curled bush of white mist, continually rising and falling away to leeward.

Seen from the Pequod's deck, then, as she would rise on a high hill of the sea, this host of vapory spouts, individually curling up into the air, and beheld through a blending atmosphere of bluish haze, showed like the thousand cheerful chimneys of some dense metropolis, descried of a balmy autumnal morning, by some horseman on a height.

As marching armies approaching an unfriendly defile in the mountains, accelerate their march, all eagerness to place that perilous passage in their rear, and once more expand in comparative security upon the plain; even so did this vast fleet of whales now seem hurrying forward through the straits; gradually contracting the wings of their semicircle, and swimming on, in one solid, but still crescentic centre."

It's rare that history works like this. Usually things multiply, get faster, more powerful over time. But occasionally things reduce in quantity, speed, or power. And sometimes that reduction is by several orders of magnitude.

And while most of us can live happily with fewer whales, it's scary to imagine what the world could look like if many of the things we take for granted declined in the same way. What if travel became 10x slower, if food became 10x less nutritious, or if music became 10x less enjoyable? It's sad, too, to think of a missed future where things could have been an order of magnitude greater - a world with 10x less carbon in the atmosphere, 10x less plastic in the ocean, or 10x fewer hungry children.

Every change that produces an order of magnitude in difference is worth thinking about, seriously.


I can remember being sixteen and first learning in my computing class about the atoi function - the C function that converts a string of ASCII digits into a binary integer. At first I was confused. Why should such a function exist? After all, why would you ever want to store an integer as a string of ASCII digits? Not only was the ASCII encoding extremely inefficient as the numbers got larger, but you also wouldn't actually be able to do anything with numbers stored in this format: if you ever wanted to add, subtract, or multiply them you'd need to convert them to the binary representation anyway - so why not store the binary representation in the first place? I could see practically no advantages to ASCII storage over binary storage.
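To make that concrete, here is a rough sketch of the kind of conversion atoi performs (ignoring negative numbers, whitespace, and error handling for simplicity): the string "1234" is just a sequence of ASCII bytes, while the int that comes out the other side is the binary representation you can actually do arithmetic with.

    #include <stdio.h>
    #include <stdlib.h>

    /* A hand-rolled sketch of what atoi does for simple, non-negative input:
       walk the ASCII digits and accumulate a binary integer. */
    int ascii_to_int(const char *s) {
        int value = 0;
        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (*s - '0');  /* shift up one decimal place, add the digit */
            s++;
        }
        return value;
    }

    int main(void) {
        const char *input = "1234";         /* five bytes of ASCII: '1','2','3','4','\0' */
        int x = atoi(input);                /* the real thing, from <stdlib.h> */
        int y = ascii_to_int(input);        /* the sketch above */
        printf("%d %d %d\n", x, y, x * 2);  /* arithmetic only works on the binary form */
        return 0;
    }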

Then I realized: it wasn't about storage - atoi was for user input. Users typing numbers with a keyboard into a text box or command line had no choice but to provide the numbers they wanted in the ASCII format by typing one digit at a time. Ah! That was a relief, I thought, because it meant atoi was actually kind of a fringe function that I wouldn't have to worry about until I was building a human-facing UI, and only then if there was a situation where the user had to actually type in some numbers.

It feels odd looking back at what my understanding was then, because just in the process of downloading and displaying this web page your computer has probably called atoi (or similar functions that do the same thing) hundreds, if not thousands of times. And yet you never typed any numbers into any text boxes... strange...

Even to me now, this is still a bit unfathomable. But I do now at least know why this is the case. Everything became clear at University.

It started with a course on XML, which at first I thought was kind of cool, because XML was like HTML, and I liked web stuff. But then our lecturer started talking about schemas and once again I was lost. I understood the idea - you wanted to validate that the XML provided was of a certain structure - but it all seemed very high level and abstract. I couldn't see what problem was actually being solved.

Because in my mind there were two situations to consider: 1) XML input from an external source (such as a user) which should be treated with extreme prejudice (the structure was the least of your worries) and never be trusted for security reasons. And 2) input from a known, controlled source, which should be capable of producing XML in the correct format from the beginning.

But instead of discussing this, we instead learned how you can cleverly use XML itself to define the schema for the XML you want to process... and this was the end of that section of the course. I knew what XML was - I could type some into a text editor - but I didn't really have any idea how or why to use it, let alone parse it.

Next up was JSON, which I remember thinking was kind of like Python but without the programming language part. Which seemed odd at first, but at least the whole thing was simple, digestible, familiar, and did not seem to exist to solve a whole bunch of abstract future problems I couldn't really get a feel for.

Both XML and JSON, we were taught, were "human readable". Which sounded a bit odd - or at least obvious to me - because I'd seen examples of XML and JSON on the slides. I'd also never really been shown something that was "not human readable". Perhaps because showing something "not human readable" would in some kind of paradox render it "human readable"? I had no idea.

But once I actually started writing programs, and in particular programs which communicated with each other, it all clicked. I could get the output from one program, inspect it using a text editor, and check for any errors before passing it to the next program. This was an incredible contrast to the binary formats I'd dealt with, where opening a hex editor and trying to work out what the hell had gone wrong made you feel like you were reading the fucking matrix.

And then, one day, while trying to run one of my programs on a new Windows machine, I opened up a large JSON file in notepad.exe and the program hung. After almost a minute the file displayed, but unfortunately each time I scrolled the mouse wheel it took several seconds to update, and I couldn't get down to the section of the file I wanted to examine.

As far as I knew, I was still a human, and the file was the same as before, and yet for some reason it was unreadable.

I installed Notepad++ on the Windows machine, which, unlike notepad.exe, handled the large JSON file with ease (as long as word-wrap was disabled). Unfortunately I could not deny I had encountered a case where "human readability" did not seem to depend on the file format (or my own humanity). And although Notepad++ had saved me in this situation, I realized that having the right program installed was not entirely the key either. "Human readable" files were not readable to all humans. Because I'm pretty sure if I sent my mum (who is a human by the way) a .docx file and a .json file, she would tell me the .docx file was the one which was "human readable". Notepad++ installed or not.

Of course this was never really what "human readable" meant. "Human readable" meant a format which, in most contexts, most people already have a program installed on their computer which they can use to read it.

As programmers we seem to have a feeling that text editors display, and allow us to edit, text as-is. But as soon as we consider something as basic as the handling of line-breaks or fonts, we quickly see that not only is this not true at all, but even the concept of editing data "as-is" doesn't really make any sense. All data must be displayed in some way - no data can be edited "as-is".

And if you think about it, a text editor is a large, complicated program, more similar to a custom data editing application like Microsoft Word than programmers (who perhaps like the idea of being smarter than office workers) would ever like to admit.

"Human readability" is not at all a property of a format itself - it's solely a property of the context in which some data exists.

This is obvious if we perform a little thought experiment: imagine a world where everyone has a nice GUI BSON editor installed on their machine.

While such an editor would store data in a binary format, there is no reason why it could not render the data in an editable text window, just like a normal text editor. The difference here would be that this editor could validate the input you gave it when you pressed save, checking if the syntax of what you entered is correct.

There would be some interesting advantages to doing things this way: how the data is displayed could be customized without embedding the formatting in the file itself (uniting the nations of both tabs and spaces), syntax highlighting would be trivial, and the editor could provide powerful, direct tools for manipulating the structure of the data.
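To make the comparison concrete, here is roughly how the tiny document {"n": 1234} looks in each format, with the BSON bytes hand-encoded from the layout described in the BSON specification (so treat them as illustrative rather than gospel). Both are just bytes on disk - the only question is which program you happen to have that can display them nicely.

    #include <stdio.h>

    int main(void) {
        /* The JSON encoding of {"n": 1234}: eleven bytes of ASCII text. */
        const char *json = "{\"n\": 1234}";

        /* A hand-encoded BSON document for the same data, following the layout
           in the BSON spec: an int32 total length, one int32 element named "n"
           holding 1234 (little-endian), then a terminating zero byte. */
        const unsigned char bson[12] = {
            0x0C, 0x00, 0x00, 0x00,   /* document length = 12 bytes      */
            0x10, 'n',  0x00,         /* element type int32, name "n"    */
            0xD2, 0x04, 0x00, 0x00,   /* 1234 as a little-endian int32   */
            0x00                      /* end of document                 */
        };

        printf("json: %s\n", json);
        printf("bson:");
        for (int i = 0; i < 12; i++) { printf(" %02X", bson[i]); }
        printf("\n");
        return 0;
    }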

Then, going back to my debugging problems at University: with this editor, debugging BSON wouldn't really be very different to debugging JSON. In fact it might be better. So debugging too is all a question of context.

Another way to view these two different worlds is in respect to how atoi is used.

This theoretical BSON editor would use atoi in the same way I imagined in my sixteen year old brain. Because the text box in this BSON editor is essentially a very fancy version of the text box I imagined for use with atoi originally - and when pressing save it would do exactly as I imagined - parse the ASCII input, convert it all into a binary representation, and dump that in memory or to a file.

On the other hand, when using JSON and a normal text editor the situation is flipped on its head. While the text editor itself does not need to use atoi, the programs at either end need to use atoi (and itoa) to actually read and write the ASCII integers.
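A rough sketch of that round trip, using snprintf and strtol as stand-ins for itoa and atoi (itoa isn't actually part of standard C), looks something like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* The "sender": it already has the integer in binary form, but the
           text-based format demands ASCII digits, so out they go. */
        int internal = 365;
        char wire[32];
        snprintf(wire, sizeof(wire), "{\"count\": %d}", internal);  /* the itoa side */

        /* ...the string travels through a file, a socket, or a text editor... */

        /* The "receiver": it gets ASCII digits and immediately converts them
           back into the binary form it needed all along. */
        const char *digits = strchr(wire, ':') + 1;        /* skip to the digits */
        int recovered = (int)strtol(digits, NULL, 10);     /* the atoi side */

        printf("sent %d, received %d\n", internal, recovered);
        return 0;
    }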

I therefore like to think of text editors as being stand-alone text boxes. And although some editors will parse the text to do things like syntax highlighting, fundamentally when you press save what happens is the text editor writes to file what is inside the text box, without parsing, processing, or validating it.

And I think that's the thing my sixteen year old brain never would have expected - that in the world of computing there would be these huge, complex programs acting as what are essentially stand-alone text boxes. Used to prepare input for other programs, to go into their little text boxes. Or that programs would be built to communicate via text-box-like inputs and text-box-like outputs - even when they know perfectly well the structure of the data they require as input and produce as output.

Because to sixteen year old me, a text box was not a program in itself, a text box was a component of a UI.

But once you're already in this world you can kind of convince yourself it makes sense - yes, sometimes it's difficult to prepare text inputs - you can make mistakes, and need to undo, or may want to save things so you don't lose them. And sometimes text inputs do have some implicit kind of structure that needs to be followed - like writing the address on a letter. So sometimes maybe it makes sense to type JSON into a text box - and maybe because of this it sometimes makes sense to have a program acting as a stand-alone text box to edit JSON with...

At least that was what I felt, until I took a University course on HTTP, and realized that HTTP was a text-based communication protocol. It was then that I realized quite the extent to which the vast, vast, vast majority of "human readable" data is never read (or written) by humans.

And one takeaway from the course was to never write your own HTTP server - to always use a library - because rolling your own was considered such a foolhardy exercise, and existing libraries do such an excellent job at parsing, handling, and debugging HTTP responses, that there was never any need for it.

But doesn't this kind of defeat the object of the protocol being "human readable," I thought? If each response is parsed by an HTTP library and immediately converted into an internal binary representation, why not send something closer to that binary representation initially? And if there are tools provided that digest errors and convert them into easy to understand messages, then is there really ever a reason to look at an HTTP response manually? The "human readable" HTTP format seemed optimized for a use case that is so rare it's almost, by definition, impossible - the use case where responses are read more often by humans than they are by other programs.

And when no user input is involved in producing "human readable" data, each call to a function like atoi is symmetric: it means there must have been a corresponding call to a function like itoa somewhere else. Just as each time the string User-Agent: is read, it means somewhere else another program must have written the string User-Agent:. The answer to my previous mystery about why atoi is used so much when downloading a web page is depressingly simple: there are so many calls to atoi when a client receives a web page because the server has performed so many calls to itoa when sending it.
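For example, even a single response carries numbers like the status code and the Content-Length header as ASCII digits, written out by the server and parsed straight back into integers on arrival. A rough sketch of what happens somewhere inside every HTTP client:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* A fragment of a typical HTTP response, exactly as it arrives over the
           wire: every number in it is ASCII digits written out by the server. */
        const char *response =
            "HTTP/1.1 200 OK\r\n"
            "Content-Length: 1354\r\n"
            "\r\n";

        /* The status code and the body length both get converted straight
           back into binary integers before they are of any use. */
        int status = atoi(response + strlen("HTTP/1.1 "));
        const char *header = strstr(response, "Content-Length:");
        long content_length = strtol(header + strlen("Content-Length:"), NULL, 10);

        printf("status %d, expecting %ld bytes of body\n", status, content_length);
        return 0;
    }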

I believe all programmers should take a few minutes to consider the unfathomably large number of times the itoa function has been called simply for the result to be passed to the atoi function later on. Or to think of the number of times a JSON object has been converted from an internal binary representation and written out as ASCII, simply to be converted back again into binary a few moments later by another computer. Or to think about the number of times the very same JavaScript code must have been parsed, tokenized, and compiled again and again by all of the billions of devices across the world.

"So what?" Many will think: "Computers are fast. Get it to work first. Developer time is expensive. Avoid premature optimization."

Then "human readability", and HTTP in particular, is perhaps one of the worst instances of premature optimization of all time, because if HTTP is optimized for human readability it means it's prematurely optimized for a use-case that happens probably fewer than one in a million times.

What is the cost of such "human readability"? I put together a little benchmark of the following: 1) writing 1,000,000 random integers between 0 and 1000 to a file using itoa, and then reading them back again using atoi; and 2) writing 1,000,000 random integers between 0 and 1000 to a file as binary and reading them back again.
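A minimal sketch of that kind of benchmark in C looks something like the following - my actual test differed in the details (here fprintf and fscanf stand in for the itoa and atoi conversions, and the file names are just placeholders), and the timings will of course vary by machine, compiler, and disk:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define COUNT 1000000

    static int values[COUNT];
    static int readback[COUNT];

    int main(void) {
        for (int i = 0; i < COUNT; i++) { values[i] = rand() % 1001; }

        /* 1) ASCII: write each integer out as text, then parse it back again. */
        clock_t start = clock();
        FILE *f = fopen("numbers.txt", "w");
        for (int i = 0; i < COUNT; i++) { fprintf(f, "%d\n", values[i]); }
        fclose(f);
        f = fopen("numbers.txt", "r");
        for (int i = 0; i < COUNT; i++) { fscanf(f, "%d", &readback[i]); }
        fclose(f);
        printf("ascii:  %.3fs\n", (double)(clock() - start) / CLOCKS_PER_SEC);

        /* 2) Binary: dump the raw bytes of the whole array, then reload them. */
        start = clock();
        f = fopen("numbers.bin", "wb");
        fwrite(values, sizeof(int), COUNT, f);
        fclose(f);
        f = fopen("numbers.bin", "rb");
        fread(readback, sizeof(int), COUNT, f);
        fclose(f);
        printf("binary: %.3fs\n", (double)(clock() - start) / CLOCKS_PER_SEC);

        return 0;
    }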

Writing and reading the ASCII file took 0.143 seconds. Writing and reading the binary file took 0.003 seconds. That constitutes roughly a 40 times speedup.

Now the specific numbers don't really matter. Nor does it really matter if you store data as binary or ASCII. What matters is the extent to which we all fool ourselves into accepting (and teaching others) things that don't make sense, simply because they are convenient for us.

I don't need to look far for examples because we are constantly surrounded by them, and many of them are far worse than my 40x toy example. There is a well known piece of software made by a very large corporation which I use almost daily for my research. It's an open source project which I'm hugely grateful for, and I have no doubt that the authors are talented, passionate, and clever people. But if I open one of the (binary) files it uses to store data in a hex editor I can instantly see why their software has performance problems: a file I have here contains the same string repeated almost 500,000 times.

Now I have no idea of the details or complexities of how this software is made and I can only guess as to why this might be the case. I cannot say if there is an easy fix or not. All I can say is that I'm using the software as intended, in the usual way, and that from my understanding of what it does and how it works, I cannot see any reasonable explanation as to why it would need to do this.

I don't know the developers but I would bet if they used someone else's library to build their own software, and if that library sacrificed hundreds of MB of RAM and disk space to store the same string 500,000 times, they would say it was wrong. It's just, at some point, somewhere along the line, they convinced themselves this was okay for their project, or at least - that it was a sacrifice necessary for the job. After all, they have to make a living like everyone else.

Whaling is not the same as computing. Computation is not a finite resource. But it can act like one - because when we build software on top of other software we inherit the inefficiencies. When we communicate with other software we inherit the inefficiencies too. If you call itoa, I must call atoi. My computer only has a fixed number of cycles it can perform per second, and software will always be at least as slow as its slowest part.

But perhaps the more pertinent difference is that in computing we have what is effectively trillions of whales. And this means we can, in some bizarre and twisted way, afford to waste 500,000 times more than we use without it significantly impacting our lives. But that does not mean we should. And it does not mean we should make excuses when we do, or teach each other that doing so is fine. Because those lies catch up with you, and this is not the first time humanity has run out of a resource that seemed near infinite. Anyway, it's not the total number of whales left in the ocean which determines if killing a whale is a good or a bad thing. Even a sixteen year old can see that.