A Hard Drill Makes an Easy Battle

I can’t say enough nice things about VMWare. This program has been amazingly helpful during the last few weeks as we tried to get CityDesk to work on every known version of 32-bit Windows. I have set up dozens of virtual machines, everything from a simple DOS partition (helpful as a starting point for installing other OSs), a bunch of combinations of NT 4.0, Chinese and Hebrew Win2K (even though our program is in English and doesn’t do anything fancy, we had various bugs that were revealed on these systems), assorted versions of Win 95/98/Me going all the way back to the August, 1995 release, even a small network of machines with a primary domain controller which we used for testing FogBUGZ setup.

VMWare

Getting code to work on the entire universe of Windows machines is a lot of work. That’s the real appeal of “write once, run anywhere” systems like Java. In theory, if you use the Java Virtual Machine, the burden is on the VM vendor to provide compatibility with all these platforms. In reality, as Java programmers learned, code is just too fragile for this to work very well. When I developed a game with Java I learned that Java’s inability to guarantee exactly when threads would run (a seemingly harmless concession to the fact that CPU scheduling is basically unpredictable) actually meant that on the Macintosh, some threads got starved, um, forever, basically, until the other threads tried to do i/o in fact, which is not what I had assumed and made my game not very challenging on Macs. (This was in 1996. Don’t email me with workarounds, fixes, or to say that this bug has been fixed.)

Yesterday’s Bug Du Jour is an example of the kind of thing that trips people up. Michael was allocating some memory using the ancient Windows API GlobalAlloc. Later, he was calling a function GlobalSize to determine the size of that memory. On our development systems (Windows 2000) GlobalSize returns the same value as the block you allocated. Allocate 13 bytes, and GlobalSize will return 13.

We got a bug report that “Copy and Paste don’t work” from a user with Windows 98. As you see from the screenshot above, I’ve got a Windows Me VM set up with VB6. Stepping through the code in the debugger I noticed the GlobalSize call, and remembered from the days of Win 95 that GlobalSize used to return the size of the actual allocated block, which was larger than you allocated and usually a multiple of 64. This led to the bug.

Now, the programmer at Microsoft who changed the behavior of GlobalSize probably thought he wasn’t breaking anything. The documentation for the function “GlobalSize” says clearly, “the size of a memory block may be larger than the size requested when the memory was allocated.” In fact the Microsoftee probably thought it was a minor and harmless improvement to GlobalSize to have it always return the size that you requested. Obviously any old code, that doesn’t trust the return value from GlobalSize, will continue to work. So why not improve the function?

Not every programmer studies every line of the documentation for every function they use, and if the code is working, they tend to move on to something else. And it’s not like everything is documented all that well — the type of minutiae that I’m talking about here rarely gets discussed in documentation. And this is where you get these issues. The rest of the world discovered this problem, long the bane of WINE programmers, when the second web browser shipped, and suddenly everybody was noticing how the bugs they had counted on to make their pages look kewl weren’t there any more. As we speak zillions of HTMLers are bemoaning the fact that IE6 now adheres to the standard for what <CENTER> does to the text inside a table, and so their pages that looked left-justified look like wedding invitations in IE6.

How did we get into this mess? Larry Wall famously said, “People understand instinctively that the best way for computer programs to communicate with each other is for each of the them to be strict in what they emit, and liberal in what they accept.” I think that the evolution of HTML has proven that this isn’t such a great idea. In fact, the stricter the API is about its input, the more likely the code is going to work in funny situations. The designers of Java got it right when they decided that nothing about the Java spec should leave any choice to the compiler developers (at least, not in the gratuitous way that C did, where the size of basic data types was not fixed). A better quote comes from Russian Field Marshal Suvorov: “A hard drill makes an easy battle.” You want your compiler and your development environment to be as strict as possible; you want it to literally generate random return values for GlobalSize so that you don’t get into the habit of counting on something that won’t be there everywhere; you want to use French international settings on Chinese Windows 2000 with an absurd color scheme, DVORAK keyboard, trackball, 640×480 VGA mode, and huge ugly fonts on your development system so that you remember to bake in the code that adjusts for all these things. Then your application will be buff and strong and it will laugh in the face of wimpy problems like people who use commas instead of dots as the decimal. Ha. I eat commas for breakfast, your code will say, with a Russian accent.

Anyway, this is what it takes to get software that works on hundreds of millions of computers. Those of you who develop apps that only have to run on one computer or in a controlled environment have it easy, but you’re getting flabby. One of these days you’ll need to get it to run on a second computer and you’ll need to pull an all nighter, installing a complete development environment on that computer and debugging for two hours, before you discover that you didn’t account for the possibility of spaces in the installation file path because the first computer didn’t have them. 

Hopefully in the future the concept of a Virtual Machine, whether it’s Java, .NET, or something else, will alleviate this pain, but we ain’t there yet. For now I’m happy that with ten minutes of debugging, I can make my app work for all the people who like pink on orange text and set up Windows accordingly. We spent probably 3 weeks of work, out of a 1 year development cycle, fixing configuration bugs. A small price to pay to increase the size of our potential customer base from just US Windows 2000 to the entire universe of NT 4, 95, 98, Me, and XP, worldwide. Cool.

About the author.

In 2000 I co-founded Fog Creek Software, where we created lots of cool things like the FogBugz bug tracker, Trello, and Glitch. I also worked with Jeff Atwood to create Stack Overflow and served as CEO of Stack Overflow from 2010-2019. Today I serve as the chairman of the board for Stack Overflow, Glitch, and HASH.