National Treasure: Book of Geekage

One of my test scripts was failing against a program that displays seat status information. The test data was in German, translated into raw text for passing by perl. The original bug report said the program puked when it hit an umlaut. It was no longer puking, but I could not tell what was happening, because the three programs I used to verify the output were all showing different things:

  • cygwin linux shell – TribA ¼ne
  • Visual SlickEdit – Trib”ne, with a note showing the Unicode code point U+00FC
  • TextPad and Outlook – Tribüne
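
The disagreement between viewers is typical of one set of bytes being decoded under different charsets. The post does not say which encoding each tool assumed, so the following is only an illustrative sketch: it shows how the UTF-8 bytes for ü turn into the two-character sequence "Ã¼" when read as latin-1, which resembles what the cygwin shell displayed.

```python
# Illustrative only: the encodings chosen here are assumptions, not facts
# taken from the tools named in the list above.
word = "Trib\u00fcne"  # "Tribüne", with u-umlaut written as U+00FC

utf8_bytes = word.encode("utf-8")      # u-umlaut becomes two bytes: 0xC3 0xBC
latin1_bytes = word.encode("latin-1")  # u-umlaut becomes one byte: 0xFC

# Decoding UTF-8 bytes as latin-1 maps 0xC3 to "Ã" and 0xBC to "¼".
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)    # b'Trib\xc3\xbcne'
print(latin1_bytes)  # b'Trib\xfcne'
print(misread)       # TribÃ¼ne
```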

And my test framework, written by another developer, was saying it was all failing as invalid JSON strings.

hmmm.  

The solution, which I understood once explained but in no way could have figured out on my own:

We assumed the string above was the source of the error. The raw data passed to our program from a perl-based client program was “Trib\374ne”. \374 is perl’s octal escape for decimal 252, which is u-umlaut in the latin-1/iso-8859-1 charset. In the previous release, where the bug was found, character 252 was not being generated at all: our perl interpreter was skipping the translation to octal entirely and simply passing “Trib\\ufffffffcne”. 0xfc is the hex equivalent of 252 decimal, and the extra ‘f’s came from an incorrect sign extension to a 32-bit longword value. Hence the bug.
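
The arithmetic above can be checked with a short Python sketch. The helper name `sign_extend_byte` is hypothetical, and the code only mimics the faulty widening described here. It is not the actual perl or framework code.

```python
import json

# Octal 374, decimal 252, and hex fc all name the same byte value.
print(0o374, 0xFC)  # 252 252


def sign_extend_byte(b: int) -> int:
    """Treat b as a signed 8-bit value and widen it to 32 bits.

    Hypothetical helper imitating the buggy sign extension.
    """
    signed = b - 0x100 if b & 0x80 else b  # 0xfc becomes -4
    return signed & 0xFFFFFFFF             # -4 as an unsigned 32-bit value


# The raw string as passed by the client: octal escape \374 in latin-1.
raw = b"Trib\374ne"
text = raw.decode("latin-1")

# What a correct JSON encoder emits for the same text.
good_json = json.dumps(text)

# What the old release produced: 0xfc widened to 0xfffffffc.
bad_escape = "\\u%x" % sign_extend_byte(0xFC)

print(good_json)   # "Trib\u00fcne"
print(bad_escape)  # \ufffffffc
```

Bytes below 0x80 pass through the helper unchanged, which is consistent with the bug only showing up on characters such as umlauts.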

So the fact that we were now getting a 252/0374/0xfc character indicated that the fix was in place and the characters were valid. The remaining bug was in the test framework, which could not parse perl’s encoding back into JSON notation.

Just another day dealing with the most complicated software suite I’ve ever heard of.   Just another reason to fear internationalizing programs.