tech oriented notes to self and lessons learned
As software engineers we’re tasked with creating solutions to customer’s business problems. Being complex systems, every once in a while flaws inevitably slip in the design or implementation. And sometimes flaws creep in through use of third party software, which can can make problems all the more difficult to track down. Each bug has a story to tell and the stories about hunting the most puzzling, challenging, annoying and time-consuming bugs can sometimes live with you for a long time.
What’s the strangest bug you’ve managed to squash?
Mine was quite a few years ago when we were working on a greenfield Java EE software project for a client in the health care industry. With alpha release cycle nearing I was deploying a new build in a newly created server environment and during testing we found a bug in an isolated software feature. Initially, I thought there was something wrong with the new environment setup, but after a while I realized the bug seemed to be related with the way the new release was built. During development phase we had been building the software using Oracle JDeveloper IDE, but had moved to using Apache Ant with Sun Java JDK. So, at that point I thought – this was Java EE after all – it was a packaging issue. I carefully compared the working and broken release packages, but couldn’t find any significant differences. Though it was troublesome to reproduce, the problem was fortunately nevertheless reproducible, so I started tracking it down with remote debugging. After a while I noticed that the software was executing a weird code path I couldn’t quite explain.
Puzzled by the strange behaviour I didn’t really have a clear idea how to continue troubleshooting, but I decided to take a long shot with comparing compiled bytecode from the working and broken releases. This was my first time looking at disassembled Java bytecode, which made analysis a bit slow, and all the more interesting, but fortunately in my remote debugging sessions I had been able to identify some likely places for the bug. After staring at the disassembled bytecode for a while an initially innocuous looking bit of code started to look suspect: a mutator method was present in one class in the working build, but missing in the broken one. It turned out that when the code was built with Oracle JDeveloper it automatically generated a mutator method for a subclass, which just happened to override a buggy superclass mutator method. In the broken build such a mutator method wasn’t being generated causing the buggy superclass mutator method to execute.
This story happened years ago, and while there are lots of things I do differently nowadays, including use of different technologies, design approach, unit testing, test automation, build methods and tooling etc., for me this was one of those more memorable bug squashing sessions.
Do you have a intriguing bug hunting story to share?