“Works on my machine” – How to fix non-reproducible bugs

debugging, testing

Very occasionally, despite all testing efforts, I get hit with a bug report from a customer that I simply can't reproduce in the office.

[Image: "Works on my machine" syndrome]
(Apologies to Jeff for the 'borrowing' of the badge)

I have a few "tools" that I can use to try and locate and fix these, but it always feels a bit like I'm knife-and-forking it:-

  • Asking the customer for more and more context (e.g. systeminfo output)
  • Log files from our application
  • Ad-hoc tests with the customer to attempt to change the behaviour
  • Providing the customer with a new build with additional diagnostics
  • Thinking about the problem in the bath…
  • Site visit (assuming customer is somewhere warm and sunny)
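To make the "more context" and "additional diagnostics" items concrete, here is a rough sketch of a snapshot helper that a diagnostic build could run on the customer's machine. It is written in Python purely for illustration; the function names and the `APP_` environment-variable prefix are assumptions, not part of any real product.

```python
import json
import locale
import os
import platform
import sys
from datetime import datetime, timezone


def collect_environment_snapshot(env_prefixes=("APP_",)):
    """Gather basic machine context that often differs between office and customer."""
    prefixes = tuple(env_prefixes)
    return {
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version,
        "cpu_count": os.cpu_count(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "cwd": os.getcwd(),
        # Only variables with the (hypothetical) application prefix are kept,
        # so unrelated secrets do not end up in a support bundle.
        "env": {k: v for k, v in os.environ.items() if k.startswith(prefixes)},
    }


def write_snapshot(path):
    """Write the snapshot as JSON so the customer can attach it to a report."""
    snapshot = collect_environment_snapshot()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    return snapshot
```

Running the same helper on an office machine and diffing the two JSON files is one cheap way to spot what is different about the customer's setup.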

Are there set procedures, or other techniques that anyone uses, to resolve problems like this?

Best Answer

One of the attributes of good debuggers, I think, is that they always have a lot of weapons in their toolkit. They never seem to get "stuck" for too long and there is always something else for them to try. Some of the things I've been known to do:

  1. ask for memory dumps
  2. install a remote debugger on a client machine
  3. add tracing code to builds
  4. add logging code for debugging purposes
  5. add performance counters
  6. add configuration parameters to various bits of suspicious code so I can turn on and off features
  7. rewrite and refactor suspicious code
  8. try to replicate the issue locally on a different OS or machine
  9. use debugging tools such as Application Verifier
  10. use 3rd party load generation tools
  11. write simulation tools in-house for load generation when the above failed
  12. use tools like GlowCode to analyse memory leaks and performance issues
  13. reinstall the client machine from scratch
  14. get registry dumps and apply them locally
  15. use registry and file watcher tools

Eventually, I find the bug just gives up out of some kind of awe at my persistence. Or the client realises that it's probably a machine, client-side install or configuration issue.