It was actually in a third-party image viewer sub-component of our application.
We found that for 2-3 users of our application, the image viewer component would frequently throw an exception and die horribly. However, we had dozens of other users who never saw the issue despite using the application for the same task for most of the work day. There was also one user in particular who got it a lot more frequently than the rest.
We tried the usual steps:
(1) Had them switch computers with another user who never had the problem to rule out the computer/configuration. - The problem followed them.
(2) Had them log into the application and work as a user that never saw the problem. - The problem STILL followed them.
(3) Had the user report which image they were viewing and set up a test harness to repeat viewing that image thousands of times in quick succession. The problem did not present itself in the harness.
(4) Had a developer sit with the users and watch them all day. He saw the errors, but didn't notice the users doing anything out of the ordinary to cause them.
We struggled with this for weeks trying to figure out what the "Error Users" had in common that the other users didn't. I have no idea how, but the developer in step (4) had a eureka moment on the drive in to work one day worthy of Encyclopedia Brown.
He realized that all the "Error Users" were left-handed, and confirmed this fact. Only left-handed users got the errors, never righties. But how could being left-handed cause a bug?
We had him sit down and watch the left-handers again specifically paying attention to anything they might be doing differently, and that's how we found it.
It turned out that the bug only happened if you moved the mouse to the rightmost column of pixels in the image viewer while it was loading a new image (an overflow error, because the vendor had an off-by-one calculation in the mouseover event).
Apparently, while waiting for the next image to load, these users all naturally moved their hand (and thus the mouse) towards the keyboard. For a left-hander the mouse sits to the left of the keyboard, so that meant moving the cursor to the right.
The one user who got the error most frequently was one of those ADD types who compulsively moved her mouse around while impatiently waiting for the next page to load. She was moving the mouse to the right much more quickly, so she hit the timing just right and reached the edge when the load event happened. Until we got a fix from the vendor, we told her to let go of the mouse after clicking (next document) and not touch it until the image had loaded.
It was henceforth known in legend on the dev team as "The Left-Handed Bug".
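We never saw the vendor's source, so the following is only a guess at the general shape of such an off-by-one mistake, not their actual code. It assumes a hypothetical toolkit that reports the cursor as a 1-based pixel column; the function names and the `columns` list are made up for illustration.

```python
# Hypothetical reconstruction; the vendor's actual code was never available.
# Assumption: the toolkit reports the cursor as a 1-based pixel column (1..width).

def column_at_buggy(pixel_column, columns):
    # Bug: uses the 1-based position directly as a 0-based index, so the
    # rightmost column (pixel_column == len(columns)) reads past the end.
    return columns[pixel_column]


def column_at_fixed(pixel_column, columns):
    # Convert to a 0-based index and reject anything outside the image.
    index = pixel_column - 1
    if 0 <= index < len(columns):
        return columns[index]
    return None


columns = ["col0", "col1", "col2", "col3"]  # a 4-pixel-wide image

print(column_at_fixed(4, columns))   # rightmost column is handled
try:
    column_at_buggy(4, columns)      # rightmost column overflows
except IndexError as exc:
    print("buggy handler failed:", exc)
```

In a sketch like this, every column except the last one works (it is merely shifted by a pixel, which nobody notices), which would fit a bug that only shows up for users whose cursor ends up on the right edge.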
This depends wildly on the situation, the bug, the customer, and the company. There is always a trade-off to consider between correcting the implementation and potentially introducing new bugs.
If I were to give a general guideline for deciding what to do, I think it'd go something like this:
- Log the defect in your tracking system of choice. Discuss with management/coworkers if needed.
- If it's a defect with potentially dire consequences (e.g. your example #2), run, scream, jump up and down till someone with authority notices and determines an appropriate course of action that will mitigate the risks associated with the bug fix. This may push your release date back, save lives, wash your windows, etc.
- If it's a non-breaking defect, or a workaround exists, evaluate whether the risk of fixing it outweighs the benefit of the fix. In some situations it'll be better to wait for the customer to bring it up, since then you know you aren't spending time fixing/retesting things when it's not 100% required.
Mind you, this only applies when you're close to a release. If you're in full development mode, I'd just log the defect so it can be tracked, fix it, and call it done. If it's something that takes more than, say, half an hour to fix and verify, I'd go to the manager/team lead and see whether or not the defect should be fit into the current release cycle or scheduled for a later time.
Manually written values instead of constants
Example:
and thousands of uses of 1, 8010, 8011 and 8096 in other places. Try to imagine what happens if the default district is now 2 and 8011 has moved to 8012.
Fix:
and use these constants everywhere you need to determine the default district id and/or other static values.
Or even:
to get actual values from db. But this is just an example.
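The original snippets are not reproduced here, so as a rough sketch of the idea: the raw values 1, 8010, 8011 and 8096 come from the answer, and 1 is the default district id. Everything else is hypothetical, including the constant names `DISTRICT_A_ID`, `DISTRICT_B_ID`, `DISTRICT_C_ID`, the helper functions, and the `settings` table.

```python
import sqlite3

# Named constants defined once. Only DEFAULT_DISTRICT_ID has a meaning given
# in the text; the other three names are placeholders for illustration.
DEFAULT_DISTRICT_ID = 1
DISTRICT_A_ID = 8010
DISTRICT_B_ID = 8011
DISTRICT_C_ID = 8096


def is_default_district(district_id):
    # Instead of writing "district_id == 1" in thousands of places.
    return district_id == DEFAULT_DISTRICT_ID


def load_setting(conn, name):
    # "Or even" variant: read the current value from the database, assuming
    # a hypothetical table settings(name TEXT PRIMARY KEY, value INTEGER).
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]


# Usage with an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO settings VALUES ('default_district_id', 2)")

print(is_default_district(1))
print(load_setting(conn, "default_district_id"))
```

With either variant, moving the default district from 1 to 2 becomes a change in one place rather than a search through every occurrence of a bare `1`.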