Production Bugs – Understanding Problems When Things Break in Production


Scenario:

  • You push to production.
  • The push breaks multiple things.
  • The same build did not break anything in QA or dev.
  • As a developer, you don't have production access.
  • There is a lot of pressure from above to get things working again.

Specifics:

  • An API-driven PHP MVC application built on Zend.
  • Deployed across a few servers.

My question:

While investigating, let's say I have a hunch about what is wrong, but I don't know for sure, and of course I can't test things in production. If I have a suggested fix based on that hunch, would it be wise to apply it and see whether it works before understanding what the problem is?

Best Answer

Grab as much information about the problem as you can (log files, etc.) and then roll back the production servers to a working state. That's a pain from the developer's point of view, of course, but it is most likely a given.
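Once the logs are grabbed, it helps to boil them down to something you can reason about. The PHP sketch below counts distinct error messages in a copied log file so the most frequent failures stand out. It assumes PHP's default `error_log` line format (`[timestamp] PHP Warning:  message in file on line N`); `summarizePhpErrorLog` is a hypothetical helper, not part of PHP or Zend.

```php
<?php
// Hypothetical helper: count distinct PHP error messages in a copied log file.
// Assumes PHP's default error_log format, e.g.
// "[22-Jan-2024 10:00:00 UTC] PHP Warning:  message in /path/File.php on line 12".
function summarizePhpErrorLog(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open log file: $path");
    }

    $counts = [];
    while (($line = fgets($handle)) !== false) {
        // Capture severity and message; the timestamp is dropped so that the
        // same error from many requests groups into one entry.
        if (preg_match('/^\[[^\]]+\] (PHP [A-Za-z ]+?):\s+(.*)$/', rtrim($line), $m)) {
            $key = $m[1] . ': ' . $m[2];
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    fclose($handle);

    arsort($counts); // most frequent first
    return $counts;
}
```

For example, `print_r(summarizePhpErrorLog('/tmp/prod-php-error.log'));` on a log copied from one of the servers (the path is illustrative). Lines that don't match the assumed format are simply skipped, so adjust the pattern if your servers log differently.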

Next, see whether you can reproduce the problem in a development environment. If you can, fix it and try releasing again.

If you can't reproduce it, see whether you can add more diagnostics and release to a single server for a short time to gather more information about the problem.
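If the framework's own error output doesn't tell you enough, one hedged option for that diagnostic release is a few plain-PHP handlers registered early in the bootstrap (for example, in the front controller) that append the host and request details to every error. This is a sketch, not a Zend feature: `registerProductionDiagnostics` and the log line format are assumptions, and Zend's MVC layer may already catch many exceptions itself, in which case the exception handler only sees what escapes it.

```php
<?php
// Hypothetical diagnostics hook: call once, early in the bootstrap,
// on the single server chosen for the trial release.
function registerProductionDiagnostics(string $logFile): void
{
    // Host and request details help tell which server and which API call failed.
    $context = static function (): string {
        return sprintf(
            'host=%s uri=%s method=%s',
            gethostname(),
            $_SERVER['REQUEST_URI'] ?? '-',
            $_SERVER['REQUEST_METHOD'] ?? 'CLI'
        );
    };

    // error_log() with message_type 3 appends to the file; it adds no newline itself.
    $write = static function (string $message) use ($logFile, $context): void {
        error_log(date('c') . ' ' . $context() . ' ' . $message . PHP_EOL, 3, $logFile);
    };

    set_error_handler(static function (int $errno, string $errstr, string $errfile, int $errline) use ($write): bool {
        $write("error[$errno] $errstr at $errfile:$errline");
        return false; // let PHP's normal error handling run as well
    });

    set_exception_handler(static function (Throwable $e) use ($write): void {
        $write('uncaught ' . get_class($e) . ': ' . $e->getMessage() . ' at ' . $e->getFile() . ':' . $e->getLine());
    });

    // Fatal errors bypass set_error_handler, so inspect the last error at shutdown.
    register_shutdown_function(static function () use ($write): void {
        $last = error_get_last();
        if ($last !== null && in_array($last['type'], [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR], true)) {
            $write("fatal {$last['message']} at {$last['file']}:{$last['line']}");
        }
    });
}
```

On that one server you would call something like `registerProductionDiagnostics('/var/log/app/diagnostics.log');` (hypothetical path) before the application dispatches, then remove it once you have what you need. Returning `false` from the error handler keeps the existing logging intact, so the extra lines supplement rather than replace it.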

If that's not possible, look more closely at the differences between production and the dev/QA environments, and try to make a dev environment that matches production more closely.
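For the PHP runtime itself, one way to surface those differences is to capture a fingerprint of each environment and diff them. The sketch below assumes someone with production access can run it for you; `environmentFingerprint` and `diffFingerprints` are hypothetical helpers. Run it under the same SAPI the application uses (e.g. PHP-FPM or the web server module rather than the CLI), because the CLI often loads a different `php.ini`. It only covers the PHP version, loaded extensions and `ini` settings, not application config or other server software.

```php
<?php
// Hypothetical helpers: capture parts of the PHP runtime that commonly differ
// between environments, then list the differences between two captures.
function environmentFingerprint(): array
{
    $extensions = get_loaded_extensions();
    sort($extensions);

    $ini = ini_get_all(null, false); // directive name => current value
    ksort($ini);

    return [
        'php_version' => PHP_VERSION,
        'os' => PHP_OS_FAMILY,
        'sapi' => PHP_SAPI,
        'extensions' => $extensions,
        'ini' => $ini,
    ];
}

function diffFingerprints(array $dev, array $prod): array
{
    $diffs = [];

    // Scalar facts about the runtime.
    foreach (['php_version', 'os', 'sapi'] as $key) {
        if ($dev[$key] !== $prod[$key]) {
            $diffs[] = "$key: dev={$dev[$key]} prod={$prod[$key]}";
        }
    }

    // Extensions loaded on only one side.
    foreach (array_diff($prod['extensions'], $dev['extensions']) as $ext) {
        $diffs[] = "extension only in prod: $ext";
    }
    foreach (array_diff($dev['extensions'], $prod['extensions']) as $ext) {
        $diffs[] = "extension only in dev: $ext";
    }

    // ini directives whose values differ or exist on only one side.
    $names = array_unique(array_merge(array_keys($dev['ini']), array_keys($prod['ini'])));
    sort($names);
    foreach ($names as $name) {
        $devValue = $dev['ini'][$name] ?? '(unset)';
        $prodValue = $prod['ini'][$name] ?? '(unset)';
        if ($devValue !== $prodValue) {
            $diffs[] = "ini $name: dev=$devValue prod=$prodValue";
        }
    }

    return $diffs;
}
```

For example, have each environment output `json_encode(environmentFingerprint(), JSON_PRETTY_PRINT)`, load both files with `json_decode($json, true)`, and pass them to `diffFingerprints()`. Each line of the result is a candidate explanation for why the same build behaves differently in production.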
