>> Patch Thy Servers
Would you believe an entire website was brought down by the Opera browser? A friend recently reminded me of one of our most frustrating exercises from several years ago. We had done a fairly good job of setting up monitoring on our servers; the nature of our agile development process required it. The developer would make changes to the code, and we would test and roll them out quickly, before management could change their minds and request more features. So we were always ready for surprises.
We started observing random website failures. At first they seemed to happen around 3 a.m., and an IIS reset would cure them. But that did not give us the root cause of the problem. We suspected a memory leak, ran countless Perfmon traces, went over the code with the developer, and tried everything else we could think of. Because we could not reliably reproduce the problem, we were stuck. Furthermore, at first the problem only happened every week or so, which made it hard to sustain an investigation.