Friday, November 10, 2006

Buggy bugs

This past Sunday (05-NOV-2006) we installed patch 4752541 (Intermittent PLS-306 / ORA-1722 / ORA-1858 under load) to fix a problem our webapp was having on overloaded procedure calls. We were looking golden up until Wednesday, when our Production system started spiking on library cache latch waits (with log file sync waits in there as well). Being good little Oracle DBAs, we filed a case with Oracle Support and learned a bit, but nothing really concrete. That evening we hit another major slowdown. We decided to yank out the Sunday patch because it appeared to have some relationship to the slowdowns, and we thought it best to be safe. So here we are, Friday, with no more critical slowdowns but still hitting the original webapp errors. We now have 3 SRs open with Oracle, 4 engineers working on them, two "team leads"/duty managers keeping tabs, and a cell-phone number for the Director of Oracle Support.

All this to say that Oracle is really, really complex. We still do not know for sure whether the patch caused the slowdown; all we have is circumstantial evidence. This also showcases why you get a 10gR2 patchset that is rife with bug fixes and has a footprint as large as a baseline install. 10.2.0.3 promises to be more of the same, meaning that 10.2.0.2 had little to no impact on the overall number of bugs. Granted, 10.2.0.2 did fix a large number of problems, but it looks like the bugs that slipped through the cracks plus the newly introduced ones (à la "Buggy bugs") add up to about as many as were fixed.

Just to set the record straight, I only complain and gripe about software I love.
