Friday, May 16, 2008

I am digging this learning thing

One of the mixed blessings of Oracle is that there is always something you can learn, no matter how good you are, or how much experience you have in the field. And I still have a lot to learn.

This week I had the unique pleasure of speaking directly with an Oracle Developer about an SR I had file in regards to query. In fact, this query is what lead me down the path of last week's discovery. We have a 3rd party ERP (Banner from SunGard), and one of the tables is GJBCOLR, a "collector" table for various jobs. My understanding is that it used like a scratch table, and jobs will routinely insert rows, work on them, then delete them. Sound like a TEMPORARY table at all? I am told that SunGard needs this to be a permanent table for some reason, and I am still trying to hunt down that reason. But I digress.

This table routinely has a high water mark. In the SR I filed for Oracle, we observed that the stats pegged it with 21 million consisten reads (HHWM), even though it really had about 290 rows. Sure, we can drop the HHWM (ie, truncate table, alter table shrink, etc) as explain in my previous post, but the table will just grow again due to how Banner uses it. So my next thought was to delete the stats and force dynamic sampling. To my horror, the CBO was able to find a low cost FTS, favored over a Index access by a factor of 10. According to my developer buddy, he highly suspects this is due to a bug in dynamic sampling, since the 10053 trace shows the dynamic stats put the number of blocks at an incredibly low 95. We are talking about OSEE, and he found a bug fix in that is only slightly related but he firmly believes it probably affects the same thing. One of the workarounds I wanted to run past the developer is faking the stats, putting them enormously high so that indexes are always favored, and he tentatively agreed. So hopefully that will solve our problem.

However, realizing what a rare opportunity I had to be talking to a developer, I asked him a few questions about the optimizer and the particular query we were looking at. I am sure a month from now I will be smacking my head about not asking him other questions, but hey.... =)

The query plan looks like this:

| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
| 0 | SELECT STATEMENT | | 210K| 50M| | 14353 (6)|
| 1 | SORT ORDER BY | | 210K| 50M| 106M| 14353 (6)|
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 210K| 50M| | 2412 (18)|
| 4 | TABLE ACCESS FULL | FTVDTYP | 54 | 378 | | 3 (0)|
| 5 | TABLE ACCESS FULL | FFBMAST | 210K| 49M| | 2387 (17)|
|* 6 | TABLE ACCESS FULL | GJBCOLR | 1 | 50 | | 29 (4)|

Predicate Information (identified by operation id):

2 - filter( EXISTS (SELECT /*+ */ 0 FROM "GENERAL"."GJBCOLR"
6 - filter("GJBCOLR_VALUE"=:B1 AND "GJBCOLR_ONE_UP_NO"=4549363 AND

I include the predicate information because that becomes important later on. Not knowing much about the FILTER operation other than that it works on two children, I asked him how it works. It is almost exactly like a nested loop operation. However, he specifically said the Hash Join is a source and the GJBCOLR subquery is a probe, which indicates to me a hash join, not a nested loop. But, like a nested loop, each result of the HJ child is compared to the GJBCOLR subquery, hence the subquery is executed 210K times. I asked why the HJ child is not second (as opposed to first), and he mentioned it had to do with how Oracle used part of the GJBCOLR subquery as a filter predicate. I am not entirely positive why that is the intended result, but I hope to chew on it a bit more. As you can see, the subquery was transformed into a EXISTS clause (not present in the original), and I believe that is part of the answer.

Now if only I could talk to a developer everytime I file an SR.....

Friday, May 09, 2008

Learned a couple things this week

We had a couple examples of how a high watermark affects FTS, especially when that FTS gets executed on the wrong side of a FILTER operation. Which got me to delving into the black magic of high water marks.

First off, Oracle only mentions them in passing. If you search for LHWM (low high water mark), you will get a couple hits in metalink for a bug and nothing in the documentation. So, what the heck is a low high watermark anyway? And a high high watermark?

Apparently, after a couple lessons from K Gopalakrishna, Ric Van Dyke and couple others, the LHWM is the boundary at which all blocks below it are gauranteed to be formatted and used at least once. Above the HHWM we are gauranteed that all blocks are unformatted and never used. Between the two we have the "grey area"; namely, a mixture of blocks that have and have not been used. I believe there are two bitmaps (or two levels of bitmaps?), one that maintains the freelists for below the LWHM, and another for above the HHWM. On Full Table Scans, Oracle will scan everything below the LHWM, and then use the L2 bitmap to find blocks that need to be scanned above the LHWM.

Oracle seems to have gone to great lenghts to make these two watermarks as confusing as possible. In the documentation, and when using packages like dbms_space, we are led to believe there is only one HWM, even for an ASSM tablespace.

Thankfully, folks have discovered "dump file". Norman Dunbar gives a couple quick examples. This is good stuff; there is a lot in the dump files that would warrant a closer look. But for right now, let's look at the watermark bits of information:
alter system dump datafile 5 block min 1 block max 100;

Extent Control Header
Extent Header:: spare1: 0 spare2: 0 #extents: 1 #blocks: 8
last map 0x00000000 #maps: 0 offset: 2716
Highwater:: 0x01403030 ext#: 1 blk#: 7 ext size: 8
#blocks in seg. hdr's freelists: 0
#blocks below: 7809
mapblk 0x00000000 offset: 1
Disk Lock:: Locked by xid: 0x000a.013.000003a4
Low HighWater Mark :
Highwater:: 0x01403030 ext#: 992 blk#: 7 ext size: 8
#blocks in seg. hdr's freelists: 0
#blocks below: 7809
mapblk 0x01402aa1 offset: 177
Level 1 BMB for High HWM block: 0x01403019
Level 1 BMB for Low HWM block: 0x01403019

This is not necessarily the world's best example, but heck, it is raw data. =) I have yet to decipher what all that means, but I am working on it. I believe that the LHWM "blocks below" will always be less than the HHWM "blocks below", but the difference would count the number of blocks in that grey area.

As an aside, I also found that you can dump all the file headers with:
alter session set events 'immediate trace name file_hdrs level 10';

The level is a bit confusing, as I tried various numbers and the resulting trace seem to be exactly the same.

On the RMAN side, we came across a peculiar curiosity. With no recovery catalog, if you recover a controlfile and have archive logs beyond the time of the controlfile, we found that on OSEE, RMAN is not smart enough to roll forward if you specify an "UNTIL TIME" in the future relative to the controlfile. Instead, you will see errors that resemble messages if the backup is corrupted; RMAN will attempt to go backwards in time to find a file that matches an appropiate SCN. Of course, it will never find one. This behavior is "fixed" in OSEE (we did not try any of the intermediate patch levels), and RMAN will now roll the database forward beyond the controlfile time.