Friday, May 16, 2008

I am digging this learning thing

One of the mixed blessings of Oracle is that there is always something you can learn, no matter how good you are, or how much experience you have in the field. And I still have a lot to learn.

This week I had the unique pleasure of speaking directly with an Oracle Developer about an SR I had file in regards to query. In fact, this query is what lead me down the path of last week's discovery. We have a 3rd party ERP (Banner from SunGard), and one of the tables is GJBCOLR, a "collector" table for various jobs. My understanding is that it used like a scratch table, and jobs will routinely insert rows, work on them, then delete them. Sound like a TEMPORARY table at all? I am told that SunGard needs this to be a permanent table for some reason, and I am still trying to hunt down that reason. But I digress.

This table routinely has a high water mark. In the SR I filed for Oracle, we observed that the stats pegged it with 21 million consisten reads (HHWM), even though it really had about 290 rows. Sure, we can drop the HHWM (ie, truncate table, alter table shrink, etc) as explain in my previous post, but the table will just grow again due to how Banner uses it. So my next thought was to delete the stats and force dynamic sampling. To my horror, the CBO was able to find a low cost FTS, favored over a Index access by a factor of 10. According to my developer buddy, he highly suspects this is due to a bug in dynamic sampling, since the 10053 trace shows the dynamic stats put the number of blocks at an incredibly low 95. We are talking about OSEE 10.2.0.2, and he found a bug fix in 10.2.0.3 that is only slightly related but he firmly believes it probably affects the same thing. One of the workarounds I wanted to run past the developer is faking the stats, putting them enormously high so that indexes are always favored, and he tentatively agreed. So hopefully that will solve our problem.

However, realizing what a rare opportunity I had to be talking to a developer, I asked him a few questions about the optimizer and the particular query we were looking at. I am sure a month from now I will be smacking my head about not asking him other questions, but hey.... =)

The query plan looks like this:

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 210K| 50M| | 14353 (6)|
| 1 | SORT ORDER BY | | 210K| 50M| 106M| 14353 (6)|
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 210K| 50M| | 2412 (18)|
| 4 | TABLE ACCESS FULL | FTVDTYP | 54 | 378 | | 3 (0)|
| 5 | TABLE ACCESS FULL | FFBMAST | 210K| 49M| | 2387 (17)|
|* 6 | TABLE ACCESS FULL | GJBCOLR | 1 | 50 | | 29 (4)|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter( EXISTS (SELECT /*+ */ 0 FROM "GENERAL"."GJBCOLR"
"GJBCOLR" WHERE "GJBCOLR_VALUE"=:B1 AND "GJBCOLR_ONE_UP_NO"=4549363 AND
"GJBCOLR_JOB"='FFRAGRP' AND "GJBCOLR_NUMBER"='01'))
3 - access("FFBMAST_OTHER_SRC_DTYP_NUM"="FTVDTYP_SEQ_NUM"(+))
6 - filter("GJBCOLR_VALUE"=:B1 AND "GJBCOLR_ONE_UP_NO"=4549363 AND
"GJBCOLR_JOB"='FFRAGRP' AND "GJBCOLR_NUMBER"='01')



I include the predicate information because that becomes important later on. Not knowing much about the FILTER operation other than that it works on two children, I asked him how it works. It is almost exactly like a nested loop operation. However, he specifically said the Hash Join is a source and the GJBCOLR subquery is a probe, which indicates to me a hash join, not a nested loop. But, like a nested loop, each result of the HJ child is compared to the GJBCOLR subquery, hence the subquery is executed 210K times. I asked why the HJ child is not second (as opposed to first), and he mentioned it had to do with how Oracle used part of the GJBCOLR subquery as a filter predicate. I am not entirely positive why that is the intended result, but I hope to chew on it a bit more. As you can see, the subquery was transformed into a EXISTS clause (not present in the original), and I believe that is part of the answer.

Now if only I could talk to a developer everytime I file an SR.....

2 comments:

RichardJQ said...

You'll find a number of those "temporary tables" in Banner. Especially in the finance and alumni modules. We just fake the stats.

I'll be sure and mention your name the next time I file a TAR. Maybe it will help me get past the front-line support more quickly! ;)

Charles Schultz said...

That is what we ended up doing as well. But trying to get SunGard to sign off on it is like pulling teeth.

Can you share your list of temp tables that you have faked stats for, so far?