If you have access to SRs, you can follow the history in SR 3-314198695. For those that cannot, here is a short summary.
We had a query that suffered severe performance degradation after upgrading from 10.2.0.4 to 11.1.0.7. I attempted to use SQLT but initially run into problems with the different versions of SQLT, so I did the next best thing and looked at the 10053 traces directly. After a bit of digging, I noticed several cases where the estimated cardinality was completely off. For example:
First K Rows: non adjusted N = 1916086.00, sq fil. factor = 1.000000
First K Rows: K = 10.00, N = 1916086.00
First K Rows: old pf = 0.1443463, new pf = 0.0000052
Access path analysis for FRRGRNL
***************************************
SINGLE TABLE ACCESS PATH (First K Rows)
Single Table Cardinality Estimation for FRRGRNL[FRRGRNL]
Table: FRRGRNL Alias: FRRGRNL
Card: Original: 10.000000 Rounded: 10 Computed: 10.00 Non Adjusted: 10.00
So, the idea behind FIRST_ROWS_K is that you want the entire query to be optimized (Jonathan Lewis would spell it with an "s") for the retrieval of the first K rows. Makes sense, sounds like a good idea. The problem I had with this initial finding is that every single rowsource was being reduced to having a cardinality of K. That is just wrong. Why is it wrong? Let's say you have a table with, um, 1916086 rows. Would you want the optimizer to pretend it has 10 rows and make it the driver of a Nested Loop? Not me. Or likewise, would you want the optimizer to think "Hey, look at that, 10 rows, I'll use an index lookup". Why would you want FIRST_ROWS_K to completely obliterate ALL your cardinalities?
I realize I am exposing some of my naivete above. Mauro, my Support Analyst corrected some of my false thinking with the following statement:
The tables are scaled under First K Rows during the different calculations (before the final join order is identified) but I cannot explain any further how / when / why.
Keep in mind that the CBO tweak -> cost -> decide (CBQT is an example)
Unfortunately we cannot discuss of the CBO algorithms / behaviora in more details, they are internal materials.
Regarding the plans yes, they are different, the "bad plan" is generated with FIRST_ROWS_10 in 11g
The "good" plan is generated in 10.2.0.4 (no matter which optimizer_mode you specify, FIRST_ROWS_10 is ignored because of the limitation) or in 11g when you disable 4887636 (that basically reverts the optimizer_mode to ALL_ROWS).
Basically the good plan has never been generated under FIRST_ROWS_10 since because of 4887636 FIRST_ROWS_10 has never been used before
I still need to wrap my head around "the limitation" in 10.2.0.4 and how we never used FIRST_ROWS_K for this particular query, but I believe that is exactly what Fix 4887636 was supposed to be addressing.
Here are some of the technical details from Bug 1185896:
]]potential performance degradation in fkr mode
]]with fix to bug4887636 enabled, if top query block
]]has single row aggregation
REDISCOVERY INFORMATION:
fkr mode, top query block contains blocking construct (i.e, single row aggregation). Plan improves with 4887636 turned off
WORKAROUND:
_fix_control='4887636:off'
I assume fkr mode is FIRST_ROWS_K, shortened to F(irst)KR(ows). The term "blocking construct" is most interesting - why would a single row aggregation be labeled as a "block construct"?
Also, this was my first introduction to turning a specific fix off. That in itself is kinda cool.