Monday, November 26, 2007

Managing CRS, part 1

This CRS beasty is a bit much to chew on. Maybe it is just me.

Anyway, I started asking around how to check our mount and start options for databases registered with CRS. Strangely, nobody had an answer for me. Probably lack of me asking the right question, rather than lack of knowledge. But, I did find an easy answer:

srvctl config database -d DB_NAME -a

This helped me understand why our standby RAC databases were opening in read-only mode as opposed to mounting into a recovery mode. I had been following the MAA documentation for setting up a standby, which ostensibly does not cover the mount and start options (nor role) for standby databases. Very curious. I modified our standby databases with this kind of command:

srvctl modify database -d DB_NAME -r PHYSICAL_STANDBY -s mount -y AUTOMATIC -p +DATA/db_name/spfiledb_name.ora

While these options are well documented, you have to find it first. For those of us who are new to the scene, we do not always have the faintest idea of where to look. Is it a CRS command, or an OCR command? While you can usually depend on folks in the RAC SIG and oracle-l to help out, sometimes they are just too busy (hmm... it is Thanksgiving....). Or, in a twist of Jonathan Lewis' quotes, "Sometimes, you just get unlucky."

So now that I figured out that one small piece to the puzzle, I have stumbled upon other questions. How do you get a report for all the start/mount options for all databases? I do not really want to run srvctl config for each database. What about other services, like the listener? I tried playing around with srvctl config listener, but I am not getting anything useful out of it. Especially since there is no -a flag. I am currently trying srvctl config service, but all my half-baked guesses are not getting me anywhere. I tried variations on the listner name and what I thought the service was, and I also tried the name reported by crs_stat.

The lack of comprehensive tools (like a good 'du') still bug me with the ASM as well.

Friday, November 16, 2007

Reverse mapping ASM disks

As we have been working with our sysadmin and storage folks, I often have to do some digging to find out which ASM diskgroups belong to which volume, and what devices those volumes are on. Fortunately, we only have 4 at the moment, so it is a quick dig. However, I am always disappointed that Oracle did not provide an easy way to do this. Or if they did, they did not make obvious mentions in any of their documentation.

Google showed me a great, concise script that Alejandro Vargas wrote. I enhanced it a little to go against the ASM instance to grab diskgroup information as well.

--- start ---
. oraenv

$ORACLE_HOME/bin/sqlplus -S "/ as sysdba" << EOS 2>&1 |grep [A-Z] > asmdisks.txt
set head off feed off
select a.group_number||' '||b.disk_number||' '||||' '||b.label
from v\$asm_diskgroup a, v\$asm_disk b
where a.group_number = b.group_number

printf "%-9s %-30s %-3s %-10s %-3s\n" "ASM Disk" "Device Path [MAJ,MIN]" "GRP" "Disk Group" "DSK"
/etc/init.d/oracleasm querydisk `/etc/init.d/oracleasm listdisks` | cut -f2,10,11 -d" " | perl -pe 's/"(.*)".*\[(.*), *(.*)\]/$1 $2 $3/g;' | while read v_asmdisk v_minor v_major
v_device=`ls -la /dev | grep " $v_minor, *$v_major " | awk '{print $10}'`
grp=`grep $v_asmdisk asmdisks.txt|cut -f1 -d" "`
dsk=`grep $v_asmdisk asmdisks.txt|cut -f2 -d" "`
diskgroup=`grep $v_asmdisk asmdisks.txt|cut -f3 -d" "`

printf "%-9s /dev/%-25s %-3s %-10s %-3s\n" $v_asmdisk "$v_device [$v_minor, $v_major]" $grp $diskgroup $dsk

\rm asmdisks.txt
---- end ----

Monday, November 12, 2007

Day 3: Understanding and Assisting the CBO

Session 1: Basic Cost Arithmetic
Anyone that has read any of Jonathan's previous works will recognize the information surrounding the costing formulas; he also gives generous credit to Wolfgang Breitling and Alberto Dell'era. What is really great is that all three of these guys are very generous in sharing of their knowledge and experience, wishing to enhance the community by helping others. Just amazing!

One of the first things Jonathan went over was the environment; there are a lot of factors that play a part in calculating costs of various different pieces of Oracle. The system stats are very important in that they determine cpu speed and io latency, which in turn determine how fast basic Oracle operations occur. Another important number is multiblock read count. I found it very interesting that the hidden underscore parameter _db_file_exec_read_count defaults to db_cache_size/processes (if db_file_multiblock_read_count is not set). Processes; so let's say you set the number of processes really high "just because". You can see that your exec read count will be quite small, and for no good reason.

Jonathan also talked about the importance of sizing the memory parameters appropriately. I think the general impression is that you do not want to gimp your system by being too frugal with memory. Obviously, this will affect large operations (sorts, hashes, etc) more than anything else, but those large operations can get real nasty real fast (think multipass for hashes). Two underscore parameters that Jonathan highlighted were _pga_max_size and _smm_max_size (unfortunately, there were not many details on what these two did, or I missed them).

He made a very interesting statement in that truncating a table does not reset the stats. That sounded very strange to me. Is that true in 10g?? If Oracle goes to the trouble to reset the highwater mark, why would the stats not be updated? They merely have to be zeroed out.

We spent a good chunk of time on clustering. Not just in this session, but in others as well. There is quite a serious flaw in how clustering is calculated for data that is scattered. Consider the case where row(n) is in block(mod(n,2)), or in other words, all the odd rows are in block 1, even rows in block 0. To determine clustering, Oracle will walk the data and count each time the block id changes. Since the id changes for each row, Oracle will calculate the clustering factor really really small, when in fact, the clustering is actually pretty good (total of two blocks, half your data is in one block or the other). A low cluster factor translate into a high IO cost. An articially high IO cost may lead to a sub-optimal plan (where a good plan that has a false high IO cost is ignored in favor of a lower-cost other plan).

This also prompted me to learn more about clustering factor.

Related to that, I was convicted several times of a need to identify what our "significant" or "important" data is. That is probably the number one question I returned to again and again. What is the Banner "business important" data? There are several dictionary views and object statistics (including predicate statistics) that I want to investigate further, but I actually do have an overly optimistic hope that our vendor has some idea.

There are a couple other flaws in IO costing that Jonathan pointed out.
Flaw #1: Oracle assumes that index block reads are from disk, not cache. Always.
Flaw #2: Oracle assumes index single-block read is the same cost as table multi-block read. Always.

TO address Flaw #1, we have optimizer_index_cache, which tells us, on average, how many of our index blocks are in cache (expressed as a percentage). Jonathan stressed that this is only relevant for index root and branch blocks, as leaf blocks are not cached. I have a hard time believing leaf blocks are not cached, and that is something else I would want to look into at some point. Perhaps I merely misunderstood him.

For Flaw #2, we have optimizer_index_cost_adj which tells us what percentage of a multiblock read is an index read (probably somewhere in the vicinity of 33%).

However, for both issues, Jonathan suggests that neither be set if (a BIG IF) system stats have been collected and are correct.

Jonathan pointed out a curious rounding error introduced with optimizer_index_cost_adj; the calculated cost will be rounded down, which can potentially lead to the wrong index being chosen. Event 10183 can be used to turn off cost rounding.

On the topic of multiblock reads, Jonathan showed us a chart demonstrating how Oracle scales the parameter away from excessively high numbers. For a value of 4, the adjusted value may be 4.175. But for higher values, say 32, the adjusted value might be 16.407. 128, 40.85. Due to system stats, Jonathan is recommending that most people might want to turn off db_file_multiblock_read_count.

In light of the importance of system stats, it would be good for us to review sys.aux_stats$, and or dbms_stats.get_system_stats().

Since clustering_factor has no meaning with bitmap indexes (think about it), it is overloaded to count the number of entries in the bitmap; "some key values will have multiple rows".

Even though there was only one slide and a short blurb about this, I was struck by the last point of the session. Using first_rows_n optimizer modes is really going to play havoc with explain plans. Costs and execution paths are essentially squeezed to fit into the first row target, much like doing a "where rownum <= n". This fact alone makes it very difficult to interpret exactly what the explain plan is attempting to convey.

Session 2: Join Mechanisms
Understanding Joins is important because Oracle attempts to convert everything into a two table join, including subqueries. This is not necessarily bad, it is just the way Oracle does it.

Jonathan's slides included a few examples of correlated and non-correlated subqueries. Jonathan's observation is that most "bad" join decisions are made because of cardinality being off, be it off by one or off by millions. Being off by one can go a long way. Which, in my mind, makes Wolfgang's Cardinality Feedback all the more important. Also, "off by one" can be really critical when you consider that perhaps the stats are just a tad old and new rows have come into the table, or have been deleted. That could be the difference between a "good" join and a "bad" join. Scary, eh?

There are a number of slides that talk about Nested Loop joins, but I am going to skip over to Merge joins. Actually, skipping all the way to One Pass sorts. There is an Urban Legend that all in-memory sorts are faster than disk sorts. Jonathan prooved this is not always the case. Up until, Oracle used binary insertion trees to maintain a map for merges; the bigger the memory, the larger this tree could potentially become (smaller memory would force a flush to disk). Because of such large sizes of binary trees, the CPU requirements shot up exponentially to maintain the index. Hence, with those versions, more memory actually made those merge operations far worse.

Jonathan had a caution about 9i 10053 trace events; the "Max Area Size" listing is completely wrong and useless, for it ignores the Workarea_size_policy parameter.

And more caveats. When Oracle executes asynchronous writes, the wait time is not recorded (because it is asynchronous). You have to keep an eye on direct_path_temp to see if the numbers are stacking up.

He introduced two trace events, 10032 and 10033. The former dumps statistics about sorts, the second traces IO for sorts to disk. Since file sizes are small for event 10033, it is not impractical to set it at the system level when diagnosing a sort issue.

Multipass sorts are extremely bad. Avoid them!! Setting sort_area_size large enough to avoid multipass sorts is highly recommended. He quotes Steve Adam's as saying you should be able to get by with a sort_area_size of 45mb to sort 12gb.

Hash joins are a bit interesting. Among other things, Jonathan talked about how the workarea_size_policy comes into play. If set to manual, each hash queue will get hash_area_size/2 memory. What happens if you have more than 2 queues? That's right, you have queues whose sum of memory exceeds hash_area_size. The worst part is that each queue gets this memory regardless if it is needed or not. Say you want to hash 2k of data. You still get hash_area_size/2 for each queue. Wastage!

Setting workarea_size_policy to automatic allows the queues to only grab what they need.

Jonathan also spend some time on Trace Event 10104, "One of the most useful traces in the Oracle suite". It gives a comprehensive break down of hashes, and can be used with Trace 10046 for even greater detail. For multipasses, look for terms like how much memory is available (Memory for slots, in bytes) vs Estimated Build size. Also "Total number of partitions" vs "Number of partitions that fit in memory". That might be your first clue. There is also a stat for number of rows iterated; the fewer iterations the better.

In the end, Hashes may scale more linearly, if AND ONLY IF you do not hit an exception, and there are a number of those. Nested Loops grow faster, but tend to be smoother especially if the index/data is buffered.

Session 3: Selectivity and Hints
The rules governing selectivity start off relatively basic (although still a little bit of math involved), but quickly get complicated in my opinion. The boundary conditions really throw you. For join selectivity, one thing to be aware of is that Oracle assumes predicate independence. Jonathan has bantered this in his blog (where I first read about it). For example, consider two columns, 'month' and 'number_of_days' which tells the number of days in the month. If you use "where month between 10 and 12 and number_of_days = 30", Oracle assumes that any of those months can have 30 days. We know that is not true.

Jonathan had a great example calculating cardinality in a sample query; I'll not repeat it here for the time being.

One gotcha to remember (among many) is that when the optimizer generates numbers for you, it is not necessarily against the sql query you provided. In fact, you might as well assume it is NOT the same query; the numbers are for the interally optimized plan, of which only snippets are available to you via the 10053 trace.

We also covered transitive closure. Jonathan has a bit to say in his book and blog, and the bottmline is that sometimes Oracle will remove a predicate that it thinks is redundant (ie, a = b, b = c, therefore a = c). Lucky you. This can often lead to Cartesian joins, or alternative access paths being used (remember, generally speaking, access paths should have higher priority than filter predicates).

There are also a number of sanity checks to be aware of. One that Jonathan demonstrated is a case where Oracle will not choose ndv values from opposite tables when doing a multi-column join. I am still reading those slides, as it is taking me a little while to comprehend the ramifications. However, one of the issues seems to be that merely making a cosmetic change to your code opens it up to the possibility of hitting one of these sanity checks, and has the potential to throw the optimizer a huge wrench. Just remember that there are no certainties.

Jonathan also had some very strong words about hints. In fact, he even goes so far to say that "A 'hint' is a directive that the optimizer MUST accept and act upon." Now, as Jonathan pointed out, sometimes in an act of obedience, Oracle will ignore the hint. *grin* Yes, sounds contradictory. If you give a full(t) and index(t) hint, by "obedience" Oracle will consider the plans that both hints dictate (yes, BOTH hints), and will give you the cheapest plan. That is the key point. Oracle uses a Cost-Based Optimizer, and will always generate numbers to determine the cheapest plan. If you use hints, you want to use enough hints to trim out all other plans that the optimizer might possibly choose. The 10053 trace is handy in figuring that out.

He gave two examples to showcase this. One with a noparallel hint on the table; Oracle found a cheap plan using parallel on the index (think of a naughty two-year old). In another case, say you specify parallel on a table, but the plan is serial. That is because the serial plan is cheaper.

In another example, Jonathan shows the use_nl hint, and specifies two tables /*+ use_nl(t1 t2) */. This does *NOT* mean nest loop t1 and t2 with t1 as the first table. No no no. It means /*+ use_nl(t1) use_nl(t2) */. Yes indeed. So it tells Oracle to use a nested loop on t1 and a nested loop on t2, but it does not specify an order. In the example it seems to work because Oracle only came up with two plans, and the one we expected just happened to be the cheaper one. "Sometimes you get lucky". Sometimes, even though you did something wrong, you got the results you were expecting to see. Be careful about that.

As always, be wary of "exceptions". There is a hint (and underscore parameter) that specifically says "ignore hints." Go scratch your head over that one.

The ORDERED hint is funny in that Oracle seems to apply it in the end game of optimization. Jonathan specifically gave us a Hash example in which the Ordered hint was obeyed, but the plan was in the oppositive order. Apparently. Watch out for subqueries because remember that Oracle likes to unnest them, and might screw up your seemingly simple order. In 10g, we can use the LEADING hint instead. Still have to watch out for query block names that are dynamically generated due to optimizations; it is possible to specify non-pre-existing query blocks if you know how Oracle is going to name them. If you do not, your order may be a little different.

10g also makes it easier to logical name the hint operands. You can say "index(t1(id1))" or "index(@sub2 t4@sub2(" Obviously, those are not physical names, but labels. Jonathan has observed that it is easier to provide a negative hint than a positive one.

Bottom line, in order to use hints well:
  • Set the join order
  • Join method for n-1 tables
  • Access path for every table
  • Average two hints per table to do it well

Session 4: Telling the Truth
Last day, last session. =) Need I say more?

Some interesting things about this session. Jonathan gets into some strange anomalies and how they are avoided if you give more, seemingly irrelevant, information. For instance, defining columns (esp. mandatory columns) as "not null" helps the optimizer tremendously when using not null predicates.

The stats play a big role in "telling the truth". Old stats are just as bad as bad stats. However, Oracle does provide a way to synthesize stats if needed; I found this whole concept quite intriguing when reading his book and papers earlier. And it comes back to what is your "interesting" data? What data does the application/business like? Oracle has several rules for dealing with "normal" data, but it is entirely possible that you are interested in anything but. Which makes it important to identify the "important" data, and take advantage of the tools that Oracle has given us to help convery that to the optimizer. Stats, and possibly histograms.

Jonathan went back to the example for clustering factor. It is entirely possible to have data that Oracle thinks is scattered, when in fact it is merely grouped weird. That is a case of "lying" to the optimizer.

For the most part, the default stats work. Oracle CBO assumes a normal, boring distribution of data, so if that is the case, you are all set to go. When the data is grouped into "weird, wacky and stretchy bits", the stats and histograms may have holes. Sometimes is still does a fairly good job to compensate, and sometimes you just get unlucky. =)

He had an interesting blurb for histograms in OLTP. He is suggested that instead of letting the normal stats collect histograms, the front-end should be aware of "important" data, and code for it appropriately. Can you imagine Banner doing that? Neither can I.

I will have to wrap it up with that. I will try to over these notes from the Days with Jonathan Lewis, but I need a break.

Thursday, November 08, 2007

Day 2 with Jonathan Lewis: Writing Optimal SQL

"Writing optimal SQL" seemed a bit ambitious for this day's class. I get the feeling that Jonathan has a wealth of knowledge and it is hard for him to condense it into a mere 6-hour time slice (4 x 1.5 hour sessions). Here are my highlights and thoughts from the material covered today.

Session 1: Background
Jonathan started out by reiterating (for emphasis) the obvious difference between "local" optimization (or 'optimisation' as he writes) and "global" optimization; sure, we can make an individual query go faster, but at what cost? Which brings us to the point that one always has to make a choice on where one spends one's resources.

Today's session concentrated on global optimization strategies (any of which could be used locally if you think about it), which comes down to always keeping your eye open for wasted resources, whether it be expensive CPU operations, unnecessary buffer gets or multiple visits to disk. As with all things in life, there are exceptions, today labeled as Contradictions. For instance, sometimes to increase logical I/O, one must increase physical I/O, and vice-versa. Jonathan gave the example of a typical index read. Say it takes 1 header block, 1 branch block and 4 leaf block reads to get all data. It is possible that the same data could be gotten with a FTS of 3 block read requests. Which one is better? What's in cache?

For the most part, the most efficient (and hence "best") way to get data is to got it precisely. This goes back to the little note yesterday about using access predicates in favor of filter predicates; if you can get only the data you want on the first pass, there is not need to filter data later on (and thus discard data gotten). And even though indexes usually help us target precise blocks (via rowids), they are balanced out by the fact that they generate overhead for DML.

Jonathan had a very interesting section on column ordering in indexes. There is a widely-held thought that you put the most selective columns at the front of the index. Overall, the focus needs to be on how dense the data retrieved from the index is, which may require that you put the most repeated column first in the index. Jonathan states that, "all other things being equal", you generally want to put equality predicates at the front of the index, range predicates at the rear. Be as precise as possible. That word is going to come up again and again.

Which got me to thinking. How does one determine what "business critical" data is? What is the "interesting" data? This question is made more convoluted in a large ERP system. Surely, it is not practical to go table by table to figure out what data is most often used, most often needed. And I have this sinking feeling that the vendor is not too up-to-par on an answer for this particular question either.

There was mention of index-(hash-)join and and-equal joins; I am not very familiar with the latter and am making a note to learn more about it.

We got to talking about bitmap indexes. Bitmap indexes are great for static or "mostly read-only" tables; bad for OLTP systems. To bridge this gap, Oracle has made Bitmap Conversions more easily accessible for B-tree indexes.

Again, Jonathan made mention of using dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'). I want to do more research on that; for it seems like an awfully wonderful tool to use. Must enable row source statistics (again, there is a very handy query-level hint, /*+ gather_plan_statistics */).

Jonathan also showed us some very interesting queries for metadata analysis.

select blocks, count(*)
from (
select /*+ index(t1 t1_pk) */
count(distinct substr(rowid,1,15)) blocks
from t1
group by my_id
group by blocks
order by blocks

This query counts the number of blocks for each value (knocks off the row number from rowid to get file id + block id).

select /*+ index(t,"T1_I1") */
count(*) nrw, -- number of rows
count(distinct sys_op_lbid(49721, 'L', t.rowid)) nlb, -- number of leaf blocks
count(distinct hextoraw(sys_op_descend("DATE_ORD") || sys_op_descend("SEQ_ORD"))) ndk, -- number of distinct keys
sys_op_countchg(substrb(t.rowid,1,15),1) clf -- clustering factor
from "TEST_USER"."T1" t
where "DATE_ORD" is not null
or "SEQ_ORD" is not null

Talk about a little bit of black magic! I get the impression Jonathan ripped this out of a 10046 trace for calculating index stats, based on some of his non-standard uses. The last argument of sys_op_countchg should be the number of freelists; 2 through 16 if using ASSM.

There was a great word about documentation, especially since we have embarked upon a broad documentation project at work. Among other anecdotes, there was a good observation that "The harder it was to fix, the more documentation you need".

Session 2: Basic Coding
I did not like this section that much. Perhaps it was just me. But it seemed like a lot of the "standards" are quite subjective; is Jonathan teaching us his own peculiar style of coding?? =)

I did come away from this session with a few things, however. Due to Oracle's mostly-successful (dare I say, Mostly Harmless?) CBO, one can usually write a pretty intuitive sql and have it optimized properly. Sometimes, when we try to get clever and optimizer it ourselves, we botch it, or duplicate the work that the CBO would have done anyway, at the cost of making the code even more obscure and hard to read. As Occam's Razor infers, Keep it simple.

As popular as analytics are becoming, they are most definitely not the end-all be-all. They are meant for number crunching on small data sets.

Jonathan pointed out a rather bizarre observation of subqueries; Oracle assumes subqueries will reduce the rows to 5% (not BY 5%, but TO 5%). "This is insanely stupid. Sometimes." Yes, Jonathan qualifies everything he says. Usually.

Session 3: Sundry Methods
To kick things off, we dove into examine the costs of Nested Loop (NL) joins vs Hash Joins. Essentially:

Cost of NL = C1 + N * C2
Cost of HJ = C1 + C2

For a nested loop, C1 is the cost getting rows from table 1, C2 is the cost of getting rows from table 2, per row from table 1. Hence, if we can keep either N (the number of rows gotten from table 1) or C2 down, the total NL cost is relatively low. We can keep C2 low by using indexes and/or getting at the data precisely.

For a hash join, we are doing full table scans. We hash the join columns (and store the hash results in memory), the use the second table to probe the first, hashing the join columns on the second table to see if they match the hashed versions of the first. Note that the growth of cost is linear, whereas for nested loops is not (multiply by N).

Another problem with nested loops is that Oracle assumes that the tables are statistically independent. Jonathan (and others including Alberto Dell'era) have made a number of observations about this online.

One thing to keep in mind about hash joins is that it greedily gobbles memory. Do not try to make a large table the first table of a hash join.

Another important part about writing sql is that it helps to have a clear description of what exactly the query is to do, and to have a picture or diagram to further enhance the description. Sometimes a query is not optimal merely because of the way the "english" description implied how the query get data. There is more than one way to skin a cat.

Jonathan used an example of a "typical" FIFO index supporting a workflow process, in which a status is pushed through several states. The index will eventually have a long tail of mostly empty blocks, as all the activity is on recent timestamps. To address this issue, it helps to 1) only index relevant data instead of empty of null data using a function-based index, 2) have multiple smaller indexes. The benefit of #2 is that it is much easier to coalesce it, thereby keeping it well maintained. While talking about FBI, Jonathan quipped:

Encapsulation is wonderful. Hiding things is naughty.

Repeating a horrid modern-day developer practice of pushing procedural logic into sql, Jonathan touched on filter subqueries (where you have one subquery that handles one case and another subquery for another case). Jonathan made mention of the lnnvl function, which stands for "logical not null value"; a good way to test the value of a potentially null value.

He also talked about several "what if..." scenarios. In these, he states things like "if it is cheaper to get into table 2 through table 1...". I am not quite sure what he means by this. Something I need to follow up with him about.

We also covered the "untunables", or particular queries that did not lend themselves to being optimized by Oracle very well. Jonathan went to details to say that Oracle prefers and forces queries into a "left-deep" tree. Some queries actually perform much better when the access paths and join order is not "left-deep", but instead "bushy tree", or more balanced. This was showcased by a query with a 5-table join, with entries at either end. A "bushy tree" works towards the table in the middle, thus reducing the rows needed to propagate predicates and join conditions.

Session 4: More Methods
Jonathan showed us a fairly simple example of query rewrite using dbms_advanced_rewrite.declare_rewrite_equilvalence. This really does raise some questions; like, how different can source_stmt be from destination_stmt? Does it strictly follow the same rules as cursor sharing? Or can you completely rewrite the query as long as the columns are of the same datatype?

We also looked at subquery caching. I had a great talk with Ric Van Dyke (awesome dude!) afterwards about this. I was curious if there was any way to see or monitor subquery caching in affect, other than attempting to guess at its potential use via results in an explain plan. He did not know of a way, but guessed that it might possibly be exposed somehow in one of the x$ tables. I find subquery caching to be quite interesting, if for nothing else that it can really play havoc with your explain plan and make them unexplainable unless you really know what is going on. Surely, there is a way to tell when your plan used subquery caching, and to what degree.

Jonathan showed us "update join views", which I thought were quite clever. I have used dynamic materialized views (using the "with" clause, or "out of line views") to generate a view from a table join for an update, but having an update join view seems much more elegant. It does have some strict requirements; the driving table must be key-preserved, meaning that the column we want to update is deterministically joined.

We looked at pagination, aka judicious use of rownum, to efficiently get top-n rows. Another related trick (for more complex situations) is to manipulate index rowids directly. This happens when the logic requires that you get and sort all data prior to using rownum. One can do all sorts of sorting and aggregation, returning index-only columns and the rowid, then take the rownum from that and access the table using the relevant rowids. Quite clever that.

We took a brief glance at OEM, and moved quickly to profiles and outlines. Cool stuff. When talking about profiles, Jonathan made it a point to implore that we ask our Sales/Tech Reps when (NOT if) opt_estimate would be documented and supported. Profiles are essentially a series of opt_estimate hints that alter existing statitstics. How cool is that?!? =)

Jonathan ended on a note about hints. In general, hints are bad. Don't use them. But sometimes you have to, and when you do, make sure you do not hold back. A lone hint on a 7-table join is next to useless; it is better to provide at least 6 hints to determine the join order, plus anything else to determine the query plan you want. Or to put it a different way, and average of 2 hints per table, 1 to determine join method, one to determine access path. Jonathan had some interesting things to say about hints, his seemingly prolific use of them notwithstanding:

  • They are badly documented
  • Nobody knows what they do
  • They change usage occasionally

Jonathan really likes to say that "sometimes you stop getting lucky". Meaning that sometimes you use something (a hint) incorrectly but, lucky you, it happens to work the way you think it should. When it "stops working", it is actually working correctly and you just stopped getting lucky about it.

That is about it for today. My ass is really hurting after sitting in those chairs.

Wednesday, November 07, 2007

Day 1 with Jonathan Lewis: Execution Plans

Here is my rough draft from notes I took during the first of Jonathan Lewis's 3 1-day CBO classes. These are merely highlites, for I could not possibly be comprehensive.

Session 1: Finding execution plans
If a database has been upgraded from 9i, the plan table may be of an old version. Additionally, it will not be global temporary table. You can run @?/rdbms/admin/catplan.sql to set it up properly. This also creates a public synonym and grants acces to public.

When tracing, helfpful to use query block names (qb_name) in sql to identify later on in trace file.

The 10132 trace is new, and provides a great alternative to running the 10053 trace (much more condense and practical). Great for baselining all query plans when set at a system level.

tkprof is a little tricky - traceonly does NOT mean that the query is not run; it only eliminates the resulting output of a query. Hence, inserts, updates and deletes still execute, but a straight select will not do much (no output).

Can format autotrace formatting with _plus_exp columns.

The pipelined dbms_xplan.display function is very nice. There are several other functions in that package I was not aware of, like display_cursor, display_awr and display_sqlset. display_cursor is great for getting the plan of a cursor currently in memory.

Since the function is pipelined, it can be used in a table join operation. Jonathan showed a greate example finding the query plan for all cursors that contained a FULL TABLE SCAN.

Jonathan's tests with reporting sql also showed easy ways to implement query-level optimizer paramters via hints (ie, /*+ opt_param('parallel_execution_enabled','false') */).

Rowsource execution statistics can be generated by setting statistics_level = all, or more surgically with _rowsource_execution_statistics = true. Can also be set with a hint (/*+ gather_plan_statistics */).

What I really like about rowsource execution stats is that you can easily generate Estimated and Actual stats, and can use something like Wolfgang's Cardinality Feedback to get a fairly good idea where performance problems may be popping up. The "ALLSTATS LAST" parameter of dbms_xplan.display_cursor displays this quite nicely. I need to do some more research on this function so we can use it more readily. Great tool for developers and DBAs alike.

Finding cursors in v$sql is made efficient with hash value and sql address. Unfortunately, 10g statspack has an "old" hash value that is inefficient. Do not use it.

When looking at parsed cursors, especially via sprepsql, the version count can be quite useful. Since the stats are aggregated, you have to be careful about summarizing and translating what the stats actually are trying to say about the query.

Any "explain" may be untrue.

Like House says, "Everyone lies."

Section 2: Interpreting Execution Plans

The good news about reading explain plans is that it is (should be and can be) much easier than it might first look. For the most part, it comes down to the good old standby, "Divide and Conquer". To that extent, Jonathan has two basic rules when breaking down plans, which essentially follow the concept of finding all your child cursors until you run out of descendants (think left tree-walk). The bad news is that there are, as always, exceptions. But the rules cover a vast majority of plans. And they are simple.

There are no complicated execution plans, only long execution plans.

First off, it really helps to look at the query plan with some aids. The PID (Parent ID) column is essential. Having the children indented incrementally also helps.

RULE 1 - Single-Child Parent
The child follows the parent and feeds the parent.

RULE 2 - Multi-Child Parent
The parent is an operation that determines how the children's results are combined.

All explain plans are a combination of Rule 1 and Rule 2. So what you do is break it down using those rules. That is what this section does; I'll leave it to you to go through Jonathan's slides yourself. =)

Rule 1 is relatively free of tricks. Rule 2, however, is a lot more convoluted when you start looking at the various different operations, and the various ways those operations are implemented. So while it may be "easy" to group them together, interpreting them takes a little time to understand the myriad subrules. However, the context usually helps that a bit. For instance, if you see a BITMAP AND operation, you are likely to see BITMAP CONVERSION operations; knowing what an operation is expecting (ie, ROWIDs, bitmap numbers, etc) helps you figure out what the child operation is passing up. I did not take many notes on this section because the slides are quite informative and easy to grasp once you see them.

When talking about filter predicates, I learned that access predicates are much more preferred if they can replace filter predicates. The reason being that filter predicates usually happen after you get the data, while access predicates determine how you get the data. This can be showcased by an inappropriate range scan; if you have an index with 3 columns and the query only specifies column 1 and column 3, the optimizer has to do a range scan for column 2. If the query can be adjusted to included column 2, the optimizer no longer has to do a range scan, and can move the filter predicate to an access predicate. Basically, without column 2, the range of "interested rows" in the index is bounded by the smallest value of (column 1, column 3) and the largest value; a range which may (quite possibly) include values that you are not really interested in (and have to be FILTERED out). If you define all columns, you only grab the rows you want in the first place.

select * from t1 where id = nvl(:b1, id)

"Horrendously bad practice."
"The ideal is to go back and smack someone."

The execution plan has to include a path for both :b1 is null and :b1 is not null. Instead, the application should have two different queries for these conditions. Additionally, the above code does not catch rows when id is null.

Jonathan stressed the idea of watching your query plans to see if the optimizer uses any unusual enhancements, so that one may learn from them. For instance, Oracle has a MIN/MAX optimization for indexes. This is especially useful for correlated subqueries that can be optimized by (redundantly) copying a predicate from the parent query that just happens to contain extra columns of an existing index. Sometimes work will be completely removed, like in the case with ORDER BY elimination.

Section 3: Problems with Plans

So now that Jonathan has razzle-dazzled us with all this awesome stuff, here comes some bad news.

  • Explain plan may be telling lies
  • Trace files don't always have stat lines
  • tkprof might not dump the stat lines even if they are in the trace file
  • tkprof may dump the "wrong" stat lines
  • v$sql_plan might be out of date
  • v$sql_plan can have holes

Explain plans are not aware of the datatypes for bind variables. Hence, if you are trying to get an acurate explain plan, it is a good idea to explicitly declare the datatypes of bind variables with a cast or to_ function.

Jonathan made some interesting comments about "good" applications. In particular, he stated that they do not close their cursors. We have been telling our developers to close cursors. Why is that? I think the statement needs to be taken in the context, or at least the assumption, that the cursors will be held open for the intent of reusing them, opposed to forgetting to close them when not needed. When cursors are not closed, their stat lines are not dumped to a trace file. Hence, if you want plans and statistics for a unclosed cursor, have to get them out of v$ views.

In cases where a cursor has multiple versions and/or executions, tkprof will only report on the first one in the trace file, even though it summarizes the stats for all of them. It is entirely possible that one bad execution caused a majority of the "bad" stats, and that the explain plan in tkprof is for a completely different execution.

While showing us some sql to demonstrate subquery factoring, he showed us his materialize hint (/*+ materialize */). There is also a inline hint that does the opposite.

Scalar subqueries are slightly problematic in that they are not costed properly, due to the fact that Oracle does not know how many times it will run.

Jonathan talked about "hash table cache" and "scalar subquery cache", both of which I am quite unfamiliar with, and want to research them a little more. Oracle will automatically cache hashed results of a scalar subquery so that it does not have to execute the subquery over and over. However, if two values of the subquery happen to hash to the same address, the first one in wins, and the second value has to be calculated again and again and again...

Additionally, Oracle has introduced "order by" elimination with 10g. Unfortunately, sometimes "order by" is quite useful when dealing with scalar subqueries and thus one may need to use the no_eliminate_oby hint.

There is another hint that can be used to swap the join inputs of a hash join (/*+ use_hash(t3) swap_join_inputs(t3) */). 10g also allows use of the no_swap_join_hint to further fine-tune exactly which table you want to be thrown into memory as the "first" table.

Section 4: Advanced Features

The MIN/MAX optimization that we talked about earlier has another nice little feature. It is not a filter predicate, but rather an access predicate, meaning it gets exactly the one little piece of data that it wants and nothing else.

Have to watch query plans. If you expect an optimal subquery but see a filter operation for it, you may want to re-evaluate it.

Remote operations are quite an interesting phenonemon when it comes to query plans. Although remote statistics are fetched, they are not fully used when generating a query plan. For instance, a 3-table join may be ordered such that a remote table is joined to a local table, which is then joined to a remote table. How much sense does that make? In most cases, the query will perform better if remote operations are done in as large a chunk as possible (and reasonable), instead of making many small trips back and forth.

Parallel operations are quite unique. Contrary to popular belief, they are not always a good thing.

Parallel query is designed to be brutal and inefficient.

I believe Jonathan said this because the idea of parallel query is to use up more resources (ie, more CPU) than a serial query would. In fact, it attempts to max out the CPU in an attempt to get the query done as fast as possible. I think he even said that parallel query should never be used in an OLTP database, but I did not write that down, so I could be wrong about that.

Anyway, parallel query has to be carefully tuned, for it is far too easy to thrash your system. Looking at an explain plan and trace files, we have to be careful about what we find. The infamous "idle" waits associated with parallel operations may indicate problems, or the may not, it depends on where they are happening. If an explain plan shows many P->P (parallel to parallel) operations, that could easily be a source of contention. Much better to have PCWP (forgot to write the definition) operations. In the case of a HASH join, you do not want to broadcase a large table to all the parallel query slaves; instaed, broadcast the small one (for memory) and read from the disk directly for the larger table (and joined locally). For nested loops, it is better to join and aggregate smaller chunks at a time, thus reducing "an entire layer of slaves".

Near the end, I had trouble keeping up with my notes (both because my butt was sore from the hard chair, getting tired, and being overwhelmed with so much information). We moved on to partitioned queries, and talked about the benefits of being careful when designed partitions and how to take advantage of parallel query. To fully realize the potential of partition-wise joins, it is best to partition early in the plan so that correllated, similar work can be parallelized.

On a slightly off-topic of partition statistics, Jonathan mentioned:

You ought to be faking your statistics. Sometimes.

Especially true since subpartition statistics are not dealt with properly at all.

And finally, if you see a "PX SEND BROADCAST" in your query plan, that automatically means you are not doing any partition-wise joins. They are mutually exclusive.

One last quote from Jonathan Lewis:

You stop getting lucky
In reference to performance problems that happen all the sudden even though "nothing changes"