Thursday, November 08, 2007

Day 2 with Jonathan Lewis: Writing Optimal SQL

"Writing optimal SQL" seemed a bit ambitious for this day's class. I get the feeling that Jonathan has a wealth of knowledge and it is hard for him to condense it into a mere 6-hour time slice (4 x 1.5 hour sessions). Here are my highlights and thoughts from the material covered today.

Session 1: Background
Jonathan started out by reiterating (for emphasis) the obvious difference between "local" optimization (or 'optimisation' as he writes) and "global" optimization; sure, we can make an individual query go faster, but at what cost? Which brings us to the point that one always has to make a choice on where one spends one's resources.

Today's session concentrated on global optimization strategies (any of which could be used locally, if you think about it), which come down to always keeping your eye open for wasted resources, whether it be expensive CPU operations, unnecessary buffer gets or multiple visits to disk. As with all things in life, there are exceptions, today labeled as Contradictions. For instance, sometimes reducing logical I/O means increasing physical I/O, and vice-versa. Jonathan gave the example of a typical index read. Say it takes 1 root block, 1 branch block and 4 leaf block reads to get all the data. It is possible that the same data could be gotten with a full tablescan needing only 3 multiblock read requests. Which one is better? It depends, among other things, on what is already in the cache.

For the most part, the most efficient (and hence "best") way to get data is to get it precisely. This goes back to the little note yesterday about using access predicates in favor of filter predicates; if you can get only the data you want on the first pass, there is no need to filter data later on (and thus discard data you already paid to fetch). And even though indexes usually help us target precise blocks (via rowids), they are balanced out by the fact that they generate overhead for DML.

Jonathan had a very interesting section on column ordering in indexes. There is a widely-held thought that you put the most selective columns at the front of the index. Overall, the focus needs to be on how dense the data retrieved from the index is, which may require that you put the most repeated column first in the index. Jonathan states that, "all other things being equal", you generally want to put equality predicates at the front of the index, range predicates at the rear. Be as precise as possible. That word is going to come up again and again.
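
A minimal sketch of the "equality first, range last" advice, using a hypothetical orders table (the table, index name and columns are mine, not Jonathan's):

-- status is tested with equality, order_date with a range, so status leads the index;
-- both columns can then be used as access predicates rather than filter predicates
create index orders_status_date_ix on orders (status, order_date);

select order_id, total_amt
from   orders
where  status     = 'OPEN'
and    order_date >= date '2007-11-01';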

Which got me to thinking. How does one determine what "business critical" data is? What is the "interesting" data? This question is made more convoluted in a large ERP system. Surely, it is not practical to go table by table to figure out what data is most often used, most often needed. And I have this sinking feeling that the vendor is not too up-to-par on an answer for this particular question either.

There was mention of index-(hash-)join and and-equal joins; I am not very familiar with the latter and am making a note to learn more about it.

We got to talking about bitmap indexes. Bitmap indexes are great for static or "mostly read-only" tables; bad for OLTP systems. To bridge this gap, Oracle has made Bitmap Conversions more easily accessible for B-tree indexes.
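
If I followed this correctly, the point is that the optimizer can combine ordinary B-tree indexes by converting their rowids to bitmaps (BITMAP CONVERSION FROM ROWIDS), so you get the bitmap-style benefit of ANDing several cheap indexes without the DML penalties of real bitmap indexes on an OLTP table. A made-up illustration (hypothetical table and indexes; I believe the index_combine hint asks for this kind of access path):

-- Hypothetical customers table with separate single-column B-tree indexes on
-- gender and region; index_combine asks the optimizer for a bitmap-style
-- combination of indexes rather than picking just one of them.
select /*+ index_combine(c) */ count(*)
from   customers c
where  gender = 'F'
and    region = 'MIDWEST';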

Again, Jonathan made mention of using dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'). I want to do more research on that; it seems like an awfully wonderful tool. You must enable row source statistics first (again, there is a very handy query-level hint, /*+ gather_plan_statistics */).
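
A minimal sketch of the recipe as I understand it (the table t1 and its predicate are just placeholders):

-- 1. Run the statement with row source statistics enabled for just this execution
select /*+ gather_plan_statistics */ count(*)
from   t1
where  owner = 'SYS';

-- 2. In the same session, pull the plan of the last cursor executed, showing
--    estimated vs. actual rows (E-Rows vs. A-Rows) and buffer gets per step
--    (in SQL*Plus, keep serveroutput off or the "last cursor" will be dbms_output)
select *
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));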

Jonathan also showed us some very interesting queries for metadata analysis.

select blocks, count(*)
from (
       select /*+ index(t1 t1_pk) */
              count(distinct substr(rowid,1,15)) blocks
       from   t1
       group by my_id
     )
group by blocks
order by blocks

This query counts the number of distinct blocks that each value of my_id is spread across, then shows how many values fall into each block count (the substr() knocks the row number off the rowid, leaving data object id + file id + block id).

select /*+ index(t,"T1_I1") */
count(*) nrw, -- number of rows
count(distinct sys_op_lbid(49721, 'L', t.rowid)) nlb, -- number of leaf blocks
count(distinct hextoraw(sys_op_descend("DATE_ORD") || sys_op_descend("SEQ_ORD"))) ndk, -- number of distinct keys
sys_op_countchg(substrb(t.rowid,1,15),1) clf -- clustering factor
from "TEST_USER"."T1" t
where "DATE_ORD" is not null
or "SEQ_ORD" is not null

Talk about a little bit of black magic! I get the impression Jonathan ripped this out of a 10046 trace of index statistics gathering, given some of the non-standard functions in use. The last argument of sys_op_countchg should be the number of freelists; 2 through 16 if using ASSM.

There was a great word about documentation, especially since we have embarked upon a broad documentation project at work. Among other anecdotes, there was a good observation that "The harder it was to fix, the more documentation you need".

Session 2: Basic Coding
I did not like this section that much. Perhaps it was just me. But it seemed like a lot of the "standards" are quite subjective; is Jonathan teaching us his own peculiar style of coding?? =)

I did come away from this session with a few things, however. Due to Oracle's mostly-successful (dare I say, Mostly Harmless?) CBO, one can usually write pretty intuitive sql and have it optimized properly. Sometimes, when we try to get clever and optimize it ourselves, we botch it, or duplicate the work that the CBO would have done anyway, at the cost of making the code even more obscure and hard to read. As Occam's Razor suggests: keep it simple.

As popular as analytics are becoming, they are most definitely not the end-all be-all. They are meant for number crunching on small data sets.

Jonathan pointed out a rather bizarre observation about subqueries; Oracle assumes a subquery will reduce the rows to 5% (not BY 5%, but TO 5%). "This is insanely stupid. Sometimes." Yes, Jonathan qualifies everything he says. Usually.

Session 3: Sundry Methods
To kick things off, we dove into examining the costs of Nested Loop (NL) joins vs Hash Joins. Essentially:

Cost of NL = C1 + N * C2
Cost of HJ = C1 + C2

For a nested loop, C1 is the cost of getting rows from table 1, and C2 is the cost of getting rows from table 2 per row from table 1. Hence, if we can keep either N (the number of rows gotten from table 1) or C2 down, the total NL cost stays relatively low. We can keep C2 low by using indexes and/or getting at the data precisely.

For a hash join, we are doing full table scans. We hash the join columns of the first table (and store the hash table in memory), then use the second table to probe it, hashing the join columns of the second table to see if they match the hashed values from the first. Note that the growth of cost is linear, whereas for nested loops it is not (the second table's cost is multiplied by N).
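
A throwaway sketch to make the comparison concrete (the tables, index name and hints here are purely hypothetical; the hints exist only to force each shape so the plans can be compared):

-- Nested loop: cost grows with N, the number of rows coming out of the driving table
select /*+ leading(c) use_nl(o) index(o orders_cust_ix) */
       c.cust_name, o.order_id
from   customers c, orders o
where  c.region  = 'MIDWEST'
and    o.cust_id = c.cust_id;

-- Hash join: scan both tables once, build a hash table on the smaller input,
-- probe it with the other; the cost grows linearly rather than by N * C2
select /*+ leading(c) use_hash(o) full(c) full(o) */
       c.cust_name, o.order_id
from   customers c, orders o
where  c.region  = 'MIDWEST'
and    o.cust_id = c.cust_id;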

Another problem with nested loops is that Oracle assumes that the tables are statistically independent. Jonathan (and others including Alberto Dell'era) have made a number of observations about this online.

One thing to keep in mind about hash joins is that they greedily gobble memory. Do not try to make a large table the first (build) table of a hash join.

Another important part about writing sql is that it helps to have a clear description of what exactly the query is supposed to do, and a picture or diagram to further enhance the description. Sometimes a query is not optimal merely because the "english" description implied a particular way for the query to get the data. There is more than one way to skin a cat.

Jonathan used an example of a "typical" FIFO index supporting a workflow process, in which a status is pushed through several states. The index will eventually have a long tail of mostly empty blocks, as all the activity is on recent timestamps. To address this issue, it helps to 1) use a function-based index so that only the relevant rows are indexed (everything else maps to null and stays out of the index; a sketch follows the quote below), or 2) have several smaller indexes. The benefit of #2 is that smaller indexes are much easier to coalesce, thereby keeping them well maintained. While talking about FBIs, Jonathan quipped:

Encapsulation is wonderful. Hiding things is naughty.
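
A minimal sketch of the FBI idea, assuming a hypothetical workflow_queue table where only a handful of rows are still 'PENDING':

-- Rows whose indexed expression evaluates to null are simply not indexed, so only
-- the few rows still awaiting processing get an entry and the index stays tiny
create index wf_pending_ix
    on workflow_queue (case when status = 'PENDING' then status end);

-- The query has to use the same expression for the optimizer to pick the index up
select id, payload
from   workflow_queue
where  case when status = 'PENDING' then status end = 'PENDING';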

Touching on a horrid modern-day developer practice, pushing procedural logic into sql, Jonathan covered filter subqueries (where you have one subquery that handles one case and another subquery for another case). Jonathan made mention of the lnnvl function, which stands for "logical not null value"; a handy way to test a predicate whose result might be unknown because of nulls.
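
My reading of lnnvl, with a hypothetical order_lines table: lnnvl(condition) returns TRUE when the condition is FALSE or unknown (null), which saves the usual two-part predicate:

-- Returns rows where qty > 10 is FALSE *or* unknown, i.e. qty <= 10 or qty is null,
-- without having to spell out "qty <= 10 or qty is null"
select order_id, qty
from   order_lines
where  lnnvl(qty > 10);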

He also talked about several "what if..." scenarios. In these, he states things like "if it is cheaper to get into table 2 through table 1...". I am not quite sure what he means by this. Something I need to follow up with him about.

We also covered the "untunables", particular queries that do not lend themselves to being optimized by Oracle very well. Jonathan went into detail to say that Oracle prefers, and forces, queries into a "left-deep" tree. Some queries actually perform much better when the access paths and join order are not "left-deep" but instead form a "bushy tree", or more balanced shape. This was showcased by a query with a 5-table join, with the selective predicates at either end of the chain. A "bushy tree" works towards the table in the middle, thus reducing the number of rows that have to be carried through the joins.
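
If I understood the workaround correctly (treat this as a sketch, not Jonathan's actual example), you can nudge the optimizer towards a bushy shape by pre-joining each end inside a no_merge inline view, then joining the two small row sources to the middle table:

-- Hypothetical 5-table chain t1-t2-t3-t4-t5 with the selective predicates on t1 and t5.
-- Each no_merge inline view is optimized as its own unit, so the final plan joins
-- (t1-t2) and (t4-t5) first, then meets in the middle at t3.
select v1.c1, v2.c5
from   (select /*+ no_merge */ t2.id2, t1.c1
        from   t1, t2
        where  t1.colx = 'RARE'
        and    t2.id1  = t1.id1) v1,
       (select /*+ no_merge */ t4.id3, t5.c5
        from   t4, t5
        where  t5.coly = 'RARE'
        and    t4.id4  = t5.id4) v2,
       t3
where  t3.id2 = v1.id2
and    t3.id3 = v2.id3;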

Session 4: More Methods
Jonathan showed us a fairly simple example of query rewrite using dbms_advanced_rewrite.declare_rewrite_equivalence. This really does raise some questions: how different can source_stmt be from destination_stmt? Does it strictly follow the same rules as cursor sharing? Or can you completely rewrite the query as long as the columns are of the same datatype?
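
For my own notes, a sketch of what such a declaration looks like (the two statements and the equivalence name are made up; validate => true makes Oracle check that both statements actually return the same rows at declaration time):

begin
  dbms_advanced_rewrite.declare_rewrite_equivalence(
    name             => 'sales_summary_rw',   -- hypothetical name
    source_stmt      => 'select prod_id, sum(amt) from sales group by prod_id',
    destination_stmt => 'select prod_id, total_amt from sales_summary',
    validate         => true,
    rewrite_mode     => 'TEXT_MATCH');
end;
/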

We also looked at subquery caching. I had a great talk with Ric Van Dyke (awesome dude!) afterwards about this. I was curious if there was any way to see or monitor subquery caching in effect, other than attempting to guess at its potential use via results in an explain plan. He did not know of a way, but guessed that it might possibly be exposed somehow in one of the x$ tables. I find subquery caching to be quite interesting, if for nothing else than that it can really play havoc with your explain plans and make them unexplainable unless you really know what is going on. Surely, there is a way to tell when your plan used subquery caching, and to what degree.

Jonathan showed us "update join views", which I thought were quite clever. I have used dynamic materialized views (using the "with" clause, or "out of line views") to generate a view from a table join for an update, but having an update join view seems much more elegant. It does have some strict requirements; the driving table must be key-preserved, meaning that the column we want to update is deterministically joined.

We looked at pagination, aka judicious use of rownum, to efficiently get top-n rows. Another related trick (for more complex situations) is to manipulate index rowids directly. This happens when the logic requires that you get and sort all data prior to using rownum. One can do all sorts of sorting and aggregation, returning index-only columns and the rowid, then take the rownum from that and access the table using the relevant rowids. Quite clever that.
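
The basic top-n shape, with a hypothetical orders table (the key point being that rownum is applied only after the inline view has done the sorting):

-- Top 20 most recent orders: the inline view sorts, the outer rownum cuts it off.
-- In the more complex variant Jonathan described, the inline view would return only
-- indexed columns plus the rowid, and the outer query would then visit the table
-- by just those few rowids.
select *
from  (select order_id, cust_id, order_date
       from   orders
       order by order_date desc)
where  rownum <= 20;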

We took a brief glance at OEM, and moved quickly on to profiles and outlines. Cool stuff. When talking about profiles, Jonathan made it a point to implore that we ask our Sales/Tech Reps when (NOT if) opt_estimate will be documented and supported. Profiles are essentially a series of opt_estimate hints that adjust existing statistics. How cool is that?!? =)

Jonathan ended on a note about hints. In general, hints are bad. Don't use them. But sometimes you have to, and when you do, make sure you do not hold back. A lone hint on a 7-table join is next to useless; it is better to provide at least 6 hints to determine the join order, plus anything else needed to pin down the query plan you want. Or to put it a different way, an average of 2 hints per table: one to determine the join method, one to determine the access path (a quick sketch follows the list below). Jonathan had some interesting things to say about hints, his seemingly prolific use of them notwithstanding:

  • They are badly documented
  • Nobody knows what they do
  • They change usage occasionally

Jonathan really likes to say that "sometimes you stop getting lucky". Meaning that sometimes you use something (a hint) incorrectly but, lucky you, it happens to work the way you think it should. When it "stops working", it is actually working correctly and you just stopped getting lucky about it.

That is about it for today. My ass is really hurting after sitting in those chairs.


mathewbutler said...

Another comprehensive review of the course. It's taking me a while to get through these - some of the things you mention require some additional thought and time to investigate.

How did you find Jonathan's pace? I attended the three day design for performance seminar. Well worth it, but he covered lots of things in a short amount of time. It's the kind of course I'd like to attend again, once for each new DB release.

Mathew Butler

Charles Schultz said...

It requires a bit of extra thought on my part, both in writing and reading. For example, it took me about two hours to type up each post (total of about six hours). And I still do not get it all.

I liked his pace a lot. He filled my brain with questions and usually answered most of them. Plus, there were others I could bounce questions off (Ric Van Dyke of HotSOS and Mark Bobak). Having read his blog and books in the past gave me some crucial background so I was not completely lost. Those resources are also reassuring because I can go back to them for more research.

Martin Berger said...

Hi Charles,
To your question What is the "interesting" data?
You do not need to ask the business; ask the one that is doing the work: the RDBMS. There are some views like v$bh or v$segment_statistics.
As Jonathan has surely warned you about the danger of ratios, do not try to 'tune' any numbers in these views. Use them only as an indicator of what is handled often, and what might be worth tuning.