Showing posts with label histograms. Show all posts
Showing posts with label histograms. Show all posts

Wednesday, October 25, 2006

Support Analyst caught asking Tom

I found this very interesting.
http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:707586567563
Search for "We have noticed several system tables where"

Keeping that in mind, here is my question in SR# 5684375.992 on 11-AUG-06:
This SR is for information only. We have noticed several system tables where dba_tab_col_statistics reports HEIGHT BALANCED histograms where bucket_cnt > distcnt (count of distinct values). It was my understanding that a similar problem was fixed in 10.2.0.1 (dbms_stats method_opt size n generates a FREQUENCY histogram where n >= NDV). However, even more perplexing is why do we have many height balanced histogram buckets on a column with only 1 distinct value?

SQL > select ENDPOINT_NUMBER,ENDPOINT_VALUE
from dba_histograms
where table_name = 'INDPART$'
and column_name = 'TS#';

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
217 79


So, some questions?
1. in sys.hist_head$, how is bucket_cnt calculated?
2. what determines how many buckets a column gets?

Is it just me, or did my support analyst ask my question on AskTom, verbatim?

While I like Tom's cavalier attitude in general (ala "and you don't need/want to be querying hist_head$, just use the normal views like the rest of us :)"), I sure wish he answered the question.

Another interesting point: this question was posed on August 14th, the same day I got fed up with the Support Analyst and asked to switch to someone else. I liked the 2nd guy much better! You could give him awards. But that first guy… wow, he needs help.

As I said, just very interesting. I wonder how many other Support Analysts I can find on AskTom.

Tuesday, October 24, 2006

OCM, histograms and more Oracle

I debated about how publicly I wanted to announce that I am taking the OCM. Seeing as how nobody reads this blog, if anyone ever finds out, it will be after the fact. My impression of the OCM is that it is rather tough to take, but is mostly a PR thing - Oracle gets an easy $2000, and in return, I get bragging rights with a denim shirt. Looks good for our HR department though. The thing is, there are a lot of really smart DBAs out there who could easily pass the OCM and simply have not the time or inclination to do so. I think there is also the stigma that someone who is a certified Master should not make any mistakes and knows everything. *laugh* Don't expect that from me.

So, on to histograms. I am still beleaguered by how best to describe the data. I have tossed the idea of unlimited histogram buckets to Oracle, but the rub I get back is that the overhead is not worth it. Overhead = physical storage + cost of getting bucket information. I will be the first to admit that my number theory is not the strongest - in fact, I am one of the worst kinds of methodical scientists because I am not, well, methodical. I just want to try out a 10-million bucket histogram and see if it actually is worth it for myself. In talking to Wolfgang Breitling, I see there are other ramifications to consider as well, like join cardinalities. Call me naive, but it seems that if you have 100% accurate histograms (no matter how large/how many), your join cardinalities will be 100% accurate as well. That is the problem with a little bit of knowledge - there is so much more you do not know, but you do not necessarily know it. *grin*

More Oracle: Actually, I should say, more SCT. We do a lot of work with SCT (now called SunGard Higher Education, or SHE) BANNER, an ERP from California. As with any monstrous ERP, there are a number of issues, but probably the worst is that they were the lowest bidder, and we are their largest customer. Do the math. Good for them, bad for us. And we who do the performance diagnostics and troubleshooting affirm that every day. Again, going back to Wolfgang, I wish there was an outside expert/consultant in the realm of Oracle/BANNER, a real Application DBA. At this point in my career, I am saying to myself "That's what I want to be when I grow up." Maybe not necessarily for SCT/SHE (and they sorely need them!), but .... maybe.

Another dream I have is to get inside the guts of the Oracle kernel. This is hard to do when
  1. you don't work for Oracle
  2. you despise the scientific method
  3. you play a lot of video games
But, I find that I have a number of questions. Questions I would love to document here, when I get a chance to kick myself in the butt and update this blog more regularly. Questions of how things work, and why they were designed that way. I am fully confident that a majority of the design desicions were made by very smart people who think, breathe and live databases. But sometimes you just wonder.

So if you know of any rogue Oracle developers that teach Internal classes, let me know. *grin*

Monday, September 11, 2006

The lure of histograms

I had been doing a lot of research into histograms. I scooped a lot of references (which I will list later), and filled my head with no shortage of verbiage. Our current dilemma now is, "Does one introduce more problems when proactively creating targeted histograms (frequency histograms on non-unique columns) than you solve?" I am of the opinion that the benefits outweigh the risks. But I am being cautious because people smarter than I (ie, Jonathan Lewis, Wolfgang Breitling, Dave Ensor) all exercise extreme prejudice and use histograms as little as absolutely necessary. Or at least, that is my understanding. I firmly believe that if you tell the CBO absolutely all the facts about your data and resources, you will get the optimal plan 100% of the time. Which infers that if you fail to tell the truth 100% of the time, you are running the chance of getting a "bad" plan, or even the "worst" plan.

And than the whole topic of equi-depth histograms, or "height balanced". Granted, there will always be a limit to the number of buckets you can use, currently 254 in Oracle 10.2.0.2. If you have more distinct values than max buckets, you will have to resort to summarizing your data as best as possible. This is a vast, grey area. On the one hand, you can shoot in the dark, sample your data and hope for the best (which works most of the time). A slightly better method would be to describe your most popular values. However, the whole exercise is moot if your business needs require specific information about the least-popular values. This is the part that makes equi-depth histograms so messy. There is no silver bullet.

For this reason alone, I wish Oracle would vastly increase the number of histogram buckets one could collect, so as to avoid the whole guessing game. You have to admit, it is much easier to calculate frequency histograms than dive into your data and business logic to determine which data most needs to be described. Yes, it would also be more resource intensive, but my point is it would be easier on the dba.