Wednesday, June 13, 2007

RAC class, day 3

We had another long, grueling day. I am the kind of person who loves labs, so since we did a ton of lecture today, I found it to be quite burdensome and hard to maintain my mental stamina. What kept me on edge, honestly, was the problem my partner and I were working on. As mentioned in a previous comment, we installed our ASM in the wrong directory and attempted to correct it by removing it via the OIU. Big mistake. That led us down a 2-day path of learning and excavation. *grin* This afternoon, I discovered Metalink Note 357261.1, which turned out to be a lifesaver. I had to try it a few times before we finally got it right, but we did get it right eventually. It was a mess.

So, on to the material we covered. Andy gave us a lot of good stuff; no, not a lot, TONS of good stuff! Too much for me to go over in one sitting. I took a lot of notes, so I will type as much as I can, but I am quite tired.

We spent a majority of the day talking about internals. In retrospect, it is easy to find oneself being treated to one of Andy's many expositions about Oracle internals; he is long-winded afterall, and he does not jabber about innocuous, irrelevant drivel. First up was supposedly about ASM, but turned out to be more in the context of Disk i/o. He repeated something I had heard at Dell, but Andy went into such great detail that you cannot help but to utterly believe him.

Your harddrive is a spinning disc, and as everyone knows, the outer edge of the disk spins faster than the inner edge. Typically, the outer edge of the disk is filled first. One of Andy's findings was that if you have data on the inner edge and have to move the disk head that much, you can slow down your disk access by as much as 4 times! Solution, do not put anything on the inner edtge. You can take that one step further and limit your disk use exclusively to the outer edge, vastly minimizing seek time and disk latency. So for a relatively significantly speed-up in i/o timings, one can give up 60% of their storage and only use the outer 40%. I am going to have to head over to Kevin Closson's site some time and read what he has said about that. I know this is an old idea that has been given a lot of airtime, but is it practical? Andy did qualify his statements and specified that this strategy is not practical for a majority of the datafiles; you would probably only want to use this on something like REDO logs. Something small and requiring high speeds.

Additionally, 2k inodes can typically consume 7% of the disk. Consider using larger inodes, like 16k, which can reduce "wasted space" down to 1%.

I also got to thinking, why do disk manufactors not make separate heads for separate zones (outer, middle, inner)? Surely I am not the first to think of this, hence there must be a good reason this is not done. Google.....

We talked about the ASM architecture, focusing on the RBAL and ARB processes. We also got into BIGFILEs, the encoded block id # and ramifications. I will get into those another day.

We took a lengthy diversion into RMAN, learning about the history of the Recovery Manager and working up to the current version (10g, no 11g secrets unfortunately). Andy mentioned, amongh other things, how Oracle is replicating significant portions of data between the controlfile and the recovery catalog, paving a path for obviating the catalog. However, the catalog still is quite handy when using Data Guard, so it is not going to die quickly. Talked about 10g features like Block Change Tracking, Block Compression and Flash Recovery Area. One thing I did not know was that one can use the Flash Recover Area (FRA) as a hot failover in case of the primary datafiles becoming corrupted. For example, if your System tablespace becomes corrupt (a recovery scenario that requires the database to go down), one can opt to failover to the FRA instead. How cool is that!?!

After that, we dug into RAC Performance Tuning. Again, we covered a lot of ground and I can only skim the treetops here, now. Talked about about the impact of remastering global cache locks, and talked more in detail about what role the LMON and SMON processes play in instance recovery. Even though 10g introduced FAST_START_MTTR_TARGET, this parameter is not used (or rather, it is ignored) in RAC, there is an alternative with an underscore parameter; _FAST_START_INSTANCE_RECOVERY_TARGET.

RAC tuning follows these steps:
  1. Application tuning; not merely rewriting the sql, but more importantly segregating the dml intelligently to reduce block transfer
  2. Resize (read, double) and tuen Buffer Cache
  3. Reduce Full Table Scans in OLTP
  4. Use ASSM
  5. Increase the size of caches for sequences
  6. Use partitioning to disperse traffic
  7. Avoid hard parses
  8. Minimize locking
  9. Remove unselective indexes

We talked a lot about blocks and extents. Specifically data blocks and segments under manual or automatic control. Andy was adamant about explaining the underlying architecture (aka, internals) because, as he says, "If you know how it works, you can figure everything else out." I believe Andy is on a mission to explain exactly how badly Block Transfer (via Cache Fusion) can really bring down your interconnect, which is the lifeline of your RAC. Keep in mind that for a very simply scenario, if you have a dirty block that needs to be shipped to another node, you have to also ship two undo blocks (undo header and the undo block itself). So that is a minimum of 3 blocks per request. Now, what if your are accessing a row that has an index? Or 10 indexes? With a depth of 6? So now we are talking about an extra 10 * 6 blocks (a branch header block for each branch level for each index). Now, what if you are making an update to that row? Now you have to send all the associated undo for those index blocks. What if you updating a 10 million row table? How big is your interconnect?

ASSM, or bitmap blocks, and multiple freelists help reduce the massive impact, but their contribution can be minimized. For example, using freelist groups is good, but with a lot of dml, freelists can migrate back to the header block, which is where you do not want them. ASSM also has a downside in pure Data Warehouses, because master block locks can exponentially increase.

We also went over ITL Waits. Andy calls these a poison pill (he has found a few poison pills, by the way). They are rather wicked because 1) they do not throw an error, 2) few tools can recognize them, and 3) they can really bite. Apparently, ADDM can report them. You also want to watch out for Row Migration (checked from CHAIN_CNT); migrated rows maintain a persistent hold on the ITL slot (as opposed to releasing it). And chained rows need 2 undo blocks.

Lastly, indexes. Skewed indexes in RAC are bad (look up a couple paragraphs about the interconnect freight train; moving all those index branch blocks). You can use reverse key indexes to improve ordered, monotonically increasing keys like sequences. Or, you can cache a huge chunk of sequence numbers per node, which help to disperse the range of updates to indexes. One other last trick is to isolate all DML to one node so as to significantly reduce Block Transfer. Did we say Block Transfer can be the death of your system?

That is all I have time for tonight. I hope to go back at some point and hit the "lesser" points. We did have more famous Andy pictures. I guess I could say, "Wish you were there."

2 comments:

Anonymous said...

How practical is it to fail over to FRA? Eventually, all the data files will need to be moved back their locations, no?

Charles Schultz said...

Good point. Maybe it is a "best case scenario" given a couple bad choices. Ideally, if one has the money to burn and a severe SLA that requires a lot of uptime, I would think failing over to the FRA is a better option than stopping the database, right? Would it not be possible to then schedule a time to get the database back to where it should be?

Granted, there is a a lot about Flashback that I do not know. For instance, what is the likelihood of the primary datafiles of the SYSTEM tablespace becoming corrupt in such a way that they are not covered by ASM but yet still good in the Flashback area. Wouldn't you have to introduce a time delay to catch it time (ie, dropped objects, bad data)? Time delays in a RAC environment just sounds bad. =)