Monday, June 11, 2007

RAC Admin class, Day 1

Wow, for a first day, Andy sure packed a punch. I am up in Chicago taking the 10g Real Application Cluster Administration class with Andy Fortunak. Andy is great, and I refuse to take any more DBA classes from other instructors unless they are on my short list (Andy Fortunak, Sue Jang or Rick Pandya). I am not going to go over Andy's bio at this point, but he really knows his stuff and has been teaching for quite some time.

So, I was one of 12 folks in the classroom, each of us occupying one computer. We were given 3 class books (or rather, PowerPoint printouts with embellished notes); two of the books are rather thick, the third is merely the last 3 appendixes (of 5). Actually, to put things in a little perspective, I learned a little later during the day that one of the fat books is 550 pages of lab material. Amazing! Full solutions with screenshots and step-by-step instructions. 550 pages!!

Andy is long-winded. Usually, this is a good thing, as he starts talking about the nitty-gritty details, stuff you will never hear from a sales person, a talking head or any Support Engineer. Probably not from any other instructor, either. But it took us about 30 minutes to get around to general introductions. The class is 5 days long, with 12 sections and the intro. Despite the book's "recommended" curriculum, Andy plowed ahead with his own agenda. Typical Andy Fortunak. We spent the entire day (until well past the "normal" closing time of 5 pm) talking through the Introduction. And boy oh boy, what an intro.

First off, myth 1: RAC will make your system run faster.
Truth: RAC has the potential to make certain things run faster (especially in a warehouse), but more than likely, your application will run slower. Or maybe you will not notice any performance difference at all. It depends.

Services are big in RAC. Service Names are just the tip of the iceberg. A majority of 10g RAC Services are applications (i.e., code, PL/SQL, Java, etc.). While little used, Resource Manager is meant to be used with Services and can be quite powerful when wielded correctly.

Later on (later in the week), we will go over Backup & Recovery for the OCR file; it is that important.

Ignore Enqueue Services (GES), tune Cache Services (GCS).

A side note about Banner and RAC: BAD IDEA!! At least, until the fine folks at SGHE can rewrite the application to take advantage of parallelized operations. Otherwise, performance is going to suck and RAC will not be worth the investment. If High Availability is a requirement, there are other alternatives with much better cost-benefit ratios. I cannot support the idea that the U of I be the ones on the cutting edge in terms of RAC. SunGard really needs to lead the charge in this, and they need to prove their leadership with large loads.

The Keep pool is important for RAC. More later.

RAC does not guarantee 100% uptime. If one node goes down, the database enters a brown-out period while the Lock Monitor (LMON) process broadcasts to all surviving nodes that they must quiesce while the observing node remasters all global block locks. That is fancy-speak for saying that a lot of work has to be done to redistribute the work that the now-dead node was doing. It takes a lot of time and resources to do that.

Automatic checkpointing (ie, fast_start_mttr_target = 0) in RAC is not good. You want better control over the checkpointing process because of all the extra events that trigger checkpoints and redo log buffer flushes in RAC.

Global Hash Partitioned Indexes can be good for non-partitioned tables with a sequential key.

Myth: UNDO is the next evolutionary step in Rollbacks.
Truth: UNDO = Public Rollback segments in an LMT (Locally Managed Tablespace)

RAC traditionally attempts to meet two objectives: scale-out and speedup. Scale-out is usually obvious; you add more hardware to handle more load. Speedup is more elusive; under optimal situations, you may see operations run at most 69% faster.

Andy drew many diagrams on the whiteboard which I will not replicate here. He went to great lengths to demonstrate how the degree of parallelism can quickly consume resources. This is mostly due to the fact that you get (2 * parallel degree) query slaves, each of which inherits specifications like Sort Area Size from the PGA. If your sort area size is 1MB, and you set parallel degree at 4, you are not consuming a mere 1MB for sorting, but rather 9MB (1MB for the PGA + 8MB for the query slaves). Likewise, you also exponentially exacerbate the load on the CPU for each degree of parallelism. You start to get the picture that RAC is about parallelizing work. Tread carefully here. RAC solves memory and CPU bottlenecks by distributing the load among several nodes.
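To make that arithmetic concrete, here is a minimal Python sketch of the rule of thumb from class. The function name and parameters are my own invention for illustration, not anything from Oracle; it only models the "coordinator plus 2 * degree query slaves" worst case described above.

```python
def parallel_sort_memory_mb(sort_area_mb: float, parallel_degree: int) -> float:
    """Worst-case sort memory under the rule of thumb from class.

    Assumption: one sort area for the coordinating session, plus one
    sort area for each of the (2 * parallel_degree) query slaves.
    """
    query_slaves = 2 * parallel_degree
    return sort_area_mb + query_slaves * sort_area_mb


if __name__ == "__main__":
    # Show how quickly the total climbs as the degree goes up.
    for degree in (1, 2, 4, 8):
        total = parallel_sort_memory_mb(1, degree)
        print(f"degree {degree}: {total} MB")
```

With a 1MB sort area and a degree of 4, this gives the same 9MB as the whiteboard example.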

Global Temporary Tablespaces (or Temporary Tablespace Groups) are another big thing. RAC has the ability to automatically associate an affinity between a Temp Tablespace in a Group and a participating instance in the node if you have at least one "Global" Temporary Tablespace per node. Also, since Temp tablespaces never need recovery (recovery is impossible), Andy suggests that you never mirror the disks (striping, à la RAID 0, is still a good thing, though).

Andy dropped a hint that Enterprise Manager R3 is out, and he highly recommends using it over R2. To clarify, that is the 10.3.0.x version of Grid Control, not 10.2.0.3 of the database. He is supposed to provide a link for that in the near future.

Expect the SYSAUX tablespace to grow very quickly. This is because each instance maintains its own copy of the AWR, which is stored in SYSAUX. The default retention is 7 days; imagine what happens if you want to save 32 days.
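A rough back-of-the-envelope sketch in Python, under the loud assumption that AWR space scales linearly with both the number of instances and the retention period (I have not verified that it does). The function and its parameters are hypothetical, just a way to put a number on "imagine what happens".

```python
def awr_growth_factor(instances: int, retention_days: int,
                      baseline_days: int = 7) -> float:
    """Estimate AWR space relative to one instance at the default retention.

    Assumption: space grows linearly with instance count and with
    retention days. Real growth depends on workload and snapshot settings.
    """
    return instances * retention_days / baseline_days


if __name__ == "__main__":
    # Two-node cluster keeping 32 days instead of the default 7.
    factor = awr_growth_factor(instances=2, retention_days=32)
    print(f"roughly {factor:.1f}x a single instance at 7 days")
```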

Like Temp Tablespace Groups, Andy also recommends one UNDO tablespace per node. Allegedly, there is a parameter that aligns each UNDO with a specific node.

As opposed to the well-known practice of multiplexing controlfiles 3 ways, Andy suggests that controlfiles be multiplexed 2 ways over mirrored disk (RAID 1, not unlike a REDO setup). This is mostly for recoverability purposes; RAC is going to hit the controlfile even harder.

Andy says:
Oracle is moving away from Recovery Catalog like nobody's business

Apparently, Oracle is focusing more on the controlfile to keep track of RMAN. However, the one exception is that the catalog can still be very useful with Data Guard, due to the fact that the standby controlfiles are different than the primary's.

RAC Redo logs can be configured differently for each instance. I have yet to find out why (why it is possible, and why you would want to in the first place).

Andy was very concerned about the interconnects, Cache Fusion and Block Transfer. He is of the opinion that you need to spend the big bucks on the interconnect hardware. Interesting how Dell did not have quite the same opinion. It will be interesting to see how things bear out for us, but I suspect that with our small load, we will be fine with the "commodity" Gigabit NICs for the meantime.

Andy was also quite adamant about not using the Flashback Logs. I think he will follow that up with more details later in the week.

We spent a good amount of time on Lock Manager. At startup, RAC propagates a complete list of all blocks and assigns a lock for each one (yes, every single one) to each participating node. Each lock starts with NULL, and there is quite an interesting scheme to upgrade and downgrade block lock levels. What it comes down to is that downgrading a lock takes a long time. Andy went to pains to demonstrate this with an extremely detailed picture (he is known for such pictures). Also typical of Andy, he provided a number of "extra" slides that he had developed over the years (some dating back to 8i OPS), and gave some good reference information for the various lock states. Essentially, it is a very expensive operation to downgrade a lock because the node that masters the lock must flush the redo log buffer. So if Node 2 requests a block lock for a block that is already in Exclusive mode on Node 1, Node 1 must first downgrade the lock, causing the log buffer to be flushed, then upgrade the lock mode for Node 2, and finally transmit the block and associated Undo information (2 additional blocks) to Node 2. You can see why an OLTP system would have a heyday with that. And as mentioned earlier, when a node crashes, all those mastered block locks must be redistributed amongst the surviving nodes.
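The Node 1 / Node 2 sequence can be sketched as a toy Python model. This is only my reading of Andy's picture, not how Oracle actually implements it: the class, the function, and the choice to downgrade the holder all the way back to NULL are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lock_mode: str = "NULL"   # every block lock starts as NULL
    log_flushes: int = 0      # redo log buffer flushes forced on this node
    blocks_received: int = 0  # blocks shipped to this node over the interconnect


def request_exclusive(holder: Node, requester: Node) -> list[str]:
    """Walk through the steps when `requester` wants a block `holder` may own."""
    steps = []
    if holder.lock_mode == "EXCLUSIVE":
        # The expensive part: the downgrade forces a redo log buffer flush.
        holder.log_flushes += 1
        steps.append(f"{holder.name}: flush redo log buffer")
        holder.lock_mode = "NULL"  # assumed target mode for the downgrade
        steps.append(f"{holder.name}: downgrade lock")
    requester.lock_mode = "EXCLUSIVE"
    steps.append(f"{requester.name}: upgrade lock to EXCLUSIVE")
    # The block itself plus 2 additional blocks of Undo information.
    requester.blocks_received += 3
    steps.append(f"{requester.name}: receive block + 2 undo blocks")
    return steps


if __name__ == "__main__":
    node1 = Node("Node 1", lock_mode="EXCLUSIVE")
    node2 = Node("Node 2")
    for step in request_exclusive(node1, node2):
        print(step)
```

Every request against a block held in Exclusive mode elsewhere costs a log flush plus three block transfers in this model, which is the point Andy was driving at for busy OLTP systems.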

And that was only what I wrote down. There is a lot more on the whiteboard, in the class book and in the supplemental slides.

Tomorrow we will dive into a full day of labs. Andy has promised that it will be challenging and fraught with sand traps; if we make a mistake, it takes about 7 hours to rebuild the computer. Don't ask me why; I have always been amazed at how disruptive and slow these computer rebuilds are for Oracle classrooms. But back to tomorrow. We are paired up such that each pair of computers will become one node in a two-node cluster. One person will be Node 1, who, it is rumored, will do more work. Node 2 must be done in a synchronized, serial fashion. I guess we will learn more. My partner is Bob, a Sys Admin from Northern Trust. I figure between the two of us, we have all the OS and DBA skills covered. =)

The rest of the week promises to be grueling. We will be starting at 9 and marching all the way to 6 (at the earliest). Hopefully we will have some reprieve on Friday. I am excited about the lab section tomorrow; we will be doing some raw install and configurations, starting with naked hardware. I hope to pass this experience on to my colleagues who are eagerly awaiting hardware of our own.

6 comments:

The Human Fly said...

Nice one.
We have recently implemented 2 node RAC and it is doing good.
This September, Murali Vallath is going to present 4 day workshop about RAC internals and functionality here at our premises.

Jaffar

Charles Schultz said...

Thanks for dropping by, Jaffar. Where is "our premises"? Out in Riyadh?

Murali seems to be a cool guy; I attended one of his webcasts last month sponsored by the RAC SIG. That is a great group to be involved in.

Anonymous said...

I am glad that you have "discovered" Andy. He is really a great instructor.

Ciao

The Human Fly said...

Yes, the RAC workshop will be conducted in Riyadh.

Jaffar

Dan Norris said...

FYI, I know of only one "successful" RAC implementation with Banner. And you're right, SG doesn't really "support" it. From what I gathered, it sounds like they send their support issues or other customers asking about running Banner on RAC to this site for answers.

From the discussion I had, I would have to agree that it's not the right time to put RAC under a Banner system.

Charles Schultz said...

Ironically, I posted these sentiments on a SunGard-hosted Oracle list, and word got back to my Oracle Sales/Tech Consultant, which meant he had to do a lot of paperwork. As a result of all that paperwork, I know now that there are several schools running Banner on RAC, and I have also been made aware of several options available to tweak and tune RAC to mitigate Banner-specific issues.

What I am still looking for is solid proof that Banner is written well enough to work with our specific load under RAC. It is my opinion that the "other" schools are small, relative to us. I could be wrong.