Thursday, June 14, 2007

RAC class, day 4

We covered a lot of ground today, starting off with some really great stuff but ending the day with rather boring material.

Andy re-emphasized the need to be very careful about DML on RAC nodes, because block transfers and Cache Fusion can really bring down your performance. Some general strategies: keep all DML on one node (and master the block locks for the associated tables on that node), or intelligently design parallel DML across more than one node (remembering to partition the tables involved, even if they hold a measly 20k rows). Also, when a truncate operation is performed, it must clear out all relevant blocks from all nodes.
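To make the "keep DML on one node" idea concrete, here is a minimal sketch of application-side routing, assuming the application (not Oracle) decides which node a DML statement goes to. The node names, the `home_node` helper and the choice of CRC32 as a stable hash are all hypothetical; the point is only that a given table (or a given partition) always lands on the same node, so its blocks stay mastered there instead of bouncing across the interconnect.

```python
import zlib
from typing import Optional, Sequence

# Hypothetical node names; substitute your own cluster's nodes.
NODES = ["rac1", "rac2"]


def home_node(table: str, partition: Optional[int] = None,
              nodes: Sequence[str] = NODES) -> str:
    """Return the node that should run all DML for this table or partition.

    Unpartitioned tables map to a single node (all DML on one node);
    partitions of the same table can be spread across nodes, which is the
    "parallel DML across nodes with partitioned tables" strategy.
    """
    key = table if partition is None else f"{table}#{partition}"
    # crc32 is stable across runs, unlike Python's salted hash() for str.
    return nodes[zlib.crc32(key.encode()) % len(nodes)]


print(home_node("ORDERS"), [home_node("ORDERS", p) for p in range(4)])
```

A hash keeps the mapping fixed without a lookup table; a real deployment would more likely pin work to nodes through services.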

So, Global Cache Services (GCS) is the big arch-wait we need to watch out for. There are four distinct problems with GCS, each with a remedy:
- Interconnect: increase bandwidth with more interconnects and/or a bigger pipe.
- Downgrading Global Cache locks: commit more often.
- Flushes to redo: increase the size and number of redo logs.
- Building and shipping blocks via LMS: increase the number of LMS processes (accomplished with an underscore parameter).

The LMD process requests locks for a block
The LCK process manages block locks
The LMS process builds and ships the block

You can see interconnect waits in v$sqlstats.
v$segment_statistics has RAC-relevant groups:
- Global Cache Services (GCS)
- Global Enqueue Services (GES)
- Messages Sent

We spent a bit of time covering Services. I mentioned this on day 1; this isn't Kansas anymore, Dorothy. I am glad Andy spent so much time going over Services. However, he also spent a lot of time going over Resource Manager, which is when my attention started to slip a bit. But let's go over Services first.

Services are a grouping of sessions that do the same kind of work.

There are some really good attributes listed in the book (page 7-11). I can post them later, but there is a bit there, so I am saving that for another day. Besides, I do not have the book here with me. One can create services in DBCA, EMGC, or even on the command line if one is so compelled, although I advise first-timers against the CLI. I found the lab to be very helpful in introducing services; however, even though I got to create a service and connect to it, I still have a hard time visualizing how we would use them in "real life." I think this is why Andy tied it to Resource Manager. I do know that you can aggregate statistics to a service (in fact, if you do some performance monitoring via EMGC, you can group by Services rather easily), but using Services with the intent of governing resource usage seems to be the key thing in a RAC environment.
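Since "aggregate statistics to a service" is the part I could actually picture, here is a toy sketch of what that rollup means: every session carries the service it connected through, and its statistics are summed per service. The service names and numbers are made up, and the tuple layout is an assumption; inside the database, views such as V$SERVICE_STATS expose this kind of per-service rollup for you.

```python
from collections import defaultdict

# Made-up sample data: (service_name, cpu_seconds, db_time_seconds) per session.
SESSIONS = [
    ("OLTP", 1.5, 3.0),
    ("BATCH", 40.0, 95.0),
    ("OLTP", 0.5, 1.0),
    ("REPORTS", 12.0, 30.0),
]


def stats_by_service(sessions):
    """Sum per-session statistics up to the service each session used."""
    totals = defaultdict(lambda: {"sessions": 0, "cpu": 0.0, "db_time": 0.0})
    for service, cpu, db_time in sessions:
        t = totals[service]
        t["sessions"] += 1
        t["cpu"] += cpu
        t["db_time"] += db_time
    return dict(totals)


for name, t in stats_by_service(SESSIONS).items():
    print(name, t)
```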

Andy walked us through the evolution of Resource Manager. Personally, my experience with Resource Manager has all been negative, even after using it on the OCM. I simply found it a bother and impractical. I guess if you do not use it often and have no business reason to use it, it does seem like an extraneous feature. So, anyway, Resource Manager allows one to cap the resource usage for your sessions, or groups of sessions. You can manage CPU usage, estimated query time and UNDO space. With 10g, you can also limit idle time, including and specifically idle time spent blocking another session. We should probably take advantage of that at AITS. *grin* Yes, I was able to find one practical, down-to-earth use for this beast.
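The idle-time limits are the one piece I would use, so here is a toy model of the rule. In 10g these are plan directive settings (MAX_IDLE_TIME and MAX_IDLE_BLOCKER_TIME, in seconds, passed to DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE); everything else below, including the `Session` class, the `sessions_to_kill` helper and the strict "greater than" boundary, is my own simplification and not how Oracle implements it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    sid: int
    idle_seconds: int
    blocking: bool  # True if some other session is waiting on this one's locks


def sessions_to_kill(sessions: List[Session],
                     max_idle_time: Optional[int],
                     max_idle_blocker_time: Optional[int]) -> List[int]:
    """Return SIDs a toy Resource Manager would terminate.

    None means "no limit", like leaving the directive unset. A blocker gets
    the (usually much shorter) blocker limit; everyone gets the idle limit.
    """
    victims = []
    for s in sessions:
        if max_idle_time is not None and s.idle_seconds > max_idle_time:
            victims.append(s.sid)
        elif (max_idle_blocker_time is not None and s.blocking
              and s.idle_seconds > max_idle_blocker_time):
            victims.append(s.sid)
    return victims


demo = [Session(1, 600, False), Session(2, 120, True), Session(3, 30, True)]
print(sessions_to_kill(demo, max_idle_time=3600, max_idle_blocker_time=60))
```

The design point is that an idle blocker is far more expensive than a merely idle session, so it deserves a much tighter limit.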

However, we started discussing the rest of chapter 7 (services) and I was very distracted. First off, there is just too much junk in Chapter 7; too many slides showing you PL/SQL code that Andy simply read. How useful is that? It was almost as if he was bored with this as well. *grin* Additionally, I got to reflecting on all this stuff. Oracle is encouraging us to manage the storage (ASM), the OS, database and application tier (Grid Control), everything to do with users and connections (Resource Manager).... and, as Andy said, "EVERYTHING!". Why? Why does Oracle want to push DBAs to have control of all this stuff? I am not comfortable with it. For one, how in the world can you expect anyone to be good at everything? Secondly, there are folks who already do a good job at a lot of those tasks, so why cross-train a database specialist in a new field? Is Oracle so greedy and power-hungry that they want everyone to think like they do? I do not. I am a DBA, and a DBA I want to remain.

Do not get me wrong; I think it helps to step over the line a little bit, but in both directions. It helps for the DBA to know something about the storage, and helps for the storage guy to know a little about the database. But I do not agree with the trend to consolidate job roles. Bad idea.

This was my big complaint for the rest of the day. We jumped into TAF (Transparent Application Failover), which was really cool, followed closely by FAN (Fast Application Notification) and ONS (Oracle Notification Service), which build on Advanced Queueing, and both of which I despise. Again, why should the DBA take over the responsibility of guaranteeing that the application is always available? I am not arguing that the DBA should be clueless and ignorant, but rather that there should be no expectation that only the DBA has responsibility for those strategic goals.

So, let's go over something worthy of a class report. *grin*
Transparent Application Failover

First, a huge warning for those of us running 1 listener to serve lots of databases with lots of concurrent connections. The Oracle listener can only handle 110 connections per second. Ergo, Andy says:
Everyone should have more than 1 listener for Production Databases!
He also suggests that if you do use more than 1 listener, it would be wise to avoid the default port of 1521, which can be very confusing. 10g databases will detect all listeners anyway, so do not worry about LOCAL_LISTENER.
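Andy's number turns into simple capacity arithmetic. This sketch treats the quoted 110 connections per second as a per-listener rule of thumb (I have not measured it myself) and folds in his "more than 1 listener for Production" rule; the function name is hypothetical.

```python
import math

# Per-listener connection rate quoted in class; a rule of thumb, not a spec.
LISTENER_CONN_PER_SEC = 110


def listeners_needed(peak_conn_per_sec: float, production: bool = True) -> int:
    """Smallest listener count that keeps each listener under the quoted rate."""
    n = max(1, math.ceil(peak_conn_per_sec / LISTENER_CONN_PER_SEC))
    # Andy's rule: production databases get more than one listener regardless.
    return max(n, 2) if production else n


print(listeners_needed(250))  # a login storm of 250 connections/second
```

For example, 250 connections per second needs ceil(250 / 110) = 3 listeners, while even a quiet production database still gets 2.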

To start at the basic end of TAF, one can enable client-side load balancing and connect-time failover by using the tnsnames.ora parameters LOAD_BALANCE and FAILOVER. With at least 2 listeners, these parameters let you load balance and fail over between the two. This works even if the listeners are on different nodes, or pointing to different instances.
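Here is a small simulation of what those two parameters do on the client side, as I understand them: LOAD_BALANCE=on walks the address list in random order, and FAILOVER=on moves on to the next address when a connection attempt fails (with it off, only one address is tried). The `client_connect` function, the address strings and the `connect` callback are hypothetical stand-ins for Oracle Net, not its actual code.

```python
import random
from typing import Callable, List, Optional, Sequence


def client_connect(addresses: Sequence[str],
                   connect: Callable[[str], bool],
                   load_balance: bool = True,
                   failover: bool = True,
                   rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the address that accepted the connection, or None if none did."""
    order: List[str] = list(addresses)
    if load_balance:
        # Randomize the order so clients spread across the listeners.
        (rng or random).shuffle(order)
    if not failover:
        # Without failover, a failed first attempt is simply an error.
        order = order[:1]
    for addr in order:
        if connect(addr):
            return addr
    return None


listeners = ["racnode1:1522", "racnode2:1522"]
node1_down = lambda addr: addr != "racnode1:1522"
print(client_connect(listeners, node1_down))  # always ends up on racnode2
```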

Next, one can enable connection load balancing via REMOTE_LISTENER. And finally, for the really good stuff. When a connection is made (create session), the dedicated server process registers against a table and stores information in the HEAP of the Shared Pool. In a RAC environment, the Shared Pool HEAP is automagically conveyed to other nodes via the interconnect. This means that remote nodes are always kept up to date with session information like execution status. So what happens if the node crashes? Since the session information persists in the HEAP of a remote node, the session can simply fail over to another node. How cool is that!?! It almost sounds too good to be true. I guess it is possible to be too good; perhaps I missed some critical details, or misunderstood what Andy was saying. Need to research that a little more.
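The REMOTE_LISTENER part is easier to picture than the HEAP part. With REMOTE_LISTENER set, each instance registers its service and load information with the listeners on the other nodes as well, so whichever listener receives a new connection can hand it to the least-loaded instance. A toy version follows; the `route_connection` name and the single made-up load number per instance are assumptions standing in for whatever metric the instances actually report.

```python
from typing import Dict


def route_connection(registered_load: Dict[str, float]) -> str:
    """Pick the least-loaded instance registered with this listener."""
    if not registered_load:
        raise RuntimeError("no instances registered for this service")
    # Ties go to whichever instance registered first (dict order).
    return min(registered_load, key=registered_load.get)


# Both instances registered with this listener thanks to REMOTE_LISTENER.
print(route_connection({"orcl1": 0.8, "orcl2": 0.3}))
```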

We talked about some other tnsnames.ora settings, like METHOD=PRECONNECT in FAILOVER_MODE, which establishes a shadow session on a secondary instance (i.e., the backup instance). If the primary fails, the failover occurs faster because the sessions already exist. The cost of doing this is that it takes longer to create the initial session.
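To pull the tnsnames.ora pieces together, here is a small Python helper that renders one entry combining LOAD_BALANCE, FAILOVER and TAF's FAILOVER_MODE with METHOD=PRECONNECT. The keywords are real tnsnames.ora syntax; the helper itself, the alias RACDB, the service name, the host names and the backup alias RACDB_BKUP are hypothetical, and the backup alias is assumed to be defined as its own entry pointing at the backup instance. Port 1522 follows Andy's advice to stay off 1521.

```python
from typing import Sequence


def tns_entry(alias: str, service: str, hosts: Sequence[str], port: int,
              backup_alias: str, failover_type: str = "SELECT",
              method: str = "PRECONNECT") -> str:
    """Render a tnsnames.ora entry with client load balancing,
    connect-time failover and a TAF FAILOVER_MODE clause."""
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))"
        for h in hosts
    )
    return (
        f"{alias} =\n"
        f"  (DESCRIPTION =\n"
        f"    (ADDRESS_LIST =\n"
        f"      (LOAD_BALANCE = on)\n"
        f"      (FAILOVER = on)\n"
        f"{addresses}\n"
        f"    )\n"
        f"    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service})\n"
        f"      (FAILOVER_MODE =\n"
        f"        (TYPE = {failover_type})(METHOD = {method})(BACKUP = {backup_alias}))\n"
        f"    )\n"
        f"  )\n"
    )


print(tns_entry("RACDB", "racdb_oltp", ["racnode1", "racnode2"], 1522, "RACDB_BKUP"))
```

TYPE=SELECT asks TAF to resume in-flight queries after failover; METHOD=BASIC would skip the preconnected shadow session and pay the connection cost at failover time instead.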

We spent a bit of time talking about FAN and ONS, which then merged into a talk about the Scheduler and more on the Resource Manager. My brain started to check out, as per my rant above. To quote Andy (literally), "Blah blah blah blah."

One other thing I want to bring back to work is a question: how are we planning to keep our middle tier highly available, specifically for Global Campus? I think we have the database nailed, especially (but not only) because of RAC. What about that application? =)


Sudo said...

Very nice explanation of the RAC concepts. Incidentally, I too am working on/studying RAC. Found it quite useful. I also support your stand that a DBA should be a DBA and that crossing over roles is really not good.

Charles Schultz said...

We are still dealing with those cross-over issues. In 11g, Oracle seems to be pushing separate ASM and CRS owners, which makes things even more confusing. If only things were perfect. =)