Sunday, June 17, 2007

RAC class, the last day

What a week. And what a way to wrap it all up. I had to take some time away from it all: the class, blogging, thinking about it... But I do not want to delay too long, because I want to commit this to another medium before I start forgetting about it.

Friday, we covered the last 4 chapters in 4 hours. Or more like 3, if you consider that we started at 9:15, took a couple of breaks, and ended at 12:30. There are a couple of factors explaining why we were able to rip through them so fast. You will see.

Chapter 9: Clusterware
Due to all the issues we had earlier in the week in our efforts to remove ASM from the OCR, I was looking forward to this chapter. While the sections that covered the various CRS (Cluster Ready Services) commands were a little light, the combination of having had to dive into this stuff blind and having a great teacher like Andy facilitated my understanding of the material. Plus, the chapter goes over a bit of architecture, which I find very conducive to laying the foundation for the "big" picture.

Andy started off by addressing a fairly common problem: automatic (and frequent) instance restarts. Since the most common root cause is slow disk, one needs to introduce slight timing delays, as shown on page 9-38. Basically, for each resource in CRS you can manipulate the AUTO_START (as), RESTART_ATTEMPTS (ra), and UPTIME_THRESHOLD (ut) parameters. The page suggests that you set these to 2, 1, and 7d (respectively) for the instance, ASM and the database.
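For my own future reference, a rough sketch of how those attributes get inspected and changed with the crs_* tools. The resource name is purely hypothetical (crs_stat -t will show the real ones), and the exact flags are worth double-checking against the crs_register usage:

    crs_stat -p ora.orcl.orcl1.inst                                # dump the resource profile, including as/ra/ut
    crs_register ora.orcl.orcl1.inst -update -o as=2,ra=1,ut=7d    # apply the suggested 2,1,7d settings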

To help speed up the interconnect, aside from increasing the bandwidth, one can also increase the TCP/IP packet size.
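The usual way I have seen bigger packets done in practice is jumbo frames on the private interconnect. A hedged sketch, assuming the interconnect is on eth1 and the switch actually supports an MTU of 9000:

    ifconfig eth1 mtu 9000        # bump the packet size on the interconnect NIC
    # to make it stick across reboots (Red Hat style), add MTU=9000 to
    # /etc/sysconfig/network-scripts/ifcfg-eth1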

Since the OCR and Voting Disks are so critical, it is suggested that they be placed on isolated physical disks, and that they either be mirrored on the backend or multiplexed up front. Also, CRS automatically backs up the OCR (default location = $CRS_HOME/cdata/$CLUSTER_NAME). Curious that the Voting Disks are not also backed up at the same time. Curious also that the book recommends using symbolic links for the Voting Disk path, which makes it easier to restore a prior version (since the OCR cannot be modified directly, and it stores the path to the Voting Disk... how else are you going to change it?).
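A quick sketch of checking on the automatic OCR backups and covering the Voting Disk yourself; the raw device path and backup directory are made up, so substitute your own:

    ocrconfig -showbackup                         # list the OCR backups CRS has been taking
    ocrconfig -export /backups/ocr_manual.exp     # take a manual logical export of the OCR
    # the voting disk gets no automatic backup, so dd a copy of it yourself
    dd if=/dev/raw/raw2 of=/backups/votedisk.bak bs=4k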

One of the biggest problems with this scheme is that the OCR has to be kept synchronized with your software. If you have a database that was recently upgraded and you wish to revert to the previous version, you also have to restore the matching point-in-time version of the OCR. That sounds like a major headache.

Andy recommends making the OCR 200 MB and the Voting Disk 20 MB.

The rest of the chapter deals with some specifics involving the OCR and CRS, giving brief examples of using the crs_* APIs. Not much to go on, as mentioned earlier, but at least a taste.
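Just to capture that taste for myself, the handful of crs_* commands we leaned on the most (the resource name is hypothetical):

    crs_stat -t                        # tabular status of every registered resource
    crs_stop  ora.orcl.orcl1.inst      # stop a single resource
    crs_start ora.orcl.orcl1.inst      # ...and bring it back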

Chapter 10: Clusterware and RAC Diagnosis
The VERY FIRST point in this chapter is to make sure that all your nodes are using synchronized time (ie, NTP). Let me say that again. THE VERY FIRST point in this chapter is to make sure that all your nodes are using synchronized time. Why did I repeat that? In our lab, none of the RAC nodes were set up with NTP. This is a RAC class. There is no NTP. What is wrong with this picture? Several students in the class (us included) were unable to complete the labs on the first day because of this problem. And remember, it takes 6 or 8 excruciating hours to rebuild the machines. So keep that in mind: NTP makes a difference.
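For what it is worth, the sanity checks are cheap. A sketch, assuming Linux with ntpd and a second node named racnode2 (made-up hostname):

    service ntpd status        # is the daemon even running?
    ntpq -p                    # which peers is it actually syncing against?
    date; ssh racnode2 date    # crude cross-node comparison of the clocks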

The rest of this extremely short chapter (like 10 minutes) focuses on CLUVFY, the "cluster verify" tool. It is very handy, very versatile, and I see lots of RAC experts out there using it in their documentation. Some other highlights from this chapter include a map of the clusterware main log files (yes, Watson, a map!); we are talking about 13 different log locations. Oh the insanity! There is also a perl-based diagnostics collection script which looks like it might actually be quite useful. It is located at $CRS_HOME/bin/diagcollection.pl.
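A couple of hypothetical invocations, just so I remember the shape of them; the node names are made up, and the diagcollection flag may vary by release, so check its usage first:

    cluvfy stage -post crsinst -n racnode1,racnode2 -verbose    # verify the installed clusterware stack
    cluvfy comp nodecon -n all                                  # check node/interconnect connectivity
    # run as root to sweep up all of those scattered logs in one pass
    $CRS_HOME/bin/diagcollection.pl --collect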

Chapter 11: Node Addition and Removal
I was looking forward to this chapter based on the problems we had with removing ASM. Surely, I thought, we were going to cover all those pesky CRS and OCR commands we had attempted to use. Ironically, Andy spent less time on this chapter than on chapter 10, because the entire chapter is rendered obsolete by Grid Control Release 3 (the book was written for Release 2). In a way, I was sorta glad; the chapter is simply full of screenshots, which present poorly when you are marching through them during a lecture. Bores me to death.

The one thing that Andy did say about cleanup operations that I wanted to pass along is that adding a node adds about 250 MB to SYSAUX, and removing a node subtracts the same amount. So if you have a 16-node cluster, keep in mind that your SYSAUX is going to be over 4 GB in size.
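If you are ever curious where that SYSAUX space is actually going, the v$sysaux_occupants view breaks it down; a quick sketch:

    sqlplus -s / as sysdba <<'EOF'
    set pages 100 lines 120
    select occupant_name, space_usage_kbytes
      from v$sysaux_occupants
     order by space_usage_kbytes desc;
    EOF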

Chapter 12: High Availability
Finally, the last chapter. Unfortunately, in an effort to breeze through it, Andy resorted to reading a lot of slides, which I find particularly unhelpful (because I can read them myself, thank you very much). Additionally, the whole chapter is a summary of Oracle's published Maximum Availability Architecture. But on with my notes.

As noted in my posts from IOUG, 11g will feature rolling upgrades. One of the big selling points being pushed is that you will be able to upgrade from 10gR2 to 11g without downtime. I am sure there are strings attached, but we will have to wait and see. 11g is supposed to be "unveiled" this coming July 11th.

The mindset one must have when developing against RAC is to avoid locking tables. Obviously, there are times when you are required to lock a table, but care must be taken to do so as infrequently and as quickly as possible. For the rare occasion when you have a really long row (lots of columns, large datatypes), here are some helpful hints (a tiny DDL sketch follows the list):
- Keep frequently used columns at the front of the table definition
- Always define precision and scale
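The sketch below is purely hypothetical (made-up schema, table and column names), just to show both hints in one place:

    sqlplus -s scott/tiger <<'EOF'
    create table orders (
      order_id     number(12)     not null,  -- frequently used columns up front
      status_code  varchar2(10)   not null,
      order_total  number(12,2),             -- precision and scale spelled out
      notes        clob                      -- big, rarely-touched stuff at the end
    );
    EOF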

One question I got to thinking about: if we add a node to our Chicago RAC, will the Urbana failover cluster also get a new node? It should. It better! *grin*

The database will use either standard REDO or standby REDO, but never both at the same time. Hence, it is suggested to define both, and on the same disk.
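Adding standby redo logs is a couple of one-liners; a sketch with made-up paths and sizes (the size should match your online redo logs):

    sqlplus -s / as sysdba <<'EOF'
    alter database add standby logfile thread 1
      group 5 ('/u02/oradata/orcl/srl_5a.log') size 50m;
    alter database add standby logfile thread 2
      group 6 ('/u02/oradata/orcl/srl_6a.log') size 50m;
    EOF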

Along the lines of establishing redo logfiles, MAXLOGMEMBERS used to be a very hard limit; in fact, you could not change it without recreating the controlfile. This has been changed in R2 (or so I am told) and the following parameters are effectively dynamic (what does that mean?); see the controlfile sketch a little further down:
- MAXLOGFILES
- MAXLOGMEMBERS
- MAXINSTANCES

Also, if you plan to stay away from an RMAN catalog, it would be wise to bump MAXLOGHISTORY up to 10,000.
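For reference, all of these MAX* limits live in the CREATE CONTROLFILE statement; the easiest way to see your current values (and get a script to edit) is to dump the controlfile to trace:

    sqlplus -s / as sysdba <<'EOF'
    alter database backup controlfile to trace;
    -- the trace file lands in user_dump_dest and contains a CREATE CONTROLFILE
    -- script with the MAXLOGFILES, MAXLOGMEMBERS, MAXINSTANCES and MAXLOGHISTORY
    -- clauses you would adjust before re-running it
    EOF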

Andy pointed out an interesting possibility if you are able to segregate all DML from query operations: simply point your DML applications at the primary and redirect query users to a logical standby that carries all the required indexes. A big bonus for dividing the workload and use of the system.
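The nice part is that the extra reporting indexes only need to exist on the logical standby. A hypothetical sketch (the index and table names are made up), relaxing the guard just long enough to build them:

    sqlplus -s / as sysdba <<'EOF'
    alter session disable guard;
    create index orders_status_ix on orders(status_code);
    alter session enable guard;
    EOF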

Fast-Start Failover is also covered in the chapter, but Andy whipped through it with the comment "It DOES NOT work!"

As a summary of the white papers and published Best Practices, page 12-23 has a chart. In light of everything we talked about this week, Andy made some corrections. For instance, in direct opposition to one of the suggestions, "DO NOT use Flashback Database!". Also, do not autotune checkpointing; do it manually.
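My reading of "do it manually" (and it is only my reading, so treat it as an assumption): turn off the self-tuning MTTR target and fall back to the old checkpoint parameters. The values below are placeholders:

    sqlplus -s / as sysdba <<'EOF'
    alter system set fast_start_mttr_target=0    scope=both sid='*';
    alter system set log_checkpoint_timeout=1800 scope=both sid='*';
    EOF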

I am going to close this post here, but I do hope to collate and organize my thoughts on what to "bring home" as we start our own RAC project. Definitely the Best Practices point out some good things to look into. My biggest concern is that we will not have a load to test with, hence some of the finer aspects of Performance Tuning are going to be hidden and we will have to go with what others have said.

Time for some real work. *grin*

3 comments:

Anonymous said...

Ha, how ironic that the day I get back to work I run across your blog. I was in the class across the aisle from you. One of the key points I took away from the class is the quote "a RAC database behaves very differently than a single instance database". It's easy to get caught up trying to tune a RAC database like you would any normal single instance database. There are many gotchas involved.

Dan Norris said...

Regarding rolling upgrades, are you sure that you heard that right? I believe that rolling upgrades will be supported from 11g to some post-11g release and also for many of the patches on top of 11g. However, I think in order to take advantage of those features, you'll have to bite the bullet and get upgraded to 11g first (which may involve some downtime). I'm not sure there's a lot of good information available on this yet, though.

Charles Schultz said...

Yes, I am sure I heard that right. Granted, Andy does sometimes get mixed up (for instance, the deal with 10gR3 EM actually being 10.2.0.3, which he said it was not).

I did a quick search of the newly released 11g documents on OTN, but did not find anything specifically about upgrading from any version of 10g. I am not saying it is not there, only that I did not find it.

So, how about whoever finds out first put a little blurb here. =)