Tuesday, June 26, 2007

RAC: The big install

So, we are attempting our first "real" RAC install; not canned, not pre-fabricated, but using only software downloaded from OTN and following Oracle Documentation and various forms of cliff notes. This is one of those things that is really sweet if it works 100%. Otherwise, you are in for a headache. We have a headache.

That RAC class was good for teaching things, but it also perpetuates a false sense of security when things go wrong. And from what I can tell from all the notes and pleas for help out there, things go wrong often. One of the mantras I hear is to follow the documentation exactly! This is all good, but the documentation itself comes in many forms. Do you follow what Oracle has said, or do you pick some expert (K Gopal or Julian Dyke) and follow what they say? Cluster Verify (cluvfy) is also a little misleading; it will not check for 100% compatibility with the installed RPMs. In fact, I even had one Oracle Support analyst tell me that was the DBA's job. That is a lot to swallow. Take a DBA who does not know anything about Linux and tell him to verify that 24 RPMs not only exist, but are compatible with the required RPMs. I tried to write a script for it, but in the end, the only "failsafe" way to do it is by hand. I say "failsafe" because human error plays a large role in these RAC-related problems as well.
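A minimal sketch of the kind of RPM check described above, in Python. It is not Oracle's method and not a replacement for checking by hand: the package names and minimum versions in the usage note are illustrative placeholders rather than the official requirement list, the helper names are made up for this example, and the version comparison is a simplified stand-in for RPM's own comparison rules.

```python
import re
import subprocess


def version_key(version):
    """Simplified sort key: numeric segments rank above alphabetic ones.

    Assumption: this only approximates how rpm compares versions.
    """
    key = []
    for segment in re.findall(r"\d+|[A-Za-z]+", version):
        if segment.isdigit():
            key.append((1, int(segment), ""))
        else:
            key.append((0, 0, segment))
    return key


def installed_rpm_listing():
    """Return one 'name version-release arch' line per installed package."""
    query = "%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n"
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", query],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def check_rpms(required, rpm_output):
    """Compare a {name: minimum_version} dict against an rpm listing.

    Returns a list of human-readable problems; empty means nothing was flagged.
    """
    installed = {}
    for line in rpm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not 'name version arch'
        name, version, arch = parts
        installed.setdefault(name, []).append((version, arch))

    problems = []
    for name, minimum in sorted(required.items()):
        found = installed.get(name)
        if not found:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        for version, arch in found:
            if version_key(version) < version_key(minimum):
                problems.append(f"{name}.{arch}: {version} is older than {minimum}")
    return problems
```

On a real node this would be called as `check_rpms({"glibc": "2.3.4", "libaio": "0.3.105"}, installed_rpm_listing())`, with the dictionary filled in from whichever install guide you are following. An empty result only means the script found nothing to complain about; it does not prove the packages are compatible.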

It would seem to me that one good way to eliminate, or at least reduce, human error is to automate. Dell IT has taken this to extremes and automates a vast majority of their day-to-day tasks. Checking the RPMs is just a small fraction of something that could easily be automated. What about user equivalence? What about all those silly root scripts? Or running oracleasm to configure and create disks by hand? What boggles my mind is that 10g RAC does so much that is really cool and automated; when the sun shines, life is good! Why are some basic things left out, when you have a nifty tool like cluvfy that is really slick at verifying a good chunk of your install work?
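As one example of what such automation could look like, here is a rough Python sketch of a user-equivalence check. It assumes ssh is the transport, and the function name and node names are hypothetical. `BatchMode=yes` makes ssh fail instead of prompting for a password, which is exactly the condition user equivalence is supposed to rule out.

```python
import subprocess


def check_user_equivalence(nodes, run=subprocess.run):
    """Return the nodes that could NOT be reached without a password prompt.

    `run` is injectable so the logic can be exercised without real hosts.
    """
    failures = []
    for node in nodes:
        cmd = [
            "ssh",
            "-o", "BatchMode=yes",      # never prompt; fail instead
            "-o", "ConnectTimeout=5",   # do not hang on a dead node
            node,
            "date",
        ]
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode != 0:
            failures.append(node)
    return failures
```

Run as the oracle user, something like `check_user_equivalence(["rac1", "rac2"])` (placeholder hostnames) only tests connections from the current node. It would need to be repeated on every node, including each node connecting to itself, to cover the whole cluster.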

Ironically, our CRS installation was hunky-dory. The rootpre.ksh was a bit weird (why is it checking for 9i CM??), and double-checking all the paths and homes is about the only thing that slowed us down. Things went south when it was time to install ASM. Our first warning flag was that the swap space was not big enough. Thinking it was a red herring, we ignored the warning. Later on, after the software was installed and the configuration assistants were running, we hit our first major roadblock; link not satisfied on njni10. Not much that seemed relevant on Google or Metalink. Oracle Support told us to attempt the installation again. Now think about this; the analyst assigned to us specializes in NetCA (that is why we filed the SR). This guy tells us to simply re-install ASM. Having had ASM problems in class, I was not exactly happy about that. Remove Oracle Homes, zero out raw disks, make sure no processes are running, and away we go. This time around, ASM cannot see all the disks. So when I tell my support analyst that we have new problems, he has to bring in a database specialist because the original guy does not know anything about ASM. What a joke! On top of that, he "reminds" me to keep the scope of the SR to one issue. GRRR!!! Of course, we are subjected to the usual onslaught of new questions and requests for an RDA. I am actively ignoring them. We were able to work around a large number of our problems, but in the end, we want to simply wipe the slate clean and start over.

Deleting everything and wiping the slate clean is not easy. No sir-ee. This is where having root privs comes in really handy because, thanks to someone's wishful thinking, the CRS Oracle Home is installed with root as the owner. By default, oracle does not have any privileges to remove or modify anything in the directory, and only limited privs to execute anything. For instance, running crsctl evokes a "not enough privileges" error. Not to mention the slew of root-owned processes (crs, css, evm) that have to be dealt with.
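Before wiping anything, it helps to know which of those root-owned processes are still alive. A small Python sketch, assuming the daemons show up in `ps -eo user,pid,args` output with command lines containing crsd, cssd, or evmd; the name fragments, the function name, and the sample paths are assumptions for illustration. It only lists processes and deliberately stops short of killing anything.

```python
import subprocess

# Assumed fragments of the clusterware daemon names; adjust to your install.
CLUSTERWARE_HINTS = ("crsd", "cssd", "evmd")


def root_clusterware_processes(ps_output):
    """Return (pid, command line) for root-owned clusterware-looking processes.

    Expects the output of `ps -eo user,pid,args`, header line included.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[1].isdigit():
            continue
        user, pid, args = parts
        if user == "root" and any(hint in args for hint in CLUSTERWARE_HINTS):
            matches.append((int(pid), args))
    return matches


def current_ps_output():
    """Capture the live process table in the format the parser expects."""
    result = subprocess.run(
        ["ps", "-eo", "user,pid,args"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Calling `root_clusterware_processes(current_ps_output())` as root gives a checklist of what still has to be stopped before the homes can be removed.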

What fun.

On a separate note, we were supposed to have a webinar with our ERP vendor (SunGard Higher Education, or SHE as some say) on the topic of Oracle RAC. *cough cough* I went with the intention of mildly heckling them, but they had technical difficulties with the virtual presentation. Sounds like even putting the letters R-A-C on something is prone to make it break. *grin*

Seriously, though, I know we will not be moving towards RAC any time soon for our production ERP system, and I am very curious to see how other schools manage it. In a morbid sense, I am also curious if they are buying the line from some sales person about how it will help their system, or some form of HA. RAC looks great on paper, but after scratching the surface as I have, it ain't all that pretty underneath. Don't get me wrong, as I mentioned earlier, it does a lot of cool stuff, and it does it well. But there are two sides to that coin, so it would be wise to keep things in perspective.

10 comments:

Kevin Closson said...

Charles,

Just out of curiosity, what distribution of Linux did you do this on and how many nodes are there in the cluster?

Charles Schultz said...

Hey Kevin, good to hear from you!

RHEL4 with a mere 2 nodes. We were working quite closely with Dell's cooked install recipes, but opted not to run their "RAC-in-can" image, yet. We are talking about doing that soon. We wanted to try something on our own just to get the hang of it. And yes we are learning a lot. =)

Kevin Closson said...

Hmm, RHEL4 and two nodes. Things should not be that difficult. I think you have more on your hands than RAC. I've seen OCFS2, and ASM. That means you also have simple raw disks for OCR/CSS and since this is Dell, is my guess right that you have EMC storage with PowerPath?

Lots on your plate. You know me, I'd say NAS...

Ok, I'm sorry for SPAMing your site, Charles, but your situation is precisely what I talk about. You are a Certified Master who has also been to specific RAC training and you are experiencing this much difficulty on a 2 node cluster using a modern Linux distro. Further, most of your problems seem to be storage related. I think that all speaks volumes. Am I off-base?

Charles Schultz said...

"Should" being the operative word, right? =)

I posted a follow-up on oracle-l:
http://www.freelists.org/archives/oracle-l/06-2007/msg00823.html

Our Linux SysAdmin guys considered EMC PowerPath, but ultimately decided to go with the built-in version (is it actually called Linux Multipathing?). And yes I agree whole-heartedly with your statements; my boss made the same observations after we had already sunk over 40 FTE of 2 highly skilled DBAs plunking around with the installation. The biggest issues, for us, came down to having trouble with the SAN. Even the Linux SysAdmin guys are exploring new territory, since we have not had network-attached storage for any of our Linux boxes, yet.

Kevin, I need to read more of your posts about this stuff. It seems to me that Raw gives you the best performance at the cost of a bit of overhead. In our situation, our dinky little application is not going to see any IO performance problems for quite a few years; if anything, it will be an issue of the application doing silly things in a RAC environment. So RAW is probably overkill. But it comes out in the Best Practice papers, and everybody knows you have to follow Best Practice. *grin*

We talked briefly (very briefly) about NAS with the folks at Dell (ie, Logan McLeod). Unfortunately, we did not ask about NAS in the context of datafiles, but only for the Oracle Home. I need to revisit that topic. As you have already observed, our understanding of the storage architecture is in the early stages. We have some really smart folks in the Storage Team, but they do not talk Oracle at all, and the idea of using clustered systems is still new to all of us.

Which brings me to another point: RAC is really pushing the boundaries between traditional job roles. On the one hand, Oracle wants DBAs to have the root password and the final say on exactly how the disk is laid out. On the other hand, Oracle also wants the DBA to commandeer the application support roles via OAS (and ONS and FAN and...). What happened to "segregation of duties"?

Charles Schultz said...

Whoops, blogger chopped up my link for oracle-l pretty bad. Here it is: Oracle-l link

Kevin Closson said...

Charles,

There is a reason the world's largest collection of Oracle databases (Oracle's On Demand hosted operation in Austin, TX) uses NAS! Raw versus concurrent+direct I/O is such a lark in likely 98% of all deployments. The amount of I/O typical apps do won't push that envelope. Most people are woefully unaware of just how little overhead there is with direct I/O. And Oracle on Linux with NAS/NFS uses direct I/O. I don't even start measuring the difference between RAW and concurrent+direct I/O until I'm seeing on the order of 500 IOPS per core...per CORE! That I/O rate likely represents the type of I/O seen at, what, maybe 2% of all Linux production Oracle shops? The difference between NFS and SAN is wire and protocol. Most people underconfigure the number of spindles to the point where the wire and protocol are moot anyway. I do feel sorry for most shops trying to go at this without a total understanding of what it means to deploy Oracle in the commodity computing paradigm. Legacy Unix? Sure, hardware your 4Gb FCP to the murderously expensive DMX or whatever if it is pumping 10-20,000 IOPS. Linux with 2 or 4 nodes RAC? NAS, nothing else makes sense with today's technology (emphasize today's). Just ask Oracle On Demand!

Charles Schultz said...

Kevin,

I (finally) took the time to read your "Manly Man" series. *grin* Interesting and fun to read.

We do have one big, good reason why we are using Fibre: 95% of the Oracle databases in our shop are hosted on big, expensive Sun E10ks stored on big, expensive EMC disk (not all DMX, but maybe 50/50). If I had to take a wild guess, I'd say we have about 10TB just for the 150+ databases. So, this little dinky RAC we are putting up is technically using existing storage; the connectivity is totally new to us (Linux + multipathing to EMC LUNs), but not the hardware itself.

From reading your blog, I believe you are making the point that commodity hardware is probably the way to go for new grid solutions. Obviously, no silver bullet, but also obviously, no need to go with the Cadillac when a Honda will do.

I do appreciate the time and attention you pour forth to making things easier. =) Is it just me, or does the market have a flirtatious affair with all things complex? Yes, using standard filesystems is so much simpler, as opposed to the complexity that is introduced with RAW devices. And if performance is not an issue, why are we masochistically giving ourselves headaches?

You have given me something to think about, and for that I thank you. Keep on posting in your blog, and I'll see if I can keep up.

Chen Shapira said...

I completely agree with you on two points:

1) The RAC course and installation guide are inexcusably optimistic. Even if you follow instructions to the letter you can easily run into undocumented errors and issues. We see this all the time. Adding and removing nodes from a cluster is pretty much a toss of the dice on whether or not it will work this time.

2) Oracle's support for RAC is horribly inefficient. I think we only had one or two SRs that were competently and satisfactorily resolved by Oracle, out of 50 or so. Not to mention the endless requests for more and more useless diagnostic information. Often I spend the equivalent of a full-time job following up on SRs, only to end up reinstalling the whole thing from scratch because they were so useless.

I know that Oracle is aware of the problem and is working hard to improve its RAC support, but for now it is a major headache.

Charles Schultz said...

Hey, Chen,

From what I have learned in classes, reading blogs, interacting with other users and flipping through the documentation, RAC has definitely matured since 9i. More so if you count that it is the next evolutionary step of OPS (v8). As with all things Oracle, it continually gets more complex, which means that it does some really cool things, at the cost of being hard to understand, hard to maintain and diagnose.

However, I will say that with the rising popularity of RAC, some have gone out of their way to make RAC more reachable for the masses; hat tips to Werner Puschitz, John Smiley and Jeff Hunter who have posted articles at OTN, and to Dan Norris and Matt Topper who make RAC on VMware practically as easy as using a screwdriver. There are many others, not to mention all the awesome resources that are sprouting up in Oracle-sponsored blogs, the RAC Pack, RAC SIG and our friends at oracle-l. There are just a ton out there.

The trick is phrasing your exercise (aka, problem, error, crash) in such a way in such a place at such a time that some knowledgeable and philanthropic soul will descend upon your situation and shed light on your path. I wish it were more like summoning a djini out of a bottle, but that's me. *grin*

Another aspect to all this is that with all the complexity, the technology really separates the sheep from the goats. I am learning so much through all the problems that we have encountered in our first baby steps, and I know I am going to learn more. It is easy to get frustrated with the "lack of support" and give up. But there is treasure to be gotten, and I still have room to gather more.

Sorry about going on and on. *smile* I didn't think I did much of that anymore, but this blog thing is bringing it out of me.

Anonymous said...

Boiling all of this down to its essence: RAC is a real pain to get right. And even when you think you've got it right, it can still trip you up down the line.

I have a fair few years of experience as a DBA, but RAC had me gasping for air!

It's one of Oracle's hottest offerings, yet is not as fully documented as it should be, and, simply put, is frustratingly brittle. IMHO.