Tuesday, September 25, 2007

RAC: "How not to design an application"

We have a new Luminis database that was being tested for an upgrade (4.01 or something like that). We found that we were constantly maxing out db_recovery_file_dest_size (the database hangs; check the alert.log). The following is what I sent to my team members.

In RAC class, Andy warned us about the redo logs and how they can be a source of contention for some very basic things. Let us use LUMDEV as an object lesson.

Luminis has this little heartbeat function (I do not know the official name or the official purpose, but this is my observation) that updates 1 row of 1 table; a table with only 1 row. It does this continually. Better yet, it is executed not from one session, but from 7 sessions! So you have 7 sessions banging against this 1 row of this 1 table all the time. Obviously, the data in that row is not meant to stick around a long time.

Usually, in a normal database this is not such a big deal. Might cause some waits because of an exclusive transaction, but nothing that causes the rest of the database to wait or contend. In RAC, the world changes drastically.

Andy taught us that each instance masters its own list of global cache locks. These locks exist for every single block in a database (you can imagine what a nightmare it is for large databases). The list of master block locks is split up evenly among all participating nodes (and must be remastered when a node is added or removed). These locks are often upgraded and downgraded (exclusive and shared), and often sent across the private interconnect (fancy word for NIC) to other instances. When an exclusive lock is requested on a different node than the one already holding the exclusive lock, the lock must first be downgraded, then re-upgraded for the requesting instance. Every time a lock is downgraded like this, the redo log buffer must be flushed.

See where this is going? You have 1 table with 1 row, meaning one block. Because the application repeatedly requests an exclusive lock from both instances, the global cache lock is constantly downgraded and upgraded. This produces many many many many archived logs from all those log buffer flushes. LUMDEV can easily consume over 18GB of redo space on this one operation alone.
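If you want to see this in the wait interface instead of just watching the archive destination fill up, something like the following shows the global cache and redo flush pressure (10gR2 view and event names; run it on each instance - this is just the obvious sanity-check query, not part of the original troubleshooting):

SQL> select event, total_waits, time_waited
  2  from v$system_event
  3  where event like 'gc%' or event like 'log file%'
  4  order by time_waited desc;

Pair that with the 'redo size' statistic from V$SYSSTAT sampled over an interval, and you can put a number on how much redo that one heartbeat row is generating.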

Crazy, eh?


On a similar thread, I am not all that crazy about the dual management of both the ASM FRA diskgroup and db_recovery_file_dest_size. Here is what I posted to the ASM Oracle Forums:

For those of us who run multiple databases on a box with shared disk for the FRA, I am finding the extra layer of ASM plus db_recovery_file_dest_size to be a minor inconvenience. The Best Practice white papers I have found so far say that you should use db_recovery_file_dest_size, but they do not specify how you should set it. So far, we have been setting db_recovery_file_dest_size rather small; the databases are small, and even at 3x the database size, the parameter is still significantly smaller than the total disk available in that diskgroup.

So, my question: is there any downside to setting db_recovery_file_dest_size equal to the total size of the FRA diskgroup for all databases? Obviously, this means the free space in the diskgroup may be consumed even though db_recovery_file_dest_size is not yet full (as reflected in each instance's V$RECOVERY_FILE_DEST). But is that really a big deal at all? Can we not simply monitor the FRA diskgroup, which we have to do anyway? That eliminates the need to worry about an additional level of disk management. I like to keep things simple.

The question is relevant to folks using other forms of volume management (yes, I know, ASM is "not a volume manager"), but seems germane to the ASM forum because most articles and DBAs that I have talked to are using ASM for FRA.

Most importantly, what ramifications does "over-sizing" db_recovery_file_dest_size have? Aside from the scenario above.


I am really toying with the idea of making the recovery destination super-large and relying on the management of the FRA. To check space issues in the FRA, I can use asmcmd (the "du" output sucks, IMO) or query against the instance itself:

+ASM1_SQL > select FREE_MB, trunc((TOTAL_MB-FREE_MB)/TOTAL_MB,2) "% Space Used" FROM V$ASM_DISKGROUP
  2  where NAME = 'FRA';

   FREE_MB % Space Used
---------- ------------
     43426          .37

I am still looking for a way to generate a "du"-like report for all databases in the FRA.
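The closest I have come up with so far is an untested sketch that walks the alias tree from the top of the diskgroup and rolls file space up to the first-level directory (which is the database name). The parent_index/reference_index arithmetic follows the 10g reference docs for V$ASM_ALIAS, so treat this as a starting point, not gospel:

SQL> select regexp_substr(t.full_path, '[^/]+') db_name,
  2         round(sum(f.space)/1024/1024) mb_used
  3    from (select a.group_number, a.file_number,
  4                 sys_connect_by_path(a.name, '/') full_path
  5            from v$asm_alias a
  6           start with mod(a.parent_index, power(2, 24)) = 0
  7             and a.group_number = (select group_number
  8                                     from v$asm_diskgroup
  9                                    where name = 'FRA')
 10         connect by prior a.reference_index = a.parent_index) t,
 11         v$asm_file f
 12   where t.group_number = f.group_number(+)
 13     and t.file_number = f.file_number(+)
 14   group by regexp_substr(t.full_path, '[^/]+')
 15   order by 2 desc;

Directory entries have no matching V$ASM_FILE row, hence the outer join; SUM() ignores the resulting NULLs.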

In the database, I can check the amount of space in the recovery area:

SQL> select (SPACE_LIMIT-SPACE_USED)/1024/1024 "Space Left (MB)", trunc(SPACE_USED/SPACE_LIMIT,2) "% Space Used" from V$RECOVERY_FILE_DEST;

Space Left (MB) % Space Used
--------------- ------------
          18624          .04

I would hate to have to login to each database to find out how much recovery space is left before the database hangs.

UPDATE:
Arul made a most informative and excellent response:
http://forums.oracle.com/forums/thread.jspa?forumID=385&threadID=566242#2101818

Basically, set your db_recovery_file_dest_size appropriately and you will not have to worry about the ASM diskgroup at all; Oracle will maximize and optimize the space when the parameter is set to a size tailored to the needs of the database.
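Worth noting alongside Arul's advice: 10gR2 also exposes a per-file-type breakdown of the recovery area, which makes it easy to see what is actually eating the space (and how much is reclaimable) before resizing the parameter:

SQL> select file_type, percent_space_used, percent_space_reclaimable
  2  from v$flash_recovery_area_usage;

Like V$RECOVERY_FILE_DEST, this is per-database, so it does not fix the login-to-each-database annoyance, but it does tell you whether the space pressure is real or just files waiting to be reclaimed.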

Wednesday, August 22, 2007

New blog buddies

For some reason, blogger is not letting me add new elements to the blogroll. And I do not have time to ferret out the issue.

Anyway:
Dan Norris
Jeremy Schneider
Kirk McGowan

Seems like there was one other....

Tuesday, July 24, 2007

RAC is not easy

I have a good friend to whom I like to complain a lot (sounds like my good friend is getting the short end of the stick, doesn't it?). With good intentions, he sincerely suggested that RAC is easy. I put in a fence, working 43 hours over a 4-day weekend. Shall I call that easy? It only lasted 4 days. He is still a good friend; he just thinks everything is easy. *grin*

K Gopalakrishnan once said:
Believe me RAC install is very simple and straight forward if you religiuoly complete the pre requisites.

Ok, enough of the griping. My hope in posting some of our specific dilemmas is to document what we are doing, since so often I google for specific errors and never find anything useful (I am not counting the Chinese sites, because even after Google translates them, I am not sure I see any resolutions, just more questions).

So, first problem. I am following Metalink note 357261.1 very religiously. That was a mistake, as it is not complete yet. Anyway. You attempt to remove ASM:
srvctl remove asm -n urbdb1,urbdb2

You check to make sure it was removed:
srvctl config asm -n urbdb1
+ASM1 /u01/app/oracle/product/asm

It is not, so try again:
srvctl remove asm -n urbdb1
PRKS-1033 : Failed to remove configuration for ASM instance "+ASM1" on node "urbdb1" from cluster registry, [PRKS-1023 : Failed to remove CRS resource for ASM instance "+ASM1" on node "urbdb1", [CRS-0214: Could not unregister resource 'ora.urbdb1.ASM1.asm'.]]
[PRKS-1023 : Failed to remove CRS resource for ASM instance "+ASM1" on node "urbdb1", [CRS-0214: Could not unregister resource 'ora.urbdb1.ASM1.asm'.]]

What do you do?

Oracle Support has told me that crs_unregister is buggy and not supported. *cough cough* But I am going to attempt it anyway, since Bill Wagman had some luck with it (if you follow the discussion from oracle-l, you will see that Peter McLarty suggested it).

/u01/app/oracle/product/crs/bin: crs_unregister ora.urbdb1.ASM1.asm
CRS-0214: Could not unregister resource 'ora.urbdb1.ASM1.asm'.

/u01/app/oracle/product/crs/bin: oerr crs 214
214, 0, "Could not unregister resource '%s'."
// *Cause: There was an internal error while unregistering the resource.
// *Action: Check the CRS daemon log file.


Grrr... How quaint, check some log file somewhere on your system, and that will solve all your problems. Having no idea where my "CRS daemon log file" actually is, I use RDA to browse around and finally come up with /u01/app/oracle/product/crs/log/urbdb1/crsd/crsd.log. Unfortunately, the CRS daemon log file is not helping me much. What am I looking for?

Update: 5:19 pm
After a day of reading manuals and discussing options with the fine folks on oracle-l, we still have the same problem, although I now have quite a few new tools on my belt. Yes, RAC is not easy; I think I have proved that. To be a little more granular, working with the OCR is a pain in the butt.

So, new tools.
  • strace: a very low-level OS trace utility. I did not benefit from this, but I was able to show the output to others smarter than I. I used it on srvctl and crs_unregister.
  • The "force" flag (-f) of certain commands, like srvctl. I believe it removed something, but I do not know what; I still have my root problem.
  • Appendix A of the Clusterware Deployment and Admin Guide: has a ton of information, most of which would probably be helpful under "normal" circumstances. Did I mention we still have our root problem? However, I have to give credit to the authors, for they did a great job. There is a lot of information about log file locations (wish I knew about that earlier), how to debug various components and resources, and some descriptions of the syntax used for commands. I thought the OCR section was quite thin; perhaps I am biased because I am looking for a specific solution.
  • SRVM_TRACE=TRUE: This is documented in the above Appendix A, but I point it out because it spews out a bit more information. While not immediately helpful, it seemed like something that I should file away.
  • USER_ORA_DEBUG: mentioned only once in the Appendix; I found out that you can crank this all the way to 5. I have no idea what it does or what the appropriate values are - google is not giving much on it, yet.

More to follow. My Support Analyst just requested that I reboot the nodes because he has no idea why the resources exist in the OCR, as ocrdump does not list them.

Update: 11:22 AM, Wednesday
LS Cheng on oracle-l pointed out what ended up being the winning goal.
crs_stop ora.urbdb1.ASM1.asm

I still do not completely understand why this is an issue. Or even how one determines that is the solution. I hope to hear back more from LS Cheng so we can understand how he arrived at that conclusion.

What we ended up doing was restoring ocr to a point before we attempted to follow note 357261.1. Since the services were already down, it was straight-forward to delete the databases, the ASM instance and finally the ASM database. I was actually surprised it worked so well, given all the problems and headaches we had yesterday.

I will add another update when I learn more. Right now we are happy we have a RAC install back in working condition and can move forward with our projects. Oracle Support did not score any points in this round. This is becoming a bad trend.

Monday, July 23, 2007

A few notes about some "new" metalink notes

After a week of vacation, I came back to work to wrap up some Service Requests with Oracle Support and query for a status on others. During that process, I was made aware of a "new" vehicle for patch delivery, documented in Metalink note 438314.1. Apparently, Oracle is trying something new with NApply CPU patch; it sounds promising, and I sure hope it successfully addresses the issues from past CPU patches and the difficulty with merges, as proudly declared in the note. What I do not quite understand is that there are still going to be Super Merge patches; we have a particular bug that did not make the April CPU, so it was being worked on for the July CPU. It missed that, so we were told it was going in the July Super Merge patch. How does NApply deal with that?

What I am most confused by is the claim in the first set of bullet points:
Elimination of the CPU Merge Patch as a vehicle for patch conflict resolution


What exactly does that mean? Obviously, there will still be merge patches; just read further down in the note and you can see how a customer has to report a conflict, and Oracle will supply a merge patch. That is the documented procedure. Perhaps what Oracle is trying to say is that they will not ship a CPU that has your particular patch merged in, but rather a small little merge fix (aka "bandaid") that will allow the existing/original CPU to be applied. That sounds like a good thing. I remain skeptical and want to see it in action, though. *grin*

And on another note (literally), I see that the draft 10.2.0.4 bug list has been published. 1329 undocumented bugs!! Aside from all the "documented" bugs, we are hit with 1329 bug fixes for which we, as a customer, have no clue. 1329 is a big number. How is it that there are that many bugs in the 3rd patchset of its 2nd major release of its newest version of the flagship software? I am baffled.

I long for the day when Oracle has to post a page on OTN that says:
We apologize for the lack of patchsets in the past year; there simply have not been enough bug fixes/patches to justify a large patchset

And to beat another dead horse, how is it that all those bugs are "undocumented"? Isn't there a more appropriate phrase for "fixes and patches for bugs about which we are not willing to share any information whatsoever"? "Unpublished" would be a better word. And granted, I am not going to read through all the "documented" bugs, let alone 1329 of them. I just have a problem with "undocumented" bugs.

Did anything significant happen in the year 1329?

Thursday, July 12, 2007

Kevin Closson's "Manly Men"

Kevin has a number of articles about "Manly Men", or rather, folks who do not know better than to deploy RAC on complex configurations utilizing SANs. Like me. I encourage you to check out his whole series (most recent is Part VII).

After reading his thoughts and corresponding with Kevin, I am compelled to question why we have gone this route.

I hate creating SRs

I seem to create a lot of them. Oracle [Support] Configuration Manager is supposed to help with this. In my latest note to a Senior Customer Support Manager, it was recommended that I create an SR to resolve the issue where my Configuration Manager is not working correctly.

Just shoot me now.

I'll add more. Later. I am just feeling quite SR weary. What I really want is a really smart secretary who could initiate these for me. That way, the Oracle Support front-line could deal with my front-line, and we would all be happy. Right?

Tuesday, June 26, 2007

RAC: The big install

So, we are attempting our first "real" RAC install; not canned, not pre-fabricated, but using only software downloaded from OTN and following Oracle Documentation and various forms of cliff notes. This is one of those things that is really sweet if it works 100%. Otherwise, you are in for a headache. We have a headache.

That RAC class was good for teaching things, but it also perpetuates a false sense of security when things go wrong. And from what I can tell from all the notes and pleas for help out there, things go wrong often. One of the mantras I hear is to follow the documentation exactly! This is all good, but the documentation itself comes in many forms. Do you follow what Oracle has said, or do you pick some expert (K Gopal or Julian Dyke) and follow what they say? Cluster Verify (cluvfy) is also a little misleading; it will not check for 100% compatibility with the installed RPMs. In fact, I even had one Oracle Support analyst tell me that was the DBA's job. That is a lot to swallow. Take a DBA who does not know anything about Linux and tell him to verify that 24 RPMs not only exist, but are also compatible with the required RPMs. I tried to write a script for it, but in the end, the only "failsafe" way to do it is by hand. I say "failsafe" because human error plays a large role in these RAC-related problems as well.

It would seem to me that one good way to eliminate, or at least reduce, human error is to automate. Dell IT has taken this to extremes and automates a vast majority of their day-to-day tasks. Checking the RPMs is just a small fraction of what could easily be automated. What about user equivalence? What about all those silly root scripts? Or running oracleasm to configure and create disks by hand? What boggles my mind is that 10g RAC does so much that is really cool and automated; when the sun shines, life is good! Why are some basic things left out, when a nifty tool like cluvfy is so slick at verifying a good chunk of your install work?

Ironically, our CRS installation was hunky-dory. The rootpre.ksh was a bit weird (why is it checking for 9i CM??), and double-checking all the paths and homes was about the only thing that slowed us down.

Things went south when it was time to install ASM. Our first warning flag was that the swap space was not big enough. Thinking it was a red herring, we ignored the warning. Later on, after the software was installed and the configuration assistants were running, we hit our first major roadblock: link not satisfied on njni10. Not much seemed relevant on google or metalink. Oracle Support told us to attempt the installation again. Now think about this; the analyst assigned to us specializes in NetCA (that is why we filed the SR). This guy tells us to simply re-install ASM. Having had ASM problems in class, I was not exactly happy about that. Remove Oracle Homes, zero out raw disks, make sure no processes are running, and away we go. This time around, ASM cannot see all the disks. So when I tell my support analyst that we have new problems, he has to bring in a database specialist because the original guy does not know anything about ASM. What a joke! On top of that, he "reminds" me to keep the scope of the SR to one issue. GRRR!!! Of course, we are subjected to the usual onslaught of new questions and requests for an RDA. I am actively ignoring them. We were able to work around a large number of our problems, but in the end, we want to simply wipe the slate clean and start over.

Deleting everything and wiping the slate clean is not easy. No sir-ee. This is where having root privs comes in really handy: thanks to someone's ultimately wishful thinking, the CRS Oracle Home is installed with root as the owner. By default, oracle does not have any privileges to remove or modify anything in the directory, and only limited privs to execute anything. For instance, running crsctl evokes a "not enough privileges" error. Not to mention the slew of root-owned processes (crs, css, evm) that have to be dealt with.

What fun.

On a separate note, we were supposed to have a webinar with our ERP vendor (SunGard Higher Education, or SHE as some say) on the topic of Oracle RAC. *cough cough* I went with the intention of mildly heckling them, but they had technical difficulties with the virtual presentation. Sounds like even putting the letters R-A-C on something is prone to make it break. *grin*

Seriously, though, I know we will not be moving towards RAC any time soon for our production ERP system, and I am very curious to see how other schools manage it. In a morbid sense, I am also curious if they are buying the line from some sales person about how it will help their system, or some form of HA. RAC looks great on paper, but after scratching the surface as I have, it ain't all that pretty underneath. Don't get me wrong, as I mentioned earlier, it does a lot of cool stuff, and it does it well. But there are two sides to that coin, so it would be wise to keep things in perspective.