Tuesday, May 22, 2007

Before I forget

We have been dealing with a number of issues here; a whole slew of number one priorities. =) So quickly, aside from Global Campus and all that fun, here are some other things we need to dig into:

1) A solid method (I hate when people say "methodology" when they could simply say "method") to apply Oracle Critical Patch Updates (aka CPU). Just came across the Best Practice White Paper (March 2007). One of the little gems I was alerted to (thanks Job!):
You can detect potential patch conflict in advance by running the following command:
opatch apply -silent -no_bug_superset -report
The -report option detects any conflicts without applying the patch.

2) Clean up the state of our Backup & Recovery policies. I am tempted to say we need to rewrite the whole thing, but I do not think it is that drastic. We probably should look into incremental backups more closely (see previous post), get more aggressive with rman-to-tape backups in light of ASM, and make sure we are all good to go on all types of Recovery.

3) The Grid cometh..... Get more comfortable with Enterprise Manager Grid Control. I referenced Logan McLeod earlier; I think he gave us a little present in terms of documentation. Maybe. But I cannot talk about it.
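
For item 1, here is a rough sketch of how that conflict check could be wrapped so it gets run the same way every time. Assumptions on my part: Python is available on the box, opatch is on the PATH, and the check is run from inside the unzipped patch directory. The function names are made up for illustration; the only opatch flags used are the ones quoted from the white paper.

```python
import subprocess
from pathlib import Path

# Flags copied from the white paper quote; nothing else is assumed about opatch.
REPORT_FLAGS = ["-silent", "-no_bug_superset", "-report"]


def report_command(opatch="opatch"):
    """Build the report-only opatch command line (no patch gets applied)."""
    return [opatch, "apply", *REPORT_FLAGS]


def run_conflict_report(patch_dir, opatch="opatch"):
    """Run the report from inside an unzipped patch directory (hypothetical path).

    Returns (exit_code, combined_output). This sketch does not try to
    interpret the output; reading it is still a job for the DBA.
    """
    result = subprocess.run(
        report_command(opatch),
        cwd=Path(patch_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Keeping the command builder separate from the runner means the exact command line can be reviewed (or logged) before anything touches an Oracle Home.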

Monday, May 21, 2007

Onsite with Dell

Last week (May 16th), we met with a number of techies at Dell. Overall, I think it was a good experience, and I am glad that we as a group were able to interact with some folks "in the know". While I think it was good that we went, there were some downsides that overshadowed our time.

The Dell PR folks did a great job of making a wonderful first impression and reaching out to us. Kudos to Rooney Thomas; I personally thought the organization and planning that went into our itinerary was excellent and was by far the best I have been treated. I am not merely talking about a most excellent dinner at Eddie B's (you gotta check that place out!!) and steep discounts at a super-friendly, classy and very comfortable Marriott. I am also talking about the networking that Rooney did to pull so many folks together for a little chat. Unfortunately, some of those folks were... how do I say this politely? Not exactly needed. But I was impressed with Rooney and the agenda he set up. Along with him were Craig (last name withheld because I simply forgot) and Carey Dietert (more techy/PR guys). I liked Carey - I felt he has a great grasp on the Oracle/Dell relationship, understands the technology really well and knows how to move forward. On top of that, he is a good communicator. There was actually an Oracle representative, but he came late, didn't say anything and left early. He handed out his business card, which I kept for the sole reason of mentioning in this blog and am now tossing away. Apparently he is some "Business Development Manager" for the Dell/Oracle Partnership. Into the bit bucket that one goes.

I will start with the bad and go from there. Rooney pulled together some senior folks in the IT department and about half of them could have stayed in their cubicles (or office or whatever). Take Peter (last name withheld to avoid flash mobs, death threats, etc). Even though he has a couple of certifications (Microsoft being one of them *grin*), he is a "Development Engineer" and works with a number of Oracle technologies. Problem is, he was just plain wrong on a couple of counts. Other times, I had no idea why he was even talking. He liked to listen with his eyes closed and his arms crossed. Know the type? Now there are two cards adorning a little basket under my desk.

Another was a DBA-type. I did not get his name, but he presented something early on. PowerPoint and a talking head.... not my style. Fortunately, a lot of questions were asked during his presentation which helped to keep things interesting. Sometimes. A lot of tangents.

Aaron Burns showed up representing the PeopleSoft, and now SunGard BANNER, side of their collaborative testing efforts. He made a relatively (keyword, relatively) impressive pitch, and I have an email in to him. Still waiting to hear back, but I would love to know more. My boss says he saw the presentation at the SCT conference (SUMMIT) and discovered there was a lot of "smoke and mirrors". I am holding out hope that there is actually something worth mining in this contact.

So with all that, I have painted a picture that makes it seem like our trip was a complete waste of time. Well, for some of the Dell folks, I think it was. I got the impression that there was a common misconception that we were possibly considering a migration from our current E10k platform running Banner, which is at the core of the University of Illinois, to a Dell/Linux/RAC solution. Yes, I am sure they salivated over that. Imagine the almost audible bubble-popping sound when we made it very clear that we were simply considering a 2-node deployment for an infantile yet highly visible and somewhat risky online classroom concept.

In my opinion, the meat and potatoes were covered by the other folks that came out to play. I will fill in their names when I get them. One dude walked us through their operational centers, exhibiting rows upon rows of rack-mounted hardware, all wired to various KVM switches. In another room, rows upon rows of desks where employees were connecting to various numbers of the servers and testing things: RAC, Microsoft, PeopleSoft, etc. He gave us a live demo of killing one node of a two-node RAC which seemed pretty canned to me. I think it was good for the others to see, as I saw something very similar at IOUG. A very simple test in a very controlled environment, yet practical for all that. We were able to ask a lot of relevant questions, talk to the engineers doing the work (those that were with us) and learn a lot just from the ad hoc discussion.

For the last couple of hours, we crammed into a room with an offering of sandwiches and sides, engaging in more "Question and Answer". Another Dell Engineer joined us, and he seemed to be very knowledgeable, able to answer technical questions. There we hashed out ideas and a rough road map of how to proceed. Dell has been down the RAC road so many times they now have an image you can download onto a CD, which can be used for a local "Big Bang" to jumpstart a typical install. They also have copious documentation, some of which they are willing to share, and a butt-load of advice. Logan McLeod joined us near the end and clarified a few things for us, but also painted a very realistic picture; a number of big shops are already running RAC, and more are heading that way. I have said this before, but he really likes to automate things, and in the context of what he said, I am a new convert. Automate the procedures to deploy a new database or new Oracle software. Automate and consolidate monitoring. And the very first step is to establish standards, which drive and focus the automation.

What did we come away with? I think one definite is the use of ASM. Logan also practically praised the efforts being done with 11g and is rather confident that 11g will be an unusual first release in that it will not be as buggy as most Oracle RDBMS first releases. Obviously, we will let others test the water a little bit, but I think that is the way to go for now. Also, there is some interest in using Dell's cookie-cutter install, even if only to see how it tastes. We also came away with a better idea of how to carve up the disk, and more realistic expectations; Linux kernel patches still require database-wide downtime, even with RAC.

I hope to add more later. We have a meeting tomorrow and I look forward to hearing what the others have to say.

In closing, I was really impressed with Dell. They have some really smart people doing really smart things. They are confidently on the edge of Oracle's technology, and they have established their confidence through rigorous testing. Dell is aggressively going after the market niche of customers who want to stop paying exorbitant license fees for colossal hardware. Granted, RAC is not necessarily the "best of breed" for everyone; I think too often the technology drives the SLA, instead of the SLA driving the technology. It will be interesting to see how this relationship matures.

Thursday, May 10, 2007

service names; an old feature that has matured

Reading through Jeremy Paul Schneider's Oracle Services paper (not yet publicly available). He talks a bit about how Oracle Services has grown over the years, and the impact of services in today's 10g databases. I am convicted, caught red-handed, because I was largely clueless about the importance of service names. Thankfully, we switched most of our connection strings over to service_names in the past due to peculiar connection issues, but I see that we still have a few connection strings using SID. Need to change those.
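
To hunt down the stragglers, something like the following Python sketch could flag tnsnames.ora aliases that still connect by SID. It is a rough text scan, not a real Oracle Net parser; it assumes aliases start in column one and that the connect data uses the usual (SID = ...) or (SERVICE_NAME = ...) form. The aliases, hosts and function name in the sample are made up.

```python
import re

# An alias is assumed to start in column one: NAME =
ALIAS_RE = re.compile(r"^([A-Za-z0-9_.\-]+)\s*=", re.MULTILINE)
# "(SID =" but not "(SERVICE_NAME =", thanks to the leading parenthesis.
SID_RE = re.compile(r"\(\s*SID\s*=", re.IGNORECASE)

# Hypothetical sample; the aliases and hosts are invented for illustration.
SAMPLE = """
PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost1)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = prod.example.com))
  )

# old entry, still connecting by SID
LEGACY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost2)(PORT = 1521))
    (CONNECT_DATA = (SID = legacy))
  )
"""


def aliases_using_sid(tnsnames_text):
    """Return the aliases whose entry contains a (SID = ...) clause."""
    # Drop comment lines first so a commented-out SID does not get flagged.
    lines = [ln for ln in tnsnames_text.splitlines()
             if not ln.lstrip().startswith("#")]
    text = "\n".join(lines)
    matches = list(ALIAS_RE.finditer(text))
    flagged = []
    for i, match in enumerate(matches):
        # An entry runs from its alias to the start of the next alias.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        if SID_RE.search(text[match.end():end]):
            flagged.append(match.group(1))
    return flagged


if __name__ == "__main__":
    print(aliases_using_sid(SAMPLE))
```

Connection strings buried in application configs would need their own search; this only covers a tnsnames.ora-style file.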

I like Jeremy's paper; good humor, easy style, combined with intelligence and well-researched ideas. One of the drawbacks to reading good papers is that it makes me feel artificially smart. *grin* I am the type of person that learns slowly from books and usually needs to reinforce everything by getting my hands dirty.

I like the references as well. Quotes from Oracle Documentation, himself, his colleague Dan Norris (who sits 20ft from him), not to mention other experts in the field like S. Kumar, J. Morle and HOTSOS (Cary?).

Jeremy posted the paper on his blog:

Wednesday, May 09, 2007

Reading the documentation: incremental backups

Some smart people have recommended that I simply read the documentation from front to back. Tom Kyte suggested that when he was in town for our local IOUG, and Job Miller has mentioned it a few times. Reading is hard, especially a technical document, even if you only focus on a few of the books; my brain is well-trained to expect sci-fi/fantasy when I read. *grin*

There is a ton of good stuff in the documentation. I have to give a standing ovation for the folks that write them. Yes, sometimes there are errors, or an old section is apparently skipped over during a version upgrade. But by and large, it is good stuff!

Just today I was reading about incremental backups. And I came across this little feature I had not realized before:
To reduce backup sizes for NOARCHIVELOG databases. Instead of making a whole database backup every time, you can make incremental backups.


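Wow, that is quite handy. So if you are ever tempted to put a DEV database in archivelog mode just so you do not lose everything since the last full backup, you can go with incremental backups instead. It is not true point-in-time recovery (without archived redo you can only get back to the time of the last incremental, and the backups have to be taken while the database is cleanly shut down and mounted), but for DEV that is often enough. How cool is that?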
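
To put rough numbers on "reduce backup sizes", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the 200 GB database, the 5% of blocks changing per day, the weekly level 0 plus daily level 1 schedule, and the function names. Real incremental sizes depend on which blocks actually change.

```python
def weekly_volume_full(db_size_gb, days=7):
    """Total GB written if every daily backup is a whole-database backup."""
    return db_size_gb * days


def weekly_volume_incremental(db_size_gb, changed_pct_per_day, days=7):
    """Total GB written with one level 0 backup plus daily level 1 backups.

    Assumes each level 1 backup holds only the blocks changed since the
    previous backup, modeled here as a flat percentage of the database.
    """
    level0 = db_size_gb
    level1_each = db_size_gb * changed_pct_per_day / 100
    return level0 + level1_each * (days - 1)


if __name__ == "__main__":
    print(weekly_volume_full(200))            # 1400
    print(weekly_volume_incremental(200, 5))  # 260.0
```

Under those made-up numbers, a week of backups drops from 1400 GB to 260 GB, which is the kind of difference that makes the feature worth a closer look.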

Of course, there are a lot of other handy ramifications of using incremental backups (e.g., a precursor to snapshot standbys: you can convert your physical standby to a reporting database and then convert it back at the end of the day).

Tackling the 10g dataguard GUI

After hearing all these great things about the dataguard management in 10g EMGC (our shorthand for Enterprise Manager Grid Control) and having been badly scarred from the 9i dataguard implementation, I just had to try this out.

For starters, I tried to create a database. Strangely, EMGC which can clone Oracle Homes, deep-drill into performance problems, manage tablespaces, backups, workloads, stats, coffee cups, user objects, and can slice-n-dice the universe, this EMGC... I could not find a button to create a database. Maybe I missed it. Which is easy to do in EMGC, because it has so many features (most of which work quite well), so many links and buttons and gizmos and ways to get to where you think you want to go. But I could not find "Create Database". So I opted to clone instead. I was not disappointed, either. The prompts and progression of screens were very intelligent, logical and reassuring. I have found that to be the case with a number of "advanced features" in EMGC; of course, there are times when you come to expect such great performance and usability and you run head-first into a brick wall of errors. But I'll get to that in a second...

Database, check. Navigate to the Data Guard section under "Management", and the feeling that you are being well cared for continues. Create Standby. Oops, no spfile. EMGC reports that you need to create an spfile. Strange that the GUI does not offer to do that for you. It is really simple; "create spfile from pfile;" and restart the database. Ok, spfile, check. Create Standby.

What does that mean? No details, no message, no mention of a log file. This is my most frequent and biggest complaint with EMGC to date. They often forget to inform the user where they can look for more details. "Go to the log file" you would think. This is EMGC; which of the many logfiles do you want to look in? I do some looking around and spot emoms.log, which has a couple more characters, but is still quite useless:
2007-05-08 15:08:15,273 [EMUI_15_08_15_/console/database/dataguard/create] ERROR em.dataguard validate.1125 - CreateBean: ClassNotFoundException: null
2007-05-08 15:08:15,286 [149::EMUI_15_08_15_/console/database/dataguard/create] ERROR jobs.dbclone executeQuery.173 - TemplateUtil.executeQuery(): SQLException: Closed Connection SQL is
SELECT status FROM v$instance
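
Since the hard part is fishing the ERROR entries out of a big log, here is a small Python sketch that does just that. The entry layout (timestamp, [thread], level, message) is inferred from the two entries above and nothing more, so treat the pattern as an assumption rather than the documented emoms.log format. The sample rejoins the wrapped pieces into single entries, and the function names are made up.

```python
import re
from datetime import datetime

# Layout guessed from the excerpt: "YYYY-MM-DD HH:MM:SS,mmm [thread] LEVEL message"
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]*)\] (?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

SAMPLE = """\
2007-05-08 15:08:15,273 [EMUI_15_08_15_/console/database/dataguard/create] ERROR em.dataguard validate.1125 - CreateBean: ClassNotFoundException: null
2007-05-08 15:08:15,286 [149::EMUI_15_08_15_/console/database/dataguard/create] ERROR jobs.dbclone executeQuery.173 - TemplateUtil.executeQuery(): SQLException: Closed Connection SQL is
SELECT status FROM v$instance
"""


def parse_entries(log_text):
    """Split log text into entries; lines without a timestamp are treated
    as continuations of the previous entry (e.g. the SQL text)."""
    entries = []
    for line in log_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append({
                "time": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f"),
                "thread": match.group("thread"),
                "level": match.group("level"),
                "message": match.group("msg"),
            })
        elif entries and line.strip():
            entries[-1]["message"] += "\n" + line.strip()
    return entries


def errors_only(log_text):
    """Keep just the ERROR entries."""
    return [e for e in parse_entries(log_text) if e["level"] == "ERROR"]


if __name__ == "__main__":
    for entry in errors_only(SAMPLE):
        print(entry["time"], entry["message"].splitlines()[0])
```

It does not make the messages any less cryptic, but it at least narrows down what to paste into the SR.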

So, gritting my teeth, I file an SR. I should have a separate blog entry on that. Filing SRs for OMS is not that fun. But... let us not digress anymore.

So after 20 hours, my SR is still unsolved. Fortunately, I did some simple experimentation and found that if I put the database in archivelog mode, the standby creation is just hunky dory. So again, why would the GUI not do that? Very simple to do. And why the ultra cryptic error?

Archivelog mode, check. Create Standby; finally, the GUI goes to the next screen and asks for some specifics. I want a standby on a remote host, using ASYNC LOG transport (MAXIMIZE Performance). I fill in passwords and click Submit. I am treated to a status of sorts ("View Job") and after 10 minutes or so (yes, a small database), I have a fresh, hot out of the oven standby database with a dataguard broker all set up. Cool beans!
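
Looking back, the two things that tripped me up (no spfile, not in archivelog mode) are easy to check up front. Here is a minimal Python sketch of such a pre-flight check. It only covers the two blockers I ran into in this post; it is not a list of everything EMGC validates. It assumes you have already fetched the spfile value (select value from v$parameter where name = 'spfile') and the log mode (select log_mode from v$database); the function name is made up, and the fix steps are the usual SQL*Plus commands.

```python
def standby_prereq_fixes(spfile_value, log_mode):
    """Return (problem, fix_steps) pairs for the blockers described in this post.

    spfile_value: VALUE from v$parameter for 'spfile' (empty or None when the
                  instance was started from a pfile).
    log_mode:     LOG_MODE from v$database, e.g. 'ARCHIVELOG' or 'NOARCHIVELOG'.
    """
    fixes = []
    if not (spfile_value or "").strip():
        fixes.append((
            "instance is not using an spfile",
            ["CREATE SPFILE FROM PFILE;", "SHUTDOWN IMMEDIATE", "STARTUP"],
        ))
    if (log_mode or "").strip().upper() != "ARCHIVELOG":
        fixes.append((
            "database is not in ARCHIVELOG mode",
            [
                "SHUTDOWN IMMEDIATE",
                "STARTUP MOUNT",
                "ALTER DATABASE ARCHIVELOG;",
                "ALTER DATABASE OPEN;",
            ],
        ))
    return fixes


if __name__ == "__main__":
    # Roughly the state my freshly cloned database was in.
    for problem, steps in standby_prereq_fixes("", "NOARCHIVELOG"):
        print(problem)
        for step in steps:
            print("   ", step)
```

An empty list means neither of these two blockers applies; anything else is a to-do list to work through before clicking Create Standby.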

The dataguard management screens are adequate. You can change the protection mode between Maximum Protection, Maximum Availability and Maximum Performance with a few clicks; these operations do a lot of work in the background, like create standby redo logs and reconfigure both sites. Very nice. There is an option to run a test application that generates a bit of redo (1.5 MB/sec in my case), which is another nice feature to do a really simple load test of your configuration. When in MAXIMIZE PERFORMANCE mode, I could not find a way to switch between ARCH and LGWR redo transport; again, another easy operation that I would expect to see as part of the management screen.

Overall, I am pleased with this version of the dataguard GUI. I still have an itch over those errors, but at least the major objective was accomplished with no bloodshed. I plan to play around a little more with Failover and Switchover.