Thursday, December 04, 2008

Upgrading clusterware/ASM from 10g to 11g - a complaint

I have a RAC setup with 3 environments: DEV, BCP, and Production. So, starting with my development cluster, I made an attempt to upgrade the clusterware and ASM from 10g to 11g. That was quite painful. I will not spend much time complaining about the documentation here (and there is a lot I could say *grin*). But allow me to take a little tangent.

I collected all my "constructive criticisms" and suggestions into what I hoped was a cogent document and fired it off to Oracle Support and Erik Peterson of the RAC SIG. Erik forwarded my note to a document manager, and his response was overwhelmingly encouraging and very understanding. The Oracle Analyst in Support who picked up my case was also quick to understand my frustrations. Both groups collaborated and redelivered a nutshell document that was supposed to supplant/supplement the online documentation.

For the clusterware upgrade, they hit it on the nose. I was able to upgrade the clusterware in my BCP environment with no problem. Well.... I admit, I am still very confused about the CRS and ASM software owners. There seems to be a push to use different users (in both cases) other than the normal "oracle" user. There is not much justification in the documentation, IMO; rather a bland "for security reasons" statement.

The ASM upgrade was a little more tricky. All the documentation that I have, including the new nutshell cookbook, indicates that ASM can be upgraded in a rolling fashion. That is a lie! *grin* I have a new SR open that is exploring this little nuance (or not so little, depending on how much time you have spent trying to wrap your head around this). Apparently, ASM Rolling Upgrades are available in 11gR2. Oh..... well, guess I'll just go download 11gR2 then. Wait - it is not available yet?!? So the documentation lists steps that are not applicable to any version that is currently available? Did I read that right?

So confusing. The good news is that I was able to complete the BCP upgrade with less pain. Since the documentation strongly suggests that one create a new ASM user, there are a ton of manual steps to follow if you choose to go that route. However, smart DBA that I am *sly grin*, I kept the same user and was able to use the DBUA to go from 10g to 11g; that was sweet. Why would I want to switch users and go through many manual processes? What is the rationale for that? The subsequent patchset upgrade was less friendly. As I mentioned above, all the documentation states that you should do a rolling upgrade. This is not possible. So I hacked my way through; I had to shutdown abort my instances, which left shared memory segments hanging around that I had to manually kill off. Not a big deal, but the error messages can be very confusing (e.g., ORA-12547, TNS:lost contact).

I am looking forward to the document changes that will result from my contacts with Oracle.

Thursday, September 18, 2008

A Tuning experiment

As I learn more about the CBO and Banner, I start to wonder about "system tuning". I like Cary Millsap's "Method R" concept; I was thinking about how to adapt that to the narrow field of sql statements. If, for example, I set statistics_level = ALL, I know I can generate rowsource statistics at the cost of perhaps 5% overhead to my system. Doing this in a development environment, I could utilize Wolfgang Breitling's "Cardinality Feedback" to find sql statements that are not performing optimally, and maybe even rank them by degradation. Maybe.
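As a sketch of what that ranking might look like, the heart of cardinality feedback is comparing the optimizer's estimated row counts against the actual rowsource counts and flagging the worst misestimates. The sql ids and numbers below are completely made up for illustration:

```python
# Rough sketch of ranking SQL by cardinality misestimation
# (in the spirit of Wolfgang Breitling's "Cardinality Feedback").
# The sql ids and row counts are hypothetical, not from a real trace.

def misestimate(estimated, actual):
    """How far off the optimizer was, in either direction (>= 1.0)."""
    estimated, actual = max(estimated, 1), max(actual, 1)
    return max(estimated / actual, actual / estimated)

# (sql_id, estimated rows, actual rows) from rowsource statistics
rowsource_stats = [
    ("sql_a", 100, 98),        # nearly spot-on
    ("sql_b", 1, 50000),       # badly underestimated
    ("sql_c", 200000, 300),    # badly overestimated
]

# Worst offenders first: these are the candidates worth a closer look
ranked = sorted(rowsource_stats,
                key=lambda s: misestimate(s[1], s[2]),
                reverse=True)
for sql_id, est, act in ranked:
    print(sql_id, round(misestimate(est, act), 1))
```

Crude, but it matches the idea: the bigger the ratio, the more likely the CBO picked a plan based on bad assumptions.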

Why would I want to do that? Good question. In and of itself, it is a hard sell if my only purpose is to go on a witch hunt and try to tune the system purely from the sql. However, I think the bigger advantage is to use the various sql statements as teachable moments, a way to develop real-life examples for our DBAs that demonstrate how our application(s) behave, and how to attack "slow" statements. The fortunate side effect is that we end up making some things faster.

Just an idea. I have not been able to drill down into 11g's automatic tuner yet, but I have to believe it is using something similar to cardinality feedback. If memory serves, I believe it does spend some extra overhead reconsidering previous sql plans in an attempt to find those that are not optimal (or that crazy phrase "suboptimal"). Now I have to go back and read up on that again just to refresh myself....

Wednesday, August 27, 2008

Character Sets

I have been tasked with presenting on the topic of converting the character set for our Banner system databases. We have been through the exercise several times, which has been quite an eye-opening experience in itself (The DataPump Index work-around, dbms_pclxutil.build_part_index, creating tablespaces). Essentially, the vendor has decided in all its god-like wisdom to force all customers to start using a multibyte Unicode character set that will support international characters (AL32UTF8) when they upgrade to the latest version 8. Yea, not very practical for us at a Big Ten school.

Anyway, I have taken this thing to heart and have been learning a lot about codepoints and various facts about character sets. We will be converting from WE8ISO8859P1 to AL32UTF8, a single-byte ISO 8859-1 West European somewhat basic (eg, simple) codepoint set to the latest'n'greatest multibyte granddaddy of Universal character sets. The big issue is making sure that all the data stays the same, even if physically it is stored a little differently. Along the way, I have had to come to terms with phrases such as "Lossy Data" and "Truncation Data". April Sims has been a great help in answering some questions and getting me to focus my searches. In the end, Metalink note 252352.1 laid out an example that finally opened my eyes to Lossy Data.

I was having a hard time understanding why one must go through WE8MSWIN1252 first to resolve lossy data. And what was meant by the notation in Metalink Note 260192.1:
if you have a WE8ISO8859P1 database and lossy then changing your WE8ISO8859P1 to WE8MSWIN1252 will most likly solve you lossy [sic]

Why? Well, the Euro example in note 252352.1 is great. To paraphrase, the Euro is stored as codepoint 128 in WE8ISO8859P1. In this case, that character set essentially has an empty hole there, no previous definitions for that codepoint. So if one were to attempt to select it in some kind of "native" WE8ISO8859P1 environment, there would be nothing. So how did it get there? It is passed in from a different character set. And if using that same character set on the client side, you can easily retrieve the correct symbol.
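You can play with this from the comfort of Python, whose built-in codecs make a handy sandbox: I am assuming here that 'latin-1' stands in for WE8ISO8859P1 and 'cp1252' for WE8MSWIN1252, which at least for the Euro holds up:

```python
# Byte 0x80 (codepoint 128): a "hole" in ISO-8859-1, the Euro in Windows-1252.
raw = b"\x80"

# In a latin-1 (WE8ISO8859P1-like) world, 128 maps to an undefined
# C1 control character -- effectively "nothing" when displayed.
assert raw.decode("latin-1") == "\x80"

# The same byte decoded with the cp1252 (WE8MSWIN1252-like) codec is
# the Euro symbol the client originally inserted.
assert raw.decode("cp1252") == "\u20ac"   # the Euro sign

# Once the codepoint maps to a real symbol, it can be converted to any
# superset, e.g. UTF-8 (AL32UTF8-like), without loss.
assert raw.decode("cp1252").encode("utf-8") == b"\xe2\x82\xac"
```

Same byte on disk, three different stories depending on which character set interprets it.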

Since WE8MSWIN1252 is a superset of WE8ISO8859P1 (meaning it has all the same codepoints for existing symbols, and then some), and it is most likely that a client in the US inserted data with a US Microsoft character set, the codepoint will be mapped to a valid symbol. Since that codepoint is "valid" in WE8MSWIN1252, it can then be converted to any other character set that also supports that symbol.

Where this falls apart (the "likly" part) is if the client using a "foreign" character set inserts a symbol that is mapped to a completely different symbol (via the codepoint) in our target client character set. In that case, you are screwed. =) Unless you would want to convert that data manually for each case of incompatible character sets. Not I.

Truncation data is a little easier to grasp. Metalink note 119119.1 mentions the English pound sign (£):
The pound sign for example is indeed 163 ( A3 in hexadecimal) in the WE8ISO8859P1 and WE8MSWIN1252 charactersets, but in AL32UTF8 the pound sign is code 49827 (C2 A3 in hex).

In our case, the fix is handled by setting nls_length_semantics=CHAR (as opposed to the default BYTE). We also found out through experimentation that nls_length_semantics is completely ignored for SYS and SYSTEM objects; in fact, the data dictionary always uses BYTE semantics. Interesting, eh?
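Both points are easy to see with Python's codecs (again assuming 'cp1252' and 'utf-8' stand in for WE8MSWIN1252 and AL32UTF8): the pound sign doubles in byte size under UTF-8, which is exactly why a column sized in BYTE semantics that held it before the conversion can truncate afterward:

```python
pound = "\u00a3"  # the pound sign

# One byte (0xA3 == 163) in the single-byte Western character sets...
assert pound.encode("cp1252") == b"\xa3"
assert pound.encode("latin-1") == b"\xa3"

# ...but two bytes (0xC2 0xA3 == 49827) in UTF-8.
utf8 = pound.encode("utf-8")
assert utf8 == b"\xc2\xa3"
assert int.from_bytes(utf8, "big") == 49827

# BYTE vs CHAR semantics: the character length never changed,
# only the byte length did.
assert len(pound) == 1   # CHAR semantics: still one character
assert len(utf8) == 2    # BYTE semantics: now needs two bytes
```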

Metalink Notes:
  • 260192.1 Changing WE8ISO8859P1/ WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8
  • 225938.1 Database Character Set Healthcheck
  • 252352.1 Euro Symbol Turns up as Upside-Down Questionmark
  • 279068.1 Usage of Character Set Scanner Utility (csscan) version 1.1
  • 137127.1 Character Sets, Code Pages, Fonts and the NLS_LANG Value
  • 241047.1 The Priority of NLS Parameters Explained
  • 158577.1 NLS_LANG Explained (How does Client-Server Character Conversion Work?)
  • 179133.1 The correct NLS_LANG in a Windows Environment
  • 119119.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications

Oracle® Database Globalization Support Guide 10g Release 2 (10.2)

Friday, August 01, 2008

Software Configuration Manager

I learned that Oracle Support has a couple Oracle blogs up:

I think I was just checking on them when I noticed that the SupportPortal was being a little more aggressive looking for more feedback. Over the years, I have participated in many of the Surveys that Oracle sends you after you close an SR, and have made no small number of comments (both good and bad) to a growing number of contacts I have established within Oracle, so this seemed like yet another opportunity to voice my opinion. I think I do that much more when I get the impression Oracle is actually listening; I don't share the same thoughts here because, well.... I highly doubt Oracle is reading this. =) No offense.

So let me blur the lines a little. First, I think Oracle is doing a much better job at collecting client feedback. However, the more critical part is what they are doing with that feedback. Unfortunately, for us end users, those comments seem to disappear into a black box, never to surface again (much like Enhancement Requests). I chatted with some really nice folks at Oracle (Kristal and Cherryl) and they both reassured me that Oracle is taking our comments seriously, and in fact hinted that the ongoing evolution of Metalink 3 is largely due to user feedback, and even newer things are coming down the pike.

Taking the invitation from the SupportPortal blog seriously, I called up for an appointment and got hooked up right away. I think the lady running the feedback session was working at home (heard a small child in the background), and she had some setup work to accomplish before we got down to it. But 90 minutes later I had made numerous observations about many aspects of the Software Configuration Manager and Metalink 3 in general. It was refreshing (and exhausting) to not only state some things I either liked or did not, but also to demonstrate via a recorded web session. Very satisfying; I recommend this to anyone who deals with metalink on a frequent basis.

So, what did I say? In general, I like the eye-candy of the new interface, but it is also a little slow and sometimes downright confusing. For instance, there is no way to bookmark bugs. The options to configure the layout are a nice thought, but still a little limited; it reminds me of how limited blogger is compared to WordPress, if that helps at all. Don't get me wrong, I think it is quite handy to configure the layout, but I really look forward to improvements. The key point, Software Configuration Manager, is a great concept. Unfortunately, I have some political roadblocks to fully utilizing SCM with my employer, which makes it a little frustrating when attempting to comment on the SCM because it is not SCM's fault at all. =) We require that we either have separate CSI numbers, or implement some sort of access control such that not all our end-users have equal access to a site-wide license. But if we ignore that for a little while, I think SCM is awesome and has a lot of potential. Just the fact that it helps speed up SRs significantly is a big bonus. I have heard some misgivings about allowing Oracle to retrieve information in such a fashion, even if only passively, but I personally do not have an issue with it.

I do not like how some of the questions are completely inane. For instance, 99% of my SRs are for the Oracle Server - Enterprise Edition. 99% of the time I forget what exactly my "Product" is called and have to hunt for it amongst a list of several hundred product options, most of which we are not even licensed for. Why? What about a list of 10 most recent products for my CSI? What about a list that only contains products I am licensed for? And when attempting to find a category that somewhat matches the subject of an SR, I hate having to select one from the list of 30 or so. I want a searchable list. And a fast search at that; the LOVs in the patch pages really suck.

I concluded by saying that it would be vitally important to me to see how Oracle is using this feedback. I have repeated this message to my Oracle contacts and with nearly all feedback I give to Oracle. I long to see what difference I make. Don't you?

Kudos to the teams involved in the feedback gathering operations. And BIG PROPS to Cheryl, Joey and Kristal (all of Oracle).

Thursday, June 26, 2008

create tablespace ddl

I recently went looking for ways to recreate tablespaces, as we prepare to migrate our production database to a different character set (necessitated by the vendor, grrr). DBMS_METADATA is really cool, and I like it a lot. Data Pump essentially uses DBMS_METADATA (I believe), so I thought one of those cool tools would be the way to go. Close, but not quite. If you do a complete export/import of your database, you end up reorganizing all your data, in essence defragging the whole thing. Which is good (nay, GREAT!). But Data Pump only lets you create tablespaces with the original datafile sizes.

Granted, considering that this is Production, prudence would say that one will have tested this several times (as we are currently doing) before actually running the process in Production. And in doing so, one will know exactly how large the files should be, and can in fact extract the ddl (metadata api, data pump, TOAD, etc) from the test database. As a simple exercise, I still want a way to generate slightly customized tablespace ddl. So I ended up writing my own.

One of the key things I wanted was to be able to dynamically adjust the initial sizes of the datafiles. Also, for the sake of portability, I added a runtime parameter for maxsize. I am still torn about LMT being locked into UNIFORM, but since that is all we use here, I am not too worried for us. =) But I do want LMT and ASSM to be default; for 10g, I see no reason to turn these off at all.
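The sizing rule the script uses boils down to: shrink the current file size by a percentage (clamped to 0-100), but never go below the tablespace's initial extent. A quick Python restatement of that formula (the function name is mine, not from the script):

```python
def new_file_size(current_bytes, initial_extent, shrink_pct):
    """Initial datafile size for the generated DDL: current size
    reduced by shrink_pct (clamped to 0-100), floored at the
    tablespace's INITIAL_EXTENT."""
    pct = max(min(shrink_pct, 100), 0)
    return max(current_bytes * (100 - pct) // 100, initial_extent)

# A 100MB datafile shrunk by the default 50 percent -> 50MB
print(new_file_size(104857600, 1048576, 50))   # 52428800

# A tiny file never drops below the initial extent
print(new_file_size(2097152, 1048576, 99))     # 1048576
```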

Blogspot is going to horribly screw up the formatting for this, and I have not found a way around it. Sorry.

REM create_tablespace_ddl.sql
REM Charles Schultz
REM 25-Jun-2008
REM Generates tablespace DDL
REM Assumes the following:
REM - All datafiles AUTOEXTEND, initial size = half of current, or size of initial extent, maxsize defaults to 4000M
REM Retains INITIAL_EXTENT settings for LMT
REM Builds 5 inline materialized views
REM - tablespace_specs: gets the data from dba_tablespaces and dba_data_files, ordering by file_id
REM - first_ddl: generates the initial "CREATE TABLESPACE" statement, along with the first datafile
REM - last_ddl: generates the tablespace options (LMT, ASSM)
REM - mid_ddl: generates statements for any datafiles that are not the first
REM - tablespace_ddl: Combines all ddl, ordered by tablespace name and section
REM Limitations
REM - Does not allow for BIGFILES, but this is a trivial change
REM - Does not allow for LMT Autoallocate. Again, not a difficult modification

PROMPT This script will generate tablespace ddl, making the following assumptions:
PROMPT - All tablespaces are Locally Managed (LMT) with UNIFORM sizes
PROMPT - All tablespaces have automatic segment space management (ASSM)
PROMPT - All datafiles autoextend

accept sqlfile default 'create_tablespaces.sql' char prompt 'Enter name for generated sql file [create_tablespaces.sql]: '
accept initial_size default '50' char prompt 'Percent by which to shrink datafile initial sizes, 0 being none [50]: '
accept max_size default '4000' char prompt 'MAXSIZE of autoextend datafiles, in megabytes [4000]: '

set head off newpage 0 pages 0 verify off feed off lines 200
spool &SQLFILE
with tablespace_specs as
(select tb.tablespace_name, df.file_name,
row_number() over (partition by tb.tablespace_name order by df.file_id) rn_up,
row_number() over (partition by tb.tablespace_name order by df.file_id desc) rn_down,
greatest(df.BYTES*((100-greatest(least(&INITIAL_SIZE,100),0))/100),tb.INITIAL_EXTENT) file_size,
df.INCREMENT_BY*tb.BLOCK_SIZE next_size, -- INCREMENT_BY is in blocks; convert to bytes
tb.INITIAL_EXTENT initial_extent
from dba_tablespaces tb,dba_data_files df
where df.tablespace_name = tb.tablespace_name
and tb.tablespace_name not in ('SYSTEM','SYSAUX','USERS','TEMP','TOOLS','UNDOTS')),
first_ddl as
(select tablespace_name,
1 section_num,
'create tablespace '||tablespace_name||' datafile '''||file_name||''' size '||file_size||' autoextend on next '||next_size||' maxsize '||&MAX_SIZE||'M' text
from tablespace_specs where rn_up = 1),
last_ddl as
(select tablespace_name,
3 section_num,
' extent management local uniform size '||initial_extent||' segment space management auto;' text
from tablespace_specs where rn_down = 1),
mid_ddl as
(select tablespace_name,
2 section_num,
','''||file_name||''' size '||file_size||' autoextend on next '||next_size||' maxsize '||&MAX_SIZE||'M' text
from tablespace_specs where rn_up > 1),
tablespace_ddl as
(select text,tablespace_name, section_num from first_ddl
union all
select text,tablespace_name, section_num from mid_ddl
union all
select text,tablespace_name, section_num from last_ddl)
select text from tablespace_ddl order by tablespace_name, section_num;
spool off

Wednesday, June 11, 2008


Tom Kyte popularized the STRAGG function, a sometimes critical piece missing from Oracle's arsenal of cool commands, and it has been bandied about quite a bit.

We were recently tasked with using a STRAGG-like function but with dynamic delimiters. Tom Kyte's standard STRAGG does not handle that. Enter Adrian Billington, who took it one step further (look for "10g goes one further"). I really liked his example, and it caused me to research the COLLECTION functionality a little deeper. A little - I do not claim to have mastered it by any means.

Unfortunately for us, we seem to be hitting bug 4381035 (not exactly sure about that, but the best hit I can find). This is under

create or replace type stragg_type as table of varchar2(4000);
/

Type created.

create or replace function stragg (
nt_in IN stragg_type,
delimiter_in IN VARCHAR2 DEFAULT ','
) RETURN varchar2 IS

v_str varchar2(32000);
v_dlm VARCHAR2(10);
v_idx PLS_INTEGER;

BEGIN

v_idx := nt_in.FIRST;
WHILE v_idx IS NOT NULL LOOP
v_str := v_str || v_dlm || nt_in(v_idx);
v_dlm := delimiter_in;
v_idx := nt_in.NEXT(v_idx);
END LOOP;

RETURN v_str;

END stragg;
/

Function created.

BANDEV_SQL > drop table test;

Table dropped.

BANDEV_SQL > create table test (a char(2), b char(1));

Table created.

BANDEV_SQL > insert into test values ('aa','1');

1 row created.

BANDEV_SQL > insert into test values ('aa','2');

1 row created.

BANDEV_SQL > insert into test values ('aa','3');

1 row created.

BANDEV_SQL > select a, oracle.stragg(CAST(COLLECT(b) as ORACLE.STRAGG_TYPE)) b from test group by a;
select a, oracle.stragg(CAST(COLLECT(b) as ORACLE.STRAGG_TYPE)) b from test group by a
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got -

This was a bummer. After a bunch of reading and experimenting, I finally hit upon something that seems to be working:

BANDEV_SQL > select a, oracle.stragg(cast(multiset(select b from test) as oracle.STRAGG_TYPE),'|') b from test group by a;


I am not exactly sure why MULTISET is working better than COLLECT in this case; it seems to me that both should be producing nested tables.

Friday, May 16, 2008

I am digging this learning thing

One of the mixed blessings of Oracle is that there is always something you can learn, no matter how good you are, or how much experience you have in the field. And I still have a lot to learn.

This week I had the unique pleasure of speaking directly with an Oracle Developer about an SR I had filed in regards to a query. In fact, this query is what led me down the path of last week's discovery. We have a 3rd party ERP (Banner from SunGard), and one of the tables is GJBCOLR, a "collector" table for various jobs. My understanding is that it is used like a scratch table, and jobs will routinely insert rows, work on them, then delete them. Sound like a TEMPORARY table at all? I am told that SunGard needs this to be a permanent table for some reason, and I am still trying to hunt down that reason. But I digress.

This table routinely has a high water mark. In the SR I filed with Oracle, we observed that the stats pegged it at 21 million consistent reads (HHWM), even though it really had about 290 rows. Sure, we can drop the HHWM (ie, truncate table, alter table shrink, etc) as explained in my previous post, but the table will just grow again due to how Banner uses it. So my next thought was to delete the stats and force dynamic sampling. To my horror, the CBO was able to find a low cost FTS, favored over an index access by a factor of 10. According to my developer buddy, he highly suspects this is due to a bug in dynamic sampling, since the 10053 trace shows the dynamic stats put the number of blocks at an incredibly low 95. We are talking about OSEE, and he found a bug fix that is only slightly related but he firmly believes it probably affects the same thing. One of the workarounds I wanted to run past the developer is faking the stats, setting them enormously high so that indexes are always favored, and he tentatively agreed. So hopefully that will solve our problem.

However, realizing what a rare opportunity I had to be talking to a developer, I asked him a few questions about the optimizer and the particular query we were looking at. I am sure a month from now I will be smacking my head about not asking him other questions, but hey.... =)

The query plan looks like this:

| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
| 0 | SELECT STATEMENT | | 210K| 50M| | 14353 (6)|
| 1 | SORT ORDER BY | | 210K| 50M| 106M| 14353 (6)|
|* 2 | FILTER | | | | | |
|* 3 | HASH JOIN RIGHT OUTER| | 210K| 50M| | 2412 (18)|
| 4 | TABLE ACCESS FULL | FTVDTYP | 54 | 378 | | 3 (0)|
| 5 | TABLE ACCESS FULL | FFBMAST | 210K| 49M| | 2387 (17)|
|* 6 | TABLE ACCESS FULL | GJBCOLR | 1 | 50 | | 29 (4)|

Predicate Information (identified by operation id):

2 - filter( EXISTS (SELECT /*+ */ 0 FROM "GENERAL"."GJBCOLR"
6 - filter("GJBCOLR_VALUE"=:B1 AND "GJBCOLR_ONE_UP_NO"=4549363 AND

I include the predicate information because that becomes important later on. Not knowing much about the FILTER operation other than that it works on two children, I asked him how it works. It is almost exactly like a nested loop operation. However, he specifically said the Hash Join is a source and the GJBCOLR subquery is a probe, which indicates to me a hash join, not a nested loop. But, like a nested loop, each result of the HJ child is compared to the GJBCOLR subquery, hence the subquery is executed 210K times. I asked why the HJ child is not second (as opposed to first), and he mentioned it had to do with how Oracle used part of the GJBCOLR subquery as a filter predicate. I am not entirely positive why that is the intended result, but I hope to chew on it a bit more. As you can see, the subquery was transformed into an EXISTS clause (not present in the original), and I believe that is part of the answer.
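My mental model of that FILTER, sketched in Python with toy rows (the data is completely hypothetical, and I am leaving out the fact that Oracle can cache probe results by bind value):

```python
# Toy model of the plan: a hash-join rowsource on one side of the
# FILTER, and a correlated EXISTS probe (the GJBCOLR subquery) on
# the other. Every row the join produces drives one probe execution.

hash_join_rows = [{"value": v} for v in ("A", "B", "A", "C")]
gjbcolr = [{"gjbcolr_value": "A"}, {"gjbcolr_value": "C"}]

probe_executions = 0

def exists_probe(bind_value):
    """Stand-in for the GJBCOLR subquery, run once per driving row."""
    global probe_executions
    probe_executions += 1
    return any(r["gjbcolr_value"] == bind_value for r in gjbcolr)

# The FILTER operation: keep only rows for which the probe finds a match.
result = [row for row in hash_join_rows if exists_probe(row["value"])]

print(len(result), probe_executions)  # prints: 3 4
```

Four driving rows, four probe executions: scale that up to 210K rows from the hash join and you can see why the placement of GJBCOLR matters so much.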

Now if only I could talk to a developer every time I file an SR.....

Friday, May 09, 2008

Learned a couple things this week

We had a couple examples of how a high watermark affects FTS, especially when that FTS gets executed on the wrong side of a FILTER operation. Which got me to delving into the black magic of high water marks.

First off, Oracle only mentions them in passing. If you search for LHWM (low high water mark), you will get a couple hits in metalink for a bug and nothing in the documentation. So, what the heck is a low high watermark anyway? And a high high watermark?

Apparently, after a couple lessons from K Gopalakrishna, Ric Van Dyke and a couple others, the LHWM is the boundary at which all blocks below it are guaranteed to be formatted and used at least once. Above the HHWM we are guaranteed that all blocks are unformatted and never used. Between the two we have the "grey area"; namely, a mixture of blocks that have and have not been used. I believe there are two bitmaps (or two levels of bitmaps?), one that maintains the freelists for below the LHWM, and another for above the HHWM. On Full Table Scans, Oracle will scan everything below the LHWM, and then use the L2 bitmap to find blocks that need to be scanned above the LHWM.

Oracle seems to have gone to great lengths to make these two watermarks as confusing as possible. In the documentation, and when using packages like dbms_space, we are led to believe there is only one HWM, even for an ASSM tablespace.

Thankfully, folks have discovered "dump file". Norman Dunbar gives a couple quick examples. This is good stuff; there is a lot in the dump files that would warrant a closer look. But for right now, let's look at the watermark bits of information:
alter system dump datafile 5 block min 1 block max 100;

Extent Control Header
Extent Header:: spare1: 0 spare2: 0 #extents: 1 #blocks: 8
last map 0x00000000 #maps: 0 offset: 2716
Highwater:: 0x01403030 ext#: 1 blk#: 7 ext size: 8
#blocks in seg. hdr's freelists: 0
#blocks below: 7809
mapblk 0x00000000 offset: 1
Disk Lock:: Locked by xid: 0x000a.013.000003a4
Low HighWater Mark :
Highwater:: 0x01403030 ext#: 992 blk#: 7 ext size: 8
#blocks in seg. hdr's freelists: 0
#blocks below: 7809
mapblk 0x01402aa1 offset: 177
Level 1 BMB for High HWM block: 0x01403019
Level 1 BMB for Low HWM block: 0x01403019

This is not necessarily the world's best example, but heck, it is raw data. =) I have yet to decipher what all that means, but I am working on it. I believe that the LHWM "#blocks below" will always be less than or equal to the HHWM "#blocks below", and the difference would count the number of blocks in that grey area.
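To make that concrete, here is a throwaway parser (my own hack, not an Oracle tool) that pulls the two "#blocks below" values out of a dump like the one above and computes the size of the grey area:

```python
import re

# Trimmed-down dump text from the post above; the first Highwater
# section is the HHWM, the "Low HighWater Mark" section is the LHWM.
dump = """
Highwater::  0x01403030  ext#: 1  blk#: 7  ext size: 8
  #blocks in seg. hdr's freelists: 0
  #blocks below: 7809
Low HighWater Mark :
Highwater::  0x01403030  ext#: 992  blk#: 7  ext size: 8
  #blocks in seg. hdr's freelists: 0
  #blocks below: 7809
"""

below = [int(m) for m in re.findall(r"#blocks below:\s*(\d+)", dump)]
hhwm_below, lhwm_below = below[0], below[1]

# Grey area: possibly-used blocks between the two watermarks.
grey_area = hhwm_below - lhwm_below
print(grey_area)   # prints: 0 -- both marks report 7809 blocks below
```

In this particular dump the two values are identical, so there is no grey area at all; a segment caught mid-growth would show a positive difference.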

As an aside, I also found that you can dump all the file headers with:
alter session set events 'immediate trace name file_hdrs level 10';

The level is a bit confusing, as I tried various numbers and the resulting traces seemed to be exactly the same.

On the RMAN side, we came across a curious behavior. With no recovery catalog, if you recover a controlfile and have archive logs beyond the time of the controlfile, we found that on OSEE, RMAN is not smart enough to roll forward if you specify an "UNTIL TIME" in the future relative to the controlfile. Instead, you will see errors that resemble messages if the backup is corrupted; RMAN will attempt to go backwards in time to find a file that matches an appropriate SCN. Of course, it will never find one. This behavior is "fixed" in OSEE (we did not try any of the intermediate patch levels), and RMAN will now roll the database forward beyond the controlfile time.

Wednesday, April 16, 2008


While working on our datapump workaround, I stumbled upon dbms_pclxutil.build_part_index. This is a handy little bugger that has been around since at least 8.1.5, and has some interesting limitations. For starters, the said index has to be partitioned and unusable. Not a biggie, but why? I also found out (the hard way) that not only do you have to run the procedure as the table owner, but that user must own the index as well. Or to put it in other words, you have to create the index in the same schema as the table, and run the procedure from that schema only. That rather confounds me; when I asked Oracle Support about this, I was given some schtick about how they can't have users running around building partitioned indexes on other people's tables. If you ask me, I am thinking some developer back in the early '90s wrote this handy piece of code, but made some assumptions, and those assumptions were never corrected going forward. Strange how after all these releases, it is still not very well documented.

ah well....

Friday, April 11, 2008

11g and DBControl, Part 2

After finding two bugs with 11g DBControl already (and that was just the installation), I am finding another basic problem. I am experimenting with RAT and attempting to replay a workload. The ability to export a workload from a database and replay it in 11g is a huge thing. If it works. If it works, we may have a huge demand for it; shipping off our workload to 3rd-party vendors (software and hardware) for benchmarking, internal testing of various upgrade options, etc.

So I set out to generate and capture a workload in a freshly upgraded database. First I have to set pre_11g_enable_capture to true, which is a little odd. Why does it default to off? From there, you can loosely follow the online documentation:
backup source database
capture workload
restore database elsewhere
preprocess workload
replay workload

I was good up until the last step. Since the topic is DBControl, I decided to use DBControl to preprocess the workload after I datapumped my source into a target 11g database. DBControl then directs you to hit a link to replay the workload by configuring and initiating worker threads (aka, Client Connections). My workload was a mere 20 minutes (just for an initial test). My "Client Connections" have been attempting to start for 14 hours so far. They would probably run indefinitely if I let them.

I have Oracle Support working on this, but I have to admit, all these little buggy problems with my first experience in 11g are not very encouraging. No, let me restate that - they are downright discouraging. The "base" release is, which seems to indicate the 5th minor patch. How did these issues make it through 5 rounds of testing? I am rather confused. Perplexed. Mystified. Discombobulated. Nothing like kicking the tires and finding that they are flat.

Just as a side note, I also ran a workload capture for 14 hours (different box). Oracle generated over 37 thousand files! Wow.

Updated 19:35
Here are all the bugs that matched from WebIV (kinda scary):
3 Bug.6141226 (80) REPLAY HANG AFTER 10 HOURS CAPTURE Gen V111:
3 Bug.6046429 (80) TST&PERF - WRC HANG WHEN WORKLOAD REPLAY Gen V1110:

Thursday, April 03, 2008

Metalink 3 coming "next year"

While communicating with an Oracle Global Customer Care employee today, I learned that Metalink 3 is coming next year. This should be interesting.

I filed 3 feedback SRs today:
  • Allow Bugs older than 14 days to be viewed on Headlines
  • Remove "News & Notes" from headlines
  • Correct problems with "Certify & Availability" section of Headlines

I also have some outstanding requests to view more than 365 days worth of SRs, and allow some sort of hierarchies within a single CSI.

Update Thu Apr 3 15:44:03 CDT 2008
From the horse's mouth:

Faster problem resolution
- Speed problem resolution with quick access to the information you need with our new personalized dashboard
- Facilitate faster resolution with a robust knowledgebase and powerful self-service tools
- Optimize system availability with enhanced Service Request (SR) management
Extended preventative, automated support capabilities
- Receive a simple, fast, easy way to maintain your Oracle systems
- Mitigate risk with simplified configuration management solutions such as Software Configuration Manager
- Proactively prevent problems and accelerate problem resolution with automated HealthChecks and Product Alerts
Increased personalization
- Utilize increased personalization and knowledge integration via OracleMetaLink 3's new user interface for powerful information management
- Personalize your homepage content for easier access to self-service tools and configuration data
- Quickly access your Service Request history with MySRs

Wednesday, April 02, 2008

11g and DBControl

After hearing and reading about so many of the wonderful new toys with 11g, I decided to give some of them a whirl; I tend to learn better when I can break it myself.

Unfortunately, I ended up breaking something else even before I could begin. When installing the software with the basic default options (starter database, DBConsole), emca blew chunks when trying to start dbconsole. This is not the kind of headache I wanted, but I am hoping to use this as a learning opportunity. Somehow. Hopefully I will be able to provide a resolution when this is over with.

For starters, the trace and emConfig.log are a bit hard to read. I did find some key occurrences of "OC4J Configuration issue", so I turned to Metalink and Google. Metalink was particularly unhelpful. The closest match I could find was Note 549079.1, which is good in and of itself, but 1) it says 10g, not 11g, and 2) none of the 3 cases in step 5 applied (I tried all of them). I am still slogging through Google hits, but so far the only solutions I am finding are for 10g, and they do not seem to apply to 11g.

From emConfig.log
CONFIG: OC4J Configuration issue. /u01/app/oracle/product/11.1.0/oc4j/j2ee/OC4J_DBConsole_server1_TEMP not found.

Apr 2, 2008 2:12:05 PM oracle.sysman.emcp.util.PlatformInterface executeCommand
WARNING: Error executing /u01/app/oracle/product/11.1.0/bin/emctl start dbconsole
Apr 2, 2008 2:12:05 PM oracle.sysman.emcp.EMConfig perform
SEVERE: Error starting Database Control
Refer to the log file at /u01/app/oracle/cfgtoollogs/dbca/TEMP/emConfig.log for more details.
Apr 2, 2008 2:12:05 PM oracle.sysman.emcp.EMConfig perform
CONFIG: Stack Trace:
oracle.sysman.emcp.exception.EMConfigException: Error starting Database Control
at oracle.sysman.emcp.EMDBPostConfig.performConfiguration(
at oracle.sysman.emcp.EMDBPostConfig.invoke(
at oracle.sysman.emcp.EMDBPostConfig.invoke(
at oracle.sysman.emcp.EMConfig.perform(

Now, why would a required file not be found? Keep in mind this is coming straight out of the OUI, so I have this expectation that all required files would be automatically created.

I manually created the missing directories (recursive copy of existing directories) and the emca -repos create command was able to do a lot more. Now I get this really strange message:
CONFIG: Unable to determine local host vide /u01/app/oracle/product/11.1.0/server1_TEMP/sysman/config/ : No such file or directory

What is a "host vide"? Nothing on Metalink or Google for that one.
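The manual workaround boiled down to seeding the missing per-host/SID OC4J directory from an existing template by recursive copy. Here is that idea in miniature, using throwaway stand-in paths rather than a real ORACLE_HOME tree:

```shell
# Miniature of the manual workaround: recursively copy an existing
# OC4J template directory to create the missing per-host/SID one.
# All paths here are throwaway stand-ins, not a real ORACLE_HOME.
base=`mktemp -d`
mkdir -p $base/oc4j/j2ee/OC4J_DBConsole/config
echo "<template/>" > $base/oc4j/j2ee/OC4J_DBConsole/config/server.xml
# emca complained that OC4J_DBConsole_server1_TEMP was not found,
# so create it from the template:
cp -r $base/oc4j/j2ee/OC4J_DBConsole $base/oc4j/j2ee/OC4J_DBConsole_server1_TEMP
ls $base/oc4j/j2ee
```

In the real environment the copy source was an existing OC4J_DBConsole directory under $ORACLE_HOME/oc4j/j2ee, after which emca could be rerun.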

UPDATE Fri Apr 4 13:56:56 CDT 2008
Resolved. See comments.

Thursday, February 14, 2008

Love/hate, Part I

I have this love/hate relationship with Oracle. Although I rarely hear it, I would find it hard to believe that anyone who works with Oracle for very long does not feel the same way in these days. But hey, perhaps I am wrong and the people who gush over Oracle really do just have a passion for the software.

Allow me to ramble a little. This is a blog after all.

On the plus side, I have learned a lot, and that Oracle RDBMS-specific education has really boosted my career. While I still have a long way to go, I have come a long way from the Sybase newbie I started out as, not having any clue what SQL meant, let alone what it was or how to use it. I gained a solid appreciation not only for all the hard work that goes into all the cool stuff that Oracle does, but the wealth of support from the User community as well. There are some really slick, intelligent and brilliant professionals out there who are not Oracle employees. And not merely a handful, but a throng.

Now a negative. The folks who are leading Oracle are way too ambitious in my opinion. They want to control everything. There is way too much emphasis on the bleeding edge, and not enough support for the "tried and true". I am a DBA, and my niche focus hardly ever veers off into other Oracle products like BI, AS, SOA or any other slew of acronyms. If the database server is truly Oracle's flagship product, it is not because of its superior quality, in my opinion, but rather because they charge an arm and a leg for support. I could write a whole book about Oracle Support.

Another positive thing. Oracle's RDBMS is fodder for some most excellent theoretical and scientific arguments. Watch some interchanges between Oracle gurus on the usenet, oracle-l or the Oracle forums; some of those discussions are worthy of a doctoral thesis. Oracle does some really mind-blowing things, especially in the areas of manipulating data. In general, the software is fast, scalable and rather solid.

And the pendulum swings. Oracle tries to do too much. Let's look at the RDBMS: 8i came out in 1999, 9i in 2001, 10g in 2003, 10gR2 in 2005, 11g in 2007. Each new release introduced hundreds of new features, and thousands of new bugs. The worst part: every two years a stable-but-older version falls off the support list. This problem is exacerbated within patch levels, where the rate of turnover seems to be about one patch per year, each with new features and new bugs, killing support for the now two-off patches. Oh how I wish Oracle would commit to sticking with a "stable" version, making it bullet-proof.

All these new features are a double-edged sword. The media loves "new features"; Oracle Support, I am sure, hates them. As a customer, user and DBA, I love the new technologies and the learning opportunities, but I hate how there is often poor documentation, a lack of support and a host of accompanying bugs.

Am I complaining? You bet. Am I a whiner? I couldn't care less. Taking a step back, I have a big beef with software development in general. For example, Microsoft Vista. Like the Grammys, there should be awards for best-written and most-stable products. Or perhaps that would be too depressing.

In my opinion, one of the most ironic statements to come out of Ellison's mouth was the whole "War on Complexity". Oracle definitely lost that war. Look at the monstrosity called "Oracle Enterprise Manager". Sure, it's free, and that is why other companies are capitalizing and making money by offering better commercial products for managing databases. If there really is a war on complexity, then I will walk to the moon. What I see is a war on simplicity. And it is not even fair to call it a war; it is a rout.

I have had the opportunity to interact directly with Oracle employees at various levels. For the most part, I have been blown away by the graciousness, intelligence and customer-focus these men and women have. In talking with them, I have been convinced that there are a lot of good people working hard to make an excellent product. What I dislike is the general direction of the product, and the decisions that steer it that way. And again, for the record, when I say "the product", I only really care about the RDBMS. I do not have anything good to say about the other offerings. =)

Tuesday, January 08, 2008

My first chain blog [sic]

I usually delete emails that have anything to do with "being tagged", forwarding to x number of buddies and family members, etc. Regardless of who they are from, I just mark them as spam and forget about them. So I get this email from Dan Norris and my finger is hovering over the kill button when I start to read a little of the preview. Hmm... a blog tag. Still, I am not overly excited, but it is from Dan, so I open it up and read it. Then I read the links (to his blog and his buddy Jake's), then I keep reading more links. Interesting stuff.

Ok, so just this once I will indulge these silly chain mail things (well don't I sound mature). Actually, I was honored to be included in Dan's 8, and I am having a hard time coming up with 8 folks I would contact. But not being a big fan of such things anyway, I have no problems bending the rules a little.

And just for the record, I did enjoy this exercise, especially reading about other folks with whom I have communicated, and some I have never met.

1. I was awarded a "Citizenship Award" from the Daughters of the American Revolution. To this day, I have never taken the time to find out who that group (cult?) is. I was in high school at the time, and when asked what I wanted to do with my life, I said I wanted to make video games. Yes, I was an outstanding citizen.

3. My beautiful wife was originally attracted to me when I had a monster beard and earlocks to boot. Make no mistake, the hirsute appearance was not the draw; she liked me despite it.

4. In Boy Scouts, our troop was a drum and bugle corps. I played a contra bass, which I thoroughly enjoyed even though I was not that great (and that is being generous). I enjoyed traveling around Illinois, even down to Missouri once, not to mention several parades in downtown Chicago.

2. I put a pitchfork through my foot when I was kid. It hurt. It was right after my dad said "Be careful with that thing, you could hurt yourself." I proved him right.

10. When I started blogging about Oracle stuff, I had only just heard about the late Lex de Haan. He seemed like a really impressive and well-respected man, so I named my first blog in his honor. Ironically, Blogger updated their software and I have not been able to go back to the old blog.

7. I am the oldest of 5 children, and I am proud of them all. One brother plays professional volleyball and travels a lot (even outside the states), another brother is helping special needs students at our old high school and working on his Education certification; one is taller than me, the other is bigger, and I am not a small person by any means. One sister was working at McGraw Hill until she switched to a smaller publisher that she enjoys much more, and my other sister, on a lark, picked up Muay Thai with the well-known Team Toro for kicks.

13. One of these is false.

6. Going to college at the University of Illinois in Champaign-Urbana (I know, it is called UIUC), I had the wonderful experience of trying out the grassroots of the WWW with Mosaic. I maintained a webpage that blatantly said "I am NOT the author of Peanuts", but I still received fan mail. I even had a teacher ask me to write a special comic strip for her classroom. I was touched, but on the other hand, I got to thinking that some people just want to believe something no matter what you tell them.

0. I baled hay once, and we had roosters for a short time while I was growing up. To put this in perspective, I grew up in the Chicago 'burbs, 10 miles outside of downtown. There are no farms within 10 miles of downtown, at least none that are not called zoos. I baled hay because my parents at one time thought we were going to move out to a farm (Minnesota), so we tried out farm life for a week. Crazy idea, that. A couple years later, we somehow brought home roosters as pets and kept them in the basement of our neighborhood bungalow. Another crazy idea, that. They did not last long; one died and the other was called out because our neighbors did not like how it crowed before the sun came up.

1.6180339 I jumped out of a perfectly normal, safe, solid airplane at over 14000 feet. Instead of a parachute strapped to my back, I had another guy who was risking his life for a measly $200. Also, if you did not get the golden ratio hint, my first DBMS was Sybase (in a dusty attic somewhere in my memory).

I am going to tag Job Miller (great excuse to start a blog!) and Ravi Gaur.

Wednesday, January 02, 2008

The DataPump Index work-around

As I mentioned in a previous post, I did not like the way Data Pump does indexes. Apparently, many other folks feel the same way.

UPDATES: We are working on a new version which we hope to have ready in the next couple of days. While it does not do anything fancy like Data Pump does, it does successfully parallelize those operations that are currently limited to one worker process. Additionally, I have communicated with the fine folks working on Data Pump, and am very excited about the work they are doing to improve it.
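The core mechanism is simple: N shell workers pull chunks of work from a shared file, serialized by a lock. A minimal, self-contained sketch of that idea (not the script itself; for simplicity it uses an atomic mkdir lock rather than the lock-file-plus-recheck dance the real script does, and the job names are made up):

```shell
# Miniature of the worker-pool idea: two background workers drain a
# shared workfile, one line at a time, under an atomic mkdir lock.
work=/tmp/pdemo.work.$$; lockdir=/tmp/pdemo.lock.$$; out=/tmp/pdemo.out.$$
printf 'job1\njob2\njob3\njob4\n' > $work
: > $out
worker() {
    while :
    do
        if mkdir $lockdir 2>/dev/null       # atomic lock acquisition
        then
            job=`head -1 $work`             # claim the next line of work
            sed 1d $work > $work.tmp && mv $work.tmp $work
            rmdir $lockdir                  # release the lock
            if [ -z "$job" ]; then return; fi   # empty = no work left
            echo "worker $1 ran $job" >> $out
        else
            sleep 1                         # someone else holds the lock
        fi
    done
}
worker A & worker B &
wait                                        # let both workers finish
jobs_done=`wc -l < $out`
rm -rf $work $lockdir $out
```

Each job is claimed exactly once because the claim-and-remove step only happens while holding the lock; the real script adds per-slave SQL logging and multi-line chunking on top of this skeleton.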

Here is the script I am using to run indexes in parallel (the formatting sucks - thanks Blogger!):
#!/bin/ksh
## Charles Schultz 19-Dec-2007
## Copied from
## Assume paresh = Parallel Execution Shell
## Additionally, the original script was written for Bourne shell. I have switched it to
## use ksh since that is what we use; I have left as much of the original code intact
## as possible, and attempted to conform to the style.
## Still could use some cleaning up. The error reporting is not stellar, but I believe it
## is passable.

# SETX="set -x" # Uncomment this line to set -x

# OLD SYNTAX: paresh command_file parallel_count
# NEW SYNTAX: paresh command_file parallel_count [$ORACLE_SID]

## GLOBAL VARIABLES -- Added by Charles Schultz 19-Dec-2007
LOCAL_BIN="/u01/app/oracle/local/bin" # Directory where dbhome and oraenv are located
TMP="/var/tmp" # Customizable TEMP directory
SLEEP_SEC=.1 # Duration of pause between get_shell requests
egrep="grep -E"
DRYRUN=${DRYRUN:-"N"} # Did not want to mess with command-line parameters,
# so I made this either a hard-coded or environment var

# message
# Establish a timestamp and echo the message to the screen.
# Tee the output (append) to a unique log file.
message()
{
    timestamp=`date +"%D %T"`
    echo "$timestamp $*" | tee -a $logfile
}

# run_sql
# Added by Charles Schultz 19-Dec-2007
# Runs the SQL in $SQLFILE via sqlplus
# Connects "/ as sysdba"; avoids having to deal with passwords,
# but could be a security risk if not careful about the input file
# (reconstructed: the heredoc and the status handling were eaten by
# Blogger's formatting; shell_status is set here for the caller)
run_sql()
{
    $SQLPLUS /nolog <<EOF > $SQLLOGTMP 2>&1
connect / as sysdba
set echo on
set timing on
@$SQLFILE
exit
EOF
    cat $SQLLOGTMP >> $SQLLOG
    errors=`$egrep "ORA-|SP2-" $SQLLOGTMP`
    if [ "$errors" != "" ]
    then
        echo "$errors" >> $SLAVE_ERROR_LOG
        shell_status=1
    else
        shell_status=0
    fi
}
# get_shell
# This function is responsible for establishing the next
# command to be processed. Since multiple processes might
# be requesting a command at the same time, it has a built-
# in locking mechanism.
# MODIFICATIONS Charles Schultz 19-Dec-2007
# Workfile will have SQL DDL statements, and thus span more than
# one line. I have adapted get_shell to read a chunk of lines
# up to either the CREATE or DECLARE statement. This is specific
# for DDL generated by IMPDP index metadata.
# Also, updated the worker thread to go against sqlplus instead
# of executing the command "as is".
get_shell()
{
    echo "`date` $1 Shell Request $$" >> $lklogfile
    while :                              # until a command or end
    do
        next_shell=""                    # initialize command
        if [ ! -s ${workfile} ]          # if empty file (end)
        then                             #
            break                        # no more commands
        fi                               #
        if [ ! -f $lockfile ]            # is there a lock?
        then                             # not yet...
            echo $$ > $lockfile          # make one
            echo "`date` $1 Lock Obtained $$" >> $lklogfile
            if [ "$$" = "`cat $lockfile`" ]
            then                         # we created it last
                ## START -- Added by Charles 19-Dec-2007
                ## Get the line numbers of the first two statement starts
                ## FOR ALTER DDL (ie, Constraints)
                # chunk_end=`$egrep -n "ALTER" $workfile|head -2|sed 's/:.*//'|tr '\n' ' '|sed 's/ *$//'`
                ## FOR Data Pump Index DDL
                ## (tr joins the numbers onto one line so the last-word
                ## extraction below works)
                chunk_end=`$egrep -n "CREATE|DECLARE" $workfile|head -2|sed 's/:.*//'|tr '\n' ' '|sed 's/ *$//'`

                if [ "$chunk_end" = "" ]
                then
                    rm -f $lockfile      # release the lock first
                    break                # No more chunks found, exit
                fi

                line_num=${chunk_end##* }    # if two line numbers found, line_num != chunk_end
                                             # grab the last number as the line number

                if [ "$line_num" = "$chunk_end" ]  # if only one statement found, run everything else
                then
                    ## Run whatever is left in the workfile
                    next_shell=`cat $workfile`
                    echo "" > $workfile
                else                     # else get the next chunk, stopping
                                         # one line short of the next statement
                    last_line=`expr $line_num - 1`
                    next_shell=`head -${last_line} $workfile`           # Get chunk of work
                    sed -e 1,${last_line}d $workfile > ${workfile}.tmp  # Chop off chunk of work
                    mv ${workfile}.tmp $workfile
                fi
                ## END -- Added by Charles 19-Dec-2007

                rm -f $lockfile          # turn off lock
                echo "`date` $1 Shell Issued " >> $lklogfile
                return                   # done, command in
            else                         # variable "next_shell"
                echo "`date` $1 Lock FAULTED $$" >> $lklogfile
            fi                           # double check faulted
        # else                           # locked by other
        #     echo "`date` $1 Lock Wait $$" >> $lklogfile
        fi
        sleep $SLEEP_SEC                 # brief pause
    done                                 # try again
    return                               # only if no commands
}
# paresh_slave
# This code is executed by each of the slaves. It basically
# requests a command, executes it, and returns the status.
# Modified by Charles Schultz 19-Dec-2007
# Passes next_shell to run_sql
paresh_slave()
{
    export SQLFILE="$OUTPUT/paresh_${1}.sql"          # Sql file for this slave
    SLAVE_ERROR_LOG="$OUTPUT/paresh_${1}_sql.errors"  # Error log file for this slave
    echo "" > $SLAVE_ERROR_LOG
    SQLLOG="$OUTPUT/paresh_${1}_sql.log"              # Sql log file for this slave
    echo "" > $SQLLOG
    SQLLOGTMP="$OUTPUT/paresh_${1}_sql.log.tmp"       # Temp Sql log file for this slave
    shell_count=0                        # Commands done by this slave
    get_shell $1                         # get next command to execute
    while test "$next_shell" != ""
    do                                   # got a command
        shell_count=`expr $shell_count + 1`
        message "Slave $1: Running sql: $next_shell"
        echo "set sqlblanklines on" > $SQLFILE  # This was added to avoid errors with blank lines in source
        echo "$next_shell" >> $SQLFILE
        if [ "$DRYRUN" = "Y" ]
        then
            message "NOTE: This is a DRYRUN; no actual work will be done"
            shell_status=0
        else
            run_sql                      # execute command
        fi
        # shell_status=$?                # get exit status (run_sql sets shell_status instead)
        if [ "$shell_status" -gt 0 ]
        then                             # then message
            message "Slave $1: ERROR IN SQLPLUS status=$shell_status"
            echo "Slave $1: ERROR IN Shell SQLPLUS status=$shell_status" >> $errfile
        fi
        # message "Slave $1: Finished Shell"
        get_shell $1                     # get next command
    done                                 # all done
    message "Slave $1: Done (Executed $shell_count Shells)"
    return                               # slave complete
}
# paresh_driver
# This code is executed by the top level process only. It
# parses the arguments and spawns the appropriate number
# of slaves. Note that the slaves run this same shell file,
# but the slaves execute different code, based on the
# exported variable PARESH.
paresh_driver()
{
    rm -f $lklogfile                     # start a new log file
    if [ "$1" = "" ]                     # first argument?
    then                                 # no?
        master_file="master.list"        # default value
    else                                 # yes?
        if [ ! -f "$1" ]                 # does file exist?
        then                             # no?
            echo "$0: Unable to find File $1"
            exit 1                       # quit
        else                             # yes?
            master_file="$1"             # use specified filename
        fi
    fi
    if [ "$2" = "" ]                     # Second Argument?
    then                                 # no?
        parallel_count=4                 # default value (reconstructed; the original default was lost)
    else                                 # Yes?
        if [ "$2" -lt 1 ]                # Less than 1?
        then                             # Yes?
            echo "$0: Parallel Process Count Must be > 0"
            exit 1                       # quit
        else                             # no?
            parallel_count=$2            # Use Specified Count
        fi
    fi

    ## Added by Charles 13-Dec-2007
    if [ "$3" != "" ]                    # Optional third argument sets the SID
    then
        export ORACLE_SID=$3
    fi
    export PATH="$LOCAL_BIN:$PATH"       # Setup Oracle Environment
    export ORAENV_ASK=NO; . $LOCAL_BIN/oraenv
    export SQLPLUS="$ORACLE_HOME/bin/sqlplus"
    if [ ! -e $SQLPLUS ]
    then
        echo "Oracle Home $ORACLE_HOME not valid for Oracle SID $ORACLE_SID - exiting"
        exit 1
    fi

    mkdir -p $OUTPUT
    ## Added by Charles 13-Dec-2007

    message "------------------------------"
    message "Master Process ID: $PARESH"
    message "Processing File: $master_file"
    message "Parallel Count: $parallel_count"
    message "Log File: $logfile"
    message "Working Output Directory: $OUTPUT"
    message "------------------------------"
    cp $master_file $workfile            # make a copy of commands file
    while test $parallel_count -gt 0
    do
        if [ ! -s $workfile ]
        then
            message "All Work Completed - Stopped Spawning at $parallel_count"
            break                        # Quit spawning
        fi
        $0 $parallel_count &
        message "Spawned Slave $parallel_count [pid $!]"
        parallel_count=`expr $parallel_count - 1`
    done
    wait                                 # wait for all slaves to finish
    message "All Done"
}
# main
# This is the main section of the program. Because this shell
# file calls itself, it uses a variable to establish whether or
# not it is in Driver Mode or Slave Mode.
$SETX
if [ "$PARESH" != "" ]                   # If variable is set
then                                     # then slave mode
    workfile=$TMP/$PARESH                # Work file with parent pid
    lockfile=$TMP/paresh.lock.$PARESH    # Lock file with parent pid
    logfile=$TMP/paresh.log.$PARESH      # Log File with parent pid
    errfile=$TMP/paresh.err.$PARESH      # Error File with parent pid
    lklogfile=$TMP/paresh.lklog.$PARESH  # Lock log file (reconstructed; definition lost in formatting)
    OUTPUT=$TMP/paresh.out.$PARESH       # Output directory (reconstructed; definition lost in formatting)
    paresh_slave $*                      # Execute Slave Code
else                                     # else driver mode
    PARESH="$$"; export PARESH           # Establish Parent pid
    workfile=$TMP/$PARESH                # Work File with parent pid
    lockfile=$TMP/paresh.lock.$PARESH    # Lock File with parent pid
    logfile=$TMP/paresh.log.$PARESH      # Log File with parent pid
    errfile=$TMP/paresh.err.$PARESH      # Error File with parent pid
    lklogfile=$TMP/paresh.lklog.$PARESH  # Lock log file (reconstructed; definition lost in formatting)
    OUTPUT=$TMP/paresh.out.$PARESH       # Output directory (reconstructed; definition lost in formatting)
    rm -f $errfile                       # remove error file
    paresh_driver $*                     # execute Driver Code
    rm -f $workfile                      # remove work file
    rm -f $lklogfile                     # remove lock log file
    if [ -f $errfile ]                   # If there was an error
    then
        message "*************************************************"
        message "FINAL ERROR SUMMARY. Errors logged in $errfile"
        cat $errfile | tee -a $logfile
        message "*************************************************"
        exit 1
    fi
fi
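The chunk-extraction pipeline in get_shell can be exercised on its own. A quick sketch against a made-up two-statement DDL file (the statement text is invented, and grep -E stands in for the script's $egrep variable; the point is the line-number arithmetic, where the current chunk must stop one line short of the next CREATE/DECLARE):

```shell
# Exercise the get_shell chunking pipeline against a made-up workfile.
workfile=/tmp/chunkdemo.$$
cat > $workfile <<'EOF'
CREATE INDEX "SCOTT"."EMP_I1" ON "SCOTT"."EMP" ("ENAME")
  PARALLEL 4 NOLOGGING;
CREATE INDEX "SCOTT"."EMP_I2" ON "SCOTT"."EMP" ("DEPTNO")
  PARALLEL 4 NOLOGGING;
EOF
# Line numbers of the first two statement starts, joined on one line
chunk_end=`grep -En "CREATE|DECLARE" $workfile | head -2 | sed 's/:.*//' | tr '\n' ' ' | sed 's/ *$//'`
line_num=${chunk_end##* }       # last number = where the NEXT statement begins
last_line=`expr $line_num - 1`  # the current chunk ends one line earlier
next_shell=`head -${last_line} $workfile`   # the chunk a slave would run
echo "$next_shell"
rm -f $workfile
```

With the sample file above, the pipeline finds statement starts at lines 1 and 3, so the first chunk is lines 1-2, leaving the second CREATE at the head of the workfile for the next slave.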