Friday, December 21, 2007

Data Pump Rocks!

For the past week, we dove into Data Pump with a vengeance. I had never played with it before; I think I may have tried a table export or two, but nothing really serious. So until last week I had no clue what a huge improvement Data Pump is over the "traditional" exp/imp. Of course, it goes without saying that there will be bugs and gotchas, but that pretty much comes with the territory. Sad but true.

The reason we engaged in this activity in the first place is that our ERP vendor ostentatiously decided that they want to support international characters, which means that all of us state-side have to upgrade regardless of our own character-set needs. They didn't ask me! *grin* But the silver lining is that we have learned so much about Data Pump.

Just for the sake of numbers, we were able to pump the data from a 478 GB database to a set of dumpfiles totaling 175 GB in about 3 hours, and then turn around and pump that back into an empty database in about 4 hours or so. Yes, there are some hidden truths in there. For instance, we completely ignored indexes; they will explode your timeframe significantly.

EXPDP
We hit some issues early on with the export. At first, we thought we were hitting the LOB issue. After filing an SR, we learned of Metalink note 286496.1, which covers tracing and an "undocumented" METRICS parameter. METRICS seems like a tremendously handy little piece of information, so I am quite flabbergasted that it is "undocumented". According to the note, we set TRACE=480300, which turns on trace information for the Master and Worker processes. In addition, I set event 10046 to get the waits. That was eye-opening.

The database was spending an enormous amount of time waiting on "library cache pin" while creating Global Temporary Tables. Very odd. After playing ping-pong with the Support Analyst and thinking about it for a while, I realized that all of the objects experiencing a wait had FGAC enabled for VPD. Ok, one strike against me for not choosing a VPD-free login (i.e., one that has been granted EXEMPT ACCESS POLICY), but one strike against Data Pump for doing something rather poorly. I am hoping to hear more about this particular behavior and how it will be resolved.
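
In hindsight, a quick peek at the data dictionary would have flagged this before the export even started. A minimal sanity check against the standard DBA_POLICIES and DBA_SYS_PRIVS views might look like the following; the export account name is hypothetical:

-- which objects have enabled FGAC/VPD policies that the export session will hit?
select object_owner, object_name, policy_name
  from dba_policies
 where enable = 'YES';

-- does the export account bypass those policies?
select grantee, privilege
  from dba_sys_privs
 where privilege = 'EXEMPT ACCESS POLICY'
   and grantee = 'DP_EXPORT';   -- hypothetical export account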

We also set our TEMP space to extend without limit and set the pools (shared pool and buffer cache) a bit higher than normal.
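
For the curious, those tweaks are nothing more than ordinary ALTER statements; a rough sketch is below, with a made-up tempfile path and illustrative sizes (the ALTER SYSTEM commands assume you are running on an spfile):

-- hypothetical tempfile path; let TEMP grow as needed during the export
alter database tempfile '/u02/oradata/PROD/temp01.dbf' autoextend on maxsize unlimited;
-- bump the pools a bit; sizes are illustrative only
alter system set shared_pool_size = 1000M scope=both;
alter system set db_cache_size = 2000M scope=both;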

The parameter file we used:
directory=DPUMP
dumpfile=${ORACLE_SID}_full%U.dmp
logfile=${ORACLE_SID}_full_debug.log
full=y
parallel=16
metrics=y
userid="xxx/yyy"
TRACE=480300


IMPDP
The very first thing I tried was NETWORK_LINK; for "traditional" exp/imp, we use a named pipe as an intermediate file instead of exporting to a dump file and importing from it. Since Data Pump writes asynchronously, that trick is not possible, but the alternative is to communicate over a Database Link. Unfortunately, LONG columns are not yet supported via this method, which rules it out as a viable option for us.
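
If you are wondering whether NETWORK_LINK is even on the table for your own databases, the dictionary will tell you; a simple query like this one, run against the source, lists the tables that still carry LONG or LONG RAW columns:

-- tables that the NETWORK_LINK method cannot move because of LONG/LONG RAW columns
select owner, table_name, column_name, data_type
  from dba_tab_columns
 where data_type in ('LONG', 'LONG RAW')
   and owner not in ('SYS', 'SYSTEM')
 order by owner, table_name;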

The next problem we encountered was that the metadata contained a tablespace specification for a non-existent tablespace on two of our partitioned tables. This turned out to be extremely counterintuitive. If you precreate the table (on the proper, existing tablespaces), IMPDP will still fail trying to create the table (on the wrong, non-existent tablespace). Even if you specify TABLE_EXISTS_ACTION=TRUNCATE!! Our Support Analyst tells me that this is the expected behavior. It was not my expectation at all. To fix it, we created the tablespace and voila, we had a working import process.
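
The fix itself was nothing fancy; just precreate whatever tablespace the dump's metadata insists on so IMPDP can get on with its business. The name, path and sizes below are made up for illustration:

-- hypothetical: the tablespace name referenced by the dump's metadata
create tablespace OLD_PART_DATA
  datafile '/u02/oradata/PROD/old_part_data01.dbf' size 100m
  autoextend on maxsize unlimited;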

Lastly, we struggled for a long time with the arduous process of creating indexes. Data Pump says it is creating indexes in parallel. In reality, it is using the parallel degree clause of the CREATE INDEX statement, utilizing the RDBMS parallel server processes. This seems rather antithetical to the rest of Data Pump, especially if you consider that no matter how much you parallelize, you bottleneck on the query coordinator. I would much prefer that Data Pump use a parallel degree of 1 for the index creations, but launch multiple creations at the same time. In fact, I downloaded a script called paresh. I had to modify it a bit to use the DDL generated by the IMPDP metadata for indexes, but it seems to work quite well. I need to modify it more for error checking, but at least it creates indexes truly in parallel.
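
To make the idea concrete, here is a rough sketch of the "many serial builds at once" approach, in the spirit of what paresh does, using DBMS_JOB so each index builds in its own job slave. The index names are hypothetical, and job_queue_processes must be high enough for the jobs to actually run concurrently:

declare
  l_job binary_integer;
begin
  -- each job slave builds one index serially; several slaves run at the same time
  dbms_job.submit(l_job,
    'begin execute immediate ''create index scott.emp_name_ix on scott.emp(ename) noparallel''; end;');
  dbms_job.submit(l_job,
    'begin execute immediate ''create index scott.dept_loc_ix on scott.dept(loc) noparallel''; end;');
  commit;   -- the jobs do not start until the submitting session commits
end;
/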

So, with that out of the way, we are now working on a "clean" import given these exceptions. For the init.ora, we use:
nls_length_semantics = CHAR

## Import Debugging
max_dump_file_size = unlimited
# event="10046 trace name context forever, level 12"

## Import Speed-up parameters
shared_pool_size = 1000M
sga_max_size = 2000M
parallel_max_servers = 24
_disable_logging = TRUE
DB_BLOCK_CHECKSUM=FALSE ## DEFAULT = TRUE
## DISK_ASYNCH_IO=TRUE ## DEFAULT
## DB_BLOCK_CHECKING=FALSE ## DEFAULT


A note of WARNING about _disable_logging. It is an underscore parameter, so all the usual warnings apply. I found out the hard way what happens if you shutdown abort while the database is attempting to roll back SQL statements:
SMON: following errors trapped and ignored:
ORA-01595: error freeing extent (3) of rollback segment (1))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4193], [114], [232], [], [], [], [], []

More information can be found in Metalink Note 39282.1.

Our IMPDP parameter file is similar to the EXPDP one, but excludes some objects:
directory=DPUMP
dumpfile=${ORACLE_SID}_full%U.dmp
logfile=${ORACLE_SID}_full_no_indexes.log
full=y
parallel=16
metrics=y
userid="xxx/yyy"
TRACE=480300
EXCLUDE=index,constraint
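
Since the parameter file skips indexes and constraints, their DDL has to come from somewhere for the later passes; we pulled ours from the IMPDP metadata. As an alternative sketch, the stock DBMS_METADATA package can generate the same thing straight from the source database (the schema name here is hypothetical):

set long 1000000 pagesize 0
-- spool the index DDL from the source so it can be replayed after the data lands
select dbms_metadata.get_ddl('INDEX', index_name, owner)
  from dba_indexes
 where owner = 'APPS'
 order by index_name;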


We are still playing with the fastest way to migrate constraints. One thought is to do two passes (a rough sketch of the disable/enable steps follows the list):
  • import metadata
  • disable constraints
  • import data
  • enable constraints in parallel
  • build indexes in parallel
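
The disable and enable steps themselves are straightforward dictionary-driven loops. A minimal sketch, assuming a single hypothetical schema and foreign keys only (ENABLE NOVALIDATE skips re-checking rows you already trust; use VALIDATE if you want the full check, which is where the table's parallel degree comes into play):

-- before the data pass: disable the foreign keys
begin
  for c in (select owner, table_name, constraint_name
              from dba_constraints
             where owner = 'APPS'
               and constraint_type = 'R'
               and status = 'ENABLED')
  loop
    execute immediate 'alter table "' || c.owner || '"."' || c.table_name ||
                      '" disable constraint "' || c.constraint_name || '"';
  end loop;
end;
/

-- after the data pass: bring them back
begin
  for c in (select owner, table_name, constraint_name
              from dba_constraints
             where owner = 'APPS'
               and constraint_type = 'R'
               and status = 'DISABLED')
  loop
    execute immediate 'alter table "' || c.owner || '"."' || c.table_name ||
                      '" enable novalidate constraint "' || c.constraint_name || '"';
  end loop;
end;
/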

Another thought is to disable the constraints on the source, but that may not be practical for our situation. We will see.

In the End
I am optimistic that our latest changes are going to produce some fast numbers. Of course, there are other areas that could be tweaked (place dumpfiles on independent mount points, for example), but we went with the low-hanging fruit, and I think we scored. Another outcome is that we have made contact with the Data Pump Product Manager and her supervisor, which is priceless! They are excellent people and very patient and willing to listen. Amazing folks over there!

Friday, December 07, 2007

All I want for Christmas

We have a particularly naughty database this week. However, it still has high hopes for the "giving season":

SQL > select dbms_random.string('U', 4) from dual;
DBMS_RANDOM.STRING('U',4)
------------------------------------------------------------------------------------------------------------------------
RIBS


That was for real. We had quite a laugh about that little coincidence this morning.