Wednesday, February 28, 2007

Setting up a logical standby database

Never having done this before, I wanted to give this a crack. Also, I was recently handed a project where we will either be using Streams or a Logical Standby. I was happy to learn that these two solutions are very similiar; one might even ask what the difference is. In the conventional sense, I think "Streams" implies a technology that is geared towards a set of objects, whereas logical standby is used for a whole database. At least, that is my impression early in the game.

To get started, I started reading the Oracle documentation about logical standby databases. This can be found in Chapter 4 of the Data Guard Concepts and Administration. Since I am using a really small and simple database (only non-system object is the demo SCOTT schema), I skipped the perquisite checks and stuff and jumped ahead to the part about setting up a Physical standby first. I got my primary all set up and used RMAN to make a physical standby. One can follow the directions from Chapter 3, but I have done a bit with RMAN duplication so I like it better, plus Paul Moen of Phythian has a great blog entry that details a simple recipe. In my example, I have the primary and the standby on the same host (remember, I said simple); for some reason, lock_name_space is deprecated in 10gR2, and I have been unable to find an alternative. So I use it anyway. One thing to keep in mind is that when using lock_name_space, you wind up with an instance_name that is not equal to your service_names (by default). TNSNAMES will have to point to the instance_name for the time being.

With a working, viable physical standby setup, I move on. The next couple steps are fairly straight-forward; stop the apply process in the standby and build the data dictionary in the primary redo logs. I am not concerned about role transitions (except transitioning the physical standby to logical, but the role does not change). The documentation makes a point about creating standby redo logs; I believe these are critical, but I still have a lot to learn about them. I created them, hence I also added a destination for local redo (as opposed to redo coming from the primary). What is disconcerting to me is the volume of redo that is generated on the standby; next time I will have to make them bigger (and possibly more of them, something to keep in mind when creating the controlfile with maxlogfiles).

The process of transitioning to a logical standby is actually pretty cool. Cool because it is so simple, yet the result is a bit complex. First, one has to convert the database:
ALTER DATABASE RECOVER TO LOGICAL STANDBY db_name;
This is where the lock_name_space and tnsname entry comes back to bite us (with the benefit of hindsight). If one is using a pfile (as I am), the in-memory name of the database will actually change, giving a warning about changing it in the pfile later on. I was impressed. However, now that the name has officially changed, make sure you remove lock_name_space, and that the name can be resolved properly by tns. Would be a good idea to actually pay attention to that warning and make the change in the pfile permament, as well. Another tricky thing about this operation is that it is expecting to read the data dictionary from the redo log. If it cannot find it, the operation will hang (forever).

After converting the standby, following the documentation, one has to shut it down to make some other changes, including the password file and a destination for local redo. With that, we are on the home stretch with the last couple commands:
ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

Again, this is some slick bit of operation. As the documentation says, the standby is now essentially an equal copy, but "transactionally different". You have a fully functional database open for read-write, but still accepting redo from an external source. None of the messy overhead setting up a Streams database with various CAPTURE, PROPAGATE and APPLY handlers and various RULEs.

I did run into a bit of a problem right away, and still continue to struggle with this.

ARC1: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (1041)
ARC1: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
PING[ARC1]: Error 1041 when pinging standby STANDBY1.

oerr ora 1041
01041, 00000, "internal error. hostdef extension doesn't exist"
// *Cause: Pointer to hstdef extension in hstdef is null.
// *Action: Report as a bug


Nice error message, eh? Looking at Google, this error has been around in some form or fashion since Oracle 7. Yuck. For some reason, whenever my logical standby is bounced, the primary looses the handshake. I can resolve this by bouncing the primary, but of course this is far less than optimal, and I need to figure out what it is doing. Rather annoying. At first I thought it was because the listener was registering the standby instance fast enough, but after adding an entry in listener.ora, I was still getting this error.

I also played with creating a conflict on the standby by updating scott.emp in the standby, then the primary. The standy noticed the conflict right away and made a note about it, registering subsequent redo logs.

LOGSTDBY stmt: update "SCOTT"."EMP"
set
"SAL" = 810
where
"EMPNO" = 7369 and
"SAL" = 800 and
ROWID = 'AAAAAAAAEAAAAAyAAA'
LOGSTDBY status: ORA-01403: no data found
LOGSTDBY id: XID 0x000a.015.000005c1, hSCN 0x0000.000372b0, lSCN 0x0000.000372b0, Thread 1, RBA 0x00b8.0000002a.10, txnCscn 0x0000.000372b9, PID 18086, oracle@xxxxxx (P004)
LOGSTDBY Apply process P004 pid=25 OS id=18086 stopped
Wed Feb 28 11:08:39 2007
Errors in file /u01/app/oracle/admin/STANDBY1/bdump/standby1_lsp0_16624.trc:
ORA-12801: error signaled in parallel query server P004
ORA-01403: no data found
LOGSTDBY Apply process P006 pid=27 OS id=18095 stopped
LOGSTDBY Apply process P007 pid=28 OS id=18098 stopped
LOGSTDBY Apply process P005 pid=26 OS id=18092 stopped
LOGSTDBY Apply process P008 pid=29 OS id=18100 stopped
Wed Feb 28 11:08:43 2007
LOGMINER: Log Auto Delete - deleting: /u02/oradata/STANDBY1/log_STANDBY1_615653782_1_183.log
Deleted file /u02/oradata/STANDBY1/log_STANDBY1_615653782_1_183.log
LOGSTDBY Analyzer process P003 pid=24 OS id=18082 stopped
Wed Feb 28 11:10:15 2007
RFS[1]: Successfully opened standby log 3: '/u04/oradata/STANDBY1/redolog3STANDBY1.log'
Wed Feb 28 11:10:15 2007
RFS LogMiner: Client enabled and ready for notification
Wed Feb 28 11:10:15 2007
RFS LogMiner: Registered logfile [/u02/oradata/STANDBY1/log_STANDBY1_615653782_1_185.log] to LogMiner session id [1]


I still need to read about this registration process. Fearing that I would have "reinstatiate" the table (ie, like I might have to in Streams), all I did was update the table again back to what it was on the standby (no, I did not rollback because it was already committed; it was pretty much like a flashback), and restart the APPLY process. Very smooth and very intelligent.

Obviously, my early steps in Logical standbys and Streams in general are nothing new or earth-shattering for those who have already been around the block a few times. However, coming this far, I have to wonder why Streams is so difficult. For that matter, why are many other features so mind-boggling complicated and difficult to setup (LDAP or Advanced Replication just to name a couple)? Kudos to whoever worked on Streams, Logical Standbys in particular. While I still have some gripes about Streams in general, this exercise has given me some hope and a glimpse at a brighter future.