Wednesday, March 04, 2009

Resolving file header contention on tempfiles

We are finding many bottlenecks as we attempt to optimize a character-set conversion for our 450gb database. One of the tricks is using a custom-made script (paresh) to build indexes in parallel (ie, many indexes at once), since datapump likes to do them one at a time, albeit using a degree of parallelism.

So, the most recent bottleneck was buffer busy waits. So after a little digging around, we finally came up with this query:
select p1 "File #", p2 "Block #", p3 "Reason Code" from v$session_wait where event = 'buffer busy waits';


The tricky part was that we were getting back p1 values that exceeded the highest file_id found in both dba_data_files and dba_temp_files. Tanel Poder and K Gopalakrishnan helped us out by stating that these extraordinary numbers were actually the "real" file_id for the tempfiles, which are numbered file_id+db_files, and you get db_files from v$parameter.

With that out of the way, we realized that were hitting file header block contention (Reason Code 13) on the temp files. As Tanel succintly put it, "it's the block 2 in the file, this is where the LMT extent management bitmap header block lives." So our index builds were hammering the few lucky header blocks in our tempfiles that contained the extent bitmap.

My first thought was to increase the size of the tablespace extents, thinking that perhaps if there were few extents to worry about, the extent bitmap would not be in such high demand. Turns out the effectiveness of that idea was rather poor - or at least unnoticeable. Then I decided to increase the number of header blocks. One simple way to do that is add more tempfiles. And Viola! Now I create x number of tempfiles for x number of slave processes, and I have yet to see any more file header block contention on my tempfiles. Coolness.

On to the next bottleneck...

1 comment:

Anonymous said...

We have a job that runs 15 concurrent processes that uses 5k-10k global temp tables. I reduced the extent size of the temp tablespace to cut down on free buffer waits and improve IO (our GTT's are tiny and a 1MB uniform extent is a waste of memory and IO), but buffer busy waits went through the roof. I was able to identify block 2 of our temp files as being the culprit. Thanks for the post, it confirms what I had planned on doing to resolve our issue.