Friday, January 15, 2010

Performance Issues using PL/SQL Tables in SQL Statements

Recently, I ran into an interesting performance problem while writing some PL/SQL code for work:

  • We’re using a PL/SQL table to collect some ids that have to be updated (using UPDATE aTable SET .. WHERE id IN (…)).
  • The update statement was very slow.

I posted question #2057055 on Stack Overflow and got some very helpful replies that helped me figure out what was going on and what to do.

Problem Description

The best way to describe the problem is with some code:

create table atable (acolumn number, id number);

insert into atable select rownum, rownum from dual connect by level <= 150000;

alter table atable add constraint atab_pk primary key (id);

create type type_coll is table of number(4);
/

exec dbms_stats.gather_table_stats(ownname => user, tabname => 'ATABLE');

declare
  a_coll type_coll;
begin
  a_coll := type_coll(1, 2, 3, 4);
  for n in 1..1000
  loop
    UPDATE aTable
    SET    aColumn = 1
    WHERE  id IN (SELECT COLUMN_VALUE
                  FROM   TABLE (cast (a_coll as type_coll)));
  end loop;
end;
/

This script creates a new table with 150,000 records and does some housekeeping: it defines a primary key and makes sure the optimizer has decent statistics. Then the PL/SQL table type is defined (it has to be defined as a type at the database level so it can be used in SQL statements).

The final statements (starting with declare) form an anonymous PL/SQL block that simulates the problems I was seeing:

  • The PL/SQL table a_coll is declared and filled with some values. (There are only a few values, which was also similar to the scenario I was having problems with.)
  • Then the UPDATE statement I was describing above is executed 1000 times. The statement looks a bit weird, but that is how you can use the contents of a_coll in a SQL statement. (There may be a way to get rid of the cast, but I couldn’t get that to work.)
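
As an aside, on some Oracle versions the cast may not be strictly necessary, since a_coll is already declared of the schema-level type type_coll. This variant is a sketch and untested here; whether it compiles depends on the Oracle version:

```sql
-- Possible simplification (untested, version-dependent): because a_coll is
-- declared of the schema-level type type_coll, the TABLE() operator may be
-- able to infer the collection type without an explicit cast.
UPDATE aTable
SET    aColumn = 1
WHERE  id IN (SELECT COLUMN_VALUE
              FROM   TABLE (a_coll));
```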

When running the PL/SQL block, it ran for quite a while (about 4 minutes in our environment, so about 4 updates per second) .. way too long! (The scenario I was implementing required more than 150,000 updates, which would have taken over 10 hours .. totally unacceptable.)

Performance Analysis

The first thing to do with a slow statement is to look at the execution plan. As I couldn’t get an ‘Explain Plan’ for the whole PL/SQL block or for just the ‘Update’ statement inside it, I tried to get a plan for this statement, which I felt was equivalent to what the database was executing:

UPDATE aTable
SET aColumn = 1
WHERE id IN (1,2,3,4);

For this, I got a plan that looked good: it was using the primary key defined on id, so it should run fast enough. When I used that exact statement in the block (instead of all the SELECT Column_Value stuff), the execution time dropped dramatically to basically sub-second performance for 1000 updates. So obviously something else was going on …
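
For reference, a plan for the literal version of the update can be obtained like this (a sketch using the standard DBMS_XPLAN package; this is not taken from the original investigation):

```sql
-- Generate and display an execution plan for the literal update.
EXPLAIN PLAN FOR
UPDATE aTable
SET    aColumn = 1
WHERE  id IN (1, 2, 3, 4);

-- DBMS_XPLAN.DISPLAY reads the most recent plan from the plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```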

When talking about this problem with a colleague, he showed me a way to get to the execution plan of the statement that is actually executed. He was using the Oracle Enterprise Manager to look at my session and the current statement and plan; later I was able to do the same thing on my machine using TOAD and the Session Browser.

When looking at the execution plan of the statement that is actually executed when running the PL/SQL block, we realized that the primary key is not used; instead, a full table scan is performed. Obviously, that is much slower than using the index for the few records that were required, so that explained the slow performance I was seeing.
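
For those without Enterprise Manager or TOAD at hand, the same information can be pulled from the data dictionary; the following is a sketch (DBMS_XPLAN.DISPLAY_CURSOR requires Oracle 10g or later, and the LIKE pattern and the `<sql_id>` placeholder are only illustrative):

```sql
-- Find the cursor for the update in the shared pool ...
SELECT sql_id, child_number, sql_text
FROM   v$sql
WHERE  upper(sql_text) LIKE 'UPDATE ATABLE%';

-- ... then display the plan that was actually used for that cursor,
-- substituting the sql_id found above.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>', 0));
```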

Improvement I

The first possible solution to the performance problems I was seeing is to force the use of the primary key index. In Oracle, this is done using Optimizer Hints, which are enclosed in /*+ <hint> */. Apparently, they were very common in older versions of Oracle, but haven’t been as important in newer versions (since the Optimizer switched from rule-based to cost-based and became much better at coming up with a good plan).

We can force the optimizer to use the index by writing the update like this:

UPDATE /*+ INDEX(aTable SYSxxx) */ aTable
SET    aColumn = 1
WHERE  id IN (SELECT COLUMN_VALUE
              FROM   TABLE (cast (a_coll as type_coll)));

with SYSxxx being the index for the primary key that is generated by Oracle.

When using this Update-Statement, the performance was the same as with the ‘naive’ Update shown earlier. The execution plan showed that the index was indeed used.

While this works, it is a bit clumsy .. we have to figure out the name of the generated index (which may change when the table is dropped and re-created, or may differ between dev, QA and production databases).
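
One way to avoid hard-coding the name is to look it up in the data dictionary; a sketch (note that in the test script above the primary key constraint was named explicitly, so its index would simply be ATAB_PK — the lookup matters for tables with system-generated names):

```sql
-- Find the index backing the primary key of ATABLE instead of
-- hard-coding a system-generated SYSxxx name.
SELECT index_name
FROM   user_constraints
WHERE  table_name      = 'ATABLE'
AND    constraint_type = 'P';
```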

Improvement II

In order to build a better solution, first we have to figure out what is going on and why the optimizer comes up with the bad execution plan. As first explained by Nils Weinander:

The bad execution plan is probably unavoidable (unfortunately). There is no statistics information for the PL/SQL table, so the optimizer has no way of knowing that there are few rows in it.

(Nils also suggested to use the index hint as explained above.)

This explains why the optimizer chooses a full table scan: there may be a lot of matching rows, so a full table scan may be quicker. (I’m not sure where the cut-off is, but probably when more than about 10% of the rows are affected, a full-table scan is faster than identifying the rows through the index and then retrieving the full record for the update.)

However, there is a way to tell the optimizer how many rows there are in the PL/SQL table using the cardinality hint (suggested by APC):

UPDATE aTable
SET    aColumn = 1
WHERE  id IN (SELECT /*+ cardinality( pt 10 ) */
                     COLUMN_VALUE
              FROM   TABLE (cast (a_coll as type_coll)) pt);

This tells the optimizer that there are about 10 records in the table pt (which is an alias for our PL/SQL TABLE). When using this statement, everything works as expected, it is quick and also uses the primary key index (without explicitly having to reference it in the SQL statement).

With this, I was able to improve the execution time of my initial scenario from about 10 hours down to about 5 minutes total execution time. Not too shabby an improvement!
