Wednesday, September 19, 2018

Using Arrays for Lookup Tables

Using Arrays for Lookup Tables


Most systems contain a number of small static lookup tables that are accessed regularly during the lifetime of a session.  In the case of batch jobs, these simple lookups may be performed many thousands of times in one cycle of the job.  In some situations performance can be improved drastically by caching this lookup data in PL/SQL tables rather than constantly referencing the base table.
The create_cached_lookup_tab.sql script creates and populates a test table which is used by the example code in this section.
create_cached_lookup_tab.sql
DROP TABLE cached_lookup_tab;
CREATE TABLE cached_lookup_tab (
  id           NUMBER(10)  NOT NULL,
  description  VARCHAR2(50) NOT NULL
);
ALTER TABLE cached_lookup_tab ADD (
  CONSTRAINT cached_lookup_tab_pk
  PRIMARY KEY (id)
);
DECLARE
  TYPE t_tab IS TABLE OF NUMBER;
  l_tab  t_tab := t_tab();
BEGIN
  FOR i IN 1 .. 1000 LOOP
    l_tab.extend;
    l_tab(l_tab.last) := i;
  END LOOP; 
  FORALL i IN l_tab.first .. l_tab.last
    INSERT INTO cached_lookup_tab (id, description)
    VALUES (l_tab(i), 'Description for ' || TO_CHAR(l_tab(i)));
  COMMIT;
END;
/
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'cached_lookup_tab', cascade => TRUE);
Once the test table is created, a PL/SQL API can then be created that will cache the lookup data and allow user access.  For the purpose of this test, the following procedures are used:
populate_tab – This procedure populates an associative array (index-by table) with the data from the cached_lookup_tab table.
get_cached_info – This procedure returns the specified lookup information from the cached table.
get_db_info – This procedure returns the specified lookup information from the database table, not the cached table.
Comparing the relative speeds of the get_cached_info and get_db_info procedures gives an indication of the performance improvements associated with caching the lookup data.  The cached_lookup_api.sql script shown below creates a package specification and body that implements these three procedures.
cached_lookup_api.sql
CREATE OR REPLACE PACKAGE cached_lookup_api AS
PROCEDURE populate_tab;
PROCEDURE get_cached_info (p_id    IN   cached_lookup_tab.id%TYPE,
                           p_info  OUT  cached_lookup_tab%ROWTYPE);
PROCEDURE get_db_info (p_id    IN   cached_lookup_tab.id%TYPE,
                       p_info  OUT  cached_lookup_tab%ROWTYPE);
END cached_lookup_api;
/
SHOW ERRORS
CREATE OR REPLACE PACKAGE BODY cached_lookup_api AS
TYPE t_tab IS TABLE OF cached_lookup_tab%ROWTYPE
  INDEX BY BINARY_INTEGER;
g_tab  t_tab;
-- -----------------------------------------------------------------
PROCEDURE populate_tab AS
-- -----------------------------------------------------------------
BEGIN
  FOR i IN (SELECT *
            FROM   cached_lookup_tab)
  LOOP
    g_tab(i.id) := i;
  END LOOP;
END populate_tab;
-- -----------------------------------------------------------------
-- -----------------------------------------------------------------
PROCEDURE get_cached_info (p_id    IN   cached_lookup_tab.id%TYPE,
                           p_info  OUT  cached_lookup_tab%ROWTYPE) AS
-- -----------------------------------------------------------------
BEGIN
  p_info := g_tab(p_id);
END get_cached_info;
-- -----------------------------------------------------------------
-- -----------------------------------------------------------------
PROCEDURE get_db_info (p_id    IN   cached_lookup_tab.id%TYPE,
                       p_info  OUT  cached_lookup_tab%ROWTYPE) AS
-- -----------------------------------------------------------------
BEGIN
  SELECT *
  INTO   p_info
  FROM   cached_lookup_tab
  WHERE  id = p_id;
END get_db_info;
-- -----------------------------------------------------------------
BEGIN
  -- Load table during package initialization.
  populate_tab;
END cached_lookup_api;
/
SHOW ERRORS
The package body contains an initialization block. This means the populate_tab procedure is executed during package initialization.  As a result during the lifetime of the session, the populate_tab procedure is never called manually unless a refresh of the cached table is required.
Now that a test table and an API to access the data is available, a batch process that requires the lookup information needs to be simulated.  The cached_lookup_ap_testi.sql script does this by repeatedly executing the lookup procedures, passing in random ID values.
cached_lookup_api_test.sql
-- *****************************************************************
-- Parameters:
--   1) loops – The number of loop iterations.
-- *****************************************************************
SET VERIFY OFF
SET SERVEROUTPUT ON
DECLARE
  l_seed   BINARY_INTEGER;
  l_start  NUMBER;
  l_loops  NUMBER := &1;
  l_id     cached_lookup_tab.id%TYPE;
  l_row    cached_lookup_tab%ROWTYPE;
BEGIN 
  -- Seed the random number generator.
  l_seed := TO_NUMBER(TO_CHAR(SYSDATE,'YYYYDDMMSS'));
  DBMS_RANDOM.initialize (val => l_seed); 
  -- Time the cached lookup.
  l_start := DBMS_UTILITY.get_time;
  FOR i IN 1 .. l_loops LOOP
    l_id := TRUNC(DBMS_RANDOM.value(low => 1, high => 1000));   
    cached_lookup_api.get_cached_info(p_id   => l_id,
                                      p_info => l_row);
  END LOOP; 
  DBMS_OUTPUT.put_line('Cached Lookup (' || l_loops || ' rows): ' ||
                       (DBMS_UTILITY.get_time - l_start));
  -- Time the db lookup.
  l_start := DBMS_UTILITY.get_time;
  FOR i IN 1 .. l_loops LOOP
    l_id := TRUNC(DBMS_RANDOM.value(low => 1, high => 1000));   
    cached_lookup_api.get_db_info(p_id   => l_id,
                                  p_info => l_row);
  END LOOP; 
  DBMS_OUTPUT.put_line('DB Lookup (' || l_loops || ' rows)    : ' ||
                       (DBMS_UTILITY.get_time - l_start));
  DBMS_RANDOM.terminate;
END;
/
The random values are generated using the dbms_random package, and all timings are calculated in hundredths of a second using the dbms_utility.get_time function.  The cached_lookup_ap_testi.sql script accepts a parameter that indicates the number of calls to the lookup procedures that should be performed.  The following output shows the results for several loop sizes.
SQL> @cached_lookup_api_test.sql 100
Cached Lookup (100 rows): 0
DB Lookup (100 rows)    : 2
PL/SQL procedure successfully completed.
SQL> @cached_lookup_api_test.sql 1000
Cached Lookup (1000 rows): 2
DB Lookup (1000 rows)    : 18
PL/SQL procedure successfully completed.
SQL> @cached_lookup_api_test.sql 10000
Cached Lookup (10000 rows): 12
DB Lookup (10000 rows)    : 181
PL/SQL procedure successfully completed.
SQL> @cached_lookup_api_test.sql 100000
Cached Lookup (100000 rows): 127
DB Lookup (100000 rows)    : 1807
PL/SQL procedure successfully completed.
The results show that the cached lookup faster than the table lookup, which makes a strong case for its usage.
Before launching into a full scale rewrite of your existing code base, the following issues and how they relate to your system should be considered:
Data Volatility – Caching data is generally a bad idea with respect to data integrity because there is no way of knowing if the underlying data has been altered since it was cached.  As a result, this type of processing should be limited to static tables.  In some circumstances, the benefits of using this method offset the possible risks for medium volatility tables, but this should be judged on a case-by-case basis.
Memory Usage – The number and size of cached tables should be considered when using this method.  PL/SQL collections are held in memory, so large tables or large numbers of small tables may result in considerable memory requirements.  In addition, this data is session-specific so the memory requirements are multiplied by the number of sessions caching data.  It may be wise to restrict the caching of data to a few select processes, like batch operations.
Initialization Time  This method, because of the time it takes to cache the data, increases package initialization time.  For this reason, functions requiring the cached data should be separated from those that do not.  This way the performance of sessions that do not require the cached data will not be affected.
Lookup Type – This method is restricted to lookups based on a single column number or string searches.  In Oracle9i Release 2, associative arrays (index-by tables) could be indexed using string values for the first time, so prior to this release the method is restricted to single number column searches only.
Bad Implementation - Constantly test the performance improvements to make sure they actually do what is expected.  It is possible to take a good idea and implement it badly ending up with poor results.
The cached_lookup_api package is implemented in a slightly different way to reinforce the last point.  The previous examples proved that caching data in collections is quicker than hitting the database. Therefore the packages created by the cached_lookup_api_incorrect.sql script should work faster than database calls.
cached_lookup_api_incorrect.sql
CREATE OR REPLACE PACKAGE cached_lookup_api AS
PROCEDURE populate_tab;
PROCEDURE get_cached_info (p_id    IN   cached_lookup_tab.id%TYPE,
                           p_info  OUT  cached_lookup_tab%ROWTYPE);
PROCEDURE get_db_info (p_id    IN   cached_lookup_tab.id%TYPE,
                       p_info  OUT  cached_lookup_tab%ROWTYPE);
END cached_lookup_api;
/
SHOW ERRORS
CREATE OR REPLACE PACKAGE BODY cached_lookup_api AS
TYPE t_tab IS TABLE OF cached_lookup_tab%ROWTYPE;
g_tab  t_tab;
-- -----------------------------------------------------------------
PROCEDURE populate_tab AS
-- -----------------------------------------------------------------
BEGIN
  g_tab := t_tab();
  FOR i IN (SELECT *
            FROM   cached_lookup_tab)
  LOOP
    g_tab.extend;
    g_tab(g_tab.last) := i;
  END LOOP;
  /*
  -- More efficient.
  -- Use this for Oracle9i Release 2 upwards.
  SELECT *
  BULK COLLECT INTO g_tab
  FROM   cached_lookup_tab;
  */
END populate_tab;
-- -----------------------------------------------------------------
-- -----------------------------------------------------------------
PROCEDURE get_cached_info (p_id    IN   cached_lookup_tab.id%TYPE,
                           p_info  OUT  cached_lookup_tab%ROWTYPE) AS
-- -----------------------------------------------------------------
BEGIN
  << array_loop >>
  FOR i IN g_tab.first .. g_tab.last LOOP
    IF g_tab(i).id = p_id THEN
      p_info := g_tab(i);
      EXIT array_loop;
    END IF;
  END LOOP array_loop;
END get_cached_info;
-- -----------------------------------------------------------------
-- -----------------------------------------------------------------
PROCEDURE get_db_info (p_id    IN   cached_lookup_tab.id%TYPE,
                       p_info  OUT  cached_lookup_tab%ROWTYPE) AS
-- ----------------------------------------------------------------
BEGIN
  SELECT *
  INTO   p_info
  FROM   cached_lookup_tab
  WHERE  id = p_id;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    p_info := NULL;
END get_db_info;
-- -----------------------------------------------------------------
BEGIN
  -- Load table during package initialization.
  populate_tab;
END cached_lookup_api;
/
SHOW ERRORS
Before discussing the implementation of this package, recreate the previous test and see how the results vary.
SQL> @cached_lookup_api_test.sql 100
Cached Lookup (100 rows): 3
DB Lookup (100 rows)    : 1
PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.
SQL> @cached_lookup_api_test.sql 1000
Cached Lookup (1000 rows): 21
DB Lookup (1000 rows)    : 18
PL/SQL procedure successfully completed.
SQL> @cached_lookup_api_test.sql 10000
Cached Lookup (10000 rows): 208
DB Lookup (10000 rows)    : 181
PL/SQL procedure successfully completed.
SQL> @cached_lookup_api_test.sql 100000
Cached Lookup (100000 rows): 2075
DB Lookup (100000 rows)    : 1809
PL/SQL procedure successfully completed.
These results clearly demonstrate that using the collection is slower than hitting the database; the opposite of what was proved earlier.
The reason for this discrepancy is the type of collection used in this example.  The first example used an associative array (index-by table), which allowed the desired row to be referenced directly because the array index was the ID value.  The second example used a regular table collection, so the only way to retrieve the data was to loop through it looking for a match. This method proved very inefficient, especially for large tables.

No comments:

Post a Comment

SQL Important Queries

  How to delete rows with no where clause The following example deletes  all rows  from the  Person.Person  the table in the AdventureWork...