Skip to main content

IBM DB2 Staging Environment

Introduction

Efficient performance of the IBM DB2 Database server is crucial for the optimal functioning of the enov8 Test Data Management (TDM) Tool. This document outlines the standard configurations required for a DB2 staging server and provides a comprehensive set of troubleshooting steps for Database Administrators (DBAs) to ensure that the database aligns with these standards and operates at peak efficiency.

Disclaimer

This documentation applies to all IBM DB2 versions currently supported by IBM. The configurations and SQL queries provided are recommendations based on typical staging setups and may not be suitable for all environments. It is essential to evaluate these recommendations in the context of your specific staging environment and requirements. Consult with your organization's database administrators and IBM DB2 documentation for guidance tailored to your specific needs.


1. Standard Database Server Configuration

1.1 Hardware Requirements

The hardware requirements should scale based on the size of the database and the volume of data being masked. Below is a comprehensive guideline for CPU, memory, disk, and network:

CPU

  • Cores: Start with 4-6 cores, add 2 cores per 1 TB of data.
  • Example:
    • For a 1 TB database: 4 cores.
    • For a 5 TB database: 12 cores.

Memory (RAM)

  • Base Requirement: Start with 16 GB, add 16 GB per 1 TB of data.
  • Example:
    • For a 1 TB database: 16 GB of RAM.
    • For a 5 TB database: 80 GB of RAM.

Disk

  • Type: Use SSDs for optimal performance, especially for I/O-intensive masking operations.
  • Capacity: Provision at least 2× the total data size to accommodate transaction logs, temporary tables, and reorg work space.
  • IOPS: Minimum of 10,000 IOPS for SSDs; 300 IOPS for HDDs.
  • Example:
    • For a 1 TB database: 2 TB of SSD storage.

Network

  • Bandwidth: At least 1 Gbps network interface card (NIC).
  • Latency: Low-latency network connections to minimise delays between the enov8 TDM application server and the DB2 instance.

1.2 System-Level Database Optimizations

Optimizing for Bulk Data Masking

  • Use NOT LOGGED INITIALLY for Temporary Tables:

    • Purpose: The enov8 TDM tool creates temporary masking tables with NOT LOGGED INITIALLY, which suppresses transaction logging for the initial data load. This significantly reduces log I/O during bulk masking operations.
    • Implication: Ensure the staging database's active log space is sufficient to handle the remaining logged operations (updates, merges) after the initial load.
  • Increase Active Log Space:

    • Purpose: Masking operations involve large UPDATE and MERGE statements. Insufficient log space causes transactions to roll back mid-job with SQL0964C (transaction log full).
    • Implementation:
      -- Connect as instance owner or SYSADM
      UPDATE DB CFG FOR staging_db USING LOGFILSIZ 65536; -- 64 MB per log file
      UPDATE DB CFG FOR staging_db USING LOGPRIMARY 50; -- 50 primary log files
      UPDATE DB CFG FOR staging_db USING LOGSECOND 20; -- 20 secondary log files
    • Verification:
      SELECT NAME, VALUE FROM SYSIBMADM.DBCFG
      WHERE NAME IN ('logfilsiz', 'logprimary', 'logsecond');
  • Enable Intra-Partition Parallelism:

    • Purpose: The TDM tool explicitly calls SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES') and SET CURRENT DEGREE = 'ANY' before chunk-based merge operations to parallelise work across CPU cores. The DB2 instance must be configured to allow this.
    • Implementation:
      UPDATE DBM CFG USING INTRA_PARALLEL YES;
    • Verification:
      SELECT NAME, VALUE FROM SYSIBMADM.DBMCFG WHERE NAME = 'intra_parallel';
  • Increase Sort Heap:

    • Purpose: Merge and join operations used during masking require adequate sort memory. Insufficient sort heap causes sort overflow to disk, slowing masking jobs significantly.
    • Implementation:
      UPDATE DB CFG FOR staging_db USING SORTHEAP 65536;   -- 512 MB
      UPDATE DB CFG FOR staging_db USING SHEAPTHRES_SHR 512000;
  • Table Reorg Awareness:

    • Purpose: The TDM tool calls ADMIN_CMD('REORG TABLE ...') automatically whenever a column drop or update is performed, because DB2 requires a reorg to reclaim space after ALTER TABLE ... DROP COLUMN. DBAs should be aware that reorg operations will occur during masking and ensure sufficient disk space is available for reorg work files.
    • Disk allowance: Reserve at least 1.5× the size of each masked table for reorg work space.

1.3 DB2 Internal Configuration

Buffer Pool Sizing

  • Purpose: DB2 uses buffer pools to cache data and index pages. Adequate buffer pool memory reduces physical I/O during masking scans and updates.
  • Implementation:
    -- Increase the default buffer pool size
    ALTER BUFFERPOOL IBMDEFAULTBP SIZE 262144; -- 2 GB (assuming 8 KB page size)
  • Verification:
    SELECT BPNAME, NPAGES FROM SYSCAT.BUFFERPOOLS;

Lock List and Lock Escalation

  • Purpose: Large UPDATE and MERGE transactions during masking hold many row locks simultaneously. If the lock list fills up, DB2 escalates to table-level locks, causing contention with other sessions.
  • Implementation:
    UPDATE DB CFG FOR staging_db USING LOCKLIST 131072;  -- 128 MB lock list
    UPDATE DB CFG FOR staging_db USING MAXLOCKS 80; -- Escalate at 80% of lock list

Auto-Runstats

  • Purpose: DB2's automatic statistics collection (AUTO_RUNSTATS) keeps optimizer statistics current, which is important for efficient masking query plans. Ensure it is enabled on the staging database.
  • Verification:
    SELECT NAME, VALUE FROM SYSIBMADM.DBCFG WHERE NAME = 'auto_runstats';
  • Note: The TDM tool reads row counts from the CARD column in SYSCAT.TABLES, which is populated by RUNSTATS. If AUTO_RUNSTATS is disabled, run RUNSTATS manually after any data load before running a masking job.

Connection Concentration

  • Purpose: The TDM tool opens a single connection per masking job. For environments with multiple concurrent TDM users, ensure the MAXAPPLS parameter is set high enough to accommodate all sessions.
  • Implementation:
    UPDATE DB CFG FOR staging_db USING MAXAPPLS 200;

1.4 Network Configuration

  • Default Port: DB2 listens on port 50000 (TCP/IP). Ensure the enov8 TDM application server can reach the DB2 instance on this port.

  • Connection Protocol: The TDM tool connects using PROTOCOL=TCPIP in the connection string:

    DATABASE=dbname;HOSTNAME=host;PORT=50000;PROTOCOL=TCPIP;UID=user;PWD=pass;
  • DB2 Communication Settings:

    • Ensure the DB2 instance is configured for TCP/IP communication:
      db2set DB2COMM=TCPIP
      db2 UPDATE DBM CFG USING SVCENAME 50000
      db2stop && db2start
  • Firewall Rules:

    • Open port 50000 (or the configured service port) from the TDM application server to the DB2 host.

1.5 Security and Patch Management

  • Patching:

    • Apply IBM DB2 fix packs via the standard IBM Fix Central download process.
    • Keep the DB2 instance at the latest fix pack for your version to receive security and performance fixes.
  • Authentication:

    • The TDM tool authenticates using the UID/PWD parameters in the connection string. DB2 delegates authentication to the operating system by default (AUTHENTICATION SERVER).
    • Ensure the TDM service account is a valid OS user on the DB2 host with database connect privileges.
  • Security Parameters:

    • Restrict the TDM service account to the staging database only. Do not grant SYSADM, SYSCTRL, or SYSMAINT authority.
    • Ensure the service account password does not expire during masking job windows.

2. Troubleshooting Steps for DBAs

When experiencing performance issues with the TDM masking tool on DB2, the following steps can help identify and resolve common problems.

2.1 Verify Hardware and OS Configuration

Check CPU and Memory

  • Purpose: Ensure hardware meets specifications.
  • Commands:
    • Linux:
      lscpu
      free -h
  • Verification: Confirm CPU core count and RAM meet the requirements in section 1.1.

Disk I/O Performance

  • Purpose: Ensure disk I/O supports masking and reorg operations.
  • Commands:
    iostat -x 1 5
  • Expected Output:
    Device:         rrqm/s wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
    sda 0.00 0.00 10.00 20.00 5120 10240 512.00 0.05 2.00 0.50 1.50
  • Verification: The %util column should remain below 80%. High utilisation during masking indicates I/O bottlenecks that can slow reorg operations and log writes.

2.2 Validate DB2 Configuration Parameters

Transaction Log Configuration

  • Purpose: Confirm that sufficient log space is allocated to avoid SQL0964C errors during masking.
  • Commands:
    SELECT NAME, VALUE
    FROM SYSIBMADM.DBCFG
    WHERE NAME IN ('logfilsiz', 'logprimary', 'logsecond');
  • Expected Output:
    NAME         VALUE
    ---------- -----
    logfilsiz 65536
    logprimary 50
    logsecond 20
  • Verification: Total active log space = LOGFILSIZ × (LOGPRIMARY + LOGSECOND) × 4 KB. This should be at least 10–20 GB for databases over 1 TB.

Intra-Partition Parallelism

  • Purpose: Confirm the instance allows the parallelism that the TDM tool enables via SYSPROC.ADMIN_SET_INTRA_PARALLEL.
  • Commands:
    SELECT NAME, VALUE FROM SYSIBMADM.DBMCFG WHERE NAME = 'intra_parallel';
  • Expected Output:
    NAME             VALUE
    -------------- -----
    intra_parallel YES
  • Verification: Must be YES for chunk-based merge operations to execute in parallel.

Lock List Size

  • Purpose: Confirm that the lock list is large enough for bulk masking transactions.
  • Commands:
    SELECT NAME, VALUE FROM SYSIBMADM.DBCFG
    WHERE NAME IN ('locklist', 'maxlocks');
  • Expected Output:
    NAME       VALUE
    -------- ------
    locklist 131072
    maxlocks 80
  • Verification: If lock escalation is occurring during masking (check db2diag.log for ZRC=0x80100009), increase LOCKLIST or reduce the masking batch size.

2.3 Assess Database Performance

Check Active Log Usage

  • Purpose: Monitor log utilisation during masking to catch near-full conditions before they cause transaction rollbacks.
  • Commands:
    SELECT LOG_UTILIZATION_PERCENT, TOTAL_LOG_USED_KB, TOTAL_LOG_AVAILABLE_KB
    FROM SYSIBMADM.LOG_UTILIZATION;
  • Expected Output:
    LOG_UTILIZATION_PERCENT   TOTAL_LOG_USED_KB   TOTAL_LOG_AVAILABLE_KB
    ----------------------- ----------------- ----------------------
    15 204800 1331200
  • Verification: If LOG_UTILIZATION_PERCENT exceeds 80% during masking, increase LOGPRIMARY or LOGSECOND as shown in section 1.2.

Check Reorg-Pending Status

  • Purpose: The TDM tool checks SYSIBMADM.ADMINTABINFO before every column drop and update to determine whether a reorg is needed. If REORG_PENDING = 'Y', the tool automatically runs a reorg. DBAs can monitor this proactively.
  • Commands:
    SELECT TABSCHEMA, TABNAME, REORG_PENDING
    FROM SYSIBMADM.ADMINTABINFO
    WHERE TABSCHEMA = 'STAGING_SCHEMA'
    ORDER BY TABNAME;
  • Verification: Any table with REORG_PENDING = 'Y' will trigger an automatic reorg before the next TDM update or column drop. This adds time to masking jobs on large tables.

Monitor Active Connections

  • Purpose: Confirm the TDM service account has an active connection and is not blocked.
  • Commands:
    SELECT APPLICATION_HANDLE, APPLICATION_NAME, AUTH_ID, CONNECTION_START_TIME, CLIENT_HOSTNAME
    FROM SYSIBMADM.APPLICATIONS
    WHERE AUTH_ID = 'TDM_USER';
  • Verification: Confirm one active application handle per TDM masking job.

2.4 Check Session and Query Performance

Identify Long-Running Statements

  • Purpose: Find masking queries consuming excessive time or resources.
  • Commands:
    SELECT
    APPLICATION_HANDLE,
    ELAPSED_TIME_SEC,
    SUBSTR(STMT_TEXT, 1, 200) AS SQL_SAMPLE
    FROM SYSIBMADM.MON_CURRENT_SQL
    WHERE AUTH_ID = 'TDM_USER'
    ORDER BY ELAPSED_TIME_SEC DESC;
  • Verification: Investigate statements with ELAPSED_TIME_SEC significantly higher than expected. Use EXPLAIN to review execution plans for problematic queries.

Review Execution Plans

  • Purpose: Determine if inefficient plans are causing slow masking performance.
  • Commands:
    EXPLAIN PLAN FOR
    UPDATE "STAGING_SCHEMA"."MY_TABLE"
    SET "COL1" = 'masked_value'
    WHERE "COL1" IS NOT NULL;

    SELECT OPERATOR_TYPE, OBJECT_NAME, TOTAL_COST
    FROM SYSTOOLS.EXPLAIN_OPERATOR
    ORDER BY OPERATOR_ID;
  • Verification: Look for TBSCAN (table scan) operators on large tables. These indicate missing or stale statistics and should be addressed with RUNSTATS.

Check Lock Waits

  • Purpose: Identify sessions waiting on locks, which can stall masking jobs.
  • Commands:
    SELECT
    SUBSTR(LOCK_OBJECT_NAME, 1, 30) AS LOCKED_TABLE,
    LOCK_MODE,
    LOCK_STATUS,
    APPLICATION_HANDLE
    FROM SYSIBMADM.LOCKS_HELD
    WHERE LOCK_STATUS = 'W';
  • Verification: Lock waits during masking indicate that other sessions are holding conflicting locks on staging tables. Ensure no other workloads access the staging schema during masking jobs.

2.5 Evaluate Storage and Space Usage

Check Database Size and Schema Storage

  • Purpose: The TDM tool uses SYSCAT.TABLES joined with SYSIBMADM.ADMINTABINFO to calculate schema size. Confirm the staging schema has sufficient space.
  • Commands:
    SELECT
    T.TABSCHEMA,
    SUM((A.DATA_OBJECT_P_SIZE + A.INDEX_OBJECT_P_SIZE + A.LONG_OBJECT_P_SIZE +
    A.LOB_OBJECT_P_SIZE + A.XML_OBJECT_P_SIZE) * 1024) / 1024 / 1024 AS SIZE_GB
    FROM SYSCAT.TABLES T
    JOIN SYSIBMADM.ADMINTABINFO A
    ON T.TABNAME = A.TABNAME
    WHERE T.TABSCHEMA = 'STAGING_SCHEMA'
    GROUP BY T.TABSCHEMA;
  • Verification: Ensure sufficient tablespace free space exists for masking temporary tables and reorg work files (budget at least 2× the schema size).

Check Tablespace Free Space

  • Purpose: Confirm tablespaces have headroom for temporary tables created during masking.
  • Commands:
    SELECT TBSP_NAME, TBSP_TOTAL_PAGES, TBSP_USED_PAGES,
    (TBSP_TOTAL_PAGES - TBSP_USED_PAGES) AS FREE_PAGES,
    DECIMAL((TBSP_USED_PAGES * 100.0 / TBSP_TOTAL_PAGES), 5, 2) AS USED_PCT
    FROM SYSIBMADM.TBSP_UTILIZATION
    ORDER BY USED_PCT DESC;
  • Verification: No tablespace used by the staging schema should exceed 80% utilisation.

2.6 Review Statistics and Indexes

Identify Tables with Stale Statistics

  • Purpose: The TDM tool reads row counts from the CARD column in SYSCAT.TABLES. If statistics are stale, row counts are inaccurate and the optimizer may generate poor masking query plans.
  • Commands — identify tables with zero or stale cardinality:
    SELECT TABNAME, CARD AS ROW_COUNT, STATS_TIME
    FROM SYSCAT.TABLES
    WHERE TABSCHEMA = 'STAGING_SCHEMA'
    AND TYPE = 'T'
    AND (CARD = -1 OR STATS_TIME < CURRENT_TIMESTAMP - 7 DAYS)
    ORDER BY TABNAME;
  • Remediation (DBA action — run before masking jobs):
    RUNSTATS ON TABLE "STAGING_SCHEMA"."MY_TABLE"
    WITH DISTRIBUTION AND DETAILED INDEXES ALL;

Review Index Information

  • Purpose: The TDM tool queries SYSCAT.INDEXES for index metadata. Confirm that key columns have appropriate indexes to support masking join and merge operations.
  • Commands:
    SELECT INDNAME, TABNAME, UNIQUERULE, COLNAMES
    FROM SYSCAT.INDEXES
    WHERE TABSCHEMA = 'STAGING_SCHEMA'
    ORDER BY TABNAME;
  • Verification: Primary key indexes (UNIQUERULE = 'P') and unique indexes (UNIQUERULE = 'U') are critical for merge-based masking performance. Missing indexes on frequently joined columns will result in full table scans.

2.7 Network Diagnostics

Test Connectivity on Port 50000

  • Purpose: Confirm the enov8 TDM application server can reach the DB2 instance.
  • Commands (from the TDM application server):
    # Windows
    Test-NetConnection -ComputerName <db2-hostname> -Port 50000
    # Linux
    nc -zv <db2-hostname> 50000
  • Expected Output (Linux):
    Connection to <db2-hostname> 50000 port [tcp] succeeded!

Verify DB2 TCP/IP Listener

  • Purpose: Confirm the DB2 instance is listening on the expected port.
  • Commands (on the DB2 host):
    db2 GET DBM CFG | grep -i svcename
    netstat -tlnp | grep 50000
  • Expected Output:
    TCP Service name (SVCENAME) = 50000
    tcp 0 0 0.0.0.0:50000 0.0.0.0:* LISTEN <pid>/db2sysc

2.8 Verify Security and Patch Levels

Check DB2 Version

  • Purpose: Confirm the DB2 version for compatibility. The TDM tool calls SYSPROC.ENV_GET_INST_INFO() at runtime to retrieve version information.
  • Commands:
    SELECT * FROM TABLE(SYSPROC.ENV_GET_INST_INFO()) AS T;
  • Verification: Confirm the instance is running a version of DB2 currently supported by IBM.

Check Applied Fix Packs

  • Purpose: Ensure the DB2 instance is on a supported fix pack level.
  • Commands (on the DB2 host):
    db2level
  • Expected Output:
    DB21085I  This instance or install (instance name, where applicable: "db2inst1")
    uses "64" bits and DB2 code release "SQL12xxx" with level identifier "xxxxxxxx".
    Informational tokens are "DB2 vXX.X.X.X", ...

2.9 Application Interaction

Confirm Driver Installation

  • Purpose: The TDM tool uses the ibm_db Python library, which requires the IBM Data Server Driver to be installed on the TDM application server.
  • Verification:
    • Confirm ibm_db is installed: pip show ibm_db
    • Confirm the IBM Data Server Driver package is present and the DB2_HOME or IBM_DB_HOME environment variable points to it.
    • The connection string used by the TDM tool is:
      DATABASE=<dbname>;HOSTNAME=<host>;PORT=50000;PROTOCOL=TCPIP;UID=<user>;PWD=<pass>;

Monitor TDM Session Activity

  • Purpose: Verify that TDM masking sessions are active and progressing normally.
  • Commands:
    SELECT
    APPLICATION_HANDLE,
    AUTH_ID,
    CONNECTION_START_TIME,
    ROWS_READ,
    ROWS_WRITTEN
    FROM SYSIBMADM.MON_CONNECTION_SUMMARY
    WHERE AUTH_ID = 'TDM_USER';
  • Expected Output: ROWS_READ and ROWS_WRITTEN should increase over time as masking progresses.

2.10 System Resource Monitoring

Monitor CPU and Memory Usage

  • Purpose: Detect system-level bottlenecks limiting masking throughput.
  • Commands (Linux):
    top
  • Verification: Check for high CPU usage by db2sysc processes and for memory pressure causing page-outs, which degrade buffer pool effectiveness.

Check Swap Usage

  • Purpose: Ensure the OS is not swapping DB2 memory to disk.
  • Commands:
    swapon -s
    free -h
  • Verification: Minimal swap usage. DB2 performance degrades substantially when buffer pool memory is paged out to swap.

3. Database Permissions

The following table outlines the specific database permissions required for the Enov8 Test Data Management (TDM) Tool to function properly with IBM DB2. Ensure that the TDM service account has the appropriate privileges on both the staging schema and the system catalog views.

PrivilegeObjectDescriptionUsed By
SELECTSYSCAT.COLUMNSRequired for reading column metadata, data types, and lengths for all profiling, masking, and validation operations.All
SELECTSYSCAT.TABLESRequired for listing tables, reading row counts (CARD), and retrieving last-modified timestamps.All
SELECTSYSIBM.SYSTABLESRequired for listing tables in the target schema via the legacy catalog view.All
SELECTSYSCAT.SCHEMATARequired for schema existence validation at connection time.All
SELECTSYSCAT.TABCONSTRequired for reading constraint metadata (primary key, unique, foreign key) to identify unmaskable columns.All, Masking
SELECTSYSCAT.KEYCOLUSERequired for reading constraint-to-column mappings for PK/FK/UK identification.All, Masking
SELECTSYSCAT.INDEXESRequired for reading index metadata used in column analysis and masking operations.All, Masking
SELECTSYSIBMADM.ADMINTABINFORequired for checking REORG_PENDING status before column drops and updates, and for calculating schema size.All, Masking
SELECTSYSIBMADM.AUTHORIZATIONIDSRequired for counting database users during forensic scan operations.Forensic Scan
EXECUTESYSPROC.ENV_GET_INST_INFORequired for retrieving the DB2 version at connection time for feature compatibility checks.All
EXECUTESYSPROC.ADMIN_SET_INTRA_PARALLELRequired for enabling intra-partition parallelism during chunk-based merge masking operations.Masking
EXECUTESYSPROC.ADMIN_CMDRequired for running REORG TABLE automatically before column drops and updates.Masking
SELECT, INSERT, UPDATE, DELETETarget schema tablesRequired for reading source data and writing masked values back to staging tables.Masking
CREATE TABLETarget schemaRequired for creating temporary masking tables (NOT LOGGED INITIALLY) and preview tables (_ENOV8_PREVIEW).Masking
DROP TABLETarget schemaRequired for cleaning up temporary tables created during masking operations.Masking
ALTER TABLETarget schemaRequired for adding and dropping temporary columns, and for enabling/disabling triggers (ALTER TABLE ... ENABLE/DISABLE TRIGGER).Masking
TRUNCATETarget schema tablesRequired for clearing staging tables between masking runs.Masking