IBM DB2 Staging Environment
Introduction
Efficient performance of the IBM DB2 Database server is crucial for the optimal functioning of the enov8 Test Data Management (TDM) Tool. This document outlines the standard configurations required for a DB2 staging server and provides a comprehensive set of troubleshooting steps for Database Administrators (DBAs) to ensure that the database aligns with these standards and operates at peak efficiency.
This documentation applies to all IBM DB2 versions currently supported by IBM. The configurations and SQL queries provided are recommendations based on typical staging setups and may not be suitable for all environments. It is essential to evaluate these recommendations in the context of your specific staging environment and requirements. Consult with your organization's database administrators and IBM DB2 documentation for guidance tailored to your specific needs.
1. Standard Database Server Configuration
1.1 Hardware Requirements
The hardware requirements should scale based on the size of the database and the volume of data being masked. Below is a comprehensive guideline for CPU, memory, disk, and network:
CPU
- Cores: Start with 4-6 cores, add 2 cores per 1 TB of data.
- Example:
- For a 1 TB database: 4 cores.
- For a 5 TB database: 12 cores.
Memory (RAM)
- Base Requirement: Start with 16 GB, add 16 GB per 1 TB of data.
- Example:
- For a 1 TB database: 16 GB of RAM.
- For a 5 TB database: 80 GB of RAM.
Disk
- Type: Use SSDs for optimal performance, especially for I/O-intensive masking operations.
- Capacity: Provision at least 2× the total data size to accommodate transaction logs, temporary tables, and reorg work space.
- IOPS: Minimum of 10,000 IOPS for SSDs; 300 IOPS for HDDs.
- Example:
- For a 1 TB database: 2 TB of SSD storage.
Network
- Bandwidth: At least 1 Gbps network interface card (NIC).
- Latency: Low-latency network connections to minimise delays between the enov8 TDM application server and the DB2 instance.
1.2 System-Level Database Optimizations
Optimizing for Bulk Data Masking
-
Use NOT LOGGED INITIALLY for Temporary Tables:
- Purpose: The enov8 TDM tool creates temporary masking tables with
NOT LOGGED INITIALLY, which suppresses transaction logging for the initial data load. This significantly reduces log I/O during bulk masking operations. - Implication: Ensure the staging database's active log space is sufficient to handle the remaining logged operations (updates, merges) after the initial load.
- Purpose: The enov8 TDM tool creates temporary masking tables with
-
Increase Active Log Space:
- Purpose: Masking operations involve large UPDATE and MERGE statements. Insufficient log space causes transactions to roll back mid-job with
SQL0964C (transaction log full). - Implementation:
-- Connect as instance owner or SYSADM
UPDATE DB CFG FOR staging_db USING LOGFILSIZ 65536; -- 64 MB per log file
UPDATE DB CFG FOR staging_db USING LOGPRIMARY 50; -- 50 primary log files
UPDATE DB CFG FOR staging_db USING LOGSECOND 20; -- 20 secondary log files - Verification:
SELECT NAME, VALUE FROM SYSIBMADM.DBCFG
WHERE NAME IN ('logfilsiz', 'logprimary', 'logsecond');
- Purpose: Masking operations involve large UPDATE and MERGE statements. Insufficient log space causes transactions to roll back mid-job with
-
Enable Intra-Partition Parallelism:
- Purpose: The TDM tool explicitly calls
SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES')andSET CURRENT DEGREE = 'ANY'before chunk-based merge operations to parallelise work across CPU cores. The DB2 instance must be configured to allow this. - Implementation:
UPDATE DBM CFG USING INTRA_PARALLEL YES; - Verification:
SELECT NAME, VALUE FROM SYSIBMADM.DBMCFG WHERE NAME = 'intra_parallel';
- Purpose: The TDM tool explicitly calls
-
Increase Sort Heap:
- Purpose: Merge and join operations used during masking require adequate sort memory. Insufficient sort heap causes sort overflow to disk, slowing masking jobs significantly.
- Implementation:
UPDATE DB CFG FOR staging_db USING SORTHEAP 65536; -- 512 MB
UPDATE DB CFG FOR staging_db USING SHEAPTHRES_SHR 512000;
-
Table Reorg Awareness:
- Purpose: The TDM tool calls
ADMIN_CMD('REORG TABLE ...')automatically whenever a column drop or update is performed, because DB2 requires a reorg to reclaim space afterALTER TABLE ... DROP COLUMN. DBAs should be aware that reorg operations will occur during masking and ensure sufficient disk space is available for reorg work files. - Disk allowance: Reserve at least 1.5× the size of each masked table for reorg work space.
- Purpose: The TDM tool calls
1.3 DB2 Internal Configuration
Buffer Pool Sizing
- Purpose: DB2 uses buffer pools to cache data and index pages. Adequate buffer pool memory reduces physical I/O during masking scans and updates.
- Implementation:
-- Increase the default buffer pool size
ALTER BUFFERPOOL IBMDEFAULTBP SIZE 262144; -- 2 GB (assuming 8 KB page size) - Verification:
SELECT BPNAME, NPAGES FROM SYSCAT.BUFFERPOOLS;
Lock List and Lock Escalation
- Purpose: Large UPDATE and MERGE transactions during masking hold many row locks simultaneously. If the lock list fills up, DB2 escalates to table-level locks, causing contention with other sessions.
- Implementation:
UPDATE DB CFG FOR staging_db USING LOCKLIST 131072; -- 128 MB lock list
UPDATE DB CFG FOR staging_db USING MAXLOCKS 80; -- Escalate at 80% of lock list
Auto-Runstats
- Purpose: DB2's automatic statistics collection (
AUTO_RUNSTATS) keeps optimizer statistics current, which is important for efficient masking query plans. Ensure it is enabled on the staging database. - Verification:
SELECT NAME, VALUE FROM SYSIBMADM.DBCFG WHERE NAME = 'auto_runstats'; - Note: The TDM tool reads row counts from the
CARDcolumn inSYSCAT.TABLES, which is populated byRUNSTATS. IfAUTO_RUNSTATSis disabled, runRUNSTATSmanually after any data load before running a masking job.
Connection Concentration
- Purpose: The TDM tool opens a single connection per masking job. For environments with multiple concurrent TDM users, ensure the
MAXAPPLSparameter is set high enough to accommodate all sessions. - Implementation:
UPDATE DB CFG FOR staging_db USING MAXAPPLS 200;
1.4 Network Configuration
-
Default Port: DB2 listens on port 50000 (TCP/IP). Ensure the enov8 TDM application server can reach the DB2 instance on this port.
-
Connection Protocol: The TDM tool connects using
PROTOCOL=TCPIPin the connection string:DATABASE=dbname;HOSTNAME=host;PORT=50000;PROTOCOL=TCPIP;UID=user;PWD=pass; -
DB2 Communication Settings:
- Ensure the DB2 instance is configured for TCP/IP communication:
db2set DB2COMM=TCPIP
db2 UPDATE DBM CFG USING SVCENAME 50000
db2stop && db2start
- Ensure the DB2 instance is configured for TCP/IP communication:
-
Firewall Rules:
- Open port 50000 (or the configured service port) from the TDM application server to the DB2 host.
1.5 Security and Patch Management
-
Patching:
- Apply IBM DB2 fix packs via the standard IBM Fix Central download process.
- Keep the DB2 instance at the latest fix pack for your version to receive security and performance fixes.
-
Authentication:
- The TDM tool authenticates using the
UID/PWDparameters in the connection string. DB2 delegates authentication to the operating system by default (AUTHENTICATION SERVER). - Ensure the TDM service account is a valid OS user on the DB2 host with database connect privileges.
- The TDM tool authenticates using the
-
Security Parameters:
- Restrict the TDM service account to the staging database only. Do not grant
SYSADM,SYSCTRL, orSYSMAINTauthority. - Ensure the service account password does not expire during masking job windows.
- Restrict the TDM service account to the staging database only. Do not grant
2. Troubleshooting Steps for DBAs
When experiencing performance issues with the TDM masking tool on DB2, the following steps can help identify and resolve common problems.
2.1 Verify Hardware and OS Configuration
Check CPU and Memory
- Purpose: Ensure hardware meets specifications.
- Commands:
- Linux:
lscpu
free -h
- Linux:
- Verification: Confirm CPU core count and RAM meet the requirements in section 1.1.
Disk I/O Performance
- Purpose: Ensure disk I/O supports masking and reorg operations.
- Commands:
iostat -x 1 5 - Expected Output:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 10.00 20.00 5120 10240 512.00 0.05 2.00 0.50 1.50 - Verification: The
%utilcolumn should remain below 80%. High utilisation during masking indicates I/O bottlenecks that can slow reorg operations and log writes.
2.2 Validate DB2 Configuration Parameters
Transaction Log Configuration
- Purpose: Confirm that sufficient log space is allocated to avoid
SQL0964Cerrors during masking. - Commands:
SELECT NAME, VALUE
FROM SYSIBMADM.DBCFG
WHERE NAME IN ('logfilsiz', 'logprimary', 'logsecond'); - Expected Output:
NAME VALUE
---------- -----
logfilsiz 65536
logprimary 50
logsecond 20 - Verification: Total active log space =
LOGFILSIZ × (LOGPRIMARY + LOGSECOND) × 4 KB. This should be at least 10–20 GB for databases over 1 TB.
Intra-Partition Parallelism
- Purpose: Confirm the instance allows the parallelism that the TDM tool enables via
SYSPROC.ADMIN_SET_INTRA_PARALLEL. - Commands:
SELECT NAME, VALUE FROM SYSIBMADM.DBMCFG WHERE NAME = 'intra_parallel'; - Expected Output:
NAME VALUE
-------------- -----
intra_parallel YES - Verification: Must be
YESfor chunk-based merge operations to execute in parallel.
Lock List Size
- Purpose: Confirm that the lock list is large enough for bulk masking transactions.
- Commands:
SELECT NAME, VALUE FROM SYSIBMADM.DBCFG
WHERE NAME IN ('locklist', 'maxlocks'); - Expected Output:
NAME VALUE
-------- ------
locklist 131072
maxlocks 80 - Verification: If lock escalation is occurring during masking (check
db2diag.logforZRC=0x80100009), increaseLOCKLISTor reduce the masking batch size.
2.3 Assess Database Performance
Check Active Log Usage
- Purpose: Monitor log utilisation during masking to catch near-full conditions before they cause transaction rollbacks.
- Commands:
SELECT LOG_UTILIZATION_PERCENT, TOTAL_LOG_USED_KB, TOTAL_LOG_AVAILABLE_KB
FROM SYSIBMADM.LOG_UTILIZATION; - Expected Output:
LOG_UTILIZATION_PERCENT TOTAL_LOG_USED_KB TOTAL_LOG_AVAILABLE_KB
----------------------- ----------------- ----------------------
15 204800 1331200 - Verification: If
LOG_UTILIZATION_PERCENTexceeds 80% during masking, increaseLOGPRIMARYorLOGSECONDas shown in section 1.2.
Check Reorg-Pending Status
- Purpose: The TDM tool checks
SYSIBMADM.ADMINTABINFObefore every column drop and update to determine whether a reorg is needed. IfREORG_PENDING = 'Y', the tool automatically runs a reorg. DBAs can monitor this proactively. - Commands:
SELECT TABSCHEMA, TABNAME, REORG_PENDING
FROM SYSIBMADM.ADMINTABINFO
WHERE TABSCHEMA = 'STAGING_SCHEMA'
ORDER BY TABNAME; - Verification: Any table with
REORG_PENDING = 'Y'will trigger an automatic reorg before the next TDM update or column drop. This adds time to masking jobs on large tables.
Monitor Active Connections
- Purpose: Confirm the TDM service account has an active connection and is not blocked.
- Commands:
SELECT APPLICATION_HANDLE, APPLICATION_NAME, AUTH_ID, CONNECTION_START_TIME, CLIENT_HOSTNAME
FROM SYSIBMADM.APPLICATIONS
WHERE AUTH_ID = 'TDM_USER'; - Verification: Confirm one active application handle per TDM masking job.
2.4 Check Session and Query Performance
Identify Long-Running Statements
- Purpose: Find masking queries consuming excessive time or resources.
- Commands:
SELECT
APPLICATION_HANDLE,
ELAPSED_TIME_SEC,
SUBSTR(STMT_TEXT, 1, 200) AS SQL_SAMPLE
FROM SYSIBMADM.MON_CURRENT_SQL
WHERE AUTH_ID = 'TDM_USER'
ORDER BY ELAPSED_TIME_SEC DESC; - Verification: Investigate statements with
ELAPSED_TIME_SECsignificantly higher than expected. UseEXPLAINto review execution plans for problematic queries.
Review Execution Plans
- Purpose: Determine if inefficient plans are causing slow masking performance.
- Commands:
EXPLAIN PLAN FOR
UPDATE "STAGING_SCHEMA"."MY_TABLE"
SET "COL1" = 'masked_value'
WHERE "COL1" IS NOT NULL;
SELECT OPERATOR_TYPE, OBJECT_NAME, TOTAL_COST
FROM SYSTOOLS.EXPLAIN_OPERATOR
ORDER BY OPERATOR_ID; - Verification: Look for
TBSCAN(table scan) operators on large tables. These indicate missing or stale statistics and should be addressed withRUNSTATS.
Check Lock Waits
- Purpose: Identify sessions waiting on locks, which can stall masking jobs.
- Commands:
SELECT
SUBSTR(LOCK_OBJECT_NAME, 1, 30) AS LOCKED_TABLE,
LOCK_MODE,
LOCK_STATUS,
APPLICATION_HANDLE
FROM SYSIBMADM.LOCKS_HELD
WHERE LOCK_STATUS = 'W'; - Verification: Lock waits during masking indicate that other sessions are holding conflicting locks on staging tables. Ensure no other workloads access the staging schema during masking jobs.
2.5 Evaluate Storage and Space Usage
Check Database Size and Schema Storage
- Purpose: The TDM tool uses
SYSCAT.TABLESjoined withSYSIBMADM.ADMINTABINFOto calculate schema size. Confirm the staging schema has sufficient space. - Commands:
SELECT
T.TABSCHEMA,
SUM((A.DATA_OBJECT_P_SIZE + A.INDEX_OBJECT_P_SIZE + A.LONG_OBJECT_P_SIZE +
A.LOB_OBJECT_P_SIZE + A.XML_OBJECT_P_SIZE) * 1024) / 1024 / 1024 AS SIZE_GB
FROM SYSCAT.TABLES T
JOIN SYSIBMADM.ADMINTABINFO A
ON T.TABNAME = A.TABNAME
WHERE T.TABSCHEMA = 'STAGING_SCHEMA'
GROUP BY T.TABSCHEMA; - Verification: Ensure sufficient tablespace free space exists for masking temporary tables and reorg work files (budget at least 2× the schema size).
Check Tablespace Free Space
- Purpose: Confirm tablespaces have headroom for temporary tables created during masking.
- Commands:
SELECT TBSP_NAME, TBSP_TOTAL_PAGES, TBSP_USED_PAGES,
(TBSP_TOTAL_PAGES - TBSP_USED_PAGES) AS FREE_PAGES,
DECIMAL((TBSP_USED_PAGES * 100.0 / TBSP_TOTAL_PAGES), 5, 2) AS USED_PCT
FROM SYSIBMADM.TBSP_UTILIZATION
ORDER BY USED_PCT DESC; - Verification: No tablespace used by the staging schema should exceed 80% utilisation.
2.6 Review Statistics and Indexes
Identify Tables with Stale Statistics
- Purpose: The TDM tool reads row counts from the
CARDcolumn inSYSCAT.TABLES. If statistics are stale, row counts are inaccurate and the optimizer may generate poor masking query plans. - Commands — identify tables with zero or stale cardinality:
SELECT TABNAME, CARD AS ROW_COUNT, STATS_TIME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'STAGING_SCHEMA'
AND TYPE = 'T'
AND (CARD = -1 OR STATS_TIME < CURRENT_TIMESTAMP - 7 DAYS)
ORDER BY TABNAME; - Remediation (DBA action — run before masking jobs):
RUNSTATS ON TABLE "STAGING_SCHEMA"."MY_TABLE"
WITH DISTRIBUTION AND DETAILED INDEXES ALL;
Review Index Information
- Purpose: The TDM tool queries
SYSCAT.INDEXESfor index metadata. Confirm that key columns have appropriate indexes to support masking join and merge operations. - Commands:
SELECT INDNAME, TABNAME, UNIQUERULE, COLNAMES
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'STAGING_SCHEMA'
ORDER BY TABNAME; - Verification: Primary key indexes (
UNIQUERULE = 'P') and unique indexes (UNIQUERULE = 'U') are critical for merge-based masking performance. Missing indexes on frequently joined columns will result in full table scans.
2.7 Network Diagnostics
Test Connectivity on Port 50000
- Purpose: Confirm the enov8 TDM application server can reach the DB2 instance.
- Commands (from the TDM application server):
# Windows
Test-NetConnection -ComputerName <db2-hostname> -Port 50000
# Linux
nc -zv <db2-hostname> 50000 - Expected Output (Linux):
Connection to <db2-hostname> 50000 port [tcp] succeeded!
Verify DB2 TCP/IP Listener
- Purpose: Confirm the DB2 instance is listening on the expected port.
- Commands (on the DB2 host):
db2 GET DBM CFG | grep -i svcename
netstat -tlnp | grep 50000 - Expected Output:
TCP Service name (SVCENAME) = 50000
tcp 0 0 0.0.0.0:50000 0.0.0.0:* LISTEN <pid>/db2sysc
2.8 Verify Security and Patch Levels
Check DB2 Version
- Purpose: Confirm the DB2 version for compatibility. The TDM tool calls
SYSPROC.ENV_GET_INST_INFO()at runtime to retrieve version information. - Commands:
SELECT * FROM TABLE(SYSPROC.ENV_GET_INST_INFO()) AS T; - Verification: Confirm the instance is running a version of DB2 currently supported by IBM.
Check Applied Fix Packs
- Purpose: Ensure the DB2 instance is on a supported fix pack level.
- Commands (on the DB2 host):
db2level - Expected Output:
DB21085I This instance or install (instance name, where applicable: "db2inst1")
uses "64" bits and DB2 code release "SQL12xxx" with level identifier "xxxxxxxx".
Informational tokens are "DB2 vXX.X.X.X", ...
2.9 Application Interaction
Confirm Driver Installation
- Purpose: The TDM tool uses the
ibm_dbPython library, which requires the IBM Data Server Driver to be installed on the TDM application server. - Verification:
- Confirm
ibm_dbis installed:pip show ibm_db - Confirm the IBM Data Server Driver package is present and the
DB2_HOMEorIBM_DB_HOMEenvironment variable points to it. - The connection string used by the TDM tool is:
DATABASE=<dbname>;HOSTNAME=<host>;PORT=50000;PROTOCOL=TCPIP;UID=<user>;PWD=<pass>;
- Confirm
Monitor TDM Session Activity
- Purpose: Verify that TDM masking sessions are active and progressing normally.
- Commands:
SELECT
APPLICATION_HANDLE,
AUTH_ID,
CONNECTION_START_TIME,
ROWS_READ,
ROWS_WRITTEN
FROM SYSIBMADM.MON_CONNECTION_SUMMARY
WHERE AUTH_ID = 'TDM_USER'; - Expected Output:
ROWS_READandROWS_WRITTENshould increase over time as masking progresses.
2.10 System Resource Monitoring
Monitor CPU and Memory Usage
- Purpose: Detect system-level bottlenecks limiting masking throughput.
- Commands (Linux):
top - Verification: Check for high CPU usage by
db2syscprocesses and for memory pressure causing page-outs, which degrade buffer pool effectiveness.
Check Swap Usage
- Purpose: Ensure the OS is not swapping DB2 memory to disk.
- Commands:
swapon -s
free -h - Verification: Minimal swap usage. DB2 performance degrades substantially when buffer pool memory is paged out to swap.
3. Database Permissions
The following table outlines the specific database permissions required for the Enov8 Test Data Management (TDM) Tool to function properly with IBM DB2. Ensure that the TDM service account has the appropriate privileges on both the staging schema and the system catalog views.
| Privilege | Object | Description | Used By |
|---|---|---|---|
| SELECT | SYSCAT.COLUMNS | Required for reading column metadata, data types, and lengths for all profiling, masking, and validation operations. | All |
| SELECT | SYSCAT.TABLES | Required for listing tables, reading row counts (CARD), and retrieving last-modified timestamps. | All |
| SELECT | SYSIBM.SYSTABLES | Required for listing tables in the target schema via the legacy catalog view. | All |
| SELECT | SYSCAT.SCHEMATA | Required for schema existence validation at connection time. | All |
| SELECT | SYSCAT.TABCONST | Required for reading constraint metadata (primary key, unique, foreign key) to identify unmaskable columns. | All, Masking |
| SELECT | SYSCAT.KEYCOLUSE | Required for reading constraint-to-column mappings for PK/FK/UK identification. | All, Masking |
| SELECT | SYSCAT.INDEXES | Required for reading index metadata used in column analysis and masking operations. | All, Masking |
| SELECT | SYSIBMADM.ADMINTABINFO | Required for checking REORG_PENDING status before column drops and updates, and for calculating schema size. | All, Masking |
| SELECT | SYSIBMADM.AUTHORIZATIONIDS | Required for counting database users during forensic scan operations. | Forensic Scan |
| EXECUTE | SYSPROC.ENV_GET_INST_INFO | Required for retrieving the DB2 version at connection time for feature compatibility checks. | All |
| EXECUTE | SYSPROC.ADMIN_SET_INTRA_PARALLEL | Required for enabling intra-partition parallelism during chunk-based merge masking operations. | Masking |
| EXECUTE | SYSPROC.ADMIN_CMD | Required for running REORG TABLE automatically before column drops and updates. | Masking |
| SELECT, INSERT, UPDATE, DELETE | Target schema tables | Required for reading source data and writing masked values back to staging tables. | Masking |
| CREATE TABLE | Target schema | Required for creating temporary masking tables (NOT LOGGED INITIALLY) and preview tables (_ENOV8_PREVIEW). | Masking |
| DROP TABLE | Target schema | Required for cleaning up temporary tables created during masking operations. | Masking |
| ALTER TABLE | Target schema | Required for adding and dropping temporary columns, and for enabling/disabling triggers (ALTER TABLE ... ENABLE/DISABLE TRIGGER). | Masking |
| TRUNCATE | Target schema tables | Required for clearing staging tables between masking runs. | Masking |