Skip to main content

Teradata Staging Environment

Introduction

Efficient performance of the Teradata Database server is crucial for the optimal functioning of the enov8 Test Data Management (TDM) Tool. This document outlines the standard configurations required for a Teradata staging server and provides a comprehensive set of troubleshooting steps for Database Administrators (DBAs) to ensure that the database aligns with these standards and operates at peak efficiency.

Disclaimer

This documentation applies to all Teradata Database versions currently supported by Teradata (including Teradata Vantage), specifically for use in staging environments. The configurations and SQL queries provided are recommendations based on typical staging setups and may not be suitable for all environments. It is essential to evaluate these recommendations in the context of your specific staging environment and requirements. Consult with your organization's database administrators and Teradata support for guidance tailored to your specific needs.


1. Standard Database Server Configuration

1.1 Hardware Requirements

Teradata uses a Massively Parallel Processing (MPP) architecture in which work is distributed across Access Module Processors (AMPs). Hardware sizing should account for both the number of AMPs and the total data volume across the system.

CPU

  • AMPs per Node: A minimum of 4 AMPs per node is recommended for staging environments.
  • Cores per Node: Start with 8 cores per node; add one node per 2 TB of additional data.
  • Example:
    • For a 2 TB database: 1 node, 8 cores.
    • For a 10 TB database: 3 nodes, 24 cores.

Memory (RAM)

  • Base Requirement per Node: Start with 32 GB per node, add 16 GB per 1 TB of data per node.
  • Example:
    • For a 2 TB database on 1 node: 32 GB of RAM.
    • For a 10 TB database across 3 nodes: 64 GB per node.

Disk

  • Type: Use SSDs for optimal performance, especially for I/O-intensive masking operations.
  • Capacity: Provision at least 2× the total data size to accommodate spool space, journals, and temporary tables.
  • IOPS: Minimum of 10,000 IOPS per AMP for SSDs; 500 IOPS per AMP for HDDs.
  • Example:
    • For a 2 TB database: 4 TB of SSD storage across the system.

Network

  • Bandwidth: At least 10 Gbps network interface for the BYNET interconnect between nodes.
  • Client Network: At least 1 Gbps NIC for application-to-Teradata connectivity.
  • Latency: Low-latency network connections to minimize delays between the enov8 TDM application server and the Teradata system.

1.2 System-Level Database Optimizations

Optimizing for Bulk Data Masking

  • Disable FALLBACK on Staging Tables:

    • Purpose: FALLBACK stores a duplicate copy of each row on a separate AMP, doubling I/O during masking. Disabling it removes this overhead in staging.
    • Implementation:
      -- Create new staging tables without FALLBACK
      CREATE MULTISET TABLE staging_db.my_table, NO FALLBACK (
      col1 INTEGER,
      col2 VARCHAR(100)
      ) PRIMARY INDEX (col1);

      -- Alter an existing table to remove FALLBACK
      ALTER TABLE staging_db.my_table, NO FALLBACK;
  • Disable Journaling on Staging Tables:

    • Purpose: Before/after journaling records row images for recovery. Disabling it significantly reduces I/O during bulk masking operations.
    • Implementation:
      -- Create table with journaling disabled
      CREATE MULTISET TABLE staging_db.my_table,
      NO BEFORE JOURNAL,
      NO AFTER JOURNAL (
      col1 INTEGER,
      col2 VARCHAR(100)
      ) PRIMARY INDEX (col1);

      -- Alter existing table to disable journaling
      ALTER TABLE staging_db.my_table,
      NO BEFORE JOURNAL,
      NO AFTER JOURNAL;
  • Increase Spool Space for the TDM User:

    • Purpose: Spool space is used for intermediate result sets and temporary tables during masking operations. Insufficient spool causes masking jobs to fail with "No more spool space" errors.
    • Implementation:
      -- Modify spool space for the TDM service account
      MODIFY USER tdm_user AS SPOOL = 100000000000; -- 100 GB in bytes
  • Allocate Sufficient PERM Space for the Staging Database:

    • Purpose: PERM (permanent) space determines the maximum storage available to a database. The staging database must have enough PERM space for original data plus temporary masking tables.
    • Implementation:
      -- Allocate PERM space to the staging database
      MODIFY DATABASE staging_db AS PERM = 500000000000; -- 500 GB in bytes

Workload Management (TASM/WLM)

  • Assign Masking Sessions to a Dedicated Workload:
    • Purpose: Isolate TDM masking sessions from production workloads to prevent resource contention and ensure predictable masking throughput.
    • Implementation: In Teradata Viewpoint or TASM, create a workload classification rule that routes sessions from the TDM service account to a dedicated masking workload with an allocated resource partition.
    • Example classification filter:
      -- Verify active workload assignment for a session
      SELECT SessionNo, Username, WDName
      FROM DBC.SessionInfoV
      WHERE Username = 'TDM_USER';

1.3 Teradata Internal Configuration

Session Character Set

  • Parameter: SESSION CHARACTER SET UNICODE PASS THROUGH ON
  • Purpose: The enov8 TDM tool sets this at connection time to ensure consistent UTF-8 character handling across all masking and profiling operations, particularly for columns using non-Latin character sets (UNICODE, KANJISJIS, GRAPHIC).
  • Verification:
    -- Confirm the session character set in use
    SELECT CharacterSet FROM DBC.SessionInfoV WHERE SessionNo = (SELECT SESSION);
  • Note: The TDM tool applies SET SESSION CHARACTER SET UNICODE PASS THROUGH ON automatically on every connection. Ensure the Teradata system supports Unicode character sets.

Session Mode

  • Recommended: TD2 (Teradata session mode)
  • Purpose: The TDM tool connects using logmech=TD2 by default (TD2 authentication). ANSI mode changes certain SQL behaviours (e.g., transaction semantics, case sensitivity) that can affect masking operations.
  • Connection string reference: teradata://user:pass@host:1025/dbname?driver=Teradata

Statistics Collection (DBA Prerequisite)

  • Purpose: Teradata relies heavily on optimizer statistics to generate efficient query plans. The enov8 TDM tool reads row counts directly from DBC.StatsV rather than running COLLECT STATISTICS itself. It is the DBA's responsibility to ensure statistics are current before running masking jobs; outdated or missing statistics cause full-AMP scans and slow masking performance.
  • Implementation (run manually by the DBA):
    -- Collect statistics on primary index and key columns before masking
    COLLECT STATISTICS ON staging_db.my_table INDEX (col1);
    COLLECT STATISTICS ON staging_db.my_table COLUMN (col2);
  • Cadence: Collect statistics after any large data load or refresh, before running a full masking job.

AMP Count and Row Distribution

  • Purpose: Uneven row distribution across AMPs (AMP skew) causes some AMPs to be overloaded while others are idle, reducing parallelism during masking.
  • Check skew:
    SELECT
    Vproc AS AMP,
    COUNT(*) AS RowCount
    FROM staging_db.my_table
    GROUP BY Vproc
    ORDER BY RowCount DESC;
  • Recommendation: Choose a Primary Index with high cardinality and even distribution to minimise skew.

1.4 Network Configuration

  • Default Port: Teradata listens on port 1025 (COP/JDBC) and port 8002 (REST). Ensure the enov8 TDM application server can reach the Teradata system on port 1025.

  • COP (Connection to a Processor) Entries:

    • COP entries in DNS resolve Teradata hostnames to individual node IP addresses for load-balanced connections.
    • Verify COP entries are correctly configured so the TDM tool connects to the appropriate nodes:
      nslookup cop1-<teradata-hostname>
      nslookup cop2-<teradata-hostname>
  • TDGSS (Teradata Generic Security Services):

    • Ensure TDGSS is configured to allow TD2 authentication from the enov8 TDM application server IP range.
    • Relevant configuration file: $TDDIR/tdgss/site/TdgssUserConfigFile.xml
  • Connection Protocol:

    • Use the teradatasql or teradatasqla driver (SQLAlchemy dialect) for all TDM connections.
    • On Linux, the TDM tool uses the teradatasql driver directly; on Windows it uses the ODBC-based Teradata driver via SQLAlchemy.

1.5 Security and Patch Management

  • Patching:

    • Apply Teradata patches via the Teradata Update Manager (TUMP).
    • Maintain Teradata Database at the latest maintenance release for your major version to receive security and performance fixes.
  • Authentication:

    • logmech=TD2: Standard Teradata username/password authentication. Used by the TDM tool by default.
    • For environments requiring stronger authentication, LDAP (logmech=LDAP) can be configured without impacting TDM operations.
  • Security Parameters:

    • Ensure the TDM service account has a non-expiring password or a managed rotation process so masking jobs are not interrupted.
    • Restrict the service account to only the staging database — do not grant access to production databases.

2. Troubleshooting Steps for DBAs

When experiencing performance issues with the TDM masking tool on Teradata, the following steps can help identify and resolve common problems.

2.1 Verify Hardware and OS Configuration

Check CPU and Memory

  • Purpose: Ensure each node meets the hardware specifications.
  • Commands:
    • Linux (per node):
      lscpu
      free -h
  • Verification: Confirm that CPU core counts and RAM per node align with the requirements in section 1.1.

Disk I/O Performance

  • Purpose: Ensure disk I/O across AMPs supports masking operations.
  • Commands:
    iostat -x 1 5
  • Verification: The %util column should remain below 80%. Sustained high utilization indicates a disk I/O bottleneck affecting AMP throughput.

Node Health

  • Purpose: Verify all nodes in the Teradata system are online and healthy.
  • Commands (run as DBC in BTEQ or SQL Assistant):
    SELECT NodeID, NodeType, NodeStatus
    FROM DBC.ResUsageSAWT
    ORDER BY NodeID;
  • Verification: All nodes should report an active status. Degraded nodes reduce available parallelism for masking jobs.

2.2 Validate Teradata Parameters

Spool Space

  • Purpose: Confirm the TDM service account has sufficient spool space allocated.
  • Commands:
    SELECT Username, SpoolSpace
    FROM DBC.UsersV
    WHERE Username = 'TDM_USER';
  • Expected Output:
    Username    SpoolSpace
    ---------- ----------
    TDM_USER 100000000000
  • Verification: SpoolSpace should be at least 100 GB (100,000,000,000 bytes) for databases over 1 TB.

PERM Space

  • Purpose: Confirm the staging database has sufficient permanent space.
  • Commands:
    SELECT DatabaseName, PermSpace, SpoolSpace
    FROM DBC.DatabasesV
    WHERE DatabaseName = 'STAGING_DB';
  • Expected Output:
    DatabaseName   PermSpace       SpoolSpace
    ------------- -------------- ----------
    STAGING_DB 500000000000 0
  • Verification: PermSpace should accommodate the full dataset plus temporary masking tables (typically 2× the raw data size).

Session Character Set

  • Purpose: Confirm sessions are connecting with the correct character set.
  • Commands:
    SELECT SessionNo, Username, CharacterSet
    FROM DBC.SessionInfoV
    WHERE Username = 'TDM_USER';
  • Expected Output:
    SessionNo   Username    CharacterSet
    --------- ---------- ------------
    12345 TDM_USER UNICODE
  • Verification: CharacterSet should be UNICODE to match the TDM tool's SET SESSION CHARACTER SET UNICODE PASS THROUGH ON setting.

Session Mode

  • Purpose: Confirm sessions are running in TD2 (Teradata) mode, not ANSI mode.
  • Commands:
    SELECT SessionNo, Username, LogonSource
    FROM DBC.SessionInfoV
    WHERE Username = 'TDM_USER';
  • Verification: Review LogonSource to confirm the connection string includes logmech=TD2 and the driver is operating correctly.

2.3 Assess Database Performance

Enable and Query DBQL (Database Query Logging)

  • Purpose: Capture query-level performance data for masking operations.
  • Enable DBQL for the TDM user:
    BEGIN QUERY LOGGING WITH SQL ON ALL FOR USER TDM_USER;
  • Query recent executions:
    SELECT
    QueryId,
    StartTime,
    AMCTime,
    TotalIOCount,
    CPUTime,
    SpoolUsage,
    SUBSTR(SqlTextInfo, 1, 200) AS SqlSample
    FROM DBC.QryLog
    WHERE UserName = 'TDM_USER'
    ORDER BY StartTime DESC;
  • Verification: Review SpoolUsage for queries hitting spool limits and CPUTime/TotalIOCount for queries with unexpectedly high resource consumption.

Check Current Disk Space Usage

  • Purpose: Verify the staging database is not running out of PERM space, which would cause masking failures.
  • Commands:
    SELECT
    DatabaseName,
    SUM(CurrentPerm) / 1024 / 1024 / 1024 AS UsedGB,
    SUM(MaxPerm) / 1024 / 1024 / 1024 AS AllocatedGB
    FROM DBC.DiskSpace
    WHERE DatabaseName = 'STAGING_DB'
    GROUP BY DatabaseName;
  • Expected Output:
    DatabaseName   UsedGB   AllocatedGB
    ------------- ------- -----------
    STAGING_DB 210.50 500.00
  • Verification: UsedGB should not exceed 80% of AllocatedGB. If it does, either increase PermSpace or archive/truncate data before masking.

2.4 Check Session and Query Performance

Identify Active Sessions

  • Purpose: Find sessions currently running to determine if TDM masking jobs are active and not blocked.
  • Commands:
    SELECT
    SessionNo,
    Username,
    State,
    CurrentQuery
    FROM DBC.SessionInfoV
    WHERE Username = 'TDM_USER'
    ORDER BY SessionNo;
  • Expected Output:
    SessionNo   Username    State   CurrentQuery
    --------- ---------- ------ ------------
    12345 TDM_USER Active UPDATE "staging_db"."my_table" ...
  • Verification: Confirm the expected number of TDM sessions are active and not in a blocked or waiting state.

Monitor Long-Running Queries

  • Purpose: Identify masking queries that are running longer than expected.
  • Commands:
    SELECT
    QueryId,
    Username,
    StartTime,
    (CAST(CURRENT_TIMESTAMP AS FLOAT) - CAST(StartTime AS FLOAT)) * 86400 AS ElapsedSec,
    SUBSTR(SqlTextInfo, 1, 200) AS SqlSample
    FROM DBC.QryLog
    WHERE UserName = 'TDM_USER'
    AND ElapsedSec > 300
    ORDER BY ElapsedSec DESC;
  • Verification: Investigate queries with elapsed times significantly higher than expected. Review their execution plans using EXPLAIN to identify full-AMP scans or missing statistics.

Review Execution Plans

  • Purpose: Determine if poor execution plans are causing slow masking performance.
  • Commands:
    EXPLAIN
    UPDATE "staging_db"."my_table"
    SET "col2" = 'masked_value'
    WHERE "col1" = 1;
  • Verification: Look for All-AMPs scans on large tables without a PI filter, which indicate missing statistics or a poor Primary Index choice.

2.5 Evaluate Storage and Space Usage

Check Per-Table Row Counts

  • Purpose: Verify that statistics-based row counts match actual data for profiling and masking operations.
  • Commands:
    SELECT
    TableName,
    CAST(RowCount AS INTEGER) AS RowCount
    FROM DBC.StatsV
    WHERE DatabaseName = 'STAGING_DB'
    ORDER BY TableName;
  • Verification: Row counts should reflect the current data volume. If they are significantly out of date, run COLLECT STATISTICS on affected tables before the next masking job.

Check AMP Skew

  • Purpose: Identify tables with uneven row distribution that will slow down parallelism during masking.
  • Commands:
    SELECT
    TableName,
    MAX(RowCount) AS MaxAmpRows,
    MIN(RowCount) AS MinAmpRows,
    AVG(RowCount) AS AvgAmpRows,
    (MAX(RowCount) - AVG(RowCount)) / AVG(RowCount) * 100 AS SkewPct
    FROM DBC.TableSizeV
    WHERE DatabaseName = 'STAGING_DB'
    GROUP BY TableName
    ORDER BY SkewPct DESC;
  • Verification: Tables with SkewPct above 10% should have their Primary Index reviewed.

2.6 Review Statistics and Indexes

Identify Tables with Missing or Stale Statistics

  • Purpose: The TDM tool reads row counts from DBC.StatsV, so statistics must be current for accurate profiling results. Missing statistics also cause the Teradata optimizer to generate poor execution plans for masking queries, resulting in unnecessary full-AMP scans.
  • Commands — identify tables with no statistics collected:
    SELECT a.TableName
    FROM DBC.TablesV a
    WHERE a.DatabaseName = 'STAGING_DB'
    AND a.TableKind = 'T'
    AND a.TableName NOT IN (
    SELECT TableName
    FROM DBC.TableStatsV
    WHERE DatabaseName = 'STAGING_DB'
    )
    ORDER BY a.TableName;
  • Remediation (DBA action — collect statistics manually before masking):
    COLLECT STATISTICS ON staging_db.my_table INDEX (col1);
    COLLECT STATISTICS ON staging_db.my_table COLUMN (col2, col3);
  • Verification: All tables involved in masking operations should have current statistics. Run statistics collection after any large data refresh and before the next masking job.

Review Index Information

  • Purpose: Confirm that Primary and Secondary Indexes are appropriate for the access patterns used by TDM masking queries.
  • Commands:
    SELECT
    b.TableName,
    c.ColumnName,
    c.IndexType,
    c.UniqueFlag
    FROM DBC.TablesV a
    INNER JOIN DBC.ColumnsV b
    ON a.TableName = b.TableName
    AND a.DatabaseName = b.DatabaseName
    LEFT OUTER JOIN DBC.IndicesV c
    ON b.ColumnName = c.ColumnName
    AND b.DatabaseName = c.DatabaseName
    AND b.TableName = c.TableName
    WHERE a.DatabaseName = 'STAGING_DB'
    ORDER BY b.TableName, c.IndexType;
  • Verification: Ensure the Primary Index (IndexType = P) has high-cardinality columns to minimise AMP skew. Secondary Indexes (USI/NUSI) on frequently filtered columns can improve masking query performance.

2.7 Network Diagnostics

Test Connectivity on Port 1025

  • Purpose: Confirm the enov8 TDM application server can reach the Teradata system.
  • Commands (from the TDM application server):
    # Test TCP connectivity to the Teradata gateway port
    Test-NetConnection -ComputerName <teradata-hostname> -Port 1025 # Windows
    nc -zv <teradata-hostname> 1025 # Linux
  • Expected Output (Linux):
    Connection to <teradata-hostname> 1025 port [tcp] succeeded!

Verify COP Entries

  • Purpose: Confirm DNS COP entries resolve correctly for load-balanced multi-node access.
  • Commands:
    nslookup cop1-<teradata-hostname>
    nslookup cop2-<teradata-hostname>
  • Verification: Each COP entry should resolve to a different node IP address. Missing COP entries may cause the TDM tool to connect to only a single node, reducing parallelism.

Check Teradata Gateway Status

  • Purpose: Confirm the Teradata gateway process is running and accepting connections.
  • Commands (on the Teradata system as root or tdatuser):
    tpareset -n "check gateway"
    pdestate -a
  • Expected Output:
    PDE state is RUN/STARTED.
    DBS state is 5: Logons are enabled - Users are logged on
  • Verification: DBS state 5 (or 4) indicates the system is operational and accepting logins.

2.8 Verify Security and Patch Levels

Check Teradata Database Version

  • Purpose: Confirm the Teradata version to ensure compatibility with the TDM tool.
  • Commands:
    SELECT InfoData AS TeradataVersion
    FROM DBC.DBCInfoV
    WHERE InfoKey = 'VERSION';
  • Expected Output:
    TeradataVersion
    ---------------
    17.20.xx.xx
  • Verification: Ensure the instance is running a version currently supported by Teradata. The TDM tool queries DBC.DBCInfoV at connection time to determine version-specific feature support.

Check Applied Patches

  • Purpose: Confirm the Teradata system is up to date with maintenance releases.
  • Commands (on the Teradata system):
    tpareset -s
  • Verification: Cross-reference the build number with the latest Teradata maintenance release for your major version available on the Teradata support portal.

2.9 Application Interaction

Confirm Connection Management

  • Purpose: Ensure the TDM tool is connecting with the correct driver and authentication mechanism.
  • Verification:
    • On Linux: The TDM tool uses the teradatasql Python driver (teradatasql://user:pass@host:port/dbname).
    • On Windows: The TDM tool uses the SQLAlchemy Teradata dialect with the ODBC Teradata driver (teradata+pyodbc://user:pass@host:port/dbname?driver=Teradata).
    • Confirm the Teradata ODBC driver (Windows) or teradatasql Python package (Linux) is installed and the correct version.

Monitor TDM Session Activity

  • Purpose: Verify the expected number of masking sessions are active and consuming appropriate resources.
  • Commands:
    SELECT
    SessionNo,
    Username,
    State,
    ReqPhysIO,
    ReqCPU
    FROM DBC.SessionInfoV
    WHERE Username = 'TDM_USER';
  • Expected Output: A session in Active state with progressively increasing ReqPhysIO and ReqCPU values indicates normal masking progress.

2.10 System Resource Monitoring

Monitor CPU and Memory Usage per AMP

  • Purpose: Detect AMP-level resource bottlenecks that are constraining masking throughput.
  • Commands:
    SELECT
    NodeID,
    CPUTime,
    IOWaitTime,
    MemoryUsage
    FROM DBC.ResUsageSpma
    ORDER BY CPUTime DESC;
  • Verification: High IOWaitTime on specific nodes indicates disk I/O bottlenecks. Evenly distributed CPUTime indicates good AMP parallelism.

Check Swap Usage (Linux)

  • Purpose: Ensure minimal swap usage across Teradata nodes.
  • Commands:
    swapon -s
    free -h
  • Verification: Confirm minimal swap usage. Significant swap activity on Teradata nodes degrades AMP performance substantially.

3. Database Permissions

The following table outlines the specific database permissions required for the Enov8 Test Data Management (TDM) Tool to function properly with Teradata Database. Ensure that the TDM service account has the appropriate grants on both the staging database and the DBC system views.

PrivilegeObjectDescriptionUsed By
SELECTDBC.TablesVRequired for checking database and table existence, listing tables, and retrieving table metadata.All
SELECTDBC.ColumnsVRequired for reading column metadata, data types, character sets, and lengths for all profiling, masking, and validation operations.All
SELECTDBC.TableStatsVRequired for identifying tables with missing statistics and driving statistics collection.Profile
SELECTDBC.StatsVRequired for retrieving statistics-based row counts per table for profiling and forensic scan operations.Profile, Forensic Scan
SELECTDBC.IndicesVRequired for reading index metadata (Primary Index, Secondary Indexes) used in column and table analysis.All, Masking
SELECTDBC.All_RI_ChildrenVRequired for reading referential integrity (foreign key) relationships between tables.All, Forensic Scan
SELECTDBC.DiskSpaceRequired for calculating database size and current permanent space usage.Forensic Scan
SELECTDBC.UsersVRequired for reading user spool/perm space allocations during forensic scan.Forensic Scan
SELECTDBC.RoleMembersVRequired for resolving role-based access during forensic scan user enumeration.Forensic Scan
SELECTDBC.TABLESRequired for legacy table listing queries used in certain metadata operations.All
SELECTDBC.DBCInfoVRequired for retrieving the Teradata version at connection time for feature compatibility checks.All
SELECTDBC.TriggersVRequired for discovering and managing triggers on staging tables during masking operations.Masking
SELECTDBC.SessionInfoVRequired for verifying session character set and workload assignment.All
SELECT, INSERT, UPDATE, DELETETarget staging database tablesRequired for reading source data and writing masked values back to staging tables.Masking
CREATE TABLETarget staging databaseRequired for creating temporary masking tables (ENOV8_PREVIEW, row-number staging tables).Masking
DROP TABLETarget staging databaseRequired for cleaning up temporary tables created during masking operations.Masking
ALTER TABLETarget staging databaseRequired for adding and dropping temporary columns used during in-place masking.Masking
ALTER TRIGGERTarget staging databaseRequired for enabling and disabling existing triggers on staging tables during masking (ALTER TRIGGER ... ENABLED/DISABLED).Masking