Teradata Staging Environment
Introduction
Efficient performance of the Teradata Database server is crucial for the optimal functioning of the enov8 Test Data Management (TDM) Tool. This document outlines the standard configurations required for a Teradata staging server and provides a comprehensive set of troubleshooting steps for Database Administrators (DBAs) to ensure that the database aligns with these standards and operates at peak efficiency.
This documentation applies to all Teradata Database versions currently supported by Teradata (including Teradata Vantage), specifically for use in staging environments. The configurations and SQL queries provided are recommendations based on typical staging setups and may not be suitable for all environments. It is essential to evaluate these recommendations in the context of your specific staging environment and requirements. Consult with your organization's database administrators and Teradata support for guidance tailored to your specific needs.
1. Standard Database Server Configuration
1.1 Hardware Requirements
Teradata uses a Massively Parallel Processing (MPP) architecture in which work is distributed across Access Module Processors (AMPs). Hardware sizing should account for both the number of AMPs and the total data volume across the system.
CPU
- AMPs per Node: A minimum of 4 AMPs per node is recommended for staging environments.
- Cores per Node: Start with 8 cores per node; add one node per 2 TB of additional data.
- Example:
- For a 2 TB database: 1 node, 8 cores.
- For a 10 TB database: 3 nodes, 24 cores.
Memory (RAM)
- Base Requirement per Node: Start with 32 GB per node, add 16 GB per 1 TB of data per node.
- Example:
- For a 2 TB database on 1 node: 32 GB of RAM.
- For a 10 TB database across 3 nodes: 64 GB per node.
Disk
- Type: Use SSDs for optimal performance, especially for I/O-intensive masking operations.
- Capacity: Provision at least 2× the total data size to accommodate spool space, journals, and temporary tables.
- IOPS: Minimum of 10,000 IOPS per AMP for SSDs; 500 IOPS per AMP for HDDs.
- Example:
- For a 2 TB database: 4 TB of SSD storage across the system.
Network
- Bandwidth: At least 10 Gbps network interface for the BYNET interconnect between nodes.
- Client Network: At least 1 Gbps NIC for application-to-Teradata connectivity.
- Latency: Low-latency network connections to minimize delays between the enov8 TDM application server and the Teradata system.
1.2 System-Level Database Optimizations
Optimizing for Bulk Data Masking
-
Disable FALLBACK on Staging Tables:
- Purpose: FALLBACK stores a duplicate copy of each row on a separate AMP, doubling I/O during masking. Disabling it removes this overhead in staging.
- Implementation:
-- Create new staging tables without FALLBACK
CREATE MULTISET TABLE staging_db.my_table, NO FALLBACK (
col1 INTEGER,
col2 VARCHAR(100)
) PRIMARY INDEX (col1);
-- Alter an existing table to remove FALLBACK
ALTER TABLE staging_db.my_table, NO FALLBACK;
-
Disable Journaling on Staging Tables:
- Purpose: Before/after journaling records row images for recovery. Disabling it significantly reduces I/O during bulk masking operations.
- Implementation:
-- Create table with journaling disabled
CREATE MULTISET TABLE staging_db.my_table,
NO BEFORE JOURNAL,
NO AFTER JOURNAL (
col1 INTEGER,
col2 VARCHAR(100)
) PRIMARY INDEX (col1);
-- Alter existing table to disable journaling
ALTER TABLE staging_db.my_table,
NO BEFORE JOURNAL,
NO AFTER JOURNAL;
-
Increase Spool Space for the TDM User:
- Purpose: Spool space is used for intermediate result sets and temporary tables during masking operations. Insufficient spool causes masking jobs to fail with "No more spool space" errors.
- Implementation:
-- Modify spool space for the TDM service account
MODIFY USER tdm_user AS SPOOL = 100000000000; -- 100 GB in bytes
-
Allocate Sufficient PERM Space for the Staging Database:
- Purpose: PERM (permanent) space determines the maximum storage available to a database. The staging database must have enough PERM space for original data plus temporary masking tables.
- Implementation:
-- Allocate PERM space to the staging database
MODIFY DATABASE staging_db AS PERM = 500000000000; -- 500 GB in bytes
Workload Management (TASM/WLM)
- Assign Masking Sessions to a Dedicated Workload:
- Purpose: Isolate TDM masking sessions from production workloads to prevent resource contention and ensure predictable masking throughput.
- Implementation: In Teradata Viewpoint or TASM, create a workload classification rule that routes sessions from the TDM service account to a dedicated masking workload with an allocated resource partition.
- Example classification filter:
-- Verify active workload assignment for a session
SELECT SessionNo, Username, WDName
FROM DBC.SessionInfoV
WHERE Username = 'TDM_USER';
1.3 Teradata Internal Configuration
Session Character Set
- Parameter:
SESSION CHARACTER SET UNICODE PASS THROUGH ON - Purpose: The enov8 TDM tool sets this at connection time to ensure consistent UTF-8 character handling across all masking and profiling operations, particularly for columns using non-Latin character sets (UNICODE, KANJISJIS, GRAPHIC).
- Verification:
-- Confirm the session character set in use
SELECT CharacterSet FROM DBC.SessionInfoV WHERE SessionNo = (SELECT SESSION); - Note: The TDM tool applies
SET SESSION CHARACTER SET UNICODE PASS THROUGH ONautomatically on every connection. Ensure the Teradata system supports Unicode character sets.
Session Mode
- Recommended:
TD2(Teradata session mode) - Purpose: The TDM tool connects using
logmech=TD2by default (TD2 authentication). ANSI mode changes certain SQL behaviours (e.g., transaction semantics, case sensitivity) that can affect masking operations. - Connection string reference:
teradata://user:pass@host:1025/dbname?driver=Teradata
Statistics Collection (DBA Prerequisite)
- Purpose: Teradata relies heavily on optimizer statistics to generate efficient query plans. The enov8 TDM tool reads row counts directly from
DBC.StatsVrather than runningCOLLECT STATISTICSitself. It is the DBA's responsibility to ensure statistics are current before running masking jobs; outdated or missing statistics cause full-AMP scans and slow masking performance. - Implementation (run manually by the DBA):
-- Collect statistics on primary index and key columns before masking
COLLECT STATISTICS ON staging_db.my_table INDEX (col1);
COLLECT STATISTICS ON staging_db.my_table COLUMN (col2); - Cadence: Collect statistics after any large data load or refresh, before running a full masking job.
AMP Count and Row Distribution
- Purpose: Uneven row distribution across AMPs (AMP skew) causes some AMPs to be overloaded while others are idle, reducing parallelism during masking.
- Check skew:
SELECT
Vproc AS AMP,
COUNT(*) AS RowCount
FROM staging_db.my_table
GROUP BY Vproc
ORDER BY RowCount DESC; - Recommendation: Choose a Primary Index with high cardinality and even distribution to minimise skew.
1.4 Network Configuration
-
Default Port: Teradata listens on port 1025 (COP/JDBC) and port 8002 (REST). Ensure the enov8 TDM application server can reach the Teradata system on port 1025.
-
COP (Connection to a Processor) Entries:
- COP entries in DNS resolve Teradata hostnames to individual node IP addresses for load-balanced connections.
- Verify COP entries are correctly configured so the TDM tool connects to the appropriate nodes:
nslookup cop1-<teradata-hostname>
nslookup cop2-<teradata-hostname>
-
TDGSS (Teradata Generic Security Services):
- Ensure TDGSS is configured to allow TD2 authentication from the enov8 TDM application server IP range.
- Relevant configuration file:
$TDDIR/tdgss/site/TdgssUserConfigFile.xml
-
Connection Protocol:
- Use the
teradatasqlorteradatasqladriver (SQLAlchemy dialect) for all TDM connections. - On Linux, the TDM tool uses the
teradatasqldriver directly; on Windows it uses the ODBC-based Teradata driver via SQLAlchemy.
- Use the
1.5 Security and Patch Management
-
Patching:
- Apply Teradata patches via the Teradata Update Manager (TUMP).
- Maintain Teradata Database at the latest maintenance release for your major version to receive security and performance fixes.
-
Authentication:
logmech=TD2: Standard Teradata username/password authentication. Used by the TDM tool by default.- For environments requiring stronger authentication, LDAP (
logmech=LDAP) can be configured without impacting TDM operations.
-
Security Parameters:
- Ensure the TDM service account has a non-expiring password or a managed rotation process so masking jobs are not interrupted.
- Restrict the service account to only the staging database — do not grant access to production databases.
2. Troubleshooting Steps for DBAs
When experiencing performance issues with the TDM masking tool on Teradata, the following steps can help identify and resolve common problems.
2.1 Verify Hardware and OS Configuration
Check CPU and Memory
- Purpose: Ensure each node meets the hardware specifications.
- Commands:
- Linux (per node):
lscpu
free -h
- Linux (per node):
- Verification: Confirm that CPU core counts and RAM per node align with the requirements in section 1.1.
Disk I/O Performance
- Purpose: Ensure disk I/O across AMPs supports masking operations.
- Commands:
iostat -x 1 5 - Verification: The
%utilcolumn should remain below 80%. Sustained high utilization indicates a disk I/O bottleneck affecting AMP throughput.
Node Health
- Purpose: Verify all nodes in the Teradata system are online and healthy.
- Commands (run as DBC in BTEQ or SQL Assistant):
SELECT NodeID, NodeType, NodeStatus
FROM DBC.ResUsageSAWT
ORDER BY NodeID; - Verification: All nodes should report an active status. Degraded nodes reduce available parallelism for masking jobs.
2.2 Validate Teradata Parameters
Spool Space
- Purpose: Confirm the TDM service account has sufficient spool space allocated.
- Commands:
SELECT Username, SpoolSpace
FROM DBC.UsersV
WHERE Username = 'TDM_USER'; - Expected Output:
Username SpoolSpace
---------- ----------
TDM_USER 100000000000 - Verification:
SpoolSpaceshould be at least 100 GB (100,000,000,000 bytes) for databases over 1 TB.
PERM Space
- Purpose: Confirm the staging database has sufficient permanent space.
- Commands:
SELECT DatabaseName, PermSpace, SpoolSpace
FROM DBC.DatabasesV
WHERE DatabaseName = 'STAGING_DB'; - Expected Output:
DatabaseName PermSpace SpoolSpace
------------- -------------- ----------
STAGING_DB 500000000000 0 - Verification:
PermSpaceshould accommodate the full dataset plus temporary masking tables (typically 2× the raw data size).
Session Character Set
- Purpose: Confirm sessions are connecting with the correct character set.
- Commands:
SELECT SessionNo, Username, CharacterSet
FROM DBC.SessionInfoV
WHERE Username = 'TDM_USER'; - Expected Output:
SessionNo Username CharacterSet
--------- ---------- ------------
12345 TDM_USER UNICODE - Verification:
CharacterSetshould beUNICODEto match the TDM tool'sSET SESSION CHARACTER SET UNICODE PASS THROUGH ONsetting.
Session Mode
- Purpose: Confirm sessions are running in TD2 (Teradata) mode, not ANSI mode.
- Commands:
SELECT SessionNo, Username, LogonSource
FROM DBC.SessionInfoV
WHERE Username = 'TDM_USER'; - Verification: Review
LogonSourceto confirm the connection string includeslogmech=TD2and the driver is operating correctly.
2.3 Assess Database Performance
Enable and Query DBQL (Database Query Logging)
- Purpose: Capture query-level performance data for masking operations.
- Enable DBQL for the TDM user:
BEGIN QUERY LOGGING WITH SQL ON ALL FOR USER TDM_USER; - Query recent executions:
SELECT
QueryId,
StartTime,
AMCTime,
TotalIOCount,
CPUTime,
SpoolUsage,
SUBSTR(SqlTextInfo, 1, 200) AS SqlSample
FROM DBC.QryLog
WHERE UserName = 'TDM_USER'
ORDER BY StartTime DESC; - Verification: Review
SpoolUsagefor queries hitting spool limits andCPUTime/TotalIOCountfor queries with unexpectedly high resource consumption.
Check Current Disk Space Usage
- Purpose: Verify the staging database is not running out of PERM space, which would cause masking failures.
- Commands:
SELECT
DatabaseName,
SUM(CurrentPerm) / 1024 / 1024 / 1024 AS UsedGB,
SUM(MaxPerm) / 1024 / 1024 / 1024 AS AllocatedGB
FROM DBC.DiskSpace
WHERE DatabaseName = 'STAGING_DB'
GROUP BY DatabaseName; - Expected Output:
DatabaseName UsedGB AllocatedGB
------------- ------- -----------
STAGING_DB 210.50 500.00 - Verification:
UsedGBshould not exceed 80% ofAllocatedGB. If it does, either increasePermSpaceor archive/truncate data before masking.
2.4 Check Session and Query Performance
Identify Active Sessions
- Purpose: Find sessions currently running to determine if TDM masking jobs are active and not blocked.
- Commands:
SELECT
SessionNo,
Username,
State,
CurrentQuery
FROM DBC.SessionInfoV
WHERE Username = 'TDM_USER'
ORDER BY SessionNo; - Expected Output:
SessionNo Username State CurrentQuery
--------- ---------- ------ ------------
12345 TDM_USER Active UPDATE "staging_db"."my_table" ... - Verification: Confirm the expected number of TDM sessions are active and not in a blocked or waiting state.
Monitor Long-Running Queries
- Purpose: Identify masking queries that are running longer than expected.
- Commands:
SELECT
QueryId,
Username,
StartTime,
(CAST(CURRENT_TIMESTAMP AS FLOAT) - CAST(StartTime AS FLOAT)) * 86400 AS ElapsedSec,
SUBSTR(SqlTextInfo, 1, 200) AS SqlSample
FROM DBC.QryLog
WHERE UserName = 'TDM_USER'
AND ElapsedSec > 300
ORDER BY ElapsedSec DESC; - Verification: Investigate queries with elapsed times significantly higher than expected. Review their execution plans using
EXPLAINto identify full-AMP scans or missing statistics.
Review Execution Plans
- Purpose: Determine if poor execution plans are causing slow masking performance.
- Commands:
EXPLAIN
UPDATE "staging_db"."my_table"
SET "col2" = 'masked_value'
WHERE "col1" = 1; - Verification: Look for
All-AMPsscans on large tables without a PI filter, which indicate missing statistics or a poor Primary Index choice.
2.5 Evaluate Storage and Space Usage
Check Per-Table Row Counts
- Purpose: Verify that statistics-based row counts match actual data for profiling and masking operations.
- Commands:
SELECT
TableName,
CAST(RowCount AS INTEGER) AS RowCount
FROM DBC.StatsV
WHERE DatabaseName = 'STAGING_DB'
ORDER BY TableName; - Verification: Row counts should reflect the current data volume. If they are significantly out of date, run
COLLECT STATISTICSon affected tables before the next masking job.
Check AMP Skew
- Purpose: Identify tables with uneven row distribution that will slow down parallelism during masking.
- Commands:
SELECT
TableName,
MAX(RowCount) AS MaxAmpRows,
MIN(RowCount) AS MinAmpRows,
AVG(RowCount) AS AvgAmpRows,
(MAX(RowCount) - AVG(RowCount)) / AVG(RowCount) * 100 AS SkewPct
FROM DBC.TableSizeV
WHERE DatabaseName = 'STAGING_DB'
GROUP BY TableName
ORDER BY SkewPct DESC; - Verification: Tables with
SkewPctabove 10% should have their Primary Index reviewed.
2.6 Review Statistics and Indexes
Identify Tables with Missing or Stale Statistics
- Purpose: The TDM tool reads row counts from
DBC.StatsV, so statistics must be current for accurate profiling results. Missing statistics also cause the Teradata optimizer to generate poor execution plans for masking queries, resulting in unnecessary full-AMP scans. - Commands — identify tables with no statistics collected:
SELECT a.TableName
FROM DBC.TablesV a
WHERE a.DatabaseName = 'STAGING_DB'
AND a.TableKind = 'T'
AND a.TableName NOT IN (
SELECT TableName
FROM DBC.TableStatsV
WHERE DatabaseName = 'STAGING_DB'
)
ORDER BY a.TableName; - Remediation (DBA action — collect statistics manually before masking):
COLLECT STATISTICS ON staging_db.my_table INDEX (col1);
COLLECT STATISTICS ON staging_db.my_table COLUMN (col2, col3); - Verification: All tables involved in masking operations should have current statistics. Run statistics collection after any large data refresh and before the next masking job.
Review Index Information
- Purpose: Confirm that Primary and Secondary Indexes are appropriate for the access patterns used by TDM masking queries.
- Commands:
SELECT
b.TableName,
c.ColumnName,
c.IndexType,
c.UniqueFlag
FROM DBC.TablesV a
INNER JOIN DBC.ColumnsV b
ON a.TableName = b.TableName
AND a.DatabaseName = b.DatabaseName
LEFT OUTER JOIN DBC.IndicesV c
ON b.ColumnName = c.ColumnName
AND b.DatabaseName = c.DatabaseName
AND b.TableName = c.TableName
WHERE a.DatabaseName = 'STAGING_DB'
ORDER BY b.TableName, c.IndexType; - Verification: Ensure the Primary Index (IndexType =
P) has high-cardinality columns to minimise AMP skew. Secondary Indexes (USI/NUSI) on frequently filtered columns can improve masking query performance.
2.7 Network Diagnostics
Test Connectivity on Port 1025
- Purpose: Confirm the enov8 TDM application server can reach the Teradata system.
- Commands (from the TDM application server):
# Test TCP connectivity to the Teradata gateway port
Test-NetConnection -ComputerName <teradata-hostname> -Port 1025 # Windows
nc -zv <teradata-hostname> 1025 # Linux - Expected Output (Linux):
Connection to <teradata-hostname> 1025 port [tcp] succeeded!
Verify COP Entries
- Purpose: Confirm DNS COP entries resolve correctly for load-balanced multi-node access.
- Commands:
nslookup cop1-<teradata-hostname>
nslookup cop2-<teradata-hostname> - Verification: Each COP entry should resolve to a different node IP address. Missing COP entries may cause the TDM tool to connect to only a single node, reducing parallelism.
Check Teradata Gateway Status
- Purpose: Confirm the Teradata gateway process is running and accepting connections.
- Commands (on the Teradata system as root or tdatuser):
tpareset -n "check gateway"
pdestate -a - Expected Output:
PDE state is RUN/STARTED.
DBS state is 5: Logons are enabled - Users are logged on - Verification: DBS state 5 (or 4) indicates the system is operational and accepting logins.
2.8 Verify Security and Patch Levels
Check Teradata Database Version
- Purpose: Confirm the Teradata version to ensure compatibility with the TDM tool.
- Commands:
SELECT InfoData AS TeradataVersion
FROM DBC.DBCInfoV
WHERE InfoKey = 'VERSION'; - Expected Output:
TeradataVersion
---------------
17.20.xx.xx - Verification: Ensure the instance is running a version currently supported by Teradata. The TDM tool queries
DBC.DBCInfoVat connection time to determine version-specific feature support.
Check Applied Patches
- Purpose: Confirm the Teradata system is up to date with maintenance releases.
- Commands (on the Teradata system):
tpareset -s - Verification: Cross-reference the build number with the latest Teradata maintenance release for your major version available on the Teradata support portal.
2.9 Application Interaction
Confirm Connection Management
- Purpose: Ensure the TDM tool is connecting with the correct driver and authentication mechanism.
- Verification:
- On Linux: The TDM tool uses the
teradatasqlPython driver (teradatasql://user:pass@host:port/dbname). - On Windows: The TDM tool uses the SQLAlchemy Teradata dialect with the ODBC Teradata driver (
teradata+pyodbc://user:pass@host:port/dbname?driver=Teradata). - Confirm the Teradata ODBC driver (Windows) or
teradatasqlPython package (Linux) is installed and the correct version.
- On Linux: The TDM tool uses the
Monitor TDM Session Activity
- Purpose: Verify the expected number of masking sessions are active and consuming appropriate resources.
- Commands:
SELECT
SessionNo,
Username,
State,
ReqPhysIO,
ReqCPU
FROM DBC.SessionInfoV
WHERE Username = 'TDM_USER'; - Expected Output: A session in
Activestate with progressively increasingReqPhysIOandReqCPUvalues indicates normal masking progress.
2.10 System Resource Monitoring
Monitor CPU and Memory Usage per AMP
- Purpose: Detect AMP-level resource bottlenecks that are constraining masking throughput.
- Commands:
SELECT
NodeID,
CPUTime,
IOWaitTime,
MemoryUsage
FROM DBC.ResUsageSpma
ORDER BY CPUTime DESC; - Verification: High
IOWaitTimeon specific nodes indicates disk I/O bottlenecks. Evenly distributedCPUTimeindicates good AMP parallelism.
Check Swap Usage (Linux)
- Purpose: Ensure minimal swap usage across Teradata nodes.
- Commands:
swapon -s
free -h - Verification: Confirm minimal swap usage. Significant swap activity on Teradata nodes degrades AMP performance substantially.
3. Database Permissions
The following table outlines the specific database permissions required for the Enov8 Test Data Management (TDM) Tool to function properly with Teradata Database. Ensure that the TDM service account has the appropriate grants on both the staging database and the DBC system views.
| Privilege | Object | Description | Used By |
|---|---|---|---|
| SELECT | DBC.TablesV | Required for checking database and table existence, listing tables, and retrieving table metadata. | All |
| SELECT | DBC.ColumnsV | Required for reading column metadata, data types, character sets, and lengths for all profiling, masking, and validation operations. | All |
| SELECT | DBC.TableStatsV | Required for identifying tables with missing statistics and driving statistics collection. | Profile |
| SELECT | DBC.StatsV | Required for retrieving statistics-based row counts per table for profiling and forensic scan operations. | Profile, Forensic Scan |
| SELECT | DBC.IndicesV | Required for reading index metadata (Primary Index, Secondary Indexes) used in column and table analysis. | All, Masking |
| SELECT | DBC.All_RI_ChildrenV | Required for reading referential integrity (foreign key) relationships between tables. | All, Forensic Scan |
| SELECT | DBC.DiskSpace | Required for calculating database size and current permanent space usage. | Forensic Scan |
| SELECT | DBC.UsersV | Required for reading user spool/perm space allocations during forensic scan. | Forensic Scan |
| SELECT | DBC.RoleMembersV | Required for resolving role-based access during forensic scan user enumeration. | Forensic Scan |
| SELECT | DBC.TABLES | Required for legacy table listing queries used in certain metadata operations. | All |
| SELECT | DBC.DBCInfoV | Required for retrieving the Teradata version at connection time for feature compatibility checks. | All |
| SELECT | DBC.TriggersV | Required for discovering and managing triggers on staging tables during masking operations. | Masking |
| SELECT | DBC.SessionInfoV | Required for verifying session character set and workload assignment. | All |
| SELECT, INSERT, UPDATE, DELETE | Target staging database tables | Required for reading source data and writing masked values back to staging tables. | Masking |
| CREATE TABLE | Target staging database | Required for creating temporary masking tables (ENOV8_PREVIEW, row-number staging tables). | Masking |
| DROP TABLE | Target staging database | Required for cleaning up temporary tables created during masking operations. | Masking |
| ALTER TABLE | Target staging database | Required for adding and dropping temporary columns used during in-place masking. | Masking |
| ALTER TRIGGER | Target staging database | Required for enabling and disabling existing triggers on staging tables during masking (ALTER TRIGGER ... ENABLED/DISABLED). | Masking |