Profiling
Overview of Profiling
The first step of the process is to profile your data source. Profiling scans the data source and determines, with a degree of certainty, the type of data in each column. During the process, a Data Masking configuration file and a Data Validation (Audit) configuration file are generated as per the TDM solution.
Profile Configurations
The rulesets and patterns scanned as part of the profiling process are defined in Profile Configurations.
For TDM version 3.5:
Navigate to Data Management Hub > Manage Configurations > Profile Configurations. Here you can Add, Edit, Clone or Delete a Profile Configuration.
For TDM versions 3.4 or earlier:
Navigate to Profiling > Profiling Configurations. Here you can Add, Edit, Clone or Delete a Profile Configuration.
Adding a Profile Configuration
Click the Add Profile Configuration button. This will present a new Profile Configuration form for adding the following information.
Note: You can add a default sample by clicking the "Load Sample" button.
On the Profile Configuration form, you can edit the following configuration information.
Type | Description |
---|---|
Configuration Name | The profile configuration name. |
Default Tolerance | The default number of matches required for a column to be classified as a pattern. |
Depth | The maximum number of records to search. (Note: The higher this is set, the slower the profile will be.) |
Description | The profile configuration description or long name. |
Country | The country or language to which the respective configuration may apply. (This field can be null.) |
Include/Exclude Table | The tables to exclude from profiling or alternatively only include for profiling. |
Notification | List of users who will be notified. |
Status | Set the profile to Active for use or Draft/Inactive to hide it from the Profile Execution area. |
Deep Scan | An advanced feature that ignores null values when identifying PII data. This is useful for a largely empty table that may still contain PII. |
Thread Count | The number of concurrent threads used to scan (ranging from 1 to 64) and process table data simultaneously to optimise performance. |
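As an illustration only, the form fields above could be modelled as a small data structure. This is a hedged sketch, not a TDM API: the class name `ProfileConfiguration`, its attribute names, and the `validate` and `should_profile` helpers are all hypothetical. Only the 1 to 64 thread range and the include-or-exclude choice come from the table; the minimum Depth of 1 is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProfileConfiguration:
    """Hypothetical in-memory model of the form fields; not a TDM API."""
    name: str
    default_tolerance: int
    depth: int
    description: str = ""
    country: Optional[str] = None  # the Country field can be null
    include_tables: List[str] = field(default_factory=list)
    exclude_tables: List[str] = field(default_factory=list)
    notify: List[str] = field(default_factory=list)
    status: str = "Draft"
    deep_scan: bool = False
    thread_count: int = 1

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no problem was found."""
        problems = []
        if not 1 <= self.thread_count <= 64:
            problems.append("Thread Count must be between 1 and 64")
        if self.depth < 1:  # assumption: at least one record must be searched
            problems.append("Depth must be at least 1")
        if self.include_tables and self.exclude_tables:
            problems.append("Use either an include list or an exclude list, not both")
        return problems

    def should_profile(self, table: str) -> bool:
        """Apply the include list if one is given, otherwise the exclude list."""
        if self.include_tables:
            return table in self.include_tables
        return table not in self.exclude_tables
```

For example, a configuration with `thread_count=65` would be reported as invalid, and a configuration that excludes `AUDIT_LOG` would skip that table and profile every other one.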
Add/Edit Pattern(s) section:
Type | Description |
---|---|
Category | The pattern category. (This filters the results visible in the Pattern drop list). |
Identifier | The pattern name used in the profile configuration. |
Tolerance | Overrides the default tolerance with a specific tolerance. |
PII Level | Sets the pattern as Primary PII, Secondary PII, or Other. |
Pattern Type | The pattern type as defined in the Data Library Administration area. |
Pattern | The pattern to use. |
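To make the interaction between Default Tolerance, a pattern-level Tolerance, Depth and Deep Scan more concrete, the following is a minimal sketch of how such a check could work. It is not TDM's actual algorithm, which this page does not describe. The function `column_matches` and its parameters are hypothetical, the pattern is assumed to be a regular expression, and Deep Scan is assumed to mean that null values are skipped without counting towards Depth.

```python
import re
from typing import Iterable, Optional


def column_matches(values: Iterable[Optional[str]], regex: str,
                   default_tolerance: int, depth: int,
                   tolerance: Optional[int] = None,
                   deep_scan: bool = False) -> bool:
    """Return True if enough of the examined values match the pattern."""
    # A pattern-level tolerance overrides the default tolerance.
    required = default_tolerance if tolerance is None else tolerance
    compiled = re.compile(regex)
    examined = 0
    matches = 0
    for value in values:
        if value is None and deep_scan:
            continue  # assumed Deep Scan behaviour: nulls do not use up Depth
        if examined >= depth:
            break  # never look at more than `depth` records
        examined += 1
        if value is not None and compiled.fullmatch(value):
            matches += 1
    return matches >= required


# Hypothetical sample column: mostly null, with two email-like values.
sample = [None, None, "a@b.com", "c@d.org", "n/a"]
email = r"[^@\s]+@[^@\s]+\.[a-z]+"
print(column_matches(sample, email, default_tolerance=3, depth=3))
print(column_matches(sample, email, default_tolerance=3, depth=3,
                     tolerance=2, deep_scan=True))
```

Under these assumptions the first call reports no match, because the nulls use up most of the Depth and the default tolerance of 3 is not reached. The second call reports a match, because the nulls are skipped and the pattern-level tolerance of 2 applies.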
Once complete, scroll to the bottom of the main window and click Save to save the Profile Configuration, or Close to cancel without saving.
Editing a Profile Configuration
Click the Edit Profile Connection button on the Profile Configuration you want to edit. This will present the Profile Configuration form for modification.
Once complete, scroll to the bottom of the main window and click Save to save the configuration, or Close to cancel without saving.
Cloning a Profile Configuration
Click the Clone Profile Connection button on the Profile Configuration you want to duplicate. This will present the Profile Configuration form with a duplication of all data for saving as a new Profile Configuration.
Once complete, scroll to the bottom of the main window and click Save to save the configuration, or Close to cancel without saving.
Deleting a Profile Configuration
Click the Delete Profile Connection button on the Profile Configuration you want to remove. This will present a confirmation window asking whether you want to delete the Profile Configuration.
Comparing Profile Configurations
Tick the check boxes next to the two configurations you want to compare, then press the Config Comparison button to view the comparison report. Text highlighted in red marks the differences between the two configurations.
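Conceptually, a comparison of this kind lists the settings whose values differ between two configurations. The sketch below shows the idea on plain dictionaries. It is an illustration only and does not reflect how the TDM comparison report is produced: the function `compare_configs`, the `<not set>` marker and the sample keys are hypothetical.

```python
from typing import Any, Dict, Tuple

MISSING = "<not set>"  # hypothetical marker for a setting present on one side only


def compare_configs(left: Dict[str, Any],
                    right: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (left value, right value)} for every setting that differs."""
    differences = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, MISSING)
        b = right.get(key, MISSING)
        if a != b:
            differences[key] = (a, b)
    return differences


# Hypothetical sample configurations.
config_a = {"Depth": 1000, "Thread Count": 4, "Deep Scan": False}
config_b = {"Depth": 5000, "Thread Count": 4, "Country": "AU"}
for setting, (a, b) in compare_configs(config_a, config_b).items():
    print(f"{setting}: {a!r} vs {b!r}")
```

Settings with identical values, such as Thread Count in this sample, are left out of the result, which mirrors the way the report highlights only the differences.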
Execute Profiling - Single Data Source
For TDM version 3.5+:
To execute a new profile scan, navigate to Data Management Hub > Execution Console.
Here, select your data source, then press View. Scroll down to the Execution Details section, select Profile in the first dropdown menu, and your chosen validation/mask configuration in the second dropdown menu. Click Execute to run a profile scan, or View Results to view the results of previously run profile jobs.
For TDM versions 3.4 or earlier:
To execute a new profiling request, navigate to Profiling > Execute Profiling.
Here you can select a data source connection and Profile Configuration for execution. Press the Run button to run the selected profile configuration. When Run is pressed, the list below is refreshed with past executions and also shows the status of the current execution request.
Execute Profiling - Connection Groups
For TDM version 3.6:
To run a profile scan on a connection group, first ensure TDM Queue Manager is turned on. Then, navigate to Data Management Hub > Execution Console > Connection Group.
Here, select your Connection Group from the first dropdown menu, then select Profile for your scan type, and then the default profile configuration to be used for all or most data sources in the connection group. After selecting all the fields, click on View.
You can further select separate profile configurations from the Data Configuration column for specific Data Sources.
To run a bulk profile scan, select all the data sources you want to scan and click Execute.
Then, scroll to the bottom of the page to the Execution Details section to look at the scan status, and access the logs and reports.
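The selection logic in the steps above (one default profile configuration for the group, optional per-source choices in the Data Configuration column, and a scan only for the selected data sources) can be sketched as follows. This is a hypothetical model for illustration, not a TDM interface: `resolve_profile_configs` and all sample names are assumptions.

```python
from typing import Dict, Iterable, Optional


def resolve_profile_configs(selected_sources: Iterable[str],
                            default_config: str,
                            overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each selected data source to the profile configuration it would use."""
    overrides = overrides or {}
    # A per-source choice wins; otherwise the group default applies.
    return {source: overrides.get(source, default_config)
            for source in selected_sources}


# Hypothetical connection group with one per-source override.
plan = resolve_profile_configs(
    selected_sources=["crm_db", "billing_db"],
    default_config="Default PII Profile",
    overrides={"billing_db": "Finance Profile", "hr_db": "HR Profile"},
)
print(plan)
```

In this sample, `hr_db` has an override but is not selected, so it does not appear in the resulting plan.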
Profiling Log
In the Details column, the Log button can be used to view logs for the execution.
Note: The Log window refreshes every 15 seconds if the profile is currently executing.
A currently executing profile scan can also be cancelled by clicking the Cancel button.
Profiling Report - Single Data Source
Successfully completed profiles will generate a Profile Report which can be viewed by clicking the Report button.
The report can also be searched, or exported to Excel or CSV.
Profiling Report Table
Type | Description |
---|---|
Connection Name | The name of the data connection. |
Schema | The schema under which the table resides. |
Table Name | The name of the table containing the data. |
Column | The specific column within the table. |
Data Type | The type of data stored in the column (e.g., string, integer, date). |
Data Length | The maximum length of the data in the column. |
Suggested Content Type | Recommended type of content for the column (e.g., email, address). |
Row Count | The number of rows present in the table. |
Meta | Metadata information related to the column name. |
Content | Actual content stored in the column. |
Matching Data | Data that matches the pattern. |
Random Data | Sample random data generated for testing purposes. |
Error Details | Details of any errors encountered on this column. |
Profiling Report - Connection Groups
For grouped connections, each data source in the group will generate a Profile Report if successful, which can be viewed by clicking the Report button.
To export the reports in bulk, navigate to the Execution Console section in the Connection Group tab, and select the data sources to include in the bulk report.
Click on the Generate Report button, then download the report by clicking on the popup window.
Important: Databases and files have different report structures, therefore their reports cannot be generated together in bulk. When selecting the data sources to include in the bulk report, make sure that files and database connections are not selected together.
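One way to respect this restriction when planning bulk reports is to group the selected data sources by kind and generate one bulk report per group. The sketch below illustrates the idea. It is not part of TDM: the function `split_for_bulk_report` and the kind labels `database` and `file` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

VALID_KINDS = {"database", "file"}  # hypothetical labels for the two report structures


def split_for_bulk_report(sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Group data source names by kind so each bulk report holds a single kind."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for name, kind in sources.items():
        if kind not in VALID_KINDS:
            raise ValueError(f"Unknown data source kind for {name!r}: {kind!r}")
        batches[kind].append(name)
    return dict(batches)


# Hypothetical selection that mixes both kinds.
selection = {"crm_db": "database", "customers.csv": "file", "billing_db": "database"}
for kind, names in split_for_bulk_report(selection).items():
    print(kind, names)
```

Each resulting group can then be selected on its own before clicking Generate Report.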