
Data Factory

Overview of Data Factory

The Data Factory is a data management solution that combines mining of existing data with synthetic data generation. The platform provides two main workflows:

  • Configure Data Rules: Define data generation patterns using a three-step wizard
  • Mine & Generate Data: Execute queries to extract real data or generate synthetic datasets

Configure Data Rules

The Configure Data Rules feature uses a three-step wizard that analyzes your SQL query and detects column patterns to drive synthetic data generation.

[Screenshot: Data Factory Configuration Screen]

Step 1: Define Data SQL

Navigate to Data Factory > Configure Data Rules to start the configuration wizard.

[Screenshot: Data Factory Automatic Pattern Detection]

Configuration Form Fields:

Field | Description
Configuration Name | Name for your data generation configuration
Country | Geographic region for data generation
DB Source Connection | Database connection to analyze
Description | Optional description of the configuration
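The form above amounts to a small configuration record. As a rough illustration only, the Python sketch below models it as a dataclass; the class name `DataRuleConfig`, the extra `data_sql` field, and the rule that every field except Description is required are assumptions, not the product's actual schema.

```python
from dataclasses import dataclass


@dataclass
class DataRuleConfig:
    """Hypothetical record mirroring the Configure Data Rules form."""
    name: str                 # Configuration Name
    country: str              # Country (geographic region for generation)
    source_connection: str    # DB Source Connection
    description: str = ""     # Description is optional
    data_sql: str = ""        # assumed field holding the Data SQL editor contents

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the form is complete."""
        problems = []
        for label, value in [
            ("Configuration Name", self.name),
            ("Country", self.country),
            ("DB Source Connection", self.source_connection),
        ]:
            if not value.strip():
                problems.append(f"{label} is required")
        return problems


config = DataRuleConfig(name="customers_eu", country="Germany",
                        source_connection="crm_readonly")
print(config.validate())  # []
```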

Data SQL Editor:

  • Advanced SQL editor with syntax highlighting
  • Support for complex queries with JOINs, subqueries, and aggregations
  • Real-time validation and error detection

Available Actions:

  • Preview: Test your SQL query and see sample results
  • SQL Meta Scan: Analyze the query structure for automatic pattern detection

Important

Always run the SQL Meta Scan after defining your query. This enables the Data Factory to automatically detect column patterns and suggest appropriate data generation rules.
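The Preview action boils down to running the Data SQL and showing a handful of rows. A minimal sketch of that idea using Python's built-in `sqlite3` module; the function name `preview`, the default of five rows, and the subquery-wrapping approach are assumptions, not the product's implementation. Row-limiting syntax differs by database (an Oracle source, for example, would use `FETCH FIRST n ROWS ONLY` instead of `LIMIT`).

```python
import sqlite3


def preview(conn: sqlite3.Connection, data_sql: str, limit: int = 5):
    """Run the Data SQL with a row cap and return (column names, rows)."""
    # Wrapping the query as a subquery caps the rows without editing the user's SQL.
    wrapped = f"SELECT * FROM ({data_sql.rstrip().rstrip(';')}) q LIMIT ?"
    cur = conn.execute(wrapped, (limit,))
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO customers (name, email) VALUES
        ('Ana', 'ana@example.com'), ('Ben', 'ben@example.com'), ('Cy', 'cy@example.com');
""")
cols, rows = preview(conn, "SELECT id, name AS customer_name FROM customers ORDER BY id;", limit=2)
print(cols)  # ['id', 'customer_name']
print(len(rows))  # 2
```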

Step 2: Run SQL Meta Scan

The SQL Meta Scan analyzes the Data SQL you defined in Step 1 to determine its structure and data patterns.

Scan Options:

  • Scan Only: Analyzes the query structure without loading patterns
  • Scan & Load Pattern: Analyzes the query and automatically loads suggested generation patterns

The scan:

  • Identifies table relationships and joins
  • Detects column data types and constraints
  • Suggests appropriate Faker patterns for realistic data generation
  • Maps foreign key relationships for data consistency
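The four scan steps above can be approximated in a few lines. This is a hedged sketch against SQLite using only the standard library; the product's scan runs against your configured DB Source Connection and presumably reads that database's own catalog. The keyword hint table, the `faker:*` pattern labels, and all function names are hypothetical, and the table-name regex is a rough stand-in for a real SQL parser.

```python
import re
import sqlite3

# Column-name keyword -> suggested pattern (hypothetical rule set, checked in order).
NAME_HINTS = [
    ("email", "faker:email"),
    ("phone", "faker:phone_number"),
    ("name", "faker:name"),
    ("city", "faker:city"),
    ("date", "faker:date"),
]

TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> list[str]:
    """Table names that follow FROM or JOIN (a rough regex, not a SQL parser)."""
    return sorted(set(TABLE_REF.findall(sql)))


def meta_scan(conn: sqlite3.Connection, sql: str, sample: int = 20) -> list[dict]:
    """Infer each result column's type from sample values and suggest a pattern."""
    cur = conn.execute(f"SELECT * FROM ({sql}) q LIMIT ?", (sample,))
    rows = cur.fetchall()
    result = []
    for i, desc in enumerate(cur.description):
        column = desc[0]
        values = [r[i] for r in rows if r[i] is not None]
        inferred = type(values[0]).__name__ if values else "unknown"
        pattern = next((p for hint, p in NAME_HINTS if hint in column.lower()), "custom")
        result.append({"column": column, "type": inferred, "suggested": pattern})
    return result


def foreign_keys(conn: sqlite3.Connection, table: str) -> list[tuple[str, str, str]]:
    """(from_column, parent_table, parent_column) from SQLite's catalog; trusted names only."""
    return [(r[3], r[2], r[4]) for r in conn.execute(f"PRAGMA foreign_key_list({table})")]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ana Diaz', 'ana@example.com');
    INSERT INTO orders VALUES (10, 1, '2024-01-15');
""")
sql = ("SELECT c.full_name, c.email, o.order_date "
       "FROM customers c JOIN orders o ON o.customer_id = c.id")
print(referenced_tables(sql))        # ['customers', 'orders']
for col in meta_scan(conn, sql):
    print(col)
print(foreign_keys(conn, "orders"))  # [('customer_id', 'customers', 'id')]
```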

Step 3: Define Pattern Configuration

The final step allows you to configure detailed generation rules for each column in your dataset.

SQL Pattern Details

The system displays the metadata it detected for each column in your query:

Column | Description
Table Name | Source table identified in the query
Column Name | Individual column from the database
Data Type | Database data type (VARCHAR2, NUMBER, DATE, etc.)
Data Length | Maximum character length for the column
Original Alias | Column alias used in the SQL query
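Some catalogs report a column's type as one declared string such as `VARCHAR2(50)` (SQLite's `PRAGMA table_info` does this), while Oracle's `ALL_TAB_COLUMNS` view already exposes `DATA_TYPE` and `DATA_LENGTH` separately. Where only the declared string is available, the Data Type and Data Length columns can be split out of it. A minimal Python sketch; `PatternDetail`, `parse_declared_type`, and the field names are hypothetical, and the regex only covers common forms such as `VARCHAR2(50)`, `NUMBER(10,2)`, and `DATE`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches e.g. "VARCHAR2(50)", "NUMBER(10,2)", "DATE", "DOUBLE PRECISION".
DECLARED_TYPE = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s*(?:\(\s*(\d+)\s*(?:,\s*\d+\s*)?\))?\s*$")


@dataclass
class PatternDetail:
    """One row of the SQL Pattern Details table (hypothetical field names)."""
    table_name: str
    column_name: str
    data_type: str
    data_length: Optional[int]
    original_alias: str

    def fits(self, value: str) -> bool:
        """True if a generated string respects the column's Data Length."""
        return self.data_length is None or len(value) <= self.data_length


def parse_declared_type(declared: str) -> tuple[str, Optional[int]]:
    """Split a declared type into (Data Type, Data Length).

    For numeric types such as NUMBER(10,2) the first number is the precision,
    not a character length; it is returned in the same slot for simplicity.
    """
    m = DECLARED_TYPE.match(declared)
    if not m:
        raise ValueError(f"unrecognised type: {declared!r}")
    length = int(m.group(2)) if m.group(2) else None
    return m.group(1).upper(), length


dtype, length = parse_declared_type("VARCHAR2(50)")
detail = PatternDetail("CUSTOMERS", "EMAIL_ADDR", dtype, length, "email")
print(detail.data_type, detail.data_length)  # VARCHAR2 50
print(detail.fits("ana@example.com"))        # True
```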

Pattern Details Configuration

Configure the data generation rule for each column:

[Screenshot: Data Factory Faker Module Integration]

Advanced Pattern Types:

  • Faker: Built-in realistic data generation (names, emails, addresses, etc.)
  • Lookup: Reference tables for consistent data relationships
  • Regex: Custom regular expression patterns
  • Custom: User-defined generation logic
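The four pattern types differ mainly in where a value comes from. The Python sketch below shows one way to dispatch on them; it is an illustration, not the product's engine. The Faker branch uses small hard-coded name lists as a stand-in for the Faker library, the Regex branch supports only a small subset (`\d`, character ranges such as `[A-Z]`, literals, and `{n}` repeats), and every function and rule name is hypothetical.

```python
import random
import re
import string
from typing import Callable

# Stand-in for a Faker provider; a real setup would use the faker package instead.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elif"]
LAST_NAMES = ["Diaz", "Ng", "Okafor", "Schmidt"]

# One regex atom (\d, [class], escaped char, or any char) plus an optional {n} repeat.
TOKEN = re.compile(r"(\\d|\[[^\]]+\]|\\.|.)(?:\{(\d+)\})?")


def _charset(atom: str) -> str:
    """Characters an atom may produce (\\d, [A-Z0-9]-style classes, literals)."""
    if atom == r"\d":
        return string.digits
    if atom.startswith("["):
        body, chars, i = atom[1:-1], [], 0
        while i < len(body):
            if i + 2 < len(body) and body[i + 1] == "-":  # a range such as A-Z
                chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
                i += 3
            else:
                chars.append(body[i])
                i += 1
        return "".join(chars)
    return atom[-1]  # escaped or plain literal character


def from_regex(pattern: str, rng: random.Random) -> str:
    """Generate a string matching a pattern written in the supported subset."""
    out = []
    for atom, count in TOKEN.findall(pattern):
        out.extend(rng.choice(_charset(atom)) for _ in range(int(count or 1)))
    return "".join(out)


def make_generator(kind: str, spec, rng: random.Random) -> Callable[[], str]:
    """Return a zero-argument generator for one column rule."""
    if kind == "faker":   # spec names a provider
        providers = {
            "name": lambda: f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            "email": lambda: f"{rng.choice(FIRST_NAMES).lower()}@example.com",
        }
        return providers[spec]
    if kind == "lookup":  # spec is a list of reference values
        return lambda: rng.choice(spec)
    if kind == "regex":   # spec is a pattern in the supported subset
        return lambda: from_regex(spec, rng)
    if kind == "custom":  # spec is any user-supplied callable taking the rng
        return lambda: spec(rng)
    raise ValueError(f"unknown pattern type: {kind}")


rng = random.Random(42)  # fixed seed for repeatable output
rules = {
    "CUSTOMER_NAME": make_generator("faker", "name", rng),
    "STATUS": make_generator("lookup", ["ACTIVE", "SUSPENDED", "CLOSED"], rng),
    "ACCOUNT_NO": make_generator("regex", r"[A-Z]{3}-\d{4}", rng),
    "CREDIT_LIMIT": make_generator("custom", lambda r: str(r.randrange(1000, 10001, 500)), rng),
}
print({col: gen() for col, gen in rules.items()})
```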

Mine & Generate Data

The Mine & Generate Data interface provides access to your configured data sources and generation capabilities.

Data Configurations

Navigate to Data Factory > Mine & Generate Data to access your data configurations.

[Screenshot: Data Factory Mine and Generate]

Data Results Interface

Clicking "Mine & Generate" on a configuration opens the data results interface, which provides filtering and data management capabilities.

[Screenshot: Data Factory Advanced Filtering]

Advanced WHERE Clause Builder

The filter builder provides a UI for composing custom WHERE clauses, which the backend applies to your SQL query.

Filter Components:

  • Column: Select any column from your query results
  • Condition: Choose from multiple operators for precise filtering
  • Value: Specify the value to match against

Available Condition Operators:

Operator | Name | Description
= | Equals | Exact match filtering
LIKE | Pattern Match | Pattern matching with wildcards
NOT LIKE | Exclude Pattern | Exclude pattern matches
> | Greater Than | Numerical and date comparisons
>= | Greater/Equal | Inclusive range filtering
<= | Less/Equal | Upper bound filtering
!= | Not Equal | Exclusion filtering

Filter Management:

  • Add Filter: Build multiple filter conditions
  • Apply Filters: Execute the filtered query
  • Clear Filters: Remove all active filters

The filtering system modifies your original SQL query at run time by appending the WHERE conditions you build, so you can create complex data subsets without writing additional SQL.
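A minimal sketch of the same idea in Python with `sqlite3`. Two assumptions are worth stating: it wraps the original query as a subquery instead of appending text (so a base query that already has its own WHERE or GROUP BY still works), and it combines multiple filters with AND. Column names and operators are checked against whitelists and values are passed as bound parameters, the usual guard against SQL injection. `apply_filters` is a hypothetical name, not the product's API.

```python
import sqlite3

# The operators listed in the table above.
ALLOWED_OPERATORS = {"=", "LIKE", "NOT LIKE", ">", ">=", "<=", "!="}


def apply_filters(base_sql: str, columns: list[str], filters: list[tuple[str, str, object]]):
    """Wrap base_sql and AND together (column, operator, value) filters.

    Returns (sql, params). Identifiers cannot be bound as parameters, so column
    names and operators are validated against whitelists; values are bound.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in columns:
            raise ValueError(f"unknown column: {column}")
        op = op.upper()
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f'"{column}" {op} ?')
        params.append(value)
    sql = f"SELECT * FROM ({base_sql}) q"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, status TEXT, balance REAL);
    INSERT INTO accounts VALUES (1, 'ACTIVE', 250.0), (2, 'CLOSED', 0.0),
                                (3, 'ACTIVE', 1200.0), (4, 'SUSPENDED', 75.5);
""")
# The base query already has a WHERE clause, which a literal append would break.
base = "SELECT id, status, balance FROM accounts WHERE id < 100"
sql, params = apply_filters(base, ["id", "status", "balance"],
                            [("status", "=", "ACTIVE"), ("balance", ">=", 500)])
print(sql)
print(conn.execute(sql, params).fetchall())  # [(3, 'ACTIVE', 1200.0)]
```

Because values are always bound as parameters and only known column names and the listed operators are accepted, a value such as `' OR 1=1 --` is treated as plain text rather than SQL.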

Data Actions

  • Create Booking: Reserve specific data records for team use
  • Data Subset: Create focused data subsets
  • Generate Data: Create synthetic data based on configured patterns

Data Preview

View live data results with:

  • Selectable rows with checkboxes
  • Column sorting and filtering
  • Pagination for large datasets
  • Export capabilities
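Sorting and pagination over a query result are commonly implemented with ORDER BY plus LIMIT/OFFSET, and export can be as simple as CSV. The sketch below assumes SQLite and CSV output; the product's actual paging mechanism and export formats are not described here, and `fetch_page` and `export_csv` are hypothetical names. For very large tables, keyset pagination (filtering on the last key seen) usually scales better than OFFSET.

```python
import csv
import io
import sqlite3


def fetch_page(conn, base_sql, columns, sort_by, page, page_size=50, descending=False):
    """Return one page of rows sorted by a whitelisted column (pages start at 1)."""
    if sort_by not in columns:
        raise ValueError(f"cannot sort by {sort_by!r}")
    direction = "DESC" if descending else "ASC"
    sql = f'SELECT * FROM ({base_sql}) q ORDER BY "{sort_by}" {direction} LIMIT ? OFFSET ?'
    return conn.execute(sql, (page_size, (page - 1) * page_size)).fetchall()


def export_csv(columns, rows) -> str:
    """Serialise rows (for example, the checked ones) to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item-{i}") for i in range(1, 8)])
cols = ["id", "label"]
page2 = fetch_page(conn, "SELECT id, label FROM items", cols, "id", page=2, page_size=3)
print(page2)  # [(4, 'item-4'), (5, 'item-5'), (6, 'item-6')]
print(export_csv(cols, page2))
```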

Create Data Booking

When you select data rows and click "Create Booking", the system opens a booking management interface:

[Screenshot: Data Factory Data Booking]

Booking Details:

  • Summary: Booking name and description
  • Booking Timeline: Start and end dates for data reservation
  • Selected Data ID(s): List of specific records being booked

Contention Management:

  • Contention Status: Real-time availability checking
  • Conflict Resolution: Automatic removal of conflicting data
  • Individual Actions: Delete specific items from the booking

The system automatically checks for data conflicts and displays "No Contention" for available records or warns about conflicting bookings.
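Contention checking comes down to an interval-overlap test: a requested record conflicts when another booking already holds the same record ID for an overlapping date range. A minimal Python sketch under stated assumptions (inclusive start and end dates, integer record IDs, and conflict resolution that simply drops the clashing IDs); `Booking`, `contention`, and `without_conflicts` are hypothetical names, not the product's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Booking:
    """Hypothetical booking: reserved record IDs over an inclusive date range."""
    summary: str
    start: date
    end: date
    record_ids: frozenset


def overlaps(a_start: date, a_end: date, b_start: date, b_end: date) -> bool:
    # Two inclusive ranges overlap unless one ends before the other starts.
    return a_start <= b_end and b_start <= a_end


def contention(request: Booking, existing: list[Booking]) -> dict:
    """Map each requested ID to 'No Contention' or the summaries of clashing bookings."""
    status = {}
    for rid in sorted(request.record_ids):
        clashes = [b.summary for b in existing
                   if rid in b.record_ids and overlaps(request.start, request.end, b.start, b.end)]
        status[rid] = "No Contention" if not clashes else "Conflicts with: " + ", ".join(clashes)
    return status


def without_conflicts(request: Booking, existing: list[Booking]) -> Booking:
    """Drop conflicting IDs, approximating the automatic conflict-resolution step."""
    status = contention(request, existing)
    keep = frozenset(rid for rid, s in status.items() if s == "No Contention")
    return Booking(request.summary, request.start, request.end, keep)


existing = [Booking("Regression run", date(2024, 3, 1), date(2024, 3, 10), frozenset({101, 102}))]
req = Booking("Perf test", date(2024, 3, 8), date(2024, 3, 15), frozenset({102, 103}))
print(contention(req, existing))
# {102: 'Conflicts with: Regression run', 103: 'No Contention'}
print(sorted(without_conflicts(req, existing).record_ids))  # [103]
```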

Generate Synthetic Data

Click "Generate Data" to create synthetic datasets based on your pre-configured patterns:

[Screenshot: Data Factory Generate Data]

Generation Options:

  • Record Count: Specify the number of records to generate

The generated data follows all the patterns and rules you've configured, ensuring realistic and consistent synthetic datasets.
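Record Count maps naturally onto a loop that draws each column's value from its configured rule. A minimal Python sketch, assuming rules are plain callables and that "consistent" means two things: columns flagged unique never repeat, and foreign-key columns draw only from keys that already exist in the parent table (the kind of relationship the SQL Meta Scan maps). `generate_records` and its parameters are hypothetical, not the product's generator.

```python
import random


def generate_records(count, rules, seed=None, unique=()):
    """Generate `count` rows from per-column rules.

    rules:  column -> callable(rng) returning a value
    unique: columns whose values must not repeat (redrawn on collision)
    """
    rng = random.Random(seed)  # a fixed seed makes a run reproducible
    seen = {col: set() for col in unique}
    rows = []
    for _ in range(count):
        row = {}
        for col, rule in rules.items():
            value = rule(rng)
            if col in seen:
                attempts = 0
                while value in seen[col]:  # redraw until the value is new
                    attempts += 1
                    if attempts > 1000:
                        raise RuntimeError(f"cannot find a unique value for {col}")
                    value = rule(rng)
                seen[col].add(value)
            row[col] = value
        rows.append(row)
    return rows


existing_customer_ids = [1, 2, 3]  # keys assumed to exist in the parent table
rules = {
    "ORDER_ID": lambda r: r.randrange(100000, 1000000),
    "CUSTOMER_ID": lambda r: r.choice(existing_customer_ids),  # keeps the foreign key valid
    "AMOUNT": lambda r: round(r.uniform(5, 500), 2),
}
orders = generate_records(5, rules, seed=7, unique=("ORDER_ID",))
for row in orders:
    print(row)
```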