Data Factory
Overview of Data Factory
Data Factory combines data mining (extracting real records from a connected database) with synthetic data generation. The platform provides two main workflows:
- Configure Data Rules: Set up intelligent data generation patterns using a 3-step wizard
- Mine & Generate Data: Execute queries to extract real data or generate synthetic datasets
Configure Data Rules
The Configure Data Rules feature uses a 3-step wizard to analyze your SQL query and detect data patterns for synthetic data generation.
Step 1: Define Data SQL
Navigate to Data Factory > Configure Data Rules to start the configuration wizard.
Configuration Form Fields:
Field | Description |
---|---|
Configuration Name | Name for your data generation configuration |
Country | Geographic region for data generation |
DB Source Connection | Database connection to analyze |
Description | Optional description of the configuration |
Data SQL Editor:
- SQL editor with syntax highlighting
- Support for complex queries with JOINs, subqueries, and aggregations
- Real-time validation and error detection
Available Actions:
- Preview: Test your SQL query and see sample results
- SQL Meta Scan: Analyze the query structure for automatic pattern detection
Always run the SQL Meta Scan after defining your query. This enables the Data Factory to automatically detect column patterns and suggest appropriate data generation rules.
Step 2: Run SQL Meta Scan
The SQL Meta Scan analyzes your custom SQL queries to understand the data structure and patterns.
Scan Options:
- Scan Only: Analyzes the query structure without loading patterns
- Scan & Load Pattern: Analyzes the query and automatically loads suggested generation patterns
The scan:
- Identifies table relationships and joins
- Detects column data types and constraints
- Suggests appropriate Faker patterns for realistic data generation
- Maps foreign key relationships for data consistency
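The pattern-suggestion step above can be pictured with a small sketch. It is only an illustration of the idea, not Data Factory's actual detection logic: the `ColumnMeta` class, the `NAME_HINTS` and `TYPE_DEFAULTS` tables, and the `suggest_pattern` function are all hypothetical names. The suggested values are method names from the Python Faker library (`email`, `first_name`, `random_int`, and so on).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnMeta:
    """Hypothetical shape of one scanned column (not the product's schema)."""
    table: str
    column: str
    data_type: str    # e.g. VARCHAR2, NUMBER, DATE
    data_length: int


# Assumed hints: substring of the column name -> Faker provider method name.
# Order matters: the first matching hint wins.
NAME_HINTS = [
    ("EMAIL", "email"),
    ("FIRST_NAME", "first_name"),
    ("LAST_NAME", "last_name"),
    ("PHONE", "phone_number"),
    ("CITY", "city"),
    ("ADDRESS", "address"),
]

# Assumed fallbacks when the column name gives no hint.
TYPE_DEFAULTS = {
    "DATE": "date_this_decade",
    "NUMBER": "random_int",
    "VARCHAR2": "word",
}


def suggest_pattern(col: ColumnMeta) -> Optional[str]:
    """Return a Faker method name for the column, or None if nothing fits."""
    name = col.column.upper()
    for hint, provider in NAME_HINTS:
        if hint in name:
            return provider
    return TYPE_DEFAULTS.get(col.data_type.upper())
```

A column named `CUST_EMAIL` would get `email` from the name hint, while an unhinted `DATE` column falls back to `date_this_decade`. Columns with no match return `None` and would need a manually chosen pattern in Step 3.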
Step 3: Define Pattern Configuration
The final step allows you to configure detailed generation rules for each column in your dataset.
SQL Pattern Details
The system displays the structure it detected in your SQL query:
Column | Description |
---|---|
Table Name | Source table identified in the query |
Column Name | Individual column from the database |
Data Type | Database data type (VARCHAR2, NUMBER, DATE, etc.) |
Data Length | Maximum character length for the column |
Original Alias | Column alias used in the SQL query |
Pattern Details Configuration
Configure a data generation rule for each column:
Pattern Types:
- Faker: Built-in realistic data generation (names, emails, addresses, etc.)
- Lookup: Reference tables for consistent data relationships
- Regex: Custom regular expression patterns
- Custom: User-defined generation logic
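As a rough mental model of per-column rules, the sketch below dispatches on the pattern type to produce rows. It is an assumption-laden illustration, not the product's implementation: `ColumnRule`, `generate_value`, and `generate_rows` are hypothetical names, and only the Lookup and Custom types are covered because they need nothing beyond the standard library. Faker and Regex patterns would require third-party generators and are left out.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ColumnRule:
    """Hypothetical rule for one column; field names are illustrative."""
    column: str
    pattern_type: str                                   # "Lookup" or "Custom" here
    lookup_values: Optional[List[Any]] = None           # used by Lookup
    custom_fn: Optional[Callable[[random.Random, int], Any]] = None  # used by Custom


def generate_value(rule: ColumnRule, rng: random.Random, row_index: int) -> Any:
    """Produce one value for a column according to its pattern type."""
    if rule.pattern_type == "Lookup":
        # Pick from a fixed reference list so values stay consistent.
        return rng.choice(rule.lookup_values)
    if rule.pattern_type == "Custom":
        # User-defined logic receives the random source and the row index.
        return rule.custom_fn(rng, row_index)
    raise NotImplementedError(f"pattern type not covered: {rule.pattern_type}")


def generate_rows(rules: List[ColumnRule], record_count: int,
                  seed: int = 0) -> List[Dict[str, Any]]:
    """Generate record_count rows, one value per rule; seeded for repeatability."""
    rng = random.Random(seed)
    return [
        {rule.column: generate_value(rule, rng, i) for rule in rules}
        for i in range(record_count)
    ]
```

For example, a Lookup rule on `STATUS` with values `["ACTIVE", "CLOSED"]` plus a Custom rule on `ORDER_ID` using `lambda rng, i: 1000 + i` yields rows with sequential IDs and a status drawn from the reference list.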
Mine & Generate Data
The Mine & Generate Data interface provides access to your configured data sources and generation capabilities.
Data Configurations
Navigate to Data Factory > Mine & Generate Data to access your data configurations.
Data Results Interface
When you click "Mine & Generate" on a configuration, you access the data results interface with advanced filtering and data management capabilities.
Advanced WHERE Clause Builder
The filter builder lets you construct WHERE clauses in the UI; the backend applies them to your SQL query:
Filter Components:
- Column: Select any column from your query results
- Condition: Choose from multiple operators for precise filtering
- Value: Specify the value to match against
Available Condition Operators:
Operator | Name | Description |
---|---|---|
= | Equals | Exact match filtering |
LIKE | Pattern Match | Pattern matching with wildcards |
NOT LIKE | Exclude Pattern | Exclude pattern matches |
> | Greater Than | Numerical and date comparisons |
>= | Greater/Equal | Inclusive range filtering |
<= | Less/Equal | Upper bound filtering |
!= | Not Equal | Exclusion filtering |
Filter Management:
- Add Filter: Build multiple filter conditions
- Apply Filters: Execute the filtered query
- Clear Filters: Remove all active filters
Applying filters appends WHERE clauses to your original SQL query, so you can create data subsets without writing additional SQL.
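One way to picture how filters could be appended to a query is the sketch below. It is not the backend's actual code: `build_filtered_query` is a hypothetical helper, wrapping the original query in a subquery is one possible design, and joining multiple filters with AND is an assumption since the combination rule is not stated here. The operator set mirrors the table above, and values are passed as bind parameters rather than pasted into the SQL text.

```python
import re
from typing import Any, Dict, List, Tuple

# Mirrors the operator table in this section.
ALLOWED_OPERATORS = {"=", "LIKE", "NOT LIKE", ">", ">=", "<=", "!="}

# Plain identifiers only, so a column name cannot smuggle in SQL.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_filtered_query(
    base_sql: str, filters: List[Tuple[str, str, Any]]
) -> Tuple[str, Dict[str, Any]]:
    """Wrap base_sql and append one WHERE condition per (column, operator, value).

    Returns the SQL text and a dict of named bind parameters (:p0, :p1, ...).
    Assumes base_sql has no trailing semicolon and that filters combine with AND.
    """
    if not filters:
        return base_sql, {}

    clauses: List[str] = []
    params: Dict[str, Any] = {}
    for i, (column, operator, value) in enumerate(filters):
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        if not IDENTIFIER.fullmatch(column):
            raise ValueError(f"invalid column name: {column!r}")
        key = f"p{i}"
        clauses.append(f"{column} {operator} :{key}")
        params[key] = value

    sql = f"SELECT * FROM ({base_sql}) q WHERE " + " AND ".join(clauses)
    return sql, params
```

Wrapping the original query as a subquery keeps the filter independent of whatever JOINs or aggregations the Data SQL contains, at the cost of filtering only on columns the query actually returns.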
Data Actions
- Create Booking: Reserve specific data records for team use
- Data Subset: Create focused data subsets
- Generate Data: Create synthetic data based on configured patterns
Data Preview
View live data results with:
- Selectable rows with checkboxes
- Column sorting and filtering
- Pagination for large datasets
- Export capabilities
Create Data Booking
When you select data rows and click "Create Booking", the system opens a booking management interface:
Booking Details:
- Summary: Booking name and description
- Booking Timeline: Start and end dates for data reservation
- Selected Data ID(s): List of specific records being booked
Contention Management:
- Contention Status: Real-time availability checking
- Conflict Resolution: Automatic removal of conflicting data
- Individual Actions: Delete specific items from booking
The system automatically checks for data conflicts and displays "No Contention" for available records or warns about conflicting bookings.
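The contention check described above can be sketched as a date-range overlap test. This is a minimal illustration under stated assumptions, not the product's logic: the `Booking` class and `find_contention` function are hypothetical, end dates are treated as inclusive, and two bookings are assumed to conflict only when they share a record ID and their timelines overlap.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set


@dataclass
class Booking:
    """Hypothetical booking record; field names are illustrative."""
    name: str
    start: date
    end: date             # treated as inclusive in this sketch
    record_ids: Set[str]


def find_contention(new: Booking, existing: List[Booking]) -> Dict[str, List[str]]:
    """Map each contended record ID to the names of the bookings that hold it.

    An empty result corresponds to "No Contention" for every selected record.
    """
    conflicts: Dict[str, List[str]] = {}
    for other in existing:
        # Two inclusive ranges overlap when each starts before the other ends.
        overlaps = new.start <= other.end and other.start <= new.end
        if not overlaps:
            continue
        for record_id in sorted(new.record_ids & other.record_ids):
            conflicts.setdefault(record_id, []).append(other.name)
    return conflicts
```

Removing the keys of the returned dict from the new booking's record set would correspond to automatic removal of conflicting data, leaving only the records that are free for the requested timeline.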
Generate Synthetic Data
Click "Generate Data" to create synthetic datasets based on your pre-configured patterns:
Generation Options:
- Record Count: Specify the number of records to generate
The generated data follows the patterns and rules you configured in Configure Data Rules, which keeps the synthetic dataset consistent with the structure of your query.