datagen - Encoded Binary
datagen is the encoded (production) binary: its models are already compiled and embedded within it. Use it when you have a pre-built executable and need fast, repeatable data generation without transpilation overhead.
Overview
When to use datagen:
- You have a pre-built binary with embedded models
- You’re in production or CI/CD pipelines
- You need fast, repeatable runs without transpilation
- You want to distribute a single executable
 
What it does:
- Uses models already embedded in the binary
- Filters models using tags or config files
- Generates data directly (no transpilation step)
 
Key Difference: No file paths needed - models are already inside the binary!
Commands
datagen gen - Generate Data to Files
Generate data from embedded models and output to files or stdout.
Syntax

    datagen gen [flags]

Command Flags

| Flag | Short | Description | Default | Example |
|---|---|---|---|---|
| --count | -n | Number of records to generate (overrides metadata) | Uses metadata count | -n 1000 |
| --seed | -s | Seed for deterministic random generation | none | -s 12345 |
| --tags | -t | Filter models by tags (must match ALL key-value pairs) | "" | -t "service=auth,team=platform" |
| --output | -o | Output directory or file path | "." | -o ./data |
| --format | -f | Output format: csv, json, xml, stdout | stdout | -f csv |
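
When datagen gen is driven from a script, the flags in the table above can be assembled programmatically. The following is a minimal Python sketch and not part of datagen itself: the helper name build_gen_args and its parameters are hypothetical, and it emits only the flags listed in the table.

```python
from typing import Dict, List, Optional


def build_gen_args(
    count: Optional[int] = None,
    seed: Optional[int] = None,
    tags: Optional[Dict[str, str]] = None,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `datagen gen` (hypothetical helper)."""
    args = ["datagen", "gen"]
    if count is not None:
        args += ["-n", str(count)]
    if seed is not None:
        args += ["-s", str(seed)]
    if tags:
        # Tags are passed as a single comma-separated key=value string.
        args += ["-t", ",".join(f"{k}={v}" for k, v in tags.items())]
    if output is not None:
        args += ["-o", output]
    if fmt is not None:
        # Only the formats documented in the flag table are accepted.
        if fmt not in {"csv", "json", "xml", "stdout"}:
            raise ValueError(f"unsupported format: {fmt}")
        args += ["-f", fmt]
    return args


print(build_gen_args(count=1000, fmt="csv", output="./data"))
```

The resulting list can be handed to subprocess.run(args, check=True), assuming the datagen binary is on PATH.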
Quick Examples

    # Generate data for all embedded models
    datagen gen

    # Generate 1000 records for each embedded model
    datagen gen -n 1000

    # Generate data for models matching specific tags
    datagen gen -t "service=auth"

    # Generate models matching multiple tag criteria (AND logic)
    datagen gen -t "service=auth,environment=prod"

    # Generate and save as CSV
    datagen gen -n 1000 -f csv -o ./data

    # Deterministic output with seed
    datagen gen -n 10 -s 12345

    # Generate for specific team
    datagen gen -t "team=platform" -n 500

Output Formats
- csv - Comma-separated values with headers
- json - JSON array of objects
- xml - XML format with root element
- stdout - Print to standard output (default)
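
Downstream scripts often read the generated files back in. Below is a minimal Python sketch that relies only on the shapes listed above (CSV with a header row, JSON as an array of objects). The helper name load_records is hypothetical, and the assumption that output files carry .csv or .json extensions is not confirmed by this page.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def load_records(path: str) -> List[Dict[str, object]]:
    """Load generated records from a CSV or JSON file (hypothetical helper)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".csv":
        # The header row supplies the keys; every CSV value arrives as a string.
        with p.open(newline="", encoding="utf-8") as fh:
            return [dict(row) for row in csv.DictReader(fh)]
    if suffix == ".json":
        with p.open(encoding="utf-8") as fh:
            data = json.load(fh)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of objects")
        return data
    raise ValueError(f"unsupported file type: {suffix}")
```

Note that CSV values come back as strings, so numeric fields need explicit conversion before comparison with JSON output.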
Count Behavior
The --count flag controls how many records to generate:
Without --count flag
Uses the count specified in each model’s metadata section:

    model User {
      metadata {
        count: 500  // Will generate 500 records
      }
      // ...
    }

If no metadata count is specified, the model defaults to 1 record.
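
Together with the --count override listed in the flag table, the count resolves in a fixed order: the flag first, then the model's metadata count, then 1. Here is a short Python sketch of that precedence. It is illustrative only; effective_count is a hypothetical name and not datagen's internal code.

```python
from typing import Optional


def effective_count(flag_count: Optional[int], metadata_count: Optional[int]) -> int:
    """Resolve how many records a model gets, per the documented precedence."""
    if flag_count is not None:
        return flag_count      # --count overrides every model's metadata
    if metadata_count is not None:
        return metadata_count  # otherwise use the model's metadata count
    return 1                   # documented default when neither is set


print(effective_count(None, 500))   # 500
print(effective_count(1000, 500))   # 1000
print(effective_count(None, None))  # 1
```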
With --count flag
Overrides all model counts with the specified value:

    # Generate exactly 1000 records for each model, ignoring metadata
    datagen gen -n 1000

Tags Filtering
Since the binary contains multiple embedded models, use tags to filter which models to generate.
How Tags Work
Tags are defined in the model’s metadata (before compilation):

    model User {
      metadata {
        tags: {
          "service": "user-management",
          "team": "platform",
          "environment": "prod"
        }
      }
      // ...
    }

Filtering by Tags

    # Generate only models with specific service
    datagen gen -t "service=user-management"

    # Generate models matching multiple criteria (AND logic)
    datagen gen -t "service=auth,environment=prod"

    # Generate models for specific team
    datagen gen -t "team=platform"

Important: Models must match ALL provided tag key-value pairs to be selected.
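
The AND rule can be illustrated with a few lines of Python. This is a sketch of the documented semantics, not datagen's actual implementation: the function names are hypothetical, and the handling of whitespace and of an empty filter (treated here as matching every model) is an assumption.

```python
from typing import Dict


def parse_tag_filter(expr: str) -> Dict[str, str]:
    """Parse a filter such as "service=auth,team=platform" into a dict."""
    pairs: Dict[str, str] = {}
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: tolerate an empty filter or trailing comma
        key, sep, value = part.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {part!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def matches(model_tags: Dict[str, str], wanted: Dict[str, str]) -> bool:
    """True only if every requested key-value pair is present on the model."""
    return all(model_tags.get(k) == v for k, v in wanted.items())


user_tags = {"service": "user-management", "team": "platform", "environment": "prod"}
print(matches(user_tags, parse_tag_filter("team=platform,environment=prod")))  # True
print(matches(user_tags, parse_tag_filter("team=platform,service=auth")))      # False
```

The second call fails because only one of the two pairs matches, which is exactly the case that AND logic excludes.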
datagen execute - Load Data to Data Sinks
Load data from embedded models directly into database sinks like MySQL.
Syntax

    datagen execute --config <config_file> [flags]

Required Arguments
- --config - Path to configuration JSON file
Command Flags
| Flag | Short | Description | Example |
|---|---|---|---|
| --config | -c | Path to configuration JSON file | -c config.json |
| --output | -o | Output directory for logs/artifacts | -o ./logs |
Configuration File
The execute command requires a JSON configuration file that lists the embedded models to generate and the sinks to load them into:

    {
      "models": [
        {
          "model_name": "User",
          "target_sinks": ["mysql_sink"],
          "count": 1000
        },
        {
          "model_name": "Order",
          "target_sinks": ["mysql_sink"],
          "count": 500
        }
      ],
      "sinks": [
        {
          "sink_name": "mysql_sink",
          "sink_type": "mysql",
          "config": {
            "host": "localhost",
            "database": "testdb",
            "port": "3306",
            "user": "root",
            "password": "password",
            "batch_size": 1000,
            "throttle_ms": 10
          }
        }
      ]
    }

Examples

    # Load data into database using embedded models
    datagen execute -c config.json

    # Load data with custom output directory for logs
    datagen execute -c config.json -o ./logs

    # Production deployment
    datagen execute --config prod-config.json

Process Flow
- Uses models already embedded in the binary
- Reads the configuration file to determine which models to use
- Generates data according to the config
- Loads data into the specified sinks
- No transpilation - fast execution
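
Because execute depends entirely on the configuration file, it can be worth sanity-checking that file before a run. The Python sketch below assumes, based on the example config, that every entry in target_sinks should name a sink_name defined under sinks. The helper check_config is hypothetical and says nothing about how datagen itself validates the file.

```python
import json
from typing import List


def check_config(config: dict) -> List[str]:
    """Return a list of problems found in an execute config (empty if none)."""
    problems: List[str] = []
    # Collect the names of all sinks declared in the config.
    sink_names = {s.get("sink_name") for s in config.get("sinks", [])}
    for model in config.get("models", []):
        name = model.get("model_name", "<unnamed>")
        for sink in model.get("target_sinks", []):
            if sink not in sink_names:
                problems.append(f"model {name}: unknown sink {sink!r}")
    return problems


config = json.loads("""
{
  "models": [{"model_name": "User", "target_sinks": ["mysql_sink"], "count": 1000}],
  "sinks": [{"sink_name": "mysql_sink", "sink_type": "mysql", "config": {}}]
}
""")
print(check_config(config))  # []
```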
 
Use Cases
- Production data loading
- Scheduled data generation (cron jobs)
- CI/CD pipeline integration
- High-performance scenarios
- Distributed deployments with consistent models
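
For the CI/CD case, one possible check is that two seeded runs produce the same files. The Python sketch below compares two output directories by content hash. The helper names are hypothetical, and whether datagen output is byte-identical across runs with the same --seed is an assumption to verify against your own models.

```python
import hashlib
from pathlib import Path
from typing import Dict


def hash_tree(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def same_output(dir_a: str, dir_b: str) -> bool:
    """True if both directories hold the same files with identical bytes."""
    return hash_tree(dir_a) == hash_tree(dir_b)
```

In a pipeline, one might run datagen gen -n 10 -s 12345 -f json -o ./run1, repeat with -o ./run2, and then fail the job if same_output("./run1", "./run2") is false.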
 
Building an Encoded Binary
To create a datagen encoded binary from your models:

    # Use datagenc to transpile and build
    datagenc gen ./models --noexec -o ./output

    # Navigate to output directory and build
    cd ./output
    go build -o datagen

    # Now you have an encoded binary with embedded models
    ./datagen gen -t "service=auth"

Getting Help

    # General help
    datagen --help

    # Command-specific help
    datagen gen --help
    datagen execute --help

    # Version information
    datagen --version

Next Steps
- For development workflows, see the datagenc reference
- For a detailed comparison between binaries, see datagenc vs datagen
- For model syntax, see Data Model concepts
- For examples, see the Examples section