datagen - Encoded Binary

datagen is the encoded (production) binary: its models are already compiled and embedded in the executable. Use it when you have a pre-built executable and want fast, repeatable data generation without transpilation overhead.

When to use datagen:

  • You have a pre-built binary with embedded models
  • You’re in production or CI/CD pipelines
  • You need fast, repeatable runs without transpilation
  • You want to distribute a single executable

What it does:

  1. Uses models already embedded in the binary
  2. Filters models using tags or config files
  3. Generates data directly (no transpilation step)

Key difference: you do not pass model file paths, because the models are already inside the binary.

The gen command generates data from the embedded models and writes it to files or stdout.

Usage:

datagen gen [flags]

| Flag | Short | Description | Default | Example |
|------|-------|-------------|---------|---------|
| --count | -n | Number of records to generate (overrides metadata) | Uses metadata count | -n 1000 |
| --seed | -s | Seed for deterministic random generation | none | -s 12345 |
| --tags | -t | Filter models by tags (must match ALL key-value pairs) | "" | -t "service=auth,team=platform" |
| --output | -o | Output directory or file path | "." | -o ./data |
| --format | -f | Output format: csv, json, xml, stdout | stdout | -f csv |

Examples:

# Generate data for all embedded models
datagen gen
# Generate 1000 records for all models
datagen gen -n 1000
# Generate data for models matching specific tags
datagen gen -t "service=auth"
# Generate models matching multiple tag criteria (AND logic)
datagen gen -t "service=auth,environment=prod"
# Generate and save as CSV
datagen gen -n 1000 -f csv -o ./data
# Deterministic output with seed
datagen gen -n 10 -s 12345
# Generate for specific team
datagen gen -t "team=platform" -n 500
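
The -s example above relies on seeded pseudo-random generation: the same seed yields the same sequence of values on every run. The sketch below illustrates that idea with Python's standard library only. It is not datagen's implementation, and `generate_ids` is a hypothetical helper introduced for illustration.

```python
import random


def generate_ids(seed, n):
    """Return n pseudo-random integers; a fixed seed fixes the sequence."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return [rng.randint(1, 1_000_000) for _ in range(n)]


first = generate_ids(12345, 10)
second = generate_ids(12345, 10)

# Both runs used the same seed, so the two lists are identical.
print(first == second)
```

Passing `None` as the seed makes Python seed the generator from system entropy, which is the non-deterministic analogue of running without -s.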

Output formats:

  • csv - Comma-separated values with headers
  • json - JSON array of objects
  • xml - XML format with root element
  • stdout - Print to standard output (default)
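
To make the difference between the csv and json shapes concrete, here is a minimal sketch that serializes the same records both ways using Python's standard library. The records and field names (`id`, `name`) are invented for illustration, and datagen's exact quoting and whitespace may differ.

```python
import csv
import io
import json

# Hypothetical records standing in for generated data.
records = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]


def to_csv(rows):
    """Serialize rows as comma-separated values with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


print(to_csv(records))
print(to_json(records))
```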

The --count flag controls how many records to generate:

Without --count, each model uses the count specified in its metadata section:

User.dg (embedded in binary)
model User {
  metadata {
    count: 500 // Will generate 500 records
  }
  // ...
}

If a model specifies no metadata count, it defaults to 1 record.

With --count, the specified value overrides the metadata count of every model:

# Generate exactly 1000 records for each model, ignoring metadata
datagen gen -n 1000
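
The precedence described in this section (the --count flag first, then the model's metadata count, then a default of 1) can be sketched as a small function. This is an illustration of the documented behavior, not datagen's source. `resolve_count` and its parameters are hypothetical names.

```python
DEFAULT_COUNT = 1  # documented fallback when a model has no metadata count


def resolve_count(flag_count, metadata_count):
    """Pick the record count for one model.

    flag_count:     value of --count, or None if the flag was not given
    metadata_count: count from the model's metadata, or None if absent
    """
    if flag_count is not None:
        return flag_count  # the flag overrides metadata
    if metadata_count is not None:
        return metadata_count
    return DEFAULT_COUNT


print(resolve_count(1000, 500))   # flag given: 1000
print(resolve_count(None, 500))   # metadata only: 500
print(resolve_count(None, None))  # neither: 1
```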

Because the binary can contain multiple embedded models, use tags to select which models to generate.

Tags are defined in the model’s metadata (before compilation):

User.dg
model User {
  metadata {
    tags: {
      "service": "user-management",
      "team": "platform",
      "environment": "prod"
    }
  }
  // ...
}

Then filter by tag on the command line:

# Generate only models with specific service
datagen gen -t "service=user-management"
# Generate models matching multiple criteria (AND logic)
datagen gen -t "service=auth,environment=prod"
# Generate models for specific team
datagen gen -t "team=platform"

Important: Models must match ALL provided tag key-value pairs to be selected.
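
The AND rule can be sketched in a few lines: parse the filter string into key-value pairs, then keep a model only if every pair equals the model's own tag. This is a sketch of the documented semantics, not datagen's code. The `Session` model and all function names are hypothetical, and treating an empty filter as "select everything" is an assumption based on the flag's default of "".

```python
def parse_tag_filter(expr):
    """Parse 'k1=v1,k2=v2' into a dict; an empty string yields no constraints."""
    wanted = {}
    for pair in expr.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        wanted[key.strip()] = value.strip()
    return wanted


def matches(model_tags, wanted):
    """True only if the model carries every requested key with the same value."""
    return all(model_tags.get(key) == value for key, value in wanted.items())


def select(models, expr):
    """Return the names of models whose tags satisfy the filter expression."""
    wanted = parse_tag_filter(expr)
    return [name for name, tags in models.items() if matches(tags, wanted)]


# Hypothetical embedded models and their tags.
models = {
    "User": {"service": "user-management", "team": "platform", "environment": "prod"},
    "Session": {"service": "auth", "team": "platform", "environment": "prod"},
}

print(select(models, "team=platform"))
print(select(models, "service=auth,environment=prod"))
```

A filter such as "service=auth,team=data" selects nothing here, because no model satisfies both pairs at once.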

The execute command loads data generated from embedded models directly into database sinks such as MySQL.

Usage:

datagen execute --config <config_file> [flags]

Required:

  • --config - Path to configuration JSON file

| Flag | Short | Description | Example |
|------|-------|-------------|---------|
| --config | -c | Path to configuration JSON file | -c config.json |
| --output | -o | Output directory for logs/artifacts | -o ./logs |

The execute command requires a JSON configuration file that specifies which embedded models to use:

config.json
{
  "models": [
    {
      "model_name": "User",
      "target_sinks": ["mysql_sink"],
      "count": 1000
    },
    {
      "model_name": "Order",
      "target_sinks": ["mysql_sink"],
      "count": 500
    }
  ],
  "sinks": [
    {
      "sink_name": "mysql_sink",
      "sink_type": "mysql",
      "config": {
        "host": "localhost",
        "database": "testdb",
        "port": "3306",
        "user": "root",
        "password": "password",
        "batch_size": 1000,
        "throttle_ms": 10
      }
    }
  ]
}
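
Each model in the configuration refers to its sinks by name, so a mistyped sink name is an easy mistake to make. The following sketch is a pre-flight check you could run before calling datagen. It assumes only the field names shown in the example above (`models`, `model_name`, `target_sinks`, `sinks`, `sink_name`). `validate_config` is a hypothetical helper, and datagen may perform its own, different validation.

```python
import json


def validate_config(config):
    """Return a list of problems: target sinks that no sink entry defines."""
    problems = []
    sink_names = {sink.get("sink_name") for sink in config.get("sinks", [])}
    for model in config.get("models", []):
        name = model.get("model_name", "<unnamed>")
        for target in model.get("target_sinks", []):
            if target not in sink_names:
                problems.append(f"model {name}: unknown sink {target!r}")
    return problems


# A trimmed config in which Order points at a sink that is never defined.
raw = """
{
  "models": [
    {"model_name": "User", "target_sinks": ["mysql_sink"], "count": 1000},
    {"model_name": "Order", "target_sinks": ["pg_sink"], "count": 500}
  ],
  "sinks": [
    {"sink_name": "mysql_sink", "sink_type": "mysql", "config": {}}
  ]
}
"""

for problem in validate_config(json.loads(raw)):
    print(problem)
```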

Examples:

# Load data into database using embedded models
datagen execute -c config.json
# Load data with custom output directory for logs
datagen execute -c config.json -o ./logs
# Production deployment
datagen execute --config prod-config.json

What it does:

  1. Uses models already embedded in the binary
  2. Reads configuration file to determine which models to use
  3. Generates data according to config
  4. Loads data into specified sinks
  5. No transpilation - fast execution

When to use execute:

  • Production data loading
  • Scheduled data generation (cron jobs)
  • CI/CD pipeline integration
  • High-performance scenarios
  • Distributed deployments with consistent models
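
For scheduled or CI runs, a thin wrapper around the binary is often enough. The sketch below builds the execute command from the two flags documented above and runs it with Python's `subprocess` module. The function names are hypothetical, and it assumes datagen is on the PATH and reports failure through a non-zero exit code, which this page does not confirm.

```python
import subprocess


def build_execute_cmd(config_path, output_dir=None, binary="datagen"):
    """Build the argument list for `datagen execute`."""
    cmd = [binary, "execute", "--config", config_path]
    if output_dir is not None:
        cmd += ["--output", output_dir]
    return cmd


def run_execute(config_path, output_dir=None, binary="datagen"):
    """Run datagen execute; raises CalledProcessError on a non-zero exit code."""
    # Assumption: a failed load is signalled by a non-zero exit status.
    return subprocess.run(build_execute_cmd(config_path, output_dir, binary), check=True)


# Show the command that a scheduled job would run.
print(" ".join(build_execute_cmd("prod-config.json", "./logs")))
```

Passing the arguments as a list avoids shell quoting issues when a path contains spaces.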

To create a datagen encoded binary from your models:

# Use datagenc to transpile and build
datagenc gen ./models --noexec -o ./output
# Navigate to output directory and build
cd ./output
go build -o datagen
# Now you have an encoded binary with embedded models
./datagen gen -t "service=auth"

Getting help:

# General help
datagen --help
# Command-specific help
datagen gen --help
datagen execute --help
# Version information
datagen --version