Overview
What is datagen?
datagen is an open-source data generation toolkit that allows you to create realistic mock data using a simple, declarative Domain Specific Language (DSL).
datagen reads your model definitions and generates data from them. The models are transpiled to Go, and the transpiled code is then compiled and executed to produce data in formats such as CSV, JSON, and XML, or to write it directly into data stores like MySQL. Each model defines the structure and generation logic for your data, alongside optional metadata for configuration and filtering.
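To make the idea of one set of records feeding several output formats concrete, here is a minimal Go sketch. It is not datagen's transpiled output or its sink API: the `user` type and the `toCSV` and `toJSON` helpers are hypothetical names invented for this illustration, and only the Go standard library is used.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// user is a hypothetical record shape; the types datagen generates may differ.
type user struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// toCSV renders the records as CSV text with a header row.
func toCSV(users []user) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write([]string{"id", "name"}); err != nil {
		return "", err
	}
	for _, u := range users {
		if err := w.Write([]string{strconv.Itoa(u.ID), u.Name}); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

// toJSON renders the same records as a JSON array.
func toJSON(users []user) (string, error) {
	b, err := json.Marshal(users)
	return string(b), err
}

func main() {
	users := []user{{ID: 1, Name: "alice"}, {ID: 2, Name: "bob"}}

	csvText, err := toCSV(users)
	if err != nil {
		panic(err)
	}
	jsonText, err := toJSON(users)
	if err != nil {
		panic(err)
	}

	fmt.Print(csvText)
	fmt.Println(jsonText)
}
```

The point of the sketch is the separation: the records are produced once, and each format is only a different rendering of the same data.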
Features
- Simple DSL: Intuitive, declarative syntax for defining data models
 - High Performance: Transpiles to native Go code for maximum speed
 - Complex Generation Logic: Support for advanced data generation patterns and algorithms
 - Data Relationships: Define and maintain relationships between different data models
 - Ordered Sinking: Intelligent dependency resolution ensures base models are processed before their dependent models
 - Seeding Support: Generate consistent, reproducible data with configurable seed values
 - Multiple Output Formats: CSV, JSON, XML, stdout, and database sinks
 - Built-in Functions: 140+ data generation functions for common use cases
 - Tag-based Filtering: Logical grouping and selective generation
 - Extensible: Support for custom types and functions
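The "Ordered Sinking" feature above describes processing base models before the models that depend on them. One common way to get such an order is a topological sort. The Go sketch below uses Kahn's algorithm as an illustration of the concept only: it is not taken from datagen's source, and the `sinkOrder` function and the model names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sinkOrder returns model names ordered so that every model appears after
// all the models it depends on. deps maps a model to its base models.
// This is an illustrative sketch, not datagen's actual resolver.
func sinkOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}        // number of unresolved base models
	dependents := map[string][]string{} // base model -> models that need it

	for model, bases := range deps {
		if _, ok := indegree[model]; !ok {
			indegree[model] = 0
		}
		for _, base := range bases {
			if _, ok := indegree[base]; !ok {
				indegree[base] = 0
			}
			indegree[model]++
			dependents[base] = append(dependents[base], model)
		}
	}

	// Start with models that have no dependencies; sort for stable output.
	var ready []string
	for model, d := range indegree {
		if d == 0 {
			ready = append(ready, model)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		model := ready[0]
		ready = ready[1:]
		order = append(order, model)

		next := dependents[model]
		sort.Strings(next)
		for _, dep := range next {
			indegree[dep]--
			if indegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}

	// Any model left unprocessed is part of a dependency cycle.
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"users":       nil,
		"products":    nil,
		"orders":      {"users"},
		"order_items": {"orders", "products"},
	}
	order, err := sinkOrder(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [products users orders order_items]
}
```

In this example `orders` is only emitted once `users` has been processed, and `order_items` waits for both `orders` and `products`. A cycle such as two models depending on each other is reported as an error rather than producing a partial order.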
 
When does it fit?
datagen works well for:
- Application Testing: Generating realistic test data for any application
 - Database Seeding: Populating development and staging databases
 - Performance Testing: Creating large datasets for load testing
 - API Development: Generating test data for API endpoints
 - Data Pipeline Testing: Creating input data for ETL processes
 - Compliance Testing: Generating data that meets specific regulatory requirements
 - Prototyping: Quickly creating realistic data for demos
 
datagen is designed for reliability and consistency. Passing the same value to the --seed flag reproduces the same data on every run, which makes it well suited to automated testing and CI/CD pipelines.
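The principle behind seeded generation can be shown with a small Go sketch. It assumes nothing about how datagen wires the --seed flag internally: the `mockUser` type and the `generateUsers` and `sameUsers` helpers are hypothetical, and the example relies only on the standard `math/rand` package, where a source created from a fixed seed always yields the same sequence.

```go
package main

import (
	"fmt"
	"math/rand"
)

// mockUser is a hypothetical record type used only for this illustration.
type mockUser struct {
	ID   int
	Name string
	Age  int
}

// generateUsers builds n records from a private random source seeded with seed.
func generateUsers(seed int64, n int) []mockUser {
	rng := rand.New(rand.NewSource(seed))
	names := []string{"alice", "bob", "carol", "dave"}

	users := make([]mockUser, 0, n)
	for i := 0; i < n; i++ {
		users = append(users, mockUser{
			ID:   i + 1,
			Name: names[rng.Intn(len(names))],
			Age:  18 + rng.Intn(60),
		})
	}
	return users
}

// sameUsers reports whether two slices hold identical records in the same order.
func sameUsers(a, b []mockUser) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	first := generateUsers(42, 3)
	second := generateUsers(42, 3)
	fmt.Println(sameUsers(first, second)) // true: same seed, same records
}
```

Using a dedicated `rand.Rand` per run, rather than the shared global source, keeps the sequence independent of any other code that draws random numbers, which is what makes the output repeatable across CI runs.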