Gen4s is a powerful data generation tool designed for developers and QA engineers.
- Features
- Installation
- Running
- Building from source
- Testing
- Configuration
- Schema definition and data generators
## Features

- Data Generation: Gen4s allows users to generate up-to-date data and publish it to their systems. This is particularly useful for testing and development purposes.
- Maintain Test Data: Gen4s enables users to maintain test data in the file system or a repository, ensuring that the data is always accessible and up to date.
- Data Sharing: With Gen4s, users can easily share test data with their team, improving collaboration and efficiency.
- Support for Different Profiles: Gen4s supports different profiles such as dev, local, QA, etc., allowing users to switch between environments as needed.
- Generation Scenarios: Gen4s supports running generation scenarios, which can publish data, wait, and then publish another portion of data, simulating event-time processing.
- Load Testing: Gen4s can load test your system by publishing millions of messages, helping identify potential performance issues.
- Semi-Generation of Data: Gen4s supports semi-generation of data, where users can generate a CSV file from their database and use it as part of the data generation schema.
- Command Line Execution: Gen4s can be executed directly from the command line, providing a simple and efficient way to generate data.
- Support for Multiple Output Formats: Gen4s supports various output formats including stdout, Kafka, Avro, Protobuf, file system, and HTTP.
- Schema Definition and Data Generators: Gen4s provides a variety of data generators for different data types and structures, including static values, timestamps, numbers, strings, UUIDs, IP addresses, and more.
## Installation

To install Gen4s with Homebrew, tap the xdev-developer/tap repository and then install the formula:

```shell
brew tap xdev-developer/tap
brew install gen4s
```
Alternatively, download the latest release from the Releases page, unzip the archive, and execute `./bin/gen4s`.
## Running

```
Gen4s
Usage: gen4s [preview|run|scenario] [options]

  -c, --config <file>      Configuration file. Default ./config.conf
  -p, --profile <file>     Environment variables profile.
  -i, --input-records key=value,key1=value1
                           Key/Value pairs to override generated variable

Command: preview [options]
Preview data generation.
  --pretty                 pretty print
  -s, --samples <number>   Samples to generate, default 1

Command: run [options]
Run data generation stream.
  -s, --samples <number>   Samples to generate, default 1

Command: scenario
Run scenario

Command: runbook
Alias for scenario

  --help                   prints usage info
```

```shell
./bin/gen4s run -c ./examples/playground/config.conf
```

You can create an env vars profile for each runtime environment: dev, staging, prod, etc.
Env vars profile file format, `dev.profile`:

```
KAFKA_BOOTSTRAP_SERVERS=dev.kafka:9095
ORG_ID=12345
```

Running with a profile:

```shell
./bin/gen4s run -c ./examples/playground/config.conf -s 5 -p ./profiles/dev.profile
```

Running a scenario with a profile:

```shell
./bin/gen4s scenario -c ./examples/scenario/scenario.conf -p ./profiles/dev.profile
```

Running with input-record overrides:

```shell
./bin/gen4s run -i test-string=hello,test-int=12345 -c ./examples/playground/config.conf
```

## Building from source

Building a standalone application:
```shell
sbt 'universal:packageXzTarball'
# or
sbt 'universal:packageBin'
```

Building a docker image:

```shell
sbt 'universal:packageXzTarball'
cd app
docker build -t xdev.developer/gen4s:<version> .
```

Testing the docker image:

```shell
docker run xdev.developer/gen4s:<version> bin/gen4s preview --pretty -c examples/playground/config.conf -s 5
```

Benchmarking:

```shell
sbt clean "project benchmarks;jmh:run -i 3 -wi 3 -f3 -t1"
```

## Configuration

```hocon
input {
  schema = "<path-to>/examples/sample-schema.json"
  template = "<path-to>/examples/sample.template"
}

output {
  writer: {
    type: "std-output"
  }
  transformers: ["json-prettify"]
}
```
- `schema` - path to schema file.
- `template` - path to template file.
- `decode-new-line-as-template` - treat each line in the template file as a standalone template.
- `csv-records` - csv records input file.
- `global-variables` - list of global variables. A global variable is generated once per run.

Using `csv-records` streaming you can generate templates using info from a csv file in combination with random generators, see examples/csv-input.
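Putting those input options together, here is a sketch of an input block that mixes a CSV file with generated variables. The file names and the exact value shapes (a list of variable names for `global-variables`, a plain path for `csv-records`) are assumptions for illustration, not taken from the project:

```hocon
input {
  schema = "<path-to>/examples/csv-input/schema.json"      # generators for the non-CSV variables
  template = "<path-to>/examples/csv-input/template.json"  # template referencing CSV columns and generated vars
  csv-records = "<path-to>/examples/csv-input/input.csv"   # records are produced per CSV row
  global-variables = ["org-id"]                            # generated once per run
}
```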
Console output:

```hocon
output {
  writer: {
    type: "std-output"
  }
  transformers = ["json-prettify"]
  validators = ["json", "missing-vars"]
}
```

Kafka output:

```hocon
output {
  writer {
    type = kafka-output

    topic = "logs"
    topic = ${?KAFKA_TOPIC}

    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

    batch-size = 1000

    headers {
      key = value
    }

    decode-input-as-key-value = true
    write-tombstone-record = false

    producer-config {
      compression-type = none # snappy, gzip, lz4
      in-flight-requests = 5
      linger-ms = 15
      max-batch-size-bytes = 1024
      max-request-size-bytes = 512

      additional-properties {
        "key" = "value"
      }
    }
  }

  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}
```
- `write-tombstone-record` - write a tombstone record, default false. This tells Kafka to delete old records with the same key and keep only the most recent one in a topic partition.
- `decode-input-as-key-value` - true/false - decode the input template as a key/value JSON. The key is produced as the Kafka message key and the value as the Kafka message value:

```json
{ "key": 1, "value": { "id": 1, "timestamp": ${ts}, "event": "Logged in" } }
```
Kafka Avro output:

```hocon
output {
  writer {
    type = kafka-avro-output

    topic = "logs-avro"
    topic = ${?KAFKA_TOPIC}

    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

    batch-size = 1000

    headers {
      key = value
    }

    decode-input-as-key-value = true
    write-tombstone-record = false

    producer-config {
      compression-type = gzip
      in-flight-requests = 1
      linger-ms = 15
      max-batch-size-bytes = 1024
      max-request-size-bytes = 512
    }

    avro-config {
      schema-registry-url = "http://localhost:8081"
      schema-registry-url = ${?SCHEMA_REGISTRY_URL}
      key-schema = "/path/to/file/key.avsc"
      value-schema = "/path/to/file/value.avsc"
      auto-register-schemas = false
      registry-client-max-cache-size = 1000
    }
  }

  transformers = []
  validators = ["json", "missing-vars"]
}
```

- `key-schema` - path to key schema. Optional.
- `value-schema` - path to value schema. Optional.
- `auto-register-schemas` - register schemas in the schema registry.
- `write-tombstone-record` - write a tombstone record, default false. This tells Kafka to delete old records with the same key and keep only the most recent one in a topic partition.
How the schema resolver works:

1. Read the schema from a file.
2. When a file isn't provided, gen4s looks up the schema subject in the schema registry (`topic_name-key` or `topic_name-value`).
Kafka Protobuf output:

```hocon
output {
  writer {
    type = kafka-protobuf-output

    topic = "persons-proto"
    topic = ${?KAFKA_TOPIC}

    bootstrap-servers = "localhost:9092"
    bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

    batch-size = 1000

    headers {
      key = value
    }

    decode-input-as-key-value = true
    write-tombstone-record = false

    proto-config {
      schema-registry-url = "http://localhost:8081"
      schema-registry-url = ${?SCHEMA_REGISTRY_URL}

      value-descriptor {
        file = "./examples/kafka-protobuf/person-value.desc"
        message-type = "Person"
      }

      auto-register-schemas = true
      registry-client-max-cache-size = 1000
    }
  }

  transformers = []
  validators = ["json", "missing-vars"]
}
```

- `value-descriptor` - path to the protobuf descriptor file and the message type.
- `auto-register-schemas` - register schemas in the schema registry.
- `write-tombstone-record` - write a tombstone record, default false. This tells Kafka to delete old records with the same key and keep only the most recent one in a topic partition.
The descriptor file can be created using the `protoc` command:

```shell
protoc --include_imports --descriptor_set_out=person-value.desc person-value.proto
```

or using `scalapbc`:

```shell
scalapbc --include_imports --descriptor_set_out=person-value.desc person-value.proto
```

File system output:

```hocon
output {
  writer {
    type = fs-output
    dir = "/tmp"
    filename-pattern = "my-cool-logs-%s.txt"
  }
  transformers = ["json-prettify"]
  validators = ["json", "missing-vars"]
}
```

HTTP output:

```hocon
output {
  writer {
    type = http-output
    url = "http://example.com"
    url = ${?REQUEST_URL}
    method = POST
    headers {
      key = value
    }
    parallelism = 3
    content-type = "application/json"
    stop-on-error = true
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}
```

S3 output:

```hocon
output {
  writer {
    type = s-3-output
    bucket = "test-bucket"
    key = "key-%s.json"
    region = "us-east-1"
    endpoint = "http://localhost:4566"
    part-size-mb = 5
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}
```

The available options for configuring an S3 output are:

- `bucket` - the name of the S3 bucket where the output data will be written.
- `key` - the object key pattern. The `%s` is a placeholder that will be replaced with a unique identifier.
- `region` - the AWS region where the S3 bucket is located.
- `endpoint` - the URL of the S3 service endpoint. This can be useful for testing with local S3-compatible services like LocalStack.
- `part-size-mb` - the part size for multipart uploads to the S3 bucket, in megabytes.
- `json-minify` - transforms generated JSON to compact-printed JSON (removes all new lines and spaces).
- `json-prettify` - transforms generated JSON to pretty-printed JSON.
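As an illustration (the exact spacing produced by the prettifier is an assumption), `json-prettify` would turn a compact record such as `{"id":1,"event":"Logged in"}` into:

```json
{
  "id": 1,
  "event": "Logged in"
}
```

while `json-minify` performs the reverse transformation.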
Using a scenario you can run multiple stages, configure the delay between stages and the number of samples to generate, and override any variable:

```hocon
stages: [
  { name: "Playground", samples: 5, config-file: "./examples/playground/config.conf", delay: 5 seconds },
  { name: "CSV Input", samples: 3, config-file: "./examples/csv-input/config.conf" },
  { name: "Playground with override", samples: 3, config-file: "./examples/playground/config.conf",
    overrides {
      test-string: "overridden",
      test-int: 777
    }
  }
]
```

Input Records allow you to override template variables at runtime. This is useful when you need to:
- Test specific scenarios with known values
- Override generated data with fixed values
- Share the same configuration across different environments
You can override variables using the `-i` or `--input-records` flag:

```shell
./bin/gen4s run -i user-id=12345,timestamp=1632150400000 -c ./config.conf
```

Variables can be overridden in scenario configurations:
```hocon
stages: [
  {
    name: "Test with overrides",
    samples: 3,
    config-file: "./config.conf",
    overrides {
      user-id: "12345",
      timestamp: "1632150400000"
    }
  }
]
```

Variable values are resolved in the following order:

1. Command line overrides (`-i` flag)
2. Scenario overrides (in the scenario config)
3. Generated values (from the schema definition)
This means command line overrides take precedence over scenario overrides, which take precedence over generated values.
## Schema definition and data generators

Static value generator; this sampler can be used as a template constant:

```json
{ "variable": "id", "type": "static", "value": "id-12332221" }
```

Timestamp generator:

```json
{ "variable": "ts", "type": "timestamp", "unit": "sec" }
```

- `unit` - timestamp unit, possible values: ms, ns, micros, sec. Default value: ms.
- `shiftDays` - shift the timestamp by n or -n days. Optional.
- `shiftHours` - shift the timestamp by n or -n hours. Optional.
- `shiftMinutes` - shift the timestamp by n or -n minutes. Optional.
- `shiftSeconds` - shift the timestamp by n or -n seconds. Optional.
- `shiftMillis` - shift the timestamp by n or -n milliseconds. Optional.
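The shift options can be combined with `unit`; for example, a timestamp one day in the past, in milliseconds (the variable name here is illustrative):

```json
{ "variable": "yesterday-ts", "type": "timestamp", "unit": "ms", "shiftDays": -1 }
```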
Int generator:

```json
{ "variable": "my-int", "type": "int", "min": 10, "max": 1000 }
```

Double generator:

```json
{ "variable": "test-double", "type": "double", "min": 10.5, "max": 15.5, "scale": 6 }
```

Boolean generator:

```json
{ "variable": "test-bool", "type": "boolean" }
```

String generator:

```json
{ "variable": "test-string", "type": "string", "len": 10 }
```

Pattern generator (generates e.g. hello-abc-123):

```json
{ "variable": "test-string-pattern", "type": "pattern", "pattern": "hello-???-###" }
```

UUID generator:

```json
{ "variable": "test-uuid", "type": "uuid" }
```

GUID field generator using Ride:

```json
{ "variable": "test-guid", "type": "guid" }
```

IP generator:

```json
{ "variable": "test-ip", "type": "ip", "ipv6": false }
```

Enum generator:

```json
{ "variable": "test-enum", "type": "enum", "oneOf": ["hello", "world"] }
```

Env var generator:

```json
{ "variable": "test-var", "type": "env-var", "name": "ORG_ID" }
```

Supported env vars:
```scala
List(
  "CUSTOMER_ID",
  "USER_ID",
  "USERNAME",
  "ORG_ID",
  "EVENT_ID",
  "user.name",
  "os.name"
)
```

or any env var with the G4S_ prefix, for example G4S_QA_USERNAME.
Date generator:

```json
{ "variable": "test-date", "type": "date", "format": "MM/dd/yyyy", "shiftDays": -10 }
```

- `format` - date format.
- `shiftDays` - shift the date by n or -n days. Optional.
- `shiftHours` - shift the date by n or -n hours. Optional.
- `shiftMinutes` - shift the date by n or -n minutes. Optional.
- `shiftSeconds` - shift the date by n or -n seconds. Optional.
List generator:

```json
{ "variable": "test-array", "type": "list", "len": 3, "generator": { "variable": "_", "type": "ip" } }
```

- `len` - list size to generate.
- `generator` - element generator.
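The element generator can be any other generator from this section; for example, a hypothetical list of three random ints reusing the `int` generator (variable names are illustrative):

```json
{ "variable": "test-ints", "type": "list", "len": 3, "generator": { "variable": "_", "type": "int", "min": 0, "max": 100 } }
```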