Gen4s - data generator tool for developers and QA engineers.

Gen4s is a powerful data generation tool designed for developers and QA engineers.

Features:

  • Data Generation: Gen4s allows users to generate up-to-date data and publish it to their systems. This is particularly useful for testing and development purposes.

  • Maintain Test Data: Gen4s enables users to maintain test data in the file system or repository, ensuring that the data is always accessible and up-to-date.

  • Data Sharing: With Gen4s, users can easily share test data with their team, improving collaboration and efficiency.

  • Support for Different Profiles: Gen4s supports different profiles such as dev, local, QA, etc. This allows users to switch between different environments as needed.

  • Generation Scenarios: Gen4s supports running generation scenarios. These can be used to publish data, wait, and then publish another portion of data, simulating event time processing.

  • Load Testing: Gen4s is capable of load testing your system by publishing millions of messages. This can help identify potential performance issues.

  • Semi-Generation of Data: Gen4s supports semi-generation of data, where users can generate a CSV file from their database and use it as part of the data generation schema.

  • Command Line Execution: Gen4s can be executed directly from the command line, providing a simple and efficient way to generate data.

  • Support for Multiple Output Formats: Gen4s supports various output formats including stdout, Kafka, Avro, Protobuf, file system, and HTTP.

  • Schema Definition and Data Generators: Gen4s provides a variety of data generators for different data types and structures, including static values, timestamps, numbers, strings, UUIDs, IP addresses, and more.

Installation

Using Homebrew

To install Gen4s with Homebrew, tap the xdev-developer/tap repository first, then install the formula:

  1. Open your terminal.
  2. Tap the repository: brew tap xdev-developer/tap
  3. Install Gen4s: brew install gen4s

Manual

Download the latest release from the Releases page, unzip the archive, and run ./bin/gen4s

Running

Gen4s
Usage: gen4s [preview|run|scenario] [options]

  -c, --config <file>      Configuration file. Default ./config.conf
  -p, --profile <file>     Environment variables profile.
  -i, --input-records key=value,key1=value1
                           Key/Value pairs to override generated variable

Command: preview [options]
Preview data generation.
  --pretty                 pretty print
  -s, --samples <number>   Samples to generate, default 1

Command: run [options]
Run data generation stream.
  -s, --samples <number>   Samples to generate, default 1

Command: scenario
Run scenario

Command: runbook
Alias for scenario
  --help                   prints usage info

./bin/gen4s run -c ./examples/playground/config.conf

Running with profile

You can create an environment variable profile for each runtime environment: dev, staging, prod, etc.

Env vars profile file format

dev.profile:

KAFKA_BOOTSTRAP_SERVERS=dev.kafka:9095
ORG_ID=12345

./bin/gen4s run -c ./examples/playground/config.conf -s 5 -p ./profiles/dev.profile
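
The profile format shown above is plain KEY=VALUE lines. A minimal sketch of reading such a file, for example to sanity-check a profile before passing it to -p, could look like the following. The parse_profile helper is hypothetical and not part of Gen4s; skipping blank lines and # comments is an assumption of this sketch, not documented behavior.

```python
def parse_profile(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict (hypothetical helper, not Gen4s code)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {line!r}")
        env[key.strip()] = value.strip()
    return env


profile = parse_profile("KAFKA_BOOTSTRAP_SERVERS=dev.kafka:9095\nORG_ID=12345\n")
```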

Running scenario

./bin/gen4s scenario -c ./examples/scenario/scenario.conf -p ./profiles/dev.profile

Running with value override

./bin/gen4s run -i test-string=hello,test-int=12345 -c ./examples/playground/config.conf

Building from source

Building standalone application:

sbt 'universal:packageXzTarball'

or

sbt 'universal:packageBin'

Building docker image

sbt 'universal:packageXzTarball'
cd app
docker build -t xdev.developer/gen4s:<version> .

Test docker image

docker run xdev.developer/gen4s:<version> bin/gen4s preview --pretty -c examples/playground/config.conf -s 5

Testing

Benchmarking

sbt clean "project benchmarks;jmh:run -i 3 -wi 3 -f3 -t1"

Configuration

input {
    schema = "<path-to>/examples/sample-schema.json"
    template = "<path-to>/examples/sample.template"
}


output {
    writer: {
      type: "std-output"
    }

    transformers: ["json-prettify"]
}

Input

  • schema - path to the schema file.

  • template - path to the template file.

  • decode-new-line-as-template - treat each line in the template file as a standalone template.

  • csv-records - path to the CSV records input file.

  • global-variables - list of global variables. A global variable is generated once per run.

CSV Records streaming

With csv-records streaming, you can fill templates with values from a CSV file combined with values from random generators; see examples/csv-input.

Output

Stdout output

Console output.

output {
    writer: {
      type: "std-output"
    }

    transformers = ["json-prettify"] 
    validators = ["json", "missing-vars"]
}

Kafka output

output {
    writer {
        type = kafka-output

        topic = "logs"
        topic = ${?KAFKA_TOPIC}

        bootstrap-servers = "localhost:9092"
        bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

        batch-size = 1000
                
        headers {
            key = value
        }

        decode-input-as-key-value = true
        write-tombstone-record = false
        
        producer-config {
          compression-type = none # snappy, gzip, lz4
          in-flight-requests =  5
          linger-ms = 15
          max-batch-size-bytes = 1024
          max-request-size-bytes = 512

          additional-properties {
            "key" = "value"
          }
        }
    }
    transformers = ["json-minify"] 
    validators = ["json", "missing-vars"]
}

  • write-tombstone-record - write a tombstone record (a record with a null value); default false. On a log-compacted topic, a tombstone marks all earlier records with the same key for deletion.

  • decode-input-as-key-value: true/false - decode the input template as key/value JSON.

    The key field is produced as the Kafka message key and the value field as the Kafka message value.

    {
      "key": 1,
      "value": { "id": 1, "timestamp": ${ts}, "event": "Logged in" }
    }
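
To illustrate the key/value split described above, here is a minimal sketch that takes an already rendered template (with ${ts} replaced by a concrete number) and separates it into a message key and a message value. The split_key_value function is hypothetical, and how Gen4s actually serializes the key and value is an assumption here.

```python
import json


def split_key_value(rendered: str) -> tuple[str, str]:
    """Split a rendered key/value template into (key, value) JSON strings.

    Hypothetical helper: the serialization used by Gen4s itself may differ.
    """
    doc = json.loads(rendered)
    key = json.dumps(doc["key"])
    value = json.dumps(doc["value"], separators=(",", ":"))
    return key, value


rendered = '{"key": 1, "value": {"id": 1, "timestamp": 1632150400000, "event": "Logged in"}}'
key, value = split_key_value(rendered)
```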

Kafka AVRO output

output {
    writer {
        type = kafka-avro-output

        topic = "logs-avro"
        topic = ${?KAFKA_TOPIC}

        bootstrap-servers = "localhost:9092"
        bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

        batch-size = 1000
                
        headers {
            key = value
        }

        decode-input-as-key-value = true
        write-tombstone-record = false
        
        producer-config {
          compression-type = gzip
          in-flight-requests =  1
          linger-ms = 15
          max-batch-size-bytes = 1024
          max-request-size-bytes = 512
        }

        avro-config {
          schema-registry-url = "http://localhost:8081"
          schema-registry-url = ${?SCHEMA_REGISTRY_URL}

          key-schema = "/path/to/file/key.avsc"
          value-schema = "/path/to/file/value.avsc"
          auto-register-schemas = false
          registry-client-max-cache-size = 1000
        }
    }
    transformers = []
    validators = ["json", "missing-vars"]
}

  • key-schema - path to the key schema file. Optional.
  • value-schema - path to the value schema file. Optional.
  • auto-register-schemas - register schemas in the schema registry.
  • write-tombstone-record - write a tombstone record (a record with a null value); default false. On a log-compacted topic, a tombstone marks all earlier records with the same key for deletion.

How the schema resolver works:

  • If a schema file is provided, the schema is read from that file.
  • If no file is provided, Gen4s looks up the schema subject in the schema registry (topic_name-key or topic_name-value).
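
The resolution order above can be sketched as follows. This is an illustration only: resolve_schema and subject_name are hypothetical names, and a plain dict stands in for the schema registry client.

```python
from pathlib import Path
from typing import Mapping, Optional


def subject_name(topic: str, is_key: bool) -> str:
    """Build the subject name as <topic>-key or <topic>-value."""
    return f"{topic}-{'key' if is_key else 'value'}"


def resolve_schema(
    topic: str,
    is_key: bool,
    schema_file: Optional[str],
    registry: Mapping[str, str],  # stand-in for a real schema registry client
) -> str:
    # 1. A configured file wins.
    if schema_file is not None:
        return Path(schema_file).read_text()
    # 2. Otherwise fall back to the registry subject for this topic.
    return registry[subject_name(topic, is_key)]


registry = {"logs-avro-value": '{"type": "string"}'}
resolved = resolve_schema("logs-avro", is_key=False, schema_file=None, registry=registry)
```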

Kafka Protobuf output

output {
    writer {
        type = kafka-protobuf-output

        topic = "persons-proto"
        topic = ${?KAFKA_TOPIC}

        bootstrap-servers = "localhost:9092"
        bootstrap-servers = ${?KAFKA_BOOTSTRAP_SERVERS}

        batch-size = 1000

        headers {
            key = value
        }

        decode-input-as-key-value = true
        write-tombstone-record = false

        proto-config {
          schema-registry-url = "http://localhost:8081"
          schema-registry-url = ${?SCHEMA_REGISTRY_URL}
          
          value-descriptor {
            file = "./examples/kafka-protobuf/person-value.desc"
            message-type = "Person"
          }
          auto-register-schemas = true
          registry-client-max-cache-size = 1000
        }
    }

    transformers = []
    validators = ["json", "missing-vars"]
}

  • value-descriptor - path to the Protobuf descriptor file and the message type.
  • auto-register-schemas - register schemas in the schema registry.
  • write-tombstone-record - write a tombstone record (a record with a null value); default false. On a log-compacted topic, a tombstone marks all earlier records with the same key for deletion.

Create protobuf descriptor from proto file

A descriptor file can be created with the protoc command:

protoc --include_imports --descriptor_set_out=person-value.desc person-value.proto

or using scalapbc

scalapbc --include_imports --descriptor_set_out=person-value.desc person-value.proto

File System output

output {
    writer {
        type = fs-output
        dir = "/tmp"
        filename-pattern = "my-cool-logs-%s.txt"
    }
    transformers = ["json-prettify"]
    validators = ["json", "missing-vars"]
}

HTTP output

output {
  writer {
    type = http-output
    url = "http://example.com"
    url = ${?REQUEST_URL}

    method = POST
    headers {
        key = value
    }
    parallelism = 3
    content-type = "application/json"
    stop-on-error = true
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}

AWS S3 Output

output {
  writer {
    type = s-3-output
    bucket = "test-bucket"
    key = "key-%s.json"
    region = "us-east-1"
    endpoint = "http://localhost:4566"
    part-size-mb = 5
  }
  transformers = ["json-minify"]
  validators = ["json", "missing-vars"]
}

The available options for configuring an S3 output are:

  • bucket: This is the name of the S3 bucket where the output data will be written.
  • key: Represents the object key pattern. The %s is a placeholder that will be replaced with a unique identifier.
  • region: This is the AWS region where the S3 bucket is located.
  • endpoint: This is the URL of the S3 service endpoint. This can be useful for testing with local S3-compatible services like LocalStack.
  • part-size-mb: This is used to specify the part size for multipart uploads to the S3 bucket. The value is in megabytes.
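
As a rough illustration of the key pattern, the sketch below substitutes a unique identifier for %s. Using a random UUID is an assumption; the text above only says "unique identifier", and render_key is a hypothetical helper.

```python
import uuid


def render_key(pattern: str) -> str:
    """Replace the first %s in the pattern with a unique identifier."""
    # Assumption: a random UUID hex string serves as the unique identifier.
    return pattern.replace("%s", uuid.uuid4().hex, 1)


first = render_key("key-%s.json")
second = render_key("key-%s.json")
```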

Transformers

json-minify - transforms generated JSON into compact JSON (removes all newlines and spaces).

json-prettify - transforms generated JSON into pretty-printed JSON.

Scenario configuration

With a scenario you can run multiple stages, configure the delay between stages and the number of samples each stage generates, and override any variable.

stages: [
    { name: "Playground", samples: 5, config-file: "./examples/playground/config.conf", delay: 5 seconds},
    { name: "CSV Input",  samples: 3, config-file: "./examples/csv-input/config.conf"},
    { name: "Playground with override",  samples: 3, config-file: "./examples/playground/config.conf",
      overrides {
        test-string: "overridden",
        test-int: 777
      }
   }
]

Input Records & Variable Overrides

Input Records allow you to override template variables at runtime. This is useful when you need to:

  • Test specific scenarios with known values
  • Override generated data with fixed values
  • Share the same configuration across different environments

Command Line Override

You can override variables using the -i or --input-records flag:

./bin/gen4s run -i user-id=12345,timestamp=1632150400000 -c ./config.conf
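
The -i argument is a comma-separated list of key=value pairs. A minimal sketch of how such a string maps to overrides is shown below; parse_input_records is a hypothetical helper, and keeping every value as a string is an assumption of this sketch.

```python
def parse_input_records(arg: str) -> dict[str, str]:
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'} (values stay strings)."""
    # Split each pair on the first '=' only, so values may contain '='.
    return dict(pair.split("=", 1) for pair in arg.split(",") if pair)


overrides = parse_input_records("user-id=12345,timestamp=1632150400000")
```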

In Scenarios

Variables can be overridden in scenario configurations:

stages: [
    {
        name: "Test with overrides",
        samples: 3,
        config-file: "./config.conf",
        overrides {
            user-id: "12345",
            timestamp: "1632150400000"
        }
    }
]

Precedence Order

Variable values are resolved in the following order:

  1. Command line overrides (-i flag)
  2. Scenario overrides (in scenario config)
  3. Generated values (from schema definition)

This means command line overrides take precedence over scenario overrides, which take precedence over generated values.
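
The precedence rule can be modelled with a layered lookup. The sketch below is only an illustration of the ordering described above, not Gen4s code; resolve_variables and the sample values are hypothetical.

```python
from collections import ChainMap


def resolve_variables(
    generated: dict[str, str],
    scenario_overrides: dict[str, str],
    cli_overrides: dict[str, str],
) -> dict[str, str]:
    # ChainMap searches its mappings left to right, so the first match wins:
    # command line, then scenario, then generated values.
    return dict(ChainMap(cli_overrides, scenario_overrides, generated))


values = resolve_variables(
    generated={"user-id": "9f1c", "timestamp": "1700000000000", "event": "login"},
    scenario_overrides={"user-id": "12345", "timestamp": "1632150400000"},
    cli_overrides={"user-id": "777"},
)
```

Here user-id comes from the command line, timestamp from the scenario, and event from the generator.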

Schema definition and data generators

Static value generator

This generator can be used as a template constant (static value).

{ "variable": "id", "type": "static", "value": "id-12332221"}

Timestamp generator

{ "variable": "ts", "type": "timestamp", "unit": "sec"}

unit - timestamp unit; possible values: ms, ns, micros, sec. Default: ms.

shiftDays - shift the timestamp by n days; use a negative value to shift backward. Optional.

shiftHours - shift the timestamp by n hours; use a negative value to shift backward. Optional.

shiftMinutes - shift the timestamp by n minutes; use a negative value to shift backward. Optional.

shiftSeconds - shift the timestamp by n seconds; use a negative value to shift backward. Optional.

shiftMillis - shift the timestamp by n milliseconds; use a negative value to shift backward. Optional.
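
To make the interaction between unit and the shift options concrete, here is a minimal sketch that applies the shifts to a given instant and converts the result to the requested unit. It assumes the timestamp is an epoch timestamp; shifted_timestamp and its parameter names are hypothetical Python equivalents of the options above, not Gen4s code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS_PER_SECOND = {"sec": 1, "ms": 10**3, "micros": 10**6, "ns": 10**9}


def shifted_timestamp(
    now: datetime,
    unit: str = "ms",
    shift_days: int = 0,
    shift_hours: int = 0,
    shift_minutes: int = 0,
    shift_seconds: int = 0,
    shift_millis: int = 0,
) -> int:
    """Apply the shifts to `now` and return an epoch timestamp in `unit`."""
    shifted = now + timedelta(
        days=shift_days,
        hours=shift_hours,
        minutes=shift_minutes,
        seconds=shift_seconds,
        milliseconds=shift_millis,
    )
    delta = shifted - EPOCH
    # Integer arithmetic avoids floating-point rounding for large values.
    micros = (delta.days * 86_400 + delta.seconds) * 10**6 + delta.microseconds
    return micros * UNITS_PER_SECOND[unit] // 10**6


now = datetime.fromtimestamp(1632150400, tz=timezone.utc)
in_seconds = shifted_timestamp(now, unit="sec")
yesterday_ms = shifted_timestamp(now, shift_days=-1)
```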

Int number generator.

{ "variable": "my-int", "type": "int", "min": 10, "max": 1000 }

Double number generator.

{ "variable": "test-double", "type": "double", "min": 10.5, "max": 15.5, "scale": 6 }

Boolean generator.

{ "variable": "test-bool", "type": "boolean"}

String generator.

{ "variable": "test-string", "type": "string", "len": 10}

String pattern generator.

{ "variable": "test-string-pattern", "type": "pattern", "pattern": "hello-???-###"} // hello-abc-123
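
Judging by the sample output hello-abc-123, ? appears to stand for a letter and # for a digit. A minimal sketch under that assumption is shown below; expand_pattern is hypothetical, and the exact character classes Gen4s uses are not stated in this document.

```python
import random
import string


def expand_pattern(pattern: str, rng: random.Random) -> str:
    """Replace each '?' with a lowercase letter and each '#' with a digit."""
    out = []
    for ch in pattern:
        if ch == "?":
            out.append(rng.choice(string.ascii_lowercase))  # assumed: a-z
        elif ch == "#":
            out.append(rng.choice(string.digits))  # assumed: 0-9
        else:
            out.append(ch)  # literal characters pass through unchanged
    return "".join(out)


sample = expand_pattern("hello-???-###", random.Random(42))
```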

Java UUID field generator.

{ "variable": "test-uuid", "type": "uuid" }

GUID field generator using Ride.

{ "variable": "test-guid", "type": "guid" }

IP address generator

{ "variable": "test-ip", "type": "ip", "ipv6": false }

Enumeration generator.

{ "variable": "test-enum", "type": "enum", "oneOf": ["hello", "world"] }

Env var generator.

{ "variable": "test-var", "type": "env-var", "name": "ORG_ID" }

Supported env vars:

    List(
      "CUSTOMER_ID",
      "USER_ID",
      "USERNAME",
      "ORG_ID",
      "EVENT_ID",
      "user.name",
      "os.name"
    )

or any environment variable with the G4S_ prefix, for example G4S_QA_USERNAME.
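
The rule above amounts to an allow-list plus a prefix check. A minimal sketch, with is_supported_var as a hypothetical helper name:

```python
SUPPORTED_VARS = {
    "CUSTOMER_ID",
    "USER_ID",
    "USERNAME",
    "ORG_ID",
    "EVENT_ID",
    "user.name",
    "os.name",
}


def is_supported_var(name: str) -> bool:
    """True if the name is on the allow-list or carries the G4S_ prefix."""
    return name in SUPPORTED_VARS or name.startswith("G4S_")
```

With this check, ORG_ID and G4S_QA_USERNAME pass, while a name such as HOME does not.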

DateTime generator

{ "variable": "test-date", "type": "date", "format": "MM/dd/yyyy", "shiftDays": -10 }

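format - date format pattern.

shiftDays - shift the date by n days; use a negative value to shift backward. Optional.

shiftHours - shift the date by n hours; use a negative value to shift backward. Optional.

shiftMinutes - shift the date by n minutes; use a negative value to shift backward. Optional.

shiftSeconds - shift the date by n seconds; use a negative value to shift backward. Optional.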
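
As a worked example of the shift options, the sketch below moves a fixed date back by 10 days and formats it. It assumes the MM/dd/yyyy pattern means month/day/four-digit year, which corresponds to %m/%d/%Y in Python's strftime; shifted_date is a hypothetical helper, not Gen4s code.

```python
from datetime import datetime, timedelta


def shifted_date(
    now: datetime,
    fmt: str,
    shift_days: int = 0,
    shift_hours: int = 0,
    shift_minutes: int = 0,
    shift_seconds: int = 0,
) -> str:
    """Apply the shifts to `now` and format the result with strftime."""
    shifted = now + timedelta(
        days=shift_days,
        hours=shift_hours,
        minutes=shift_minutes,
        seconds=shift_seconds,
    )
    return shifted.strftime(fmt)


# A fixed "now" keeps the example reproducible.
example = shifted_date(datetime(2024, 3, 15, 12, 0, 0), "%m/%d/%Y", shift_days=-10)
```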

List generator.

{ "variable": "test-array", "type": "list", "len": 3, "generator": { "variable": "_", "type": "ip" } }

len - number of list elements to generate.

generator - generator used for each element.
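
Conceptually, the list generator calls the element generator len times. The sketch below illustrates this with a simple random IPv4 generator; both functions are hypothetical stand-ins, not the Gen4s implementation.

```python
import random
from typing import Callable


def random_ipv4(rng: random.Random) -> str:
    """Stand-in element generator: four random octets joined by dots."""
    return ".".join(str(rng.randint(0, 255)) for _ in range(4))


def generate_list(
    length: int, element: Callable[[random.Random], str], rng: random.Random
) -> list[str]:
    """Call the element generator `length` times and collect the results."""
    return [element(rng) for _ in range(length)]


test_array = generate_list(3, random_ipv4, random.Random(7))
```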
