diff --git a/README.md b/README.md index 6d15144c7..915a82e7c 100644 --- a/README.md +++ b/README.md @@ -41,20 +41,9 @@ Explore the following resources to get started with Dagger: * [Usecase](https://odpf.github.io/dagger/docs/usecase/overview) describes examples use cases which can be solved via Dagger. ## Running locally -Make sure you have Java8 and local kafka-2.4+ setup pre-installed on your local machine. -```sh -# Clone the repo -$ git clone https://github.com/odpf/dagger.git - -# Build the jar -$ ./gradlew clean build -# Configure env variables -$ cat dagger-core/env/local.properties +Please follow this [Dagger Quickstart Guide](https://odpf.github.io/dagger/docs/guides/quickstart) for setting up a local running Dagger consuming from Kafka. -# Run a Dagger -$ ./gradlew dagger-core:runFlink -``` **Note:** Sample configuration for running a basic dagger can be found [here](https://odpf.github.io/dagger/docs/guides/create_dagger#common-configurations). For detailed configurations, refer [here](https://odpf.github.io/dagger/docs/reference/configuration). Find more detailed steps on local setup [here](https://odpf.github.io/dagger/docs/guides/create_dagger). diff --git a/dagger-common/build.gradle b/dagger-common/build.gradle index 765a38dfb..27f55307f 100644 --- a/dagger-common/build.gradle +++ b/dagger-common/build.gradle @@ -147,6 +147,12 @@ protobuf { } } generateProtoTasks { + all().each { task -> + task.generateDescriptorSet = true + task.descriptorSetOptions.includeImports = true + task.descriptorSetOptions.includeSourceInfo = false + task.descriptorSetOptions.path = "$projectDir/src/generated-sources/descriptors/dagger-descriptors.bin" + } all()*.plugins { grpc {} } diff --git a/dagger-common/src/test/proto/sample_message.txt b/dagger-common/src/test/proto/sample_message.txt new file mode 100644 index 000000000..6139f433a --- /dev/null +++ b/dagger-common/src/test/proto/sample_message.txt @@ -0,0 +1,6 @@ +is_valid: true +order_number: "INV-53535321" +event_timestamp: { + seconds: 1663565509 + nanos: 0 +} diff --git a/dagger-core/env/local.properties b/dagger-core/env/local.properties index 8d4372bf8..840190dc7 100644 --- a/dagger-core/env/local.properties +++ b/dagger-core/env/local.properties @@ -1,9 +1,9 @@ # == Query == -FLINK_SQL_QUERY=SELECT * from data_stream +FLINK_SQL_QUERY=SELECT event_timestamp, is_valid, order_number from data_stream FLINK_WATERMARK_INTERVAL_MS=10000 FLINK_WATERMARK_DELAY_MS=0 # == Input Stream == -STREAMS=[{"SOURCE_KAFKA_TOPIC_NAMES":"test-topic","INPUT_SCHEMA_TABLE":"data_stream","INPUT_SCHEMA_PROTO_CLASS":"com.tests.TestMessage","INPUT_SCHEMA_EVENT_TIMESTAMP_FIELD_INDEX":"41","SOURCE_KAFKA_CONSUMER_CONFIG_BOOTSTRAP_SERVERS":"localhost:9092","SOURCE_KAFKA_CONSUMER_CONFIG_AUTO_COMMIT_ENABLE":"","SOURCE_KAFKA_CONSUMER_CONFIG_AUTO_OFFSET_RESET":"latest","SOURCE_KAFKA_CONSUMER_CONFIG_GROUP_ID":"dummy-consumer-group","SOURCE_KAFKA_NAME":"local-kafka-stream"}] +STREAMS=[ { "SOURCE_KAFKA_TOPIC_NAMES": "dagger-test-topic-v1", "INPUT_SCHEMA_TABLE": "data_stream", "INPUT_SCHEMA_PROTO_CLASS":"io.odpf.dagger.consumer.TestPrimitiveMessage", "INPUT_SCHEMA_EVENT_TIMESTAMP_FIELD_INDEX": "9", "SOURCE_KAFKA_CONSUMER_CONFIG_BOOTSTRAP_SERVERS": "localhost:9092", "SOURCE_KAFKA_CONSUMER_CONFIG_AUTO_COMMIT_ENABLE": "false", "SOURCE_KAFKA_CONSUMER_CONFIG_AUTO_OFFSET_RESET": "latest", "SOURCE_KAFKA_CONSUMER_CONFIG_GROUP_ID": "dagger-test-topic-cgroup-v1", "SOURCE_KAFKA_NAME": "local-kafka-stream", "SOURCE_DETAILS": [ { "SOURCE_TYPE": "UNBOUNDED", "SOURCE_NAME": "KAFKA_CONSUMER" } ] } ] # == Preprocessor == PROCESSOR_PREPROCESSOR_ENABLE=false @@ -17,7 +17,8 @@ PROCESSOR_POSTPROCESSOR_CONFIG={} SINK_TYPE=log # == Stencil == -SCHEMA_REGISTRY_STENCIL_ENABLE=false +SCHEMA_REGISTRY_STENCIL_ENABLE=true +SCHEMA_REGISTRY_STENCIL_URLS=http://127.0.0.1:8000/dagger-descriptors.bin # == Telemetry == METRIC_TELEMETRY_SHUTDOWN_PERIOD_MS=10000 diff --git a/docs/docs/guides/overview.md b/docs/docs/guides/overview.md index 3910de8f5..7c0fbc878 100644 --- a/docs/docs/guides/overview.md +++ b/docs/docs/guides/overview.md @@ -2,6 +2,10 @@ The following section describes how to manage Dagger throughout its lifecycle. +### [Dagger Quickstart](./quickstart.md) + +Get quickly up and running by setting up a local Dagger instance consuming from Kafka. + ### [Choosing a Dagger Source](./choose_source.md) Dagger requires configuring a source from where data will be streamed for processing. This section explains what are the diff --git a/docs/docs/guides/quickstart.md b/docs/docs/guides/quickstart.md new file mode 100644 index 000000000..5fd1e852b --- /dev/null +++ b/docs/docs/guides/quickstart.md @@ -0,0 +1,67 @@ +# Dagger Quickstart + +## Prerequisites + +1. **Your Java version is Java 8**: Dagger as of now works only with Java 8. Some features might not work with older or later versions. +2. Your **Kafka** version is **3.0.0** or a minor version of it +3. You have **kcat** installed: We will use kcat to push messages to Kafka from the CLI. You can follow the installation steps [here](https://github.com/edenhill/kcat). Ensure the version you install is 1.7.0 or a minor version of it. +4. You have **protobuf** installed: We will use protobuf to push messages encoded in protobuf format to Kafka topic. You can follow the installation steps for MacOS [here](https://formulae.brew.sh/formula/protobuf). For other OS, please download the corresponding release from [here](https://github.com/protocolbuffers/protobuf/releases). Please note, this quickstart has been written to work with[ 3.17.3](https://github.com/protocolbuffers/protobuf/releases/tag/v3.17.3) of protobuf. Compatibility with other versions is unknown. +5. You have **Python 2.7+** and **simple-http-server** installed: We will use Python along with simple-http-server to spin up a mock Stencil server which can serve the proto descriptors to Dagger. To install **simple-http-server**, please follow these [installation steps](https://pypi.org/project/simple-http-server/). + +## Quickstart + +1. Clone Dagger repository into your local + +```shell +git clone https://github.com/odpf/dagger.git +``` +2. Next, we will generate our proto descriptor set. Ensure you are at the top level directory(`dagger`) and then fire this command + +``` +./gradlew clean dagger-common:generateTestProto +``` + +This command will generate a descriptor set containing the proto descriptors of all the proto files present under `dagger-common/src/test/proto`. After running this, you should see a binary file called `dagger-descriptors.bin` under `dagger-common/src/generated-sources/descriptors/`. + +3. Next, we will setup a mock Stencil server to serve this proto descriptor set to Dagger. Open up a new tab in your terminal and `cd` into this directory: `dagger-common/src/generated-sources/descriptors`. Then fire this command: + +```python +python -m SimpleHTTPServer 8000 +``` + +This will spin up a mock HTTP server and serve the descriptor set we just generated in the previous step at port 8000. +The Stencil client being used in Dagger will fetch it by calling this URL. This has been already configured in `local.properties`, as we have set `SCHEMA_REGISTRY_STENCIL_ENABLE` to true and pointed `SCHEMA_REGISTRY_STENCIL_URLS` to `http://127.0.0.1:8000/dagger-descriptors.bin`. + +4. Next, we will generate and send some messages to a sample kafka topic as per some proto schema. Note that, in `local.properties` we have set `INPUT_SCHEMA_PROTO_CLASS` under `STREAMS` to use `io.odpf.dagger.consumer.TestPrimitiveMessage` proto. Hence, we will push messages which conform to this schema into the topic. For doing this, please follow these steps: + 1. `cd` into the directory `dagger-common/src/test/proto`. You should see a text file `sample_message.txt` which contains just one message. We will encode it into a binary in protobuf format. + 2. Fire this command: + ```protobuf + protoc --proto_path=./ --encode=io.odpf.dagger.consumer.TestPrimitiveMessage ./TestLogMessage.proto < ./sample_message.txt > out.bin + ``` + This will generate a binary file called `out.bin`. It contains the binary encoded message of `sample_message.txt`. + + 3. Next, we will push this encoded message to the source Kafka topic as mentioned under `SOURCE_KAFKA_TOPIC_NAMES` inside `STREAMS` inside `local.properties`. Ensure Kafka is running at `localhost:9092` and then, fire this command: + ```shell + kcat -P -b localhost:9092 -D "\n" -T -t dagger-test-topic-v1 out.bin + ``` + You can also fire this command multiple times, if you want multiple messages to be sent into the topic. Just make sure you increment the `event_timestamp` value every time inside `sample_message.txt` and then repeat the above steps. +6. `cd` into the repository root again (`dagger`) and start Dagger by running the following command: +```shell +./gradlew dagger-core:runFlink +``` + +After some initialization logs, you should see the output of the SQL query getting printed. + +## Troubleshooting + +1. **I am pushing messages to the kafka topic but not seeing any output in the logs.** + + This can happen for the following reasons: + + a. Pushed messages are not reaching the right topic: Check for any exceptions or errors when pushing messages to the Kafka topic. Ensure that the topic to which you are pushing messages is the same one for which you have configured Dagger to read from under `STREAMS` -> `SOURCE_KAFKA_TOPIC_NAMES` in `local.properties` + + b. The consumer group is not updated: Dagger might have already processed those messages. If you have made any changes to the setup, make sure you update the `STREAMS` -> `SOURCE_KAFKA_CONSUMER_CONFIG_GROUP_ID` variable in `local.properties` to some new value. + +2. **I see an exception `java.lang.RuntimeException: Unable to retrieve any partitions with KafkaTopicsDescriptor: Topic Regex Pattern`** + + This can happen if the topic configured under `STREAMS` -> `SOURCE_KAFKA_TOPIC_NAMES` in `local.properties` is new and you have not pushed any messages to it yet. Ensure that you have pushed atleast one message to the topic before you start dagger. \ No newline at end of file diff --git a/docs/sidebars.js b/docs/sidebars.js index 2d91858c8..f8d3da9c9 100644 --- a/docs/sidebars.js +++ b/docs/sidebars.js @@ -7,6 +7,7 @@ module.exports = { label: "Guides", items: [ "guides/overview", + "guides/quickstart", "guides/choose_source", "guides/create_dagger", "guides/query_examples",