Commit b218a88

vdk-gdp-execution-id: example added

I sketched a vdk-gdp-execution-id data job example to check locally using quickstart-vdk. This is a stacked PR to import fix #1961. Added the data job to the examples directory. Testing done: verified locally.

Signed-off-by: ivakoleva <iva.koleva@clearcode.bg>

1 parent 8fdde6f · commit b218a88

File tree

5 files changed: +135 −0 lines

Lines changed: 12 additions & 0 deletions

```sql
-- SQL scripts are standard SQL scripts. They are executed against the Platform OLAP database.
-- Refer to the platform documentation for more information.

-- Common uses of SQL steps are:
--   aggregating data from other tables into a new one
--   creating a table or a view that is needed for the Python steps

-- Queries in .sql files can be parametrised.
-- A valid query parameter looks like → {parameter}.
-- Parameters will be automatically replaced if a corresponding value exists in the IJobInput properties.

CREATE TABLE IF NOT EXISTS hello_world (id NVARCHAR, vdk_gdp_execution_id NVARCHAR);
```
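The `{parameter}` substitution described in the comments above can be illustrated with plain Python string formatting. This is a simplified sketch of the behaviour, not VDK's actual implementation, and the property name `target_id` is hypothetical:

```python
# Simplified illustration (not VDK's actual implementation) of how a
# {parameter} placeholder in a .sql step is replaced from job properties.
query = "SELECT * FROM hello_world WHERE id = '{target_id}'"

# Hypothetical properties, standing in for values held in IJobInput properties.
properties = {"target_id": "Hello World!"}

# Each placeholder with a matching property name is replaced by its value.
rendered = query.format(**properties)
print(rendered)
```

Placeholders without a matching property would raise a `KeyError` in this sketch; the real substitution logic lives inside VDK.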
Lines changed: 26 additions & 0 deletions

```python
# Copyright 2021-2023 VMware, Inc.
# SPDX-License-Identifier: Apache-2.0
import logging

from vdk.api.job_input import IJobInput

log = logging.getLogger(__name__)


def run(job_input: IJobInput):
    """
    A function named `run` is required in order for a Python script to be
    recognized as a Data Job Python step and executed.

    VDK provides every Python step with an object - job_input - that has methods for:

    * executing queries against the OLAP database;
    * ingesting data into a database;
    * processing data inside a database.
    See the IJobInput documentation for more details.
    """
    log.info(f"Starting job step {__name__}")

    # Write your python code inside here ... for example:
    job_input.send_object_for_ingestion(
        payload=dict(id="Hello World!"), destination_table="hello_world"
    )
```
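Conceptually, a payload preprocessor such as `vdk-gdp-execution-id` enriches each ingested payload with an execution id before it reaches the destination table. A minimal sketch of that idea (this is not the plugin's actual code; the id format merely mimics the one shown in the README output below):

```python
import time
import uuid


def add_execution_id(payload: dict, execution_id: str) -> dict:
    """Return a copy of the payload enriched with the execution-id micro-dimension."""
    enriched = dict(payload)
    enriched["vdk_gdp_execution_id"] = execution_id
    return enriched


# An id resembling "<uuid>-<unix timestamp>", as seen in the sample query output.
execution_id = f"{uuid.uuid4()}-{int(time.time())}"
row = add_execution_id({"id": "Hello World!"}, execution_id)
```

The original payload is left untouched; only the enriched copy is sent onward, which is why every ingested row ends up carrying the same execution id for a given job run.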
Lines changed: 28 additions & 0 deletions

# My shiny new job

Versatile Data Kit allows you to implement automated pull ingestion and batch data processing.

# Generative Data Packs

A GDP plugin automatically expands the data you ingest.

# Data expansion

The `vdk-gdp-execution-id` plugin used in [requirements.txt](./requirements.txt) and [config.ini](./config.ini)
automatically expands your dataset with the unique Data Job execution id.
As a result, the produced dataset can be correlated to a particular Data Job execution.

# Run the example

To run the data job locally:
```bash
vdk run gdp-execution-id-example
```

To check the expanded and ingested data:
```
% vdk sqlite-query -q "select * from hello_world"
Creating new connection against local file database located at: /var/folders/h3/9ns__d4945qcvkdm2m2vjvqh0000gq/T/vdk-sqlite.db
id            vdk_gdp_execution_id
------------  -----------------------------------------------
Hello World!  a17baca4-4780-4a60-b409-10e8b6fa90de-1682424042
```

Here `hello_world.id` is ingested in [20_python_step.py](./20_python_step.py),
and `vdk_gdp_execution_id` is added automatically for you.
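The correlation described above can be demonstrated with an in-memory SQLite database. This is a standalone sketch reusing the table definition from the SQL step and the execution id value from the sample output; it does not touch the actual vdk-sqlite.db file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same table definition as in the example's SQL step.
conn.execute(
    "CREATE TABLE IF NOT EXISTS hello_world (id NVARCHAR, vdk_gdp_execution_id NVARCHAR)"
)
# One row per ingested payload; the id value is taken from the sample output above.
conn.execute(
    "INSERT INTO hello_world VALUES (?, ?)",
    ("Hello World!", "a17baca4-4780-4a60-b409-10e8b6fa90de-1682424042"),
)

# Because every row carries the execution id, results can be filtered
# or grouped per Data Job run.
rows = conn.execute("SELECT id, vdk_gdp_execution_id FROM hello_world").fetchall()
```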
Lines changed: 64 additions & 0 deletions

```ini
; Supported format: https://docs.python.org/3/library/configparser.html#supported-ini-file-structure

; This is the only file required to deploy a Data Job.
; Read more to understand what each option means:

; Information about the owner of the Data Job
[owner]

; Team is a way to group Data Jobs that belong to the same team.
team = taurus

; Configuration related to running data jobs
[job]
; For the format see https://en.wikipedia.org/wiki/Cron
; The cron expression is evaluated in UTC time.
; If it is time for a new job run and the previous job run hasn't finished yet,
; the cron job waits until the previous execution has finished.
schedule_cron = */2 * * * *

; Who will be contacted and on what occasion
[contacts]

; Specifies the time interval (in minutes) that a job execution is allowed to be delayed
; from its scheduled time before a notification email is sent. The default is 240.
; notification_delay_period_minutes=240

; Specifies whether to enable or disable the email notifications for each data job run attempt.
; The default value is true.
; enable_attempt_notifications=true

; Specifies whether to enable or disable email notifications per data job execution and execution delays.
; The default value is true.
; enable_execution_notifications=true

; The [contacts] properties below use a semicolon-separated list of email addresses that will be notified with an email message on a given condition.
; You can also provide an email address linked to your Slack account in order to receive Slack messages.
; To generate a Slack-linked email address follow the steps here:
; https://get.slack.help/hc/en-us/articles/206819278-Send-emails-to-Slack#connect-the-email-app-to-your-workspace

; Semicolon-separated list of email addresses to be notified on job execution failure caused by a user code or user configuration error.
; For example: if the job contains an SQL script with a syntax error.
; notified_on_job_failure_user_error=example@vmware.com
notified_on_job_failure_user_error=

; Semicolon-separated list of email addresses to be notified on job execution failure caused by a platform error.
; notified_on_job_failure_platform_error=example@example.com; example2@example.com
notified_on_job_failure_platform_error=

; Semicolon-separated list of email addresses to be notified on job execution success.
notified_on_job_success=

; Semicolon-separated list of email addresses to be notified of the job deployment outcome.
; Notice that if this file is malformed (the file structure is not as per https://docs.python.org/3/library/configparser.html#supported-ini-file-structure),
; then an email notification will NOT be sent to the recipients specified here.
notified_on_job_deploy=

[vdk]
; Key-value pairs of any configuration options that can be passed to vdk.
; For the possible options in your vdk installation, execute the command: vdk config-help
db_default_type=SQLITE
ingest_method_default=SQLITE
ingest_payload_preprocess_sequence=vdk-gdp-execution-id
; The name of the micro-dimension that is added to each payload sent for ingestion.
; gdp_execution_id_micro_dimension_name=vdk_gdp_execution_id
```
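Since config.ini follows the configparser INI format linked in its first line, the semicolon-separated contact lists can be read with Python's standard library. A sketch under that assumption; the addresses are placeholders:

```python
import configparser

# A minimal [contacts] fragment in the same format as the job's config.ini.
cfg = configparser.ConfigParser()
cfg.read_string(
    """
[contacts]
notified_on_job_failure_user_error = first@example.com; second@example.com
"""
)

# Split the semicolon-separated list and drop surrounding whitespace.
raw = cfg["contacts"]["notified_on_job_failure_user_error"]
emails = [address.strip() for address in raw.split(";") if address.strip()]
```

An empty value (as shipped in this example's config.ini) simply yields an empty list, i.e. nobody is notified.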
Lines changed: 5 additions & 0 deletions

```text
# Python jobs can specify extra library dependencies in a requirements.txt file.
# See https://pip.readthedocs.io/en/stable/user_guide/#requirements-files
# The file is optional and can be deleted if no extra library dependencies are necessary.

vdk-gdp-execution-id
```
