This repository was archived by the owner on Feb 16, 2024. It is now read-only.
Closed
25 commits
3de54a4
Implement demo command
sbernauer Aug 9, 2022
60b4f43
Add demo trino-taxi-data
sbernauer Aug 9, 2022
09144be
Change to Github links instead of file
sbernauer Aug 9, 2022
6644eb6
Improve demo description
sbernauer Aug 9, 2022
b39f6ab
Increase backoffLimit for Jobs
sbernauer Aug 10, 2022
8ca053c
docs round 1 (quickstart missing)
sbernauer Aug 10, 2022
7f1c466
Fix missing text in docs
sbernauer Aug 10, 2022
da75ab4
docs
sbernauer Aug 10, 2022
6cbecbc
docs: Improve installation command
sbernauer Aug 10, 2022
732b4b4
Fix typo in docs
sbernauer Aug 10, 2022
54e72a2
typo
sbernauer Aug 10, 2022
3547d95
Writing docs all day long
sbernauer Aug 10, 2022
8dfb6e4
typo
sbernauer Aug 10, 2022
882b357
Update docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
sbernauer Aug 11, 2022
99c57cf
Update docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
sbernauer Aug 11, 2022
aed2771
Update docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
sbernauer Aug 11, 2022
59338f4
Update docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
sbernauer Aug 11, 2022
d676f0d
Move demos documentation into separate attribute
sbernauer Aug 11, 2022
6ad5510
Merge branch 'demos' of github.com:stackabletech/stackablectl into demos
sbernauer Aug 11, 2022
63c2819
doc feeback
sbernauer Aug 11, 2022
5d2637c
Update description of demo
sbernauer Aug 11, 2022
b974ecf
fix nav link
sbernauer Aug 11, 2022
48b0922
preferred -> recommended
sbernauer Aug 11, 2022
15f8240
Add missing Documentation to demo descibe docs
sbernauer Aug 11, 2022
14a5117
Switch links to main branch
sbernauer Aug 11, 2022
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## [Unreleased]

### Added

- Support demos, which are end-to-end demonstrations of the usage of the Stackable Data Platform ([#66](https://github.com/stackabletech/stackablectl/pull/66))

## [0.3.0] - 2022-08-09

### Added
2 changes: 1 addition & 1 deletion Cargo.lock

16 changes: 16 additions & 0 deletions demos/demos-v1.yaml
@@ -0,0 +1,16 @@
---
demos:
  trino-taxi-data:
    description: Demo loading 2.5 years of New York taxi data into S3 bucket, creating a Trino table and a Superset dashboard
    documentation: https://docs.stackable.tech/stackablectl/stable/demos/trino-taxi-data.html
    stackableStack: trino-superset-s3
    labels:
      - trino
      - superset
      - minio
      - s3
      - ny-taxi-data
    manifests:
      - plainYaml: https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/trino-taxi-data/load-test-data.yaml
      - plainYaml: https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/trino-taxi-data/create-table-in-trino.yaml
      - plainYaml: https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/trino-taxi-data/setup-superset.yaml
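Each manifest is fetched from a plain URL, so a link that returns GitHub's HTML page (or a 404) instead of the raw YAML only surfaces at install time. The following is a minimal Python sketch of a pre-merge check over the parsed demos document. It is not stackablectl code, and both the set of keys treated as required and the "raw content" URL rule are assumptions made for this illustration:

```python
from urllib.parse import urlparse

RAW = "https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/trino-taxi-data"

# The trino-taxi-data entry, as a YAML parser would return it.
DEMOS = {
    "demos": {
        "trino-taxi-data": {
            "description": "Demo loading 2.5 years of New York taxi data into S3 bucket, "
            "creating a Trino table and a Superset dashboard",
            "documentation": "https://docs.stackable.tech/stackablectl/stable/demos/trino-taxi-data.html",
            "stackableStack": "trino-superset-s3",
            "labels": ["trino", "superset", "minio", "s3", "ny-taxi-data"],
            "manifests": [
                {"plainYaml": f"{RAW}/load-test-data.yaml"},
                {"plainYaml": f"{RAW}/create-table-in-trino.yaml"},
                {"plainYaml": f"{RAW}/setup-superset.yaml"},
            ],
        }
    }
}

# Assumption of this sketch: which keys every demo must define.
REQUIRED_KEYS = ("description", "stackableStack", "manifests")


def validate_demos(doc):
    """Return human-readable problems found in a demos document (empty = OK)."""
    problems = []
    for name, demo in doc.get("demos", {}).items():
        for key in REQUIRED_KEYS:
            if key not in demo:
                problems.append(f"{name}: missing '{key}'")
        for index, manifest in enumerate(demo.get("manifests", [])):
            url = manifest.get("plainYaml")
            if url is None:
                problems.append(f"{name}: manifest {index} has no 'plainYaml' URL")
                continue
            parsed = urlparse(url)
            if parsed.scheme != "https":
                problems.append(f"{name}: manifest {index} is not an https URL")
            # github.com serves HTML pages; raw file content comes from
            # raw.githubusercontent.com or a github.com/.../raw/<ref>/... path.
            elif parsed.netloc == "github.com" and "/raw/" not in parsed.path:
                problems.append(f"{name}: manifest {index} does not point at raw file content")
    return problems
```

Returning a list instead of raising on the first problem lets a CI job report every broken entry in one run.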
105 changes: 105 additions & 0 deletions demos/trino-taxi-data/create-table-in-trino.yaml
@@ -0,0 +1,105 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: create-ny-taxi-data-table-in-trino
spec:
  template:
    spec:
      containers:
        - name: create-ny-taxi-data-table-in-trino
          image: python:3.10-slim
          command: ["bash", "-c", "pip install trino==0.314.0 && python /tmp/script/script.py"]
          volumeMounts:
            - name: script
              mountPath: /tmp/script
      restartPolicy: OnFailure
      volumes:
        - name: script
          configMap:
            name: create-ny-taxi-data-table-in-trino-script
  backoffLimit: 50 # It can take some time until Trino is ready
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: create-ny-taxi-data-table-in-trino-script
data:
  script.py: |
    import sys
    import trino

    # Silence warnings (e.g. about the unverified TLS connection) unless the
    # user explicitly configured warning options.
    if not sys.warnoptions:
        import warnings
        warnings.simplefilter("ignore")

    def get_connection():
        connection = trino.dbapi.connect(
            host="trino-coordinator",
            port=8443,
            user="demo",
            http_scheme='https',
            auth=trino.auth.BasicAuthentication("demo", "demo"),
        )
        # Demo-only: skip TLS certificate verification
        connection._http_session.verify = False
        return connection

    def run_query(connection, query):
        print(f"[DEBUG] Executing query {query}")
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()

    connection = get_connection()

    assert run_query(connection, "CREATE SCHEMA IF NOT EXISTS hive.demo WITH (location = 's3a://demo/')")[0][0] is True
    assert run_query(connection, """
    CREATE TABLE IF NOT EXISTS hive.demo.ny_taxi_data_raw (
        VendorID BIGINT,
        tpep_pickup_datetime TIMESTAMP,
        tpep_dropoff_datetime TIMESTAMP,
        passenger_count DOUBLE,
        trip_distance DOUBLE,
        payment_type BIGINT,
        Fare_amount DOUBLE,
        Tip_amount DOUBLE,
        Total_amount DOUBLE
    ) WITH (
        external_location = 's3a://demo/ny-taxi-data/raw/',
        format = 'parquet'
    )
    """)[0][0] is True

    loaded_rows = run_query(connection, "SELECT COUNT(*) FROM hive.demo.ny_taxi_data_raw")[0][0]
    print(f"Loaded {loaded_rows} rows")
    assert loaded_rows > 0

    print("Analyzing table ny_taxi_data_raw")
    analyze_rows = run_query(connection, """ANALYZE hive.demo.ny_taxi_data_raw""")[0][0]
    assert analyze_rows == loaded_rows
    stats = run_query(connection, """show stats for hive.demo.ny_taxi_data_raw""")
    print("Produced the following stats:")
    print(*stats, sep="\n")

    assert run_query(connection, """
    create or replace view hive.demo.ny_taxi_data as
    select
        vendorid,
        tpep_pickup_datetime,
        tpep_dropoff_datetime,
        date_diff('minute', tpep_pickup_datetime, tpep_dropoff_datetime) as duration_min,
        passenger_count,
        trip_distance,
        case payment_type
            when 1 then 'Credit card'
            when 2 then 'Cash'
            when 3 then 'No charge'
            when 4 then 'Dispute'
            when 6 then 'Voided trip'
            else 'Unknown'
        end as payment_type,
        fare_amount,
        tip_amount,
        total_amount
    from hive.demo.ny_taxi_data_raw
    where tpep_pickup_datetime >= from_iso8601_timestamp('2019-12-01T00:00:00')
    and tpep_pickup_datetime <= from_iso8601_timestamp('2022-05-31T00:00:00')
    """)[0][0] is True

    rows_in_view = run_query(connection, "SELECT COUNT(*) FROM hive.demo.ny_taxi_data")[0][0]
    print(f"{rows_in_view} rows in view")
    assert rows_in_view > 0
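The `ny_taxi_data` view decodes the numeric `payment_type` codes with a CASE expression. The following Python sketch mirrors that mapping. It is handy for checking dashboard values or for regenerating the SQL fragment from one place. The helper names are hypothetical, and code 6 uses the label "Voided trip" from the NYC TLC yellow-taxi data dictionary:

```python
# Codes decoded by the view's CASE expression; anything else maps to 'Unknown'
# (this includes code 5, which the TLC dictionary also calls "Unknown").
PAYMENT_TYPES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    6: "Voided trip",
}


def payment_type_label(code):
    """Same fallback as the SQL ELSE branch, including NULL/None."""
    return PAYMENT_TYPES.get(code, "Unknown")


def payment_type_case_sql(column="payment_type"):
    """Render the mapping as a Trino CASE expression (single line)."""
    whens = " ".join(
        f"when {code} then '{label}'" for code, label in sorted(PAYMENT_TYPES.items())
    )
    return f"case {column} {whens} else 'Unknown' end"
```

Generating the CASE expression from the dictionary keeps the SQL and any client-side checks from drifting apart. The labels contain no single quotes, so no SQL escaping is needed here.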
13 changes: 13 additions & 0 deletions demos/trino-taxi-data/load-test-data.yaml
@@ -0,0 +1,13 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-ny-taxi-data
spec:
  template:
    spec:
      containers:
        - name: load-ny-taxi-data
          image: "bitnami/minio:2022-debian-10"
          # The MinIO alias is set once before the loop instead of once per month
          command: ["bash", "-c", "cd /tmp && mc --insecure alias set minio http://minio-trino:9000/ demo demodemo && for month in 2020-01 2020-02 2020-03 2020-04 2020-05 2020-06 2020-07 2020-08 2020-09 2020-10 2020-11 2020-12 2021-01 2021-02 2021-03 2021-04 2021-05 2021-06 2021-07 2021-08 2021-09 2021-10 2021-11 2021-12 2022-01 2022-02 2022-03 2022-04; do curl -O https://repo.stackable.tech/repository/misc/ny-taxi-data/yellow_tripdata_$month.parquet && mc cp yellow_tripdata_$month.parquet minio/demo/ny-taxi-data/raw/; done"]
      restartPolicy: OnFailure
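The load Job spells out all 28 months from 2020-01 to 2022-04 by hand. The following sketch derives the same month list and the resulting download/upload pairs, which can be used to check that the hard-coded list has no gaps. `months` and `download_plan` are hypothetical helpers. The URL pattern and MinIO target are taken from the Job above:

```python
BASE_URL = "https://repo.stackable.tech/repository/misc/ny-taxi-data"
TARGET = "minio/demo/ny-taxi-data/raw/"


def months(start, end):
    """Yield 'YYYY-MM' strings from start to end, both inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def download_plan(start="2020-01", end="2022-04"):
    """(source URL, mc target) pairs processed by the Job's loop, in order."""
    return [
        (f"{BASE_URL}/yellow_tripdata_{m}.parquet", TARGET)
        for m in months(start, end)
    ]
```

For example, `" ".join(months("2020-01", "2022-04"))` reproduces the month list in the `command` string.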
85 changes: 85 additions & 0 deletions demos/trino-taxi-data/setup-superset.yaml
@@ -0,0 +1,85 @@
---
apiVersion: batch/v1
kind: Job
metadata:
  name: setup-superset
spec:
  template:
    spec:
      containers:
        - name: setup-superset
          image: python:3.10-slim
          command: ["bash", "-c", "apt update && apt install -y curl && curl --fail -o superset-assets.zip https://raw.githubusercontent.com/stackabletech/stackablectl/main/demos/trino-taxi-data/superset-assets.zip && pip install requests==2.22.0 && python /tmp/script/script.py"]
          volumeMounts:
            - name: script
              mountPath: /tmp/script
      restartPolicy: OnFailure
      volumes:
        - name: script
          configMap:
            name: setup-superset-script
  backoffLimit: 50 # It can take some time until Superset is ready
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: setup-superset-script
data:
  script.py: |
    import logging
    import requests

    base_url = "http://superset-external:8088"
    username = "admin"
    password = "admin"

    logging.basicConfig(level=logging.INFO)
    logging.info("Starting setup of Superset")

    logging.info("Getting access token from /api/v1/security/login")
    session = requests.session()
    access_token = session.post(f"{base_url}/api/v1/security/login", json={"username": username, "password": password, "provider": "db", "refresh": True}).json()['access_token']

    logging.info("Getting csrf token from /api/v1/security/csrf_token")
    csrf_token = session.get(f"{base_url}/api/v1/security/csrf_token", headers={"Authorization": f"Bearer {access_token}"}).json()["result"]

    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}",
        "X-CSRFToken": csrf_token,
    }

    # To retrieve all of the assets (datasources, datasets, charts and dashboards) run the following commands
    # logging.info("Exporting all assets")
    # result = session.get(f"{base_url}/api/v1/assets/export", headers=headers)
    # assert result.status_code == 200
    # with open("superset-assets.zip", "wb") as f:
    #     f.write(result.content)


    #########################
    # IMPORTANT
    #########################
    # The exported zip file had to be modified, otherwise we get:
    # <Response [422]>
    # {"errors": [{"message": "Error importing assets", "error_type": "GENERIC_COMMAND_ERROR", "level": "warning", "extra": {"databases/Trino.yaml": {"extra": {"disable_data_preview": ["Unknown field."]}}, "issue_codes": [{"code": 1010, "message": "Issue 1010 - Superset encountered an error while running a command."}]}}]}
    #
    # The file databases/Trino.yaml was modified and the attribute "extra.disable_data_preview" was removed
    #########################
    logging.info("Importing all assets")
    # The password for the Trino database connection is not part of the export and has to be supplied on import
    data = {
        "passwords": '{"databases/Trino.yaml": "demo"}'
    }
    with open("superset-assets.zip", "rb") as bundle:
        files = {
            "bundle": ("superset-assets.zip", bundle),
        }
        result = session.post(f"{base_url}/api/v1/assets/import", headers=headers, files=files, data=data)
    print(result)
    print(result.text)
    assert result.status_code == 200

    logging.info("Finished setup of Superset")
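Both setup Jobs wait for Trino or Superset to come up by failing and being retried by Kubernetes, up to `backoffLimit: 50` times. Another option is to retry inside the script itself. This avoids pod restarts and keeps all attempts in one log. The following is a sketch of such a helper. `wait_until_ready` is hypothetical and not part of the demo. The default of 50 attempts simply mirrors `backoffLimit`, and the 10-second delay is an arbitrary assumption:

```python
import time


def wait_until_ready(check, attempts=50, delay_seconds=10.0, sleep=time.sleep):
    """Call check() until it succeeds, sleeping between failed attempts.

    Returns whatever check() returns. If every attempt raises, the last
    exception is re-raised so the Job still fails visibly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except Exception:  # e.g. connection refused while the service starts
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

In `script.py`, the first request that needs the service, such as the login call, could be wrapped in a function and passed as `check`. The `sleep` parameter only exists so that the helper can be tested without real delays.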
Binary file added demos/trino-taxi-data/superset-assets.zip
Binary file not shown.
2 changes: 2 additions & 0 deletions docs/modules/ROOT/nav.adoc
@@ -6,5 +6,7 @@
** xref:commands/release.adoc[]
** xref:commands/services.adoc[]
** xref:commands/stack.adoc[]
* Demos
** xref:demos/trino-taxi-data.adoc[]
* xref:customization.adoc[]
* xref:troubleshooting.adoc[]
129 changes: 128 additions & 1 deletion docs/modules/ROOT/pages/commands/demo.adoc
@@ -1,3 +1,130 @@
= Demo

Not implemented yet
A demo is an end-to-end demonstration of the usage of the Stackable data platform.
It is tied to a specific stack of the Stackable data platform, which provides the products the demo requires.

== Browse available demos
To list the available demos, run the following command:

[source,console]
----
$ stackablectl demo list
DEMO STACKABLE STACK DESCRIPTION
trino-taxi-data trino-superset-s3 Demo loading 2.5 years of New York taxi data into S3 bucket, creating a Trino table and a Superset dashboard
----

Detailed information about a demo can be retrieved with the `describe` command:

[source,console]
----
$ stackablectl demo describe trino-taxi-data
Demo: trino-taxi-data
Description: Demo loading 2.5 years of New York taxi data into S3 bucket, creating a Trino table and a Superset dashboard
Documentation: https://docs.stackable.tech/stackablectl/stable/demos/trino-taxi-data.html
Stackable stack: trino-superset-s3
Labels: trino, superset, minio, s3, ny-taxi-data
----
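Each line of the `describe` output corresponds to a field of the demo's entry in `demos-v1.yaml`.
The following is a small Python sketch of that mapping.
`describe_demo` is a hypothetical helper, and its column alignment is illustrative only; `stackablectl` renders this output itself.

```python
DEMO = {
    "description": "Demo loading 2.5 years of New York taxi data into S3 bucket, "
    "creating a Trino table and a Superset dashboard",
    "documentation": "https://docs.stackable.tech/stackablectl/stable/demos/trino-taxi-data.html",
    "stackableStack": "trino-superset-s3",
    "labels": ["trino", "superset", "minio", "s3", "ny-taxi-data"],
}


def describe_demo(name, demo):
    """Render a demo entry as 'Key: value' lines, keys padded to one width."""
    rows = [
        ("Demo", name),
        ("Description", demo["description"]),
        ("Documentation", demo.get("documentation", "")),
        ("Stackable stack", demo["stackableStack"]),
        ("Labels", ", ".join(demo.get("labels", []))),
    ]
    width = max(len(key) for key, _ in rows) + 1  # +1 for the colon
    return "\n".join(f"{key + ':':<{width}} {value}" for key, value in rows)
```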

Future versions of `stackablectl` will also allow searching for demos by label.
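Label-based search is not available yet.
The following is a hypothetical Python helper that shows one way demos could be matched against the `labels` list from `demos-v1.yaml`, requiring every requested label to be present.
It illustrates the idea only and does not describe how `stackablectl` will implement it.

```python
DEMOS = {
    "trino-taxi-data": {
        "stackableStack": "trino-superset-s3",
        "labels": ["trino", "superset", "minio", "s3", "ny-taxi-data"],
    },
}


def demos_with_labels(demos, *wanted):
    """Names of demos carrying every requested label, sorted alphabetically.

    With no labels requested, every demo matches.
    """
    wanted_set = set(wanted)
    return sorted(
        name
        for name, demo in demos.items()
        if wanted_set <= set(demo.get("labels", []))
    )
```

Matching on all labels (a subset test) narrows the result as more labels are given. Matching on any label would be the alternative design.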

== Install demo
=== Using existing Kubernetes cluster
If you want to use an existing Kubernetes cluster, make sure your https://kubernetes.io/docs/tasks/tools/#kubectl[`kubectl`] Kubernetes client is configured to interact with it.
Then run the following command:

[source,console]
----
$ stackablectl demo install trino-taxi-data
[INFO ] Installing demo trino-taxi-data
[INFO ] Installing stack trino-superset-s3
[INFO ] Installing release 22.06
[INFO ] Installing airflow operator in version 0.4.0
[INFO ] Installing commons operator in version 0.2.0
[INFO ] Installing druid operator in version 0.6.0
[INFO ] Installing hbase operator in version 0.3.0
[INFO ] Installing hdfs operator in version 0.4.0
[INFO ] Installing hive operator in version 0.6.0
[INFO ] Installing kafka operator in version 0.6.0
[INFO ] Installing nifi operator in version 0.6.0
[INFO ] Installing opa operator in version 0.9.0
[INFO ] Installing secret operator in version 0.5.0
[INFO ] Installing spark-k8s operator in version 0.3.0
[INFO ] Installing superset operator in version 0.5.0
[INFO ] Installing trino operator in version 0.4.0
[INFO ] Installing zookeeper operator in version 0.10.0
[INFO ] Installing components of stack trino-superset-s3
[INFO ] Installed stack trino-superset-s3
[INFO ] Installing components of demo trino-taxi-data
[INFO ] Installed demo trino-taxi-data. Use "stackablectl services list" to list the installed services
----

=== Using local kind cluster
If you don't have a Kubernetes cluster available, `stackablectl` can spin up a https://kind.sigs.k8s.io/[kind] Kubernetes cluster for you.
Make sure you have `kind` installed and run the following command:

[source,console]
----
$ stackablectl demo install trino-taxi-data --kind-cluster
[INFO ] Creating kind cluster stackable-data-platform
Creating cluster "stackable-data-platform" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦 📦 📦 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
✓ Joining worker nodes 🚜
Set kubectl context to "kind-stackable-data-platform"
You can now use your cluster with:

kubectl cluster-info --context kind-stackable-data-platform

Have a nice day! 👋
[INFO ] Installing demo trino-taxi-data
[INFO ] Installing stack trino-superset-s3
[INFO ] Installing release 22.06
[INFO ] Installing airflow operator in version 0.4.0
[INFO ] Installing commons operator in version 0.2.0
[INFO ] Installing druid operator in version 0.6.0
[INFO ] Installing hbase operator in version 0.3.0
[INFO ] Installing hdfs operator in version 0.4.0
[INFO ] Installing hive operator in version 0.6.0
[INFO ] Installing kafka operator in version 0.6.0
[INFO ] Installing nifi operator in version 0.6.0
[INFO ] Installing opa operator in version 0.9.0
[INFO ] Installing secret operator in version 0.5.0
[INFO ] Installing spark-k8s operator in version 0.3.0
[INFO ] Installing superset operator in version 0.5.0
[INFO ] Installing trino operator in version 0.4.0
[INFO ] Installing zookeeper operator in version 0.10.0
[INFO ] Installing components of stack trino-superset-s3
[INFO ] Installed stack trino-superset-s3
[INFO ] Installing components of demo trino-taxi-data
[INFO ] Installed demo trino-taxi-data. Use "stackablectl services list" to list the installed services
----

=== List deployed services
After installing your demo, you can use the xref:commands/services.adoc[] command to list the installed services as follows:

[source,console]
----
$ stackablectl services list --all-namespaces
PRODUCT   NAME         NAMESPACE  ENDPOINTS                                      EXTRA INFOS

hive      hive         default    hive                 172.18.0.4:32658
                                  metrics              172.18.0.4:30745

opa       opa          default    http                 http://172.18.0.2:31324

superset  superset     default    external-superset    http://172.18.0.2:32716   Admin user: admin, password: admin

trino     trino        default    coordinator-http     http://172.18.0.5:32128
                                  coordinator-metrics  172.18.0.5:31199
                                  coordinator-https    https://172.18.0.5:32721

minio     minio-trino  default    http                 http://172.18.0.4:31026   Third party service
                                  console-http         http://172.18.0.4:30354   Admin user: root, password: rootroot
----

== Uninstall demo
Uninstalling a demo is currently not supported.