diff --git a/docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc b/docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc
index 5c90d526..bc6fc5a7 100644
--- a/docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc
+++ b/docs/modules/ROOT/pages/demos/airflow-scheduled-job.adoc
@@ -1,12 +1,5 @@
 = airflow-scheduled-job
 
-[NOTE]
-====
-This guide assumes that you already have the demo `airflow-scheduled-job` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install airflow-scheduled-job`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-airflow-scheduled-job/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 2.5 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 9GiB memory
+* 24GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install airflow-scheduled-job`.
+
 == List deployed Stackable services
 
 To list the installed Stackable services run the following command:
diff --git a/docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc b/docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc
index 27efb1fd..c2390cc5 100644
--- a/docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc
+++ b/docs/modules/ROOT/pages/demos/data-lakehouse-iceberg-trino-spark.adoc
@@ -1,36 +1,20 @@
 = data-lakehouse-iceberg-trino-spark
 
-[WARNING]
+[IMPORTANT]
 ====
 This demo shows a data workload with real world data volumes and uses significant amount of resources to ensure acceptable response times.
 It will most likely not run on your workstation.
 
 There is also the smaller xref:demos/trino-iceberg.adoc[] demo focusing on the abilities a lakehouse using Apache Iceberg offers.
 The `trino-iceberg` demo has no streaming data part and can be executed on a local workstation.
-
-The demo was developed and tested on a kubernetes cluster with 10 nodes (4 cores (8 threads), 20GB RAM and 30GB HDD).
-Instance types that loosely correspond to this on the Hyperscalers are:
-
-- *Google*: `e2-standard-8`
-- *Azure*: `Standard_D4_v2`
-- *AWS*: `m5.2xlarge`
-
-In addition to these nodes the operators will request multiple persistent volumes with a total capacity of about 1TB.
 ====
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `data-lakehouse-iceberg-trino-spark` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install data-lakehouse-iceberg-trino-spark`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -53,6 +37,24 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-data-lakehouse-iceberg-trino-spark/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+The demo was developed and tested on a Kubernetes cluster with 10 nodes (4 cores (8 threads), 20GB RAM and 30GB HDD).
+Instance types that loosely correspond to this on the hyperscalers are:
+
+- *Google*: `e2-standard-8`
+- *Azure*: `Standard_D4_v2`
+- *AWS*: `m5.2xlarge`
+
+In addition to these nodes, the operators will request multiple persistent volumes with a total capacity of about 1TB.
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install data-lakehouse-iceberg-trino-spark`.
+
 == Apache Iceberg
 
 As Apache Iceberg states on their https://iceberg.apache.org/docs/latest/[website]:
diff --git a/docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc b/docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc
index ae42eb72..4a300ab8 100644
--- a/docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc
+++ b/docs/modules/ROOT/pages/demos/hbase-hdfs-load-cycling-data.adoc
@@ -1,12 +1,5 @@
 = hbase-hdfs-cycling-data
 
-[NOTE]
-====
-This guide assumes that you already have the demo `hbase-hdfs-load-cycling-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install hbase-hdfs-load-cycling-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-hbase-hdfs-load-cycling-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 3 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 6GiB memory
+* 16GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install hbase-hdfs-load-cycling-data`.
+
 == List deployed Stackable services
 
 To list the installed Stackable services run the following command: `stackablectl services list --all-namespaces`
diff --git a/docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc b/docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
index 1aef8d27..c157f409 100644
--- a/docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
+++ b/docs/modules/ROOT/pages/demos/jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc
@@ -2,30 +2,10 @@ This demo showcases the integration between https://jupyter.org[Jupyter] and https://hadoop.apache.org/[Apache Hadoop] deployed on the Stackable Data Platform (SDP) Kubernetes cluster.
 https://jupyterlab.readthedocs.io/en/stable/[JupyterLab] is deployed using the https://github.com/jupyterhub/zero-to-jupyterhub-k8s[pyspark-notebook stack] provided by the Jupyter community.
 The SDP makes this integration easy by publishing a discovery `ConfigMap` for the HDFS cluster.
 This `ConfigMap` is then mounted in all `Pods`` running https://spark.apache.org/docs/latest/api/python/getting_started/index.html[PySpark] notebooks so that these have access to HDFS data.
 For this demo, the HDFS cluster is provisioned with a small sample of the https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page[NYC taxi trip dataset] which is analyzed with a notebook that is provisioned automatically in the JupyterLab interface .
-This demo can be installed on most cloud managed Kubernetes clusters as well as on premise or on a reasonably provisioned laptop. Install this demo on an existing Kubernetes cluster:
-
-[source,bash]
-----
-stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data
-----
-
-[WARNING]
-====
-This demo should not be run alongside other demos and requires a minimum of 32 GB RAM and 8 CPUs.
-====
-
-[NOTE]
-====
-Some container images used by this demo are quite large and some steps may take several minutes to complete. If you install this demo locally, on a developer laptop for example, this can lead to timeouts during the installation. If this happens, it's safe to rerun the `stackablectl` command from above.
-
-For more details on how to install Stackable demos see the xref:commands/demo.adoc#_install_demo[documentation].
-====
-
 == Aim / Context
 
 This demo does not use the Stackable spark-k8s-operator but rather delegates the creation of executor pods to JupyterHub. The intention is to demonstrate how to interact with SDP components when designing and testing Spark jobs: the resulting script and Spark job definition can then be transferred for use with a Stackable `SparkApplication` resource. When logging in to JupyterHub (described below), a pod will be created with the username as a suffix e.g. `jupyter-admin`. This runs a container that hosts a Jupyter notebook with Spark, Java and Python pre-installed. When the user creates a `SparkSession`, temporary spark executors are created that are persisted until the notebook kernel is shut down or re-started. The notebook can thus be used as a sandbox for writing, testing and benchmarking Spark jobs before they are moved into production.
-
 == Overview
 
 This demo will:
@@ -39,6 +19,27 @@ This demo will:
 * Train an anomaly detection model using PySpark on the data available in HDFS
 * Perform some predictions and visualize anomalies
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 8 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 32GiB memory
+* 22GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data`.
+
+[NOTE]
+====
+Some container images used by this demo are quite large and some steps may take several minutes to complete. If you install this demo locally, on a developer laptop for example, this can lead to timeouts during the installation. If this happens, it's safe to rerun the `stackablectl` command from above.
+
+For more details on how to install Stackable demos, see the xref:commands/demo.adoc#_install_demo[documentation].
+====
 
 == HDFS
 
diff --git a/docs/modules/ROOT/pages/demos/logging.adoc b/docs/modules/ROOT/pages/demos/logging.adoc
index 4ffc3f10..e68aa67b 100644
--- a/docs/modules/ROOT/pages/demos/logging.adoc
+++ b/docs/modules/ROOT/pages/demos/logging.adoc
@@ -59,14 +59,20 @@ vm.max_map_count=262144
 
 Then run `sudo sysctl --load` to reload.
 
-== Run the demo
+[#system-requirements]
+== System requirements
 
-The following command creates a kind cluster and installs this demo:
+To run this demo, your system needs at least:
 
-[source,console]
-----
-$ stackablectl demo install logging --kind-cluster
-----
+* 6.5 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 5GiB memory
+* 27GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install logging`.
 
 == List deployed Stackable services
 
diff --git a/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
index c8ee3f7f..0a9d5d33 100644
--- a/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
+++ b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-earthquake-data.adoc
@@ -1,18 +1,11 @@
 = nifi-kafka-druid-earthquake-data
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `nifi-kafka-druid-earthquake-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install nifi-kafka-druid-earthquake-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -32,6 +25,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-nifi-kafka-druid-earthquake-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 28GiB memory
+* 75GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install nifi-kafka-druid-earthquake-data`.
+
 == List deployed Stackable services
 
 To list the installed Stackable services run the following command:
diff --git a/docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
index 3d344007..6ca2c38e 100644
--- a/docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
+++ b/docs/modules/ROOT/pages/demos/nifi-kafka-druid-water-level-data.adoc
@@ -1,18 +1,11 @@
 = nifi-kafka-druid-water-level-data
 
-[WARNING]
+[CAUTION]
 ====
 This demo only runs in the `default` namespace, as a `ServiceAccount` will be created.
 Additionally, we have to use the fqdn service names (including the namespace), so that the used TLS certificates are valid.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `nifi-kafka-druid-water-level-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -34,6 +27,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-nifi-kafka-druid-water-level-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 28GiB memory
+* 75GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install nifi-kafka-druid-water-level-data`.
+
 == List deployed Stackable services
 
 To list the installed Stackable services run the following command:
diff --git a/docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc b/docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc
index 8de7319a..c08bd2b9 100644
--- a/docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc
+++ b/docs/modules/ROOT/pages/demos/spark-k8s-anomaly-detection-taxi-data.adoc
@@ -1,16 +1,5 @@
 = spark-k8s-anomaly-detection-taxi-data
 
-[WARNING]
-====
-This demo should not be run alongside other demos and requires a minimum of 32 GB RAM and 8 CPUs.
-====
-[NOTE]
-====
-This guide assumes you already have the demo `spark-k8s-anomaly-detection-taxi-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install spark-k8s-anomaly-detection-taxi-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -29,6 +18,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::spark-k8s-anomaly-detection-taxi-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 8 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 32GiB memory
+* 35GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install spark-k8s-anomaly-detection-taxi-data`.
+
 == List deployed Stackable services
 
 To list the installed Stackable services run the following command:
diff --git a/docs/modules/ROOT/pages/demos/trino-iceberg.adoc b/docs/modules/ROOT/pages/demos/trino-iceberg.adoc
index b6e51924..d517fd10 100644
--- a/docs/modules/ROOT/pages/demos/trino-iceberg.adoc
+++ b/docs/modules/ROOT/pages/demos/trino-iceberg.adoc
@@ -7,13 +7,6 @@ It focuses on the Trino and Iceberg integration and should run on you local work
 If you are interested in a more complex lakehouse setup, please have a look at the xref:demos/data-lakehouse-iceberg-trino-spark.adoc[] demo.
 ====
 
-[NOTE]
-====
-This guide assumes that you already have the demo `trino-iceberg` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install trino-iceberg`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -22,6 +15,21 @@ This demo will
 * Create multiple data lakehouse tables using Apache Iceberg and data from the https://www.tpc.org/tpch/[TPC-H dataset].
 * Run some queries to show the benefits of Iceberg
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 9 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 27GiB memory
+* 110GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install trino-iceberg`.
+
 == List deployed Stackable services
 
 To list the installed installed Stackable services run the following command:
diff --git a/docs/modules/ROOT/pages/demos/trino-taxi-data.adoc b/docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
index dc8ba10b..c65bfc8e 100644
--- a/docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
+++ b/docs/modules/ROOT/pages/demos/trino-taxi-data.adoc
@@ -1,12 +1,5 @@
 = trino-taxi-data
 
-[NOTE]
-====
-This guide assumes that you already have the demo `trino-taxi-data` installed.
-If you don't have it installed please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
-To put it simply you have to run `stackablectl demo install trino-taxi-data`.
-====
-
 This demo will
 
 * Install the required Stackable operators
@@ -24,6 +17,21 @@ You can see the deployed products as well as their relationship in the following
 
 image::demo-trino-taxi-data/overview.png[]
 
+[#system-requirements]
+== System requirements
+
+To run this demo, your system needs at least:
+
+* 7 https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu[cpu units] (core/hyperthread)
+* 16GiB memory
+* 28GiB disk storage
+
+[#installation]
+== Installation
+
+Please follow the xref:commands/demo.adoc#_install_demo[documentation on how to install a demo].
+To put it simply, you just have to run `stackablectl demo install trino-taxi-data`.
+
 == List deployed Stackable services
 
 To list the installed Stackable services run the following command:
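
The cpu figures in the system-requirements sections above are Kubernetes cpu units, where one unit equals 1000 millicores (`1000m`), so fractional values such as 2.5 or 6.5 translate directly into the quantities used in Pod resource requests. A minimal shell sketch of that conversion (the `cpu_units` value here is just an illustrative input, not tied to any one demo):

```shell
# Convert a fractional Kubernetes cpu-unit requirement into millicores,
# the notation used in Pod resource requests/limits (1 cpu unit = 1000m).
cpu_units=2.5
millicores=$(awk -v c="$cpu_units" 'BEGIN { printf "%d", c * 1000 }')
echo "${millicores}m"   # prints 2500m
```

Comparing such a figure against the sum of allocatable cpu across your nodes tells you whether a cluster can schedule the demo at all.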