This repository implements a collection of AWS integrations for the zenml platform.
It contains:
- a customized version of the AWS Batch step operator proposed in this PR in the official zenml repository (based on this original plugin implementation)
- a placeholder for a future AWS Batch EC2 (and thus GPU) compatible extension of the great ML Ops Club's step functions orchestrator implementation
Create a virtual environment with Python 3.12:
uv venv --python=3.12
and activate it. Then install all Python dependencies:
uv sync

Provision the pulumi stack in the AWS cloud, including a publicly accessible
RDS SQL server for the remote zenml store. This must be run by an AWS
identity that has the required pulumi provisioning permissions. My local setup
achieves this with a designated PulumiDevRole that holds all the required
policies and permissions and can be assumed by a pulumi-bootstrap user. I
configured this setup using the configuration files ~/.aws/config and
~/.aws/credentials below:
[profile pulumi] # ~/.aws/config
role_arn = arn:aws:iam::743582000746:role/PulumiDevRole
source_profile = pulumi-bootstrap
region = eu-west-1
[pulumi-bootstrap] # ~/.aws/credentials
aws_access_key_id = <AWS_ACCESS_KEY_ID>
aws_secret_access_key = <AWS_SECRET_ACCESS_KEY>
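
The role-assumption chain above (the pulumi profile assumes PulumiDevRole using the pulumi-bootstrap credentials) is easy to get wrong by a typo in either file. As a minimal sketch, assuming the config text follows the format shown above, the Python standard library can parse it and report what a profile is set up to do. The helper name `describe_profile` and the `SAMPLE_CONFIG` constant are hypothetical and not part of this repository; the code only parses text and never contacts AWS.

```python
import configparser

# Sample text in the same format as ~/.aws/config (values copied from above).
SAMPLE_CONFIG = """
[profile pulumi]
role_arn = arn:aws:iam::743582000746:role/PulumiDevRole
source_profile = pulumi-bootstrap
region = eu-west-1
"""


def describe_profile(config_text: str, profile: str) -> dict:
    """Return the role-assumption settings of a named profile.

    Hypothetical helper for illustration only.
    """
    # Disable interpolation so literal '%' characters in values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # ~/.aws/config prefixes every non-default profile section with "profile ".
    section = parser[f"profile {profile}"]
    return {
        "role_arn": section.get("role_arn"),
        "source_profile": section.get("source_profile"),
        "region": section.get("region"),
    }


print(describe_profile(SAMPLE_CONFIG, "pulumi"))
```

To confirm that the profile really resolves to the role, `aws sts get-caller-identity --profile pulumi` should report an assumed-role ARN for PulumiDevRole.
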
export AWS_PROFILE=pulumi # 'set AWS_PROFILE=pulumi' on windows
cd infrastructure
pulumi up -y

To authenticate your local docker client with the remote ECR stack you just provisioned, run:
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 743582000746.dkr.ecr.eu-west-1.amazonaws.com

To build a zenml docker image that can run remotely, run:
docker build -f infrastructure/docker/component/Dockerfile . -t 743582000746.dkr.ecr.eu-west-1.amazonaws.com/zenml:latest
docker push 743582000746.dkr.ecr.eu-west-1.amazonaws.com/zenml:latest

Create a zenml stack using this library's integrations by running the following
commands.
Log in to the remote SQL zenml store directly:
zenml login mysql://zenml:password@zenml-metdata-store992d729.c1cyu4q20nag.eu-west-1.rds.amazonaws.com:3306/zenml

Register the git repository as a local zenml repository:
zenml init

Register a remote ECR container registry component:
zenml container-registry register aws-ecr -f aws --uri=743582000746.dkr.ecr.eu-west-1.amazonaws.com

Register a remote S3 artifact store component:
zenml artifact-store register aws-s3 -f s3 --path=s3://zenml-artifact-store-29182ff

zenml stack set default
zenml stack delete test-step-operator -y
zenml step-operator delete aws-batch
zenml step-operator flavor delete aws_batch
zenml step-operator flavor register zenml_aws.step_operator.aws_batch_step_operator_flavor.AWSBatchStepOperatorFlavor
zenml step-operator register aws-batch -f aws_batch --execution_role=arn:aws:iam::743582000746:role/batch-execution-role --job_role=arn:aws:iam::743582000746:role/batch-job-role --job_queue_name=zenml-test-fargate-job-queue --backend=FARGATE --tags="{\"test\": \"step-operator\"}" --assign_public_ip=ENABLED --timeout_seconds=900 --aws_profile=pulumi --delete_resources_on="[\"SUCCEEDED\"]" --log_group=/aws/batch/job
zenml stack register test-step-operator -o default -c aws-ecr -s aws-batch -a aws-s3
zenml stack set test-step-operator

For end-to-end tests running on the provisioned AWS infrastructure, run the
test scripts in the scripts directory:
zenml stack set test-step-operator
python scripts/test_run_step_operator.py --backend EC2 --job-queue zenml-test-ec2-job-queue --memory 1000
python scripts/test_run_step_operator.py --backend FARGATE --job-queue zenml-test-fargate-job-queue --memory 2048

zenml stack set default
zenml stack delete test-orchestrator -y
zenml step-operator delete aws-batch
zenml step-operator flavor delete aws_batch
zenml orchestrator delete aws-stepfunctions
zenml orchestrator flavor delete aws_stepfunctions
zenml step-operator flavor register zenml_aws.step_operator.aws_batch_step_operator_flavor.AWSBatchStepOperatorFlavor
zenml step-operator register aws-batch -f aws_batch --execution_role=arn:aws:iam::743582000746:role/batch-execution-role --job_role=arn:aws:iam::743582000746:role/batch-job-role --job_queue_name=zenml-test-fargate-job-queue --backend=FARGATE --tags="{\"test-1\": \"step-operator\"}" --assign_public_ip=ENABLED --timeout_seconds=900 --aws_profile=pulumi
zenml orchestrator flavor register zenml_aws.orchestrator.aws_stepfunctions_batch_orchestrator_flavor.AWSStepFunctionsOrchestratorFlavor
zenml orchestrator register aws-stepfunctions -f aws_stepfunctions --stepfunctions_execution_role=arn:aws:iam::743582000746:role/stepfunctions-execution-role --batch_execution_role=arn:aws:iam::743582000746:role/batch-execution-role --batch_job_role=arn:aws:iam::743582000746:role/batch-job-role --job_queue_name=zenml-test-fargate-job-queue --backend=FARGATE --tags="{\"test-2\": \"orchestrator\"}" --assign_public_ip=ENABLED --timeout_seconds=900 --aws_profile=pulumi --delete_stepfunctions_resource_on="[]" --batch_log_group=/aws/batch/job --stepfunctions_log_group_arn=arn:aws:logs:eu-west-1:743582000746:log-group:/aws/batch/job:*
zenml stack register test-orchestrator -o aws-stepfunctions -c aws-ecr -a aws-s3 -s aws-batch
zenml stack set test-orchestrator

For end-to-end tests running on the provisioned AWS infrastructure, run the
test scripts in the scripts directory:
zenml stack set test-orchestrator
python scripts/test_run_orchestrator.py --backend EC2 --job-queue zenml-test-ec2-job-queue --memory 1000
python scripts/test_run_orchestrator.py --backend FARGATE --job-queue zenml-test-fargate-job-queue --memory 2048

It can be useful to track the state of stacks and pipelines via the zenml dashboard running independently against the same SQL zenml store:
cd infrastructure
docker compose up

The username is default, and the password is empty.
For local-only unit and integration tests, run the pytest suites in the respective directories:
pytest tests/unit -vv # unit tests
pytest tests/integration -vv # integration tests

For end-to-end tests, see the sections above.
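
The step operator and orchestrator registration commands above pass JSON values (for example the tags and the list of job states to clean up on) with hand-escaped quotes, which is easy to mistype and behaves differently across shells. As a minimal sketch, assuming the same flag names as in those commands, the argument list can be built in Python so that no shell escaping is involved. The helper name `build_register_args` is hypothetical and not part of zenml or this repository; only a subset of the flags is shown.

```python
import json


def build_register_args(component, name, flavor, **options):
    """Build an argv list for `zenml <component> register` (hypothetical helper).

    String values pass through unchanged; every other value is JSON-encoded,
    so dicts and lists need no manual quote escaping.
    """
    args = ["zenml", component, "register", name, "-f", flavor]
    for key, value in options.items():
        rendered = value if isinstance(value, str) else json.dumps(value)
        args.append(f"--{key}={rendered}")
    return args


# Flag names and values are copied from the step operator command above.
args = build_register_args(
    "step-operator",
    "aws-batch",
    "aws_batch",
    job_queue_name="zenml-test-fargate-job-queue",
    backend="FARGATE",
    tags={"test": "step-operator"},
    timeout_seconds=900,
    delete_resources_on=["SUCCEEDED"],
)
print(args)
```

Passing the resulting list to `subprocess.run(args, check=True)` hands each argument to the CLI without going through a shell, so the quoting rules of the shell in use should no longer matter.
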




