New serverless pattern - Step Function to durable Lambda function #2975
Open: sin-ak wants to merge 2 commits into aws-samples:main from sin-ak:sinak-feature-stepfunction-durable-lambda-function
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| *.swp | ||
| package-lock.json | ||
| __pycache__ | ||
| .pytest_cache | ||
| .venv | ||
| *.egg-info | ||
|
|
||
| # CDK asset staging directory | ||
| .cdk.staging | ||
| cdk.out |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,199 @@ | ||
| # AWS Step Functions to AWS Lambda durable functions | ||
|
|
||
| This pattern demonstrates how to integrate AWS Lambda durable functions into an AWS Step Functions workflow. It covers both synchronous invocation (using the default Request Response pattern) and asynchronous invocation (using the Step Functions Wait for Callback with Task Token integration pattern) of a durable Lambda function. It addresses the challenge of orchestrating Lambda functions that run longer than 15 minutes within a Step Functions workflow, using asynchronous invocation and durable checkpointing. | ||
|
|
||
| Announced at re:Invent 2025, [Lambda durable functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html) introduce a checkpoint/replay mechanism that allows Lambda executions to run for up to one year, automatically recovering from interruptions. This pattern shows how to combine durable functions with Step Functions in a hybrid architecture: durable functions handle application-level logic within Lambda, while Step Functions coordinates the high-level workflow across multiple AWS services. | ||
|
|
||
| Learn more about this pattern at Serverless Land Patterns: << Add the live URL here >> | ||
|
|
||
| Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. | ||
|
|
||
| ## When to Use This Pattern | ||
| Use this pattern when: | ||
| - Your Lambda function execution time exceeds 15 minutes and must be orchestrated by Step Functions | ||
|
Contributor
nit: the 15min is not a differentiator for SFN vs DF - I think the reasons for combining both are 1/ existing SFN experience/workflows and 2/ simplifying Lambda orchestration logic e.g., reducing the number of functions, calls, and complexity when building hybrid workflows |
||
| - You want to keep complex business logic within a Lambda function rather than splitting into a fanout architecture | ||
| - Your team prefers standard programming languages and IDE-based development over visual/JSON workflow designers | ||
| - You need fine-grained control over execution state in code | ||
|
|
||
| Use Step Functions alone when: | ||
| - You are orchestrating multiple AWS services with native integrations | ||
| - Non-technical stakeholders need to understand and validate workflow logic | ||
| - You require zero-maintenance, fully managed infrastructure | ||
|
|
||
| Many applications benefit from using both services. A common pattern is using durable functions for application-level logic within Lambda, while Step Functions coordinates high-level workflows across multiple AWS services beyond Lambda functions. | ||
|
|
||
|
|
||
| ## Requirements | ||
|
|
||
| * [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. | ||
| * [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured | ||
| * [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) | ||
| * [AWS Cloud Development Kit](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) (AWS CDK >= 2.240.0) Installed | ||
|
|
||
| ## Deployment Instructions | ||
|
|
||
| 1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: | ||
| ``` | ||
| git clone https://github.com/aws-samples/serverless-patterns | ||
| ``` | ||
| 1. Change directory to the pattern directory: | ||
| ``` | ||
| cd serverless-patterns/cdk-stepfunction-durable-lambda-function | ||
| ``` | ||
| 1. Create a virtual environment for python: | ||
| ```bash | ||
| python3 -m venv .venv | ||
| ``` | ||
| 1. Activate the virtual environment: | ||
| ```bash | ||
| source .venv/bin/activate | ||
| ``` | ||
|
|
||
| On Windows, activate the virtual environment like this: | ||
|
|
||
| ``` | ||
| % .venv\Scripts\activate.bat | ||
| ``` | ||
|
|
||
| 1. Install python modules: | ||
| ```bash | ||
| python3 -m pip install -r requirements.txt | ||
| ``` | ||
| 1. From the command line, use CDK to synthesize the CloudFormation template and check for errors: | ||
|
|
||
| ```bash | ||
| cdk synth | ||
| ``` | ||
| NOTE: You may need to perform a one time cdk bootstrapping using the following command. See [CDK Bootstrapping](https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping.html) for more details. | ||
| ```bash | ||
| cdk bootstrap aws://<ACCOUNT-NUMBER-1>/<REGION-1> | ||
| ``` | ||
|
|
||
| 1. From the command line, use CDK to deploy the stack: | ||
|
|
||
| ```bash | ||
| cdk deploy | ||
| ``` | ||
|
|
||
| Expected result: | ||
|
|
||
| ```bash | ||
| ✅ CdkStepfunctionDurableLambdaFunctionStack | ||
|
|
||
| Outputs: | ||
| CdkStepfunctionDurableLambdaFunctionStack.AsyncDurableFunctionName = sfn-dfn-async-durable-fn | ||
| CdkStepfunctionDurableLambdaFunctionStack.StepFunctionDFArn = arn:aws:states:us-east-1:XXXXXXXXXXXX:stateMachine:sfn-dfn-integration-pattern-cdk | ||
| CdkStepfunctionDurableLambdaFunctionStack.SyncDurableFunctionName = sfn-dfn-sync-durable-fn | ||
| Stack ARN: | ||
| arn:aws:cloudformation:us-east-1:XXXXXXXXXXXX:stack/CdkStepfunctionDurableLambdaFunctionStack/e4d30000-0000-0000-0000-000000007503 | ||
|
|
||
| ``` | ||
|
|
||
| 1. Note the outputs from the CDK deployment process. These contain the resource names and/or ARNs which are used for testing. | ||
|
|
||
|
|
||
|
|
||
| ## How it works | ||
|
|
||
| Once the CDK stack is deployed successfully, a Step Functions state machine is created along with two durable Lambda functions in the account and Region provided during the bootstrap step. Go to the AWS Step Functions console to inspect the state machine that was created. | ||
|
|
||
| - The `sfn-dfn-async-durable-fn` durable Lambda function simulates a long running task that takes more than 15 mins (using a Wait condition). To avoid hitting Lambda function's 15 mins timeout, the function is configured with a durable execution timeout of 1 hr. As a result, this Lambda function can only be invoked asynchronously by setting the [InvocationType](https://docs.aws.amazon.com/lambda/latest/api/API_Invoke.html#lambda-Invoke-request-InvocationType) parameter to `Event`. | ||
| - The `sfn-dfn-sync-durable-fn` durable Lambda function simulates a short running task that completes within the 15 mins timeout. It is configured with a durable execution timeout of 15 mins which matches the standard Lambda function timeout. This Lambda function can be invoked synchronously without specifying any InvocationType parameter (or using `RequestResponse` value, which is also the default). | ||
|
|
||
| See AWS documentation for more details on [Invoking durable Lambda functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-invoking.html). | ||
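The rule described above (asynchronous invocation once the durable execution timeout exceeds 15 minutes, synchronous otherwise) can be sketched as a small helper that prepares the keyword arguments for boto3's Lambda `invoke` call. This is a minimal sketch, not code from this pattern: `build_invoke_kwargs` and `SYNC_LIMIT_MINUTES` are hypothetical names, and the qualifier `"1"` assumes the function has a published version 1.

```python
import json

# Threshold taken from the text above: synchronous invocation is only
# possible when the durable execution timeout is at most 15 minutes.
SYNC_LIMIT_MINUTES = 15


def build_invoke_kwargs(function_name: str, qualifier: str,
                        execution_timeout_minutes: int, payload: dict) -> dict:
    """Hypothetical helper: build keyword arguments for boto3 Lambda invoke()."""
    invocation_type = (
        "Event" if execution_timeout_minutes > SYNC_LIMIT_MINUTES
        else "RequestResponse"
    )
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,  # assumed published version or alias
        "InvocationType": invocation_type,
        "Payload": json.dumps(payload).encode("utf-8"),
    }


# 1 hr durable timeout: must be asynchronous.
async_kwargs = build_invoke_kwargs(
    "sfn-dfn-async-durable-fn", "1", 60, {"minutes_to_wait": 20})
# 15 min durable timeout: can stay synchronous.
sync_kwargs = build_invoke_kwargs("sfn-dfn-sync-durable-fn", "1", 15, {})
print(async_kwargs["InvocationType"], sync_kwargs["InvocationType"])
```

With AWS credentials configured, the result could be passed on as `boto3.client("lambda").invoke(**async_kwargs)`.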
|
|
||
| #### Step Functions State Machine | ||
|  | ||
|
|
||
| The state machine invokes these 2 durable Lambda functions in the following pattern: | ||
| 1. When the state machine starts, it first executes the 'Async Durable Lambda Fn Invoke' task, which invokes the `sfn-dfn-async-durable-fn` Lambda function. Since Step Functions' default `LambdaInvoke` uses synchronous invocation, we need to switch to the '[Wait for Callback with Task Token](https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-token)' integration pattern with asynchronous invocation; otherwise the Step Functions task fails with this error: | ||
| ``` | ||
| Lambda.InvalidParameterValueException: You cannot synchronously invoke a durable function with an executionTimeout greater than 15 minutes. | ||
| ``` | ||
| The below state machine ASL snippet shows this configuration: | ||
| ```bash | ||
| "Async Durable Lambda Fn Invoke": { | ||
| "Type": "Task", | ||
| "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken", # wait for callback integration pattern | ||
| "Arguments": { | ||
| "FunctionName": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:sfn-dfn-async-durable-fn:1", # durable Lambda functions must be invoked with a qualified ARN (version or alias) | ||
| "InvocationType": "Event", # set InvocationType = 'Event' for async Lambda invocation | ||
| "Payload": { | ||
| "TaskToken": "{% $states.context.Task.Token %}", # pass the task-token to Lambda for a callback later. | ||
| "minutes_to_wait": "{% $states.input.minutes_to_wait %}" | ||
| } | ||
| }, | ||
| "HeartbeatSeconds": 3600, # set a heartbeat timeout of 1 hr before task is considered failed | ||
| "Output": "{% $states.result %}" | ||
| } | ||
| ``` | ||
| > Note: Durable functions require qualified identifiers for invocation. You must invoke durable functions using a version number, alias, or $LATEST. You can use either a full qualified ARN or a function name with version/alias suffix. You cannot use an unqualified identifier (without a version or alias suffix). See AWS Documentation for more details on [Qualified ARNs requirement](https://docs.aws.amazon.com/lambda/latest/dg/durable-invoking.html#durable-invoking-qualified-arns). | ||
|
|
||
| Because this durable Lambda function waits for the number of minutes given in the Step Functions input (`minutes_to_wait`), both the Step Functions execution and the durable Lambda execution pause without consuming any CPU. When the wait timer expires, the durable function resumes from that point, having checkpointed the previous steps. Because the function was invoked asynchronously, it must call the Step Functions `send_task_success` or `send_task_failure` API with the task token it received in the event payload. This lets Step Functions resume the state machine. | ||
|
|
||
| IMPORTANT: When using Step Function WAIT_FOR_TASK_TOKEN pattern, wrap SendTaskSuccess in context.step() in your Lambda code to make it durable. If placed outside context.step(), it will execute on every replay causing duplicate callbacks, or may never execute if Lambda is interrupted, leaving Step Functions waiting indefinitely. Also, send callback as the FINAL durable step. | ||
|
|
||
| 2. The state machine then executes the 'Synchronous Durable Lambda Fn Invoke' task, which invokes the `sfn-dfn-sync-durable-fn` Lambda function. Since this function can be invoked synchronously, we use the default Step Functions task configuration, as shown below: | ||
| ```bash | ||
| "Synchronous Durable Lambda Fn Invoke": { | ||
| "Type": "Task", | ||
| "Resource": "arn:aws:states:::lambda:invoke", # default request-response pattern to invoke Lambda synchronously | ||
| "Arguments": { | ||
| "FunctionName": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:sfn-dfn-sync-durable-fn:1", # durable Lambda functions must be invoked with a qualified ARN (version or alias) | ||
| "Payload": "{% $states.input %}" | ||
| }, | ||
| "Output": "{% $states.result.Payload %}", | ||
| "End": true | ||
| } | ||
| }, | ||
| ``` | ||
| Once the Lambda function completes its execution and returns a response, Step Functions completes the task and ends the state machine flow. | ||
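Both task definitions above pass a Lambda ARN that ends in a version suffix (`:1`). A quick way to catch an unqualified ARN before deployment is to count its colon-separated fields. The following is a minimal sketch under the assumption that only full ARNs are checked (not the `name:alias` short form); `is_qualified_lambda_arn` is a hypothetical helper, and the account ID in the example is a placeholder.

```python
def is_qualified_lambda_arn(arn: str) -> bool:
    """Hypothetical check: True if a full Lambda function ARN carries a
    version or alias suffix.

    Expected layout:
    arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]
    """
    parts = arn.split(":")
    return (
        len(parts) == 8
        and parts[2] == "lambda"
        and parts[5] == "function"
        and parts[7] != ""
    )


qualified = "arn:aws:lambda:us-east-1:123456789012:function:sfn-dfn-sync-durable-fn:1"
unqualified = "arn:aws:lambda:us-east-1:123456789012:function:sfn-dfn-sync-durable-fn"
print(is_qualified_lambda_arn(qualified), is_qualified_lambda_arn(unqualified))
```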
|
|
||
| ## Testing | ||
|
|
||
| Go to the AWS Step Functions console and select the state machine created by CDK (look for a name starting with `sfn-dfn-integration-pattern-cdk`). Start an execution with the input shown below. This makes the durable Lambda function wait for 20 mins, which is more than the standard Lambda execution timeout. Since the durable execution timeout is set to 1 hr, Lambda will pause and resume execution after 20 mins instead of timing out. | ||
| ```bash | ||
| { | ||
| "minutes_to_wait": 20 | ||
| } | ||
| ``` | ||
|
|
||
| Wait for the Step Function workflow to complete. You can check the progress of the execution steps under the Executions section. | ||
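The same test can also be started from code rather than from the console. The sketch below assumes a boto3 Step Functions client is passed in and uses its `start_execution` call; `start_wait_execution` is a hypothetical helper, and `FakeSfnClient` is a stand-in that only records calls so the example runs without AWS credentials.

```python
import json


def start_wait_execution(sfn_client, state_machine_arn: str,
                         minutes_to_wait: int) -> str:
    """Hypothetical helper: start the state machine with the test input."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"minutes_to_wait": minutes_to_wait}),
    )
    return response["executionArn"]


class FakeSfnClient:
    """Stand-in for boto3.client("stepfunctions"); records calls only."""

    def __init__(self):
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        # Fabricated ARN for the demo; a real client returns the actual one.
        fake_arn = kwargs["stateMachineArn"].replace(
            ":stateMachine:", ":execution:") + ":demo"
        return {"executionArn": fake_arn}


fake_client = FakeSfnClient()
state_machine_arn = (
    "arn:aws:states:us-east-1:XXXXXXXXXXXX:stateMachine:"
    "sfn-dfn-integration-pattern-cdk"
)
execution_arn = start_wait_execution(fake_client, state_machine_arn, 20)
print(execution_arn)
```

Against a real account, `fake_client` would be replaced with `boto3.client("stepfunctions")` and the ARN with the `StepFunctionDFArn` stack output.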
|
|
||
| > NOTE: The `sfn-dfn-async-durable-fn` durable Lambda function contains an artificial wait that lasts for the duration given in the state machine execution input, so the function pauses until that timer expires. For quicker testing, set `minutes_to_wait` to a smaller value. | ||
|
|
||
| You can check the Durable executions section on the AWS Lambda service console for the `sfn-dfn-async-durable-fn` durable Lambda function to see how the various steps are checkpointed. | ||
|
|
||
| #### Durable execution in the Lambda console | ||
|  | ||
|
|
||
| ## Best practices for Lambda durable functions and Step Functions integration | ||
| Durable functions use a replay-based execution model that requires different patterns than traditional Lambda functions. Follow these best practices to build reliable, cost-effective workflows. Please see AWS documentation for more details on [Best practices for Lambda durable functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-best-practices.html). | ||
|
|
||
| - Synchronous invocation is not supported for durable functions with execution_timeout > 15 minutes. Always use WAIT_FOR_TASK_TOKEN + invocation_type=EVENT. | ||
| - `SendTaskSuccess` must be a durable step. Placing it outside context.step() risks duplicate callbacks on replay or missed callbacks on interruption. | ||
| - Durable and standard Lambdas can coexist in the same workflow. | ||
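To see why the callback belongs inside a durable step, it can help to model replay with a toy checkpoint store. The sketch below is not the durable execution SDK: `ToyDurableContext` is a hypothetical stand-in that caches step results by name, and a "replay" is simulated by running the handler a second time against the same checkpoints.

```python
class ToyDurableContext:
    """Toy model only: caches step results by name across replays."""

    def __init__(self, checkpoints: dict):
        self.checkpoints = checkpoints

    def step(self, name, fn):
        # A completed step is not re-run on replay; its result is reused.
        if name not in self.checkpoints:
            self.checkpoints[name] = fn()
        return self.checkpoints[name]


callbacks_sent = []


def send_callback():
    # Stands in for a SendTaskSuccess call to Step Functions.
    callbacks_sent.append("SendTaskSuccess")
    return True


def handler_unsafe(ctx):
    ctx.step("create_order", lambda: {"order_id": "order-1"})
    send_callback()  # outside a step: runs again on every replay


def handler_safe(ctx):
    ctx.step("create_order", lambda: {"order_id": "order-1"})
    ctx.step("send_callback", send_callback)  # checkpointed: runs once


def run_with_replay(handler) -> int:
    """Run the handler twice against shared checkpoints; count callbacks."""
    callbacks_sent.clear()
    checkpoints = {}
    handler(ToyDurableContext(checkpoints))  # first invocation
    handler(ToyDurableContext(checkpoints))  # simulated replay
    return len(callbacks_sent)


unsafe_count = run_with_replay(handler_unsafe)
safe_count = run_with_replay(handler_safe)
print(unsafe_count, safe_count)  # prints "2 1"
```

In this toy model the unguarded callback fires twice while the checkpointed one fires once, which mirrors the duplicate-callback risk described in the list above.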
|
|
||
|
|
||
| ## Cleanup | ||
| 1. Delete the stack | ||
| ```bash | ||
| cdk destroy | ||
| ``` | ||
|
|
||
| ## Tutorial | ||
|
|
||
| See [this useful workshop](https://cdkworkshop.com/30-python.html) on working with the AWS CDK for Python projects. | ||
|
|
||
| ## Useful commands | ||
|
|
||
| * `cdk ls` list all stacks in the app | ||
| * `cdk synth` emits the synthesized CloudFormation template | ||
| * `cdk deploy` deploy this stack to your default AWS account/region | ||
| * `cdk diff` compare deployed stack with current state | ||
| * `cdk docs` open CDK documentation | ||
|
|
||
| ---- | ||
| Copyright 2025 Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
|
|
||
| SPDX-License-Identifier: MIT-0 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| #!/usr/bin/env python3 | ||
| import os | ||
|
|
||
| import aws_cdk as cdk | ||
|
|
||
| from cdk_stepfunction_durable_lambda_function.cdk_stepfunction_durable_lambda_function_stack import CdkStepfunctionDurableLambdaFunctionStack | ||
|
|
||
|
|
||
| app = cdk.App() | ||
| CdkStepfunctionDurableLambdaFunctionStack(app, "CdkStepfunctionDurableLambdaFunctionStack", | ||
| # If you don't specify 'env', this stack will be environment-agnostic. | ||
| # Account/Region-dependent features and context lookups will not work, | ||
| # but a single synthesized template can be deployed anywhere. | ||
|
|
||
| # Uncomment the next line to specialize this stack for the AWS Account | ||
| # and Region that are implied by the current CLI configuration. | ||
|
|
||
| #env=cdk.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'), region=os.getenv('CDK_DEFAULT_REGION')), | ||
|
|
||
| # Uncomment the next line if you know exactly what Account and Region you | ||
| # want to deploy the stack to. | ||
|
|
||
| #env=cdk.Environment(account='123456789012', region='us-east-1'), | ||
|
|
||
| # For more information, see https://docs.aws.amazon.com/cdk/latest/guide/environments.html | ||
| ) | ||
|
|
||
| app.synth() |
74 changes: 74 additions & 0 deletions
...unction-durable-lambda-function/async-durable-lambda/async_durable_function_invocation.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| from aws_durable_execution_sdk_python.config import Duration | ||
| from aws_durable_execution_sdk_python.context import DurableContext, StepContext, durable_step | ||
| from aws_durable_execution_sdk_python.execution import durable_execution | ||
| import random | ||
| import datetime | ||
| import boto3 | ||
| import json | ||
|
|
||
| @durable_step | ||
| def create_order(context: StepContext): | ||
| order_id = f"order-{random.randint(1, 100)}" | ||
| context.logger.info(f"Creating order... : {order_id}") | ||
| return { | ||
| "order_id": order_id, | ||
| "total": 50.00, | ||
| "status": "Created" | ||
| } | ||
|
|
||
| @durable_step | ||
| def send_notification(context: StepContext, order_id: str): | ||
| context.logger.info("Sending notification...") | ||
| return { | ||
| "sent": True, | ||
| "order_id": order_id, | ||
| "recipient": "customer@example.com", | ||
| "timestamp": datetime.datetime.now().isoformat() | ||
| } | ||
|
|
||
| @durable_step | ||
| def send_sfn_task_success(context: StepContext, task_token: str, response: dict): | ||
| sfn_client = boto3.client("stepfunctions") | ||
| sfn_client.send_task_success( | ||
| taskToken=task_token, | ||
| output=json.dumps(response, default=str), | ||
| ) | ||
|
|
||
| @durable_execution | ||
| def lambda_handler(event: dict, context: DurableContext) -> dict: | ||
| context.logger.info(f"Async Durable Lambda Event: {event}") | ||
|
|
||
| # Extract Step Function Task Token outside durable step | ||
| # Only deterministic operations like event.pop("TaskToken") are safe outside steps. | ||
| task_token = event.pop("TaskToken", None) | ||
| minutes_to_wait = event.pop("minutes_to_wait", 1) | ||
|
|
||
| # Step 1: Create the order | ||
| order_details = context.step(create_order()) | ||
| context.logger.info(f"Order created: {order_details['order_id']}") | ||
|
|
||
| # Step 2: Wait X minutes - simulate a long running task | ||
| context.logger.info(f"Waiting {minutes_to_wait} minutes before sending notification...") | ||
| context.wait(Duration.from_minutes(minutes_to_wait)) | ||
|
|
||
| # Step 3: Send notification | ||
| context.logger.info(f"Waited for {minutes_to_wait} minutes without consuming CPU.") | ||
| notification_details = context.step(send_notification(order_details['order_id'])) | ||
| context.logger.info("Notification sent successfully...") | ||
|
|
||
| response = { | ||
| "success": True, | ||
| "notification": notification_details | ||
| } | ||
|
|
||
| # IMPORTANT: When using Step Function WAIT_FOR_TASK_TOKEN pattern, | ||
| # wrap SendTaskSuccess in context.step() to make it durable. | ||
| # If placed outside context.step(), it will execute on every | ||
| # replay causing duplicate callbacks, or may never execute if | ||
| # Lambda is interrupted, leaving Step Functions waiting indefinitely. | ||
| # Send callback as the FINAL durable step | ||
| if task_token: | ||
| context.logger.info("Resuming Step Function by calling send_task_success with task_token") | ||
| context.step(send_sfn_task_success(task_token, response)) | ||
|
|
||
| return response |
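The handler above reports only the success path. If the work can fail, Step Functions would otherwise keep waiting until the heartbeat expires, so a failure callback may be worth adding. The following is a minimal sketch, assuming a boto3 Step Functions client with its `send_task_success` and `send_task_failure` calls; `report_outcome` and `FakeSfnClient` are hypothetical names, and in a durable function this reporting would itself need to run inside a durable step.

```python
import json


def report_outcome(sfn_client, task_token: str, work):
    """Hypothetical helper: run work() and report the result to Step Functions."""
    try:
        result = work()
    except Exception as exc:
        # Failure path: surface the exception type and message to the workflow.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error=type(exc).__name__,
            cause=str(exc),
        )
        return None
    sfn_client.send_task_success(
        taskToken=task_token,
        output=json.dumps(result, default=str),
    )
    return result


class FakeSfnClient:
    """Stand-in for boto3.client("stepfunctions"); records calls only."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(("success", kwargs))

    def send_task_failure(self, **kwargs):
        self.calls.append(("failure", kwargs))


def failing_work():
    raise ValueError("payment declined")


client = FakeSfnClient()
report_outcome(client, "token-1", lambda: {"success": True})
report_outcome(client, "token-2", failing_work)
print([kind for kind, _ in client.calls])
```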
note: I'm reading this section as "when to use this serverless pattern" but somehow the section seems to be more a DF vs SFN - consider streamlining this section to match the use case and refer to the user guide for more guidance