Introduce Amazon Comprehend Service #39592

Merged
vincbeck merged 9 commits into apache:main from gopidesupavan:add-comprehend-start-pii-entities-detection-job-operator
May 15, 2024

@gopidesupavan gopidesupavan commented May 13, 2024

Added the Amazon Comprehend Start PII Entities Detection Job operator, together with its documentation, hook, sensor, trigger, waiter, unit tests, and system test.

At present, only the PII entities detection job is supported. Support for the remaining Comprehend services will follow.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend/client/start_pii_entities_detection_job.html
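For orientation, here is a minimal sketch of how operator-style arguments could map onto the keyword arguments of the boto3 `start_pii_entities_detection_job` call linked above. The helper name `build_pii_job_request` and the bucket and account values are hypothetical, chosen for illustration; this is not the operator's actual implementation.

```python
from typing import Any, Dict, Optional


def build_pii_job_request(
    input_data_config: Dict[str, Any],
    output_data_config: Dict[str, Any],
    mode: str,
    data_access_role_arn: str,
    language_code: str,
    extra_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for start_pii_entities_detection_job (hypothetical helper)."""
    request: Dict[str, Any] = {
        "InputDataConfig": input_data_config,
        "OutputDataConfig": output_data_config,
        "Mode": mode,
        "DataAccessRoleArn": data_access_role_arn,
        "LanguageCode": language_code,
    }
    # Optional settings such as RedactionConfig or JobName are merged in last.
    request.update(extra_kwargs or {})
    return request


# Placeholder values for illustration only.
example_request = build_pii_job_request(
    input_data_config={"S3Uri": "s3://example-bucket/sample_data.txt", "InputFormat": "ONE_DOC_PER_LINE"},
    output_data_config={"S3Uri": "s3://example-bucket/redacted_output/"},
    mode="ONLY_REDACTION",
    data_access_role_arn="arn:aws:iam::123456789012:role/ComprehendRole",
    language_code="en",
    extra_kwargs={
        "RedactionConfig": {"PiiEntityTypes": ["NAME", "ADDRESS"], "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE"}
    },
)
```

With a real client, the resulting dictionary would be passed as `boto3.client("comprehend").start_pii_entities_detection_job(**example_request)`.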

Sample DAG:

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.comprehend import ComprehendStartPiiEntitiesDetectionJobOperator

with DAG(
    dag_id="comprehend_testing",
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    tags=["comprehend pii entities detection"],
    catchup=False,
) as dag:
    # Start a Comprehend job that redacts NAME and ADDRESS entities from the input file.
    pii_entities_detection_job = ComprehendStartPiiEntitiesDetectionJobOperator(
        task_id="pii_entities_detection_job",
        input_data_config={
            "S3Uri": "s3://aws-comprehend-testing-hpl7cy/sample_data.txt",
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        output_data_config={"S3Uri": "s3://aws-comprehend-testing-hpl7cy/redacted_output/"},
        mode="ONLY_REDACTION",
        language_code="en",
        # Replace {ACCOUNT_ID} with the ID of the AWS account that owns the role.
        data_access_role_arn="arn:aws:iam::{ACCOUNT_ID}:role/ComprehendRole",
        # Additional arguments for the start_pii_entities_detection_job call.
        start_pii_entities_kwargs={
            "RedactionConfig": {
                "PiiEntityTypes": ["NAME", "ADDRESS"],
                "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
            }
        },
    )
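
The sensor, trigger, and waiter listed in the description all revolve around checking the job status until it reaches a terminal state. The following is a rough sketch of that idea against the boto3 `describe_pii_entities_detection_job` call, not the code added in this PR. The function name `wait_for_pii_job`, the polling defaults, and the choice of which states count as failures are assumptions made for illustration.

```python
import time
from typing import Any

# Assumption for this sketch: these states end the job without a usable result.
FAILURE_STATES = {"FAILED", "STOPPED"}


def wait_for_pii_job(client: Any, job_id: str, poll_interval: float = 60.0, max_attempts: int = 60) -> str:
    """Poll a Comprehend client until the PII job completes (hypothetical helper)."""
    for _ in range(max_attempts):
        response = client.describe_pii_entities_detection_job(JobId=job_id)
        status = response["PiiEntitiesDetectionJobProperties"]["JobStatus"]
        if status == "COMPLETED":
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"PII entities detection job {job_id} ended with status {status}")
        # SUBMITTED, IN_PROGRESS, or STOP_REQUESTED: wait and check again.
        time.sleep(poll_interval)
    raise TimeoutError(f"PII entities detection job {job_id} did not finish after {max_attempts} checks")
```

Any object exposing `describe_pii_entities_detection_job`, such as `boto3.client("comprehend")`, can be passed as `client`, which also makes the loop easy to exercise with a stub in tests.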


@vincbeck vincbeck left a comment

Really good job! Thanks for following the good practices and everything. I just added one question, but overall it is really good!


@o-nikolas o-nikolas left a comment

This is an awesome PR! Super thorough and ticks all the boxes. We'll use this as an example for future folks, great work! 😃

@gopidesupavan

> This is an awesome PR! Super thorough and ticks all the boxes. We'll use this as an example for future folks, great work! 😃

Thank you so much for reviewing this 😄. I have applied all your feedback.
The quick start guides are really helpful and well documented.


@vincbeck vincbeck left a comment

Awesome PR!

@gopidesupavan

> Awesome PR!

Thank you @vincbeck 😃

@vincbeck vincbeck merged commit 9dd7752 into apache:main May 15, 2024
@utkarsharma2 utkarsharma2 added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jun 3, 2024
@gopidesupavan gopidesupavan deleted the add-comprehend-start-pii-entities-detection-job-operator branch July 5, 2024 12:29
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
Labels

area:providers
area:system-tests
changelog:skip (Changes that should be skipped from the changelog: CI, tests, etc.)
kind:documentation
provider:amazon (AWS/Amazon-related issues)