Reducing memory footprint for synchronous S3KeySensor#55070
ashb merged 9 commits into apache:main
|
cc: @dstandish |
providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
|
@eladkal, since we're changing the return type of |
|
I mentioned it on Slack, but just adding it here: we should check with @eladkal re backcompat issues. Thinking about that more, the problem is that it's somewhat ambiguous what the behavior should be. For best performance, you would want check_fn to return on the first "pass", but the behavior most similar to the current one would be to return all the files for which check_fn evaluates to true. |
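To make the two options in the comment above concrete, here is a rough sketch. It assumes a hypothetical per-file predicate `is_match` and a stand-in generator `list_files`; neither is part of the Airflow API, and the real `check_fn` signature may differ. The `fetched` list only exists to show how many records each strategy pulls from the lazy listing.

```python
from collections.abc import Callable, Iterable, Iterator

# Records which objects were actually pulled from the lazy listing.
fetched: list[int] = []


def list_files() -> Iterator[dict]:
    """Stand-in for a lazily paginated S3 listing (hypothetical data)."""
    for size in (0, 10, 20, 30):
        fetched.append(size)
        yield {"Size": size}


def is_match(file: dict) -> bool:
    """Hypothetical per-file predicate."""
    return file["Size"] > 5


def first_pass(files: Iterable[dict], predicate: Callable[[dict], bool]) -> bool:
    """Stop at the first file that passes; later files are never fetched."""
    return any(predicate(f) for f in files)


def all_passing(files: Iterable[dict], predicate: Callable[[dict], bool]) -> list[dict]:
    """Closest to the list-based behavior: consume everything, keep the matches."""
    return [f for f in files if predicate(f)]
```

With the short-circuit variant only the first two records are fetched, while the collect-all variant walks the whole listing, which is the trade-off being discussed.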
|
cc: @eladkal |
|
Let's consult first with @o-nikolas and @vincbeck. |
|
To keep it backward compatible, could we make |
|
I think that's feasible - how would we determine whether to return a |
|
In the current implementation we would always return an iterator but having |
Making the type annotation change now. Both cases are handled in |
vincbeck
left a comment
Actually sorry but I realized I provided wrong instructions. This does not solve back compat issues ... What we should do instead is deprecate get_file_metadata and create a new one using Iterator and use this one in the sensor. That way we do not need to create a major release because of this.
|
Got it - so the plan would be to:
What is the best way to deprecate a method and create one with the same name? Or should I create a new method with a new name, and use this one in the Sensor? |
Correct
To deprecate a method, please emit a deprecation warning at the beginning of the method. Example below:
Yes, you should create a new one with a new name |
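A minimal sketch of the pattern described above, assuming a hypothetical `ExampleHook` class and a hypothetical new method name `iter_file_metadata` (the name chosen in the PR is not stated here). It uses the built-in `DeprecationWarning`; Airflow providers would typically use their own warning category instead.

```python
import warnings
from collections.abc import Iterator


class ExampleHook:
    """Hypothetical hook for illustration; not the real S3Hook."""

    def __init__(self, pages: list[list[dict]]) -> None:
        self._pages = pages

    def iter_file_metadata(self) -> Iterator[dict]:
        """New method (hypothetical name): yield records one at a time."""
        for page in self._pages:
            yield from page

    def get_file_metadata(self) -> list[dict]:
        """Deprecated method: keeps its name and its list return type."""
        # Emit the warning first, before doing any work.
        warnings.warn(
            "get_file_metadata is deprecated; use iter_file_metadata instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this method
        )
        return list(self.iter_file_metadata())
```

Existing callers keep receiving a list, so nothing breaks for them, while the sensor can switch to the iterator-based method.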
|
@vincbeck, I've updated the PR accordingly! |
vincbeck
left a comment
Nice! Thanks for the quick turnaround!
I guess just wondering what is wrong with making a major release? How do we determine when to proceed with one? |
Nothing wrong with it, we are just trying to minimize them. Upgrading to a new major version can be painful for users and requires some work, so we try not to do it too often. |
|
@dstandish, can you take a look at the updated PR? |
|
@eladkal, can you take a look at this for me? |
providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
providers/amazon/src/airflow/providers/amazon/aws/sensors/s3.py
|
@ashb should be all set here! |
This PR aims to reduce the memory footprint for the S3KeySensor. This was done by altering the get_file_metadata method in the S3Hook to yield records in the paginated response, rather than loading them into a single list. The return type for the get_file_metadata method is now an Iterator. An assertion was added to validate this, and all appropriate tests were updated.
closes: #55039
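As an illustration of the yield-based approach described above, here is a small sketch. The function name `iter_records` and the sample pages are hypothetical; the pages only mimic the shape of an S3 object-listing response, where each page carries its objects under a `Contents` key. This is not the code from the PR.

```python
from collections.abc import Iterable, Iterator


def iter_records(pages: Iterable[dict]) -> Iterator[dict]:
    """Yield records page by page instead of building one large list."""
    for page in pages:
        # A page without matching objects may have no "Contents" key.
        yield from page.get("Contents", [])


# Hypothetical pages shaped like a paginated listing response.
fake_pages = [
    {"Contents": [{"Key": "a.csv", "Size": 1}, {"Key": "b.csv", "Size": 2}]},
    {},
    {"Contents": [{"Key": "c.csv", "Size": 3}]},
]

records = iter_records(fake_pages)
```

Because `records` is a generator, the function itself never accumulates the full result, so at most one page needs to be resident when the pages are also produced lazily.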