Skip to content

SAMZA-2276: Add Metdata store putAll API#1112

Closed
dnishimura wants to merge 2 commits intoapache:masterfrom
dnishimura:samza-2276-metadata-store-putall-api
Closed

SAMZA-2276: Add Metdata store putAll API#1112
dnishimura wants to merge 2 commits intoapache:masterfrom
dnishimura:samza-2276-metadata-store-putall-api

Conversation

@dnishimura
Copy link
Contributor

Add a putsAll API to the metadata store API. This allows for a delayed flush() call for underlying stores that require a flush() after write such as Kafka.

Copy link
Contributor

@mynameborat mynameborat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions to understand the motivation of PutAll

  1. Does Put have to be synchronous? Since flush() is exposed as part of API, what do you think about callers handling it in two steps? Have we evaluated this?
  2. What is the use of flush()? Looking at the implementation of Put and PutAll, both are synchronous. Do we intend to have different behavior for future MetadataStore implementation? If so, I suppose we should follow up this PR with another PR that details the guarantees and expectations of these APIs.

@dnishimura
Copy link
Contributor Author

A few questions to understand the motivation of PutAll

  1. Does Put have to be synchronous? Since flush() is exposed as part of API, what do you think about callers handling it in two steps? Have we evaluated this?
  2. What is the use of flush()? Looking at the implementation of Put and PutAll, both are synchronous. Do we intend to have different behavior for future MetadataStore implementation? If so, I suppose we should follow up this PR with another PR that details the guarantees and expectations of these APIs.

I believe the original intent of the metadata store is to have consistent behavior regardless of the underlying store. Since it's simpler to make async calls synchronous than vice-versa, the expected behavior is synchronous. The intent of the putAll API is to provide an optimization option at least for the short-term until the next-gen metadata store abstraction is designed. That being said, I think flush() should not be exposed as an external API.
All good points you have brought up that we can take into account for the next phase of the metadata store design.

@dnishimura
Copy link
Contributor Author

@mynameborat if you don't have any more comments, please approve and merge this commit. This is required to address an internal performance regression with job startup time. Thanks!
CC: @shanthoosh @prateekm

Copy link
Contributor

@mynameborat mynameborat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. Minor documentation comment.
Let me know when its ready to be merged.

@dnishimura
Copy link
Contributor Author

@mynameborat thanks for the review. I addressed your comment. This is ready to be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments