-
Notifications
You must be signed in to change notification settings - Fork 319
BigQuery: fix an issue with option propagation and refactor to future-proof #540
BigQuery: fix an issue with option propagation and refactor to future-proof #540
Conversation
…-proof * We created a helper in BigQueryIO to create a JobConfigurationQuery capturing all options, but we had not yet propagated this cleanup into the Services abstraction or helper classes. Refactor BigQueryServices and BigQueryTableRowIterator to propagate the same configuration. Adds a new deprecated constructor to BigQueryTableRowIterator for backwards-compatibility. This fixes #539.
|
@peihe given the amount of code divergence, they both need careful review. Please review here |
|
But, I think we should still do the Beam first, otherwise it will diverge further more. And, I think forward ports PRs could cause additional inconvenience during review and backport. |
|
commented on apache/beam#1873 Let's get Beam PR LGTMed, and then update this accordingly. |
|
Update this PR based on apache/beam#1873? |
|
The code is substantially different here, plus we are unable to make backwards-incompatible changes. Needs separate review. |
peihe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments
I am okay to review it separately, as long as we make it clear that we don't need to keep Dataflow code less divergent from Beam code.
| return new JobConfigurationQuery() | ||
| .setQuery(query.get()) | ||
| .setFlattenResults(flattenResults) | ||
| .setPriority("BATCH") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert BATCH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| private JobConfigurationQuery createBasicQueryConfig() { | ||
| // Due to deprecated functionality, if this function is updated | ||
| // then the similar code in BigQueryTableRowIterator#fromQuery should be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that better to have a
public static createBasicQueryConfig(String query, boolean flattenResults, boolean useLegacySql)
and, use it in two places?
| checkNotNull(queryConfig, "queryConfig"); | ||
| checkNotNull(projectId, "projectId"); | ||
| checkNotNull(client, "client"); | ||
| return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
null /* ref */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
| .setPriority("BATCH") | ||
| .setQuery(query) | ||
| .setUseLegacySql(MoreObjects.firstNonNull(useLegacySql, Boolean.TRUE)); | ||
| return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
null /* ref */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
dhalperi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL @peihe
| return new JobConfigurationQuery() | ||
| .setQuery(query.get()) | ||
| .setFlattenResults(flattenResults) | ||
| .setPriority("BATCH") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| .setPriority("BATCH") | ||
| .setQuery(query) | ||
| .setUseLegacySql(MoreObjects.firstNonNull(useLegacySql, Boolean.TRUE)); | ||
| return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
| checkNotNull(queryConfig, "queryConfig"); | ||
| checkNotNull(projectId, "projectId"); | ||
| checkNotNull(client, "client"); | ||
| return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
|
LGTM |
|
Thanks! |
We created a helper in BigQueryIO to create a JobConfigurationQuery capturing all options, but we had not yet propagated this cleanup into the Services abstraction or helper classes.
Refactor BigQueryServices and BigQueryTableRowIterator to propagate the same configuration.
Adds a new deprecated constructor to BigQueryTableRowIterator for backwards-compatibility.
This fixes #539.