[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status#16294
[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status#16294tdas wants to merge 8 commits into
Conversation
|
Test build #70182 has started for PR 16294 at commit |
|
Test build #70218 has finished for PR 16294 at commit
|
| <div data-lang="scala" markdown="1"> | ||
|
|
||
| {% highlight scala %} | ||
| import spark.implicits._ |
There was a problem hiding this comment.
Just curious, will there be a complete example in the examples folder? In documents like ML, SQL, the code is cited from the example file instead of hard code in the document. Thanks!
There was a problem hiding this comment.
I wasnt planning to adding examples in this PR to keep this just about docs. I am happy to see simple examples contributed as PRs from the community.
|
Hi, @tdas . Is it possible for you to update this statement for Apache Spark 2.1?
https://github.com/apache/spark/blame/master/docs/structured-streaming-programming-guide.md#L13 |
|
will do. i am rewriting parts of this PR right now. |
|
Test build #70500 has finished for PR 16294 at commit
|
|
Test build #70525 has finished for PR 16294 at commit
|
|
Test build #70524 has finished for PR 16294 at commit
|
|
^^ The last failure on build 70524 is for a older commit. I reverted the culprit change and build 70525 passed. |
zsxwing
left a comment
There was a problem hiding this comment.
Looks good overall. Just some nits.
| } | ||
| @Overrides void onQueryProgress(QueryProgressEvent queryProgress) { | ||
| System.out.println("Query made progress: " + queryProgress.queryStatus); | ||
| System.out.println("Query made progress: " + queryProgress.progress); |
| } | ||
| @Overrides void onQueryTerminated(QueryTerminatedEvent queryTerminated) { | ||
| System.out.println("Query terminated: " + queryTerminated.queryStatus.name); | ||
| System.out.println("Query terminated: " + queryTerminated.id); |
|
|
||
| @Overrides void onQueryStarted(QueryStartedEvent queryStarted) { | ||
| System.out.println("Query started: " + queryTerminated.queryStatus.name); | ||
| System.out.println("Query started: " + queryTerminated.id); |
|
|
||
| override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { | ||
| println("Query started: " + queryTerminated.queryStatus.name) | ||
| println("Query started: " + queryTerminated.id) |
There was a problem hiding this comment.
nit: queryTerminated -> queryStarted
| {% highlight python %} | ||
| query = ... // a StreamingQuery | ||
| query = ... # a StreamingQuery | ||
| print(query.progress) |
| latency.getBatch.source: 20 | ||
| Sink status - MySink | ||
| Committed offsets: [1, -] | ||
| System.out.println(query.status); |
| StreamingQuery query = ... | ||
|
|
||
| System.out.println(query.status); | ||
| System.out.println(query.progress); |
| {% highlight scala %} | ||
| val query: StreamingQuery = ... | ||
|
|
||
| println(query.progress) |
| preserves all data in the Result Table. | ||
| </td> | ||
| </tr> | ||
| <tr> |
There was a problem hiding this comment.
same reason as the other place.
| </tr> | ||
| </tr> | ||
| <tr> | ||
| <td></td> |
There was a problem hiding this comment.
empty row to make sure that there is line after the table when rendered. otherwise looks odd.
| } | ||
| @Overrides void onQueryProgress(QueryProgressEvent queryProgress) { | ||
| System.out.println("Query made progress: " + queryProgress.progress); | ||
| System.out.println("Query made progress: " + queryProgress.lastProgress()); |
There was a problem hiding this comment.
nit: the method is QueryProgressEvent.progress
|
Test build #70529 has finished for PR 16294 at commit
|
|
LGTM pending tests |
|
Test build #70530 has finished for PR 16294 at commit
|
…egarding watermarking and status ## What changes were proposed in this pull request? - Extended the Window operation section with code snippet and explanation of watermarking - Extended the Output Mode section with a table showing the compatibility between query type and output mode - Rewrote the Monitoring section with updated jsons generated by StreamingQuery.progress/status - Updated API changes in the StreamingQueryListener example TODO - [x] Figure showing the watermarking ## How was this patch tested? N/A ## Screenshots ### Section: Windowed Aggregation with Event Time <img width="927" alt="screen shot 2016-12-15 at 3 33 10 pm" src="https://cloud.githubusercontent.com/assets/663212/21246197/0e02cb1a-c2dc-11e6-8816-0cd28d8201d7.png">  <img width="929" alt="screen shot 2016-12-15 at 3 33 46 pm" src="https://cloud.githubusercontent.com/assets/663212/21246202/1652cefa-c2dc-11e6-8c64-3c05977fb3fc.png"> ---------------------------- ### Section: Output Modes  ---------------------------- ### Section: Monitoring   Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #16294 from tdas/SPARK-18669. (cherry picked from commit 092c672) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
This version is built from the docs source code generated by applying apache/spark#16294 to v2.1.0 (so, other changes in branch 2.1 will not affect the doc).
…egarding watermarking and status ## What changes were proposed in this pull request? - Extended the Window operation section with code snippet and explanation of watermarking - Extended the Output Mode section with a table showing the compatibility between query type and output mode - Rewrote the Monitoring section with updated jsons generated by StreamingQuery.progress/status - Updated API changes in the StreamingQueryListener example TODO - [x] Figure showing the watermarking ## How was this patch tested? N/A ## Screenshots ### Section: Windowed Aggregation with Event Time <img width="927" alt="screen shot 2016-12-15 at 3 33 10 pm" src="https://cloud.githubusercontent.com/assets/663212/21246197/0e02cb1a-c2dc-11e6-8816-0cd28d8201d7.png">  <img width="929" alt="screen shot 2016-12-15 at 3 33 46 pm" src="https://cloud.githubusercontent.com/assets/663212/21246202/1652cefa-c2dc-11e6-8c64-3c05977fb3fc.png"> ---------------------------- ### Section: Output Modes  ---------------------------- ### Section: Monitoring   Author: Tathagata Das <tathagata.das1565@gmail.com> Closes apache#16294 from tdas/SPARK-18669.
…egarding watermarking and status ## What changes were proposed in this pull request? - Extended the Window operation section with code snippet and explanation of watermarking - Extended the Output Mode section with a table showing the compatibility between query type and output mode - Rewrote the Monitoring section with updated jsons generated by StreamingQuery.progress/status - Updated API changes in the StreamingQueryListener example TODO - [x] Figure showing the watermarking ## How was this patch tested? N/A ## Screenshots ### Section: Windowed Aggregation with Event Time <img width="927" alt="screen shot 2016-12-15 at 3 33 10 pm" src="https://cloud.githubusercontent.com/assets/663212/21246197/0e02cb1a-c2dc-11e6-8816-0cd28d8201d7.png">  <img width="929" alt="screen shot 2016-12-15 at 3 33 46 pm" src="https://cloud.githubusercontent.com/assets/663212/21246202/1652cefa-c2dc-11e6-8c64-3c05977fb3fc.png"> ---------------------------- ### Section: Output Modes  ---------------------------- ### Section: Monitoring   Author: Tathagata Das <tathagata.das1565@gmail.com> Closes apache#16294 from tdas/SPARK-18669.
What changes were proposed in this pull request?
TODO
How was this patch tested?
N/A
Screenshots
Section: Windowed Aggregation with Event Time
Section: Output Modes
Section: Monitoring