ARROW-2066: [Python] Document using pyarrow with Azure Blob Store#1544
ARROW-2066: [Python] Document using pyarrow with Azure Blob Store#1544rjrussell77 wants to merge 25 commits intoapache:masterfrom
Conversation
rjrussell77
left a comment
There was a problem hiding this comment.
@wesm Can you review this? (I had difficulty with getting the formatting down. Adding a sub-bullet list item caused the parent bullet to become italicized. Eventually omitted the sub-bullet.)
|
@xhochy Uwe - can you review? |
python/doc/source/parquet.rst
Outdated
|
|
||
| block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key) | ||
| with tempfile.TemporaryFile() as fp: | ||
| block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=fp) |
There was a problem hiding this comment.
This should actually work without a temporary file, see the implementation in simplekv: https://github.com/mbr/simplekv/blob/master/simplekv/net/azurestore.py#L74
There was a problem hiding this comment.
Ok - I'll take a look. Thanks.
python/doc/source/parquet.rst
Outdated
| byte_stream = io.BytesIO() | ||
| block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream) | ||
| pd = pq.read_table(source=byte_stream).to_pandas() | ||
| pd.head(10) |
There was a problem hiding this comment.
@xhochy Good feedback - I replaced the temp file buffer with BytesIO stream instead.
| print("Error: {0}".format(err)) | ||
| finally: | ||
| byte_stream.close() | ||
|
|
There was a problem hiding this comment.
Added try/except/finally block to ensure closure of the stream
There was a problem hiding this comment.
Can you add this comment to the code? That will also be helpful for the reader later.
python/doc/source/parquet.rst
Outdated
| block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key) | ||
| try: | ||
| block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream) | ||
| pd = pq.read_table(source=byte_stream).to_pandas() |
There was a problem hiding this comment.
The result is typically written into a variable called df whereas pd is the abbreviation you use when you import pandas (import pandas as pd)
python/doc/source/parquet.rst
Outdated
| try: | ||
| block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream) | ||
| pd = pq.read_table(source=byte_stream).to_pandas() | ||
| pd.head(10) |
There was a problem hiding this comment.
Better replace this with # Do work on DF …
There was a problem hiding this comment.
How about, # Do work on df (lower case)?
python/doc/source/parquet.rst
Outdated
| block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream) | ||
| pd = pq.read_table(source=byte_stream).to_pandas() | ||
| pd.head(10) | ||
| except Exception as err: |
There was a problem hiding this comment.
Please don't catch exceptions like this, just let it throw.
| print("Error: {0}".format(err)) | ||
| finally: | ||
| byte_stream.close() | ||
|
|
There was a problem hiding this comment.
Can you add this comment to the code? That will also be helpful for the reader later.
…emove head() call and instead use comment to indicate generic fill-in code, add comment re: stream closure in finally block
| finally: | ||
| # Add finally block to ensure closure of the stream | ||
| byte_stream.close() | ||
|
|
There was a problem hiding this comment.
@xhochy Ok, I've responded to your last set of feedback. How are we looking now?
xhochy
left a comment
There was a problem hiding this comment.
+1, thank you for writing this up. This will be very helpful for new users.
|
@rjrussell77 do you have an Apache JIRA id so I could assign https://issues.apache.org/jira/browse/ARROW-2066 to you? |
|
@xhochy Regarding your question about my Apache JIRA id - try this: rob_PTL |
|
Thanks @rjrussell77 -- I added you as a contributor and assigned the issue to you |
Original question:
#1510
Improvement story:
https://issues.apache.org/jira/browse/ARROW-2066