Add --recursive option to fs ls command #513
justinTM wants to merge 4 commits into databricks:main

Conversation
pietern left a comment:
Thanks for your PR, this looks useful.
Two general comments:
- It seems to me that recursive implies absolute paths, or at least paths relative to the specified path. As is, it would allow a recursive listing to show only basenames, which doesn't make a lot of sense. Can you make non-absolute recursive listing show relative paths? (See the sketch after this list.)
- Please add a couple of tests for this behavior.
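For illustration (hypothetical paths, not output from this change), the difference the first point is getting at:

```
# Paths relative to the specified directory -- unambiguous:
$ databricks fs ls --recursive dbfs:/logs
driver/stdout
executor-0/stdout

# Basenames only -- the two files become indistinguishable:
$ databricks fs ls --recursive dbfs:/logs
stdout
stdout
```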
databricks_cli/dbfs/cli.py (outdated)

```python
if f.is_dir:
    recursive_echo(this_dbfs_path.join(f.basename))

recursive_echo(dbfs_path) if recursive else echo_path(dbfs_path)
```
This modifies existing behavior to no longer list files in the specified path, but just the path itself.
Hi @pietern, thank you for your helpful feedback! I believe I fixed this in the latest commit:

databricks-cli/databricks_cli/dbfs/api.py, line 94 in 3330faf
databricks_cli/dbfs/cli.py (outdated)

```python
def echo_path(files):
    table = tabulate([f.to_row(is_long_form=l, is_absolute=absolute) for f in files],
                     tablefmt='plain')
```
What happens to tabulate if the max width of these files is different?
@pietern thanks again. I believe tabulate will only be called once now, after the latest commit, and the max file width should be a single value.
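To illustrate the concern (hypothetical rows): tabulate computes column widths per call, so a single call over all rows yields one consistent alignment, whereas a tabulate call per directory would let each directory pick its own widths:

```python
from tabulate import tabulate

rows = [['dbfs:/a/short', 10], ['dbfs:/a/much-longer-name', 20]]
print(tabulate(rows, tablefmt='plain'))
# dbfs:/a/short             10
# dbfs:/a/much-longer-name  20
```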
databricks_cli/dbfs/cli.py (outdated)

```python
for f in files:
    if f.is_dir:
        recursive_echo(this_dbfs_path.join(f.basename))
```
Do we have a rough estimate of how many more requests this would generate? Do we have customers who are already doing this manually, so that we are just giving them a shortcut?
Hi @stormwindy, thanks for your review. I honestly can't say how many more requests it will generate; it's totally dependent on the customer's file structure. It will be one extra request for each directory in the path and each of their subdirectories.
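As a worked example (hypothetical tree): listing a root that directly contains 3 directories, each holding 2 subdirectories with only files below those, issues 1 + 3 + 3 × 2 = 10 list requests in total, one per directory visited.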
I see. What are the use cases for this, from your perspective? Unless there is a crazy-deep folder structure, this shouldn't cause any problems.
Getting the file paths of all worker logs in a cluster: #512
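For example (hypothetical log path; only the --recursive flag itself comes from this PR), one command replaces walking each executor directory by hand:

```
$ databricks fs ls --recursive dbfs:/cluster-logs/0123-456789-abcd123/executor/
```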
Overall, the idea to add this option is good on our end, and I am happy with the overall implementation direction. I have left one comment about the code; after that, @pietern can decide if/when to approve the PR. Thanks a lot for the contribution.
fix Dbfs files assignment
```python
paths = self.client.list_files(*args, **kwargs)
files = [p for p in paths if not p.is_dir]
for p in paths:
    files = files + self._recursive_list(p) if p.is_dir else files
```
I had to read this a couple of times to understand what you're doing. I think it should be simplified to iterate over paths just once, with an if p.is_dir branch inside that handles both the file case and the directory case (see the sketch below).
Separate comment: this also needs to pass along the headers from the **kwargs to be consistent with the list_files API today. Otherwise they are used on the first call but not on later calls.
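A minimal sketch combining both suggestions, assuming FileInfo objects expose is_dir and dbfs_path and that headers is the keyword argument list_files accepts today; this is illustrative, not the PR's actual code:

```python
def _recursive_list(self, dbfs_path, headers=None):
    # Single pass over the listing: collect files directly and
    # expand directories with a recursive call.
    files = []
    for f in self.client.list_files(dbfs_path, headers=headers):
        if f.is_dir:
            # Forward headers so nested calls behave like the first one.
            files.extend(self._recursive_list(f.dbfs_path, headers=headers))
        else:
            files.append(f)
    return files
```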
```python
assert len(files) == 2
assert TEST_FILE_INFO0 == files[0]
assert TEST_FILE_INFO1 == files[1]
```
There doesn't seem to be a test for a real recursive call.
The variable TEST_FILE_JSON2 is unused. When you add a test that uses it and makes a real recursive call, I expect it to fail for the reason I commented on above: the mismatch between FileInfo and DbfsPath.
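A sketch of what such a test might look like, assuming a dbfs_api fixture whose client.list_files is a mock supporting side_effect, plus hypothetical fixtures TEST_DIR_INFO (a directory entry) and TEST_FILE_INFO2 (built from the unused TEST_FILE_JSON2); names and wiring are illustrative:

```python
def test_recursive_list(dbfs_api):
    # Two successive list_files calls: the top-level listing returns a
    # file and a directory; the nested call returns the directory's file.
    dbfs_api.client.list_files.side_effect = [
        [TEST_FILE_INFO0, TEST_DIR_INFO],
        [TEST_FILE_INFO2],
    ]
    files = dbfs_api._recursive_list(DbfsPath('dbfs:/'))
    assert files == [TEST_FILE_INFO0, TEST_FILE_INFO2]
```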