Skip to content

Changes the processes endpoint way of exposing files#99

Merged
tdonohue merged 4 commits intoDSpace:masterfrom
atmire:processes-endpoint-file-changes
Jun 15, 2020
Merged

Changes the processes endpoint way of exposing files#99
tdonohue merged 4 commits intoDSpace:masterfrom
atmire:processes-endpoint-file-changes

Conversation

@KevinVdV
Copy link
Copy Markdown
Member

Small change in the way process endpoint exposes files. It uses actual bitstreams instead of having custom output.

Copy link
Copy Markdown
Member

@abollini abollini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updated contract, it makes easier to identify the key changes. Please see comments inline.

This endpoint will let an administrator download an output file created by a process. If the file is found, it will be presented as a download. This endpoint will support "Range" HTTP headers so that downloads can be paused and resumed.

This endpoint will return a list of files that are associated with the process, this can be files uploaded by the end user for imports or files generated by the script like for exports of data.
The files are grouped by "type", the type can be anything but they refer to what the file represents, is a mapping file, an exported csv file, ...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to create a separate documentation page where we list the defined types by script similar to what has been done for the submission sections, see https://github.com/DSpace/Rest7Contract/blob/master/submissionsection-types.md

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be quite some extra work to create an endpoint which specifies all types of all scripts (and also the API changes to be able to retrieve this information per scripts)
Just documenting it without an API would be something that's only useful to explain how it works (an example), but not an elaborate list of all types since each script can defined their own types and such documentation would tend to be always outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we can keep this simple here, so the current approach is fine as suggested by @benbosman .

However, I think this area of the Contract could use a bit more explanation. For instance, when I read this I'm not sure what we mean by "files are grouped by type"? It looks like the only type field in the below example just says everything is a "bitstream".

That said, I think this is may be referencing the dspace.process.type metadata field? If so, we may want to reword this to refer to "files are grouped by process type" or similar.

UPDATE: Now that I look closer here, what do we mean by dspace.process.type anyways? I thought that was supposed to refer to the type of Process, but in the example here it looks like the type belongs more to the file (as you imply in the example below that multiple files belonging to the same process can have different dspace.process.type values)? If this "type" is more a type of File, we probably should rename the field here to be dspace.process.filetype or similar. That will clarify that the type belongs to the file and not the process.

Copy link
Copy Markdown
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbosman : Thanks for your updated comments here. I've added some responses inline below. I think we are mostly in agreement that this Contract PR is necessary, but I have some minor suggestions on how to refactor it to align with other endpoints & a possible rename for the "dspace.process.type" (based on what I think it's meant to represent)

This endpoint will let an administrator download an output file created by a process. If the file is found, it will be presented as a download. This endpoint will support "Range" HTTP headers so that downloads can be paused and resumed.

This endpoint will return a list of files that are associated with the process, this can be files uploaded by the end user for imports or files generated by the script like for exports of data.
The files are grouped by "type", the type can be anything but they refer to what the file represents, is a mapping file, an exported csv file, ...
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we can keep this simple here, so the current approach is fine as suggested by @benbosman .

However, I think this area of the Contract could use a bit more explanation. For instance, when I read this I'm not sure what we mean by "files are grouped by type"? It looks like the only type field in the below example just says everything is a "bitstream".

That said, I think this is may be referencing the dspace.process.type metadata field? If so, we may want to reword this to refer to "files are grouped by process type" or similar.

UPDATE: Now that I look closer here, what do we mean by dspace.process.type anyways? I thought that was supposed to refer to the type of Process, but in the example here it looks like the type belongs more to the file (as you imply in the example below that multiple files belonging to the same process can have different dspace.process.type values)? If this "type" is more a type of File, we probably should rename the field here to be dspace.process.filetype or similar. That will clarify that the type belongs to the file and not the process.

Copy link
Copy Markdown
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KevinVdV: Thanks for the updates! This looks good now, except for one very minor thing (inline below). I'll approve this anyways, as it looks good. Just would appreciate the minor update before we merge!

**GET /api/system/processes/<:process-id>/files/<:file-name>**

This endpoint will let an administrator download an output file created by a process. If the file is found, it will be presented as a download. This endpoint will support "Range" HTTP headers so that downloads can be paused and resumed.
## Execution File Output List (type filter)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not call this a "type filter" or even a "List", as both of these imply multiple files might be returned from this endpoint. Let's rename this to be something like:
Execution File Output (using type identifier)

That helps to clarify that Type is an identifier here, and only one file will be returned based on the type.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change.

@tdonohue
Copy link
Copy Markdown
Member

Per discussion in this week's meeting, I'm merging this with one approval. @abollini 's feedback was already addressed & he approved merging this in last week's meeting (once all feedback was addressed)

Thanks @KevinVdV !

@tdonohue tdonohue merged commit 8b85196 into DSpace:master Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants