Changes the processes endpoint way of exposing files #99
tdonohue merged 4 commits into DSpace:master
abollini
left a comment
Thanks for the updated contract, it makes it easier to identify the key changes. Please see comments inline.
processes-endpoint.md
Outdated
This endpoint will let an administrator download an output file created by a process. If the file is found, it will be presented as a download. This endpoint will support "Range" HTTP headers so that downloads can be paused and resumed.
This endpoint will return a list of files that are associated with the process, this can be files uploaded by the end user for imports or files generated by the script like for exports of data.
The files are grouped by "type", the type can be anything but they refer to what the file represents, is a mapping file, an exported csv file, ...
I suggest creating a separate documentation page where we list the defined types by script, similar to what has been done for the submission sections; see https://github.com/DSpace/Rest7Contract/blob/master/submissionsection-types.md
It would be quite some extra work to create an endpoint that specifies all types of all scripts (plus the API changes needed to retrieve this information per script).
Documenting it without an API would only be useful as an example of how it works, not as an exhaustive list of all types, since each script can define its own types and such documentation would tend to always be outdated.
I agree that we can keep this simple here, so the current approach is fine as suggested by @benbosman .
However, I think this area of the Contract could use a bit more explanation. For instance, when I read this I'm not sure what we mean by "files are grouped by type"? It looks like the only type field in the below example just says everything is a "bitstream".
That said, I think this may be referencing the dspace.process.type metadata field? If so, we may want to reword this to refer to "files are grouped by process type" or similar.
UPDATE: Now that I look closer here, what do we mean by dspace.process.type anyways? I thought that was supposed to refer to the type of Process, but in the example here it looks like the type belongs more to the file (as you imply in the example below that multiple files belonging to the same process can have different dspace.process.type values)? If this "type" is more a type of File, we probably should rename the field here to be dspace.process.filetype or similar. That will clarify that the type belongs to the file and not the process.
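To make the distinction concrete, here is a minimal Python sketch of grouping a process's files by a type stored on each file. The record layout, the sample file names, the type values and the field name `dspace.process.filetype` (the rename proposed above) are all assumptions for illustration, not the contract's actual response format.

```python
from collections import defaultdict

# Hypothetical file records; names and type values are illustrative only.
files = [
    {"name": "mapping.txt", "metadata": {"dspace.process.filetype": "mapping"}},
    {"name": "export.csv", "metadata": {"dspace.process.filetype": "export"}},
]


def group_by_filetype(records, field="dspace.process.filetype"):
    """Group file names by the type stored on each file (not on the process)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["metadata"][field]].append(record["name"])
    return dict(groups)


grouped = group_by_filetype(files)
```

In this sketch the type is read from each file's own metadata, which is why a file-level field name reads more naturally than a process-level one.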
tdonohue
left a comment
@benbosman : Thanks for your updated comments here. I've added some responses inline below. I think we are mostly in agreement that this Contract PR is necessary, but I have some minor suggestions on how to refactor it to align with other endpoints & a possible rename for the "dspace.process.type" (based on what I think it's meant to represent)
processes-endpoint.md
Outdated
**GET /api/system/processes/<:process-id>/files/<:file-name>**
This endpoint will let an administrator download an output file created by a process. If the file is found, it will be presented as a download. This endpoint will support "Range" HTTP headers so that downloads can be paused and resumed.
## Execution File Output List (type filter)
We should not call this a "type filter" or even a "List", as both of these imply multiple files might be returned from this endpoint. Let's rename this to be something like:
Execution File Output (using type identifier)
That helps to clarify that Type is an identifier here, and only one file will be returned based on the type.
Small change in the way the processes endpoint exposes files. It uses actual bitstreams instead of custom output.
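The quoted contract text says the download endpoint supports "Range" HTTP headers so that downloads can be paused and resumed. A minimal Python sketch of how a client might build that header when resuming could look like the following; the helper name and the byte count are hypothetical, and only the standard `bytes=<start>-` form of the Range header is assumed.

```python
def resume_range_header(bytes_already_downloaded: int) -> dict:
    """Build the header asking the server for the remainder of a file."""
    if bytes_already_downloaded < 0:
        raise ValueError("byte offset must be non-negative")
    # "bytes=N-" requests everything from offset N to the end of the file.
    return {"Range": f"bytes={bytes_already_downloaded}-"}


# Hypothetical case: 1024 bytes were saved before the download was paused.
headers = resume_range_header(1024)
```

A server that honors the request answers with status 206 (Partial Content); a plain 200 response means the whole file is being sent again.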