cache: serialize data emit #85

@salewski

Description

$ ads-github-cache --version
ads-github-cache 0.3.4  (built: 2022-10-16 20:39:02)

Copyright (C) 2020, 2021, 2022 Alan D. Salewski <ads@salewski.email>
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Alan D. Salewski.

In issue #80 and issue #81, the ads-github-cache tool was given the ability to fetch data from the remote GitHub v3 API concurrently, which sped up the update operation considerably. However, that change inadvertently broke operations that emit the cached data when operating in the "online" cache mode (which basically follows a pull-data-then-print approach). Several processes can emit parts of the cached data simultaneously, leaving the output intermixed (and therefore broken). The behavior can sometimes be observed by simply piping the JSON data from an endpoint path into the jq tool:

    $ ads-github-cache --get-cached '/user/repos' | jq '.'
    parse error: Invalid numeric literal at line 1, column 94214

The jq error message indicates that the JSON structure has been corrupted; the particular error can differ, depending on how the output happens to be intermixed on any given run of the program.

If we artificially limit the number of background processes to one, then we effectively serialize the output, which works around the problem:

    $ ads-github-cache -j 1 --get-cached '/user/repos' | jq '.'
    <works>
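
One possible way to serialize the emit without giving up concurrent fetching is to have each background worker write to its own temporary file, and have the parent print those files in order only after all workers have finished. The sketch below is illustrative only: it is not taken from the ads-github-cache source, and the names `fetch_page` and `emit_serialized` are hypothetical stand-ins.

```shell
# Hypothetical sketch; NOT the actual ads-github-cache implementation.

# Stand-in for whatever a background worker does to produce one chunk
# of output (assumption: one chunk per "page" number).
fetch_page() {
    printf '{"page": %d}\n' "$1"
}

# Run one worker per argument concurrently, but emit the results on
# stdout one at a time, in argument order. The function body is a
# subshell, so 'wait' only waits for the workers started here and the
# variables do not leak into the caller.
emit_serialized() (
    tmpdir=$(mktemp -d) || exit 1

    for page in "$@"; do
        # Each worker writes to its own file, never directly to stdout.
        fetch_page "$page" > "${tmpdir}/${page}.out" &
    done

    # Block until every worker has finished writing its file.
    wait

    # Only the parent writes to stdout, so chunks cannot interleave.
    for page in "$@"; do
        cat "${tmpdir}/${page}.out"
    done

    rm -rf "$tmpdir"
)

emit_serialized 1 2 3
```

With this arrangement the fetches still run in parallel; only the final print step is sequential, so a consumer such as jq would receive each chunk intact.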
