PersistentJSONDict survive with empty file #199
Current coverage is 89.02% (diff: 100%)
@@             master   #199   diff @@
==========================================
  Files             1      1
  Lines          1006   1011     +5
  Methods           0      0
  Messages          0      0
  Branches        164    166     +2
==========================================
+ Hits            895    900     +5
+ Misses           83     82     -1
- Partials         28     29     +1
|
|
Looks good. Could you please
|
|
At least for Python 3.4, a ValueError is raised: https://docs.python.org/3.4/library/json.html#json.load
Python 3.5 introduces a subclass, json.JSONDecodeError: https://docs.python.org/3.5/library/json.html#json.load
There is nothing documented for Python 3.3: https://docs.python.org/3.3/library/json.html#json.load
So I think we need to catch ValueError and hope it's the same for Python 3.3. |
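A minimal sketch of the exception handling discussed above. The helper name `load_or_default` is hypothetical and not part of clcache. It relies only on the fact that `json.JSONDecodeError` (Python 3.5+) is a subclass of `ValueError`, so a single `except ValueError` clause should cover both older and newer interpreters.

```python
import io
import json


def load_or_default(fp, default):
    """Parse JSON from the file object fp; return default if parsing fails.

    Hypothetical helper for illustration only.
    """
    try:
        return json.load(fp)
    except ValueError:
        # Python 3.5+ raises json.JSONDecodeError here, which is a
        # ValueError subclass, so this clause catches both.
        return default


# An empty document is not valid JSON, so the default is returned.
empty_result = load_or_default(io.StringIO(""), {})

# A well-formed document is parsed normally.
valid_result = load_or_default(io.StringIO('{"hits": 3}'), {})
```

Note that this variant also swallows malformed, non-empty JSON, which is the behaviour questioned later in this thread.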
|
@webmaster128, yes, it's the same. I checked it with Python 3.3. I think it's better to squash my commits :) |
|
I see that we can easily treat an empty file like a non-existing file. But do we really want to silently throw away broken JSON content? I mean, whenever a user-edited JSON file misses a comma, the entire content is replaced with default values and no error is shown at all. Are half-written JSON files a real world problem? |
|
I can change behaviour:
I know that we had an empty file two hours ago after 2-3 aborted executions, and I don't see a big difference in probability between an empty file and a half-written one, because the root cause is the same. I'm thinking mainly about CI systems. Nobody should try to edit anything manually there :) |
|
@webmaster128, yes, it looks like half-written JSON files are a real world problem.
Traceback (most recent call last):
File "clcache.py", line 1518, in <module>
File "clcache.py", line 1409, in main
File "clcache.py", line 1439, in processCompileRequest
File "clcache.py", line 1472, in processDirect
File "clcache.py", line 129, in getManifest
File "json\__init__.py", line 268, in load
File "json\__init__.py", line 318, in loads
File "json\decoder.py", line 343, in decode
File "json\decoder.py", line 359, in raw_decode
ValueError: Expecting ',' delimiter: line 279 column 112 (char 32708)
Failed to execute script clcache
I know that it's a different part of the code, but the root cause is the same. Interestingly, the job was not aborted this time, so the file should be OK. But it's not. I would very much appreciate any ideas on how to fix such issues or at least minimise the negative effect, because I think we can't just ignore a broken manifest file: doing so would break the existing cache. |
|
This one is tricky ;) I did not answer yet because I did not have a good idea. I think manifests are somewhat easy, because you can just throw away broken ones and let them be recreated. But why the heck are your clcache processes crashing? Wouldn't the solution be to run fewer jobs in parallel or upgrade the RAM? |
…ent skipping broken json files
Yes, we have a lot of files in repo :)
Very good question. I'm not familiar with the Windows IO API at all, so I don't even have any ideas. All calls of
Yes, right now we are running a lot of jobs in parallel, but in any case a file could be broken in multiple ways in a Continuous Integration system. The most stupid one (and a frequent one, I think) is |
(off topic but still interesting) The number of lines in a given manifest is the number of included files for one compilation unit (a .c/.cpp file). This is typically 100–1000 includes per compilation unit, no matter how many compilation units there are.
Do you share the cache between different machines? This is not supported |
|
No, not yet at least. My colleague is working on such support, but now we are using pure |
What does IncrediBuild do exactly? Does clcache call IncrediBuild or does IncrediBuild call clcache? |
|
IncrediBuild calls clcache |
|
So, I can split this PR into two:
This way you can merge the first PR and keep the second one in pending status for a while. |
|
We need to wait for frerich anyway, but it definitely makes sense to me. Treating empty files like non-existing ones should be safe to do. |
|
Thanks for your work. I'm now back from vacation and slowly catching up on what happened during the last weeks. This PR seems to deal with a special case of the cache getting corrupted, right? I.e. a very similar situation might arise if e.g. an
In general, I'd love it if we could reduce the chance of this happening in the first place. Maybe it would be sufficient to write the JSON file (or any other files) to a temporary location and do a final 'rename' step (which is very quick) to reduce the likelihood of being interrupted?
That aside, of course it would be good if clcache did not choke on corrupted caches. In the case of the JSON files, if they are unexpected in some way (e.g. they don't exist, the JSON is syntactically invalid or some expected entry is missing), how about we just log a warning message and go with some defaults? E.g. both empty and missing JSON files correspond to a file with all the fields set to zero. The only concern I have is that it's easy to miss warning messages during a build with thousands of compiler invocations. Future queries for the statistics won't show that these statistics don't actually reflect the real situation, so you may wonder why there should be statistics at all... |
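The "write to a temporary location, then rename" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not clcache's implementation: the function name `write_json_atomically` is hypothetical, and the sketch assumes the temporary file lives in the same directory as the target so that the final rename stays on one file system.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write data as JSON to path via a temporary file and a final rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, sort_keys=True, indent=4)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # os.replace (Python 3.3+) overwrites an existing destination.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

With this approach an interrupted process would leave either the old file or a leftover temporary file, rather than a half-written target. On Windows, `os.replace` may still fail if another process holds the destination open, so callers would likely need to handle `OSError`.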
|
First of all, I totally agree that it's better to reduce the chance of corrupting the file, but:
Regarding your suggestion:
|
|
Just a short note: on the latest master, incorrect JSON can still be produced (in this case it was an empty file). I will rework this change to keep the safest behaviour (empty file == doesn't exist) and open a new PR. |
|
I'm sorry, I must admit that the lack of activity in this PR made me kind of forget about it... It would be much appreciated if you could rebase the PR so that we can resume the work. I'll re-read the past comments later today to refresh my memory of what this is all about :-) |
|
Thanks! I think this is certainly moving things in the right direction. Ideally, I'd love to be able to avoid this issue in the common case (some research suggests that the |
Currently PersistentJSONDict stops working when it finds an empty file.
An empty file can be generated due to various issues (OOM killer, aborted builds in CI, etc.).
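A rough sketch of the "empty file == doesn't exist" behaviour described in this thread. The class `TolerantJSONDict` is a hypothetical stand-in written for illustration; it is not the real PersistentJSONDict and makes no claim about how clcache implements it. It assumes that whitespace-only files should also count as empty, and it deliberately lets malformed, non-empty JSON raise `ValueError` so that broken content is not silently discarded.

```python
import json


class TolerantJSONDict:
    """Hypothetical dictionary backed by a JSON file (illustration only)."""

    def __init__(self, file_name):
        self._file_name = file_name
        self._dict = {}
        try:
            with open(file_name, "r") as f:
                content = f.read()
        except FileNotFoundError:
            content = ""  # a missing file is treated like an empty one
        if content.strip():
            # Non-empty but malformed JSON still raises ValueError here.
            self._dict = json.loads(content)

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        self._dict[key] = value

    def __contains__(self, key):
        return key in self._dict

    def save(self):
        with open(self._file_name, "w") as f:
            json.dump(self._dict, f, sort_keys=True, indent=4)
```

Separating the "empty" case from the "malformed" case keeps the safe default for aborted writes while still surfacing files that were damaged in some other way.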