Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

PersistentJSONDict survive with empty file#199

Merged
frerich merged 9 commits into
frerich:masterfrom
Jimilian:master
Nov 23, 2016
Merged

PersistentJSONDict survive with empty file#199
frerich merged 9 commits into
frerich:masterfrom
Jimilian:master

Conversation

@Jimilian
Copy link
Copy Markdown
Contributor

Now PersistentJSONDict stops to work then it finds empty file.
Empty file could be generated due some issues (OOM killer, aborted builds in CI, etc).

@codecov-io
Copy link
Copy Markdown

codecov-io commented Aug 10, 2016

Current coverage is 89.02% (diff: 100%)

Merging #199 into master will increase coverage by 0.05%

@@             master       #199   diff @@
==========================================
  Files             1          1          
  Lines          1006       1011     +5   
  Methods           0          0          
  Messages          0          0          
  Branches        164        166     +2   
==========================================
+ Hits            895        900     +5   
+ Misses           83         82     -1   
- Partials         28         29     +1   

Powered by Codecov. Last update a84b2ae...166551a

@webmaster128
Copy link
Copy Markdown
Contributor

Looks good. Could you please

  1. Move test to unittests. integrationtests are used to call clcache via the command line interface.
  2. Check in an empty_file.txt in git and don't create it in the test. This simplifies the test code

@webmaster128
Copy link
Copy Markdown
Contributor

At least for Python 3.4, a ValueError is raised

If the data being deserialized is not a valid JSON document, a ValueError will be raised.

https://docs.python.org/3.4/library/json.html#json.load

Python 3.5 introduces a subclass json.JSONDecodeError

If the data being deserialized is not a valid JSON document, a JSONDecodeError will be raised.

https://docs.python.org/3.5/library/json.html#json.load

There is nothing documented for Python 3.3: https://docs.python.org/3.3/library/json.html#json.load

So I think we need to catch ValueError and hope it's the same for Python 3.3.

@Jimilian
Copy link
Copy Markdown
Contributor Author

@webmaster128, yes, it's same. I checked it with python3.3. I think it's better to squash my commits :)

@webmaster128
Copy link
Copy Markdown
Contributor

I see that we can easily treat an empty file like a non-existing file. But do we really want to silently throw away broken JSON content? I mean, whenever a user-edited JSON file misses a comma, the entire content is replaced with default values and no error is shown at all.

Are half-written JSON files a real world problem?

@Jimilian
Copy link
Copy Markdown
Contributor Author

Jimilian commented Aug 10, 2016

I can change behaviour:

  • if content is broken - print content, warning and continue execution.
  • if file is empty - just ignore.

I know that we had empty file two hours ago after 2-3 aborted execution, and I don't see big difference in probability between empty file and half-written, because root cause is same.

I'm thinking mainly about CI system. Nobody should try to edit anything manually there :)

@Jimilian
Copy link
Copy Markdown
Contributor Author

@webmaster128 , yes, it looks like half-written JSON file is real world problem.

         Traceback (most recent call last):
           File "clcache.py", line 1518, in <module>
           File "clcache.py", line 1409, in main
           File "clcache.py", line 1439, in processCompileRequest
           File "clcache.py", line 1472, in processDirect
           File "clcache.py", line 129, in getManifest
           File "json\__init__.py", line 268, in load
           File "json\__init__.py", line 318, in loads
           File "json\decoder.py", line 343, in decode
           File "json\decoder.py", line 359, in raw_decode
         ValueError: Expecting ',' delimiter: line 279 column 112 (char 32708)
         Failed to execute script clcache

I know that it's a different part of code, but root cause is same. It's interesting that job was not aborted in this time, so, file should be ok. But it's not.

I would very appreciate any ideas how to fix such issues or at least minimise negative effect. Because I think we can't just ignore broken manifest file, because it will break existed cache.

@webmaster128
Copy link
Copy Markdown
Contributor

This one is tricky ;) I did not answer yet because I did not have a good idea.

Is line 279 column 112 (char 32708) the position in the JSON file?

I think manifest are somehow easy, because you can just throw away broken ones and let them be recreated.

But why the hack are your ccache processes crashing? Wouldn't the solution be to run less jobs in parallel or upgrade ram?

@Jimilian
Copy link
Copy Markdown
Contributor Author

Jimilian commented Aug 11, 2016

Is line 279 column 112 (char 32708) the position in the JSON file?

Yes, we have a lot of files in repo :)
So, it's unreal to write file in one call.

But why the hack are your ccache processes crashing?

Very good question, I'm not familiar with Windows IO API at all, so I even don't have any ideas. All calls of manifestSection.setManifest is under lock defence. json.dump didn't raise any exception, it means that system call passed and returned OK. So I really don't know why...

Wouldn't the solution be to run less jobs in parallel or upgrade ram?

Yes, right now we are running a lot of jobs in parallel, but in any case file could be broken in multiple ways in Continuous Integration System. The most stupid one (and frequently one I think) is not enough free space. We can decrease number of jobs or rent new nodes, but it would not guarantee anything. Because in last case build passed, clcache didn't print any issues, but manifest file became broken.

@webmaster128
Copy link
Copy Markdown
Contributor

Yes, we have a lot of files in repo :)

(off topic but still interesting) The amount of line in a given manifest is the number of included files for one compilation unit (a .c/.cpp file). This is typically 100–1000 includes per compilation unit, no matter how many compilation units there are.

We can decrease number of jobs or rent new nodes

Do you share the cache between different machines? This is not supported

@Jimilian
Copy link
Copy Markdown
Contributor Author

No, not yet at least. My colleague is working on such support, but now we are using pure clcache.pyfrom master. Each node has one instance of cache and each node can execute only one build at same time. One thing that could be interesting - we are using clcache as a frontend for Incdedibuild. That's why we have a lot of jobs/processes.

@webmaster128
Copy link
Copy Markdown
Contributor

we are using clcache as a frontend for Incdedibuild. That's why we have a lot of jobs/processes

What does IncrediBuild do exactly? Does clcache call IncrediBuild or does IncrediBuild call clcache?

@Jimilian
Copy link
Copy Markdown
Contributor Author

Incredebuild calls ccache

@Jimilian
Copy link
Copy Markdown
Contributor Author

So, I can split this PR on two:

  • ignore empty files
  • ignore broken files

In this way you can merge first PR and keep second one in pending status for awhile.

@webmaster128
Copy link
Copy Markdown
Contributor

We need to wait for frerich anyway, but it makes definitely sense to me. Treating empty files like non-existing ones should be safe to do.

@frerich
Copy link
Copy Markdown
Owner

frerich commented Aug 22, 2016

Thanks for your work, I'm now back from vacation and slowly catching up on what happened during the last weeks.

This PR seems to deal with a special case of the cache getting corrupted, right? I.e. a very similar situation might arise if e.g. an object file is missing from a cache entry or something like that.

In general, I'd love if we could reduce the chance of this happening in the first place. Maybe it would be sufficient to write the JSON file (or any other files) to a temporary location and do a final 'rename' step (which is very quick) to reduce the likelyhood of being interrupted?

That aside, of course it would be good if clcache would not choke on corrupted caches. In the case of the JSON files, if they are unexpected in some way (e.g. they don't exist, the JSON is syntactically invalid or some expected entry is missing) - how about we just log a warning message and go with some defaults? E.g. both empty or missing JSON files correspond to a file with all the fields set to zero.

The only concern I have is that it's easy to miss warning messages during a build with thousands of compiler invocations. Future queries for the statistics won't show that these statistics don't actually reflect the real situation, so you may wonder why there should be statistics at all...

@Jimilian
Copy link
Copy Markdown
Contributor Author

First of all, I'm totally agree that it's better to reduce chance to corrupt file, but:

  1. I'm not sure that rename is atomic on Windows( http://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows).
  2. I'm not sure that we can achieve it (because some bugs would be in code in any case)

According your suggestion:

  1. We are using clcache in our CI system. We build more than 5k builds with clcache + this PR and everything looks good. Without this PR node started to fail after 10 builds, because it was not able to recover json. If I understood correctly, empty file / default value and non-existed file are same thing on the most of cases.
  2. We can introduce new statistics - something like "clcache warnings" - and increment it only in such cases.

@Jimilian
Copy link
Copy Markdown
Contributor Author

Just short note. On last master still incorrect json could be produced (in this case it was empty file).

13:28:23 Traceback (most recent call last):
13:28:23   File "C:\Python34\lib\site-packages\cx_Freeze\initscripts\Console.py", line 27, in <module>
13:28:23   File "clcache.py", line 1639, in <module>
13:28:23   File "clcache.py", line 1466, in main
13:28:23   File "clcache.py", line 1317, in cleanCache
13:28:23   File "clcache.py", line 597, in __enter__
13:28:23   File "clcache.py", line 508, in __init__
13:28:23   File "C:\Python34\lib\json\__init__.py", line 268, in load
13:28:23   File "C:\Python34\lib\json\__init__.py", line 318, in loads
13:28:23   File "C:\Python34\lib\json\decoder.py", line 343, in decode
13:28:23   File "C:\Python34\lib\json\decoder.py", line 361, in raw_decode
13:28:23   ValueError: Expecting value: line 1 column 1 (char 0)

I will re-work this change to keep the safest behaviour (empty file == doesn't exist) and provide new PR.

@frerich
Copy link
Copy Markdown
Owner

frerich commented Nov 23, 2016

I'm sorry, I must admit that the lack of activity in this PR made me kind of forget about it... it would be much appreciated if you could rebase the PR so that we can resume the work again. I'll re-read the past comments later today to refresh my memories of what this is all about :-)

@frerich frerich merged commit 41354f1 into frerich:master Nov 23, 2016
@frerich
Copy link
Copy Markdown
Owner

frerich commented Nov 23, 2016

Thanks! I think this is certainly moving things into the right direction. Ideally, I'd love to be able to avoid this issue in the common case (some research suggests that the ReplaceFile function on Windows would be useful for this purpose), but it's better to have at least some handling in place than none. Thanks a lot for bringing this up again1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants