Add maintenance:repair step for file size of unencrypted files#36220
Add maintenance:repair step for file size of unencrypted files#36220simonspa wants to merge 7 commits intonextcloud:masterfrom
Conversation
Signed-off-by: Simon Spannagel <simonspa@kth.se>
Signed-off-by: Simon Spannagel <simonspa@kth.se>
Signed-off-by: Simon Spannagel <simonspa@kth.se>
Signed-off-by: Simon Spannagel <simonspa@kth.se>
PVince81
left a comment
There was a problem hiding this comment.
👍 from me, I hope the second reviewer knows more about DB perf
good luck appeasing the bots 😄
thanks a lot!
Co-authored-by: Anna <anna@nextcloud.com> Signed-off-by: simonspa <1677436+simonspa@users.noreply.github.com>
|
Drone failures look related |
| ->where($update->expr()->eq('encrypted', "0", IQueryBuilder::PARAM_INT)) | ||
| ->andWhere($update->expr()->neq('unencrypted_size', "0", IQueryBuilder::PARAM_INT)); |
There was a problem hiding this comment.
There is no index on neither of those columns, so we may want to add at least to the unencrypted size.
I think the most safe way to use such an index then would be:
| ->where($update->expr()->eq('encrypted', "0", IQueryBuilder::PARAM_INT)) | |
| ->andWhere($update->expr()->neq('unencrypted_size', "0", IQueryBuilder::PARAM_INT)); | |
| ->andWhere($update->expr()->gt('unencrypted_size', "0", IQueryBuilder::PARAM_INT)); | |
| ->andWhere($update->expr()->eq('encrypted', "0", IQueryBuilder::PARAM_INT)) |
There was a problem hiding this comment.
For the reasoning:
- neq may not use the index
- the order of where statements sometimes also matters to decide if the index is applied
There was a problem hiding this comment.
EXPLAIN of the current query does a full table scan:
explain update oc_filecache set unencrypted_size = 0 where encrypted = 0 and unencrypted_size != 0;
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | oc_filecache | index | NULL | PRIMARY | 8 | NULL | 8479 | Using where |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
There was a problem hiding this comment.
After reversal of arguments and using > query affects smaller number of rows:
explain update oc_filecache set unencrypted_size = 0 where unencrypted_size > 0 and encrypted = 0;
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | oc_filecache | index | NULL | PRIMARY | 8 | NULL | 490 | Using where |
+------+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
There was a problem hiding this comment.
I implemented the suggested change - but adding an index to either of these columns would have to done separately, and probably run manually, right? How would that work together with an automatic repair step?
Signed-off-by: Simon Spannagel <simonspa@kth.se>
574181b to
43da5b0
Compare
|
Performance tests is failing with
|
|
@simonspa this PR looks nice :) I'm reopening in the hopes that you're still interested in pushing this forward! Since you're a member of Nextcloud, could you recreate this PR in a branch directly in the nextcoud/server repo? Feel free and encouraged to check the Nextcloud Commit guidelines. Hope we can keep this cool improvement alive! 🚀 |
Summary
This resets the
unencrypted_sizeof files withencryption = 0to0. As described in the linked issue at length, there seem to be setups that either have or had encryption enabled in the past, where even for unencrypted files theunencrypted_sizecolumn shows non-zero values.These seem to completely confuse the quota calculation and result in partially unusable systems because users might have a huge quota usage reported while the actual used size is just a few MB.
TODO
Checklist