Conversation
|
@butonic, thanks for your PR! By analyzing the annotation information on this pull request, we identified @nickvergessen, @VicDeo and @LukasReschke to be potential reviewers |
There was a problem hiding this comment.
this will not help to quarantee parallelity
There was a problem hiding this comment.
It does not help to not explain why ...
There was a problem hiding this comment.
Sorry - too many things to do in parallel ....
What you need to do is to read and set the value in an atomic matter so that a second process will get the incremented value.
A transaction will not help in terms of atomicity - only a lock can help.
But I fear locking the appconfig table might have bad side effects ....
There was a problem hiding this comment.
What if we use something like "update set cronjob_user_offset=newvalue ... where cronjob_user_offset=oldvalue" and detect if any entry got changed and abort if it didn't, which would mean a concurrent job took over.
Doesn't sound safe though...
|
@DeepDiver1975 jenkins has 5 jobs failing related to federated sharing. Is there already a PR to fix them? |
this was merged to master a few minutes ago with #25769 |
f9f97bd to
bc1daaa
Compare
|
I agree with the general idea, hopefully we can find a solution for the offset's atomicity |
|
Ah interesting, I see we have an API to lock tables: https://github.com/owncloud/core/blob/v9.1.0/lib/private/BackgroundJob/JobList.php#L191 Maybe need to add one to lock rows too ? But might not work cross-DB... |
|
We could calculate the offset in the update and use a transaction to get the new value in a subsequent select query: $this->connection->beginTransaction();
$sql = 'UPDATE `*PREFIX*appconfig`
SET `configvalue` = TO_CLOB(TO_NUMBER(`configvalue`) + ?)
WHERE `appid` = ? AND `configkey` = ?';
$this->connection->executeUpdate($sql, array($amount, $appId, $key));
$sql = 'SELECT `configvalue`
FROM `*PREFIX*oc_appconfig`
WHERE `appid` = ? AND `configkey` = ?';
$result = $this->connection->executeQuery($sql, array($appId, $key));
$this->connection->commit();This one is specific to oracle because we need some additional CAST magic but you should get the idea. Calculating the new value in the UPDATE statement makes the change atomic, the transaction guarantees that we have our new value. |
bc1daaa to
08e3687
Compare
|
I think I have it ... um well ... implemented only for oracle. The principle should work for other dbs as well. I'll wait for a review before adding that. And don't even start with "but this is db specific it should go into the db layer" ... whatever, PR welcome. Using the update as the first statement in the transaction will cause other jobs to wait for the row lock to be released. I wait with the commit until I have checked that the user backend has at least one user at the offset. At least that is my understanding. @DeepDiver1975 @PVince81 @mrow4a comments? |
|
for reference: http://blog.2ndquadrant.com/postgresql-anti-patterns-read-modify-write-cycles/ describes the update problem |
| $sql = 'SELECT `configvalue` | ||
| FROM `*PREFIX*appconfig` | ||
| WHERE `appid` = ? AND `configkey` = ?'; | ||
| $result = $connection->executeQuery($sql, array('files_trashbin', 'cronjob_user_offset')); |
There was a problem hiding this comment.
shouldn't we end the transaction here so that another parallel executor can also work ? Else it would block (from my understanding) any other executor on the update row above.
|
@butonic I read the article and I understand the problem. If I understand well, you are trying to use the approach that does an UPDATE + increment. However, I don't think it will work with parallelization. See my comment above, I think parallelization will not happen because a concurrent executor for this job will wait for the transaction to end. Maybe it's ok, then at least we use the old approach and only prevent it from breaking in case of parallelization. |
|
@butonic how about SELECT FOR UPDATE ? That sounded interesting too and a quick google showed results for Mysql, PostgreSQL, Oracle. Not sure about SQLite though. |
|
TL;dr: the transaction may be needed to reset the offset, SELECT FOR UPDATE is not available everywhere. Longer version: sqlite doesn't have SELECT FOR UPDATE and oracles mechanism requires temporary tables. What I am doing is not mentioned in the article. It only talks about one of three scenarios: read-modify-write cycle Most solutions on the internet try to SELECT a value from a row and then calculate an updated value in code which needs to be written back to the db. Typically the calculation requires additional checks based on other data. This order (SELECT then UPDATE) is problematic because you need to make the two operations atomic. That is what SELECT FOR UPDATE (which creates a row lock) is meant to solve. The other option is to lock the whole table, as @DeepDiver1975 mentioned. The article sheds some light on all of this. Another scenario often found on the internet is read-after-update In the article one solution is called "Avoiding the read-modify-write cycle" and it moves the calculation to the database if possible. Since multiple jobs may increase the value a SELECT after an UPDATE is also not guaranteed to give you the result of the update. UPDATE ... RETURNING (postgres, oracle) and UPDATE ... OUTPUT (sql server) are meant to solve that. If we could use it it would solve our problem without having to use transactions. Unfortunately, mysql and sqlite don't. Which brings us to the last scenario and the solution that I propose: Using a transaction to read back the updated value read-after-update in transaction An UPDATE will lock the row and any other UPDATE has to wait for the release. Adding a transaction we can guarantee that we read back the value after the UPDATE. We also need to keep the transaction open in case we need to reset the offset to 0. But since we are only locking a single row in the appconfig table I think it is ok for another expiry job to wait for an ldap query. @PVince81 hope that answers your two questions. |
|
So if I understand well, it is acceptable to block concurrent jobs that would process the current one. |
|
@DeepDiver1975 what do you think ? |
d7e4bea to
57a4d62
Compare
57a4d62 to
564c156
Compare
|
Ok, this is now working on all DBs, @DeepDiver1975 please comment on the atomicity of the sql to move the job offset |
| $offset = $result - self::USERS_PER_SESSION; | ||
|
|
||
| // check if there is at least one user at this offset | ||
| $users = $this->userManager->search('', 1, $offset); |
There was a problem hiding this comment.
Any other better option? What is the risk of not finding any user here? Notice that the searches aren't cached, and we might need to connect to the LDAP backend in this case.
|
Reviewed |
|
Will have a look on Thursday. |
|
Owncloud uses READ_COMMITED transactions, which prevent Dirty Reads. The UPDATE locks the row and other transactions have to wait until the transaction is complete. AFAICT the logic is atomic and works with parrallel cron jobs. |
|
@DeepDiver1975 proposed moving to separate occ based cron jobs for long running tasks. @felixboehm agreed with the approach. I do as well, but I think that will take time. meanwhile this approach improves the situation. I still want this in and backported to 9.0.5. Waiting for feedback from customer. |
Not really - there are occ commands for both cases available. Admins can add them to their cron tab. |
|
It would then be good to get this merged soon as we do final 9.0.5 next week. |
Okay - occ commands are there for cleanup - not for expiry. Nevertheless - adding the current implementations as occ command is pretty straight forward and to be honest - I'm having major trouble in backporting such a big change. |
@DeepDiver1975 would be at least possible to backport to 9.1.1? IMHO I can see that we made a lot of changes in the background-jobs and would be better to concentrate us improving the newest 2 versions (9.1 and 9.2) than old versions. What do you think? |
|
This change is obsolet for now. We go with the occ commands for now. |
|
obsoleted by #25878 |
|
delete branch? @butonic or do you want to keep it? |
|
I'd like to keep it because I think we may very well need the atomic update when we redesign this. |
|
well the admins can always restore the branch ... so |
|
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
The current callForAll users iterates over all users in an LDAP directory. It may block other jobs for hours (85k users took me ~3h 17min). This PR restores the initial chunk base aproach and makes it parallelizable.
cc @PVince81 @DeepDiver1975 @mrow4a