
Chunked parrallel expiration jobs #25771

Closed

butonic wants to merge 4 commits into master from chunked-parrallel-expiration-jobs

Conversation

@butonic
Contributor

@butonic butonic commented Aug 11, 2016

The current callForAllUsers iterates over all users in an LDAP directory. It may block other jobs for hours (85k users took me ~3h 17min). This PR restores the initial chunk-based approach and makes it parallelizable.

cc @PVince81 @DeepDiver1975 @mrow4a

@mention-bot

@butonic, thanks for your PR! By analyzing the annotation information on this pull request, we identified @nickvergessen, @VicDeo and @LukasReschke to be potential reviewers

Member

this will not help to guarantee parallelism

Contributor Author

It does not help to not explain why ...

Member

Sorry - too many things to do in parallel ....

What you need to do is to read and set the value in an atomic manner, so that a second process will get the incremented value.

A transaction will not help in terms of atomicity - only a lock can help.
But I fear locking the appconfig table might have bad side effects ....

Contributor

What if we use something like "UPDATE ... SET cronjob_user_offset = newvalue ... WHERE cronjob_user_offset = oldvalue" and detect whether any row was changed, aborting if none was, which would mean a concurrent job took over.
Doesn't sound safe though...
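The optimistic compare-and-swap suggested here can be sketched with Python and sqlite3; the table layout and key name are illustrative stand-ins for the real appconfig schema:

```python
import sqlite3

# In-memory stand-in for the appconfig table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appconfig (configkey TEXT PRIMARY KEY, configvalue INTEGER)")
conn.execute("INSERT INTO appconfig VALUES ('cronjob_user_offset', 0)")
conn.commit()

def claim_chunk(conn, old_offset, chunk_size):
    """Advance the offset only if nobody else did; return True on success."""
    cur = conn.execute(
        "UPDATE appconfig SET configvalue = ? "
        "WHERE configkey = 'cronjob_user_offset' AND configvalue = ?",
        (old_offset + chunk_size, old_offset),
    )
    conn.commit()
    # rowcount == 0 means a concurrent job already moved the offset: abort.
    return cur.rowcount == 1

print(claim_chunk(conn, 0, 50))  # True: first job wins
print(claim_chunk(conn, 0, 50))  # False: offset already moved, a concurrent job took over
```

This is optimistic concurrency: the losing job simply aborts and leaves the chunk to the winner, so no lock is held while a job works through its chunk.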

@butonic
Contributor Author

butonic commented Aug 11, 2016

@DeepDiver1975 jenkins has 5 jobs failing related to federated sharing. Is there already a PR to fix them?

@DeepDiver1975
Member

@DeepDiver1975 jenkins has 5 jobs failing related to federated sharing. Is there already a PR to fix them?

this was merged to master a few minutes ago with #25769

@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch from f9f97bd to bc1daaa Compare August 11, 2016 14:43
@PVince81
Contributor

I agree with the general idea; hopefully we can find a solution for the offset's atomicity.

@PVince81
Contributor

Ah interesting, I see we have an API to lock tables: https://github.com/owncloud/core/blob/v9.1.0/lib/private/BackgroundJob/JobList.php#L191

Maybe we need to add one to lock rows too? But it might not work cross-DB...

@butonic
Contributor Author

butonic commented Aug 12, 2016

We could calculate the offset in the update and use a transaction to get the new value in a subsequent select query:

        $this->connection->beginTransaction();
        $sql = 'UPDATE `*PREFIX*appconfig`
                SET `configvalue` = TO_CLOB(TO_NUMBER(`configvalue`) + ?)
                WHERE `appid` = ? AND `configkey` = ?';
        $this->connection->executeUpdate($sql, array($amount, $appId, $key));

        $sql = 'SELECT `configvalue`
                FROM `*PREFIX*appconfig`
                WHERE `appid` = ? AND `configkey` = ?';
        $result = $this->connection->executeQuery($sql, array($appId, $key));
        $this->connection->commit();

This one is specific to Oracle because we need some additional CAST magic, but you should get the idea. Calculating the new value in the UPDATE statement makes the change atomic; the transaction guarantees that we read back our new value.
I still need to figure out how to atomically reset the offset to 0...
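Stripped of the Oracle-specific casts, the same pattern can be sketched database-agnostically with Python and sqlite3 (schema and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appconfig (appid TEXT, configkey TEXT, configvalue INTEGER)")
conn.execute("INSERT INTO appconfig VALUES ('files_trashbin', 'cronjob_user_offset', 0)")
conn.commit()

def next_offset(conn, amount):
    """Atomically add `amount` to the stored offset and read back the new value.

    The arithmetic happens inside the UPDATE, so there is no read-modify-write
    race; the surrounding transaction keeps the row locked until the SELECT.
    """
    with conn:  # one transaction: BEGIN ... COMMIT
        conn.execute(
            "UPDATE appconfig SET configvalue = configvalue + ? "
            "WHERE appid = 'files_trashbin' AND configkey = 'cronjob_user_offset'",
            (amount,),
        )
        (value,) = conn.execute(
            "SELECT configvalue FROM appconfig "
            "WHERE appid = 'files_trashbin' AND configkey = 'cronjob_user_offset'"
        ).fetchone()
    return value

print(next_offset(conn, 500))  # 500
print(next_offset(conn, 500))  # 1000
```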

@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch from bc1daaa to 08e3687 Compare August 12, 2016 11:37
@butonic
Contributor Author

butonic commented Aug 12, 2016

I think I have it ... um well ... implemented only for Oracle. The principle should work for other DBs as well. I'll wait for a review before adding that. And don't even start with "but this is db specific it should go into the db layer" ... whatever, PR welcome.

Using the update as the first statement in the transaction will cause other jobs to wait for the row lock to be released.

I wait with the commit until I have checked that the user backend has at least one user at the offset.

At least that is my understanding.

@DeepDiver1975 @PVince81 @mrow4a comments?
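The flow described above (UPDATE first to take the row lock, read back the new offset inside the transaction, and reset it before committing if there are no users left at that offset) might look like the following Python/sqlite3 sketch. USERS_PER_SESSION and the user_count parameter are hypothetical stand-ins for the real constant and the user-backend lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appconfig (configkey TEXT, configvalue INTEGER)")
conn.execute("INSERT INTO appconfig VALUES ('cronjob_user_offset', 0)")
conn.commit()

USERS_PER_SESSION = 500  # illustrative chunk size

def claim_user_chunk(conn, user_count):
    """Advance the offset; if it ran past the end of the directory, wrap to 0.

    `user_count` stands in for a user-backend lookup; the row stays locked
    until commit, so a concurrent job cannot read a stale offset.
    """
    with conn:  # UPDATE first, so the row lock is taken immediately
        conn.execute(
            "UPDATE appconfig SET configvalue = configvalue + ? "
            "WHERE configkey = 'cronjob_user_offset'",
            (USERS_PER_SESSION,),
        )
        (new_value,) = conn.execute(
            "SELECT configvalue FROM appconfig WHERE configkey = 'cronjob_user_offset'"
        ).fetchone()
        offset = new_value - USERS_PER_SESSION
        if offset >= user_count:
            # No users at this offset: wrap around before committing.
            conn.execute(
                "UPDATE appconfig SET configvalue = ? "
                "WHERE configkey = 'cronjob_user_offset'",
                (USERS_PER_SESSION,),
            )
            offset = 0
    return offset

print(claim_user_chunk(conn, 1200))  # 0
print(claim_user_chunk(conn, 1200))  # 500
print(claim_user_chunk(conn, 1200))  # 1000
print(claim_user_chunk(conn, 1200))  # 0 (wrapped around)
```

A concurrent worker blocks on the UPDATE only until the first worker's transaction commits, which is the trade-off discussed in this thread.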

@butonic butonic changed the title Chunked parrallel expiration jobs [WIP] Chunked parrallel expiration jobs Aug 12, 2016
@butonic
Contributor Author

butonic commented Aug 12, 2016

for reference: http://blog.2ndquadrant.com/postgresql-anti-patterns-read-modify-write-cycles/ describes the update problem

$sql = 'SELECT `configvalue`
FROM `*PREFIX*appconfig`
WHERE `appid` = ? AND `configkey` = ?';
$result = $connection->executeQuery($sql, array('files_trashbin', 'cronjob_user_offset'));
Contributor

shouldn't we end the transaction here so that another parallel executor can also work? Else it would block (from my understanding) any other executor on the row updated above.

@PVince81
Contributor

@butonic I read the article and I understand the problem. If I understand well, you are trying to use the approach that does an UPDATE + increment. However, I don't think it will work with parallelization.

See my comment above, I think parallelization will not happen because a concurrent executor for this job will wait for the transaction to end. Maybe it's ok, then at least we use the old approach and only prevent it from breaking in case of parallelization.

@PVince81
Contributor

@butonic how about SELECT FOR UPDATE? That sounded interesting too, and a quick google showed results for MySQL, PostgreSQL and Oracle. Not sure about SQLite though.

@butonic
Contributor Author

butonic commented Aug 15, 2016

TL;DR: the transaction may be needed to reset the offset, and SELECT FOR UPDATE is not available everywhere.

Longer version: SQLite doesn't have SELECT FOR UPDATE, and Oracle's mechanism requires temporary tables.

What I am doing is not mentioned in the article. It only talks about one of three scenarios:

read-modify-write cycle

Most solutions on the internet try to SELECT a value from a row and then calculate an updated value in code which needs to be written back to the db. Typically the calculation requires additional checks based on other data. This order (SELECT then UPDATE) is problematic because you need to make the two operations atomic. That is what SELECT FOR UPDATE (which creates a row lock) is meant to solve. The other option is to lock the whole table, as @DeepDiver1975 mentioned. The article sheds some light on all of this.

Another scenario often found on the internet is read-after-update

In the article one solution is called "Avoiding the read-modify-write cycle" and it moves the calculation to the database if possible. Since multiple jobs may increase the value, a SELECT after an UPDATE is also not guaranteed to give you the result of the update. UPDATE ... RETURNING (PostgreSQL, Oracle) and UPDATE ... OUTPUT (SQL Server) are meant to solve that. If we could use it, it would solve our problem without having to use transactions. Unfortunately, MySQL and SQLite don't support it. Which brings us to the last scenario and the solution that I propose:

Using a transaction to read back the updated value: read-after-update in a transaction

An UPDATE will lock the row, and any other UPDATE has to wait for the release. By adding a transaction we can guarantee that we read back the value after the UPDATE. We also need to keep the transaction open in case we need to reset the offset to 0. But since we are only locking a single row in the appconfig table, I think it is ok for another expiry job to wait for an LDAP query.

@PVince81 hope that answers your two questions.

@PVince81
Contributor

So if I understand well, it is acceptable to block concurrent jobs that would process the current chunk.
Considering that our first goal here is moving away from callForAllUsers() to the old chunking, this solution fits well for now.

@PVince81
Contributor

@DeepDiver1975 what do you think ?

@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch 4 times, most recently from d7e4bea to 57a4d62 Compare August 15, 2016 14:22
@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch from 57a4d62 to 564c156 Compare August 16, 2016 08:32
@butonic butonic changed the title [WIP] Chunked parrallel expiration jobs Chunked parrallel expiration jobs Aug 16, 2016
@butonic
Contributor Author

butonic commented Aug 16, 2016

Ok, this is now working on all DBs. @DeepDiver1975 please comment on the atomicity of the SQL to move the job offset

$offset = $result - self::USERS_PER_SESSION;

// check if there is at least one user at this offset
$users = $this->userManager->search('', 1, $offset);
Member

Any other better option? What is the risk of not finding any user here? Notice that the searches aren't cached, and we might need to connect to the LDAP backend in this case.

@jvillafanez
Member

Reviewed

@DeepDiver1975
Member

Will have a look on Thursday.

@butonic
Contributor Author

butonic commented Aug 19, 2016

ownCloud uses READ COMMITTED transactions, which prevent dirty reads.

The UPDATE locks the row, and other transactions have to wait until the transaction is complete.
READ COMMITTED prevents other transactions from reading the updates of the first transaction (dirty reads). See https://en.wikipedia.org/wiki/Isolation_(database_systems)

AFAICT the logic is atomic and works with parallel cron jobs.

@butonic
Contributor Author

butonic commented Aug 19, 2016

@DeepDiver1975 proposed moving to separate occ-based cron jobs for long-running tasks. @felixboehm agreed with the approach. I do as well, but I think that will take time. Meanwhile, this approach improves the situation.

I still want this in and backported to 9.0.5. Waiting for feedback from customer.

@DeepDiver1975
Member

I do as well, but I think that will take time.

Not really - there are occ commands for both cases available. Admins can add them to their cron tab.

@PVince81
Contributor

It would then be good to get this merged soon, as we do the final 9.0.5 next week.
I agree that it's critical and qualifies for post-RC merge.

@DeepDiver1975
Member

Not really - there are occ commands for both cases available. Admins can add them to their cron tab.

Okay - occ commands are there for cleanup - not for expiry. Nevertheless, adding the current implementations as occ commands is pretty straightforward, and to be honest, I'm having major trouble backporting such a big change.

@cdamken
Contributor

cdamken commented Aug 23, 2016

I'm having major trouble in backporting such a big change.

@DeepDiver1975 would it at least be possible to backport this to 9.1.1?

IMHO we made a lot of changes in the background jobs, and it would be better to concentrate on improving the newest two versions (9.1 and 9.2) than on old versions.

What do you think?

@DeepDiver1975
Member

This change is obsolete for now. We will go with the occ commands for now.
We will discuss how to enhance this for future releases.

@butonic
Contributor Author

butonic commented Aug 24, 2016

obsoleted by #25878

@butonic butonic closed this Aug 24, 2016
@DeepDiver1975
Member

delete branch? @butonic or do you want to keep it?

@butonic
Contributor Author

butonic commented Aug 24, 2016

I'd like to keep it because I think we may very well need the atomic update when we redesign this.

@butonic
Contributor Author

butonic commented Aug 24, 2016

well the admins can always restore the branch ... so

@butonic butonic deleted the chunked-parrallel-expiration-jobs branch August 24, 2016 19:02
@lock

lock bot commented Aug 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 4, 2019
