
Chunked parrallel expiration jobs #25771

Closed

butonic wants to merge 4 commits into master from chunked-parrallel-expiration-jobs

Conversation

@butonic
Contributor

@butonic butonic commented Aug 11, 2016

The current callForAllUsers iterates over all users in an LDAP directory. It may block other jobs for hours (85k users took me ~3h 17min). This PR restores the initial chunk-based approach and makes it parallelizable.

cc @PVince81 @DeepDiver1975 @mrow4a

@mention-bot

@butonic, thanks for your PR! By analyzing the annotation information on this pull request, we identified @nickvergessen, @VicDeo and @LukasReschke to be potential reviewers

Member

this will not help to guarantee parallelism

Contributor Author

It does not help to not explain why ...

Member

Sorry - too many things to do in parallel ....

What you need to do is to read and set the value in an atomic manner, so that a second process will get the incremented value.

A transaction will not help in terms of atomicity - only a lock can help.
But I fear locking the appconfig table might have bad side effects ....

Contributor

What if we use something like "UPDATE ... SET cronjob_user_offset = newvalue ... WHERE cronjob_user_offset = oldvalue" and detect whether any row was changed, aborting if none was, which would mean a concurrent job took over.
Doesn't sound safe though...
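The optimistic compare-and-swap suggested here can be sketched with Python and sqlite3; the table layout and key name are illustrative stand-ins for the real appconfig schema:

```python
import sqlite3

# In-memory stand-in for the appconfig table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appconfig (configkey TEXT PRIMARY KEY, configvalue INTEGER)")
conn.execute("INSERT INTO appconfig VALUES ('cronjob_user_offset', 0)")
conn.commit()

def claim_chunk(conn, old_offset, chunk_size):
    """Advance the offset only if nobody else did; return True on success."""
    cur = conn.execute(
        "UPDATE appconfig SET configvalue = ? "
        "WHERE configkey = 'cronjob_user_offset' AND configvalue = ?",
        (old_offset + chunk_size, old_offset),
    )
    conn.commit()
    # rowcount == 0 means a concurrent job already moved the offset: abort.
    return cur.rowcount == 1

print(claim_chunk(conn, 0, 50))  # True: first job wins
print(claim_chunk(conn, 0, 50))  # False: offset already moved, a concurrent job took over
```

This is optimistic concurrency: the losing job simply aborts and leaves the chunk to the winner, so no lock is held while a job works through its chunk.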

@butonic
Contributor Author

butonic commented Aug 11, 2016

@DeepDiver1975 jenkins has 5 jobs failing related to federated sharing. Is there already a PR to fix them?

@DeepDiver1975
Member

@DeepDiver1975 jenkins has 5 jobs failing related to federated sharing. Is there already a PR to fix them?

this was merged to master a few minutes ago with #25769

@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch from f9f97bd to bc1daaa Compare August 11, 2016 14:43
@PVince81
Contributor

I agree with the general idea; hopefully we can find a solution for the offset's atomicity.

@PVince81
Contributor

Ah interesting, I see we have an API to lock tables: https://github.com/owncloud/core/blob/v9.1.0/lib/private/BackgroundJob/JobList.php#L191

Maybe we need to add one to lock rows too? But it might not work cross-DB...

@butonic
Contributor Author

butonic commented Aug 12, 2016

We could calculate the offset in the update and use a transaction to get the new value in a subsequent select query:

        $this->connection->beginTransaction();
        $sql = 'UPDATE `*PREFIX*appconfig`
                SET `configvalue` = TO_CLOB(TO_NUMBER(`configvalue`) + ?)
                WHERE `appid` = ? AND `configkey` = ?';
        $this->connection->executeUpdate($sql, array($amount, $appId, $key));

        $sql = 'SELECT `configvalue`
                FROM `*PREFIX*appconfig`
                WHERE `appid` = ? AND `configkey` = ?';
        $result = $this->connection->executeQuery($sql, array($appId, $key));
        $this->connection->commit();

This one is specific to Oracle because we need some additional CAST magic, but you should get the idea. Calculating the new value in the UPDATE statement makes the change atomic; the transaction guarantees that we read back our new value.
I still need to figure out how to atomically reset the offset to 0...
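Stripped of the Oracle-specific casts, the same pattern can be sketched database-agnostically with Python and sqlite3 (schema and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appconfig (appid TEXT, configkey TEXT, configvalue INTEGER)")
conn.execute("INSERT INTO appconfig VALUES ('files_trashbin', 'cronjob_user_offset', 0)")
conn.commit()

def next_offset(conn, amount):
    """Atomically add `amount` to the stored offset and read back the new value.

    The arithmetic happens inside the UPDATE, so there is no read-modify-write
    race; the surrounding transaction keeps the row locked until the SELECT.
    """
    with conn:  # one transaction: BEGIN ... COMMIT
        conn.execute(
            "UPDATE appconfig SET configvalue = configvalue + ? "
            "WHERE appid = 'files_trashbin' AND configkey = 'cronjob_user_offset'",
            (amount,),
        )
        (value,) = conn.execute(
            "SELECT configvalue FROM appconfig "
            "WHERE appid = 'files_trashbin' AND configkey = 'cronjob_user_offset'"
        ).fetchone()
    return value

print(next_offset(conn, 500))  # 500
print(next_offset(conn, 500))  # 1000
```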

@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch from bc1daaa to 08e3687 Compare August 12, 2016 11:37
@butonic
Contributor Author

butonic commented Aug 12, 2016

I think I have it ... um well ... implemented only for Oracle. The principle should work for other DBs as well. I'll wait for a review before adding that. And don't even start with "but this is db specific it should go into the db layer" ... whatever, PR welcome.

Using the update as the first statement in the transaction will cause other jobs to wait for the row lock to be released.

I wait with the commit until I have checked that the user backend has at least one user at the offset.

At least that is my understanding.

@DeepDiver1975 @PVince81 @mrow4a comments?
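The flow described above (UPDATE first to take the row lock, read back the new offset inside the transaction, and reset it before committing if there are no users left at that offset) might look like the following Python/sqlite3 sketch. USERS_PER_SESSION and the user_count parameter are hypothetical stand-ins for the real constant and the user-backend lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appconfig (configkey TEXT, configvalue INTEGER)")
conn.execute("INSERT INTO appconfig VALUES ('cronjob_user_offset', 0)")
conn.commit()

USERS_PER_SESSION = 500  # illustrative chunk size

def claim_user_chunk(conn, user_count):
    """Advance the offset; if it ran past the end of the directory, wrap to 0.

    `user_count` stands in for a user-backend lookup; the row stays locked
    until commit, so a concurrent job cannot read a stale offset.
    """
    with conn:  # UPDATE first, so the row lock is taken immediately
        conn.execute(
            "UPDATE appconfig SET configvalue = configvalue + ? "
            "WHERE configkey = 'cronjob_user_offset'",
            (USERS_PER_SESSION,),
        )
        (new_value,) = conn.execute(
            "SELECT configvalue FROM appconfig WHERE configkey = 'cronjob_user_offset'"
        ).fetchone()
        offset = new_value - USERS_PER_SESSION
        if offset >= user_count:
            # No users at this offset: wrap around before committing.
            conn.execute(
                "UPDATE appconfig SET configvalue = ? "
                "WHERE configkey = 'cronjob_user_offset'",
                (USERS_PER_SESSION,),
            )
            offset = 0
    return offset

print(claim_user_chunk(conn, 1200))  # 0
print(claim_user_chunk(conn, 1200))  # 500
print(claim_user_chunk(conn, 1200))  # 1000
print(claim_user_chunk(conn, 1200))  # 0 (wrapped around)
```

A concurrent worker blocks on the UPDATE only until the first worker's transaction commits, which is the trade-off discussed in this thread.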

@butonic butonic changed the title Chunked parrallel expiration jobs [WIP] Chunked parrallel expiration jobs Aug 12, 2016
@butonic
Contributor Author

butonic commented Aug 12, 2016

for reference: http://blog.2ndquadrant.com/postgresql-anti-patterns-read-modify-write-cycles/ describes the update problem

$sql = 'SELECT `configvalue`
FROM `*PREFIX*appconfig`
WHERE `appid` = ? AND `configkey` = ?';
$result = $connection->executeQuery($sql, array('files_trashbin', 'cronjob_user_offset'));
Contributor

shouldn't we end the transaction here so that another parallel executor can also work? Else it would block (from my understanding) any other executor on the row updated above.

@PVince81
Contributor

@butonic I read the article and I understand the problem. If I understand well, you are trying to use the approach that does an UPDATE + increment. However, I don't think it will work with parallelization.

See my comment above, I think parallelization will not happen because a concurrent executor for this job will wait for the transaction to end. Maybe it's ok, then at least we use the old approach and only prevent it from breaking in case of parallelization.

@PVince81
Contributor

@butonic how about SELECT FOR UPDATE? That sounded interesting too, and a quick google showed results for MySQL, PostgreSQL and Oracle. Not sure about SQLite though.

@butonic
Contributor Author

butonic commented Aug 15, 2016

TL;DR: the transaction may be needed to reset the offset, and SELECT FOR UPDATE is not available everywhere.

Longer version: SQLite doesn't have SELECT FOR UPDATE, and Oracle's mechanism requires temporary tables.

What I am doing is not mentioned in the article. It only talks about one of three scenarios:

read-modify-write cycle

Most solutions on the internet try to SELECT a value from a row and then calculate an updated value in code which needs to be written back to the db. Typically the calculation requires additional checks based on other data. This order (SELECT then UPDATE) is problematic because you need to make the two operations atomic. That is what SELECT FOR UPDATE (which creates a row lock) is meant to solve. The other option is to lock the whole table, as @DeepDiver1975 mentioned. The article sheds some light on all of this.

Another scenario often found on the internet is read-after-update

In the article one solution is called "Avoiding the read-modify-write cycle" and it moves the calculation to the database if possible. Since multiple jobs may increase the value, a SELECT after an UPDATE is also not guaranteed to give you the result of the update. UPDATE ... RETURNING (PostgreSQL, Oracle) and UPDATE ... OUTPUT (SQL Server) are meant to solve that. If we could use it, it would solve our problem without having to use transactions. Unfortunately, MySQL and SQLite don't support it. Which brings us to the last scenario and the solution that I propose:

Using a transaction to read back the updated value: read-after-update in a transaction

An UPDATE will lock the row, and any other UPDATE has to wait for the release. By adding a transaction we can guarantee that we read back the value after the UPDATE. We also need to keep the transaction open in case we need to reset the offset to 0. But since we are only locking a single row in the appconfig table, I think it is ok for another expiry job to wait for an LDAP query.

@PVince81 hope that answers your two questions.

@PVince81
Contributor

So if I understand well, it is acceptable to block concurrent jobs that would process the current chunk.
Considering that our first goal here is moving away from callForAllUsers() to the old chunking, this solution fits well for now.

@PVince81
Contributor

@DeepDiver1975 what do you think ?

@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch 4 times, most recently from d7e4bea to 57a4d62 Compare August 15, 2016 14:22
@butonic butonic force-pushed the chunked-parrallel-expiration-jobs branch from 57a4d62 to 564c156 Compare August 16, 2016 08:32
@butonic butonic changed the title [WIP] Chunked parrallel expiration jobs Chunked parrallel expiration jobs Aug 16, 2016
@butonic
Contributor Author

butonic commented Aug 16, 2016

Ok, this is now working on all DBs. @DeepDiver1975 please comment on the atomicity of the SQL to move the job offset

$offset = $result - self::USERS_PER_SESSION;

// check if there is at least one user at this offset
$users = $this->userManager->search('', 1, $offset);
Member

Any other better option? What is the risk of not finding any user here? Notice that the searches aren't cached, and we might need to connect to the LDAP backend in this case.

@jvillafanez
Member

Reviewed

@DeepDiver1975
Member

Will have a look on Thursday.

@butonic
Contributor Author

butonic commented Aug 19, 2016

ownCloud uses READ COMMITTED transactions, which prevent dirty reads.

The UPDATE locks the row, and other transactions have to wait until the transaction is complete.
READ COMMITTED prevents other transactions from reading the updates of the first transaction (dirty reads). See https://en.wikipedia.org/wiki/Isolation_(database_systems)

AFAICT the logic is atomic and works with parallel cron jobs.

@butonic
Contributor Author

butonic commented Aug 19, 2016

@DeepDiver1975 proposed moving to separate occ-based cron jobs for long-running tasks. @felixboehm agreed with the approach. I do as well, but I think that will take time. Meanwhile, this approach improves the situation.

I still want this in and backported to 9.0.5. Waiting for feedback from customer.

@DeepDiver1975
Member

I do as well, but I think that will take time.

Not really - there are occ commands for both cases available. Admins can add them to their cron tab.

@PVince81
Contributor

It would then be good to get this merged soon, as we do the final 9.0.5 next week.
I agree that it's critical and qualifies for post-RC merge.

@DeepDiver1975
Member

Not really - there are occ commands for both cases available. Admins can add them to their cron tab.

Okay - occ commands are there for cleanup - not for expiry. Nevertheless, adding the current implementations as occ commands is pretty straightforward, and to be honest, I'm having major trouble backporting such a big change.

@cdamken
Contributor

cdamken commented Aug 23, 2016

I'm having major trouble in backporting such a big change.

@DeepDiver1975 would it at least be possible to backport this to 9.1.1?

IMHO we made a lot of changes in the background jobs, and it would be better to concentrate on improving the newest two versions (9.1 and 9.2) than on old versions.

What do you think?

@DeepDiver1975
Member

This change is obsolete for now. We will go with the occ commands for now.
We will discuss how to enhance this for future releases.

@butonic
Contributor Author

butonic commented Aug 24, 2016

obsoleted by #25878

@butonic butonic closed this Aug 24, 2016
@DeepDiver1975
Member

delete branch? @butonic or do you want to keep it?

@butonic
Contributor Author

butonic commented Aug 24, 2016

I'd like to keep it because I think we may very well need the atomic update when we redesign this.

@butonic
Contributor Author

butonic commented Aug 24, 2016

well the admins can always restore the branch ... so

@butonic butonic deleted the chunked-parrallel-expiration-jobs branch August 24, 2016 19:02
@lock

lock bot commented Aug 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 4, 2019
