Introducing emails and pubkeys stores instead of contacts #3445

Merged
tomholub merged 63 commits into master from issue-3332-contacts-refactoring on Apr 3, 2021

Conversation

@rrrooommmaaa (Contributor) commented Feb 19, 2021

This PR uses the search index on the searchable field instead of index_has_pgp, so we can remove the has_pgp field.
This particular step lets us measure performance as compared to index_has_pgp usage.

close #3332

@rrrooommmaaa (Contributor Author)

@tomholub What do you think about it? As each array element in searchable is prefixed with either t: for has_pgp=1 or f: for has_pgp=0, we can use a range search from f: up to t: exclusive (or simply everything up to t:) instead of the former has_pgp=0 search, and t: inclusive upwards instead of has_pgp=1. That way we can eliminate the need for the has_pgp field, as you suggested (the prefix will be calculated based on the length of the fingerprints list). It may be slower, since searchable has many more values; do we need benchmarks?
Do you know of any tests in place that actually check the correctness of searches based solely on the has_pgp field, or should I add one?
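The range trick described above can be sketched as follows. This is a minimal in-memory model with hypothetical helper names (hasPgpRange, inRange), not FlowCrypt's actual ContactStore code; in the extension the bounds would map onto IDBKeyRange, which is emulated here with a plain filter:

```typescript
// Every entry in the `searchable` index is prefixed with 't:' (contact has a
// pubkey) or 'f:' (it does not), so a has_pgp query becomes a key range:
//   has_pgp=0  ->  ['f:', 't:')  i.e. everything up to 't:' exclusive
//   has_pgp=1  ->  ['t:', ...)   i.e. 't:' inclusive and upwards
type KeyRange = { lower: string; upper?: string; upperOpen: boolean };

const hasPgpRange = (hasPgp: boolean): KeyRange =>
  hasPgp
    ? { lower: 't:', upperOpen: false } // open-ended: 't:' and everything above
    : { lower: 'f:', upper: 't:', upperOpen: true }; // up to 't:' exclusive

// stand-in for an IDBKeyRange lookup: filter an in-memory list of index keys
const inRange = (key: string, r: KeyRange): boolean =>
  key >= r.lower && (r.upper === undefined || (r.upperOpen ? key < r.upper : key <= r.upper));

const searchable = ['f:ann', 'f:bob', 't:tom', 't:zed'];
const withPgp = searchable.filter(k => inRange(k, hasPgpRange(true)));
const withoutPgp = searchable.filter(k => inRange(k, hasPgpRange(false)));
```

String comparison makes this work because every 'f:'-prefixed key sorts before 't:', so a single lexicographic range replaces the dedicated index_has_pgp index.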

@tomholub (Collaborator) left a comment

A clever hack. This would have to be documented better; don't be afraid to write some prose, because it's otherwise not obvious at all.

For tests, try to give it knowingly wrong behavior, and see if anything breaks. I imagine something should break.

@rrrooommmaaa (Contributor Author)

@tomholub I have populated the store with 12500 dummy records, half of them with has_pgp=0 and the other half with has_pgp=1.
Each searchable array field has about 50 entries, so the search index is 50 times larger than index_has_pgp (counting the number of keys).
A range search (everything up to 't:') correctly yielded the 6250 records with has_pgp=0 in about 26 seconds; the same holds for ('t:' onwards).
The index_has_pgp search took a fraction of a second.
However, we can optimize searchable so that we no longer store entries for every prefix length: e.g. it's enough to store
"t:tom" instead of "t:t", "t:to" and "t:tom", because we can do a range search ("t:t" up to "t:u" exclusive) when the user types "t", ("t:to" up to "t:tp" exclusive) when the user types "to", and so on, so "t:tom" will be found in all cases.
Thus the size of the search index will decrease (and searches will speed up) tenfold.
So the cost of not having the has_pgp field is about 1 extra second per 1000 entries. Note that this only applies to has_pgp-only searches, used in a couple of places, like 'Export All Keys'.
Compose uses has_pgp AND substring search, so the search index is used there anyway, and no performance loss happens (on the contrary, there will be a performance gain if we reduce the size of the search index as I suggested).
Thoughts?
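The prefix-compaction idea can be sketched like this (hypothetical helper names; in the real store the range would go through IDBKeyRange.bound, emulated here with a plain filter):

```typescript
// Store only the full entry 't:tom' and match any typed prefix with a range
// query, instead of storing 't:t', 't:to' and 't:tom' separately.
// The upper bound is the prefix with its last character incremented, exclusive:
// typing 'to' for a contact with a pubkey queries ['t:to', 't:tp').
const prefixRange = (typed: string, hasPgp: boolean): { lower: string; upper: string } => {
  const lower = (hasPgp ? 't:' : 'f:') + typed;
  const upper = lower.slice(0, -1) + String.fromCharCode(lower.charCodeAt(lower.length - 1) + 1);
  return { lower, upper };
};

// stand-in for an index range scan over stored searchable entries
const matches = (entries: string[], typed: string, hasPgp: boolean): string[] => {
  const { lower, upper } = prefixRange(typed, hasPgp);
  return entries.filter(e => e >= lower && e < upper);
};

// only full entries are stored now, one per word, not one per prefix length
const index = ['t:ann', 't:tom', 't:tony', 'f:tom'];
// 't:tom' is found whether the user typed 't', 'to' or 'tom'
```

Typing 'tom' yields only 't:tom' and 't:tony'-style longer entries that share the prefix, while 't:ann' falls outside the range, so the 50x index blow-up from per-prefix entries goes away.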

@tomholub (Collaborator)

You can try this. The code should be commented thoroughly so that it's clear what it is trying to do, and why.

@rrrooommmaaa rrrooommmaaa changed the title refactored out index_has_pgp index usage Introducing emails and pubkeys stores instead of contacts Feb 26, 2021
@tomholub (Collaborator) commented Mar 5, 2021

For the client field: it was used to decide whether or not to attach the public key. Let's have the following logic:

  • if all recipients are on the same domain as me, no need to attach my public key automatically
  • if at least one of the recipients is on another domain, then:
    • if my own key is available on WKD, I don't need to attach it automatically
    • if my own key is not available on WKD, then:
      • if I've already sent my public key to this person before, no need to attach it automatically
      • else attach it automatically

All of this logic only decides whether the public key attach button starts out activated or not. The user can still change it by clicking on the icon before actually sending the message.

The "if I've already sent my public key to this person before" case is already handled by the method this.view.recipientsModule.doesRecipientHaveMyPubkey, so you'll just change the contents of reevaluateShouldAttachOrNot as per the logic above.

Thanks!
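The decision tree above can be sketched as a pure function. The names here (shouldAttachMyPubkey, AttachContext and its fields) are hypothetical stand-ins, not the actual reevaluateShouldAttachOrNot signature:

```typescript
// Inputs the decision needs, flattened into one context object (hypothetical)
interface AttachContext {
  myDomain: string;            // domain of the sender's address
  recipientDomains: string[];  // domains of all recipients
  myKeyOnWkd: boolean;         // is my own key discoverable via WKD?
  alreadySentMyPubkeyToAll: boolean; // doesRecipientHaveMyPubkey for everyone
}

const shouldAttachMyPubkey = (ctx: AttachContext): boolean => {
  // all recipients on my own domain: no need to attach automatically
  if (ctx.recipientDomains.every(d => d === ctx.myDomain)) return false;
  // my key is available on WKD: recipients can fetch it, no need to attach
  if (ctx.myKeyOnWkd) return false;
  // I've already sent my pubkey to these recipients before: no need to attach
  if (ctx.alreadySentMyPubkeyToAll) return false;
  // otherwise attach automatically (the user can still toggle the icon)
  return true;
};
```

The function only sets the initial state of the attach button; the user's manual toggle before sending still wins, as described above.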

@tomholub (Collaborator) left a comment

Looks directionally good 👍 I'll scrutinize the details later.


try {
db = await ContactStore.dbOpen(); // takes 4-10 ms first time
await moveContactsToEmailsAndPubkeys(db);
@rrrooommmaaa (Contributor Author)

@tomholub this is the piece of code I was referring to when talking about contacts migration.
It would be more convenient to perform the migration inside the dbOpen() call; that way we'd be able to delete the contacts store right after a successful conversion to emails and pubkeys.
Could we, for example, pass a list of functions to dbOpen to nicely inject transformations so they don't clutter the ContactStore code?

@rrrooommmaaa (Contributor Author)

Or, perhaps, ContactStore.update will cause errors when called from inside an upgrade transaction.

@tomholub (Collaborator)

We can do all of this in ContactStore itself, no need to bring the migration out to here.

@rrrooommmaaa (Contributor Author) commented Mar 11, 2021

> We can do all of this in ContactStore itself, no need to bring the migration out to here.

Looks like there is a problem with this approach. The IDBTransaction docs say that a transaction is automatically committed when it goes out of scope, and this disallows using async functions in onupgradeneeded etc. The handlers do fire, but the transaction is closed by the time the actual processing occurs (it does not wait for the promise returned by the onupgradeneeded handler to resolve). I searched GitHub for projects where an async function is used as the event handler and found only one small project, https://github.com/SourceCodeBot/crudodb (I suspect they haven't tested it properly).
The general approach for transactional operations (though I haven't found this exact requirement in the IndexedDB spec) is that a handler must immediately open a new request, otherwise the transaction will be auto-committed.
We can't convert contacts to pubkeys/emails in a completely synchronous way because of the async KeyUtil.parse function.
(This is a point for another discussion: KeyUtil.parse is async only because it uses a Stream API approach for dearmoring potentially large messages. As keys are small, we might consider a sync implementation of KeyUtil.parse.)
Still, it may be user-unfriendly to "hang" (will it hang?) the browser window while the database upgrade converts multiple contacts, so it makes sense to keep the conversion out of the upgrade handler. The drawback is that the empty contacts table will be lying around until the next upgrade.
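The auto-commit constraint can be illustrated with a minimal sketch: do all async parsing up front, then touch the transaction-scoped store only with synchronous calls. parseKey and StoreLike below are stand-ins, not the real KeyUtil.parse / ContactStore API:

```typescript
// An IndexedDB transaction auto-commits as soon as control returns to the
// event loop with no pending requests, so every `await` must happen BEFORE
// the transaction-scoped store is first touched.
type ParsedKey = { fingerprint: string };

// stand-in for the async KeyUtil.parse mentioned above
const parseKey = async (armored: string): Promise<ParsedKey> => {
  return { fingerprint: armored.length.toString(16).toUpperCase() };
};

// stand-in for a transaction-scoped IDBObjectStore
interface StoreLike {
  put: (value: ParsedKey) => void;
}

const migrateBatch = async (
  armoredKeys: string[],
  openTxStore: () => StoreLike // call only after all async work is done
): Promise<number> => {
  // 1. all async work first, outside any transaction
  const parsed = await Promise.all(armoredKeys.map(parseKey));
  // 2. open the transaction and issue every put synchronously, so the
  //    transaction never observes an idle event-loop turn and stays alive
  const store = openTxStore();
  for (const key of parsed) {
    store.put(key);
  }
  return parsed.length;
};
```

If step 1 and step 2 were interleaved (await between puts), a real IndexedDB transaction would commit mid-way and the later puts would throw.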

@tomholub (Collaborator)

> Still, it may be user-unfriendly to "hang" (will it hang?) the browser window while the database upgrade converts multiple contacts, so it makes sense to keep the conversion out of the upgrade handler. The drawback is that the empty contacts table will be lying around until the next upgrade.

It will possibly hang the whole browser when we do something like this in the background page. It also depends on the contact count: for 50 contacts, not a problem; for 1000 contacts, a possible problem.

I think our most reasonable option is to migrate the database structure itself, fire off an async migration of the actual data, and hope for the best.

If the actual migration does not finish successfully, then it will be retried on the next migration trigger (such as a browser restart). I think that's ok as long as this retry is in place.

Best would be to then not migrate all contacts in one gigantic transaction, but one by one. If one of the transactions throws an error for whatever reason (parsing, concurrency, etc.), it would not fail the whole migration. Say 100 keys are already migrated, key 101 failed, and 50 more are to go: the failed one gets caught and skipped, and the migration chugs along for the remaining 50 keys. Then next time it only has to retry one.
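The skip-and-retry scheme can be sketched as follows (hypothetical names; migrateBatch stands for one transaction's worth of work, and the returned list models contacts left in the old store for a later retry):

```typescript
// One transaction per batch; a throwing batch is caught and skipped, leaving
// its contacts in the old store so a later run (e.g. after a browser restart)
// only has to retry the failed ones.
type MigrateBatchFn = (batch: string[]) => void;

const migrateAll = (
  contacts: string[],
  batchSize: number,
  migrateBatch: MigrateBatchFn
): string[] => {
  const failed: string[] = [];
  for (let i = 0; i < contacts.length; i += batchSize) {
    const batch = contacts.slice(i, i + batchSize);
    try {
      migrateBatch(batch); // moves the batch to the new stores transactionally
    } catch {
      failed.push(...batch); // skipped, not migrated; retried on next run
    }
  }
  return failed; // contacts still awaiting migration
};
```

A failing batch neither crashes the migration nor blocks the batches after it; only the failed slice remains for the next attempt.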

@rrrooommmaaa (Contributor Author) commented Mar 18, 2021

I have implemented migration in batches of 50, but it's going to throw on an error and crash the extension.
If we skip the errored batch, that portion of contacts won't be available for search; is this what you're after?

@tomholub (Collaborator)

When there is an error, it's better to skip a batch than to stop halfway through the migration.

@tomholub (Collaborator)

Will the skipped batch get retried (without any special retry logic) next time I restart the browser?

@rrrooommmaaa (Contributor Author)

The conversion stops at the failed batch with the current implementation.
Since the contacts store objects are deleted within the same transaction that populates the new stores, the contacts from the failed batch will remain in the database, so they will be retried on a subsequent browser restart.

@rrrooommmaaa (Contributor Author)

I tested most of the use cases. Do you have any other suggestions on what to test?

@rrrooommmaaa rrrooommmaaa marked this pull request as ready for review March 31, 2021 15:01
@tomholub (Collaborator)

Test-wise, I think it's well covered. I'll take another look at the code.

@tomholub (Collaborator) left a comment

All looks good; a few comments.

(I still need to review contact-store.ts more closely.)

Thanks!

name: string | null;
pubkey: Key | null;
pubkey: Key | undefined;
has_pgp: 0 | 1;
@tomholub (Collaborator)

Since we are doing a migration anyway, could we rename these to be consistently camelCase? To use hasPgp, lastUse etc.

@rrrooommmaaa (Contributor Author)

This is a purely DTO class. We can of course refactor it in this PR, making it grow even bigger.

@tomholub (Collaborator) left a comment

A few more comments

@tomholub (Collaborator) left a comment

Yup

@tomholub (Collaborator) left a comment

Thanks!

@tomholub tomholub merged commit 1ea17cf into master Apr 3, 2021
@tomholub tomholub deleted the issue-3332-contacts-refactoring branch April 3, 2021 16:13

Development

Successfully merging this pull request may close these issues.

ability to store more than one public key per recipient
