Skip to content

feat: per-document-access#3038

Closed
janl wants to merge 209 commits intomainfrom
feat/access-master-clean
Closed

feat: per-document-access#3038
janl wants to merge 209 commits intomainfrom
feat/access-master-clean

Conversation

@janl
Copy link
Member

@janl janl commented Jul 26, 2020

This is the first per-doc-access PR worth sharing more widely.

This allows the creation of databases with the option access enabled. Access-enabled databases require docs from regular users to have a new "_access": ["owner"] field, that matches the submitting userCtx.name. As a result, a user will get full _all_docs and _changes (and all the other APIs) scoped to just the docs where their name is in the access field. See https://github.com/apache/couchdb-documentation/blob/rfc/010-per-document-access/rfcs/010-per-document-access-control.md for some more info on how it all fits together.

There are multiple parts to all this:

  1. A new access query server that creates two new indexes by-access-id and by-access-seq which include all docs in their respective sort orders, but bucketed by user. This required teaching couch_index to index deleted docs conditionally.

  2. A switch in the mrview code that access _all_docs and _changes that uses the new indexes for users that are not db or server admins.

  3. All the plumbing to handle access information everywhere we need it. It is stored in #doc #doc_info and #full_doc_info records, so we can avoid loading full doc bodies on doc updates.

  4. All the actual access validation for document CRUD, especially updates, including replicated updates (mainly in couch_db.erl and couch_db_updater.erl).

  5. Teaching the replicator to write _local doc checkpoints as the user that owns the replication.

  6. All users from the _users db now have an implicit role _users that will become handy later.

  7. Some odds and ends across various modules to make the acquainted with the new reality.

  8. The test suite is quite extensive (1k + LOC) and covers most scenarios.

  9. Current caveats:

  • performance regression in couchdb_update_conflicts_test.erl something when from O(log n) to O(n) or worse. I hope more sets of eyes help with this 👀👀👀.
  • a few tests needed adjustments in BC-breaking ways (in terms of sort order IIRC, this is a missing, or superfluous lists:reverse() somewhere, that shouldn’t be hard to spot.
  • there are a few unused variables, function args, or full functions, that will get cleaned up of course.

Testing recommendations

make check ;)

But also, read the test suite and the linked RFC.

Related Issues or Pull Requests

#1524

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

rnewson and others added 30 commits May 6, 2017 09:32
Test does not pass yet.
* remove dependency on config
* make checks optional
* support HS256
and make everything truly optional.
* Improve pubkey not found error handling

When the public key identified by the {Alg, KID} tuple is not found on
the IAM keystore server, it's possible to see errors like:

(node1@127.0.0.1)140> epep:jwt_decode(SampleJWT).
** exception error: no function clause matching
                    public_key:do_verify(<<"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IjIwMTcwNTIwLTAwOjAwOjAwIn0.eyJpc3MiOiJodHRwOi8vbG9jYWxob3N0OjEyMzIx"...>>,
                                         sha256,
                                         <<229,188,162,247,201,233,118,32,115,206,156,
                                           169,17,221,78,157,161,147,46,179,42,219,66,
                                           15,139,91,...>>,
                                         {error,not_found}) (public_key.erl, line 782)
     in function  jwtf:public_key_verify/4 (src/jwtf.erl, line 212)
     in call from jwtf:decode/3 (src/jwtf.erl, line 30)

Modify key/1 and public_key_not_found_test/0 to account for keystore
changing from returning an error tuple to throwing one.
Tolerate 5 crashes per 10 seconds
@ermouth
Copy link
Contributor

ermouth commented Jul 29, 2020

This is great, gratz!

One question: it’s not clear to me how /_security response will look for buckets having access restrictions, is there a special field for those restrictions? I mean I want to explicitly mark buckets of the kind in Photon, so how can I detect _access restricted buckets reading /_security endpoint?

RFC states admins can grant individual users and groups access to a database using the database’s _security object, no details.

@joallard joallard mentioned this pull request Aug 16, 2020
6 tasks
Copy link
Member

@wohali wohali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are early comments, I ran out of time today for more. Will review more thoroughly next week.

Copy link
Member

@rnewson rnewson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a pass at this and have enough feedback to submit this first review. It does look like we're making the right sort of internal changes to store the additional data.

The majority of my comments are about the inconsistencies in the code. For example, in some cases you explicitly handle all expected results of a function, and in others you have a generic _ field for everything else (even in the case where we are switching on a boolean). I expressed my preference to avoid this sloppiness (aka "defensive programming") and to instead be explicit throughout. There is a place to validate the settings the user has made, and that should be the only place where coercion to type occurs (assuming we don't flat-out reject a malformed request, as I think we should).

Finally, I thought we had moved forward on general formatting conventions. Two blank lines between functions, one blank line between clauses of same function, etc. I've worked that way for some time as has almost all the code I've reviewed from others. Perhaps this was never formally adopted by the couchdb project. Is it the first you are hearing of it, Jan?

% check we sort them again here. See COUCHDB-2735.
Cmp = fun([#doc{id=A}|_], [#doc{id=B}|_]) -> A < B end,
Cmp = fun
([], []) -> false; % TODO: re-evaluate this addition, might be a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-evaluate before we merge

@rnewson
Copy link
Member

rnewson commented Aug 24, 2020

another note. In some places you use a list comprehension, in others a lists:map. In both cases you're changing each item of a list to superset or subset of its original content, but using two different mechanisms to do so. It wasn't clear if this was intentional or not. The only difference is error handling, in that the list comprehension approach can silently drop items that fail, whereas lists:map would not. Is that why you chose lc vs map or were the choices arbitrary?

@wohali wohali changed the base branch from master to main October 21, 2020 18:08
@stemuk
Copy link

stemuk commented Sep 9, 2021

@janl Are there any further plans for this pull request?

@klehmann
Copy link

We are very interested in this new feature, but for our use case (mirroring/migrating data from Notes/Domino databases), just checking the _access field values against a single userCtx.name value is not sufficient. Would be much more powerful if this second value could be a list as well, e.g. that we populate with all name variants, groups and roles that a user has in the web app that is using CouchDB for data storage.

@janl
Copy link
Member Author

janl commented Aug 6, 2022

another note. In some places you use a list comprehension, in others a lists:map. In both cases you're changing each item of a list to superset or subset of its original content, but using two different mechanisms to do so. It wasn't clear if this was intentional or not. The only difference is error handling, in that the list comprehension approach can silently drop items that fail, whereas lists:map would not. Is that why you chose lc vs map or were the choices arbitrary?

I made no conscious decisions but rather picked the variant that was most prominent in the surrounding code. A cursory glance suggests this is fine where employed, but I’ll have another eye on it once the final PR goes up.

@janl
Copy link
Member Author

janl commented Aug 6, 2022

I’m closing this PR in favour of a new cleaner one shortly since our master/main/3.x rejigger has made this one not show correctly here. I resolved all comments here that I have fixed in the future PR. The remaining ones will need to be looked at there again, I’ll add comments where appropriate, since code moved around quite a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.