feat: per-document-access by janl · Pull Request #3038 · apache/couchdb

janl · 2020-07-26T18:23:01Z

This is the first per-doc-access PR worth sharing more widely.

This allows the creation of databases with the access option enabled. Access-enabled databases require docs from regular users to carry a new "_access": ["owner"] field that matches the submitting userCtx.name. As a result, a user gets full _all_docs and _changes (and all the other APIs) scoped to just the docs where their name is in the access field. See https://github.com/apache/couchdb-documentation/blob/rfc/010-per-document-access/rfcs/010-per-document-access-control.md for some more info on how it all fits together.
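
To make the scoping rule concrete, here is a minimal sketch of the check described above. It is not the code from this PR: the module name access_sketch, the functions can_read/2 and scoped/2, and the use of maps to model docs are all assumptions for illustration, as is treating a doc without an _access field as invisible to regular users.

```erlang
-module(access_sketch).
-export([can_read/2, scoped/2]).

%% Hypothetical helper, not the PR's implementation: a doc is visible to
%% a regular user when their name appears in the doc's "_access" list.
%% Docs are modelled as maps with binary keys for brevity.
can_read(UserName, #{<<"_access">> := Access}) when is_list(Access) ->
    lists:member(UserName, Access);
can_read(_UserName, _Doc) ->
    %% Assumption: docs without a usable _access list are hidden from
    %% regular users.
    false.

%% Scope a list of docs to the ones the given user may see, which is
%% roughly what a scoped _all_docs response would contain.
scoped(UserName, Docs) ->
    [Doc || Doc <- Docs, can_read(UserName, Doc)].
```

Calling access_sketch:scoped(<<"alice">>, Docs) would then return only the docs whose _access list contains <<"alice">>.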

There are multiple parts to all this:

A new access query server that creates two new indexes by-access-id and by-access-seq which include all docs in their respective sort orders, but bucketed by user. This required teaching couch_index to index deleted docs conditionally.
A switch in the mrview code that accesses _all_docs and _changes, which uses the new indexes for users that are not db or server admins.
All the plumbing to handle access information everywhere we need it. It is stored in #doc, #doc_info and #full_doc_info records, so we can avoid loading full doc bodies on doc updates.
All the actual access validation for document CRUD, especially updates, including replicated updates (mainly in couch_db.erl and couch_db_updater.erl).
Teaching the replicator to write _local doc checkpoints as the user that owns the replication.
All users from the _users db now have an implicit role _users that will become handy later.
Some odds and ends across various modules to make them acquainted with the new reality.
The test suite is quite extensive (1k + LOC) and covers most scenarios.
Current caveats:

performance regression in couchdb_update_conflicts_test.erl: something went from O(log n) to O(n) or worse. I hope more sets of eyes help with this 👀👀👀.
a few tests needed adjustments in BC-breaking ways (in terms of sort order IIRC; this is a missing or superfluous lists:reverse() somewhere that shouldn’t be hard to spot).
there are a few unused variables, function args, or full functions that will get cleaned up, of course.

Testing recommendations

make check ;)

But also, read the test suite and the linked RFC.

Related Issues or Pull Requests

#1524

Checklist

Code is written and works correctly
Changes are covered by tests
Any new configurable parameters are documented in rel/overlay/etc/default.ini
A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

* Improve pubkey not found error handling

When the public key identified by the {Alg, KID} tuple is not found on the IAM keystore server, it's possible to see errors like:

    (node1@127.0.0.1)140> epep:jwt_decode(SampleJWT).
    ** exception error: no function clause matching public_key:do_verify(<<"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IjIwMTcwNTIwLTAwOjAwOjAwIn0.eyJpc3MiOiJodHRwOi8vbG9jYWxob3N0OjEyMzIx"...>>, sha256, <<229,188,162,247,201,233,118,32,115,206,156, 169,17,221,78,157,161,147,46,179,42,219,66, 15,139,91,...>>, {error,not_found}) (public_key.erl, line 782)
         in function jwtf:public_key_verify/4 (src/jwtf.erl, line 212)
         in call from jwtf:decode/3 (src/jwtf.erl, line 30)

Modify key/1 and public_key_not_found_test/0 to account for keystore changing from returning an error tuple to throwing one.

ermouth · 2020-07-29T10:21:57Z

This is great, gratz!

One question: it’s not clear to me how /_security response will look for buckets having access restrictions, is there a special field for those restrictions? I mean I want to explicitly mark buckets of the kind in Photon, so how can I detect _access restricted buckets reading /_security endpoint?

RFC states admins can grant individual users and groups access to a database using the database’s _security object, no details.

emilio.config

wohali

These are early comments, I ran out of time today for more. Will review more thoroughly next week.

src/chttpd/src/chttpd_db.erl

src/couch/src/couch_db_updater.erl

src/couch/src/couch_db.erl

src/couch/src/couch_db_updater.erl

rnewson

I've taken a pass at this and have enough feedback to submit this first review. It does look like we're making the right sort of internal changes to store the additional data.

The majority of my comments are about the inconsistencies in the code. For example, in some cases you explicitly handle all expected results of a function, and in others you have a generic _ field for everything else (even in the case where we are switching on a boolean). I expressed my preference to avoid this sloppiness (aka "defensive programming") and to instead be explicit throughout. There is a place to validate the settings the user has made, and that should be the only place where coercion to type occurs (assuming we don't flat-out reject a malformed request, as I think we should).
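
As an illustration of the distinction drawn above, the following sketch contrasts a catch-all clause with explicit clauses when switching on a boolean. The module explicit_sketch, its function names and the returned atoms are hypothetical and not taken from the PR.

```erlang
-module(explicit_sketch).
-export([mode_loose/1, mode_strict/1]).

%% Catch-all style: any value other than the atom true, including a
%% malformed one such as the string "true", is silently treated as false.
mode_loose(true) -> access_enabled;
mode_loose(_) -> access_disabled.

%% Explicit style: only the two expected values are handled. Anything
%% else raises a function_clause error right where the bad value arrives.
mode_strict(true) -> access_enabled;
mode_strict(false) -> access_disabled.
```

With these definitions, mode_loose("true") quietly returns access_disabled, while mode_strict("true") fails immediately, which is the behaviour the review argues for once settings have been validated in one place.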

Finally, I thought we had moved forward on general formatting conventions. Two blank lines between functions, one blank line between clauses of same function, etc. I've worked that way for some time as has almost all the code I've reviewed from others. Perhaps this was never formally adopted by the couchdb project. Is it the first you are hearing of it, Jan?

src/chttpd/src/chttpd_db.erl

src/couch/src/couch_db.erl

rnewson · 2020-08-24T13:58:23Z

src/couch/src/couch_db_updater.erl

    % check we sort them again here. See COUCHDB-2735.
-    Cmp = fun([#doc{id=A}|_], [#doc{id=B}|_]) -> A < B end,
+    Cmp = fun
+        ([], []) -> false; % TODO: re-evaluate this addition, might be a


re-evaluate before we merge
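
For context on why the extra clause matters, here is a self-contained sketch of such a comparator. The diff above is truncated and this does not reconstruct it: the module cmp_sketch, the function sort_buckets/1 and the one-field #doc record are stand-ins, and the second clause is assumed to be the original comparison carried over unchanged.

```erlang
-module(cmp_sketch).
-export([sort_buckets/1]).

%% Stand-in for CouchDB's #doc record; only the id field is modelled.
-record(doc, {id}).

%% Sort buckets (lists of docs) by the id of each bucket's first doc.
%% The ([], []) clause mirrors the one added in the diff. How the real
%% code treats one empty and one non-empty bucket is not visible in the
%% truncated diff, so this sketch deliberately leaves that case unhandled.
sort_buckets(Buckets) ->
    Cmp = fun
        ([], []) -> false;
        ([#doc{id = A} | _], [#doc{id = B} | _]) -> A < B
    end,
    lists:sort(Cmp, Buckets).
```

Without the first clause, comparing two empty buckets would raise a function_clause error, since the original fun only matches non-empty lists.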

src/couch/src/couch_db_updater.erl

src/couch/src/couch_httpd_auth.erl

rnewson · 2020-08-24T14:08:59Z

Another note. In some places you use a list comprehension, in others a lists:map. In both cases you're changing each item of a list to a superset or subset of its original content, but using two different mechanisms to do so. It wasn't clear if this was intentional or not. The only difference is error handling, in that the list comprehension approach can silently drop items that fail, whereas lists:map would not. Is that why you chose lc vs map or were the choices arbitrary?
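
The difference in error handling can be seen in a small sketch. The module lc_vs_map and the {doc, Id} tuples are made up for illustration and do not come from the PR.

```erlang
-module(lc_vs_map).
-export([ids_lc/1, ids_map/1]).

%% List comprehension: a pattern in the generator acts as a filter, so
%% items that do not match {doc, Id} are skipped without any error.
ids_lc(Items) ->
    [Id || {doc, Id} <- Items].

%% lists:map/2: the fun is applied to every item, so an item that does
%% not match {doc, Id} raises a function_clause error.
ids_map(Items) ->
    lists:map(fun({doc, Id}) -> Id end, Items).
```

Given [{doc, 1}, oops, {doc, 3}], ids_lc/1 returns [1, 3] while ids_map/1 crashes on oops. A comprehension whose generator is a plain variable, such as [f(X) || X <- Items], drops nothing and behaves like lists:map/2.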

stemuk · 2021-09-09T11:21:49Z

@janl Are there any further plans for this pull request?

klehmann · 2021-11-12T09:23:20Z

We are very interested in this new feature, but for our use case (mirroring/migrating data from Notes/Domino databases), just checking the _access field values against a single userCtx.name value is not sufficient. It would be much more powerful if this second value could be a list as well, e.g. one that we populate with all name variants, groups and roles that a user has in the web app that is using CouchDB for data storage.
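
A rough sketch of the check requested here, as opposed to what the PR does, might look like the following. The module access_any and the function can_read/2 are hypothetical; the idea is simply that access is granted when any of the user's identities (name variants, groups, roles) appears in the doc's _access list.

```erlang
-module(access_any).
-export([can_read/2]).

%% Hypothetical variant of the access check: Identities is the list of
%% all names, groups and roles a user has, Access is the doc's _access
%% list. Access is granted when the two lists share at least one entry.
can_read(Identities, Access) when is_list(Identities), is_list(Access) ->
    lists:any(fun(Identity) -> lists:member(Identity, Access) end, Identities).
```

For example, can_read([<<"jane">>, <<"staff">>], [<<"staff">>]) is true, while a user with no matching identity is refused.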

janl · 2022-08-06T13:42:01Z

> Another note. In some places you use a list comprehension, in others a lists:map. In both cases you're changing each item of a list to a superset or subset of its original content, but using two different mechanisms to do so. It wasn't clear if this was intentional or not. The only difference is error handling, in that the list comprehension approach can silently drop items that fail, whereas lists:map would not. Is that why you chose lc vs map or were the choices arbitrary?

I made no conscious decisions but rather picked the variant that was most prominent in the surrounding code. A cursory glance suggests this is fine where employed, but I’ll have another eye on it once the final PR goes up.

janl · 2022-08-06T13:43:39Z

I’m closing this PR in favour of a new cleaner one shortly since our master/main/3.x rejigger has made this one not show correctly here. I resolved all comments here that I have fixed in the future PR. The remaining ones will need to be looked at there again, I’ll add comments where appropriate, since code moved around quite a bit.

rnewson and others added 30 commits May 6, 2017 09:32

Initial commit

9a671b6

Initial commit

2c3f968

Test does not pass yet.

validate nbf

f2e1085

Moar Functional

3888d18

* remove dependency on config * make checks optional * support HS256

unused var

5f93661

add more tests

02ecf5b

Add JKWS cache

5b9dad7

Make typ and alg optional

d7bd8d1

and make everything truly optional.

use public url

8077258

98% coverage

3cb8b7d

kid belongs in the header

e60fa50

some documentation

a18a2e5

Add stats, don't wipe cache on error

69e1ce2

make jwks simpler, caching can happen elsewhere

25bfdc3

allow iss to be optional

31999f4

slightly improve readme

acbaa37

expand algorithm support

bf7a2ed

support P-256 in JWKS

61f47b3

update alg list

373a367

return a public key tuple

ae0e0f4

test EC

e0d61d0

fix test

e180555

add tests for HS384 and HS512

e80c3d1

IAT validation requires it to be a number, any number

6cc182d

provide caching of JWKS keys

e083b22

add ibrowse as dep

9d60fa2

require alg+kid for key lookup

ceeb019

Improve restart strategy

80d4a64

Tolerate 5 crashes per 10 seconds

Merge pull request #5 from jaydoane/improve-restart-strategy

094489f

Improve restart strategy

janl added 16 commits July 26, 2020 20:09

chore(emilio): ignore ioq

66ffdd2

test: per doc access test suite

3914d3c

feat(couch): various records now have an access field

c499bb0

feat(access): introduce new access query server

4a64a8b

feat(btree): handle access field in btree

02d1918

feat(utils): add ddoc validation fun

0701b1a

test: update existing tests to match

df3b76f

feat(access): _users users now have a default _users role

92d1c43

feat(access): main db update and validation logic

672a790

feat(access): use access by-seq/by-id for regular users

e34cad8

feat(chttpd): add access support to chttpd

5650b03

feat(replicator): add access support to replicator

4de5f66

feat(ddoc_cache): make access aware

3434cea

feat(global_changes): make access aware

9922c62

test: adjust peruser tests to access

9da1e4b

feat(fabric): handle access requests

adc4bee

joallard mentioned this pull request Aug 16, 2020

Per-document access control #1524

Open

6 tasks

wohali reviewed Aug 21, 2020

emilio.config

wohali reviewed Aug 22, 2020

rnewson requested changes Aug 24, 2020

wohali changed the base branch from master to main October 21, 2020 18:08

nickva force-pushed the main branch from e41407e to a1fc807 Compare June 7, 2022 20:15

janl closed this Aug 6, 2022

janl mentioned this pull request Aug 6, 2022

[WIP, but please review]: per-doc-access-control #4139

Closed

20 participants