docs: user+password storage research documentation (#30)

* docs: user+password storage research documentation

* docs: label as research instead of arch

* docs: document rename for accuracy
This commit is contained in:
Marc 2023-08-19 14:45:58 -04:00 committed by GitHub
parent 52cef95493
commit 16bb6d3afe
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23

View file

@ -0,0 +1,116 @@
# Identity and ownership
## Problem
The system needs a concept of who the user is to have uploaded files have owners. Owners have permissions on files,
whereas non-owners do not.
## Design
### Storing users and credentials
Taking inspiration from [OWASP's guidance on storing
passwords](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html), Argon2 can be used to hash
passwords with a user-specific salt.
#### Comparables
Looking at comparable applications for inspiration about which dependencies to rely on and the standards that exist.
##### Django
[Django](https://docs.djangoproject.com/en/4.2/topics/auth/passwords/) uses PBKDF2 with a SHA256 hash by default, but
allows different hashers to be set by users (incl.
[Argon2](https://docs.djangoproject.com/en/4.2/topics/auth/passwords/)).
Their [Argon2
hasher](https://github.com/django/django/blob/517d3bb4dd17e9c51690c98d747b86a0ed8b2fbf/django/contrib/auth/hashers.py#L374-L473)
[uses](https://github.com/django/django/blob/517d3bb4dd17e9c51690c98d747b86a0ed8b2fbf/setup.cfg#L48-L50) the `argon2-ccfi` [library](https://github.com/hynek/argon2-cffi).
Password hashes are stored as a `VARCHAR(128)` and stores an ASCII string prefixed with the algorithm used and
containing the hash and the salt as a base64 encoded value. This is handled by [the underlying
library](https://github.com/hynek/argon2-cffi/blob/e9473c8f0b8b860bb4369d11f5a605a326255f3f/src/argon2/low_level.py#L53-L118).
The hashed value can be split into parts and `argon2-cffi` can [retrieve the
salt](https://github.com/hynek/argon2-cffi/blob/e9473c8f0b8b860bb4369d11f5a605a326255f3f/src/argon2/_utils.py#L95-L140) for password verification.
##### FastAPI Users
[FastAPI Users](https://github.com/fastapi-users/fastapi-users) uses BCrypt [by
default](https://fastapi-users.github.io/fastapi-users/12.1/configuration/password-hash/) and does not offer
alternatives out-of-the-box without [custom
code](https://fastapi-users.github.io/fastapi-users/12.1/configuration/password-hash/#full-customization) being supplied by adopters.
##### NextCloud
[Nextcloud](https://nextcloud.com) uses Argon [by
default](https://docs.nextcloud.com/server/19/admin_manual/configuration_server/config_sample_php_parameters.html?highlight=htaccess%20rewritebase#hashing).
#### Dependencies
`argon2-cffi` is a good candidate as backing for authentication; being used by Django, it's likely to be closely vetted
for quality.
#### Table design: `users`
|Key|Type|Notes|
|---|---|---|
|`id`|`bigint`|User ID, primary key.|
|`username`|`varchar(64)`|Unique username.|
|`password_hash`|`varchar(128)`|Hashed password, prefixed with algo.|
|`created_at`|`datetime`|UTC datetime of record creation.|
|`password_updated_at`|`datetime`|UTC datetime of the last update to the hashed secret, for renewal tracking.|
|`updated_at`|`datetime`|UTC datetime of last record update.|
The password-storing scheme is largely inspired from Django's, no reason to deviate. Prefixing the algorithm opens the
door to user customization in the future and to changes in algorithm if need be.
Usernames are unique across the table and should be used to refer to the user externally (that way, no leaking of
sequential IDs).
`username` are initially meant to be immutable, but there's no harm in having those be updateable. They do need to be
indexed for searching though.
### Representing ownership
The user table tracks individual users, and the files table tracks file entities. A third table should track the
relationships between the two. This would give entities flexible ownership (i.e. what if a given file could have
multiple owners?).
"Ownership" is too rigid a concept to be represented without needed to be modified a bunch in the future. It might be
best to represent _permissions_ instead such that "owners" have all the permissions on something. This facilitates the
creation of "shared resources" since users that would get files shared to them would just have reduced permissions on
those files.
Permission representation should be flexible such that we can add different permission types along the way. For that
reason, having a convention such that permissions are stored as a number whose bits represent individual permissions is
probably best.
Storing permissions as `bigint` would provide 64 different bits that can be encoded as different permissions. In
principle, we could represent the number as a string and base64 encode it so that the format is more flexible, but
that's not really necessary (a 64b number should be more than enough to account for all cases of permissions. This also
allows it to be indexed, making different levels of share and ownership searcheable without too much trouble.
Permissions can be updated.
#### Sample permissions
Some permissions that we'd need could be:
- Can read file;
- Can edit file;
- Can delete file;
- Can share file;
- Can copy file;
- ...
#### Table design: `permissions`
|Key|Type|Notes|
|---|---|---|
|`id`|`bigint`|Permission entry ID, primary key.|
|`user_id`|`bigint`|Foreign key to the users table, the user who has the permission set.|
|`file_id`|`uuid`|Foreign key to the file that the permission applies to.|
|`value`|`bigint`|Permission value. The bits represent individual permissions.|
|`created_at`|`datetime`|UTC datetime of record creation.|
|`updated_at`|`datetime`|UTC datetime of last record update.|