Marc Cataford
16bb6d3afe
* docs: user+password storage research documentation * docs: label as research instead of arch * docs: document rename for accuracy
116 lines
5.3 KiB
Markdown
116 lines
5.3 KiB
Markdown
# Identity and ownership
|
|
|
|
## Problem
|
|
|
|
The system needs a concept of who the user is to have uploaded files have owners. Owners have permissions on files,
|
|
whereas non-owners do not.
|
|
|
|
## Design
|
|
|
|
### Storing users and credentials
|
|
|
|
Taking inspiration from [OWASP's guidance on storing
|
|
passwords](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html), Argon2 can be used to hash
|
|
passwords with a user-specific salt.
|
|
|
|
#### Comparables
|
|
|
|
Looking at comparable applications for inspiration about which dependencies to rely on and the standards that exist.
|
|
|
|
##### Django
|
|
|
|
[Django](https://docs.djangoproject.com/en/4.2/topics/auth/passwords/) uses PBKDF2 with a SHA256 hash by default, but
|
|
allows different hashers to be set by users (incl.
|
|
[Argon2](https://docs.djangoproject.com/en/4.2/topics/auth/passwords/)).
|
|
|
|
Their [Argon2
|
|
hasher](https://github.com/django/django/blob/517d3bb4dd17e9c51690c98d747b86a0ed8b2fbf/django/contrib/auth/hashers.py#L374-L473)
|
|
[uses](https://github.com/django/django/blob/517d3bb4dd17e9c51690c98d747b86a0ed8b2fbf/setup.cfg#L48-L50) the `argon2-ccfi` [library](https://github.com/hynek/argon2-cffi).
|
|
|
|
Password hashes are stored as a `VARCHAR(128)` and stores an ASCII string prefixed with the algorithm used and
|
|
containing the hash and the salt as a base64 encoded value. This is handled by [the underlying
|
|
library](https://github.com/hynek/argon2-cffi/blob/e9473c8f0b8b860bb4369d11f5a605a326255f3f/src/argon2/low_level.py#L53-L118).
|
|
|
|
The hashed value can be split into parts and `argon2-cffi` can [retrieve the
|
|
salt](https://github.com/hynek/argon2-cffi/blob/e9473c8f0b8b860bb4369d11f5a605a326255f3f/src/argon2/_utils.py#L95-L140) for password verification.
|
|
|
|
##### FastAPI Users
|
|
|
|
[FastAPI Users](https://github.com/fastapi-users/fastapi-users) uses BCrypt [by
|
|
default](https://fastapi-users.github.io/fastapi-users/12.1/configuration/password-hash/) and does not offer
|
|
alternatives out-of-the-box without [custom
|
|
code](https://fastapi-users.github.io/fastapi-users/12.1/configuration/password-hash/#full-customization) being supplied by adopters.
|
|
|
|
##### NextCloud
|
|
|
|
[Nextcloud](https://nextcloud.com) uses Argon [by
|
|
default](https://docs.nextcloud.com/server/19/admin_manual/configuration_server/config_sample_php_parameters.html?highlight=htaccess%20rewritebase#hashing).
|
|
|
|
#### Dependencies
|
|
|
|
`argon2-cffi` is a good candidate as backing for authentication; being used by Django, it's likely to be closely vetted
|
|
for quality.
|
|
|
|
#### Table design: `users`
|
|
|
|
|Key|Type|Notes|
|
|
|---|---|---|
|
|
|`id`|`bigint`|User ID, primary key.|
|
|
|`username`|`varchar(64)`|Unique username.|
|
|
|`password_hash`|`varchar(128)`|Hashed password, prefixed with algo.|
|
|
|`created_at`|`datetime`|UTC datetime of record creation.|
|
|
|`password_updated_at`|`datetime`|UTC datetime of the last update to the hashed secret, for renewal tracking.|
|
|
|`updated_at`|`datetime`|UTC datetime of last record update.|
|
|
|
|
The password-storing scheme is largely inspired from Django's, no reason to deviate. Prefixing the algorithm opens the
|
|
door to user customization in the future and to changes in algorithm if need be.
|
|
|
|
Usernames are unique across the table and should be used to refer to the user externally (that way, no leaking of
|
|
sequential IDs).
|
|
|
|
`username` are initially meant to be immutable, but there's no harm in having those be updateable. They do need to be
|
|
indexed for searching though.
|
|
|
|
### Representing ownership
|
|
|
|
The user table tracks individual users, and the files table tracks file entities. A third table should track the
|
|
relationships between the two. This would give entities flexible ownership (i.e. what if a given file could have
|
|
multiple owners?).
|
|
|
|
"Ownership" is too rigid a concept to be represented without needed to be modified a bunch in the future. It might be
|
|
best to represent _permissions_ instead such that "owners" have all the permissions on something. This facilitates the
|
|
creation of "shared resources" since users that would get files shared to them would just have reduced permissions on
|
|
those files.
|
|
|
|
Permission representation should be flexible such that we can add different permission types along the way. For that
|
|
reason, having a convention such that permissions are stored as a number whose bits represent individual permissions is
|
|
probably best.
|
|
|
|
Storing permissions as `bigint` would provide 64 different bits that can be encoded as different permissions. In
|
|
principle, we could represent the number as a string and base64 encode it so that the format is more flexible, but
|
|
that's not really necessary (a 64b number should be more than enough to account for all cases of permissions. This also
|
|
allows it to be indexed, making different levels of share and ownership searcheable without too much trouble.
|
|
|
|
Permissions can be updated.
|
|
|
|
#### Sample permissions
|
|
|
|
Some permissions that we'd need could be:
|
|
|
|
- Can read file;
|
|
- Can edit file;
|
|
- Can delete file;
|
|
- Can share file;
|
|
- Can copy file;
|
|
- ...
|
|
|
|
#### Table design: `permissions`
|
|
|
|
|Key|Type|Notes|
|
|
|---|---|---|
|
|
|`id`|`bigint`|Permission entry ID, primary key.|
|
|
|`user_id`|`bigint`|Foreign key to the users table, the user who has the permission set.|
|
|
|`file_id`|`uuid`|Foreign key to the file that the permission applies to.|
|
|
|`value`|`bigint`|Permission value. The bits represent individual permissions.|
|
|
|`created_at`|`datetime`|UTC datetime of record creation.|
|
|
|`updated_at`|`datetime`|UTC datetime of last record update.|
|