diff --git a/backend/docs/research/identity-and-ownership.md b/backend/docs/research/identity-and-ownership.md new file mode 100644 index 0000000..31108d2 --- /dev/null +++ b/backend/docs/research/identity-and-ownership.md @@ -0,0 +1,116 @@ +# Identity and ownership + +## Problem + +The system needs a concept of who the user is to have uploaded files have owners. Owners have permissions on files, +whereas non-owners do not. + +## Design + +### Storing users and credentials + +Taking inspiration from [OWASP's guidance on storing +passwords](https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html), Argon2 can be used to hash +passwords with a user-specific salt. + +#### Comparables + +Looking at comparable applications for inspiration about which dependencies to rely on and the standards that exist. + +##### Django + +[Django](https://docs.djangoproject.com/en/4.2/topics/auth/passwords/) uses PBKDF2 with a SHA256 hash by default, but +allows different hashers to be set by users (incl. +[Argon2](https://docs.djangoproject.com/en/4.2/topics/auth/passwords/)). + +Their [Argon2 +hasher](https://github.com/django/django/blob/517d3bb4dd17e9c51690c98d747b86a0ed8b2fbf/django/contrib/auth/hashers.py#L374-L473) +[uses](https://github.com/django/django/blob/517d3bb4dd17e9c51690c98d747b86a0ed8b2fbf/setup.cfg#L48-L50) the `argon2-ccfi` [library](https://github.com/hynek/argon2-cffi). + +Password hashes are stored as a `VARCHAR(128)` and stores an ASCII string prefixed with the algorithm used and +containing the hash and the salt as a base64 encoded value. This is handled by [the underlying +library](https://github.com/hynek/argon2-cffi/blob/e9473c8f0b8b860bb4369d11f5a605a326255f3f/src/argon2/low_level.py#L53-L118). + +The hashed value can be split into parts and `argon2-cffi` can [retrieve the +salt](https://github.com/hynek/argon2-cffi/blob/e9473c8f0b8b860bb4369d11f5a605a326255f3f/src/argon2/_utils.py#L95-L140) for password verification. + +##### FastAPI Users + +[FastAPI Users](https://github.com/fastapi-users/fastapi-users) uses BCrypt [by +default](https://fastapi-users.github.io/fastapi-users/12.1/configuration/password-hash/) and does not offer +alternatives out-of-the-box without [custom +code](https://fastapi-users.github.io/fastapi-users/12.1/configuration/password-hash/#full-customization) being supplied by adopters. + +##### NextCloud + +[Nextcloud](https://nextcloud.com) uses Argon [by +default](https://docs.nextcloud.com/server/19/admin_manual/configuration_server/config_sample_php_parameters.html?highlight=htaccess%20rewritebase#hashing). + +#### Dependencies + +`argon2-cffi` is a good candidate as backing for authentication; being used by Django, it's likely to be closely vetted +for quality. + +#### Table design: `users` + +|Key|Type|Notes| +|---|---|---| +|`id`|`bigint`|User ID, primary key.| +|`username`|`varchar(64)`|Unique username.| +|`password_hash`|`varchar(128)`|Hashed password, prefixed with algo.| +|`created_at`|`datetime`|UTC datetime of record creation.| +|`password_updated_at`|`datetime`|UTC datetime of the last update to the hashed secret, for renewal tracking.| +|`updated_at`|`datetime`|UTC datetime of last record update.| + +The password-storing scheme is largely inspired from Django's, no reason to deviate. Prefixing the algorithm opens the +door to user customization in the future and to changes in algorithm if need be. + +Usernames are unique across the table and should be used to refer to the user externally (that way, no leaking of +sequential IDs). + +`username` are initially meant to be immutable, but there's no harm in having those be updateable. They do need to be +indexed for searching though. + +### Representing ownership + +The user table tracks individual users, and the files table tracks file entities. A third table should track the +relationships between the two. This would give entities flexible ownership (i.e. what if a given file could have +multiple owners?). + +"Ownership" is too rigid a concept to be represented without needed to be modified a bunch in the future. It might be +best to represent _permissions_ instead such that "owners" have all the permissions on something. This facilitates the +creation of "shared resources" since users that would get files shared to them would just have reduced permissions on +those files. + +Permission representation should be flexible such that we can add different permission types along the way. For that +reason, having a convention such that permissions are stored as a number whose bits represent individual permissions is +probably best. + +Storing permissions as `bigint` would provide 64 different bits that can be encoded as different permissions. In +principle, we could represent the number as a string and base64 encode it so that the format is more flexible, but +that's not really necessary (a 64b number should be more than enough to account for all cases of permissions. This also +allows it to be indexed, making different levels of share and ownership searcheable without too much trouble. + +Permissions can be updated. + +#### Sample permissions + +Some permissions that we'd need could be: + +- Can read file; +- Can edit file; +- Can delete file; +- Can share file; +- Can copy file; +- ... + +#### Table design: `permissions` + +|Key|Type|Notes| +|---|---|---| +|`id`|`bigint`|Permission entry ID, primary key.| +|`user_id`|`bigint`|Foreign key to the users table, the user who has the permission set.| +|`file_id`|`uuid`|Foreign key to the file that the permission applies to.| +|`value`|`bigint`|Permission value. The bits represent individual permissions.| +|`created_at`|`datetime`|UTC datetime of record creation.| +|`updated_at`|`datetime`|UTC datetime of last record update.|