Theme: security.
Since there's no perfect security, this is all about taking some baby steps, based on the status quo.
- password hashing and verification using Argon2
- JWT-based authentication using HS256 as the signing algorithm
- impose basic rules on password and username
- enable TLS
I was glad to read that the industry has moved away from quite a few conventions in password control, such as character composition rules. In particular, there's the revised viewpoint that alphanumeric characters are really no less "special" than other symbols. So in the end, as long as all Unicode characters are allowed (not only spaces and punctuation, but also CJK and even emojis), password length is pretty much the only thing we need to impose rules upon.
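To make the "length is the only rule" idea concrete, here is a minimal sketch in Python. The function name and the two bounds are my own placeholders, not values taken from the app. Note that `len` counts Unicode code points, so a single emoji built from several code points counts more than once.

```python
from typing import Optional

# Hypothetical bounds for illustration only; pick your own.
MIN_LENGTH = 8
MAX_LENGTH = 128


def check_password(password: str) -> Optional[str]:
    """Return an error message, or None if the password is acceptable.

    No composition rules: any Unicode character is allowed,
    and only the length (in code points) is checked.
    """
    length = len(password)
    if length < MIN_LENGTH:
        return f"password must be at least {MIN_LENGTH} characters"
    if length > MAX_LENGTH:
        return f"password must be at most {MAX_LENGTH} characters"
    return None
```

The upper bound is there only to keep the hashing input at a sane size, not to restrict what users can type.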
The OWASP Cheat Sheet recommends a particular set of parameters for Argon2 (m = 15 MiB, t = 2, p = 1), which differs from the default values of the reference C implementation, and it doesn't discuss why these particular values were chosen. I suppose they weren't a theoretical derivation; after all, other resources suggest tuning the parameters to aim for the longest tolerable time, like half a second. For now I just treated the whole thing as a black box, as it's not the time to dive into such a deep field. With OWASP's parameters, my dev machine consistently took 0.8 seconds for both password hashing and verification.
JWT, although gaining popularity against traditional server-side session management, has its own tradeoffs. But I like the property that the server doesn't have to store the auth "secrets" for all the logins. Even if you hash those secrets (so that they become more like temporary passwords), they will be more vulnerable, because you can't use expensive hashing like Argon2 when request authentication must be fast. The traditional approach grants the server arbitrary control over the sessions, but the server is exactly what we need to trust less, not more. With JWT, the client takes more responsibility to safeguard the auth token, which I believe is the right way. And in the end, if deemed necessary, we can change the signing key on the server to revoke all the issued tokens. Blacklisting specific tokens is possible but tricky in practice.
Some people think storing a JWT in localStorage is outrageous (because of XSS etc.), and that HTTP-only cookies should be used instead. OK. But my point is, that's not the point. Again, client-side code today should be more responsible, and unlike servers, which can never be made uncrackable (and they are cracker-magnets), it is possible to just make XSS impossible, by not running other people's scripts in your app! The server is by nature vulnerable in exchange for power, control, and convenience. So the mitigation should be to grant it less knowledge and less secret info, and I think JWT is a move in the right direction.
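For reference, HS256 signing and verification is small enough to sketch with the Python standard library alone. This is not the app's actual code, and in practice a maintained JWT library is the safer choice; the function names here are made up for illustration. It shows the two properties discussed above: the server only needs the signing key (no per-login state), and swapping the key invalidates every token issued before.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that b64url_encode stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    """Build header.payload.signature with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_jwt(token: str, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and `exp` is in the future."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(key, signing_input, hashlib.sha256).digest()
        # Constant-time comparison of the recomputed and presented signatures.
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return None
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token, bad base64, or bad JSON
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        return None  # this sketch treats a missing `exp` as invalid
    return claims
```

Verifying a token with any key other than the one that signed it returns `None`, which is all that "revoke everything by rotating the key" amounts to.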
Logout vs Renew
With JWT, you've got to think twice before granting a long-lived token. Maybe the server can treat the very first login (i.e. during signup) specially and grant such a "super JWT", because by definition that's the account owner. But the user must make sure that the special token doesn't get leaked, or at least keep a copy of it so that in the worst-case scenario, it can be blacklisted.
But in general, you want to only issue fairly short-lived tokens. And that leads to the UX problem: when a JWT expires, what should the app do?
Definitely not kicking people out, namely, triggering an automatic logout and forcing people to log back in. This is especially important for a real-time messaging app, because people could have been discussing critical issues when the token expired. No disruption allowed here.
So that led to a natural design decision: the WebSocket channel should not be forcibly closed when expiration occurs. Sure, any subsequent HTTP-based requests will fail (ideally, the UI should mark such actions as N/A), but none of these is necessary for the continued communication with existing contacts and groups (the HTTP requests are used to, for example, propose/accept new contacts, or create new groups).
There's one oddball here, which is file sending, because I think HTTP is actually the more robust and feature-rich option compared to WS when it comes to arbitrary data control (e.g. imposing a size limit). With WS, I suppose similar mechanisms can be implemented, but it's clearly not the ergonomic choice during this initial phase of development. I intend to reserve WS only for textual message exchange at the moment.
Anyway, the point is, the WS channel should never be affected by token expiration (unless the app is restarted of course), but as mentioned, the app will then lose a subset of its functionality, so we do need a UI mechanism to prompt the users to get things back to normal.
Here I devised a workflow that is fairly non-disruptive: one (inglorious) benefit of storing the JWT in localStorage is that our client app can decode it, even without actually validating the signature, and thus know the TTL of the token. So we can set a "timeout" for it, so that the user will be notified as soon as the expiration occurs (and that event could also trigger the deactivation of relevant UI features that require a valid token). In Elm particularly, where we do such things via Time.every in subscriptions, I find it nicer to set the interval to the whole token TTL at once, instead of frequently checking the token expiration time against the current time. It is not only less expensive, but also achieves true real-time expiration notification, which essentially emulates setTimeout in JS. (This is quick to set up in Elm because subscriptions have access to the model, and thus we can turn the timer off immediately after the event is first raised.)
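The client-side trick of reading the TTL without validating the signature boils down to base64-decoding the middle segment of the token. The app does this in Elm; the sketch below shows the same logic in Python, with a made-up function name, and assumes the token carries a numeric `exp` claim (seconds since the epoch). The returned value is what you would feed to a one-shot timer.

```python
import base64
import json
import time
from typing import Optional


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Read `exp` from the payload WITHOUT checking the signature.

    Only good for UI scheduling; never for trusting the token.
    Returns None if the token can't be decoded or has no numeric `exp`.
    """
    now = time.time() if now is None else now
    try:
        payload_b64 = token.split(".")[1]
        # JWT segments drop base64 padding, so add it back before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
        exp = claims["exp"]
    except (IndexError, KeyError, TypeError, ValueError):
        return None
    if not isinstance(exp, (int, float)):
        return None
    # Already expired tokens yield 0, i.e. "notify right away".
    return max(0.0, exp - now)
```

A single timer set to this value fires once at expiration, so there is no need to poll the clock in between.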
Then, this expiration event is visually presented as an unobtrusive notification that prompts the user to renew the auth token (instead of "log out"). This is not just a matter of wording: logging out means the user will have to leave the current view, and thus stop the ongoing activity, and for privacy reasons, the app state needs to be reset, and any persisted data purged. In contrast, "renew" is completely non-interrupting: it opens up a modal window (which is dismissible) that only prompts for the password of the current username (and by definition, there's no username field to fill in; authenticating with a different account would require a full logout first). Once the authentication succeeds, the window is automatically dismissed, and all the features that depend on HTTP requests are re-enabled. Note that, while the user is renewing, any incoming messages will continue to be received, and if an immediate reply is deemed necessary, the renew window can be dismissed temporarily.
Time and again, I've learned that it's exactly this kind of design work that takes more time and effort than learning to implement new features for the first time (in this case, password hashing and JWT-based auth), because today we have the luxury of building our "custom" things on the quality work of other people. (Construction is somewhat similar: house builders don't have to worry about concrete and steel. But in software, open source brings this kind of "problem decomposition" to a new level.) However, that doesn't mean there's not much left to do. On the contrary, the presence of "the giants" liberates us from building the underlying tools before we can build our own tools or products, so we get to devote ourselves to the tasks at hand with a higher level of focus and attention to detail. Over time, I'm more and more convinced that in software making, esp. for everyday tools, details, refinement, and care for quality in general are more important than the infrastructural choices / "tech stack". After all, Wikipedia is written in PHP. But to be clear, I do value the tech stack very much; what I'm saying is, it is still possible to build crappy things with best-in-class building blocks and tools. Or else, machines / AI can most likely take over.