Again, won't proofread, writing isn't my profession. Just jotting things down as necessary, before moving on to the next phase.
What I did so far:
- Auth: Signup, Login (AuthToken)
- Contacts & "Pair Chat"
- Groups & "Multi Chat"
Auth
As discussed previously, I'm designing and implementing auth naively, based only on what I want the system to be minimally capable of. Notably, login is implemented as the server sending a generated "auth token", similar to an AWS access key pair, to be stored at the client.

Before this project, I never cared to learn about this field, but now I'd happily engage in thinking about naive auth (which, again, is why I'm increasingly hooked on projects). For instance, to avoid storing passwords on the backend, could we use each password to encrypt some per-user random data, and then store the pre- and post-encryption pair instead? Although I have no intention of systematically learning this, I have overheard things; I guess the point is to do things more "organically", staying very close to intuition.

What I read: yes, encryption algorithms that are proof against "known plaintext attacks" are readily available, but the problem with using passwords as encryption keys is that they are typically not good enough as encryption keys. You could fortify a password with a "salt", but then you'd have to store three pieces of information per user. Moreover, people have figured out a nicer alternative, namely hashing instead of encrypting, with which you only have to store two pieces of info per user: the salt and the hash. Special-purpose hash functions that are tunably slow, such as bcrypt and argon2, have been developed just for that. So solid best practices seem to exist for managing user credentials in the client-server model; it's mature, kind of a "solved" issue.
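To make the salt-and-hash idea concrete, here's a minimal sketch. I'm using std's `DefaultHasher` purely as a stand-in for a real password hash like argon2 or bcrypt (it is neither slow nor cryptographic, so don't do this for real); the function and struct names are my own, not from any actual codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for argon2/bcrypt: mix the salt into the password hash.
// A real implementation would use a tunably-slow, cryptographic hash.
fn hash_password(salt: u64, password: &str) -> u64 {
    let mut h = DefaultHasher::new();
    salt.hash(&mut h);
    password.hash(&mut h);
    h.finish()
}

// Stored per user: only the salt and the hash, never the password.
struct Credential {
    salt: u64,
    hash: u64,
}

fn signup(salt: u64, password: &str) -> Credential {
    Credential { salt, hash: hash_password(salt, password) }
}

fn login(cred: &Credential, attempt: &str) -> bool {
    // Recompute with the stored salt and compare.
    hash_password(cred.salt, attempt) == cred.hash
}
```

The point of the shape: the password itself is never stored, and verifying a login is just "recompute and compare".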
So to me the more interesting question is: can the same technique be used to mask the secret of the auth token? There are two issues:
- Logins are relatively infrequent, so slow hashing is fine; but the auth token needs to be verified with every request, so at the very least it would have to use a fast hash function instead?
- Does AWS store the API access key secret? If you're using a signing key derived from the secret itself to sign a string that is unique per request, then the server has to have the secret plainly stored in order to verify the signature, doesn't it? If so, is the secret protected somehow?
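My mental model of the second issue, sketched below with `DefaultHasher` standing in for a real MAC such as HMAC-SHA256 (assumption: this mirrors AWS-style request signing only loosely, and all names are mine). What it illustrates is the bind in the question: verifying means recomputing the same signature, so the server needs the same secret in usable form, not just a one-way hash of it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a real MAC (e.g. HMAC-SHA256): mix the secret
// with the per-request string.
fn sign(secret: &str, request: &str) -> u64 {
    let mut h = DefaultHasher::new();
    secret.hash(&mut h);
    request.hash(&mut h);
    h.finish()
}

// Client side: signs a unique per-request string with its secret.
fn client_sign(secret: &str, request: &str) -> u64 {
    sign(secret, request)
}

// Server side: must hold the same secret to recompute the signature;
// it cannot verify against a salted hash of the secret alone.
fn server_verify(stored_secret: &str, request: &str, signature: u64) -> bool {
    sign(stored_secret, request) == signature
}
```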
Chat
I implemented "contact proposal" as a necessary part of the design rather than going minimal: before A can talk to B, A must first send B a request, which B can accept or decline. Once accepted, A and B are mutual contacts. This is there for the ethos, but it's technically simple.
The more fun part is the pair chat. Initially, I planned to go for "multi chat" first, with pair chat falling out as a special case, which seemed quite natural. But as I started the work, I soon realized that a chat between two mutual contacts is not exactly the same as a 2-person group chat, and the distinction lies in the immutability of the group membership. A multi-chat group can in theory have zero or more members, and people can keep being added or removed. A pair chat is an immutable group, if you will: as long as the two stay contacts, that's their one and only special chat room, and no one else can join. In contrast, the same pair can create any number of groups consisting of just themselves (for whatever reason); while those groups might feel the same as the pair chat room, membership mutability is, again, what sets them apart.
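One small detail that falls out of the "one and only special chat room" property: a pair chat can be keyed by the unordered pair of user IDs, so that (A, B) and (B, A) resolve to the same room. A sketch of that idea (the `u64` id scheme and function name are my own assumptions, not the actual implementation):

```rust
// Canonical room key for a pair chat: order the two user ids so
// the key is the same regardless of who opens the chat.
fn pair_chat_key(a: u64, b: u64) -> (u64, u64) {
    if a <= b { (a, b) } else { (b, a) }
}
```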
That's when I decided to implement pair chat separately. For multi chat, the plan has always been to use Redis PubSub, but for the pair-specific situation I'm aiming for something simpler: just WebSocket and nothing else.
Now, the WebSocket chat examples offered by both Warp and Axum were very helpful, but what I found mysterious was Warp's use of a channel (mpsc::unbounded_channel) between the WS receiver and sender (I wasn't bothered as much by Axum's use of a broadcast::channel, because that's very much like using Redis PubSub, except within a server instance). My main question is: is the channel necessary logic-wise, or is it added for performance reasons? Reading the example again and thinking it through, I grew more confident that a channel wasn't required to make pair chat just work. And that's what I did. I was relieved to see pair chat working with just WS messages forwarded to the corresponding WS sender, nothing in between.
Now, the Warp example's comment did say "Use an unbounded channel to handle buffering and flushing of messages to the websocket", so most likely it is there for performance reasons, but the point is, I don't know exactly why. Is calling the WS sender directly in the WS receiving loop going to congest or overload the sender somehow when the traffic becomes too heavy? Or is it that calling send on the WS sender is so slow (much slower than sending down, say, an MPSC channel) that under heavy traffic the WS receiver (the stream) gets congested or overloaded? What are the consequences when too many messages pile up at the receiver end? Just a temporary slowdown, or message losses, or a broken connection altogether? How heavy does the traffic have to be to see such an effect? And how do you prevent and recover from such incidents? Is the unbounded flavor of the MPSC channel the best option?
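To poke at the buffering question outside of Warp, here's a stdlib analogue of the pattern: a fast producer sending into an unbounded std::sync::mpsc channel while a slow consumer drains it. The channel absorbs the burst instead of making the producer wait on the slow side, which is, I assume, the decoupling the Warp comment is pointing at (tokio's unbounded channel plays the same role in async code). The names here are mine; this is a sketch of the shape, not of Warp's code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn run_burst(n: usize) -> usize {
    // Unbounded: send() never blocks; messages queue up in memory.
    let (tx, rx) = mpsc::channel::<usize>();

    // Fast producer: stand-in for the WS receiving loop.
    let producer = thread::spawn(move || {
        for i in 0..n {
            tx.send(i).unwrap(); // returns immediately
        }
        // tx is dropped here, closing the channel.
    });

    // Slow consumer: stand-in for the WS sender.
    let mut received = 0;
    for _msg in rx {
        thread::sleep(Duration::from_micros(50)); // simulate a slow send
        received += 1;
    }
    producer.join().unwrap();
    received
}
```

Note the trade-off the "unbounded" choice makes: the producer never blocks, but under sustained overload the queue grows without limit, so memory is the backstop rather than backpressure.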
I really want to see this simpler setup barf in real-world scenarios, and then see the buffering mechanism, once introduced, make the system robust. (I often find myself writing Rust code that I think is probably wrong just to get confirmation from the compiler, which is my way of frequently checking my understanding as a Rust newbie.)
Now, implementing multi chat turned out to be much smoother than I expected, and it was, I believe, my first real-world practice in using Arc to share something (in particular, the WS sender) between multiple async tasks (one per group that a user is a member of). In contrast, the work to implement "creating chat groups" took much longer, what with the API design, the client-side UI, and most of all, dealing with Redis.
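The Arc pattern, sketched here with threads instead of async tasks so it stays stdlib-only (in the real code the shared thing is the WS sender and the tasks are per-group; `OutboundSink` and `fan_in` are my own illustrative names): each task gets its own clone of one `Arc<Mutex<...>>` handle, and all of them write through the same underlying value.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for the single WS sender: one outbound sink per user.
struct OutboundSink {
    sent: Vec<String>,
}

fn fan_in(groups: &[&str]) -> usize {
    let sink = Arc::new(Mutex::new(OutboundSink { sent: Vec::new() }));

    // One task per group the user is a member of; each task owns
    // an Arc clone pointing at the same sink.
    let handles: Vec<_> = groups
        .iter()
        .map(|g| {
            let sink = Arc::clone(&sink);
            let msg = format!("message from group {g}");
            thread::spawn(move || {
                sink.lock().unwrap().sent.push(msg);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let n = sink.lock().unwrap().sent.len();
    n
}
```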
Yeah, speaking of Redis: it's simple, flexible, speedy and all, but it offers far less guarding against writing bad Redis code than something like EdgeDB (which I'd like to try in a project), and that insecurity costs a lot of mental energy! It also makes me immensely grateful for good languages like Elm and Rust that tell you, with rigor, that your code is wrong, often with very good hints. I'd rather be frustrated about my Rust code not compiling than worry that my Redis code might have holes that misbehave and corrupt my data.
But I did manage to get acquainted with the API of redis-rs, e.g. when to use redis::transaction versus just Pipeline::atomic (the former is for "check-and-set", a.k.a. "read-compute-write", which is why it has the WATCH functionality on top of what an atomic pipeline offers). I was frustrated to see that the transaction API doesn't support async, but I'm starting to suspect that, because WATCH is meant to "block", it doesn't make sense to make it async? So whenever you use transaction, you should use spawn_blocking?
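The check-and-set shape that redis::transaction implements (WATCH a key, read, compute, MULTI/EXEC, and retry if the watched key changed under you) has a stdlib analogue in `compare_exchange`: read, compute, then commit only if the value is still what you read, retrying on conflict. A sketch of that retry loop, to pin down the shape rather than the redis-rs API itself:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Optimistic read-compute-write: the same loop shape as
// WATCH / MULTI / EXEC with a retry on a failed EXEC.
fn checked_increment(counter: &AtomicU64, delta: u64) -> u64 {
    loop {
        let seen = counter.load(Ordering::SeqCst); // "WATCH" + read
        let computed = seen + delta;               // compute
        // "EXEC": commit only if nobody changed it since we read it.
        match counter.compare_exchange(seen, computed, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return computed,
            Err(_) => continue, // conflict: retry the whole read-compute-write
        }
    }
}
```

This is also why a plain atomic pipeline isn't enough for check-and-set: the pipeline guarantees the writes go in together, but only the watch-and-retry loop guarantees they were computed from values that are still current.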