No time for proofreading, just wanted to "jot" down some thoughts that led to November's preliminary work on the chat app. To recap, I decided to use a Rust web server framework + Redis for the backend, but initially I wanted to work on the Android client as a learning project. Soon I changed my mind, though: I need a quick way to check whether the server implementation works, so picking a familiar tool is the much better option, which means Elm (and Elm-UI).
As a rule of thumb from my previous experience, a learning project works much better if you only have one or two new/challenging/unknown things to tackle at a time. Navigating a combinatorial forest of unexplored territories can be disorienting, and thus unproductive. Progress is a very effective motivator, after all. So I need at least a PoC client while focusing on the server-side design and implementation. Once this works in the browser, I can confidently say it has to work on Android.
Naive Auth
I've seen chat demo apps (e.g. in Axum's examples), but I know that's not what I want. Persistence via a DB is a must, and I also want user authentication and contact requests ("proposing") before moving on to the real-time messaging part, most likely backed by Redis Pub/Sub and WebSocket.
This auth has to be very naive given my absolute lack of technical experience in the domain, but extreme minimalism is not the goal; in fact, I do want it to have the time-tested mechanisms, I just don't want to look directly at how they are implemented.
All I need is the functional requirement of users being able to "sign up", "log in", "propose contact", and when accepted, open a messaging channel or chat room.
I thought about what was absolutely needed. For instance, in the most minimalist design, can the "user credential" be reduced to just the username, without a password? The answer is no, but not mainly for technical reasons; rather, it fails from the design standpoint: a username is meant to be shared with others, so although it is unique (identifying) within a given backend, it alone is not enough to prevent others from acting as you. That's why it is a must to have at least a pair of public and private strings making up your credential.
Next, what is "login"? People might think of cookies when it comes to this, but in essence, I think it's some server-generated info that marks you as a recognized "user session". By session, I mean something that doesn't necessarily have to expire, but that represents a specific device or user agent (e.g. a particular browser) used by you, the person. So I tend to think of it as a sort of disposable/expendable credential that is auto-generated once you are verified through your "master credential". The benefit is that, with this "temp" auth data stored on the device after you log in, all subsequent communications with the server that require auth happen transparently, without your having to enter the username and password with every request, ever again (until you log out). Now, because the username and password are meant to be managed by humans, while temp credentials are stored and sent automatically, the latter can be made much more crypto-secure. So I think it makes good sense to pass the potentially less secure auth token through the wire infrequently (as in login), while using the much stronger, yet easily disposable one much more frequently, with every request.
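To make that concrete, here is a minimal sketch of the idea; the function names are made up, and a plain HashMap stands in for the real session store (Redis, in my plan):

```rust
use rand::RngCore;
use std::collections::HashMap;

// Token -> username; an in-memory stand-in for the real store.
type Sessions = HashMap<String, String>;

// Once the "master credential" (username + password) checks out at login,
// mint a random, disposable token for this device / user agent.
fn issue_session_token(sessions: &mut Sessions, username: &str) -> String {
    let mut bytes = [0u8; 32];
    rand::thread_rng().fill_bytes(&mut bytes);
    let token = hex::encode(bytes); // machine-managed, so it can be crypto-strong
    sessions.insert(token.clone(), username.to_string());
    token
}

// Every subsequent request presents the token instead of the password.
fn authenticate(sessions: &Sessions, token: &str) -> Option<String> {
    sessions.get(token).cloned()
}

// Logging out just throws the disposable credential away.
fn logout(sessions: &mut Sessions, token: &str) {
    sessions.remove(token);
}
```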
Now, there are more crypto details out there. Although I haven't looked into the technical details yet, I've heard of the common practice that 1) passwords are not directly stored on the backend, and 2) the secret part of the generated temp auth token is not directly sent with every request. Regarding (1), I have a guess as to how it can work. Essentially, the server just has to see if the password is correct, but checking string equality on the password itself is not the only way; crypto can help. If the server, during signup, uses the password to encrypt some generated random data, and then stores this pair of pre- and post-encryption data, then for all subsequent logins the server can verify the password by encrypting the same data and checking equality of the result. Crypto can offer a strong guarantee that a wrong password cannot produce the right result, and that the right password cannot be derived from looking at the pre- and post-encryption data. Regarding (2), thanks to my previous project where I had to implement an incomplete S3 client (also in Rust), which mostly involved computing the right signature for every request to be sent (using the access key secret), I know how industry does this. But there's one caveat: I'm not sure if the two, i.e. the "browser cookie" for login vs. the "access key pair" for an API, are essentially the same thing (apart from minor differences such as API access keys typically lasting longer)?
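To pin down my guess about (1), here's a sketch under my own assumptions: HMAC-SHA256 stands in for the unspecified "encrypt some generated random data" step, and all the names are mine, not a settled design:

```rust
use hmac::{Hmac, Mac};
use rand::RngCore;
use sha2::Sha256;

type HmacSha256 = Hmac<Sha256>;

// What the server stores instead of the password itself:
// the random "pre-encryption" data and the "post-encryption" result.
struct StoredCredential {
    random_data: [u8; 16],
    encrypted: Vec<u8>,
}

// At signup: derive the pair from the password and fresh random data.
fn make_credential(password: &str) -> StoredCredential {
    let mut random_data = [0u8; 16];
    rand::thread_rng().fill_bytes(&mut random_data);
    let mut mac = HmacSha256::new_from_slice(password.as_bytes()).expect("HMAC accepts any key size");
    mac.update(&random_data);
    let encrypted = mac.finalize().into_bytes().to_vec();
    StoredCredential { random_data, encrypted }
}

// At login: redo the same computation with the supplied password and compare.
fn verify_password(stored: &StoredCredential, password: &str) -> bool {
    let mut mac = HmacSha256::new_from_slice(password.as_bytes()).expect("HMAC accepts any key size");
    mac.update(&stored.random_data);
    mac.verify_slice(&stored.encrypted).is_ok()
}
```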
Well, at least conceptually, it seems to me that a cookie does function like what's stored in "~/.aws/credentials". But that doesn't matter for now. I think this can work, so let's try it; it's simple and sounds like a good design. And the details may or may not matter. What I'm trying to say is that they should not be blindly followed, but only followed when you see why they are necessary. For instance, I've never used cookies, so for now I'm just gonna use localStorage to store the generated auth token. Does a cookie offer more security, like obfuscating the stored data?
Designing and implementing a naive auth, I believe, is the better way to learn auth. When you actually see a flaw in it, you will understand how the "best practices" came to be. And also, stay skeptical :) OK, so this has been about "know-how" vs. "know-why": I didn't choose to go and read in detail how auth is done by state-of-the-art servers (and the best practices change over time). I'm more interested in knowing why something more naive doesn't work (e.g. why it's dangerous).
Side note: this reminds me of an old complaint of mine about biology education. For example, in a textbook we can easily find a very complete description of the entire biochemical pathway of photosynthesis, and it is indeed marvelous. But to me, it has always been more interesting to see the progression of studies that led to the final version of the discovery, namely the debugging process, so to speak. Yes, you can probably find a book about all this on your own, but the point is, our education system never promoted such a spirit, never encouraged students to look at the dynamic process behind a big discovery. All they face is static documentation, which is a handy thing to have in our knowledge base, but the more inspiring part definitely lies in how we moved from a naive, wrong picture towards a more correct and complete one. I believe such a journey would teach us much more about doing research.
Warp Usage
The first step was to pick a server framework from four options, Actix-Web, Rocket, Tide, and Warp, without trying them out (not just because I'm lazy, but because I suspect that trying a mere demo example wouldn't be enough). So instead, I tried to "smell", again.
1) Actix is the most popular/feature-rich, but also the messiest.
2) Rocket is a lot like what you see in the Python world, but in Rust, and I'm just not a big fan of such things (e.g. a macro/annotation-heavy design means you get less help from the type system, which is fine with Python, but Rust?). People say it's like Django; I would probably be fine with it if it were as mature, complete, and time-tested as Django. After all, being an opinionated/rigid framework that actually delights people is a very tough thing to nail! Heck, some people thought Elm was too rigid (referring to the Elm Architecture).
3) Tide was something I actually looked at last year when working on my 3rd learning project (a Workflowy/Dynalist clone, which I ended up writing in Python, since back then I didn't want to deal with learning client-server, Redis, WebSocket, and Rust programming all at the same time), but I don't know, it doesn't seem to have the vibe anymore (I know, very rational talking). I remember it had a fairly ergonomic API, but where does it truly stand out against the rest? I can't tell. (Sidebar, total gossip here: since Stjepan Glavina left, async-std has not gained significant momentum in the competition with the Tokio gang, which, as Glavina claimed, had already absorbed most of the new designs first implemented in async-std. async-std is still using Smol underneath, not directly though, and that is a cool and sweet thing. But in terms of the "ecosystem", Tokio is winning, hands down.)
4) Warp had been off my radar, but after reading Sean's initial announcement, it caught my attention quickly, mainly the filter system, which makes things composable/reusable thanks to its "functional" design. It might look a bit verbose at first, but I somehow had faith in the design.
But the next day, I stumbled upon Axum, an absolutely shiny new thing, and its API is very ergonomic, i.e. sexy! It's also macro-free by design, and it's meant to take advantage of the existing ecosystem within the larger Tokio-verse (Tower, Hyper). Its "extension layer" as the way to add a DB to the app context is also very ergonomic, and the same goes for the use of extractors for request parsing... It was irresistible. Sorry, Warp.
A day later, I realized that Axum (as of 0.3.2) didn't have CORS support. I thought of learning how to implement it myself, but decided that it wasn't worth it. And indeed, for something first published just this August, how could you expect it to be feature complete? Again, as a newbie, choosing the newest is a bad idea. But I felt happy to fall back to Warp, as I had very much wanted to try it out anyway.
As I said, it's more verbose than Axum, but not in a bad way. Everything is done using the same building blocks, mostly "and", "or", "with", "map", and "then" (the async version of "map"). Request parsing/extraction, adding the DB to the context, configuring CORS, and finally defining your own route handlers: you do all of it with such filters. This uniformity, this lack of special-purpose gadgets such as "AddExtensionLayer", is along the same line as Elm-UI: all you have is a small number of tools, a toolkit that easily fits into your "working memory", so you can just think about what you want to do instead of searching the documentation for the most appropriate thing to use. This is liberating. In addition, the aforementioned "verbosity" turned out to be a plus, as it is essentially explicitness, which makes the program logic clear and free from opaque, magical devices that try to be ergonomic and concise (BTW, I'm not talking about the lower-level "magic" employed to make the framework feel nicer, because that is not a concern of my app). In short, with this API design, everything I write clearly declares something about a constituent of the server that I need to pay attention to; there are no "shorthands" that try to hide things away.
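For illustration, here is a sketch of what that uniformity looks like in practice; the routes, types, and handlers are placeholders of my own, not code from the actual app:

```rust
use std::convert::Infallible;
use warp::Filter;

#[derive(Clone)]
struct Db; // placeholder for the real Redis handle

// "Adding the DB to the context" is just another filter that always extracts a Db.
fn with_db(db: Db) -> impl Filter<Extract = (Db,), Error = Infallible> + Clone {
    warp::any().map(move || db.clone())
}

#[tokio::main]
async fn main() {
    let db = Db;
    let cors = warp::cors().allow_any_origin();

    // GET /health
    let health = warp::path!("health").and(warp::get()).map(|| "ok");

    // POST /login: JSON body extraction + DB context + an async handler via "then".
    let login = warp::path!("login")
        .and(warp::post())
        .and(warp::body::json())
        .and(with_db(db.clone()))
        .then(|body: serde_json::Value, _db: Db| async move {
            warp::reply::json(&body) // placeholder: echo the body back
        });

    // "or" composes routes; "with" wraps them all in CORS.
    let routes = health.or(login).with(cors);
    warp::serve(routes).run(([127, 0, 0, 1], 3030)).await;
}
```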
And Warp has been around since 2018, albeit still at version 0.3 now, the same as Axum. It's fairly feature complete, and it's got character!
But there is one aspect of it I'm not a big fan of: error handling. The docs recommend using a custom Rejection (via reject::custom(err: impl Reject)) and then converting back to impl Reply by defining a custom handler for recover(). There I find two annoyances:
- Defining your own recover function means you lose (by overriding) the default one built into Warp that you would otherwise get for free. So you'd have to replicate all the boring, generic HTTP error code handling on your own, instead of just focusing on your app-specific, business-logic errors.
- Returning a custom Rejection in the very final step of the filter chain, only to map it back to a Reply in a later, messy step, is just unnecessary. In the end, you need to give a Reply, the only thing HTTP recognizes. Rejection is a nice (and necessary) concept for the intermediate filters, as you get to fail early / "short-circuit" and move on to the next candidate route (via "or"). But in each route handler where you are to eventually provide a Reply, it's just more straightforward to map every error situation directly to a proper HTTP status code and a custom error message for the response body, instead of deferring the mapping until the recover step; see the sketch after this list. (Again, leave the generic errors to Warp's default recover logic; after all, who's the HTTP expert here?)
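Here's a sketch of the approach I mean, under my own assumptions (AppError and the signup handler are made up, not from the actual app): each business-logic failure becomes a status code and message right inside the handler, and Warp's built-in rejection handling is left alone for the generic cases.

```rust
use warp::http::StatusCode;
use warp::{Filter, Reply};

// Hypothetical app-specific errors (plain values, not Rejections).
enum AppError {
    BadCredentials,
    UsernameTaken,
}

// Map each business-logic error straight to a Reply,
// instead of a custom Rejection recovered in a later step.
fn error_reply(err: AppError) -> warp::reply::Response {
    let (code, msg) = match err {
        AppError::BadCredentials => (StatusCode::UNAUTHORIZED, "wrong username or password"),
        AppError::UsernameTaken => (StatusCode::CONFLICT, "username already taken"),
    };
    warp::reply::with_status(msg, code).into_response()
}

// Placeholder standing in for the real signup logic.
async fn try_signup(name: String) -> Result<String, AppError> {
    if name == "taken" {
        Err(AppError::UsernameTaken)
    } else {
        Ok(format!("welcome, {}", name))
    }
}

#[tokio::main]
async fn main() {
    // POST /signup/:name — every outcome becomes a Reply right here,
    // so the generic HTTP errors remain the job of Warp's default recover logic.
    let signup = warp::path!("signup" / String)
        .and(warp::post())
        .then(|name: String| async move {
            match try_signup(name).await {
                Ok(msg) => warp::reply::with_status(msg, StatusCode::OK).into_response(),
                Err(e) => error_reply(e),
            }
        });

    warp::serve(signup).run(([127, 0, 0, 1], 3030)).await;
}
```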
OK, enough time spent on writing, back to coding. Not going to proofread.