I promise, this one will be brief and to the point. Progress is solid, and the focus has been intense but stress-free; I just wanted to keep going, so more than once this writing task felt like a burden. Well, the strategy is "suit yourself": I would never want writing to become a distraction or a source of anxiety, so while fully occupied, I simply went with the flow. At the same time, I know that pausing once in a while for reflection is a worthy effort, and now I finally feel the moment is right.
Again, no need to go into detail on everything; I have my notes and git commit messages for that. This is the space for a higher-level review and reflection.
Indexes in IDB
It's kind of ironic, but I had never really made actual use of indexes in a technology literally named "IndexedDB". I know why, actually: in my previous projects, IDB was only used for storage, not for querying, since I judged it simpler to retrieve everything on startup and do all the querying with custom code in memory. That made sense for Arrow, where full-text search is a big part of the action, and for a plain client-side app holding only personal textual data, in-memory is, I believe, the simplest and most practical approach.
But for a chat app, I'd rather prioritize performance over the long run, as the data gradually grows, and that essentially means efficient (and smart) data retrieval.
For instance, to load chat history into the UI, we need a way to limit the size of each batch in a "paginated" way, and we must make sure there are no duplicated or missed messages across adjacent batches. Initially I got myself into trouble, because I created a compound index on the channel and then the message time (I needed that to do time-range queries per channel). The problem was that time is not a unique key (even though in practice, millisecond resolution can give that illusion), so when you use time as the resume point for the next batch, ensuring gap-less retrieval is tricky: say you retrieve 50 messages at a time; once you reach that limit, what if there are more messages with the same timestamp as the last one in the batch? You either have to go beyond the limit and grab everything down to the last message with that timestamp, or, if you must be strict about the batch size, you need to remember exactly where the next retrieval will continue, and that means you need the primary key of the next record. OK, fine, then you use continuePrimaryKey() to get to that resume point in the next batch, right?
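Concretely, the attempt looked roughly like this. This is a minimal TypeScript sketch, and the store name, index name, and record fields are made-up placeholders, not the app's actual code:

```typescript
// Assumed schema (hypothetical names):
// db.createObjectStore("messages", { keyPath: "id", autoIncrement: true })
//   .createIndex("byChannelTime", ["channel", "time"]);

function resumeWithCompoundIndex(
  db: IDBDatabase,
  channel: string,
  resumeTime: number, // timestamp of the record where the next batch starts
  resumeId: number    // its primary key, remembered from the previous batch
) {
  const index = db
    .transaction("messages")
    .objectStore("messages")
    .index("byChannelTime");

  // Walk backward (newest -> oldest), capped at the resume timestamp.
  const range = IDBKeyRange.bound([channel, -Infinity], [channel, resumeTime]);
  const req = index.openCursor(range, "prev");

  req.onsuccess = () => {
    const cursor = req.result;
    if (!cursor) return;
    // Since timestamps are unique in practice, the cursor usually lands
    // exactly ON the resume record -- which is what gets this call into
    // trouble, as explained below:
    cursor.continuePrimaryKey([channel, resumeTime], resumeId);
  };
}
```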
But then I got an error from continuePrimaryKey(). It took a while to figure out exactly what was going on, but eventually I saw the root issue: the time component of my compound index got in the way. As mentioned, time is typically unique per message (though in theory that's not guaranteed), so when I specify the IDBKeyRange using the resume timestamp to open the cursor, the cursor is usually already sitting at the exact resume record! And then when I call continuePrimaryKey() with the index key and the primary key, it complains, because according to the spec:
If key is equal to this's position and primaryKey is greater than or equal to this's object store position and this's direction is "prev", throw a "DataError" DOMException.
Basically, it's saying that continuePrimaryKey() has to move the cursor by at least one record from its starting position, so if the cursor is already at the target position when you call it, it's going to scream.
So the above only works if there is more than one record at the resume timestamp and the target resume record is not the starting one. At first I thought, "why do you have to error out if you don't have to move the cursor at all?" The error could be triggered only under the strictly "greater than" condition, right?
But thank goodness I didn't dwell on that mentality for long; it soon came to me that I had used the wrong index. If I get rid of the time component and instead use a simple index on "channel" alone, the problem goes away.
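Here's the same sketch with the fix, again with hypothetical names. Note one assumption: the primary key auto-increments, so within a channel, primary-key order matches insertion order.

```typescript
// Fixed schema (hypothetical names): a plain index on "channel" alone.
// db.createObjectStore("messages", { keyPath: "id", autoIncrement: true })
//   .createIndex("byChannel", "channel");

function loadBatch(
  db: IDBDatabase,
  channel: string,
  resumeId: number | null, // primary key of this batch's first record; null = start at the newest
  limit: number,
  done: (batch: unknown[], nextResumeId: number | null) => void
) {
  const batch: unknown[] = [];
  const index = db
    .transaction("messages")
    .objectStore("messages")
    .index("byChannel");

  const req = index.openCursor(IDBKeyRange.only(channel), "prev");
  let positioned = resumeId === null;

  req.onsuccess = () => {
    const cursor = req.result;
    if (!cursor) return done(batch, null); // no more history
    if (!positioned) {
      positioned = true;
      if (cursor.primaryKey !== resumeId) {
        // Jump straight to the resume record (inclusive). The cursor starts
        // at the newest record and resumeId is strictly older, so the
        // DataError condition from the spec can't trigger here.
        cursor.continuePrimaryKey(channel, resumeId!);
        return;
      }
    }
    if (batch.length < limit) {
      batch.push(cursor.value);
      cursor.continue();
    } else {
      // This record is exactly where the next batch will resume: gap-less.
      done(batch, cursor.primaryKey as number);
    }
  };
}
```

Within a single channel, the index breaks ties by primary key, so the backward cursor naturally yields newest-first order, and the primary key alone is an unambiguous resume point.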
So the lesson is: yes, of course, if you're not using indexes, you're not really getting the most out of IndexedDB, but more importantly, choosing the right index for a particular job matters a great deal. Initially I thought a compound index was simply more flexible and versatile, but that's not true: one extra component can get you in trouble in a specific querying task, as demonstrated above.
Another thing: I wish indexes could be built in descending order via an option, since for certain use cases, such as retrieving recent chat history, starting from the largest key is inherently the predominant access pattern. Currently I have to use a cursor to get this backward-moving behavior, but that's slower than calling getAll().
I saw that MongoDB supports such an option. IndexedDB has an open issue proposing to add a "descending" option to getAll() and friends, but nothing for building the index in descending order in the first place.
The Big Merge, Done
Finally: one single WS connection for both contact and group chat, and one single Message type unifying messages to an individual contact and messages to an entire group (using a union type inside, of course). With the foundational merge done, the other separate parts followed smoothly: there's now a single chat Dict in the app state, and a single IDB object store for all messages across all channels (ha, before this I was worried about how to dynamically trigger IDB upgrades to create new per-channel object stores at the right time, in a reliable way). Again with IDB, the rule of thumb seems to be: create as few object stores as possible, and lean on proper indexes as much as possible!
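The app itself is in Elm, but to make the shape concrete, here's the same idea sketched in TypeScript; all field and tag names are my own placeholders, not the real ones:

```typescript
// One Message type for everything; the recipient side is a union inside.
type Target =
  | { tag: "contact"; userId: string }  // message to an individual contact
  | { tag: "group"; groupId: string };  // message to an entire group

interface Message {
  id: number;      // primary key in the single "messages" object store
  channel: string; // flat channel key derived from the target, indexable in IDB
  target: Target;
  from: string;
  time: number;    // millisecond timestamp
  body: string;
}
```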
And yes, using a tag for JSON payloads is clearly the right thing to do. It essentially builds the decoding instruction right into the data itself. Without it, your only option is to try a bunch of decoders in sequence, which clearly doesn't scale well. Those few extra bytes are worth it.
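That tag turns decoding into a one-step dispatch instead of trial-and-error; continuing the hypothetical TypeScript sketch:

```typescript
// The "tag" field tells us which decoder to apply -- no need to try each
// shape in sequence (the oneOf-style fallback) and see which one sticks.
function decodeTarget(raw: unknown): Target {
  const obj = raw as { tag?: string; userId?: unknown; groupId?: unknown };
  switch (obj.tag) {
    case "contact":
      if (typeof obj.userId === "string")
        return { tag: "contact", userId: obj.userId };
      break;
    case "group":
      if (typeof obj.groupId === "string")
        return { tag: "group", groupId: obj.groupId };
      break;
  }
  throw new Error(`unknown or malformed target: ${JSON.stringify(raw)}`);
}
```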
Plans
So originally, I wanted to switch context again in March and move to the server side to totally revamp the authentication system, so that it's no longer a naive, toy implementation. But while working on client-side state persistence (saving/loading data via IDB and localStorage), not only did I not feel bored, I felt ever more engaged. So I want to change the plan a bit: move the client-side work up the schedule and start the GUI design, which will also include my first-ever attempts at 1) dual color themes (Dark & Light), and 2) a multilingual UI. There are quite a few other things to consider too: data export & import, exploring the browser's "desktop notification" API, even doing some reading on app bundling once I replace Browser.application with Browser.document to get rid of the URL-handling component, which is blocking Electron-style bundling mechanisms (if I recall correctly, "elm/url" doesn't support the "file://" protocol). Route will stay, but it will no longer be derived from parsing the URL. Let's see how far we can get.
Then the plan for April, for now, is the server-side work: auth, security, persistence, etc., all the necessary steps to move the app closer to "production" quality. I know, servers are vulnerable, and a lot of complex remedies have been invented to battle their weaknesses. But I don't think learning these things is a waste of time. Again, David Irvine started as a server guy; I believe that without all that server experience, he wouldn't have come up with the design of the Safe Network. I'm just starting here, but it's never too late to learn. Docker, clusters, etc., even using the cloud for the first time... It's a necessary step to see what's out there.
No experience, no intuition.
No intuition, no opinion.
No opinion, no new design.