Android App (1)

February 2023

Huidong Yang

So, finally, Android and Kotlin, SQLite, a fairly basic, offline-only app for recording various kinds of regular training efforts. So far the basic data modeling and accessing is on the right track, getting well oriented in UI development (Jetpack Compose), and the next step will be database backup and restore. After that, it will be the next phase - fully developing the data model and UI details to make the app truly ergonomic to use.

For now, let's take a moment to reflect on the experience so far. A few points first.

  • Kotlin feels more like a better Python than, as it is typically perceived, a better Java.

  • Jetpack Compose feels like Elm-UI, to some extent, but then the language falls short, not in features obviously, but in preventing runtime exceptions. If it compiles, that by no means implies it works.

Union Type in Relational DB

Now this is the central challenge for me in modeling the app data. My SQL/relational DB knowledge has gotten very rusty, and because I got so used to modeling polymorphism using union types, a technique highly emphasized in Elm development, the obvious question is: how do I do this with tables?

Then I realized that it has been a recurring topic in ORM and OOP design, just with a different way of describing the same problem - how to model inheritance in a relational DB? And the verdict is, there are three general approaches, but there is no single best solution; you just have to consider the pros and cons for your particular application at hand.

Take, for instance, the "single-table" approach, where you merge all the subclasses (or the variants of the union type) into one big messy table, filling it with NULL values wherever a column is inapplicable. It is to me obviously the worst way of modeling your app data, right? But this being SQL, it has significant advantages, because querying a single table is both easier to write and more performant to execute. And for applications where the total number of subclasses/variants (and of their attributes) is small, this could very well be a very economical trade-off to make.

But not for me. First of all, I want the union type to expand easily - whenever I add a new variant, I only create a new table, without having to migrate the existing ones, and I surely don't want to refrain from introducing new variants just because the one big table would get too big (and sparse). But more importantly, making design choices in data modeling / schema definition shouldn't be primarily based on the idiosyncrasies of the particular data persistence machinery employed, it's just not a fun thought. Relational is a good way to do DB, but not the only way, so it's not like obeying the rules of nature if we bend our design towards better SQL efficiency and simplicity. The point is, in app making, the data model of the app is the ruler, not the storage system.

Now there are two options left. They're similar in that the subclasses have their own tables; the difference is whether there should be a standalone table for the superclass (that is, for all the common/shared attributes), or instead each subclass has those common attributes "inlined" separately. Although the latter approach would cause those common attributes to be repeated across the subclass tables, the former approach actually occupies more "rows" in total to store the same amount of data, because each piece of data (namely, each object) will be stored in two tables (the superclass and the subclass), whereas the latter approach ensures that every object takes only one row. In addition, because the former approach basically splits an object across the super- and sub-table, it will need e.g. two INSERT ops in a transaction to add an object, and fetching it will need a JOIN.
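To make the three layouts concrete, here is a rough sketch of what a row looks like under each one, written as plain Kotlin data classes rather than real tables. The training kinds (running, lifting) and all field names are made up for illustration; they are not the app's actual schema.

```kotlin
// 1) Single table: one wide row; variant-specific columns are NULL when not applicable.
data class SingleTableRow(
    val id: Long,
    val date: String,         // common attribute
    val kind: String,         // discriminator, e.g. "run" or "lift"
    val distanceKm: Double?,  // only meaningful for runs
    val weightKg: Double?,    // only meaningful for lifts
    val reps: Int?            // only meaningful for lifts
)

// 2) One table per variant, with the common attributes repeated ("inlined") in each.
data class RunRow(val id: Long, val date: String, val distanceKm: Double)
data class LiftRow(val id: Long, val date: String, val weightKg: Double, val reps: Int)

// 3) A parent table for the common attributes, plus one table per variant
//    whose rows refer back to the parent row.
data class TrainingRow(val id: Long, val date: String)
data class RunDetailRow(val trainingId: Long, val distanceKm: Double)
data class LiftDetailRow(val trainingId: Long, val weightKg: Double, val reps: Int)

// Counts the NULL padding a single-table row carries.
fun nullCount(row: SingleTableRow): Int =
    listOf(row.distanceKm, row.weightKg, row.reps).count { it == null }
```

Storing one run takes a single padded row in layout 1, a single compact row in layout 2, and two rows (one `TrainingRow`, one `RunDetailRow`) in layout 3.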

But since all the common attributes live in a single table, any changes there will be localized without touching all the subclass tables. More importantly, with a superclass table at hand, we have a one-stop shop to keep track of all the instances of the polymorphic data that are meant to be treated as the same type of things in our app. After all, this is the point of modeling polymorphism in the first place: this single table puts the "union" in union types. And on a more technical note, I use the superclass table for primary key auto-generation upon insertion, whereas inserting into the subclass table must use that PK value. But again, this is just another demonstration that the superclass is much more than just a set of common attributes; rather, it's a comprehensive tally of the polymorphic objects that we regard as a single type.

Furthermore, this deliberate separation of super- and sub-classes across tables resonates with a design philosophy known as "composition over inheritance", which in large part boils down to the idea of loose coupling between modules. So instead of thinking in terms of superclass vs subclass, namely something more generic vs something more specific/concrete, we can think that the more specific part is just a component that is contained inside the "parent" module, but conceptually "detachable". You can even think that it is semantically just an attribute of the parent class, but this attribute is too complex to just serialize into a single value, so we store it in a dedicated table, and link it back to the parent table via a foreign key. It's tangential to the idea of normalization (as opposed to embedding/de-normalization), where we keep things the least redundant and coupled.
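The mechanics described above (the parent table hands out the key, the variant row reuses it, and reading an object back means joining the two) can be sketched with in-memory maps standing in for tables. This is only an illustration of the flow, not Room or SQLite code, and every name in it (`Store`, `Training`, `RunDetail`, `LiftDetail`) is hypothetical.

```kotlin
data class Training(val id: Long, val date: String)  // parent row: common attributes
data class RunDetail(val trainingId: Long, val distanceKm: Double)
data class LiftDetail(val trainingId: Long, val weightKg: Double, val reps: Int)

class Store {
    private var nextId = 1L                          // stands in for PK auto-generation
    val trainings = mutableMapOf<Long, Training>()   // the "superclass" table
    val runs = mutableMapOf<Long, RunDetail>()       // one table per variant,
    val lifts = mutableMapOf<Long, LiftDetail>()     // keyed by the parent's PK

    // Two inserts per object; in a real DB these would share one transaction.
    fun addRun(date: String, distanceKm: Double): Long {
        val id = nextId++
        trainings[id] = Training(id, date)
        runs[id] = RunDetail(trainingId = id, distanceKm = distanceKm)
        return id
    }

    fun addLift(date: String, weightKg: Double, reps: Int): Long {
        val id = nextId++
        trainings[id] = Training(id, date)
        lifts[id] = LiftDetail(trainingId = id, weightKg = weightKg, reps = reps)
        return id
    }

    // The equivalent of a JOIN: look up the parent row and the matching variant row.
    fun findRun(id: Long): Pair<Training, RunDetail>? {
        val parent = trainings[id] ?: return null
        val detail = runs[id] ?: return null
        return parent to detail
    }
}
```

Note that `trainings` alone lists every object regardless of variant, which is the "tally" role of the parent table.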

Even if it will take more JOINs to get the answer. But to me most importantly, this approach feels the most intuitive, natural, and clean, among the three. I have the Database Systems book (G.U.W) at hand, but I'm no longer in the phase of being pedantic for the sake of just knowing the theory (normal forms etc), because that would be superficial understanding, even if you know the what and how well. That'd give you A+, but I believe only by working on real projects, and thus thinking about real problems, can one ever see the why, and that leads to true internalization.

Speaking of the book, its discussion (4.6.4) of the three approaches is very well written, and it uses a more general description of the problem, where 1) the inheritance hierarchy is not limited to just two levels (parent and children) but is an arbitrarily sized tree, and 2) the sibling subclasses are neither comprehensive/exhaustive nor mutually exclusive, that is, if there are e.g. two subclasses defined, an object may belong to one, or both of them, or neither of them. In my case, however, I think in terms of union types, whose variants are meant to be both exhaustive and mutually exclusive, so it is a simpler mental model than a general inheritance tree!

Union Type in Kotlin

Now the next question is, does Kotlin have union types? There's enum class, but that's not it (I mention it only because Rust uses the term "enum" to denote union types). A union type isn't just about defining a set of labels/tags, but more importantly, having a different shape of data associated with each of those labels. It's those rich, independent variants that can be used to perfectly model the polymorphic sub-components that we also want to unify type-wise. (Note that you can define extra properties for the enum class, but all the labels ("enum constants") share the same list of properties, and the values are all static.)
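As a small illustration of that last remark, here is an enum class with extra properties; `Intensity` and its fields are invented for the example. Every constant carries the same two properties with fixed values, which is why it can't stand in for a union type whose variants differ in shape.

```kotlin
// Each enum constant is a single, fixed instance sharing one property list.
enum class Intensity(val label: String, val restSeconds: Int) {
    EASY("easy", 30),
    HARD("hard", 120)
}

// All constants expose the same properties; none can carry its own kind of data.
fun describe(i: Intensity): String = "${i.label}: rest ${i.restSeconds}s"
```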

Back to my case, recall that I decided to split the superclass (common attribute set) from the subclass variants, by creating a table for each variant containing its own distinct set of attributes. With Room, you simply declare a data class annotated with @Entity. Therefore, the natural question is, can data classes extend some other parent class?

It turns out, there was an interesting evolution story there. Initially, in 2014 (before Kotlin 1.0 was even released), the answer was essentially no, according to Andrey Breslav:

The truth is: data classes do not play too well with inheritance. We are considering prohibiting or severely restricting inheritance of data classes. For example, it's known that there's no way to implement equals() correctly in a hierarchy on non-abstract classes.

So, all I can offer: don't use inheritance with data classes.

Fast-forwarding to 2017, when Kotlin 1.1 was released, this restriction was removed (emphasis mine):

Kotlin 1.1 removes some of the restrictions on sealed and data classes that were present in Kotlin 1.0. Now you can define subclasses of a top-level sealed class on the top level in the same file, and not just as nested classes of the sealed class. Data classes can now extend other classes.

Then it referred us to the sealed class documentation (emphasis mine):

Sealed classes and interfaces represent restricted class hierarchies that provide more control over inheritance. All direct subclasses of a sealed class are known at compile time. No other subclasses may appear outside a module within which the sealed class is defined.

The same works for sealed interfaces and their implementations: once a module with a sealed interface is compiled, no new implementations can appear.

In some sense, sealed classes are similar to enum classes: the set of values for an enum type is also restricted, but each enum constant exists only as a single instance, whereas a subclass of a sealed class can have multiple instances, each with its own state.

Jack-fucking-pot. This is telling us loud and clear, that sealed classes/interfaces are pretty much all we ask for as union types, but instead of defining it together with all the variants in one place at once, you can define each variant separately as a subclass or interface implementation. This syntax is fine, and actually suits the Android/Java convention of placing each Entity data class in its own file.

What we like the most about union types is that they allow us to do exhaustive pattern matching (case of in Elm, match in Rust, when in Kotlin, etc.) where we don't have to worry about the default/else case, because it's guaranteed at compile time that the variants are exhaustive.

Moreover, I find sealed interface a better choice for emulating a union type, at least semantically, because we're not using it to convey shared state, but rather, shared behavior. Recall that by construction, the variants of the union type do not have anything in common (otherwise the common part would have been extracted into the "parent" type where all the common attributes are gathered). So the union type itself shouldn't store any state. Instead, it often should declare a set of common behaviors, as abstract properties as well as methods. For example, recall that the ID (PK) generation of the variants is delegated to the parent entity, so it's expected that every variant should have a property that references the parent ID. Now, that is actually a bonus feature that isn't available in a regular union type! There's no way for a union type to enforce that all the variants' associated data must contain a certain field of a certain type. After all, a union type isn't an interface/trait.
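A minimal sketch of the idea, with made-up variants (`RunDetail`, `LiftDetail`) rather than the app's real entities: the sealed interface holds no state of its own, but it does require every variant to expose the parent's ID, and a `when` over it needs no `else` branch. In the actual app each variant would additionally be a Room `@Entity`; that is left out here so the snippet stays plain Kotlin.

```kotlin
// The "union type": no state of its own, only a contract every variant must satisfy.
sealed interface TrainingDetail {
    val trainingId: Long  // references the parent row that generated the ID
}

// Each variant has its own shape of data.
data class RunDetail(
    override val trainingId: Long,
    val distanceKm: Double
) : TrainingDetail

data class LiftDetail(
    override val trainingId: Long,
    val weightKg: Double,
    val reps: Int
) : TrainingDetail

// Exhaustive `when` used as an expression: there is no `else` branch, so adding
// a third variant makes this function fail to compile until the new case is handled.
fun summarize(detail: TrainingDetail): String = when (detail) {
    is RunDetail -> "ran ${detail.distanceKm} km"
    is LiftDetail -> "lifted ${detail.weightKg} kg x ${detail.reps}"
}
```

Because `trainingId` is declared on the interface, code that only needs the link back to the parent can work on a `List<TrainingDetail>` without caring which variant each element is.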


Now, I did say that enum class wasn't what we wanted to emulate a union type, but it is actually needed elsewhere in the data model. Here's why. Up till now, we have the variants (sub-components each with a distinct set of attributes) stored in their own tables. But when we run the app, it's often necessary to combine all the related pieces together to construct a whole view, right? That's of course where we make those JOIN queries, but the question is, how do we know which of the variant tables to join? We need some sort of labels that correspond to the tables of the variants. They don't have to be literally the table names in the DB, but I find it adequate to simply have a label enum class whose variants ("enum constants") directly correspond to all the variant data classes that extend our sealed interface.

Because of the simplicity of an enum class, it can be easily passed around as a navigation argument in the routing code, and both the vodel and the Compose-based view can use this label to determine which variant table to join, and which UI component to display. It's easier than I initially imagined.
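Roughly, the label enum and its use as a navigation argument could look like this. `TrainingKind`, the table names, and the helper functions are assumptions for the sake of the example, not the app's actual code.

```kotlin
// One label per variant table / variant data class.
enum class TrainingKind { RUN, LIFT }

// Routes carry their arguments as text, so the label round-trips via its name.
fun toNavArg(kind: TrainingKind): String = kind.name
fun fromNavArg(arg: String): TrainingKind = TrainingKind.valueOf(arg)

// With Room the queries themselves are fixed at compile time, so in practice this
// `when` would choose between DAO methods; here each branch just names the table to join.
fun variantTableFor(kind: TrainingKind): String = when (kind) {
    TrainingKind.RUN -> "RunDetail"
    TrainingKind.LIFT -> "LiftDetail"
}

// The view can branch on the same label to pick a UI component, again without `else`.
fun screenTitleFor(kind: TrainingKind): String = when (kind) {
    TrainingKind.RUN -> "Run"
    TrainingKind.LIFT -> "Lift"
}
```

One thing to keep in mind with this sketch: `valueOf` throws an `IllegalArgumentException` on an unknown name, so a malformed argument still fails at runtime rather than at compile time.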

Speaking of routing (again, Compose-based), it's OK, but not as nice as with Elm, for it doesn't make use of union types. Surprise. Now we know it's perfectly doable to define a sealed interface with a fixed number of implementations to represent all possible routes and their associated parameters. But the standard way in which NavHost constructs the navigation graph is by using NavGraphBuilder, and we know how the builder pattern is: it's kind of "ergonomic", but it never guards against missing cases. By that I mean you don't get any compile errors when you do forget one; you will instead get a crash and start wondering what the...
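For what it's worth, the sealed-interface version of routing could be sketched like this. It's plain Kotlin, independent of NavHost, and the route names and path patterns are invented; the point is only that a `when` over `Route` won't compile if a case is forgotten.

```kotlin
// All possible routes, each with its own parameters.
sealed interface Route {
    object Home : Route
    data class TrainingList(val kind: String) : Route
    data class TrainingEdit(val kind: String, val trainingId: Long) : Route
}

// Turning a route into a path string is exhaustive: no `else`, no forgotten destination.
fun pathOf(route: Route): String = when (route) {
    is Route.Home -> "home"
    is Route.TrainingList -> "list/${route.kind}"
    is Route.TrainingEdit -> "edit/${route.kind}/${route.trainingId}"
}
```

This only covers the direction from typed route to string. Registering the destinations still goes through the builder, so the compile-time guarantee ends where the navigation graph begins.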

And there's a similar situation with the new way of making your own ViewModelProvider.Factory -

Before:

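import androidx.lifecycle.ViewModel
import androidx.lifecycle.ViewModelProvider

// One hand-written factory per vodel: it knows how to build exactly one ViewModel class.
class TrainingVodelFactory(private val trainingDao: TrainingDao) : ViewModelProvider.Factory {
    override fun <T : ViewModel> create(modelClass: Class<T>): T {
        if (modelClass.isAssignableFrom(TrainingVodel::class.java)) {
            @Suppress("UNCHECKED_CAST")
            return TrainingVodel(trainingDao) as T
        }
        // Any other class is rejected at runtime, not at compile time.
        throw IllegalArgumentException("Unknown ViewModel class")
    }
}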

After:

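import androidx.lifecycle.viewmodel.initializer
import androidx.lifecycle.viewmodel.viewModelFactory

object VodelProvider {
    val Factory = viewModelFactory {
        // One initializer block per vodel class; a missing one only fails at runtime.
        initializer {
            TrainingVodel(getTrainingDao())
        }
        // ...
    }
}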

The new way is clearly nicer, with almost no ugly boilerplate, right? But this comes at a cost, for the same reason: InitializerViewModelFactoryBuilder, however sexy the name sounds, doesn't help you make sure that you have added all the vodels in use! Sure, these obvious mistakes are easy to catch, right? But not if you have made substantial changes to the codebase before you try to run the app. In my experience, the app just silently exited, without showing any error message (like "AndroidRuntime FATAL EXCEPTION"), and I was then clueless about which part was to blame.

Android, as a big framework, often takes a substantial amount of code just to wire things up for a demo, and it's a huge shame that the compiler doesn't kick in often enough to pinpoint the silly mechanical mistakes. It felt like going back to the days of debugging segfaults. Yes, I literally tried to pin the bug down by commenting out one chunk of code at a time! Is this the cool "Android in Kotlin with Jetpack Compose", or some die-hard C++ madness? I hate having to be careful again and keep track of all the mundane things in my head. I guess in a way, I've been spoiled by the Elm and Rust compilers (and by many many other nice things in life). But that mentality is not really actionable. Now what's actionable? For one, I do think that as Android newbies, we're currently guided towards writing shorter, nicer-looking code, even at the cost of it becoming harder to debug.

{ Turns out, it was my phone (running Android Pie, API 28), or perhaps how Android Studio failed to work with my phone; when I later resorted to the slow, resource-hungry Android Emulator (running API 33), it reliably showed the stack trace leading to every crash. So we're not going back to debugging segfaults - this is JVM after all. Phew. But hey, what the fuck is wrong with my Mi CC9 Pro? }

So I guess less builder pattern, more pattern matching (on union types / sealed interfaces), is a good start?

Room

Room (v2.5) w/ KSP is pretty good at giving compile-time error messages, and as an ORM, it actually doesn't try to do away with SQL; in fact it wants you to write SQL queries! A breath of fresh air. It assists mainly in the aspect of data modeling (schema) and access API design (DAO), and I think that's because 1) there's much boilerplate that can be codegen'ed away there, and 2) more importantly, you thereby define your data in a single place, in Kotlin, so no need to manually sync up the app source code with the SQL schema. Kotlin is undoubtedly more expressive and ergonomic for model definition, whereas SQL is, well, better at querying the relational DB.

Where Room falls short is in handling relationships, namely when querying multiple tables. The @Relation annotation is not bad, but it is a toy/gimmick, and they officially admit that: when you're dealing with "nested relationships" (not an accurate term), they advise against using it for performance reasons. I found that given a decently designed schema, a single query with JOIN can achieve the same result with cleaner code, whereas by defining those "intermediate data classes" using the Relation annotation, you end up paying the price of a @Transaction of multiple queries, for apparently Room isn't smart enough to recognize that a single JOIN-based query gets the job done?
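A sketch of the single-JOIN alternative, assuming Room 2.5 with its Kotlin coroutines support on the classpath. The entities, column names, and DAO below are hypothetical. The result class uses @Embedded for both halves, which only works here because the two entities share no column names.

```kotlin
import androidx.room.Dao
import androidx.room.Embedded
import androidx.room.Entity
import androidx.room.ForeignKey
import androidx.room.PrimaryKey
import androidx.room.Query
import kotlinx.coroutines.flow.Flow

@Entity
data class Training(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val date: String
)

@Entity(
    foreignKeys = [
        ForeignKey(
            entity = Training::class,
            parentColumns = ["id"],
            childColumns = ["trainingId"],
            onDelete = ForeignKey.CASCADE
        )
    ]
)
data class RunDetail(
    @PrimaryKey val trainingId: Long,  // reuses the parent's generated key
    val distanceKm: Double
)

// Plain result holder for the JOIN; not an @Entity, and no @Relation involved.
data class TrainingWithRun(
    @Embedded val training: Training,
    @Embedded val run: RunDetail
)

@Dao
interface TrainingDao {
    // One query with one JOIN, so no @Transaction wrapping several queries.
    @Query(
        """
        SELECT * FROM Training
        JOIN RunDetail ON RunDetail.trainingId = Training.id
        WHERE Training.id = :id
        """
    )
    fun runById(id: Long): Flow<TrainingWithRun?>
}
```

This needs an Android project with Room's annotation processor to compile, so treat it as a shape to adapt rather than something to paste in as-is.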

But I guess that's why Room now recommends using the newer multimap feature to receive query results. It's especially fit for querying one-to-many relationships, again, without involving a multi-query transaction. However, it has drawbacks.

  • When you fetch the multimap in the vodel (my own replacement term for ViewModel) and use it in your Compose-based view code, there will always be a brief moment initially where the map is empty, so an isEmpty() check is necessary before accessing the content of the map. In contrast, if you use an intermediate data class to receive the data, there will be no such issues. { Well, it's not Map's fault, it's just how State, or StateFlow in particular, works in Compose, or reactive UI in general - you have to specify an initial value yourself, e.g. via Flow.stateIn(), even if you use an intermediate data class instead (but because it's already an object instead of a collection, there's no need to check for empty or use firstOrNull()); Yet, this "complication" makes perfect sense: data fetching/loading takes time, always. }

  • No nested multimap, i.e., the value type cannot be yet another multimap. So in the case of a "nested relationship", you do need to resort to defining an intermediate data class. However, in my case, because the nested relationship is one-to-one, I get to avoid the Relation annotation and instead simply embed the two entities in the data class using @Embedded, with one caveat: there can be no column name clash between the two, e.g. they can't both use "id" as their PK attribute.
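Putting the first bullet into code: a sketch of a multimap query consumed through stateIn() with an explicit initial value. It assumes Room 2.4 or later (for multimap return types), the lifecycle ViewModel KTX artifact, and kotlinx.coroutines; the entities, the DAO, and the `TrainingLogVodel` class are hypothetical.

```kotlin
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import androidx.room.Dao
import androidx.room.Entity
import androidx.room.PrimaryKey
import androidx.room.Query
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.SharingStarted
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.stateIn

// Column names are kept distinct across the two entities to avoid clashes in the JOIN.
@Entity
data class Training(
    @PrimaryKey(autoGenerate = true) val trainingId: Long = 0,
    val date: String
)

@Entity
data class SetEntry(
    @PrimaryKey(autoGenerate = true) val setId: Long = 0,
    val ownerId: Long,  // the Training this set belongs to
    val reps: Int
)

@Dao
interface TrainingLogDao {
    // One-to-many result received as a multimap, from a single JOIN query.
    @Query("SELECT * FROM Training JOIN SetEntry ON SetEntry.ownerId = Training.trainingId")
    fun setsByTraining(): Flow<Map<Training, List<SetEntry>>>
}

class TrainingLogVodel(dao: TrainingLogDao) : ViewModel() {
    // stateIn() requires an initial value, so the UI briefly sees an empty map
    // until the first query result arrives.
    val setsByTraining: StateFlow<Map<Training, List<SetEntry>>> =
        dao.setsByTraining().stateIn(
            scope = viewModelScope,
            started = SharingStarted.WhileSubscribed(5_000),
            initialValue = emptyMap()
        )
}
```

The empty initial map is what makes the isEmpty() check necessary on the view side. A dedicated "loading" value would be another way to represent that first moment, at the cost of a wrapper type.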