
September 2020 Slacking

Huidong Yang

I thought I would jump right into coding activities this month, but no, I wasn't ready, and other things weren't ready either. I ended up mostly chilling out, which gave me the freedom to read and think about the next steps starting in October. In the end I am actually glad that I took the time to look around a wide variety of places I hadn't been paying attention to for quite a while.

Summary

  • Scripted a tiny Rust app to keep one of my hard drives from going to sleep every three minutes, with Ctrl+C handling.
  • Pluto.jl lured me to play with Julia.
  • Arrow dev will be back in October.
  • Finally time to learn Safe Network's codebase in October.

I think a big change starting next month is to embrace multitasking. I was spoiled in the past two years because I had complete freedom to work on a single project at a time. But as projects take shape one after another, eventually the earlier ones call for your attention again, so working on multiple projects simultaneously is inevitable, and a necessary skill as well. Back in school, multitasking across courses, research, and a teaching assistantship was stressful; the big difference now is that everything is on my own initiative. One of the most valuable outcomes of my education is perhaps being able to come up with my own projects, things that I really want to do, and I believe they will eventually be of value. So, take them all with a smile, really.

Rust Scripting

So this drive enclosure that I recently got sleeps after 3 minutes of inactivity, and the firmware can't be configured. OK, just do a little something at that frequency (or slightly higher), that should do. It's a typical scripting job, right? So, Python? I wanted to do something different though, because 1) I wanted to see how ergonomic current Rust is for simple things like this, just to prove a case; 2) this looping activity will be long-running, and ideally it should take little system resource and be independent from other processes, which is impossible with an interpreted script; 3) largely irrelevant but just for the record, Armin Ronacher's talks kind of ruined Python a little bit for me (in a good way, I suppose). This code doesn't need to change frequently (not that it can't be refined; in fact quite a few CLI features could be added for configurability and convenience, but that's not the point of this tiny project, at least not now), so compilation time isn't an issue, and I much prefer a super-light, self-contained executable. So, "scripting" in Rust then?

The first version was quick and dirty, but I did learn a trivial fact while figuring out the "little something" that keeps the drive awake: a root directory listing doesn't wake the drive up, probably due to FS caching. I am clueless about hard drives, so I could only go for something dead simple: it turns out that creating an empty file in the root directory is enough to reliably wake the drive up. So basically, the logic is as simple as creating the same empty file in an infinite loop, which can be broken by Ctrl+C.

Speaking of Ctrl+C, I wanted this simpleton app, in the next iteration, to have the minimal decency of cleaning up after itself, because without a custom SIGINT handler, the empty dummy file is left behind (the presence of this dummy file should serve as an indicator that the app is running; that's the goal). But more importantly, I wanted to implement more sophisticated Ctrl+C handling for my backup app (e.g. saving progress), so I thought this would be a good place to try the "hello world" usage of the ctrlc library.

use std::fs::{remove_file, File};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::sleep;
use std::time::Duration;

// Placeholder values for illustration; the real constants point at the drive's root
// and sit comfortably under its 3-minute timeout.
const FILE: &str = "/path/to/drive-root/keep-awake";
const TIME: f32 = 150.0; // seconds between touches
const TICK: f32 = 1.0; // polling granularity, so Ctrl+C is handled promptly

fn main() {
    // Shared flag flipped by the Ctrl+C handler thread.
    let running = Arc::new(AtomicBool::new(true));
    let r = running.clone();
    ctrlc::set_handler(move || r.store(false, Ordering::SeqCst)).unwrap();

    File::create(FILE).unwrap();

    let mut time = 0.;
    while running.load(Ordering::SeqCst) {
        if time >= TIME {
            File::create(FILE).unwrap();
            time = 0.;
        } else {
            sleep(Duration::from_secs_f32(TICK));
            time += TICK;
        }
    }

    // Clean up the dummy file on exit.
    remove_file(FILE).ok();
    println!("Bye");
}

Yup, that's the hello world. I wish the AtomicBool API were less technical, but this is Rust, what do you expect? It can't be fool-proof and a low-level powerhouse at the same time! Anyway, just as an exercise, I then wanted to try doing the wrong thing and see the Rust compiler save my ass in action, right? So what would the compiler say if I used a plain bool? Since the closure moves it, shouldn't we get something like "use after move" at the later reads?

// misusing `bool` across threads (same imports and constants as above)
fn main() {
    let mut running = true;
    ctrlc::set_handler(move || running = false).unwrap();

    File::create(FILE).unwrap();

    let mut time = 0.;
    while running {
        if time >= TIME {
            File::create(FILE).unwrap();
            time = 0.;
        } else {
            sleep(Duration::from_secs_f32(TICK));
            time += TICK;
        }
    }

    remove_file(FILE).ok();
    println!("Bye");
}

I got nothing. The compiler let the code pass (Clippy did complain that the `while running` loop might never end, which in retrospect is a good hint, but Clippy is not part of the compiler). When I ran it, Ctrl+C did nothing; the app kept running! That surprised me at the moment. My first guess was that something went wrong inside the handler, because the Boolean apparently never got flipped. Then I read the docs:

Any panic in the handler will not be caught and will cause the signal handler thread to stop.

That note is good to know, but it turns out it isn't even what happened here; nothing panicked. The real explanation is that `bool` is `Copy`: the `move` closure captures its own copy of `running`, the handler happily flips that copy, and the main loop keeps reading the original, which never changes. Nothing is actually shared across threads, so there is nothing unsafe for the compiler to reject, and no "use after move" either, because `Copy` values stay usable after a move. I was expecting the compiler to catch me misusing a plain bool across threads, but there was no misuse to catch, only a logic bug. Maybe Clippy's hinting will get even better one day, and hopefully more of it will be integrated into the compiler? After all, we the spoiled always expect to learn more about Rust just by compiling.
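Here is a minimal toy illustration of that capture behavior, separate from the keep-awake app (names made up, obviously):

fn main() {
    let mut flag = true;
    // `bool` is `Copy`, so this `move` closure captures its own copy of `flag`.
    let mut flip = move || {
        flag = false;
        println!("inside the closure: {flag}"); // false
    };
    flip();
    println!("back in main: {flag}"); // still true: the original was never touched
}

Swap the bool for the Arc<AtomicBool> above and the same shape becomes actual cross-thread communication, which is exactly why the ctrlc hello world looks the way it does.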

Nevertheless, this is more about debugging my own understanding than complaining about Rust. After all, what I observed was exactly how Rust code is supposed to behave: if something isn't thread-safe, like mutating a plain bool shared across threads, it simply won't happen.

Julia Reintroduced

This month I started itching for new languages, though with no real plan to dive into any, because there's no project, and I believe project-driven learning is the way to go forward. But the Pluto.jl talk at JuliaCon 2020 really did the trick, and I started looking at the new MIT course that chose this new tool over Jupyter. Will Julia become my third language? I'm not sure, but I still believe that the skill set people refer to as "data science" is going to be useful, even if I don't have a natural project for it yet. Finding great applications of great ideas has never been a trivial matter in history, and for indies with minimal resources, it's hard not to feel irrelevant as the big fish drive the advancement of AI with their big computing infrastructure.

But that's actually not the main de-motivator deep down. Rather, it's my old instinctual feeling that the secret sauce of excellence in AI today is in essence "brute-force" computational power (which includes efficiency, so I guess "smart-force" is a better term). In a similar way, a cheap calculator has long outperformed us in arithmetic, and do I feel bad about it? No, because I was expecting the ultimate excellence in AI to come from a true understanding of "mind", at least the human mind in particular, instead of some brilliant engineering feat that, with the help of extremely beefy hardware, improves and optimizes the what out of a certain set of algorithms whose principles can be explained quite plainly. Basically, I was expecting "magic" rather than "force", no matter how smart it is.

But I should have known better: there is no magic, and every time some "magic" gets demystified, we typically end up seeing some kind of force that is not completely alien to us, but wielded in some surprising way. So why is the possibility that we are approaching ultimate AI with "plain" (well-defined), utterly physical (via computers) mechanisms so unattractive to me? It's ironic, because I knew neurons are physical, and I was the opposite of indifferent about learning their details. Well, introspection is hard, but perhaps there is a component of fear: if intelligence really boils down to the "plain" power of computation, then we are no longer... magical. (I don't have the right words for this yet; I'm no dualist, I guess I just expect something new and breathtaking to power intelligence. But there is still hope; after all, the state of the art such as AlphaZero is nowhere near the general intelligence of humans, and a lot is missing.)

Believe it or not, the above is no digression; without looking into Julia, I wouldn't be thinking about this topic now. I stumbled upon AlphaZero.jl, and its introduction mentioned Richard Sutton's essay "The Bitter Lesson". It's pure gold; that was my reaction reading it. It's so special because it resonated so strongly with me and confirmed my instinct: in AI there is only computation. (Which is no surprise at all if one appreciates the generality of computation.) That is not Sutton's big point though; he was more specific about what kinds of computation could really take us forward, and that indeed aligns closely with AlphaZero's "no human knowledge" principle. Incidentally, this goes back to my calling it "brute-force" again, as brute also means beast, or non-human. I can still come up with many examples where AI is nowhere near us today, but at the same time, it has passed us in many areas where goals and performance can be clearly defined and measured.

I believe the situation is not really humans vs AI, though. AI is part of us. I could draw a (perhaps uncomfortable) analogy from "The Selfish Gene", namely, to the relationship between us and "our" genes. The fact is, the genes came way before us and essentially meta-programmed us. We ended up doing so much more than helping them spread, and we can actually choose not to help them. (You probably see why this can be a disturbing analogy: will AI have that kind of liberty?) But if you look at the global picture, genes have been doing great, having all the bio-machines carry out the primary imperative like willing slaves. (The atom bomb situation was a scare, though.) Do you think we will end up having the same level of control over AI as the genes have over us? If that turned out to be the case, then indeed we would be no more sophisticated than a bunch of molecules. The hope is that, with the help of our brains, we can strive to be better meta-programmers.

Back to Julia itself: there's no actual plan, because I don't have a project for it, and I haven't done any other form of learning in the past two years. Going through some mathy exercises brought back old memories, but I now have to ask, what's the point? Just for brain workouts? Maybe that's a good enough reason, but I have stronger motivators now. Moreover, I'm disillusioned about the field; it felt... irrelevant or pointless.

  1. I guess I'm spoiled from working on my own projects, because designing and implementing tools is where you get to do something unique, at least some elements of which are unlikely to exist elsewhere. It feels special as well as intuitive, and you don't have to doubt whether it's useful, because you are dogfooding it. In non-project-based learning, by contrast, despite the serious effort invested, you can't help constantly suspecting that the exercises are mere toys, or not quite within the state of the art. (In a sense, toy examples in engineering are of much less value than those in math or physics; one can even argue that there is no "toy math" or "toy physics", but there is a real notion of a "toy OS".)

  2. Even when one is looking at the state of the art, it tends to feel less enlightening than the great scientific breakthroughs (and yet we all say that intelligence/mind is one of our last puzzles to crack), because it's more a matter of brilliant engineering feats, of finding clever tricks around old approaches, rather than discovering a completely novel view, some insight that finally explains the nature of the mind once and for all, the way the theory of relativity forever changed our understanding of the nature of the universe.

But Sutton's essay much brightened this gloomy outlook for me. The key point is, don't let our bias get the better of us: maybe there is no magic, no "deeper" governing law of the mind, who knows? Similarly, "the miracle of life" might also involve no never-before-seen physical mechanisms. Just stay honest and keep exploring. Learning and search together have worked great so far? Then invest in them and experiment further; that's actually pretty standard for any scientific endeavor.

So this kept my psychology grounded. Maybe when I get extra time and feel up to it, I will play with Julia more and more, and learn about the advancement of AI. Hey, maybe I can find some components in my projects where Julia can be a nice fit for data-centric functionality, if standalone projects don't come naturally?

I see the foremost value of Julia as ergonomics for prototyping in data-driven R&D, and that largely comes down to syntactic sugar and interactivity, neither of which is superficial, because languages ultimately serve people and enable ideas. In principle, anything you can do in Julia you can also do in Rust, so why bother? When I heard about Julia's touted multiple dispatch, I thought one could get much of that with Rust's trait system. Similarly, there are quite a few Rust emulations of the Elm Architecture. But I believe these emulations will never become as nice, because Rust can't be as highly specialized for all these diverse domains simultaneously. In short, human productivity should not be underrated, and Julia is all for that.
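To make the trait comparison concrete, here is a rough sketch (hypothetical types, not from any real codebase) of approximating "dispatch on both argument types" with a generic trait in Rust:

// A hypothetical sketch of emulating multiple dispatch with a generic trait:
// the method chosen depends on both the receiver type and the parameter type.
struct Circle;
struct Square;

trait CollidesWith<Other> {
    fn collide(&self, other: &Other) -> &'static str;
}

impl CollidesWith<Square> for Circle {
    fn collide(&self, _other: &Square) -> &'static str {
        "circle vs square"
    }
}

impl CollidesWith<Circle> for Square {
    fn collide(&self, _other: &Circle) -> &'static str {
        "square vs circle"
    }
}

fn main() {
    println!("{}", Circle.collide(&Square)); // circle vs square
    println!("{}", Square.collide(&Circle)); // square vs circle
}

It works, but each pairing is a separate impl chosen at compile time, whereas Julia picks the method from the runtime types of all arguments; that gap is part of why I doubt the emulations will ever feel as nice.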

As a side note, "Superintelligence" is a fun read, very thought-provoking, and the author is not afraid of controversy. Lots of speculation indeed, so I think the right mentality is that of reading science fiction (or should I say engineering fiction). By that I don't mean the book is full of shit; as we all know, sci-fi has made good predictions. Rather, with a certain level of care, it brings us speculative possibilities for the future, however far off that is, and that is exactly the point of good sci-fi; it's fair to say the book is even more careful in its reasoning. The reading can be disturbing though, because it's hard to rule those speculative doom scenarios out, given that the "future" can be arbitrarily far away, or near.


Part of this month was about planning for the next steps.

Coding Plans

I've been dogfooding Arrow since its MVP at the very beginning of 2019, and January marked the end of the initial phase of development. A sort of phase 1.5 took place in October 2019, when a UI responsiveness issue during typing in large tasks (ones with lots of sessions) was fixed using Html.Lazy. At that time, in-browser profiling already flagged List.filter as a hotspot, but I got around it via Lazy and thought that was the end of it.

But as large tasks kept growing even larger, the sluggishness during typing resurfaced. Looking back, I was surprised that I hadn't given much thought to list filtering, as I considered it the bread and butter of functional programming, or any programming really. Plus, since Arrow was my first webapp, I followed exactly how "elm-todomvc" dealt with data persistence: dump everything as one list on every save, and load it all back on app start. Now I interpret that design decision as a way to deemphasize the details of the "backend" logic, which is reasonable for a front-end demo app whose model is a single list of records; but back then I naively thought this toy approach could nicely scale to more complex models, by which I mean e.g. some hierarchical structure involving several lists of records.

Given that the relationship is represented by referencing IDs rather than embedding whole records within a parent record, a single "get children" operation involves N filtering calls, each going through M records, where N is the number of children of the parent and M is the total number of children across all parents. (Equivalently, one can do a single filtering pass over all M records, where each Boolean decision involves a membership check over N IDs.) Either way, the running time is O(MN), which is at least on the order of N^2 since M ≥ N. Ouch. And I feel the pain for real; dogfooding, remember?

I didn't think of any of that for a long time; again, because who doesn't do filtering? Yet it's so obvious that Dict is the right tool here! Applying the "lazy" trick in the view only hid the glaring negligence in plain sight.
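To make the contrast concrete, here is a small hypothetical sketch of the two access patterns (in Rust, since Arrow itself is Elm and this is not its actual code; the record shapes are made up): filtering a flat list on every "get children" call versus building an index keyed by parent ID once and looking children up directly.

use std::collections::HashMap;

// Made-up record shape, just to show the access pattern.
struct Child {
    id: u32,
    parent_id: u32,
}

// The "who doesn't do filtering" version: every call scans all M children.
fn children_by_filter(all: &[Child], parent_id: u32) -> Vec<&Child> {
    all.iter().filter(|c| c.parent_id == parent_id).collect()
}

// The Dict/HashMap version: build the index once; each lookup is then O(1)
// plus the size of the answer.
fn index_children(all: &[Child]) -> HashMap<u32, Vec<&Child>> {
    let mut index: HashMap<u32, Vec<&Child>> = HashMap::new();
    for child in all {
        index.entry(child.parent_id).or_default().push(child);
    }
    index
}

fn main() {
    let all = vec![
        Child { id: 1, parent_id: 10 },
        Child { id: 2, parent_id: 10 },
        Child { id: 3, parent_id: 20 },
    ];

    let by_filter: Vec<u32> = children_by_filter(&all, 10).iter().map(|c| c.id).collect();
    println!("filter: {:?}", by_filter); // [1, 2]

    let index = index_children(&all);
    let by_index: Vec<u32> = index.get(&10).into_iter().flatten().map(|c| c.id).collect();
    println!("index:  {:?}", by_index); // [1, 2]
}

The Elm counterpart would simply be a Dict Int (List Child) built once from the flat list.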

In a way, I'm glad that I paused the development and moved on to other projects, while dogfooding it every day, and only throughout all this time did those real-life issues reveal themselves and gradually ooze into my intuition. Knowing something "in principle" is very far away from having it internalized as your second nature. It takes time and first-hand experiences to get there. So I guess this round-robin approach is not bad for my learning projects, especially when there are a few quite different things to learn.

So OK, now is finally the time to get back to Arrow and start the second phase of development. Besides replacing list filtering with dict lookup, I will also need to revamp the data persistence mechanism to

  1. store objects individually as opposed to lumping the whole list at a single key, and
  2. take advantage of the proper IDB API (I was using "idb-keyval", an API that simply emulates localStorage).

In particular, I am interested in learning about data migration in this context. Of course this is no relational DBMS, but the general concept of establishing a reproducible and safe path to the new "schema" is a virtue worth practicing. The upgrade should be as seamless as I can make it, requiring minimal manual work from users, and certainly not causing any data loss or other annoying surprises.
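I haven't designed anything yet, but the general shape I have in mind is the usual one: tag the stored data with a schema version and apply one migration step per version until it's current. A rough, hypothetical sketch (in Rust purely for illustration; the names and record shapes are made up, and Arrow's real storage lives in the browser):

// Hypothetical versioned-migration sketch; everything here is made up.
#[derive(Debug)]
enum Stored {
    V1 { tasks: Vec<String> },        // everything lumped under one key
    V2 { tasks: Vec<(u32, String)> }, // individually keyed records
}

// Each step upgrades exactly one version; applying the steps in order is the
// reproducible path from any old schema to the current one.
fn migrate_v1_to_v2(tasks: Vec<String>) -> Stored {
    let tasks = tasks
        .into_iter()
        .enumerate()
        .map(|(i, t)| (i as u32, t))
        .collect();
    Stored::V2 { tasks }
}

fn migrate_to_latest(mut data: Stored) -> Stored {
    loop {
        data = match data {
            Stored::V1 { tasks } => migrate_v1_to_v2(tasks),
            latest @ Stored::V2 { .. } => return latest,
        };
    }
}

fn main() {
    let old = Stored::V1 {
        tasks: vec!["write digest".into(), "read primer".into()],
    };
    println!("{:?}", migrate_to_latest(old));
}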

Another sort of dilemma I want to investigate is the choice between incremental, "need-based" data retrieval and the simple "load it all on app start" approach; the latter is very handy for full-text search, given that IDB offers no full-text search functionality. If you will end up having to fetch everything before a search anyway, why bother with incremental data loading, which would bring caching-related complexity? Need to think about this.

Safe Network

The Safe Network Primer by community member "JPL" was a fun read, and a mini-series of lectures presented by Erick Lavoie gave a nice intro to XOR-based routing. Thanks to how well written and well spoken these resources are, things seemed comfortingly demystified; well, there is indeed no complicated stuff at the high level. But I never doubted that the actual source code is no light reading. In a sense, there is no mystery in how a rocket can return and land itself by reverse propulsion and control; we laymen sort of "understand" it, physics allows it, but it is a whole other world to actually do it, or to contribute to the making of it.

And I guess the difference is whether you have dived into the details. There are lots of design and implementation details in engineering: rockets, networking, AI. High-level understanding, however philosophically satisfying at first, doesn't take one anywhere beyond the bystanders.
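Case in point: the one idea those lectures did demystify for me is genuinely tiny. Nodes and data live in the same address space, the "distance" between two addresses is just their XOR, and you keep forwarding toward whoever is closest. A toy sketch of that metric (my own illustration of the Kademlia-style idea, nothing to do with the actual sn_* crates):

// Toy illustration of XOR distance, the metric behind Kademlia-style routing.
// Real networks use 256-bit IDs; u64 keeps the idea visible.
fn xor_distance(a: u64, b: u64) -> u64 {
    a ^ b
}

// Pick the known peer closest to the target address; forwarding to it over and
// over is, roughly, what "XOR-based routing" means.
fn closest_peer(peers: &[u64], target: u64) -> Option<u64> {
    peers.iter().copied().min_by_key(|&p| xor_distance(p, target))
}

fn main() {
    let peers = [0b1010, 0b0111, 0b1100];
    println!("{:?}", closest_peer(&peers, 0b1000)); // Some(10), i.e. 0b1010
}

Everything hard, of course, is in the details this leaves out, which is exactly the point of the previous paragraph.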

Recently, the core team has been finishing up a major refactoring effort, and I think the time couldn't be better to start reading the source code. There are lots of components, and right now I guess it's most appropriate to start from the high-level repo, i.e. sn_api, which includes a CLI app; from there one should be able to learn what the network can do from the user's perspective. Then, if a particular technical detail seems interesting, just dive into the lower-level code, with a purpose and motivation.

By the way, this is actually how my idealized, more intuitive and tantalizing form of biology education would be structured: students start at the most "relevant" level of analysis, learning how their own body and those of the living things around them are constructed to make them functional, using a higher-level, less detailed (but still correct) understanding (e.g. one doesn't have to know that muscle fibers are ultimately special proteins before gaining an intuitive understanding of how they mechanically move to produce force). This contrasts with the predominant systematic, "scale-ordered" approach that starts from molecules and works up through cells and tissues. Of course, molecular and cellular biology are critical to a thorough understanding of biological workings, but I doubt that introducing the entire discipline through the microscopic and intangible is the optimal learning path for humans, because interest and motivation have a significant impact on learning effectiveness, and a simple but powerful motivator is showing that something is very relevant to the learner. For clueless beginners, why should they care about protein structures? That's why I don't think starting with e.g. the qp2p repo would be as effective for ultimately understanding the network well enough to contribute to it.

I know it sounds daunting, but this is a good project, for everyone, so no matter how long it takes or what it takes, it will be worth building up my understanding, one step at a time. It will be fun.