So I've recently become aware of the Plan 9 family of operating systems and related technologies. Begun in the late '80s by the geniuses at Bell Labs, it is a distributed operating system built around a distributed filesystem - so it's not unlike Project MAD, and as such, it's a good idea to discuss the one in terms of the other. I have revised this summary many times by now, and I'm still not sure I'm happy with it, but here we go anyhow.
There are a lot of different ways to look at the differences between the two; I'll try to avoid cynical or self-serving “analysis” that paints everything I do as right and anyone else's ideas as wrong. In part so that I can get this blog post out sooner, I'd like to focus on one particular aspect first, and draw a distinction between the way Plan 9 and MAD handle resources in a distributed system.
I will summarize Plan 9's resource model as being a ‘pull’ model, by which I mean: the resources of the system (e.g., devices and running software agents) are made publicly available, and any given program can “pull on them” (which, yes, includes pushing data) from wherever the program is on the system. I'll make an analogy here of two ships, some arbitrary distance apart in a large body of water. A ‘pull’ model involves a direct connection between the two ships, analogous to firing a grappling hook across the distance and pulling a line taut; you can then send resources back and forth across the line as needed, but because that rope is the sole point of contact, all negotiations between the two ships must go across it, from the most mundane to the most meaningful.
MAD does the opposite; it uses a “push” model. Using the same analogy, a MAD ship will send out smaller boats, each towing a line, to go out and meet other ships. Because these boats contain trusted crewmembers, they can negotiate and handle certain problems and events autonomously; not everything that happens needs to be coordinated with the mothership, meaning the line between the boat and ship can be reserved for important communications and resource transfers.
Stepping back from the analogy, Plan 9 uses a protocol called 9P to create a shared filesystem, and then uses filesystem operations for configuration, selecting which devices any given process should be using (e.g., console, keyboard/mouse, display, etc.), among other uses. While I can see the vision, the choice to encode system configuration in a way that's hard to read and write is problematic. Better tooling would help; it would be nice if the canonical configuration tool did what it does in a more user-friendly way. Many systems have trouble with that, but it's something that really ought to be considered when making your tools.
Plan 9's design works because they chose to make everything in the system a file, and to make those files available over the network. The “everything is a file” metaphor of course comes from Unix, and I will get into how MAD uses a different core philosophy another time. Making a distributed filesystem that is available across the network is also a mechanism MAD uses, though differently; I will discuss that more later, too. The point is, Plan 9 decided on a standard interface for devices and other programs, a way to configure that interface, and a way to make that interface available across a network; that is the core of what a distributed system needs. What makes it fundamentally a ‘pull’ model is that they didn't do anything more than make those interfaces available; in that way it is little more than a client-server model that would work just as well over plain Internet Protocol as it does under 9P, if for whatever reason you chose to build such a network that way.
Note that under Plan 9, this resource distribution model does not treat ‘the ability to run programs’ as a resource analogous to other devices. Believe me, I sympathize; I spent a lot of time thinking about how you represent the ability to run programs as a consumable resource generally, and while I think I have an answer, it is definitely something that can only be borne out through testing. For certain, though, the ability to run programs is not something that you can package into a file and make available on the network, at least not the way Plan 9 has done it. (It might work in conjunction with the ADA, described below, but not alone.)
MAD's ‘push’ model was also born directly from the need to consume remote devices. At first I assumed that, as in Plan 9, you would essentially be performing remote procedure calls to reach every device in the system, but I didn't like that. I wanted to minimize network congestion, and negotiating with a device frequently involves a lot of redundant back-and-forth for what is ostensibly a single function call on the program's side. As I described in my blog post about the history of the programming model, I got the idea to execute scripts on remote nodes so that you could do several operations on a device at once, before finally settling on a formal, agent-oriented solution.
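To make the congestion point concrete, here is a toy sketch of the difference in round trips. None of these names are real MAD or Plan 9 APIs; it's just the shape of the idea:

```python
# A toy illustration of the 'pull' vs 'push' difference in round trips.
# Nothing here is a real MAD or Plan 9 API; all names are invented for the sketch.

class FakeSensor:
    """Stands in for a device that lives on a remote node."""
    def __init__(self):
        self.rate = None
    def configure(self, rate):
        self.rate = rate
    def read(self, n):
        return [0.0] * n

class RemoteNode:
    """Pretends to be the node that owns the sensor; counts network round trips."""
    def __init__(self):
        self.sensor = FakeSensor()
        self.round_trips = 0
    def call(self, op, *args, **kwargs):
        self.round_trips += 1                 # every call crosses the network
        return getattr(self, "_" + op)(*args, **kwargs)
    def _configure(self, rate):
        self.sensor.configure(rate)
    def _read(self, n):
        return self.sensor.read(n)
    def run_agent(self, agent):
        self.round_trips += 1                 # push the agent over once...
        return agent(self.sensor)             # ...then everything runs locally

def pull_style(node):
    node.call("configure", rate=100)          # one round trip
    return node.call("read", n=50)            # another round trip

def push_style(node):
    def agent(sensor):                        # the 'agent' runs next to the device
        sensor.configure(rate=100)
        return sensor.read(n=50)
    return node.run_agent(agent)

node = RemoteNode()
pull_style(node)
print("pull:", node.round_trips, "round trips")   # 2

node = RemoteNode()
push_style(node)
print("push:", node.round_trips, "round trip")    # 1
```

The numbers are trivially small here, but the same gap grows with every extra step a device negotiation needs.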
That solution, the Agentic Distributed Application model, requires that your application be split into several programs known as agents; no agent may assume the existence of any resource that it does not declare as a requirement, and no agent shall be placed on a hardware module that doesn't meet its requirements. These agents are ‘pushed’ out to some suitable hardware module (the choice of which may be automatic, configured, or manual) when the application is loaded, and then all agents of an application simply communicate with each other, with the network abstracting away the distance between them. Thus each agent becomes a proxy with which to access the resource or resources that are local to its piece of hardware.
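As a sketch of what “declaring requirements” might look like, here is an invented manifest structure; the field names are mine, not an actual ADA format:

```python
from dataclasses import dataclass, field

# Invented structures for illustration only; not a real ADA manifest format.
@dataclass
class AgentSpec:
    name: str
    requires: set = field(default_factory=set)  # resources the agent must find locally
    isa: str = "any"                             # instruction set the binary was built for

# A toy application split into agents, each declaring only what it touches.
app = [
    AgentSpec("ui-input",  requires={"keyboard", "mouse"}, isa="arm64"),
    AgentSpec("ui-output", requires={"display"},           isa="arm64"),
    AgentSpec("core",      requires={"storage"},           isa="amd64"),
]
```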
Relevant to what I said above, you can consider the ability to run programs to be a resource in the ADA model, because resources here are understood as constraints. This is useful, for example, when you have incompatible processor instruction sets present on a single system; you may only have AMD64 agents compiled for your program, and they shall not be run on an ARM or RISC-V processor. In the same breath, though, you can extend the compatibility of your application by adding extra agents; since most agents will be very simple, the chance that compiling them will be difficult is minimal, and if you have some particular process that is harder to adapt to other architectures, it is fine if that specific agent is incompatible, while the rest of the program can run on different processor types. For example, using an ARM-based Raspberry Pi as your terminal means that you need input and output agents that compile for that processor (some asterisks here that shall be ignored), but the core of your application can depend on AMD64, which may run on a full server, your desktop, or a laptop in a closet, whatever's easiest.
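Continuing that sketch, placement then reduces to filtering: a hardware module is a candidate for an agent only if it offers every declared resource and can execute the agent's binary. Again, every name here is invented, and this reuses the `app` list from the previous sketch:

```python
from dataclasses import dataclass

# Hypothetical description of the hardware modules present in a system.
@dataclass
class Module:
    name: str
    isa: str
    resources: set

modules = [
    Module("rpi-terminal",  "arm64", {"keyboard", "mouse", "display"}),
    Module("closet-server", "amd64", {"storage"}),
]

def candidates(agent, modules):
    # A module qualifies if it offers every declared resource
    # and can run the agent's compiled binary.
    return [m for m in modules
            if agent.requires <= m.resources
            and agent.isa in ("any", m.isa)]

for agent in app:
    print(agent.name, "->", [m.name for m in candidates(agent, modules)])
# ui-input  -> ['rpi-terminal']
# ui-output -> ['rpi-terminal']
# core      -> ['closet-server']
```

The application as a whole is deployable so long as every agent has at least one candidate somewhere, which is exactly the "the requirements only have to exist somewhere" property described below.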
The ADA model's primary benefit is giving me, the system designer, confidence that we have enough information to say that applications and their agents can be automatically distributed throughout the system. Together with software hooks that allow programs to override the default methodology, this resource requirement model filters out places where it would be incorrect to send a given agent, and because the application doesn't require a single machine to meet all of the requirements at once (I include access to files in this), the application as a whole is compatible with the gestalt system so long as all the capabilities it needs exist somewhere. Similarly, because we control our assumptions, it is vastly more plausible to say that some agents (especially a CPU-only agent, but really anything without local side effects) can be paused and moved from one module to another without impacting the overall performance of the application, if, for example, the processor it's on becomes busy with some important task, or even crashes or disconnects from the system.
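To illustrate just the pause-and-move idea for a side-effect-free agent, here is a deliberately tiny sketch that hand-waves away all of the hard parts (in-flight messages, open resources, and so on); the names are made up:

```python
import pickle

class CounterAgent:
    """A toy agent with no local side effects: its whole state is one number."""
    def __init__(self):
        self.count = 0
    def step(self):
        self.count += 1

def migrate(agent):
    # Pause, capture the agent's state, and rebuild it "elsewhere".
    # In a real system the snapshot would travel to another hardware module;
    # here it just round-trips through bytes.
    snapshot = pickle.dumps(agent)
    return pickle.loads(snapshot)

a = CounterAgent()
for _ in range(3):
    a.step()
b = migrate(a)           # stands in for the same agent resumed on another module
b.step()
print(a.count, b.count)  # 3 4
```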
Under the ADA, all of these distributed applications have their own place within a shared filesystem, one that works somewhat differently from Plan 9's. Each ADA application instance has its own directory in that filesystem, and when an ADA agent makes requests for resources, it does so relative to this application-instance root; anything the application needs should be mounted there. Where Plan 9 makes the decision to change how any given application sees the global filesystem, MAD customizes this private (but publicly accessible) directory of links. Thus, for example, an agent would look for a console device at ./cons, rather than /dev/cons.
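A minimal sketch of what resolving against the instance root might look like, with made-up paths:

```python
import os

# Hypothetical directory for one running application instance.
INSTANCE_ROOT = "/apps/browser-instance-01"

def resource(relative_path):
    # "./cons" under this instance might be a link the system pointed at
    # whatever console device this particular application was actually given.
    return os.path.join(INSTANCE_ROOT, relative_path)

print(resource("cons"))    # /apps/browser-instance-01/cons, not /dev/cons
print(resource("cache"))   # wherever the system linked this instance's cache
```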
Also in the application directory, of course, are links to the application's agents. The application never needs to explicitly know where an agent is after it is deployed; it acts upon the local link to that agent, which will proxy its requests to the agent no matter where it is. In fact the canonical location of these agents is likely to be mounted under the hardware itself, rather than in the application directory, because that's where the process is. Either way, because these agents are filesystem links, they can be named properly, and thus be human-readable; a small consideration, perhaps, but one that helps programmers and administrators keep their sanity.
There are lots of other things that may or may not belong in this directory, depending on implementation. One good example is the application's user-specific file cache: rather than the application explicitly being told to (or independently deciding to) place it in some “/home/user/.cache/browser” folder, when the app is started, it assumes that its own private “./cache” folder will link somewhere, whether that's a user folder or a private ramdisk that will vanish when the program closes. Configuration files can be similarly linked (though MAD has an explicit configuration mechanism).
These application directories implicitly containerize applications. If, for example, your application depends on a shared library, it should expect that shared object to “be” within its own private directory, even if what is there is only a link. If your app depends on another program, it should expect that program to “be” within its private directory. In this way, the application works exactly the same whether its dependencies are embedded or shared; it works the same whether the application is “installed” or run from a private directory. And, if a user ever needs to forcibly embed an older or modified version of a shared library, for example for compatibility reasons, the mechanism for this already exists. (Obviously, this is also a security problem that needs monitoring, but that's not difficult to address, at least in principle.)
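Here's a small sketch (POSIX-only, invented layout) of why embedded and shared dependencies look identical from the application's side:

```python
import os
import tempfile

# Build a throwaway "instance directory" where a dependency may be either a
# real file or a link into a shared location; the application can't tell which.
shared_dir = tempfile.mkdtemp()
instance_dir = tempfile.mkdtemp()

with open(os.path.join(shared_dir, "libmedia.so"), "w") as f:
    f.write("shared version 1.2")

# Option A: the system links the shared copy into the instance directory.
# (Option B would be copying the file in; the code below works either way.)
os.symlink(os.path.join(shared_dir, "libmedia.so"),
           os.path.join(instance_dir, "libmedia.so"))

def load_dependency(instance_root, name):
    # The application only ever looks inside its own directory.
    with open(os.path.join(instance_root, name)) as f:
        return f.read()

print(load_dependency(instance_dir, "libmedia.so"))  # same result either way
```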
Much of the above will be modified slightly when I talk, another time, about how MAD doesn't exactly use the “everything is a file” core metaphor, but I think it all stands alone for now. This is all, ultimately, about how you organize data in a distributed system; it all exists to help minimize the assumptions of the application, and thus increase its compatibility.
I have described Plan 9 as coming in at the 60-80% mark for what I'm looking for in MAD; this blog post is a good example of ideas that are in some ways compatible, but approach things from different directions and can have very different consequences. It's easy to say in the abstract, for instance, that the push and pull models of resource distribution are not that different, because their goals and therefore nominal effects are the same; however, building a system on top of one or the other is a different task.
Part of what I see as valuable in MAD is this kind of perspective. While accessing devices remotely (pulling) feels like a perfectly valid model (it is, to be clear, how the internet works), there are some questions inherent to a distributed system that it doesn't provide an answer for or a mechanism to assist with. This difference is clearest when you are looking to solve all of those questions at once - but it is a difference that any programmer for a distributed system will feel whenever they are obliged to solve even one of them for their own niche application.
Having said that: the two models are not incompatible. Indeed, while formal ADA applications take some work to implement, you can informally do the same as an app developer, absent proper tooling. Even without something like Plan 9's distributed filesystem, with user consent, you could log in to remote hosts (ones under your control, or to which you have file and network permissions), copy programs over and run them, then connect to them at their known host and port. The real goal of the ADA is to do as much of this as possible automatically, including (for some simple apps) dividing your program up into agents. I'll be happy with the ADA when (if) you can take a program not meant explicitly to be distributed across multiple machines, and run it distributed anyway without effort, because that's what it takes to make a consumer-ready operating system.
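By way of illustration, the informal version might be nothing more than a script like the following; the host, program, and port are placeholders, and this is plain ssh/scp rather than anything MAD provides:

```python
import subprocess

HOST = "user@remote-host.example"   # a machine you control (placeholder)
PROGRAM = "./worker"                # the piece you want to run remotely (placeholder)

# Copy the program over, start it listening on a known port, then talk to it.
subprocess.run(["scp", PROGRAM, f"{HOST}:/tmp/worker"], check=True)
subprocess.Popen(["ssh", HOST, "/tmp/worker --listen 9000"])

# From here, the local half of the application connects to
# remote-host.example:9000 exactly as it would connect to a local process.
```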
That task, automation, takes some advance thought and planning, and that's part of why I am hesitant to say that I've solved anything “for real” without talking to experts. It's too easy to find that you have assumed wrong or overlooked something and now have a much more difficult problem to solve than you thought you'd have, a problem that might have been easy if you had done it all a different way. The fact that I can say MAD makes things easy where Plan 9 makes them difficult also means that MAD might be doing things the difficult way where someone else can do it easily. That's simply how having different methodologies works.
And that's part of what makes this all so much fun.