Thursday, October 23, 2025

Plan 9: On Resource Distribution

So I've recently become aware of the Plan 9 family of operating systems and related technologies.  Invented in the ‘90s by the geniuses from Bell Labs, it is a distributed operating system using a distributed filesystem - not unlike Project MAD - and as such, it's a good idea to discuss the one in terms of the other.  I have revised this summary many times by now, and I'm still not sure I'm happy with it, but here we go anyhow.

There are a lot of different ways to look at the differences between the two; I'll try to avoid cynical or self-serving “analysis” that paints everything I do as right and anyone else's ideas as wrong.  In part so that I can get this blog post out sooner, I'd like to focus on one particular aspect first, and draw a distinction between the way Plan 9 and MAD handle resources in a distributed system.

I will summarize Plan 9's resource model as being a ‘pull’ model, by which I mean, the resources of the system (eg, devices and running software agents) are made publicly available, and any given program can “pull on them” (which yes, includes pushing data) from wherever the program is on the system.  I'll make an analogy here of two ships, at some arbitrary distance apart in a large body of water.  A ‘pull’ model involves a direct connection between the two ships, analogous to firing a grappling hook across the distance and pulling a line taut; you can then send resources back and forth across the line as needed, but because that rope is the sole point of contact, all negotiations between the two ships must go across it, from the most mundane to the most meaningful.

MAD does the opposite; it uses a “push” model.  Using the same analogy, a MAD ship will send out smaller boats, each towing a line, to go out and meet other ships.  Because these boats contain trusted crewmembers, they can negotiate and handle certain problems and events autonomously; not everything that happens needs to be coordinated with the mothership, meaning the line between the boat and ship can be reserved for important communications and resource transfers.

Stepping back from the analogy, Plan 9 uses a protocol called 9P that creates a shared filesystem, and then uses filesystem operations for configuration, selecting which devices any given process should be using (eg, console, keyboard/mouse, display, etc), among other uses.  While I can see the vision, the choice to encode system configuration in a way that's hard to read and write is problematic.  Better tooling would help; it would be nice if the canonical configuration tool did what it does in a more user-friendly way.  Many systems have trouble with that, but it's something that really ought to be considered when making your tools.
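
To make that concrete, here's a toy model of the idea in Python - not the real 9P or namespace API, all names invented for illustration: each process carries its own table of bindings, and “configuration” is just rewiring which real resource answers to a well-known path.

    # Toy sketch of per-process namespace binding, in the spirit of Plan 9.
    # None of these names are real Plan 9 interfaces; this is illustration only.
    class Namespace:
        def __init__(self):
            self.binds = {}  # path the process sees -> real resource path

        def bind(self, real, seen):
            self.binds[seen] = real

        def resolve(self, path):
            # Longest-prefix match, so /dev/cons can be rebound per process.
            for seen in sorted(self.binds, key=len, reverse=True):
                if path == seen or path.startswith(seen + "/"):
                    return self.binds[seen] + path[len(seen):]
            return path

    ns = Namespace()
    ns.bind("/mnt/wsys/cons", "/dev/cons")  # this window's console
    print(ns.resolve("/dev/cons"))          # -> /mnt/wsys/cons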

Plan 9's design works because they chose to make everything in the system a file, and make those files available over the network.  The “Everything is a file” metaphor of course comes from Unix, and I will get into how MAD uses a different core philosophy another time.  Making a distributed filesystem that is available across the network is also a mechanism MAD uses, though differently; I will discuss that more later, too.  The point is, Plan 9 decided on a standard interface for devices and other programs, a way to configure that interface, and a way to make that interface available across a network; that is the core of what a distributed system needs.  What makes it fundamentally a ‘pull’ model is that they didn't do anything more than making those interfaces available; in that way it is little more than a server-client model that would work just as well using the Internet Protocol as it does under 9P, if for whatever reason you chose to build such a network over IP.

Note that under Plan 9, this resource distribution model does not consider ‘the ability to run programs’ a resource analogous to other devices.  Believe me, I sympathize; I spent a lot of time thinking about how you represent the ability to run programs as a consumable resource generally, and while I imagine I have an answer, it is definitely something that can only be borne out through testing.  For certain, though, the ability to run programs is not something that you can package into a file and make available on the network, at least not the way Plan 9 has done it.  (It might work in conjunction with the ADA, below, but not alone.)

MAD's ‘push’ model was also born directly from the need to consume remote devices.  At first I assumed that, as in Plan 9, you would essentially be performing remote procedure calls to reach every device in the system, but I didn't like that.  I decided I wanted to try to minimize network congestion, and negotiating with devices frequently comes with a lot of redundant back-and-forth for what is ostensibly a single function call on the program's side.  As I described in my blog post about the history of the programming model, I got the idea to execute scripts on remote nodes so that you could do several operations on a device at once, before finally settling on a formal, agent-oriented solution.
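
To illustrate the congestion point - with invented names; this is a sketch of the problem, not MAD's actual mechanism - compare four round trips for one logical read against one round trip when the whole sequence executes next to the device:

    # Toy comparison: chatty per-call RPC versus a batched sequence that
    # runs on the node that owns the device.  Everything here is invented.
    class FakeLink:
        def __init__(self):
            self.round_trips = 0

        def call(self, op, *args, **kwargs):
            self.round_trips += 1  # every call pays a full round trip
            return 42 if op == "read" else None

        def send_batch(self, ops):
            self.round_trips += 1  # the whole sequence pays only one
            result = None
            for op, args, kwargs in ops:
                result = 42 if op == "read" else result
            return result

    link = FakeLink()
    link.call("open", "sensor0")
    link.call("configure", rate=100)
    value = link.call("read")
    link.call("close")
    print(link.round_trips)  # 4 round trips for one logical read

    link = FakeLink()
    value = link.send_batch([("open", ("sensor0",), {}),
                             ("configure", (), {"rate": 100}),
                             ("read", (), {}),
                             ("close", (), {})])
    print(link.round_trips)  # 1 round trip

A resident agent takes the batching idea one step further: the logic lives next to the device permanently.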

That solution, the Agentic Distributed Application model, requires that your application be split into several programs known as agents; no agent may assume the existence of any resource that it does not declare as a requirement, and no agent shall be placed on a hardware module that doesn't meet its requirements.  These agents are ‘pushed’ out to some suitable hardware module (the choice of which may be automatic, configured, or manual) when the application is loaded, and then all agents of an application simply communicate with each other, with the network abstracting away the distance between.  Thus each agent becomes a proxy with which to access the resource or resources that are local to that piece of hardware.
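
As a sketch of what that placement rule looks like (all names invented - treating requirements and capabilities as simple sets of tags is my assumption for illustration, not the real ADA format):

    # A minimal sketch of ADA-style placement: each agent declares the
    # capabilities it requires, each hardware module advertises what it has,
    # and an agent may only land on a module satisfying every requirement.
    AGENT_REQUIREMENTS = {
        "core":  {"arch:amd64"},                  # the main logic loop
        "input": {"arch:arm64", "dev:keyboard"},  # lives where the keyboard is
        "gui":   {"arch:arm64", "dev:display"},   # lives where the screen is
    }

    MODULE_CAPABILITIES = {
        "closet-server": {"arch:amd64", "dev:disk"},
        "raspberry-pi":  {"arch:arm64", "dev:keyboard", "dev:display"},
    }

    def could_run_on(requirements):
        """Return the modules where an agent with these requirements may go."""
        return [m for m, caps in MODULE_CAPABILITIES.items()
                if requirements <= caps]

    for agent, reqs in AGENT_REQUIREMENTS.items():
        print(agent, "->", could_run_on(reqs))
    # core  -> ['closet-server']
    # input -> ['raspberry-pi']
    # gui   -> ['raspberry-pi']

This same filter quietly handles the instruction-set case discussed next: an agent that requires arch:amd64 simply never matches an ARM module.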

Relevant to what I said above, you can consider the ability to run programs to be a resource in the ADA model, because the resources here are understood as constraints.  This is useful, for example, when you have incompatible processor instruction sets present on a single system; you may only have AMD64 agents compiled for your program, and they shall not be run on an ARM or RISC-V processor.  In the same breath, though, you can extend the compatibility of your application by adding extra agents; since most agents will be very simple, the chances that compiling them will be difficult are minimal, and if you have some particular process that is more difficult to adapt to other architectures, it is fine if that specific agent is incompatible, while the rest of the program can be run on different processor types.  For example, using an ARM-based Raspberry Pi as your terminal means that you need input and output agents that compile for that processor (some asterisks here that shall be ignored), but the core of your application can depend on AMD64, which may be run on a full server, your desktop, or a laptop in a closet, whatever's easiest.

The ADA model's primary benefit is giving me, the system designer, confidence that we have enough information to say that applications and their agents can be automatically distributed throughout the system.  Together with software hooks that allow programs to override the default methodology, this resource requirement model filters out places where it would be incorrect to send a given agent, and because the application doesn't require a single machine to meet all of the requirements at once (I include access to files in this), the application as a whole is compatible with the gestalt system so long as all the capabilities it needs exist somewhere.  Similarly, because we control our assumptions, it is vastly more plausible to say that some Agents (especially a CPU-only agent, but really anything that doesn't have local side effects) can be paused and moved from one module to another without impacting the overall performance of the application, if for example the processor you're on becomes busy with some important task, or even if it crashes or disconnects from the system.

Under the ADA, all of these distributed applications have their own place within a shared filesystem, one that works somewhat differently to Plan 9's.  Each ADA application instance has a folder in the directory, and when an ADA agent makes requests for resources, it does so relative to this application-instance root; anything that the application needs should be mounted there.  Where Plan 9 makes the decision to change how any given application sees the global filesystem, MAD customizes this private (but publicly accessible) directory of links.  Thus, for example, an agent would look for a console device at ./cons, rather than /dev/cons.
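
A minimal sketch of that lookup, with hypothetical paths throughout (the real mechanism would be filesystem links, not a Python dict):

    # The agent asks for "./cons" relative to its instance root; the system
    # follows whatever link was mounted there when the instance was set up.
    INSTANCE_ROOT = "/apps/editor/42"  # hypothetical app-instance folder

    LINKS = {
        "/apps/editor/42/cons":  "/hw/raspberry-pi/dev/cons",
        "/apps/editor/42/cache": "/ram/editor-42-cache",
    }

    def resolve(instance_root, relative_path):
        # Which device answers depends on how the instance was assembled,
        # not on anything hard-coded in the agent.
        return LINKS[instance_root + "/" + relative_path.lstrip("./")]

    print(resolve(INSTANCE_ROOT, "./cons"))   # -> /hw/raspberry-pi/dev/cons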

Also in the application directory, of course, are links to the application's agents.  The application never needs to know where an agent is after it is deployed; it acts upon the local link to that agent, which will proxy requests to the agent no matter where it is.  In fact the canonical location of these agents is likely to be mounted under the hardware itself, rather than in the application directory, because that's where the process is.  Either way, because these agents are filesystem links, they can be named properly, and thus be human-readable; a small consideration, perhaps, but one that helps programmers and administrators keep their sanity.

There are lots of other things that may or may not belong in this directory, depending on implementation.  One good example is the application's user-specific file cache: rather than the application explicitly being told to (or independently deciding to) place that in some “/home/user/.cache/browser” folder, the app simply assumes, when it starts, that its own private “./cache” folder links somewhere, whether that's a user folder or a private ramdisk that will vanish when the program closes.  Configuration files can be similarly linked (though MAD has an explicit configuration mechanism).

These application directories implicitly containerize applications.  If, for example, your application depends on a shared library, it should expect that shared object to “be” within its own private directory, even if what is there is only a link.  If your app depends on another program, it should expect that program to “be” within its private directory.  In this way, the application works exactly the same whether its dependencies are embedded or shared; it works the same whether the application is “installed” or run from a private directory.  And, if a user ever has a need to forcibly embed an older or modified version of a shared library, for example for compatibility reasons, the mechanism for this is explicitly already there.  (Obviously, this is also a security problem that needs monitoring, but that's not difficult to address, at least in principle.)

Much of the above will be modified slightly when I talk, another time, about how MAD doesn't exactly use the “everything is a file” core metaphor, but I think it all stands alone for now.  This is all, ultimately, about how you organize data in a distributed system; it all exists to help minimize the assumptions of the application, and thus increase its compatibility.

I have described Plan 9 as coming in at the 60-80% mark for what I'm looking for in MAD; this blog post is a good example of ideas that are in some ways compatible, but approach things from different directions and can have very different consequences.  It's easy to say in the abstract, for instance, that the push and pull models of resource distribution are not that different, because their goals and therefore nominal effects are the same; however, building a system on top of one or the other is a different task.

Part of what I see as valuable in MAD is this kind of perspective.  While accessing devices remotely (pulling) feels like a perfectly valid model (it is, to be clear, how the internet works), there are some questions inherent to a distributed system that it doesn't provide an answer for or a mechanism to assist with.  This difference is clearest when you are looking to solve all of those questions inherent to a distributed system - but it is a difference that any programmer for a distributed system will feel, when they are obliged to solve any of those questions for their niche application.

Having said that: the two models are not incompatible.  Indeed, while formal ADA applications take some work to implement, you can informally do the same as an app developer, absent proper tooling.  Even without something like Plan 9's distributed filesystem, with user consent, you could log in to remote hosts (that are under your control, or to which you have file and network permissions), copy programs over and run them, then connect to them at their known host and port.  The real goal of the ADA is to do as much of this as possible automatically, including (for some simple apps) dividing your program up into agents.  I'll be happy with the ADA when (if) you can take a program not meant explicitly to be distributed across multiple machines, and run it distributed anyway without effort, because that's what it takes to make a consumer-ready operating system.

That task, automation, takes some advanced thought and planning, and that's part of why I am hesitant to say that I've solved anything “for real” without talking to experts.  It's too easy to find that you have assumed wrong or overlooked something and now have a much more difficult problem to solve than you thought you'd have, a problem that might have been easy if you had done it all a different way.  That I can say MAD makes things easy where Plan 9 makes them difficult also means that MAD might be doing things the difficult way when someone else can do it easily.  That's simply how it goes when methodologies differ.

And that's part of what makes this all so much fun.

Wednesday, September 17, 2025

Programming MAD

One of the important things about the Agentic Distributed App model, as I've already said in a separate blog post, is that we need an application model that lets just about anyone jump in and create programs that span the gap between multiple machines.  There is, naturally, a lot wrong with merely asserting that this is possible and leaving the claim dangling out there, no questions asked or answered.

After all, when people talk about distributed computing research now, they are talking among other things about problems that would stymie novice programmers: things like memory coherency, race conditions, partition tolerance, reliability, security, and of course system topology.  I think it's fair to say that most programmers, to say nothing of new ones, prefer to think about and work with single-threaded logic and not address the various dangers and brain-benders involved in keeping memory synchronized between two independent machines, either of which might create a sudden shift in the status quo.

While I do have some thoughts about multithreading, and Project MAD's place in that morass, I'd like to start somewhere simpler.

Distributed Single Thread Programs

One of the things that might not be obvious from everything I've said so far, is that the ADA is supposed to, among other things, empower single-threaded programs that just so happen to be distributed across multiple machines.  Even if these machines and the software agents on them are executing code at the same time, it remains important to create applications with centralized logic, where all events pass through a central filter, ensuring that events never conspire to have two parts of the same program working at cross purposes.  This is not the only logical model that an ADA application can have, but it is the most reasonable model for many applications to take, or to put it more bluntly, it is the only route many application programmers will ever want to take.  Understanding and dealing with parallel operations can get complicated very fast.

If this still seems odd, understand that many Agents under the ADA model are nothing more than handlers for specific resources, translating internal application events to external events and vice versa.  If you want to process keyboard inputs, something needs to be waiting for those keyboard input events on the machine that actually receives the event, and perhaps some basic processing can be done there, but the active logic that responds to the event should be a part of the main logic loop, at least under the single-thread model.  Likewise, if you are performing GUI operations as part of your program, the functions that actually get placed in a GUI Agent may be little more than batch commands that draw on the screen and handle windowing events.  They may have little or no program logic involved; that may mostly happen in a centralized location, most likely the program core.
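
Here's the shape of that, as a toy single-machine sketch (invented names; in a real ADA app the agents would sit on different modules, with the network in between):

    # Edge agents translate device events into application events and queue
    # them; all decisions happen in one central loop.
    import queue

    events = queue.Queue()

    def keyboard_agent(key):
        # Runs on whatever module owns the keyboard; no program logic here,
        # it just forwards a normalized event to the core.
        events.put(("key", key))

    def gui_agent(command):
        # Runs on whatever module owns the display; executes drawing batches.
        print("draw:", command)

    def core_loop():
        # The single place where program logic lives; events from every
        # agent funnel through here, so no two parts act at cross purposes.
        while True:
            kind, payload = events.get()
            if kind == "key" and payload == "q":
                break
            if kind == "key":
                gui_agent(f"echo {payload!r}")

    for k in "hiq":        # simulate remote keystrokes arriving
        keyboard_agent(k)
    core_loop()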

The ADA is not specifically about making multiple threads cooperating in parallel easy, much less trivial.  It is about making sharing of resources across device boundaries easy.  It is also about creating an application model that explains how you deploy an application into such a system, with minimal assumptions.  It is about handling edge cases like only having disk access in one remote location, while most of the computation and devices used are elsewhere.  It is about adding new resources in such a way that others know what you have and how to use it, even if that new resource is wildly different from everything you have so far connected.  None of that obliges an application programmer to take classes on networking and distributed systems.

Granted, the ADA is more useful when you are tackling parallel or parallelizable loads.  A single-threaded app in a distributed system will be slower than the same on a monolithic system, but a massively parallel task can use the ADA model to natively farm out tasks to any and all processors within a system (or just to some of them) in order to make use of spare capacity that might otherwise be unused.  And while it would be premature to promise that the ADA will simply have a load balancer built in for parallel loads, it unquestionably has everything you might need to make a good load balancer.  Metrics like system load are important to consider when determining where you will deploy new agents, or which agents to deploy a workload to, and the ADA shall be able to dynamically add and remove agents anywhere in the system.  Thus, for things like supercompute workloads and web service load balancing, the ADA should give you everything you need for an ideal solution, whether that ideal solution comes built in or not.
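
The load-balancing ingredient is as simple as it sounds; a sketch with made-up metrics and names:

    # Given live load numbers for each module, deploy the next worker agent
    # to the least-loaded module it is compatible with.
    MODULE_LOAD = {"server": 0.85, "desktop": 0.20, "laptop": 0.55}
    COMPATIBLE  = {"server", "desktop", "laptop"}  # where this agent may run

    def pick_module(load, compatible):
        candidates = {m: l for m, l in load.items() if m in compatible}
        return min(candidates, key=candidates.get)

    print(pick_module(MODULE_LOAD, COMPATIBLE))    # -> desktop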

It should be emphasized that this comes without needing to design and build custom hardware.  Many data center companies invest heavily in computers designed to pack as much computing power and/or storage into as small a chassis as possible, and while that remains a possibility, the MAD computing paradigm will let you create a machine that has the same capability as one of these advanced systems out of spare parts.  Where space or efficiency is critical, it still behooves one to invest in specialty hardware, but it sure would be nice to know that wasn't the only good option.

The App Streaming Model (For All Applications)

It's also worth noting that there are present-day use cases for the MAD paradigm that have little to do with parallel loads.  One, for example, is the game streaming model that several game companies have attempted, and to some extent succeeded at, in the past decade or so.  It is sometimes very inconvenient to have multiple game consoles tied to your television, or to only be able to play PC games while sitting at your computer.  The standard application model we all grew up with had no solution to this problem, and so several companies developed edge-case solutions to attempt to fill the gap.  Some, such as Google Stadia, collapsed relatively quickly, while others, such as the game streaming engine built into Valve's Steam platform, have continued.

Likewise, mobile app platforms have attempted over the years to design a “streaming app model” that doesn't require mobile applications to be installed, while still having all of the same benefits to the user.  This is designed mostly to remove the need for web apps, which are far from an ideal way to design and distribute applications.  Web apps persist even today because they are a model that is completely understood and completely under the programmers' control.  Web apps don't need much special training to create if you already have web programmers, and can flexibly handle different mobile operating systems and even desktops - but they are more limited than native apps.  At the same time, from a user's perspective, it's nice not to need applications persistently installed on your phone, especially if they insist on taking up background compute resources, need constant updates, or like sending you unwanted notifications.  It's nice to simply run an app without giving anything up to the people who created that app, and without going through a store and tedious install process.

And I'd be remiss not to mention interactive video peripherals, such as car entertainment displays or Chromecast.  In a way, these solutions prove the fundamental premise of Project MAD: once you get used to the idea of simply connecting your phone to a thing, it becomes difficult to understand why this was so hard for so many years.  Of course you can trivially stream video from a phone to a TV; of course you can use your phone's navigation in your car.  The problem isn't complicated, and the moment these technologies appeared, we wondered why it took so long for them to arrive.

But each of these solutions is a step to the side.  They are specific-case solutions to the general problem: Input, Output, and Processing are on different machines, but they are all part of the same application.  Trying to take one of these existing, specific-case solutions and turn it into a general-case solution will not work well, because the assumptions do not transfer.  Certainly, for the examples I've listed, you can understand that really only input, output, and processing are involved, but for example: where are your files stored?  In most cases, they are stored where you do the processing, but what happens when that isn't true?  For Chromecast, for instance, the dongle that attaches to your TV doesn't always stream from your phone; it may stream from the original source (YouTube, etc) and bypass your phone completely, which is good, but if you need to store preferences, that will end up being done on your phone, not the dongle nor in the cloud.  (Of course, the cloud service will automatically update your history in most cases, so there is storage there, too)

So now, the model must account for input, output, processing, storage, and media source.  That growing list only serves to change the question: what have we not thought of yet?  What else might be a resource that we need to deal with?  How do we deal with all of these existing devices, and how will we deal with any new ones?  Do they each need special consideration?  In which ways will we need to alter the fundamental model to incorporate each?

And that is the core advantage of a general case solution: the model doesn't change as you add new resources.  You don't need to treat adding remote inputs differently from adding remote outputs.  You don't need to treat adding writable storage differently from adding media sources.  Each Agent that you add improves the capabilities of the Application, but that doesn't complicate the model.  The ADA handles game streaming, app streaming, Chromecast, car displays, home media libraries, and many other edge cases with the same fundamental underlying model.  Products that were headline news when they first appeared become obvious applications of the underlying technology.

Similarly, the Internet of Things device paradigm has been waiting for a model where either software on your network takes charge of the peripherals, or the peripherals can seamlessly install a service or application on a piece of hardware you control, which you can then use to control it.  The former describes the ADA in general, while the latter is similar to app streaming.  In either case, software must bridge the gap between devices; the question is whether the user brings their own software (which controls the devices via ADA agents) or whether the devices supply it (your phone acts as an ADA server and runs an app stored on the IOT device).  As a bonus, the ADA as a framework would suggest that IOT devices with their own control apps could be controlled by ‘thin clients’ consisting of nothing more than an interface, such as a wall-mounted display.  If you already have an interactive display with a wireless interface, perhaps on your thermostat, refrigerator, toaster, washing machine, oven, shower, water bottle, toothbrush, hair brush, stuffed animal, or sex toy, that display could control your other IOT devices in exactly the same way that your phone or computer could.  This, too, has nothing to do with parallel computation; it only has to do with connecting devices.

Granted, I am being perhaps a touch sensationalist, but only because it is frustratingly difficult to do things that should be easy.  Without the perspective of MAD/ADA, I can agree, the problems look tangled and difficult.  Every addition to the model is something new, and each might add some new complication.  And as I've said multiple times, if my solutions turn out to be inadequate, I still believe that finding a general-case solution to the problems created by something like Project MAD will have most if not all of the benefits above.

One way or another, I hope for a future where we can just do fun things with our devices.

Thursday, August 21, 2025

History of the MOS/ADA Programming Model

This topic is tricky to introduce.  One of the central tenets of Project MAD as it is today is a whole new model for applications, and I've said that it was the last thing that came together, and that it tied everything together in a neat package.  What you won't know is that my early concepts of the MOS/DCA project also included a modification to standard programming models, and that legacy is still very obvious once you know to look for it.

It's difficult to ferret out exactly what I meant by it if you're just reading my notes from back then, but it was there: I wanted to expose the internals of executables and shared libraries so that, say, a shell command could execute an arbitrary function.  Important to making that work is having an interactive shell that can capture typed data - more like an interpreted language's REPL than something familiar like Bash or the Windows command shell.  And this has to be a structured tool, in some fashion; in modern applications programming, you pass nothing to an application when it starts except strings, and the application has to figure out literally everything from that.

This ties back nicely to my complaints in the last couple of blog posts about programmers being obliged to reinvent the wheel.

At the time, what I had wasn't anything like a plan.  At most, I had a complaint.  And that sparked a wish.  After all, especially within the GNU toolset, how many applications are actually just wrappers around library functions?  Complex wrappers, sometimes; that complexity is frequently needed so that this one tool can interact with a complex programming library function in multiple ways, or so that common data pipelines (piping a list of files through grep, as an easy example) operate smoothly and the way you would expect them to.  As with a lot in programming, things that are simple to say are frequently hard to do, and if you want to actually do them, there is a lot of nuance and complexity that must be navigated.

The other side of that, though, is that the nuance and complexity spawns more nuance and complexity.  Your complex tool needs to be interacted with in complex ways.  Add a third layer on top, that assumes access to tools that use libraries; then add a fourth layer and more, tools on top of tools.  As long as nothing changes, eventually you get a level of tooling where you can issue a one-word command to do exactly what you mean to.

But every level in between the core library and the final level of tooling can shift.  Needs to shift.  Every layer must be updated whenever any of its dependencies update, for security reasons if nothing else, and sometimes bigger shifts are called for.  To a certain extent, there's no getting away from that - but if it were possible to have the library function that you actually care about be executable in its own right, without an application wrapping it, there could suddenly be whole levels of redundant tooling that are at least simplified, if not eliminated.

And of course there will probably be several other layers of tooling that crop up around this new concept; I won't lie.  Ultimately, we create tools because we have specific needs, and as long as the tools fulfill specific rather than general needs, they will diversify.  But many of those pain points that we are removing are redundant ones.  Every application needs to transform text input into data streams.  Applications that read from and write to data files have to pick or create libraries that work with the specific protocols and data streams.  And if you're creating a workflow that involves data files being processed, you need individual applications whose only role is to be a part of that workflow, applications that parse files and put them in a form other applications can take as input.

Many of these niche applications, that only exist to take one data format and make it available, or take text input and render it back into a data file, are at once important to someone's workflow and possibly security, but also are tools nobody wants to maintain.  Nobody wants to need to maintain them.  It's important that someone creates a tool that parlays between data files and workflows, but it's a thankless job, and a product nobody wants to pay for.

In short, a shell that understands data objects and types, and can use that to call a function built into an application or library directly, instead of needing to build one application after another to handle the workflow… that would solve a lot of problems.

The steely-eyed veterans in the audience are saying, “Well, we have exactly that with interpreted languages like Python.  In Python, you can open a library and call a specific function out of it.  If you have a Python file that is meant to be an executable, you can also load that as though it were a library, and call functions out of it, assuming it's designed right.”  And this is, of course, correct.  (For the record, I had these thoughts before I was introduced to Python.  Probably.  Around that time, at the latest.  Again, my notes are not great.)
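
For reference, the pattern they're describing looks like this in ordinary Python - the same file is both a command and a library, so other programs can call its functions directly instead of shelling out to it:

    # wordcount.py - works as a command *and* as a library.
    def count_words(text: str) -> int:
        return len(text.split())

    if __name__ == "__main__":  # only runs when invoked as a command
        import sys
        print(count_words(sys.stdin.read()))

    # Elsewhere, no subprocess or argument-string parsing needed:
    #   import wordcount
    #   n = wordcount.count_words("some text here")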

The difference is having that be an assumption built into the entire operating system, into the fundamental programming method, so that the entire system is built on one foundation instead of a shifting mix of dependencies from various legacies.  That is, of course, making a virtue of a vice: it means starting over and doing everything from scratch, which is... painful.  And these initial thoughts of mine were missing critical parts of the picture, which is why for many years I set them aside as irrelevant and a waste of time.  At the time, all I was thinking was that, doggone it, we shouldn't have to reinvent the wheel over and over.

And in truth… there are still today whole categories of errors that only exist because library functions are sealed away.  For example, languages like Python, PHP, Go, and many others can't easily use functions buried in libraries for other languages like C; people have to write new versions of old libraries, and each needs to be maintained separately, by different people, using different language constructs and different dependencies.  Ultimately, it becomes such a different problem to maintain each of them that even if one person was willing to maintain every copy of the same algorithm, it would become untenable for some fraction of them.

All this despite the fact that ultimately, all you want is to pass parameters in and get results out.  If there were no worry about dependencies or languages, if all you had to do was call the function and capture the return, it would be simple to do from any language.  Of course, from a certain point of view… that's always been all you have to do, if your language is willing to parse library headers or code from other languages.  But neither of the library formats in use, Linux's .so and Windows' .dll, comes with a human- or machine-readable directory of functions meant to inform programmers of how to use them.  If you want to know how to use them, you need separate files intended for developers.  That means that if an entirely different programming language wants to use existing library functions, it needs to parse those developer files… despite their being written in another language.
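
You can watch this cost in miniature with Python's ctypes, which can call into a C shared library - but because the .so carries no usable directory of signatures, you must transcribe them by hand from a header file written for humans (the library name below is Linux-specific):

    import ctypes

    libm = ctypes.CDLL("libm.so.6")         # Linux; the name is a convention
    libm.cos.argtypes = [ctypes.c_double]   # transcribed by hand from math.h
    libm.cos.restype = ctypes.c_double      # guess wrong, get silent garbage

    print(libm.cos(0.0))                    # 1.0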

In short, there's a reason why we are where we are.

When I talk about Project MAD as a change in the entirety of computing, it can sound like I'm vastly overestimating my own cleverness - and well, maybe.  But these are real problems.  My first concept almost sounds promising on its own: just call library functions.  But that concept does nothing about compatibility and discovery problems.  It hardly matters whether or not the code exists on your system if you don't know that it's there, where it is, what it does, how to call it, what side effects to expect, and what the return values and errors mean - and that assumes that errors get propagated correctly.  And the details need to be exact; one bit out of place changes everything.

Just like there's a reason why we got here, there's also a litany of reasons why we haven't leaped forward.  Even if we all agreed on one bright idea that we would work together on, it would be a massive effort with a lot of problems that need to be solved.  And if the bright idea that we all agreed on happens to have massive holes inherent to it, because it's not as bright an idea as we thought… that's a lot of work for nothing.

That's to say, I do understand, have always understood, and will always understand, that none of this is simple, that I can be wrong, and that ultimately, even if my idea was core to anything, it would be the people who actually achieve it that deserve the credit, because it will be very difficult.

But also…

But also, various subsystems that are a part of the MAD concept (here, specifically, the Modular OS and Agentic Distributed Application model) are there specifically to solve these problems.  I've described the system directory in this post on the Unified System API, but not all parts of the solution came around at the same time.  Part of it, as I've just described in this post, was there from near the beginning.  Many years later (five to ten, maybe) I started thinking about the directory, and specifically, having types understood system-wide and having them be a central part of the system directory.  Before this, all I really had was “Call a function inside a library, so we don't have to reinvent wheels.”  But it didn't make sense, quite.

I said in my first post that the ADA came about in the process of writing blog posts about the MOS/DCA concept.  I was trying to explain, and things still felt incomplete.  I had deduced that functions and hardware drivers needed to be sorted by function, which fed into the type hierarchy in the system directory, and I was also thinking about embedded libraries and applications, and how that made application deployment and dependency management neater.  I was writing about wrapping functions with scripts so that some logic was happening on remote nodes, but that wasn't working for me.

The reason why I like the ADA goes beyond the model itself.  The model slotted well into the weirdly shaped hole that the whole “software model” side of the project had.  Yes, you could call functions directly instead of only being able to run applications - but why?  What made that so compelling, and not merely one person ranting at the sky?  Sure, it would be nice to have a directory of all the functions and types within a system, but what made that so critical to the operation of an OS that it has to be a fundamental component?

But within the ADA, you may be calling functions within your own application agents and within others'.  You will be deploying agents to hardware modules specifically because there is software on that module that is required by that Agent.  And it goes beyond functions and code (or, well, it doesn't, but it helps to think of it this way): the ADA is also about exposing data so that applications can be monitored and debugged, but that requires types to be in some sense objective, even if that only means a type connected to the application and not a system-wide one.

Equally, within ADA, an Agent exists to handle the tricky problems of accountability.  There is an entirely reasonable question when you try to, for example, perform GUI API calls remotely.  The memory being used by your GUI application can't leave the display controller, or at the very least, there will be a copy of the output in the display controller, which must be managed by the GUI stack itself - while being owned by an external party.  Without an agent such as the ADA's, there are massive, massive questions to be answered about how this is not an absolutely terrible idea.  But if you can't do things like call GUI functions remotely, then... what even is the plan?

But perhaps most importantly, in the ADA, deploying Agents and routing messages is a system function, one attached to the ADA servers and not one implemented by applications themselves.  That means that knowledge of functions, data, and types needs to be standardized at a system level.  If the ADA model were implemented separately by each application, then each application would need to separately track and manage type and API version compatibility.  Implementing this as a system function makes things not only possible, but easy for programmers to handle, by doing the hard work ahead of time, in the operating system.

I like to, perhaps arrogantly, think that I am touching on fundamental truths with some of these analyses.  It's why I keep hammering, for example, on “don't force every programmer to reinvent the wheel.”  It is hubris to say I alone know best (I've demurred enough that you know I'm not that hubristic), but it's still satisfying to line up a whole bunch of problems in a row, point at them, and say: example one, example two, example three.  Tying things up with a bow on top is satisfying, and part of what makes me genuinely excited about the ADA and Project MAD in general is that a whole stack of problems gets placed in a box and wrapped neatly up.

It also feels good because this isn't all the work of a day, month, or year.  While this talk of modifying programming models may not have happened in the first year of the project (the first part was mostly the DCA, honestly), it's at least 15 years in the making.  Tricky problems that have tickled the back of my mind for years are coming together.  These are problems that weren't just tricky in practice.  They were tricky conceptually; it wasn't clear that there even was an answer.  It wasn't hard to imagine that I was wasting my time, because there was no clear vision of the end.

There's still a lot that's rough, and there's still lots of room for me to be wrong.  But it's astonishing how the last few pieces have made the puzzle come together.  At least from where I stand, it looks promising, like there's really an answer.  Not an easy one, but one that leaves us all much better off.

And even if I'm wrong, perhaps getting more eyes on the collection of problems I've been working through will help.

Tuesday, August 19, 2025

The Problem with Modern Computing

It's really hard to summarize all of Project MAD; it's one of the reasons why I didn't shy away from that particular three letter acronym.  Perhaps more important, it's hard to explain why I feel so darn sure that I have stumbled onto something meaningfully important.  My descriptions of things can get all tangled up, so it's easy to think that the core of the idea is equally tangled, equally lost.  But I genuinely, truly, believe that there is something fundamental here.

Our world has many more computers than ever before, but each computer we sit down at can feel just as confining as the last.  We're chasing after the feeling of being empowered by computers, the feeling that computers are the future, not merely the present.  I suspect that's part of why so many people are jumping wholeheartedly into AI fever; they've been looking for a long time for something that will make computers feel infinitely deep, again.  Like you can get lost in all the possibilities that a single computer can offer you.

In the era of the internet, it's become less important that a single computer feels like it contains infinite possibilities, because we can always gesture out the window and say, infinite possibilities are out there.  But that answer is less compelling than it seems, for numerous reasons that I don't want to get into right now.  A single computer can be an infinitely flexible tool, and we understood that from nearly the beginning.  Computers on wheels become cars; computers with cameras become security guards; computers with arms can weld or do other labor.  But many of these implementations moved away from depending on normal computers, by which I mean “the same as we use at home”, and with good reasons.  Security and stability, mostly, but also… also deeper reasons.

Computers are generally split, and again with good reason, into utility and user-facing computers.  Utility machines exist to get work done; user-facing machines cater to the person in control of them.  What is lost in this description is that any user-facing machine could do work, even if not every utility machine can be user-facing.  But that's not what we see, not what we feel when we work with user-facing machines.  Even though you might be part of a larger system with great capabilities, it doesn't feel like it.

The reason is that programmers need to reinvent the wheel whenever they try to merge two machines into one.  It can be confusing to try to wrestle with the very concept of controlling one machine from another, not because that's difficult, but because it feels like it should be a solved problem already, and it isn't.  There are solutions, but… as my last post tries to explain, perhaps poorly, they are kind of poor solutions.  They were never meant to do what we really want them to do.

When I talk about remote procedure calls and the unified system API, what I'm talking about is controlling one program from another, one computer from another, in the same way that a programmer controls their own program.  If one of those remote computers, one of those programs, is in charge of handling hardware devices, then controlling one program from another means controlling hardware devices as though they were just your own bit of software.  If you use those remote computers to run software, to provide services, they should be as easy to interact with as if you were running software on your local machine.  The services should always be there, ready to serve you.

But most important, you should be able to write a program that just does stuff.  Not only doing stuff on your local machine, because as I just said, most user-facing machines don't actually do stuff by themselves.  There are other computers that do stuff, while your desktop, laptop, or phone just… well, provides you access, in theory.  In practice, because there are fewer standards than we'd like, figuring out how to control things that you own, even things designed to be controlled, can be an exercise in frustration.

Because the people who make those devices are all forced to reinvent the wheel.  Collecting all of the various hardware capabilities of a household's machines, or a user's personal stuff, into one gestalt system isn't a standard function built into computers, and neither is it straightforward to make good use of them once you have them collected.  If you want to connect multiple devices, you have to work with multiple vendors' proprietary and non-standard interfaces - and many, perhaps most, of these interfaces weren't meant to be controlled by another piece of software, but only by an interactive user.  That, too, is because they would need to invent that particular wheel.  They would need to secure it, and take responsibility for what happens when it leaves their warehouse and enters your home.  It's a lot of risk for little if any monetary reward.

The reason why these problems are unsolved, is that the alternative is reinventing all of computing.  That sounds extreme, but this is the argument I've laid out: software as we know it is not meant to connect multiple computers together, it is meant to be run on one single computer and nothing more.

This, ultimately, is the problem.  The big one.  It's the axle that must pivot if you want the world to change.  Run a single program that spans two or more computers - but it's not just about running it.  It's about teaching everyone who wants to program computers how to control two machines with one program.  Or three computers, or four.  Or an infinite number.

How do you simplify the process of making that program so that just anyone can do it?  So that Joe Shmoe, who has a day job and doesn't want to go to college for programming, can hack together a quick and dirty program that makes use of all the computers in his house?  What does a programming class look like, when you are teaching children to not merely twiddle with bits on a single computer, but to bring a motley assortment of machines into a room and unite them into a grand symphony, all orchestrated by a few words of text?

How do you let someone explore the infinite depths of possibility that exist within their computer?  Those depths exist, but we cannot explore them, not without reinventing wheels over and over.  We shouldn't need to re-solve problems that others have good answers to, because our amateur efforts will be… amateurish.  Any skill we lack, any problem we can't tackle ourselves, derails a project.

Project MAD is all about making the capabilities of a computer available for you to tinker with.  It is about the capabilities of a single physical machine, but also, about all the capabilities of all the things you own, thrown in a pot and stirred.  It's about having a spare computer running in a closet that takes part of the workload that would be on your desktop, or your laptop, or your phone, and you don't have to set it up specifically to do those things.

It's about taking a computer that has worked fine for years and still making use of it even when you replace it with a faster and better one.  It's about dumpster diving and collecting random pieces of hardware and making a real workhorse out of them.  It's about buying an under-powered laptop and not caring, because as long as it's in a network with other machines, it will be just as powerful as any other.  It's about finding creative uses for old TVs and monitors, beyond being picture frames.

And from the corporate perspective?  It is, genuinely, about creating new products, products that people want, and it's about being able to just turn the crank and trust the process.  It's about creating an operating system image that you barely touch when you ship a product.  It's about shipping a hardware product and not meddling with software at all.  Imagine, not worrying about completely unrelated software that is only necessary for your device because you don't know how to let consumers control your hardware except with complicated software stacks running on top of full operating system images.

It's also, on some level, about going back to computer hardware being purely a commodity, able to be produced cheaply, and not needing to be shiny, chrome-plated best-in-class specs to be useful.  It means that when our need for computing power increases, we don't need to cram more into a single chip, with all the heat issues and power consumption that come as a result.  It means that the top of the line chip you buy this year will work together with the chip you bought two years ago, and the chip you will buy two years from now, instead of them all ending up in the garbage.

At the same time, it's about creating hardware that really is genuinely unique.  About creating new processors that are optimized for a specific task or problem, without needing to invent a new OS to run on them.  It's about creating hardware VPN devices that will let you connect securely to your home system while traveling, or your work system from home, just by plugging them into an otherwise unsecure system.  It's about creating software that is secure because the hardware it runs on can't be overwritten or even looked at by others in the network.

But it's also about taking your computing environment with you.  If you can expand one computer by connecting it to another, then have your phone really, genuinely be your computer.  Even if you aren't plugged in, massive processor-heavy tasks can be run on a machine that is plugged in, and all your phone has to power is the wifi.  But it's more than just power.  The system you sit in front of being the same as the system in your pocket, means that some important things aren't locked in one place or the other.  Your calendar and messages, phone calls and podcasts.  Your music and movies, your games and documents.  You shouldn't need someone else's permission to have things you bought accessible anywhere.

All of this - all of it.  Every last bit of this comes down to the idea that any program, written by just anyone, can unite multiple hardware systems, and do it as easily as we can write any other program today.  When you can do that, your system gets more powerful because other systems are available.  Your system gets more flexible when you have more options.  You can have a definitive “core” of your system that expands to fill any container you put it in.  Vendors can compete to be a thing worth adding to your system.

But most of all, once it's trivial to unite systems, we are no longer trapped within one.  In a world with the internet, it feels weird to suggest that we are trapped.  My mother bought a laptop just a few years ago that is under-powered and can't be upgraded, and all she really knew was that it was “slow”.  She hardly needs much from it - she writes, mostly - but it was still inadequate.  Someone sold a computer with so little ability that a writer feels limited using it, and there's simply no way to make it more powerful by adding to it.  You can't add memory, you can't even add disk space - not counting SD cards, which don't help when Windows is complaining it doesn't have enough disk space for OS updates.

If she could just use that laptop as a dumb terminal for a session on a more powerful computer, she would be happy.  But she's not tech-savvy; me, I am, but I still struggle to combine my home server, main PC, assorted Raspberry Pis, and multiple laptops in any meaningful way.  I run some applications on the server as web apps (including the one I'm using to write this, before posting), and I store files on the server so that I can use them anywhere, but all those solutions are …weird.  If I tried to explain any of them to my mother, or frankly anyone else in my family, they'd dismiss the possibility.  Too much effort for a result that can't just be used.  Even I have to work around the limitations, get frustrated by the nuances and complications.  Even I chafe at making sure all my operating systems are up to date, especially when it's all redundant.  Even I, a tech enthusiast through all of my life, would like to have my entire tech world be part of a single, simple system.

It's one thing to talk about the pie-in-the-sky dreams and hopes I have for Project MAD as being features.  I am reluctant to promise features because it feels like I'm trying to plate the project with chrome and gold, garnished with diamonds to give it that extra sparkle.  Clearly, from this post, I believe that the potential exists to do great things with the Project.

But it's more correct to say that I am constantly, constantly disappointed with computers.  I am disappointed with Windows.  I am disappointed with Linux.  I am disappointed with Android.  And while I haven't used Macs and iStuff in a long time, I don't think I'd feel much different about them, because it's not about the vendor.  We simply don't have solutions to some problems, currently.  We can make the best of what we have, but it's not the same as having some very real, very frustrating problems just be… solved.

Maybe Project MAD can't do that.  Maybe I can't.  Maybe all of my philosophizing and promises will turn up empty.  Who knows, maybe I'll get in a car wreck tomorrow and since there's nobody else on this project but me, it will all vanish into the aether and be forgotten.

But the problems are real.  The fact that we are working off of a flawed model is real.  MAD may not be The Answer, but it is an arrow pointing in the direction we need to go.  If it's never more than that, but if it succeeds in being that, then I'll be happy.  If I can never make the OS, if I can never make computers that snap together like legos, if my model for distributed programming is discarded and never looked at again… but if, if all of my efforts point the way to a solution later on, then that's fine.

Part of calling it Project MAD is that I'll understand if the MAD solution isn't the one everyone embraces.  But I know that there is something here.  I just… don't know how to convince anyone else.

Monday, August 11, 2025

IP: The Worst RPC Stack

So I was originally going to write on another topic, and, you know, I ended up going off on a side rant about something I otherwise wouldn't have really thought about, and I've come to realize that it deserves to be called out on its own.  That topic is: the IP-Port-Protocol stack that we use to communicate with servers is really just about the worst form of remote procedure call that we could have, isn't it?

Now, when I say that, keep in mind I'm an author at heart - I like to make inflammatory statements to get people riled up, especially when the statement is true at heart but knowingly wrong in its particulars.  You can always imagine, or find, worse ways to call a remote procedure, than this.  But I say it this way because most people wouldn't really think of IP:Port addressing as being a remote procedure call at all.  It is how you consume a service, though, which ultimately means that you are calling one well-defined (well, maybe) function from one program in your own program, frequently over a network.

Just on the merits of success, clearly the IP stack has done a lot for us.  But we already expect more from it than it was designed for, and that's before I start talking about the many and varied requirements that I am coming to expect from an RPC mechanism, after exploring Project MAD.

The IP Suite: A Partisan Overview

So when I say “the IP-Port-Protocol stack”, it's not necessarily clear exactly what I mean.  The Internet Protocol suite, which is the basis of literally the entire internet as we know it today, can be understood in a lot of ways and at various levels of depth, but I'd like to break the topic up into two pieces: addressing and communication.

The part that I'll summarize as Addressing contains numerous mechanisms, some of which are barely relevant to the point I'd like to make, but broadly, they let you translate a name into a series of numbers (which can be summarized as a single number, if you prefer), and then all of the hardware in between you and the destination knows what to do to connect you to that number, and to connect others to your number.  There is a service, called DNS, which translates friendly website names into an IP number, which is the official canonical number you use to reach that website.  Depending on how complex the service you're trying to reach is, that IP address may actually send you to a service that distributes packets and queries to a number of different handlers, to ensure that nothing gets overwhelmed or to let you use a service more local to yourself instead of a single global server.  (And I may be getting parts of this a little wrong, because it's complicated)
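
The name-to-number step is one standard-library call away in most languages; in Python, for instance:

    import socket

    # Which number you get back can vary by region and by the minute.
    print(socket.gethostbyname("example.com"))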

But it's interesting to talk about IP addresses in a local context, because for many home users, IP addresses aren't reserved for a given user or machine; they're temporary.  And even when they aren't temporary, they're mostly arbitrary, meaning that unless someone tells you what an address is supposed to be, you have no context from the number itself.  But it's inconvenient, and therefore rare, for local users to set up tools like DNS for their home network, so that you are dealing with descriptive names rather than numbers.

But the inconvenience goes a bit further: all network packets in an IP-switched network are targeted not only at a machine, but at a specific port on that machine.  This port is simply another number, and although there are standards suggesting that several well-known ports should be used for specific protocols and applications, nothing actually stops you from using the port normally reserved for websites to do almost anything else you want.  Oh, it'll be inconvenient and confusing if someone tries to access that server from a webpage, but the system itself doesn't care.  This problem cuts both ways: You can use (almost) any port for (almost) anything you want, and you can't tell what a port is actually being used for just by the number in use.  This becomes an even bigger problem when we start trying to manage complex data spaces with IP address hierarchies, but I'll get back to that in a minute.

The second half of my description for the IP suite is “communication.”  Once you've established a data channel to a remote server, you then need both computers to speak the same language across that channel, and this is known as a protocol.  The Internet Protocol itself, of course, is one such language, one that is used to get your message to the right place; but once your message is in the right place, two pieces of software need to agree on how you make requests and how results are returned.  There are several existing protocols, and the most used ones are well-known, but it can still leave you feeling ignorant of how your own machine works.  If you know the IP number and port of a service, but not the right protocol to use, you can do nothing with it - but, as I said, the IP number and port are in some sense arbitrary.  From a certain perspective, the internet is full of billions of holes, and it is your job to cup your hands and yell code phrases into exactly the right hole.  Yell the wrong code phrase or pick the wrong hole, and nothing (or nothing useful) happens.  Pick the right hole, and you'll hear a code phrase yelled back at you from the other end of a pipe.

The end result of all of this, functionally, is that if you have the right combination of IP number, port number, and protocol, you can call one function on a remote machine.  It is, in other words, the worst possible remote procedure call stack.

If that doesn't sound right to you, it's probably because many servers have come to expose a lot of functionality on a single port.  That's all down to the parameters you pass the service when you make a request; the exposed function that you're calling is itself a directory of other functions.  Perhaps, one might argue, that means that you are really exposing far more than a single function, but that argument is flawed: the dispatch to all those other functions happens inside the application, invented anew by its programmers, not as a service of the communication stack itself.  As far as the stack is concerned, you are still calling exactly one function.

In fact, generally speaking, it is the job of anything that wants to provide services to invent a new remote procedure call mechanism.  It is your task to take a data block as input and decide what function to call, with what arguments, and if data is returned, what to do with that return value.  It is one of many places in modern computing where the solution to a problem is to force programmers to find their own solutions to that problem.  One significant consequence: every programmer is forced to find their own ways to parse, validate, and verify incoming method parameters from that data block, which provides a massive and porous surface where programmer mistakes can become exploitable program vulnerabilities - and frequently, those vulnerabilities affect general purpose services that are capable of doing a lot of things.
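To make that concrete, here is a minimal sketch - in Python, with every name invented for illustration - of the kind of hand-rolled dispatch that ends up inside every one of these services.  Every step is a private convention, and every line of parsing is a chance to get something wrong:

    import json

    def read_sensor(name: str) -> float:
        return 42.0  # stand-in for real hardware access

    def handle_request(raw: bytes) -> bytes:
        # Step 1: invent a wire format.  (Here, JSON; your neighbor picked XML.)
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            return b'{"error": "bad request"}'
        # Step 2: invent a dispatch convention.
        if msg.get("op") == "get_temp":
            # Step 3: invent validation, and hope you thought of everything.
            sensor = msg.get("sensor")
            if not isinstance(sensor, str):
                return b'{"error": "bad sensor"}'
            return json.dumps({"result": read_sensor(sensor)}).encode()
        return b'{"error": "unknown op"}'

    # Three steps, reinvented differently by every server, forever.
    print(handle_request(b'{"op": "get_temp", "sensor": "cpu"}'))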

If you want to build a system on top of remote procedure calls, the first thing you must do is not require everyone to reinvent remote procedure calls with every program they write.  It is in this context that I want to talk about how the MOS/ADA subsystems think about remote procedure calls.

RPCs Require Better Communication

The Agentic Distributed Applications model, and the Modular OS concept built on top of it, both talk about listing what remote procedure calls are available.  The list of procedure calls is used for three purposes: one, it tells remote callers what is available, two, it is used to verify that incoming requests are valid, and three, it is used to direct the final result to the function that will be handling the request.  It's worth noting that the IP suite in fact does not tell people what functions are available to be called on a server, and the IP suite itself doesn't verify or validate any incoming requests; if there is a service which can receive the request, that service handles it.

But the MOS/ADA RPC stack is expected to do more than that.  Ultimately, it is the function of the RPC mechanism and not of the individual service's back end, to convert packed data into arguments for the function to be called, and to convert the function's return value and/or error condition into a packed data type for the return trip.  This is known as not requiring programmers to reinvent remote procedure call mechanisms, or if you prefer shorthand, it is known as an RPC mechanism.

In order to do that, the system must have an awareness of data types, which is why data types are an important part of the system directory.  Not only must you be able to convert incoming arguments to the appropriate data type, but you must be able to verify that the incoming data type is an acceptable match for the type expected by the exposed function.  And by having those types be explicitly listed, the calling procedure can do certain checks before sending the request.

(I suppose you could expect every function to take an undefined, fully variable data type as its sole parameter, do no type checking ahead of time, and expect the end programmer to dedicate the first lines of every function to finding out whether or not they were passed the correct arguments, the way for example C programs do with their command line arguments… or you could compartmentalize those checks by having them done in an explicit precondition function, which may be an overridable hook, allowing exported functions to contain only their own logic.  One of those sounds like requiring programmers to reinvent remote procedure calls to me, while the other sounds like having an RPC mechanism.)
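For a feel of the difference, here is a toy sketch in Python - the decorator, registry, and function are all invented stand-ins, not real MOS/ADA interfaces - where the listing, the validation, and the dispatch all belong to the mechanism rather than to each application:

    import inspect

    REGISTRY = {}  # name -> (function, signature): the 'directory' in miniature

    def export(fn):
        # Publishing a function also publishes its declared parameter types.
        REGISTRY[fn.__name__] = (fn, inspect.signature(fn))
        return fn

    @export
    def set_volume(channel: str, level: int) -> int:
        return max(0, min(100, level))  # only this function's own logic lives here

    def call(name, **kwargs):
        if name not in REGISTRY:
            raise LookupError(f"no such exported function: {name}")  # purpose one: the listing
        fn, sig = REGISTRY[name]
        bound = sig.bind(**kwargs)  # purpose two: the mechanism validates the request
        for pname, value in bound.arguments.items():
            expected = sig.parameters[pname].annotation
            if expected is not inspect.Parameter.empty and not isinstance(value, expected):
                raise TypeError(f"{pname} must be {expected.__name__}")
        return fn(*bound.args, **bound.kwargs)  # purpose three: dispatch to the handler

    print(list(REGISTRY))  # remote callers can ask what is available
    print(call("set_volume", channel="main", level=75))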

Already, we can see incredibly clear distinctions between what I am calling RPC mechanisms, and raw server sockets that get piped straight into application code.  There is a language to RPC calls, and if you want to understand that language, it can be explicitly queried, finding out exactly what the protocol expects and how to format your requests.  Rather than shouting memorized code phrases at one specifically chosen hole among billions in a wall, there is a piece of paper taped on the wall telling you what to say into which pipe to receive what reply.

Of course, someone developing their own RPC mechanism to respond to queries on a raw server socket can do all this… and they can do it separately for every single application they create, because it isn't a defined mechanism, it is a private standard that they have decided on.  It makes far more sense for this mechanism to be standardized and this directory to be machine generated, because most if not all of the information is already there when you compile your program (or run your script, if the language used doesn't get compiled).  As long as you agree on how to translate the information your program already has to contain - its function names, parameters, and types - into this directory listing, it should be relatively simple to implement.

Relatively simple, of course, does not mean easy, or even simple.  There are a lot of questions about types that need definitive answers and canonical standards.  Today, before any of those questions are answered and those standards canonized, it would be relatively difficult and complex.  To put it another way, Project MAD has always, from the very beginning, been about doing the hard work ahead of time so that other people can simply walk in and do the fun parts.  This is known in some circles as creating an operating system, though perhaps not everyone would agree.

I would argue that once you have all of those standards, you can do everything that the IP-Port-Protocol stack can do, but with names instead of numbers so that you can read what you're doing, and with documentation that tells you what's available and how to use it.  A protocol, in other words, that is not only machine readable, but human readable.  One that not only allows communication at runtime, between processes that already speak the same language, but allows the programmer who created a program to communicate with users and other programmers in the future, making explicit their assumptions, needs, and intentions.

The cost in exchange, of course, is complexity and speed.  As with many aspects of Project MAD, that slowdown might be a real problem, but equally, the speed we have now comes with fragility and ignorance.  We have built a lot of very important systems on top of tools that only barely allow us to communicate; we've discovered that we don't need much structure in order to make something like the Internet work.  But as we have wanted to create more powerful and flexible tools, we've chafed at the restrictions.  Better communication is needed.

A Word or Two on Addressing

So I promised to come back to a bit about the Addressing side of the IP-Port-Protocol stack.  As I said at the beginning, this entire topic originated in a blog post about other things, and specifically, I was trying to list all of the problems that Project MAD tries to address with its solutions.  One of those is the confusing morass that is IP addressing within application containers and private networks.

For those not aware, application containers are used to explicitly contain a piece of software and its dependencies, so that if two applications each require different versions of the same library or tool in order to function, they will not accidentally select the other app's incompatible version in place of their own.  They likewise won't share some system configuration bits, which is useful if two apps each require the same embedded tool but configure them differently.  There is another conversation to be had, here, about configuration, but we'll get back to it in a bit.

Suppose for the moment you had a home server farm, with several hardware servers, each running hypervisors that split the system into multiple virtual servers, and each virtual server used IP-based application containers, with some of these containers incorporating their own private IP networks containing multiple internal containers.  It's fair to say that there would be a lot of IP addresses being used to direct traffic on this distributed system: each hardware server's network adapters have their own, each virtual server's network adapters have their own, each container and sub-container has its own.

Part of the very real problem with this setup is that all of these IP addresses are arbitrary, and due to the way their scopes work, it's entirely plausible that if you leave things to automated processes, random containers on different virtual servers may have the same internal IP address.  They shouldn't be visible to each other unless you've done something wrong, and perhaps no part of the distributed system as a whole may malfunction, but as an administrator, programmer, debugger, or power user of the application containers, it is frustrating that what is supposed to be a unique identifier is neither unique nor identifying.  An IP address alone tells you neither what is at that address, nor whether it is used exactly once, even within your own, privately operated server room.  And changing those internal IPs so that everything is unique may not be as simple as an amateur would like, especially when those unique IDs are only useful for reading logs and chasing down problems in a complex and distributed application.

Part of what I hoped to guarantee with the MOS System Directory is that across an entire, theoretically infinite distributed system, it is still plausible to uniquely identify every resource, by having explicit nested scopes.  Because systems list their internals, even privately, as long as you can distinguish peers from each other within their own context, you can create a correct, unique, descriptive identifier for every resource - meaning that at every level of context, you can uniquely identify all children of your peers (even ones you have no access to), and if you have permission to explore the peerages going up the stack and outwards as well as down, you can discover your system's parent's long-lost uncle's step-son's cousin's application container's database, and the fully qualified unique identifier for it and for you will tell you exactly what your relationship is and isn't.

At the same time, because the unique identifier is based on nested scopes, you don't need a fully qualified unique ID to describe anything, not unless you share no relationship except having the same system root.  In fact, if you know the scope you're targeting, you can convert a local directory reference into an absolute one, and vice versa.  It's still wise, of course, to use the fully qualified ID once you start going up in scope, in case there are subtleties you are missing, but it's not necessarily technically required.  Once any two resources share a scope, that scope is the highest you need to go in order for the two to identify each other and communicate across the network.  Even if there is a faster way to communicate, the two fully qualified IDs demonstrate the maximum number of scope changes required for one to reach the other.
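As a toy illustration of that property - with made-up path syntax and names, nothing canonical - consider how two resources with identical local names remain distinct, and how their fully qualified IDs reveal the scope they share:

    def to_absolute(scope: str, ref: str) -> str:
        # Resolve a local directory reference against the scope that issued it.
        if ref.startswith("/"):  # already fully qualified
            return ref
        return scope.rstrip("/") + "/" + ref

    def shared_scope(a: str, b: str) -> str:
        # The deepest scope two fully qualified IDs have in common: the
        # highest point either must climb to in order to reach the other.
        common = []
        for pa, pb in zip(a.split("/"), b.split("/")):
            if pa != pb:
                break
            common.append(pa)
        return "/".join(common) or "/"

    me  = to_absolute("/HomeSystem/ServerA/ContainerFoo", "db")
    you = to_absolute("/HomeSystem/ServerB/ContainerFoo", "db")
    print(me, you)                # same local name, never confused
    print(shared_scope(me, you))  # /HomeSystem: where the two can meet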

It may be somewhat crude for me to say that I believe an addressing protocol should allow you to give, get, and understand the address of a given thing.  It may be a bit rude to suggest that the IP protocol's arbitrary identifiers make it difficult to feel like you are in charge of the layout of your system.  It is unquestionably arrogant of me to suggest that I could do better than the technology that united and changed the world.  But what I don't think is arrogant, rude, or crude, is pointing out the flaws in any technology, especially where those flaws have consequences.  Any system built on my ideals will have its own flaws, and I have to trust that some person smarter than me will someday figure out a way to address them... no pun intended.

And IP has its flaws.  All of this is without getting into the limitations of the still-ubiquitous IPv4, which hasn't had enough addresses in a very long time and has felt for a while like it should just get replaced by its successor, IPv6, to prevent some very real, and some other mostly theoretical problems.  Frankly, that's not my problem with it, not as a power user of home networks, nor as the architect of Project MAD.

My problem is that the IP stack as a whole was a very early (starting in the 1970s) way to do remote procedure calls, and all of computing has advanced and changed since then.  Even if you don't like my solutions, look at the general problems we are facing.  We want to combine systems together, tightly, in ways the designers of the internet protocol did not foresee at the time.  If you don't like my formulation of the general case problem that needs to be solved, find a better one.  But ever since I've had this general case problem that needed solving, my mind has been seeking out solutions that fit it.  You can build a distributed system on top of IP using yet another layer of abstraction, and maybe that'll be necessary.  But for addressing parts of a system, there are better ways than depending on arbitrary and not-guaranteed-unique numeric identifiers.

One Last Topic: Distributed Configuration

So I just barely touched on the topic of distributed configuration when talking about application containers, and it's an interesting side topic, one that deserves its own full blog post later, but I believe I've teased that before, and so it's better to have a brief discussion about it, in case it keeps slipping my mind in future posts.

Configuration in a distributed system is tricky, because at any given time, an application is under multiple, potentially conflicting policy and rule domains.  The user running the application may have rules and policies.  The hardware device that the application is running on may have certain specific configuration rules necessary to make it work.  The distributed system may have an administrator that is enforcing certain rules and policies.  The application itself may have multiple default policy sets, for example, one for power users and one for non-technical ones.  And an application may be embedded in another application, and the parent application may have specific rules that it needs the embedded application to follow.

But also, general rules and policies are not the only items of interest.  Anyone who has any claim over an application, such as the hardware, user, administrator, or parent application, may call out that specific application and make an explicit configuration change to it, in order to ensure proper functioning.  A user may wish to start an instance of the application with a special configuration, whether that's part of a script, or a deliberate choice made by an interactive user in a shell.

It is my intention that the Modular OS, and the Agentic Distributed Applications model specifically, have an explicit mechanism for gathering and resolving all configurations present on the system.  The resolution step may be accomplished by a callback hook - that is, an application can choose to apply configuration changes in a specific way, or even choose to misunderstand or ignore them - but there needs to be a mechanism that presents the application with a list of all configurations, and there ought to be a standard way to resolve any disputes.

Although I'm not quite happy with it yet, the working name for this mechanism is the Standard Environment Variable Evaluator, or STEVE.  STEVE is an important part of the ADA model, and arguably fundamentally necessary for ADA to work properly, just as STEVE makes little sense outside of a distributed context.

This topic, about how best to handle distributed configuration, is one that will be determined, in the end, by people who know the topic better than I.  No matter what I try to say about it now, it is a topic that will get revisited and revised once good intentions meet practical concerns, and as such, feel free to take much of what I say below with a very large grain of salt.  But naively, I believe that there is a hierarchy of default and explicit desired configuration states.  The least important configuration, for example, is the program default built into the code, which is used only if nothing else is configured.  On the other hand, an explicit decision made by the user when starting the program is the highest priority among its peers in the configuration hierarchy.

Beyond that, though, all defaults yield to all explicit configurations, and both defaults and explicit configurations are ordered by how specific the configuration is.  Configurations are less specific if, for example, they refer to an application category broadly, like “Start all windows maximized.”  More specific would be a configuration referencing a specific application in general, then a specific application when run by a particular user, and then a specific instance of an application when run by that user, such as when the application is passed a configuration parameter when it is launched.

But there is a major exception to this rule.  One of the things that came out of Project MAD is that, in a distributed system, every component has the ability and authority to refuse service--indeed, it's frankly the only control you have in a distributed system--and as such, everything that has dominion over an application, such as the hardware it's operating on, the user, or an administrator, can kill a misbehaving application if it does not follow policy.  Consequently, when we talk about configuration, there is a distinction between desires and rules.  A program caught not following the rules may be killed, but a program that merely disrespects the desires of others will not be.

In the hierarchy of configuration options, then, any applicable rule should take precedence over a stated desire.  If, say, the Administrator refuses to allow full-screen applications, and the user explicitly tries to start an application in full-screen mode, your options when parsing the configuration are clear: do you allow the user to do something that may attract the administrator's ire, possibly causing the application to be forcefully quit, or do you prevent the user from doing what they have explicitly said they want to do?
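Naively, then, the hierarchy might resolve like this sketch - every field name here is invented, and a real resolver would surely be richer:

    # Rules beat desires; explicit beats default; within each band, more
    # specific beats less specific (0 = category-wide, 3 = this instance).
    def resolve(option, configs):
        candidates = [c for c in configs if c["option"] == option]
        def priority(c):
            return (c["is_rule"], c["is_explicit"], c["specificity"])
        return max(candidates, key=priority) if candidates else None

    configs = [
        # The user asked for full-screen at launch (explicit, most specific)...
        {"option": "fullscreen", "value": True,  "is_rule": False, "is_explicit": True,  "specificity": 3},
        # ...but the administrator's rule forbids it...
        {"option": "fullscreen", "value": False, "is_rule": True,  "is_explicit": True,  "specificity": 1},
        # ...and the program default never stood a chance.
        {"option": "fullscreen", "value": False, "is_rule": False, "is_explicit": False, "specificity": 0},
    ]
    print(resolve("fullscreen", configs))  # the rule wins; the admin's ire is avoided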

Of course, it's even more complicated than that, because we're talking about a distributed system, which may have multiple competing sources for various software and hardware services.  Suppose for example you have your private phone connected to a desktop machine, in order to use the desktop's monitor, CPU, GPU, and input devices to play games.  The system administrator for that desktop machine forbids full-screen applications, but your phone is not subject to that system administrator; it is completely under your control.  Your phone, however, may be configured to not display desktop-style applications on its own screen, and for good reason - it may be too small, too low-resolution, and in the wrong orientation, and the processors may be lackluster and prone to overheating.  In that context, if you want to launch a full-screen application, do you try to launch it on the desktop screen where that isn't allowed, on the phone screen that it isn't suited to, or neither?

My thoughts on this matter are incomplete, but there is one thing that I am certain of: it should not be down to a programmer to find and read configuration options across a distributed system, nor to determine what the nominal hierarchy is.  An operating system for a distributed system should find and request all relevant configuration options and present them to the application for parsing.  This needs to be done before a distributed application selects what hardware it is going to use, because the rules and policies governing the hardware will be used to determine which hardware is selected.

A bonus that comes from the system handling these matters is that several lists of configurations can be made explicit: the total configuration on a system with sources, the total configuration as applied to a specific application, and depending on the configuration mechanism, you may get a list of what configuration options the application itself requests, including requests from all internal libraries and embedded utilities, which tells you what configuration changes you can make to influence its behavior.  You can see a list of what configuration changes a user has made, and which of them have and have not actually been queried by applications in the last day, week, month, or year--indicating that they may be useless, malformed, or out of date.

But likewise, a system-wide configuration parser can validate a list of configuration options, providing a list of warnings and errors that indicate some options are malformed, out of date, or nonexistent.  It can simulate how configuration would be applied if you used it on a piece of hardware, on a system you're connected to.  It can tell you how Administrator-applied policies have affected you and your programs, and how your own configurations have affected you.  It can tell you when a configuration option made explicitly simply matches the default.  It can tell you when rules and configuration options have changed, in ways that affect you.

These kinds of features are only possible in a system where handling configuration is a standard process.  As with the other mechanisms I've described in this blog post, the current solution is “Bring your own,” or in other words, there is no configuration mechanism.  Even centralized systems like the Windows Registry and Windows Active Directory don't give you all the information you could possibly want, and too many applications have completely internal and private configuration files that will never be explained to anyone, meaning the applications basically cannot be configured even though they were designed to be.

These are all systems that solved specific cases of the problem when they needed to, but nobody tried to solve the problem in the general case.  Perhaps if they had tried, the general solution would have worn out its welcome wherever and whenever application developers wanted something different.  I believe that it is laudable to at least examine the problem's general case and search for solutions - but more than that, in the case of a distributed system, I think it will be necessary.  Not having a standard solution to this problem will put undue burden on application developers.

Wrapping Up

I'll consider it fair if I get feedback on a lot of the points I've raised in this blog, not merely this post, as being incorrect on a technical level.  Others know more than me about a lot of the things I've discussed.  But the point of view of this blog, and the reason why it's worth writing, is that I see challenges ahead that we're not ready for.  Building a system on top of remote procedure calls requires a lot, and our current model of clients and servers is a poor excuse for that.  Arguably, the client-server model was never meant to stand in for general RPC mechanisms - but in the modern day, it's been a hammer tasked with far more than driving in and pulling out nails.

I feel like the various web services and web APIs prove this point as well as anything.  Many of these are utilizing the tools meant for web pages as middleware to enable general remote procedure calls over the internet, and as such, they are awkward, insecure, and suffer from a lot of problems whose primary source is programmers needing to reinvent various wheels.  Tasks that should be mechanical underpinnings of a system are being hacked together.  And that's before talking about using webpages as a display in non-web contexts because display standards across operating systems are massively inconsistent, or about webpages being the best answer we have for a remote server presenting a local interface to the user.

The tools are already stretched.  Utilizing webpages on nonstandard ports is inconvenient, even if it's the best way to get multiple separate server applications to cooperate, rather than integrating them into a single app.  Trying to understand the flow of data in a container stack is a headache, but it's still easier than trying to manage dependencies on a system not built to properly handle dependencies.  Trying to understand what is going wrong when a client and server communicate over a network can be maddening, but it's the best way we have to make use of another machine's computing resources in parallel with your own machine's.

If you want to merge machines and utilize one's capabilities from another, you need a better mechanism.  If you don't like mine, by all means, show me up.  Do better.  I'd love to hear about it.

Tuesday, August 5, 2025

MOS: The Unified System API

 So I've said for a bit now that under the ADA distributed applications model and the Modular OS paradigm, applications post a bunch of APIs to a central directory managed by ADA servers and by the OS.  Naturally it's important to talk about what that means.  And while I will try to sound like this is all well-thought-out by me, this remains one of the many aspects of the project where I will cheerfully bow to people who know more about API design - and the many other related, difficult tasks involved - than I do.

To be clear: This discussion is categorized under the Modular OS portion of the project, and to a certain extent, is independent of the ADA model.  That means that the ADA can exist without this, and this can exist without the ADA.  But if you want to understand the whole project as it exists in my brain, it may be important to talk about these frustrating implementation details.

Types and MOS Standard Interfaces

Before I talk too much about the directory, I'd like to get a little bit of groundwork out of the way, and that is to ask: assuming you have a massive distributed system whose capabilities are listed in a directory, what sort of things does the directory itself contain?  Perhaps the easiest thing to imagine is that the directory lists nothing more than addresses where you can find system functions--it is a massive list of strings matched to addresses.  If you want to invoke a particular method, in other words, you only need to send that request to a certain address, along with all necessary parameters, all formatted appropriately.

This gets complicated when you want to do more than simple projects, for a number of reasons.

One very important topic, for example, is type safety: sooner or later, you are going to be returned (or sent, as a parameter) a data object from a remote procedure call, and that mere fact is more of a problem than it may seem.  Attaching methods to types, for instance, is something that is done when you create your program, not when you run it; object methods are a part of your program's code, not a part of the object per se.  Nor would you necessarily want every object that is passed back and forth between application Agents to carry its own independent copy of the object's methods around, even if that would be the most portable option; that would be data-intensive and prone to security problems, as each data object at rest or in transit would be another chance for malicious actors to alter your code and inject some nefarious payload.

But at the same time, the idea that object methods are compiled into your programs creates significant issues in a distributed system.  It becomes certain that eventually, due to a version difference, old and new code will view the same data stream differently, and running old methods on new objects will create an incorrect result.  Sure, this won't happen within a single ADA application, assuming all affected Agents are recompiled every time the master program's object definitions are changed.  But whenever and wherever you are depending on third party applications and libraries, there is the possibility that a breaking change in the definition of an object means that new and old code won't interpret the same data object the same way.

In order to address this issue, it has been my belief for some years that the system directory will also contain a list of canonical types, and if an Application fetches a canonical type from this list, it fetches the up-to-date code for that type's methods, to be used in place of compiled-in object methods.  That means that if two applications developed years apart both reference the same canonical type, they will use the same method code to interface with the data object per se.
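In sketch form - the directory structure and the type here are pure invention - the idea is simply that the method code is fetched from the listing rather than baked into the binary:

    # Toy 'type directory': methods live in the directory, not in your program.
    TYPE_DIRECTORY = {
        "/API/Color": {
            "version": 7,  # v7 fixed a rounding bug; every fetcher gets the fix
            "methods": {
                "to_grey": lambda d: round(0.299 * d["r"] + 0.587 * d["g"] + 0.114 * d["b"]),
            },
        },
    }

    def fetch_type(name):
        # An application compiled years ago and one compiled today both
        # call this, and both receive the same, current method code.
        return TYPE_DIRECTORY[name]["methods"]

    methods = fetch_type("/API/Color")
    print(methods["to_grey"]({"r": 255, "g": 128, "b": 0}))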

Of course, that's not a panacea.  There are frequently changes to APIs that change how methods work, their assumptions and guarantees; in that kind of case, the old code would expect new methods to provide old-style results instead of what they actually will produce.  Left alone, this continues to be a case where old and new code collide and produce erroneous results.  And that, of course, is why I never planned to leave things there.

In fact, there are a few different problems with having exactly one list of canonical types.  One is the versioning problem above, but there is also a vendor problem.  Suppose that you have two competing vendors each providing a backend that, for instance, implements a GUI.  In order for the GUI to work in general, both of these vendors must implement a shared, standard API, but it's also reasonable to expect that each of these vendors will have advanced features that require another, more expressive API.  When you receive a data object representing these vendor-specific APIs, they may contain more information than the basic standard, and as such, you will need to use the methods from that vendor to ensure that the extra data is properly handled, even if you are only using methods and functions from the standard-compliant API.  And, naturally, these competing vendors' API extensions are unlikely to be compatible with each other.

This kind of problem is already addressed in some advanced programming languages, such as Python or C#.  Because certain type information is embedded in the objects, if one type inherits an interface, you can treat the object in code as an “object implementing this interface”, no matter what its actual type is, and that solves our problem above.  However, to my (amateur) knowledge, these languages do not attempt to provide a directory sorted by type, and therefore, there is a problem here that they largely leave unsolved, one which the MOS project must address to accomplish my goals.

Specifically, the modular OS requires that Applications are able to specify an interface API and receive from the OS or ADA server, a list of providers of that API.  Suppose for example those two competing GUI vendors exist on the same system, and suppose that the application being run isn't targeting either vendor's specific API, but rather, targets the core OS standard.  When the application is first run on this system, the application (and by extension the user running it) needs to be faced with a question: which API provider do you wish to use to implement your GUI?  And in order to be faced with this question, the listing of providers for the GUI should include both vendors, even if they are each providing vendor-specific implementations of the underlying API interface.  When you talk about listing canonical object types, then, these two competing vendors both implement the canonical GUI API interface, in addition to providing a specific vendor-specific version of that interface, and so both must be listed when you ask for a basic GUI API object.

If this sounds trite, it's probably because you haven't yet understood that I am an idealist, and arguably a perfectionist.  It may sound trivial to list various vendors, and various versions, of API interface providers, as child nodes under the core standard, within the directory.  This would let you not only use competing vendors to implement the same interface, but would let old programs continue to use old interface versions (in case something has changed) while newer programs can use the modern interface.

But wait, that bit about versions contradicts what I said earlier, doesn't it?  You want new and old programs alike using the same codebase, not different ones, don't you?

There is a distinction here: I said that old interface versions, not old code, would be listed in the directory.  That means, for instance, that when a library is forced to change how a function works, in order to fix a security vulnerability, it is expected that they will provide shims that translate calls aimed at the older API to calls compatible with how their code now works.  These shims use the assumptions and requirements of the older API version, but may error out if the program does something unsafe.  This is still not a true cure-all, because sometimes the unsafe behavior is actually part of program logic and can't be removed, but it significantly closes the gap, and allows API interfaces to evolve in form and function without necessarily breaking old code.
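A sketch of the shape of such a shim - the function names and the “vulnerability” are invented for the example:

    import os

    # The library fixed a vulnerability: copying no longer follows symlinks.
    def copy_v2(src: str, dst: str, follow_symlinks: bool = False) -> str:
        if follow_symlinks:
            raise PermissionError("following symlinks is no longer permitted")
        return f"copied {src} -> {dst}"

    # The v1 shim keeps the old signature alive.  Old calls that were always
    # safe still work; old calls that relied on the removed behavior now
    # error out, rather than silently doing something the new code forbids.
    def copy_v1(src: str, dst: str) -> str:
        if os.path.islink(src):
            raise PermissionError("v1 behavior (following symlinks) was removed")
        return copy_v2(src, dst)

    print(copy_v1("a.txt", "b.txt"))  # an old caller, running on new code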

When I say that I am an idealist, however, that extends beyond ensuring that this kind of capability exists in the system directory, into describing its form.

Directory Nodes and the Virtue of Canonical Shorthand

All of this brings us back to the question: what is actually stored in the system directory?  What do you actually get when you query the directory and get information back about a particular node?  At the very least, there needs to be some kind of type data there.  If you were to fetch a GUI API interface, for example, you would need to know what methods to expect on that object, what types are expected for parameters and return values of those methods, and what data may be contained within the object itself, along with any type data for that data if it is accessible.

But also, the node in the system directory has other specific data.  If the node represents an API type, for example, you may want a listing of versions and vendors provided for that type.  If the node represents a file on disk, you may want metadata about that file rather than its contents.  In other words, the directory node is its own object with its own data, and is simply another standard object type.  And because the directory itself is a hierarchical tree, one of the functions of this directory node object will be confirming that what comes after it in a request string is valid.  If you take an object that has no children and attempt to get a child object from it, that is invalid, and equally, if you take a node that has no alternate versions and attempt to get a different version of it, that is invalid.
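A toy node object, then, might look like the following sketch - class and field names invented - where the node itself answers whether the next step in a request string is valid:

    class DirectoryNode:
        # A node carries its own data (type, metadata, versions, children)
        # and is itself just another standard object type.
        def __init__(self, type_name, children=None, versions=None, meta=None):
            self.type_name = type_name
            self.children = children or {}
            self.versions = versions or {}
            self.meta = meta or {}

        def child(self, name):
            if name not in self.children:
                raise LookupError(f"{self.type_name} has no child {name!r}")
            return self.children[name]

        def version(self, v):
            if v not in self.versions:
                raise LookupError(f"{self.type_name} has no version {v}")
            return self.versions[v]

    gui = DirectoryNode("GUI-API",
                        versions={3: "legacy provider", 4: "current provider"},
                        children={"VendorA": DirectoryNode("VendorA-GUI")})
    print(gui.version(4))  # valid: the node lists this version
    try:
        gui.child("VendorA").version(1)  # invalid: that node has no versions
    except LookupError as e:
        print(e)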

This is required to be a function of the node, and not a centralized system function, because the system itself is distributed.  You may not know until the first time you query the directory whether your specific combination of Application ID and running user ID is permitted to perform a specific operation.  If you query an application and determine that you are allowed to, for instance, list all the files stored within it, you may be directed to another piece of physical hardware in the system that actually contains those files, and the provider there may not allow you to actually read any of those files.  Likewise, you will need and may be permitted to read a listing of methods available on that node object, but you may not be allowed to call any such method, or you may be allowed a list of child nodes under that directory node, but not be permitted to use any of them.

This task of handling directory permissions on a cascading basis is a distressingly un-optimized solution, and there may be no optimized solution, but you can see how the distributed nature all but requires it, because each independent Application server may have different access controls.

But one of the sub-topics here that I find interesting is the use of shorthand to minimize string pattern matching, especially when dealing with standard queries.  For example, if the list of API versions is always stored under the sub-directory “VERSIONS” within the directory, every time you need to request that list, you must check that an eight-character string exactly matches (or worse, loosely matches) the canonical name.  It only gets worse if you try to make that mandatory identifier string something that no Application programmer would ever use themselves; as an example, Python likes to have two underscores at the beginning and end of certain reserved words, which would take the eight-character string VERSIONS and produce a twelve-character string, __VERSIONS__, which must be exactly matched on every check.

Given that something like the Versions subdirectory is actually a mandatory, system-designated resource, and given that latency is a critical problem in distributed systems, it seems reasonable to use shorthands to minimize characters in a directory path string, and this is a topic that I actively enjoy theorizing about.

For instance, I've mandated in this post already that type information be provided very frequently.  It is attached to data, to parameters, as part of directory nodes, and it is also listed in the directory itself, meaning that a canonical type string is an ADA-compatible URI string.  Assuming that the system directory starts from a common base, you would generally expect that all type strings begin with “/API/” or similar.  And while five characters is not an offensive quantity, it's a lot to ignore repeatedly when reading a list of types.  But this goes beyond API calls; for instance, it will be very common for Applications to reference a URI that is relative to their own root within the directory, or relative to their user's directory root.  If all these calls needed, for example, “/Applications/Example/” or “/Users/GenghisKhan/” prepended to the URI, this would need to be parsed and re-parsed constantly, each time ensuring that the strings exactly matched.

Instead of starting every ADA URI from the root of the system directory, it is my expectation that the first character in the URI determines which of several possible directory roots the request starts from.  I'll use placeholders here, but for example, “%GUI” may be short for “/API/GUI”, or “@Documents” may start with the currently running user's root, accessing the subfolder Documents.  “$Cache” might be the cache directory for the currently running application, while of course, “/” remains the symbol indicating that you start at the full system root.

Likewise, once you have a directory node and are trying to perform an operation on it, it only makes sense to use shorthand for some functions, not least because it could prevent namespace collisions.  For example, you may want to use an internal method name that is in some contexts a reserved word, like Versions or API.  If you wanted to call the method “API” on your application, that string may look like “$.API()”, where if instead you wanted to get a listing of the APIs exported by your application, that request string may look like “$API/” or “$%”.  Because the “.” operator would resolve to a method name (whose canonical name might be, for instance, “$Methods/API” or similar), there is no ambiguity between the strings.  And if you are trying to find a specific version of an API, perhaps that will be found under “%GUI:4”.
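Expanding those sigils could be nearly trivial, as in this sketch - the mapping, and the roots it points to, are of course placeholders:

    # One character chooses the directory root; no long-prefix matching.
    ROOTS = {
        "%": "/API/",                   # %GUI       -> /API/GUI
        "@": "/Users/GenghisKhan/",     # @Documents -> under the user's root
        "$": "/Applications/Example/",  # $Cache     -> under the app's root
    }

    def expand(uri: str) -> str:
        if uri.startswith("/"):  # already starts at the full system root
            return uri
        root = ROOTS.get(uri[0])
        if root is None:
            raise ValueError(f"unknown root sigil in {uri!r}")
        return root + uri[1:]

    for u in ("%GUI:4", "@Documents/notes.txt", "$Cache/tile.png", "/API/GUI"):
        print(u, "->", expand(u))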

There may, of course, be a problem with readability depending on how you implement this, and I acknowledge this is a limitation, especially when dealing with the basic ASCII characters present on your average keyboard.  Too many basic symbols look like each other or like letters; $ and S, the various bars like \|/, the various dots like .:;, and so on.  Presumably if it became common to use specific Unicode or advanced characters for path operators, you would start finding them on keyboards, but one way or another, I'm sure that it's a solvable problem, if not one for which I have a genius answer prepared.

The point of these symbols is truly only to act as a bridge between three requirements: these path strings should be short, human-readable, and unambiguous.  The various operating systems that exist today have largely eschewed the idea of advancing the language of path strings, and I think that's a mistake.  Using long names for concepts that are unambiguous in a given context is not ideal for readability or for fast parsing.  If you can represent that concept by using a single reserved character, that's preferable to matching long strings.  Granted, it's only really critical here because of how often we'll be performing these parsing operations one after the other, but I still think it's important.

Honestly, if the perfectionist in me had my way, I'd also compile some path requests directly into domain-specific function calls, so that this parsing was done at compile time and not runtime, for instance, having versions of the ADA path search or call that are specific to various types as implied by the operators used on them, but I'm not going to insist on that.  Goodness knows, this concept is complex enough as it is!

Shims, Type Conversions, and CODECs

One of the interesting consequences to everything I've said above is that the filesystem coexists with type information and objects containing methods.  It is therefore plausible that when you can be assured a file on disk matches a system type, that file on disk can have methods as though it were a data block retrieved from a program.  Which is convenient, because in order to retrieve the data from that file, you need a program which will return it as a data block.

In other words, it is to be assumed here that the system directory will allow you to make the border between data-at-rest and application data very fuzzy.  If you wish to store data objects at rest on a disk, or you wish to translate a stored data object into a live version, you almost don't need to worry about it at all.  You could, for example, have an image file with the same kind of methods that an uncompressed image in memory would have.  (It might be difficult to make a clear distinction with syntax alone, between changing the at-rest version of that data, and changing a copy of the data, but I have faith it can be done.)

All of this comes from the public type directory, and specifically, the fact that these public types include the type's method code for distribution to Applications.  The way programs currently work under modern operating systems presumes that each application will figure out its own way to translate data formats into live data, which means that for certain standard formats, you can have many applications each using different third-party libraries of various quality all to accomplish the same task.  And while you can have multiple providers of the same API under MOS, this is no different from dealing with live objects; the fact that these types are public means that you can have old programs use new code, or keep multiple programs using the same code between them, as long as the providers that you are using provide API shims that the old programs expect.

But there are more uses for API shims than this.  I've said that you can use API shims to keep compatibility between new and old versions of the same software, and perhaps you read between the lines and might have realized that competing API vendors can create shims that let them work on each other's data objects, extracting and encoding information that is stored differently by the competitor.  If that seems odd, it is; I am talking about Vendor A providing a shim compatible with Vendor B, such that if you are using Vendor A as your back-end and someone requests advanced functions using Vendor B's API, Vendor A can still provide them, while presenting the requesting program an object that claims to be from Vendor B.  This only makes sense if these shims are completely compatible - but when the API, its requirements, its assumptions, and its guarantees are all public knowledge, that isn't an impossible task.  Difficult and perhaps unlikely, but not impossible.

But perhaps more interesting than that: API shims can be used to convert various objects into various other objects, by allowing one data type to masquerade as another.  This happens a fair bit when we are talking about serializing data objects to and from text, for instance, but in our case it's equally reasonable to, for instance, convert a live data format into an at-rest data format, and vice versa.  In that case, for example, a provider of an image manipulation API may provide a shim that allows you to treat JPEG images stored on disk as though they were an instance of that live data object.  Or, the type for at-rest JPEG objects might include a shim that allows you to encode any live image of the standard type into that form.  Or… it's possible that I have those two examples backwards.  That sort of thing depends on implementation and finalized design, and to be frank, there's no reason for me to care which way it goes at this stage.
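Either way around, the masquerade itself is easy to sketch - LiveImage, JpegOnDisk, and the lazy decode are all invented stand-ins:

    class LiveImage:
        # Stand-in for the standard live image type.
        def __init__(self, pixels):
            self.pixels = pixels
        def width(self):
            return len(self.pixels[0])

    class JpegOnDisk:
        # A shim letting an at-rest object answer like a live one: it
        # decodes lazily, on first use, so the caller never needs to know
        # or care which CODEC sits in between.
        def __init__(self, path):
            self.path = path
            self._live = None
        def _decoded(self):
            if self._live is None:
                self._live = LiveImage([[0, 0], [0, 0]])  # stand-in for a real JPEG decode
            return self._live
        def width(self):
            return self._decoded().width()

    def print_width(image):
        print(image.width())  # the caller cannot tell the two apart

    print_width(LiveImage([[1, 2, 3]]))
    print_width(JpegOnDisk("photo.jpg"))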

One interesting side topic in API shims I already touched on in my first post, and that is that ADA applications can target an API provided by an embedded library or application (For example, $/libraryX%XLib, that is, embedded Library X within the current application providing an API called XLib), and that embedded library or application might in fact do nothing else except serve as a translator for the API calls to another format.  This kind of case would be very useful if you wanted to ensure forwards compatibility; you are guaranteeing that your application has a specific place where you can place your own shims, or equivalently, where hackers and power users in the future can place their own shims in order to provide your application compatibility with libraries and applications that have changed since your last update to the application.  This would be most useful when applications are being abandoned; it allows third parties who do not have access to the application code to change how your program interfaces with the system years after maintenance has ceased.  If GUI standards have changed, if required libraries have security flaws, or if your preferred library vendor has also quit and now your application needs to target another, it would be trivial in all these cases to add shims.

Obviously, for some more-secure applications and for things like multiplayer games, some things need to specifically not be shimmed or exposed externally, but that can be monitored by checking that the code remains in a known state.  And because this choice to add internal shims to your application is done by the application developer themselves, it can provide as much or as little flexibility as the developer wishes.

Reaping the Benefits: Shell Scripts as Programs

So after having talked about what I want and why I think it's necessary, there's also an obvious question to be asked: is there really a significant benefit for all of this extra complexity?  And I'd like to think that the question can already be answered “yes” from the above, but there's more to say about the benefits of this advanced system directory.

One thing that I have always assumed, but perhaps not made explicit yet, is that the system directory can be navigated by an interactive text-mode command shell, and the addition of all the type system information to the system directory makes it possible for shell scripts to become nearly as powerful as programs.  I suppose I'm talking about something more akin to an interpreted language like Python's REPL loop, but with the added advantage of having data from the filesystem and from external programs just as accessible as data from your own program; in either case, the result is the same.

Most importantly, this capability exists without needing to create specific code libraries and application versions dedicated to command-line tasks.  For example, multiple people have tried to build shell- and script-compatible GUI frameworks for Linux, but they are all somewhat more awkward than doing the same tasks in a first-party library.  Under MOS, in theory, you could use exactly the same GUI functions while scripting as you would while running a compiled application - though, you would need to set up, for instance, Agents to handle your GUI memory.

Now, it's fair to frown and even balk at the idea of interactive shells having Agents, because Agents are explicitly and literally on another hardware machine, and multiple Agents will be running at the same time.  But at the same time, most Agents are not actually the core of the application, and are not producing events apropos of nothing; they are awaiting external events and providing functions to the main application.  So it makes a certain sense for the user of the Shell to simply be able to context-switch their command interpreter to be currently working from and executing commands on a different hardware node, with all of the other Agents, including what would otherwise be the application core, simply responding to events from wherever your command interpreter currently is.  It only really gets complicated if more than one Agent is the source of new events at any given time.

In this way, you could build a complicated and interwoven application line-by-line, by switching to various Agents, defining code on them, then switching away and utilizing the functions you left behind.  It would… perhaps be very slow.  But it would allow you to explore the capabilities of your system, piece by piece.

Speaking of exploration, one of the nice things about having a directory of types being a feature of your operating system is allowing programmers, power users, and shell scripters (some of whom are not properly programmers) to explore the system as it is in fact, not merely trusting to theory.  As such, I have always assumed that this interactive system mode comes with ways to translate APIs and type information into interactive text, a feature that is so obvious it barely needs to be said, but is also so exciting for power users and new programmers that it cannot be overstated.

This “shell plus system API directory” allows you to simply interact with systems and data on your system, and to simply understand how your program's data works and what data you have been returned by various functions.  Yes, you can write programs, but there are some tasks and some requests that should be commands, not programs.  If you want to, say, read every temperature sensor on your system and output them all as text, that shouldn't require someone to make a program; the list of temperature API providers is a native function of the directory, and iterating through them, reading them and translating the result to text, are all standard shell operations.  If you want to write a shell script that, for example, raises an alert every time a temperature sensor goes above a certain level, that's likely just as straightforward.  You could write that as a program and compile it, but you could also just write a single command that performs the action for you, at least during your current session.
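In a hypothetical shell that might be a single line; here is a Python sketch of the work that line would be doing, with the directory and every path in it invented for the example:

    # Toy directory: each entry is a provider of the temperature API.
    DIRECTORY = {
        "/API/Temperature/cpu0": lambda: 47.5,
        "/API/Temperature/cpu1": lambda: 49.0,
        "/API/Temperature/gpu":  lambda: 61.2,
    }

    def providers(prefix):
        # Listing the providers of an API is a native directory function.
        return {k: v for k, v in DIRECTORY.items() if k.startswith(prefix)}

    ALERT_AT = 60.0
    for path, read in providers("/API/Temperature/").items():
        celsius = read()
        flag = "  <-- ALERT" if celsius > ALERT_AT else ""
        print(f"{path}: {celsius:.1f} C{flag}")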

That is, ultimately, exactly what the premise of the Modular OS project always was.  If the system has capabilities, you should simply be able to make use of them.  If you need to read a sensor, the sensor should be providing an API that lets any user, and command interpreter, extract that data effortlessly.  If you want to create a GUI window, you should be able to issue a command that creates a GUI window using the native system capabilities, even if making good use of it is somewhat more complicated than that.  And the same with other things; if you wish to record sound, or play back sound, or convert text to sound, or convert sound to text, as long as those capabilities exist within the system, the existence of the capability should necessarily mean that you can issue a command to do that thing.

In fact my ancient notes from the first years of the MOS/DCA project have random, difficult-to-parse thoughts on exactly this topic.  I was thinking about narrowing the distinction between library functions and shell scripts; I thought, surely, if you just expanded the capabilities of your command parser, you could capture the data block being returned by a library function in a variable, and pass it as a parameter to another function.  Those thoughts were naive; I was really thinking about reducing code duplication.  The idea only began to make sense once the type directory was added to the Project, but my point is, the thought was there from nearly the beginning.  (Actually, the papers aren't dated, so it may have been from some years afterward; I no longer remember, except that it is not recent)

The Modular OS Unified System API

In this context, what exactly do I mean by the Unified API?  Well, I mean that everything, from compiled applications, to shell scripts, to remote procedure calls, everything can be on the same page, in ways that current operating systems can't be.  If you want a shell script under Linux to be able to read a JPG file and display it on screen, that's… complicated, unless you have the exact and specific tools already present on your system.  It feels like it should be possible to just do, because other programs on your computer can do all of those things natively.

But the APIs between programs and the shell are not unified under Linux, nor under any other OS.  The shell simply isn't meant to do anything and everything; it was meant to, first and foremost, load and launch other programs.  Making a shell capable of doing ever more, making it more and more flexible, that makes less sense than making individual programs to do the specific things that you want.

Except that eventually, a shell that is considered feature-complete becomes the place where you make the system do what you want, and if you can't do what you want from the shell, it suddenly stops feeling complete.  If the system can do something, but you can't simply command the system to do that thing, it feels like you should be able to, like the function should already exist.  So administrators and power users sometimes feel cheated when no program does the very specific thing they want, and they sometimes take for granted, or even treat poorly, the people who create and maintain the programs they rely on.  While using a shell, it feels like you should simply be able to do what the system can do.

Perhaps nowhere is this more obvious than with hardware.  I am something of a tinkerer; I have many Arduino-compatible devices around, and an Adafruit MicroPython-based macro keypad sits constantly to the left of my keyboard.  For many years I have followed with excitement peripherals such as keyboards with blinky lights, dials, and text displays, or side USB monitors.  It feels like making use of these peripherals should be easy, but these systems were never meant to be played with at the kind of API level I'm talking about.  Peripherals on most operating systems still mostly depend on low-level buses, kernel drivers, and programs designed very specifically to interface with one piece or brand of hardware.  Their capabilities are simply not exposed.

In contrast, suppose that a standard 16x2 text display is connected to your MOS system.  This is a very cheap and very standard kind of external peripheral, and it usually connects to your system via one of a few low-level buses, such as SPI, I2C, or serial.  On modern operating systems, if you want to make use of this display, you need a program that controls the low-level bus and sends the appropriate commands, and whenever that program, or one like it, isn't running, the display is useless.

Under MOS, I will assume that text-only displays in general have their own standard API, that small displays are potentially a subset of that, and that the text display attached to the system can be recognized for what it is.  A small driver program would still be required to translate the bus commands for the device, but once that is done, you can simply treat the text display as a text display, and any application in your entire system that expects a text display can find it and output text to it.  Obviously, an application that expects a lot of screen space will be a poor fit, but if you want to, for example, use the display to show temperature data, that may be the kind of thing that doesn't need an explicit program and can simply be done by issuing commands.
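
Here is a sketch of that driver split, again as a runnable Python toy.  The TextDisplay API is my assumption of what a standard text-display interface might look like, and the command bytes are only loosely modeled on common character-LCD controllers; the point is that the bus-specific code is the whole driver, and everything above it sees only the API.

    # Toy model of the driver split: one small bus-specific class exposes the
    # device under an assumed standard text-display API.  All names invented.
    class TextDisplay:
        """Assumed standard API: rows, cols, and a write_line() method."""
        rows: int
        cols: int
        def write_line(self, row: int, text: str) -> None: ...

    class I2CCharacterLCD(TextDisplay):
        """The only hardware-specific piece: translate API calls to bus writes."""
        rows, cols = 2, 16
        def __init__(self, bus_write):
            self._bus_write = bus_write   # callable taking raw bytes
        def write_line(self, row, text):
            payload = text[: self.cols].ljust(self.cols).encode("ascii")
            self._bus_write(bytes([0x80 | (0x40 * row)]) + payload)

    # Anything that wants *a* text display can now use it without knowing
    # about I2C at all; for instance, the temperature example from earlier.
    lcd = I2CCharacterLCD(bus_write=lambda b: print(f"i2c <- {b!r}"))
    lcd.write_line(0, "cpu0: 61.5 C")
    lcd.write_line(1, "gpu0: 74.0 C")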

The idea extends to other displays.  Under current operating systems, there are only two metaphors for a display: either it is part of the Desktop, or it is a specific device that each program has to target through its own driver.  But under MOS, if you attach a small display that isn't part of the desktop, it may still be treated as a display (though perhaps not a hardware-accelerated one), and all the normal display functions should apply.  If that display also has a touchscreen, then that off-desktop display becomes a customizable button, or several.  And if you want to create that set of customizable buttons with commands instead of an explicit program, that can work too.
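
Continuing the toy model, a small off-desktop touchscreen becoming “a customizable button, or several” might look like the sketch below.  The Touchscreen class, the on_touch event, and the button regions are invented; a real system would route actual input events rather than these stand-ins.

    # Toy continuation: an off-desktop touch display treated as generic buttons.
    # The class, the event plumbing, and the actions are invented illustrations.
    class Touchscreen:
        def __init__(self, width, height):
            self.width, self.height = width, height
            self._regions = []   # (x, y, w, h, callback)
        def add_button(self, x, y, w, h, callback):
            self._regions.append((x, y, w, h, callback))
        def on_touch(self, tx, ty):
            for x, y, w, h, cb in self._regions:
                if x <= tx < x + w and y <= ty < y + h:
                    cb()

    # Two "commands" bound to the halves of a 320x240 side display.
    panel = Touchscreen(320, 240)
    panel.add_button(0, 0, 160, 240, lambda: print("muting microphone"))
    panel.add_button(160, 0, 160, 240, lambda: print("starting screen recorder"))
    panel.on_touch(40, 100)    # prints: muting microphone
    panel.on_touch(300, 100)   # prints: starting screen recorder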

In short, you can just play with the capabilities of the system.  That is the goal, and it always has been.  That is what gets me excited about something like the Modular OS project, and it is why I have continued to return to it year after year: because operating systems today simply do not let you play with the system's full capabilities.  You can't interface that simply with hardware, or make use of system functions natively from the shell.  You can't just treat your system like all of its abilities are at your fingertips.

But you should be able to.  It's your system.  You're in control.  That's the ideal behind the MOS Project, and while that's difficult, and probably quite naive, I still think it's worth chasing after.