This topic is tricky to introduce. One of the central tenets of Project MAD as it is today is a whole new model for applications, and I've said that it was one of the last things to come together, tying everything into a neat package. What you won't know is that my early concepts for the MOS/DCA project also included a modification to standard programming models, and that legacy is still very obvious once you know where to look.
It's difficult to ferret out exactly what I meant if you're just reading my notes from back then, but it was there: I wanted to expose the internals of executables and shared libraries so that, say, a shell command could execute an arbitrary function. Making that work requires an interactive shell that can capture typed data - more like an interpreted language's REPL than something familiar like Bash or the Windows command shell. And this has to be a structured tool in some fashion; in modern application programming, you pass nothing to an application when it starts except strings, and the application has to figure out literally everything from that.
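To make that contrast concrete, here's a tiny Python sketch (resize_image is an invented stand-in, not anything from the project): the wrapper's entire job is turning argv strings back into the typed values the function wanted all along.

```python
import sys

# A toy "library function" - the thing we actually want to call.
def resize_image(path: str, width: int, height: int) -> None:
    print(f"resizing {path} to {width}x{height}")

# Today's model: the application receives nothing but strings and
# must rebuild every typed value by hand before it can do anything.
if __name__ == "__main__":
    # e.g. invoked as: python resize.py input.png 640 480
    path = sys.argv[1]          # still just a string
    width = int(sys.argv[2])    # parse, validate, report errors...
    height = int(sys.argv[3])
    resize_image(path, width, height)
```

A shell that captured typed data could skip that round-trip entirely and hand the function its path and integers directly.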
This ties back nicely to my complaints in the last couple of blog posts about programmers being obliged to reinvent the wheel.
At the time, what I had wasn't anything like a plan. At most, I had a complaint. And that sparked a wish. After all, especially within the GNU toolset, how many applications are actually just wrappers around library functions? Complex wrappers, sometimes; that complexity is frequently needed so that one tool can interact with a complex library function in multiple ways, or so that common data pipelines (piping a list of files through grep, as an easy example) operate smoothly and the way you would expect them to. As with a lot in programming, things that are simple to say are frequently hard to do, and actually doing them means navigating a lot of nuance and complexity.
The other side of that, though, is that the nuance and complexity spawn more nuance and complexity. Your complex tool needs to be interacted with in complex ways. Add a third layer on top that assumes access to tools that use libraries; then add a fourth layer and more: tools on top of tools. As long as nothing changes, you eventually get a level of tooling where you can issue a one-word command to do exactly what you mean.
But every level in between the core library and the final level of tooling can shift. Needs to shift. Every layer must be updated whenever any of its dependencies updates, for security reasons if nothing else, and sometimes bigger shifts are called for. To a certain extent, there's no getting away from that - but if the library function you actually care about could be executable in its own right, without an application wrapping it, whole levels of redundant tooling could suddenly be simplified, if not eliminated.
And of course several other layers of tooling will probably crop up around this new concept; I won't lie. Ultimately, we create tools because we have specific needs, and as long as tools fulfill specific rather than general needs, they will diversify. But many of the pain points being removed are redundant ones. Every application needs to transform text input into data streams. Applications that read from and write to data files have to pick or create libraries that work with the specific protocols and data streams. And if you're creating a workflow that involves data files being processed, you need individual applications whose only role is to be part of that workflow - applications that parse files and put them in a form other applications can take as input.
Many of these niche applications - the ones that exist only to take one data format and make it available, or to take text input and render it back into a data file - are at once important to someone's workflow (and possibly security) and tools nobody wants to maintain. Nobody wants to need to maintain them. It's important that someone creates the tool that parlays between data files and workflows, but it's a thankless job, and a product nobody wants to pay for.
In short, a shell that understands data objects and types, and can use that to call a function built into an application or library directly, instead of needing to build one application after another to handle the workflow… that would solve a lot of problems.
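An ordinary Python session already gives a taste of what that feels like. The "pipeline" below passes real objects (paths, integers) between steps instead of text that each stage has to re-parse - roughly the typed equivalent of a find/wc shell pipeline:

```python
# In a typed REPL, each step hands the next one real objects, not text.
from pathlib import Path

# Roughly the typed equivalent of: find . -name '*.log' | xargs wc -c
logs = list(Path(".").rglob("*.log"))             # list of Path objects
sizes = {p.name: p.stat().st_size for p in logs}  # dict of str -> int
total = sum(sizes.values())                       # an int, not a string
print(f"{len(logs)} log files, {total} bytes total")
```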
The steely-eyed veterans in the audience are saying, “Well, we have exactly that with interpreted languages like Python. In Python, you can open a library and call a specific function out of it. If you have a Python file that is meant to be an executable, you can also load it as though it were a library and call functions out of it, assuming it's designed right.” And this is, of course, correct. (For the record, I had these thoughts before I was introduced to Python. Probably. Around that time, at the latest. Again, my notes are not great.)
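For anyone who hasn't seen that pattern, it's a standard Python idiom (the file and function names below are made up for illustration):

```python
# mytool.py - runnable as a command, loadable as a library.
import sys

def word_count(text: str) -> int:
    """The actual logic lives in an importable function."""
    return len(text.split())

def main(argv: list[str]) -> int:
    with open(argv[1]) as f:
        print(word_count(f.read()))
    return 0

if __name__ == "__main__":   # only runs when executed directly
    sys.exit(main(sys.argv))

# Elsewhere, another program can bypass the command line entirely:
#     from mytool import word_count
#     n = word_count("some text already in memory")
```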
The difference is having that be an assumption built into the entire operating system, into the fundamental programming method, so that the entire system is built on one foundation instead of a shifting mix of dependencies from various legacies. That is, of course, making a virtue of a vice: it means starting over and doing everything from scratch, which is... painful. And these initial thoughts of mine were missing critical parts of the picture, which is why for many years I set them aside as irrelevant and a waste of time. At the time, all I was thinking was that, doggone it, we shouldn't have to reinvent the wheel over and over.
And in truth… there are still, today, whole categories of errors that exist only because library functions are sealed away. For example, languages like Python, PHP, Go, and many others can't easily use functions buried in libraries built for other languages like C; people have to write new versions of old libraries, and each needs to be maintained separately, by different people, using different language constructs and different dependencies. Ultimately, maintaining each becomes such a different problem that even if one person were willing to maintain every copy of the same algorithm, it would become untenable for some fraction of them.
All this despite the fact that, ultimately, all you want is to pass parameters in and get results out. If there were no worry about dependencies or languages - if all you had to do was call the function and capture the return - it would be simple to do from any language. Of course, from a certain point of view… that's always been all you have to do, if your language is willing to parse library headers or code from other languages. But neither of the library formats in common use, Linux's .so and Windows' .dll, comes with a human- or machine-readable directory of functions meant to tell programmers how to use them. If you want to know how to use them, you need separate files intended for developers: headers written in the library's own language. That means users of an entirely different programming language who want to call existing library functions have to read those files… despite them being in another language.
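Python's ctypes makes the problem concrete: it will happily load a .so and call into it, but because the binary carries no usable signature information, you have to re-declare every function's types by hand, straight out of the C headers. A minimal sketch, assuming a Linux system with glibc:

```python
import ctypes

# Load the C library. The .so tells us the symbol "strlen" exists,
# but nothing about its parameter or return types.
libc = ctypes.CDLL("libc.so.6")

# The types must be copied in by hand from the C header (string.h):
#     size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5 - but get the declarations wrong,
                              # and you corrupt memory instead.
```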
In short, there's a reason why we are where we are.
When I talk about Project MAD as a change in the entirety of computing, it can sound like I'm vastly overestimating my own cleverness - and well, maybe. But these are real problems. My first concept almost sounds promising on its own: just call library functions. But that concept does nothing about compatibility and discovery problems. It hardly matters whether or not the code exists on your system if you don't know that it's there, where it is, what it does, how to call it, what side effects to expect, and what the return values and errors mean - and that assumes that errors get propagated correctly. And the details need to be exact; one bit out of place changes everything.
Just like there's a reason why we got here, there's also a litany of reasons why we haven't leaped forward. Even if we all agreed on one bright idea that we would work together on, it would be a massive effort with a lot of problems that need to be solved. And if the bright idea that we all agreed on happens to have massive holes inherent to it, because it's not as bright an idea as we thought… that's a lot of work for nothing.
That's to say, I do understand, have always understood, and will always understand that none of this is simple, that I can be wrong, and that ultimately, even if my idea were core to anything, it would be the people who actually achieve it who deserve the credit, because it will be very difficult.
But also…
But also, various subsystems that are a part of the MAD concept (here, specifically, the Modular OS and the Agentic Distributed Application model) are there specifically to solve these problems. I've described the system directory in this post on the Unified System API, but not all parts of the solution came around at the same time. Part of it, as I've just described in this post, was there from near the beginning. Many years later (five to ten, maybe) I started thinking about the directory - specifically, about having types understood system-wide and making them a central part of the system directory. Before this, all I really had was “Call a function inside a library, so we don't have to reinvent wheels.” But it didn't quite make sense.
I said in my first post that the ADA came about in the process of writing blog posts about the MOS/DCA concept. I was trying to explain, and things still felt incomplete. I had deduced that functions and hardware drivers needed to be sorted by purpose, which fed into the type hierarchy in the system directory, and I was also thinking about embedded libraries and applications, and how that made application deployment and dependency management neater. I was writing about wrapping functions with scripts so that some logic happened on remote nodes, but that wasn't working for me.
The reason I like the ADA goes beyond the model itself. The model slotted well into the weirdly shaped hole that the whole “software model” side of the project had. Yes, you could call functions directly instead of only being able to run applications - but why? What made that so compelling, and not merely one person ranting at the sky? Sure, it would be nice to have a directory of all the functions and types within a system, but what made that so critical to the operation of an OS that it had to be a fundamental component?
But within the ADA, you may be calling functions within your own application's Agents and within others'. You will be deploying Agents to hardware modules specifically because there is software on that module that a given Agent requires. And it goes beyond functions and code (or, well, it doesn't, but it helps to think of it this way): the ADA is also about exposing data so that applications can be monitored and debugged, and that requires types to be in some sense objective, even if that only means a type connected to the application rather than a system-wide one.
Equally, within the ADA, an Agent exists to handle the tricky problems of accountability. An entirely reasonable question arises when you try to, for example, perform GUI API calls remotely. The memory used by your GUI application can't leave the display controller - or at the very least, there will be a copy of the output in the display controller, which must be managed by the GUI stack itself while being owned by an external party. Without an agent like the ADA's, there are massive, massive questions to answer about how this is not an absolutely terrible idea. But if you can't do things like call GUI functions remotely, then... what even is the plan?
But perhaps most importantly, in the ADA, deploying Agents and routing messages is a system function, one attached to the ADA servers and not one implemented by applications themselves. That means that knowledge of functions, data, and types needs to be standardized at a system level. If the ADA model were implemented separately by each application, then each application would need to separately track and manage type and API version compatibility. Implementing this as a system function makes things not only possible, but easy for programmers to handle, by doing the hard work ahead of time, in the operating system.
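To illustrate the kind of record such a directory might keep - and to be clear, this is my own hypothetical sketch, with every name invented for illustration, not a specification from Project MAD - each callable could be registered with its types, version, and declared side effects, so the system can check compatibility before routing a call:

```python
# Hypothetical sketch of a system-directory entry; all names here are
# invented for illustration, not part of any real MAD specification.
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionEntry:
    name: str                     # e.g. "image.resize"
    version: tuple[int, int]      # API version the caller must match
    param_types: tuple[str, ...]  # system-wide type names, not strings
    return_type: str
    side_effects: tuple[str, ...] = ()  # declared, so callers can know

entry = FunctionEntry(
    name="image.resize",
    version=(2, 1),
    param_types=("sys.Path", "sys.UInt32", "sys.UInt32"),
    return_type="sys.Image",
    side_effects=("filesystem.read",),
)

def compatible(entry: FunctionEntry, wanted: tuple[int, int]) -> bool:
    """Same major version, and at least the minor the caller needs."""
    return entry.version[0] == wanted[0] and entry.version[1] >= wanted[1]

print(compatible(entry, (2, 0)))  # True
```

The point isn't this particular layout; it's that the check happens once, in the system, instead of being re-implemented inside every application.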
I like to think, perhaps arrogantly, that I am touching on fundamental truths with some of these analyses. It's why I keep hammering on, for example, “don't force every programmer to reinvent the wheel.” It is hubris to say I alone know best (I've demurred enough that you know I'm not that hubristic), but it's still satisfying to line up a whole bunch of problems in a row, point at them, and say: example one, example two, example three. Tying things up with a bow on top is satisfying, and part of why I feel genuinely excited about the ADA and Project MAD in general is that a whole stack of problems gets placed in a box and wrapped up neatly.
It also feels good because this isn't all the work of a day, month, or year. While this talk of modifying programming models may not have happened in the first year of the project (the first part was mostly the DCA, honestly), it's at least 15 years in the making. Tricky problems that have tickled the back of my mind for years are coming together. These are problems that weren't just tricky in practice. They were tricky conceptually; it wasn't clear that there even was an answer. It wasn't hard to imagine that I was wasting my time, because there was no clear vision of the end.
There's still a lot that's rough, and there's still plenty of room for me to be wrong. But it's astonishing how the last few pieces have made the puzzle come together. At least from where I stand, it looks promising - like there's really an answer. Not an easy one, but one that leaves us all much better off.
And even if I'm wrong, perhaps getting more eyes on the collection of problems I've been working through will help.