I think one of the reasons it has been difficult to articulate exactly what I want Project MAD to accomplish is that, to an alarming extent, I want to do things that we are not doing even on a single modern machine, but extended to encompass several machines at once. Thus, to my chagrin, I end up effectively making a list of demands that might be considered absurd if they were to happen in a contained, well-understood environment, and then adding “…to everything, everywhere” at the end of the list. And to be clear, Computer Science as a discipline is well versed in problems that do not scale well. Proposing something difficult and then demanding that it scale is, itself, extremely arrogant.
The only reason why it doesn't approach being farcical is that I am proposing each sub-project with methods already in mind, if not in hand. Ultimately, when it comes to actually implementing the ideas, new problems will crop up that we have never experienced before. But to better understand why those projects are worth the problems they cause, it's worth discussing some of them in a smaller, more general context.
What is a Device and What Do They Do?
I talk a great deal about how Project MAD is designed to let you access devices or resources or capabilities of multiple pieces of hardware. This is a strange topic for an application programmer; if you haven't already delved into microcontrollers and peripheral interfaces, you may wonder why anyone would care about all the devices attached to a system anyway. When all you are doing is building some random bit of application logic and wrapping a user interface around it, using fixed and well-defined OS concepts or third party libraries, well, if you end up needing to talk about drivers for literally any device then someone has done their job wrong. And indeed, if I wanted the average, user-facing application developer's life to change, I would probably be making their life worse, not better. Part of the design of MAD/ADA is that, in fact, they don't need to worry about devices.
But also, devices will absolutely be used to control how their application works and how it is distributed across the system. That's all fine, assuming they don't have to worry about it - but of course, eventually, a professional developer will run into every bug that comes within a mile of their code, let alone the ones actually inside it. So if they do, technically, need to worry about devices, it's worth asking: what are they, why do we care, and what do the changes I'm making actually do for us?
Obviously, at its simplest, a “device” either changes the world outside the computer, or produces data based on the world outside the computer, or else it makes some data-to-data transformation task easier. Those are, basically, the only things a device could even possibly provide: output, input, or services. If you want one of those three things, you are going to be touching a device, even if that's only with an OS-standard ten foot pole mounted to the nearest wall.
An application developer's life is usually pretty simple: if we need input, we take it, if we need to output, then we output, if we need some service, then we talk to it. Usually, someone else simply handles these things, in fairly standard ways.
The trouble starts whenever the OS itself either doesn't care about a device, or makes faulty assumptions about how you want or need to use it. Say you have a TV across the room and you want to cast video to it. Modern OSes tend to do only two things with a monitor: extend the desktop onto it, or copy some other monitor's output onto it, and neither of those may serve your needs well. That's why in 2013 Google put together a tiny little dongle with a surprisingly important feature: you could just send video to it. It took input and produced output, entirely separate from the normal desktop (or phone) metaphor that we had become familiar with. And while I won't say that has or hasn't changed the world, I for one am grateful for such a device.
You can understand devices like that with a simple, generic model. They have some requirements (the Chromecast needs internet and video out), some capabilities (it will display internet video streams on the TV), and some way to be controlled (a remote, or network packets). When I talk about devices, I pretty much mean anything that fits that extremely broad definition of input, output, and control.
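To make that model concrete, here is a minimal sketch of it as a data structure. The `Device` class, its field names, and the string labels are all hypothetical, invented for illustration; nothing here is an actual MAD interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical record for the requirements/capabilities/control model."""
    name: str
    requirements: frozenset   # what the device needs in order to work
    capabilities: frozenset   # what the device can do for you
    controls: frozenset       # how you are allowed to drive it

    def usable_with(self, available: set) -> bool:
        """True when every requirement is present in the environment."""
        return self.requirements <= available


# The Chromecast example from the text, with made-up labels.
chromecast = Device(
    name="chromecast",
    requirements=frozenset({"internet", "video-out"}),
    capabilities=frozenset({"display-internet-video"}),
    controls=frozenset({"remote", "network-packets"}),
)

living_room = {"internet", "video-out", "power"}
garage = {"power"}
print(chromecast.usable_with(living_room))
print(chromecast.usable_with(garage))
```

The point of the sketch is only that three small sets are enough to describe an awful lot of hardware; how those sets would actually be discovered is the hard part.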
So it may surprise people who don't look at how OSes work that an average programmer probably doesn't know how to get a list of all devices present on a given machine, can't tell what a random device is capable of, and can't figure out how to control that device.
On all major operating systems today, the device model can generally be broken up into two categories: if the OS itself uses it, then the OS takes care of it and you live with the OS model for better and for worse. Otherwise, it may be technically available but you'd better find a library or application that knows what it is and how to use it, because they ain't touchin' it.
And that's… that's kind of it. There isn't a whole lot of nuance here. Either the OS takes care of it, or you're on your own. There are some cases that are better or worse, but broadly… yeah.
Why? Well, that's not hard to understand. Someone has to invent a way to deal with it. If that someone works for Microsoft, then Windows can handle it; if they work for Apple, then Macs can handle it; if they enjoy open source projects, Linux can probably handle it. If they are part of some other company, such as the manufacturer of some device, they will provide an application and/or library that works and consider their job done.
Unless the device is really, mind-bogglingly important (and/or you convince people that it is), nobody is going to reshape how the OS works in order to help your device fit into the ecosystem better. And… it has been that way since the dawn of computing. Generally speaking, everyone involved would rather make small changes than big ones, not least because if you make big changes, consumers may not like it and you may be out of business in anywhere from two months to five years.
Something like MAD's OS project doesn't necessarily need to reinvent the wheel so hard that it changes the way the world itself turns. I may end up sounding like that's what I'm doing, because I want to paint a picture of what MAD is capable of, but you could just… add some of MAD's features to existing OSes, or as a third party structure on top of them. To wit, devices.
But first: IPC.
What is IPC/RPC and Why Should I Care?
Inter-Process Communication and Remote Procedure Calls are both fancy ways of saying that two programs talk to one another. Generally, IPC happens between two programs on the same computer, and RPC is the same but going between machines; there are some differences there, including important ones, but let's set that aside for the moment and just talk about how computers work in general.
I said in the last segment that you can divide device support into “The OS Does It” and “You're on your own”. Categorically, the same is true of talking to other programs. If the OS does it (and by it, I mean making some given program do something specific for you), then there is a function, one that probably works very similarly to every other OS function that you have had to use while learning to program, which talks to that system program and then gets back to you. If the OS doesn't care about that specific thing, at best you will find a library or programming language that makes the task of talking to some specific programs just as easy. (There is an aside here for scripting, but that requires a whole ecosystem to set up and so, like OSes themselves, can be difficult to pivot if a change is necessary.)
However, it is at least as likely, if not more so, that you will have to do a whole lot of research into how that specific program works and how IPC/RPC mechanisms work in general, and then do a whole lot of work to make the mechanism talk to the specific program, and oftentimes to a specific version of that specific program, because that might change. And this is why we have kind of standardized on IP and web services nowadays - everywhere you see them pop up, there are a whole lot of tools that almost everyone involved did not need to invent. Reinvent, sometimes, but not from scratch.
IPC/RPC mechanisms, however, do still need to be made. If you have a program, say, one that takes your company's new fancy keyboard with per-key LED lights, and exposes the light-output capabilities of that keyboard to the user (letting them make all the colors their own brand of pretty), you will probably not design a server into that program which lets other programs talk with it. And why would you? That sounds like a lot of work!
But look at it from another app developer's standpoint - if they want to make a key flash when a notification comes in, say, they have three general options: learn how the keyboard driver works (which may be, if I may summarize, very custom), interact with the manufacturer's application (which we just ruled out), or interact with some third party library which can do one of the other two options for them. If nobody has or wants to make that third party library, they're also out of luck.
MAD takes a tack that may be very alarming, which is that all programs should by default have some IPC mechanisms - it's part of making them distributable across multiple pieces of hardware, after all. But you could also… you know, just, use that same mechanism to expose some functionality. That exposure doesn't need to be reckless; you can maintain some controls, make sure the user is involved, something like that. But if we decided that IPC/RPC should be easy, maybe we should expect you to interact with other applications more often.
And the point of deciding that it should be easy is that, having made that proclamation, there ought to be a very standard way to do RPC, such that it is just as easy to work with as when the system takes over handling that task for you because you're talking to a system component and it knows how to do that. If we want programmers to take it at all seriously, we have to make it easy for programs to interact fairly regularly.
And if you're going to interact more often, it might be best to have standards for communication. And if you wanted, perhaps, to make that OS/DIY dichotomy even less severe… then OS IPC should work in the same way general application IPC works, adhering to the same standards. That way, people who are forced to learn OS methods when they start programming, don't need to learn a whole new separate thing when they start dealing with new and more interesting tasks.
Which brings us back to devices, or rather, device drivers.
Making Devices Accessible: The System API Directory
Again, this is all stuff I've talked about before, but let's dial it back. You have devices on your machine. How do you interact with them?
Well, generally, every device you'd like to talk to exists on the far end of some kind of connector, and what is exposed at the very bottom of the CPU/Kernel system is the near end of that connector, not the device itself. Many of these connectors require a little bit of finesse to use properly, so there may be a relatively complex driver just to ensure that any message you send across that bus gets where it's going, meaning you will be using a slightly higher level API that accounts for the bus itself. Only once you are in contact with that other device can you talk about, you know, talking to the device itself. Now, modern CPUs aren't just bit-banging buses (meaning the CPU itself isn't hot-looping to make sure that every bit gets sent at the correct time, or anything similar); they use support chips, microcode, and other conveniences that might be relatively specific to the processor/motherboard combo you are using, but there are standards and firmware to sort of even out the complexities of making that part of the system work.
Once you can talk across a bus to a device, you have to know what language it speaks. If you know exactly what device you're talking to, this shouldn't be hard - the manufacturer should list all the commands, what they do, what the return data means, and so on. A few manufacturers prefer to keep some secrets… but let's not get into that. Some things that you will be interacting with on the other side of a bus are themselves programmable, and you may be sending code to them (which is a whole other thing that is only partly covered by MAD/ADA), but frequently, you are just issuing commands that make it do whatever it is you want to do, and listening to the bus for both expected replies and updates caused by an event.
The software which converts requests from a programmer to commands that arrive at the device, and responses from the device to normal software events and function returns, can be understood as drivers. This is where the OS/DIY dichotomy comes into play: the OS will keep track of devices it cares about, and ensure that there are very standard ways to interact with drivers it cares about. Devices that the OS doesn't care about are not well kept track of, and the ways to interact with them are not terribly standard - at least, not outside of the specifications of whatever bus they are on the other side of.
The MAD/SAD proposes to tackle these two problems at the same time. It asks device drivers to expose a standard API and an implementation library; programs that utilize the API can link to that library and they will be interacting with the device itself. Then, all device drivers are listed in a filesystem directory, indexed by a key representing the API, and linking to that library. If more than one device implements an API, even if they use the same library to do so, the implementations are listed separately and configured differently, so when your app links to a given device's library, you will be talking to that specific device when you make API calls to it.
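A rough sketch of what such a directory could look like, using an ordinary temporary folder as a stand-in. Everything here is an assumption made for illustration: the `publish` and `implementations` helpers, the JSON entry format, the `char/keyboard` key, and the `libkbd.so` name are hypothetical, not part of any real MAD/SAD specification.

```python
import json
import tempfile
from pathlib import Path


def publish(root: Path, api_key: str, instance: str, library: str, config: dict) -> Path:
    """List one configured implementation under the folder for its API key."""
    entry_dir = root / api_key
    entry_dir.mkdir(parents=True, exist_ok=True)
    entry = entry_dir / f"{instance}.json"
    entry.write_text(json.dumps({"library": library, "config": config}))
    return entry


def implementations(root: Path, api_key: str) -> dict:
    """Return every implementation listed directly under an API key."""
    entry_dir = root / api_key
    if not entry_dir.is_dir():
        return {}
    return {p.stem: json.loads(p.read_text()) for p in sorted(entry_dir.glob("*.json"))}


with tempfile.TemporaryDirectory() as tmp:
    sad = Path(tmp)
    # Two keyboards share one library but are listed and configured separately.
    publish(sad, "char/keyboard", "usb0", "libkbd.so", {"port": "usb0"})
    publish(sad, "char/keyboard", "usb1", "libkbd.so", {"port": "usb1"})
    found = implementations(sad, "char/keyboard")

for name, entry in found.items():
    print(name, entry["library"], entry["config"])
```

In this toy version an "entry" is just a small config file naming a library; the real proposal is that the entry itself is something you can link against, already pointed at that one device.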
That on its own represents a massive difference to how libraries and devices exist in any modern operating system. Today, libraries are just things that may or may not exist, and devices are just things that may or may not exist. You can't just link to a library and be talking to a device - you link to a library and then it will let you ask the system if any devices exist, because the library itself isn't configured, but is instead generic. Likewise, you can't just point to a device and get a library that interacts with it.
When I talk about “You can just add shit from MAD”, I'm talking about shit like this.
But I go a little further with MAD/SAD. I mentioned the libraries are indexed by an API key (in the filesystem, but you can ignore that for the moment), and if you've read any of my articles, you know that the API keys are in a tree, corresponding to an inheritance model. And since I'm trying to explain all this simply, let's touch on that for a moment.
The API itself promises that the library attached to it will have certain functions; if you call those functions, you will get certain results, and the API makes all that very clear. That doesn't mean that those are the only functions in the library - it's only a promise that if you go looking for certain ones, you will find them. So naturally, a library may implement more than one API, each time exposing different functions.
Likewise, an API may provide different levels of what amounts to the same API. It may provide a dirt-simple one, and a slightly more complex one, and then a far more complex one. Generally, the simplest one gives you the fewest ways to customize it, but it also means you need to know the least about what kind of specific device you're using; the most complex one requires you to know exactly what the device is, but may essentially give you direct access to everything the device is capable of. Since you choose the API when you write the program, you get to decide how specific the API you're targeting is.
Thus, in the MAD/SAD API tree, the folders closest to the root of the tree contain very simple APIs that require you to know very little about the devices, and which apply to far more devices, but which also expose very few features - and then, further out, there are more complex APIs that require you to know what you're asking for, but give you more control, until eventually you get to a library which may provide you direct access to the device with no hand holding whatsoever.
But in an inheritance model, you don't necessarily hide the API that came before - the parent and child APIs coexist, and the child API is obliged to expose every function the parent does.
Suppose you have a device that acts as a source of character data - that may be, for example, a keyboard, but it also describes files, network streams, and some busses like serial ports. In any of those cases, the API will provide you with a generic “read” function that provides you with a “firehose” of data sent directly from the device, and probably also a generic “write” function to shout character data back down the pipe to be heard at the other end. Questions like “What does the data mean?” and “What should I say to this device?” are not answered at all by this level of API. If you use that level of API, you will not know what device is on the other end, or if there even is a device at the other end, not when you are writing the program itself - the user may know, but that's not your business. Maybe the ‘device’ is not even hardware; maybe it's software, or maybe it's a black hole that says nothing and eats all input (aka /dev/null).
That same low-level API can be more useful to a programmer that knows what the device is, but when you know what device it is, you can usually also provide better service than just “read” and “write” functions. Not always - some things really are that dumb and simple - but usually, you can provide an extension to that character device which provides new functionality. Under MAD/SAD, that extended API will only be presented if the device promises that the library, and the device, implement those functions. It will also still be listed among the low-level, raw character device drivers, because the extension adds to the lower level, instead of replacing it.
Suppose the character device in question is your fancy backlit keyboard, for example; it may have three or four levels of API just in the ‘character device’ tree, to say nothing of (eg) a ‘raw usb device’ tree. First, your keyboard is a generic character device; second, it is a keyboard (the output should be understood as keys, a keymap may be involved, and you can send commands to turn on capslock and so on); third, it may implement some standard “backlit keyboard” API, if there is one; and fourth, there may be a vendor-specific “This is our backlit keyboard” API that has some juicy extras the generic API does not. This last API extends the other three; the third API extends the first two; the Keyboard API extends the character one. But if you really want, you can just open the keyboard as a character device and listen to every keystroke that comes in down the line, and shout random character strings down the line just to see what happens.
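The keyboard example can be sketched in a few lines if we assume, purely for illustration, that an API key is a slash-separated path and that each extra path segment means "extends the API before it". The `ApiTree` class, the key names, and the `acme` vendor are all hypothetical.

```python
def ancestors(api_key: str) -> list:
    """Return the key itself plus every API it extends, simplest first."""
    parts = api_key.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


class ApiTree:
    """Toy in-memory stand-in for the API inheritance tree."""

    def __init__(self):
        self._listing = {}

    def publish(self, device: str, api_key: str) -> None:
        # A device offering an extended API is also listed under every
        # API that one extends, because the extension adds, not replaces.
        for key in ancestors(api_key):
            self._listing.setdefault(key, set()).add(device)

    def devices(self, api_key: str) -> list:
        return sorted(self._listing.get(api_key, set()))


tree = ApiTree()
tree.publish("fancy-keyboard", "char/keyboard/backlit/acme")
tree.publish("plain-keyboard", "char/keyboard")
tree.publish("serial-port", "char")

print(tree.devices("char"))
print(tree.devices("char/keyboard"))
print(tree.devices("char/keyboard/backlit"))
```

An app that only asks for `char` sees all three devices and knows almost nothing about any of them; an app that asks for the vendor key sees one device and knows exactly what it is talking to.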
Of course, the OS will recognize keyboards as something special, because the OS knows that most apps will care about keyboard input in one way or another - so it will make use of that second level of API (char/keyboard) automatically. In fact it will probably create a virtual keyboard device that collects input from all keyboards (which should be a standard itself) and uses that instead of any specific device, because then the user doesn't need to change how anything is configured when they plug a USB keyboard into their laptop. But the real point is, we have made the generic way to reach devices good enough that the OS can just use this tree to do what the system normally does, instead of making very unique and specific ways for the system to make critically important devices accessible, ways that kind of clash with how we program other applications.
Sounds nice, right? Maybe a little boring. But the way I introduced this topic was about APIs and libraries - it has nothing specific about devices in it. Any persistent application that wants to provide an API and library can do so, even if it's not a driver or even a background service - and that application's APIs will be listed in the directory, same as any other. So… if you install a new database service, say, you will be able to find it in the list of all databases on the system, even if that database is a foreground application for some reason.
Now, here's a funny little twist (which I've already talked about in another post): when you use IP and ports to connect to running applications and services, you normally locate services by a standard port number, because that's how it was when the internet was created. Since more than one service can't share that port, other copies of the same service have to use other ports; there isn't a standard for how to do that, so the user or service has to figure something else out. You may be able to examine the other server processes and find out what ports they have open, or examine all opened ports to see if they sound like that service, but neither of those is a smooth, well-defined operation. Under MAD/SAD however, if you know what API the service provides, you have a list of all of them that are running, and each item in that list is preconfigured to let you talk to it.
Moreover, you don't necessarily need to publish to the SAD globally. Remember, I said the API keys are published in the filesystem; you might publish such a configured library file privately, if you had, say, an embedded database in a larger application, and you only wanted the parent application to talk to it. Linking to that preconfigured library would connect you to the database; it need not be exposed to the network, or configured to deal with the network. Presumably, an administrator or user could find it if they knew where to look, but that's about all the exposure it would have.
The same mechanism that helps you find and utilize devices also helps you utilize things that are not listed? Wow, it's almost as though it's just a better way to utilize things. But what exactly is this mechanism? I took it for granted that you can dynamically link to a library and it will be preconfigured to talk with a specific device. That is… also not how things work.
Perhaps more to the point: Just before this I was talking about IPC/RPC and I never got back to it. Why did I use that segment to introduce this one? We already have dynamically linked libraries, even if they aren't preconfigured. Making preconfigured libraries would be a fairly straightforward thing that has nothing to do with IPC/RPC mechanisms.
Well… the point with IPC was that MAD argues every running application should expose IPC endpoints by default, meaning that if you write a service or application, another application can interact with it directly. Of course, if you want to interact with something via some IPC/RPC mechanism… you really should have standards. Like an API.
What's the difference between calling a function embedded in a linkable library, and calling a function via an IPC mechanism directly? Well… the linking bit, I suppose. And it matters exactly what application and user is “running” the code, as far as the system is concerned. But if the library mostly exists to provide a translation layer, between the published API and IPC endpoints exposed by a service… what if there just wasn't a need for a translation layer? Or rather, what if the “translation layer” was a bog-standard system call?
What if your client application, the one that wants to use the API, can't tell the difference between linking to a library and directly talking via IPC with a service? What if the programmer involved need never care? What if they simply… use the API? What if they do so by using a system call that does whatever is needed to reach the target function?
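Here is one way to picture that, as a toy in ordinary user-space Python rather than a real system call. The `api_call` function, both backend classes, and the JSON-lines wire format are assumptions made up for this sketch; the only thing it is meant to show is that the calling code is identical whether the target function lives in a loaded library or behind an IPC endpoint.

```python
import json
import socket
import threading


class LibraryBackend:
    """Calls functions that live in the caller's own process."""

    def __init__(self, functions):
        self._functions = functions

    def invoke(self, name, args):
        return self._functions[name](*args)


class IpcBackend:
    """Forwards each call as one JSON line over a connected socket."""

    def __init__(self, sock):
        self._sock = sock
        self._reader = sock.makefile("r")
        self._writer = sock.makefile("w")

    def invoke(self, name, args):
        self._writer.write(json.dumps({"fn": name, "args": args}) + "\n")
        self._writer.flush()
        return json.loads(self._reader.readline())["result"]

    def close(self):
        self._writer.close()
        self._reader.close()
        self._sock.close()


def serve(sock, functions):
    """Toy service loop: answer one request per line until the peer closes."""
    with sock, sock.makefile("r") as reader, sock.makefile("w") as writer:
        for line in reader:
            request = json.loads(line)
            result = functions[request["fn"]](*request["args"])
            writer.write(json.dumps({"result": result}) + "\n")
            writer.flush()


def api_call(backend, name, *args):
    """Stand-in for the single call: the caller never sees which path is used."""
    return backend.invoke(name, list(args))


functions = {"add": lambda a, b: a + b}
client_sock, server_sock = socket.socketpair()
worker = threading.Thread(target=serve, args=(server_sock, functions), daemon=True)
worker.start()

linked = LibraryBackend(functions)
remote = IpcBackend(client_sock)
results = [api_call(backend, "add", 2, 3) for backend in (linked, remote)]
remote.close()
worker.join(timeout=1)
print(results)
```

In the sketch the client still picks a backend object by hand. The idea in the text is that the system would make that choice for you, based on whatever entry in the tree you pointed the call at.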
Bringing It All Together (By Taking It Back Apart)
Now… Does this all sound like hyperbole? Does it sound like I think this is all easy? Because this isn't all easy. Let's look back at a list of things described in this chain of logic that are new:
- Have all applications use detailed IPC hooks by default
- System call that translates API call into IPC to a running service
- Configuring linked libraries at runtime so that they point to a specific device or service, and using such libraries
- System call that translates API call to loading and running a configured, dynamically linked library
- The two aforementioned system calls are the same system call
- Listing APIs in a systemwide tree
- That systemwide tree is in the filesystem
- That tree is an interface-inheritance tree (or some other semantic structure)
- Allowing the API system call to be directed towards libraries that are not in the system tree
- The system uses the API system call for basic functions (where feasible)
Now, we could quibble, and maybe I've missed some small points, but if I have an idea that requires making ten changes to the way operating systems work, some of them pretty freaking fundamental, then you certainly can't take all of those changes and tack on to the end,
“Now do this for a dozen systems linked together into a gestalt system.”
Yeah, that sounds a little crazy, huh? About that.
Anything, Anyplace, All at Once
Funny thing about those changes. One of the key themes is that it lets you address and control devices and services as though they were files. (In the case of applications, though I've not talked about it here, there's also a big portability thing so that you can run them from anywhere, but same basic idea - an application being accessible means it being runnable when all you have is the “file”. See MAD/SAD AF)
The point of them being accessible as files was that it's not hard to have a filesystem that represents multiple machines at once - you basically just make a folder with multiple machines in it, and list out the contents of each machine under them. That's literally all it takes; you can list the entire filesystem, or just what you think is important, or both in different places. Sticking trees on top of trees… is just how trees work.
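As a minimal sketch of "a folder with multiple machines in it", assuming each machine simply reports a list of paths it wants to share. The `mount` helper, the `/net` root, and the machine and path names are hypothetical.

```python
from pathlib import PurePosixPath


def mount(machines: dict, root: str = "/net") -> list:
    """Graft each machine's listing under a folder named after that machine."""
    merged = []
    for machine, paths in sorted(machines.items()):
        for path in sorted(paths):
            merged.append(str(PurePosixPath(root) / machine / path.lstrip("/")))
    return merged


machines = {
    "laptop": ["/sad/char/keyboard/usb0"],
    "tv-box": ["/sad/video/sink/hdmi0"],
}

for path in mount(machines):
    print(path)
```

Because the machine name is part of every path, two machines can each have their own `usb0` without the merged paths colliding, as long as the machine names themselves are unique.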
The question becomes: what's the best way to make use of that? UNIX already has the general philosophy of “everything is a file”, so in theory, you could make a distributed system where they simply… all use a distributed filesystem listing all files everywhere all at once. But first, modern machines don't actually stick to “everything is a file,” and second, sometimes interacting with things at the file level is very… back and forth. Which is something you may not want to do over a network, where every back and forth has significantly longer latency than a local call. Like… thousands of times longer, maybe. And also, you would need some universal addressing schema for this multi-machine directory so that path strings work the same everywhere and are unique.
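To put rough numbers on the back-and-forth problem, here is a small piece of arithmetic. Both latency figures are illustrative assumptions picked to give a thousandfold gap, not measurements of any real system.

```python
# All figures are illustrative assumptions, in microseconds.
LOCAL_CALL_US = 1              # assumed cost of one local driver call
NETWORK_ROUND_TRIP_US = 1_000  # assumed cost of one network round trip


def chatty_cost(calls: int) -> int:
    """Every call to the remote device crosses the network."""
    return calls * NETWORK_ROUND_TRIP_US


def colocated_cost(calls: int) -> int:
    """One round trip to start the work on the remote machine,
    after which every call is local to that machine."""
    return NETWORK_ROUND_TRIP_US + calls * LOCAL_CALL_US


calls = 50
print(chatty_cost(calls), colocated_cost(calls))  # prints "50000 1050"
```

Under those assumptions, fifty file-level exchanges cost about fifty times more when each one crosses the network than when the code doing them runs next to the device.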
But anyway, so you want to do some things locally, to avoid thrashing the network. That means running a program on the machine that hosts some specific device. How?
We already have some technologies to do this, and the one that most people will be familiar with is the “serverless” cloud infrastructure made famous by AWS Lambda. The point of the serverless model is that a program can be loaded on any machine that has room, and you just keep track of where it actually is loaded, so that whenever someone goes looking for it, you can direct them. If that program is part of a larger structure, have it keep track of where other pieces of the same program are, in case they need to talk.
And that's basically the ADA (it's not, but never mind). It's serverless infrastructure designed for an operating system and home network. All you're really doing is running a program somewhere else, so that when it accesses a file that is actually a driver or service, it is doing so locally instead of across the network.
And that's it. With that, everything works! Victory fanfare, parades, we all go home. Right? Well, sort of. No, not really.
Clear the Air (Once Again)
You see, part of the problem with the ADA is that if you do it wrong, people will hate to touch it, let alone depend on it. They will avoid it, frankly, because people are lazy. And while you could, maybe probably, have the system still work when programmers avoid it, by doing a bunch of extra legwork across the network, if developers being lazy means that the experience sucks for the user, nobody will want to be a user.
Even if I'm right about how this all should work, if we do a bad job on the implementation, that's it - game over, we lose, things go back to the comfortable old ways with minimal changes. That's why I did a lot of extra thinking about how certain things should work, most of which sounds crazy out of context, because I'm planning for a future where OSes work very differently than they do today, and then on top of that, computers are combined in some way that developers don't yet understand. So if I were, today, to ask a developer to operate under MAD rules, or to just assume that my OS will work the way I say it does… they would be totally justified in telling me to piss off.
They'd have to be mad to believe in MAD, before it's proven. And that's kind of the point, at least of this post. Some of the things that make the MAD system work, might be implemented on top of a normal operating system. The API inheritance tree doesn't have to be the only way drivers work; it doesn't even have to be the canonical system way that drivers work. But if it's there in a form that people can use it, then other things I've said might start to slot into place.
Likewise, you can have an IPC mechanism that applications use, which is separate from the normal system IPC mechanism - it would just have to use some higher-level mechanism, like an IP:Port interface, with extra code wrapped around it. This IPC mechanism could let applications and services publish APIs into the same tree, and you could talk to them the same way you talk to drivers.
That thing about using a system call to reach the tree… well, if it's not part of the system, you'd simply have to use a standard server on each machine. Aside from that (and it would be inconvenient), things could work about the same, with it farming API requests off to the appropriate source.
And the distributed, agent-oriented MAD/ADA programming model? Part of its genius is that you don't write drivers and services to listen on the network - they all expect to be contacted by a process on the same machine (IPC, not RPC), which is the same way they would be used if the ADA doesn't exist in the first place (well, mostly - there's user authentication, security and so on that needs to be addressed, but forget that for now). Meaning you could design and run such services today, if the rest of the ecosystem existed.
Is all of this still a lot? Hell yes! But for all of that, it's still only half-mad, because it's less than half of MAD. Still feasible, still important, still better. But easier to understand, less intrusive, and I presume, vastly less intimidating.
Well, we'll have to see.