So I've said for a bit now that under the ADA distributed applications model and the Modular OS paradigm, applications post a bunch of APIs to a central directory managed by ADA servers and by the OS. Naturally it's important to talk about what that means. And while I will try to sound like this is all well-thought-out by me, this remains one of the many aspects of the project where I will cheerfully bow to people who know more about API design - and the many other related, difficult tasks involved - than I do.
To be clear: This discussion is categorized under the Modular OS portion of the project, and to a certain extent, is independent of the ADA model. That means that the ADA can exist without this, and this can exist without the ADA. But if you want to understand the whole project as it exists in my brain, it may be important to talk about these frustrating implementation details.
Types and MOS Standard Interfaces
Before I talk too much about the directory, I'd like to get a little bit of groundwork out of the way, and that is to ask: assuming you have a massive distributed system whose capabilities are listed in a directory, what sort of things does the directory itself contain? Perhaps the easiest thing to imagine is that the directory lists nothing more than addresses where you can find system functions--it is a massive list of strings matched to addresses. If you want to invoke a particular method, in other words, you only need to send that request to a certain address, along with all necessary parameters, all formatted appropriately.
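To make that picture concrete, here is a minimal sketch of a directory that is nothing more than strings matched to addresses; the paths, addresses, and the invoke() helper are all hypothetical placeholders, not a proposed API:

```python
# The directory as nothing more than strings matched to addresses.
directory = {
    "/API/GUI/CreateWindow": ("10.0.0.7", 4200),
    "/API/Sensors/ReadTemperature": ("10.0.0.12", 4200),
}

def invoke(path, *params):
    """Look the name up, then send a formatted request to that address."""
    host, port = directory[path]
    request = {"method": path, "params": params}
    print(f"would send {request} to {host}:{port}")  # network send stubbed out

invoke("/API/Sensors/ReadTemperature", "cpu0")
```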
This gets complicated when you want to do more than simple projects, for a number of reasons.
One very important topic, for example, is type safety: sooner or later, you are going to be returned (or sent, as a parameter) a data object from a remote procedure call, and that mere fact is more of a problem than it may seem. Attaching methods to types, for instance, is something that is done when you create your program, not when you run it; object methods are a part of your program's code, not a part of the object per se. Nor would you necessarily want every object that is passed back and forth between application Agents to carry its own independent copy of the object's methods around, even if that would be the most portable option; that would be data-intensive and prone to security problems, as each data object at rest or in transit would be another chance for malicious actors to alter your code and inject some nefarious payload.
But at the same time, the idea that object methods are compiled into your programs creates significant issues in a distributed system. It becomes certain that eventually, due to a version difference, old and new code will view the same data stream differently, and running old methods on new objects will create an incorrect result. Sure, this won't happen within a single ADA application, assuming all affected Agents are recompiled every time the master program's object definitions are changed. But whenever and wherever you are depending on third party applications and libraries, there is the possibility that a breaking change in the definition of an object means that new and old code won't interpret the same data object the same way.
In order to address this issue, it has been my belief for some years that the system directory will also contain a list of canonical types, and if an Application fetches a canonical type from this list, it fetches the up-to-date code for that type's methods, to be used in place of compiled-in object methods. That means that if two applications developed years apart both reference the same canonical type, they will use the same method code to interface with the data object per se.
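As a very small sketch of the idea, assuming a hypothetical fetch_canonical_type() lookup and a made-up Document type (neither is part of any real design):

```python
# The directory's canonical-type list hands back the current method code,
# which applications bind in place of any compiled-in copy.
CANONICAL_TYPES = {
    "/API/Types/Document": {
        "word_count": lambda obj: len(obj["text"].split()),
    },
}

def fetch_canonical_type(type_uri):
    """Return the up-to-date method table for a canonical type."""
    return CANONICAL_TYPES[type_uri]

doc = {"type": "/API/Types/Document", "text": "hello modular world"}
methods = fetch_canonical_type(doc["type"])
print(methods["word_count"](doc))  # two apps written years apart share this code
```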
Of course, that's not a panacea. There are frequently changes to APIs that change how methods work, their assumptions and guarantees; in that kind of case, the old code would expect new methods to provide old-style results instead of what they actually will produce. Left alone, this continues to be a case where old and new code collide and produce erroneous results. And that, of course, is why I never planned to leave things there.
In fact, there are a few different problems with having exactly one list of canonical types. One is the versioning problem above, but there is also a vendor problem. Suppose that you have two competing vendors each providing a backend that, for instance, implements a GUI. In order for the GUI to work in general, both of these vendors must implement a shared, standard API, but it's also reasonable to expect that each of these vendors will have advanced features that require another, more expressive API. When you receive a data object from one of these vendor-specific APIs, it may contain more information than the basic standard allows for, and as such, you will need to use that vendor's methods to ensure that the extra data is properly handled, even if you are only using methods and functions from the standard-compliant API. And, naturally, these competing vendors' API extensions are unlikely to be compatible with each other.
This kind of problem is already addressed in some advanced programming languages, such as Python or C#. Because certain type information is embedded in the objects, if one type inherits an interface, you can treat the object in code as an “object implementing this interface”, no matter what its actual type is, and that solves our problem above. However, to my (amateur) knowledge, these languages do not attempt to provide a directory sorted by type, and therefore, there is a problem here that they largely leave unsolved, one which the MOS project must address to accomplish my goals.
Specifically, the modular OS requires that Applications are able to specify an interface API and receive, from the OS or ADA server, a list of providers of that API. Suppose, for example, that those two competing GUI vendors exist on the same system, and suppose that the application being run isn't targeting either vendor's specific API, but rather, targets the core OS standard. When the application is first run on this system, the application (and by extension the user running it) needs to be faced with a question: which API provider do you wish to use to implement your GUI? And in order to be faced with this question, the listing of providers for the GUI should include both vendors, even if they are each providing vendor-specific implementations of the underlying API interface. When you talk about listing canonical object types, then, these two competing vendors both implement the canonical GUI API interface, in addition to providing their own vendor-specific version of that interface, and so both must be listed when you ask for a basic GUI API object.
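A toy sketch of that provider query might look like the following; the provider table and its fields are assumptions made purely for illustration:

```python
# Ask the directory for an interface, get back every provider of it,
# including vendors who also expose their own extended APIs.
PROVIDERS = {
    "/API/GUI": [
        {"vendor": "VendorA", "uri": "/API/GUI/VendorA", "versions": [3, 4]},
        {"vendor": "VendorB", "uri": "/API/GUI/VendorB", "versions": [4]},
    ],
}

def list_providers(interface_uri):
    return PROVIDERS.get(interface_uri, [])

# The application (and by extension the user) now has a choice to make.
for p in list_providers("/API/GUI"):
    print(f"{p['vendor']} implements /API/GUI, versions {p['versions']}")
```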
If this sounds trite, it's probably because you haven't yet understood that I am an idealist, and arguably a perfectionist. It may sound trivial to list various vendors, and various versions, of API interface providers, as child nodes under the core standard, within the directory. This would let you not only use competing vendors to implement the same interface, but would let old programs continue to use old interface versions (in case something has changed) while newer programs can use the modern interface.
But wait, that bit about versions contradicts what I said earlier, doesn't it? You want new and old programs alike using the same codebase, not different ones, don't you?
There is a distinction here: I said that old interface versions, not old code, would be listed in the directory. That means, for instance, that when a library is forced to change how a function works, in order to fix a security vulnerability, it is expected that the library will provide shims that translate calls aimed at the older API to calls compatible with how their code now works. These shims use the assumptions and requirements of the older API version, but may error out if the program does something unsafe. This is still not a true cure-all, because sometimes the unsafe behavior is actually part of program logic and can't be removed, but it significantly closes the gap, and allows API interfaces to evolve in form and function without necessarily breaking old code.
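Here is a minimal sketch of such a shim, assuming a hypothetical read_record() call whose new version tightened its guarantees:

```python
# The new implementation refuses behaviour the old API silently allowed.
def read_record_v2(record_id, *, validated):
    if not validated:
        raise PermissionError("v2 refuses unvalidated reads")
    return {"id": record_id, "data": "..."}

# The shim listed under the old interface version: same shape as the old call,
# translated onto the new code, erroring out only on now-unsafe behaviour.
def read_record_v1(record_id):
    return read_record_v2(record_id, validated=True)

print(read_record_v1(42))
```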
When I say that I am an idealist, however, that extends beyond ensuring that this kind of capability exists in the system directory, into describing its form.
Directory Nodes and the Virtue of Canonical Shorthand
All of this brings us back to the question: what is actually stored in the system directory? What do you actually get when you query the directory and get information back about a particular node? At the very least, there needs to be some kind of type data there. If you were to fetch a GUI API interface, for example, you would need to know what methods to expect on that object, what types are expected for parameters and return values of those methods, and what data may be contained within the object itself, along with any type data for that data if it is accessible.
But the node in the system directory also has other data specific to what it represents. If the node represents an API type, for example, you may want a listing of the versions and vendors provided for that type. If the node represents a file on disk, you may want metadata about that file rather than its contents. In other words, the directory node is its own object with its own data, and is simply another standard object type. And because the directory itself is a hierarchical tree, one of the functions of this directory node object will be confirming that what comes after it in a request string is valid. If you take an object that has no children and attempt to get a child object from it, that is invalid, and equally, if you take a node that has no alternate versions and attempt to get a different version of it, that is invalid.
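Sketched as code, a directory node might look something like this; the exact fields and method names are placeholders, not a finalized design:

```python
class DirectoryNode:
    """The node is its own object: it knows its children, versions, and metadata,
    and it is the thing that decides whether the rest of a request is valid."""

    def __init__(self, name, children=None, versions=None, metadata=None):
        self.name = name
        self.children = children or {}   # child nodes, if any
        self.versions = versions or []   # alternate versions, if any
        self.metadata = metadata or {}   # e.g. file metadata, vendor listings

    def child(self, name):
        if name not in self.children:
            raise LookupError(f"{self.name} has no child {name!r}")
        return self.children[name]

    def version(self, v):
        if v not in self.versions:
            raise LookupError(f"{self.name} has no version {v}")
        return f"{self.name}:{v}"

gui = DirectoryNode("GUI", versions=[3, 4])
api = DirectoryNode("API", children={"GUI": gui})
print(api.child("GUI").version(4))
# api.child("Sound") or gui.version(1) would be rejected by the node itself.
```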
This is required to be a function of the node, and not a centralized system function, because the system itself is distributed. You may not know until the first time you query the directory whether your specific combination of Application ID and running user ID is permitted to perform a specific operation. If you query an application and determine that you are allowed to, for instance, list all the files stored within it, you may be directed to another piece of physical hardware in the system that actually contains those files, and the provider there may not allow you to actually read any of those files. Likewise, you may be permitted to read a listing of methods available on that node object but not be allowed to call any of them, or you may be allowed to list the child nodes under that directory node but not be permitted to use any of them.
Handling directory permissions on this cascading basis is a distressingly un-optimized solution, and there may be no optimized solution, but you can see how the distributed nature of the system all but requires it, because each independent Application server may have different access controls.
But one of the sub-topics here that I find interesting is the use of shorthand to minimize string pattern matching, especially when dealing with standard queries. For example, if the list of API versions is always stored under the sub-directory “VERSIONS” within the directory, every time you need to request that list, you must check that an eight-character string exactly matches (or worse, loosely matches) the canonical name. It only gets worse if you try to make that mandatory identifier string something that no Application programmer would ever use themselves; as an example, Python likes to put two underscores at the beginning and end of certain reserved words, which would take the eight-character string VERSIONS and produce a twelve-character string, __VERSIONS__, which must be exactly matched on every check.
Given that something like the Versions subdirectory is actually a mandatory, system-designated resource, and given that latency is a critical problem in distributed systems, it seems reasonable to use shorthands to minimize characters in a directory path string, and this is a topic that I actively enjoy theorizing about.
For instance, I've mandated in this post already that type information be provided very frequently. It is attached to data, to parameters, as part of directory nodes, and it is also listed in the directory itself, meaning that a canonical type string is an ADA-compatible URI string. Assuming that the system directory starts from a common base, you would generally expect that all type strings begin with “/API/” or similar. And while five characters is not an offensive quantity, it's a lot to ignore repeatedly when reading a list of types. But this goes beyond API calls; for instance, it will be very common for Applications to reference a URI that is relative to their own root within the directory, or relative to their user's directory root. If all these calls needed, for example, “/Applications/Example/” or “/Users/GenghisKhan/” prepended to the URI, this would need to be parsed and re-parsed constantly, each time ensuring that the strings exactly matched.
Instead of starting every ADA URI from the root of the system directory, it is my expectation that the first character in the URI determines which of several possible directory roots the request starts from. I'll use placeholders here, but for example, “%GUI” may be short for “/API/GUI”, or “@Documents” may start with the currently running user's root, accessing the subfolder Documents. “$Cache” might be the cache directory for the currently running application, while of course, “/” remains the symbol indicating that you start at the full system root.
Likewise, once you have a directory node and are trying to perform an operation on it, it only makes sense to use shorthand for some functions, not least because it could prevent namespace collisions. For example, you may want to use an internal method name that is, in some contexts, a reserved word, like Versions or API. If you wanted to call the method “API” on your application, that string may look like “$.API()”, whereas if you instead wanted to get a listing of the APIs exported by your application, that request string may look like “$API/” or “$%”. Because the “.” operator would resolve to a method name (whose canonical name might be, for instance, “$Methods/API” or similar), there is no ambiguity between the strings. And if you are trying to find a specific version of an API, perhaps that will be found under "%GUI:4".
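As a toy illustration of those shorthands (the sigil table and the example roots below are placeholders, not a settled syntax):

```python
# Each leading character selects which directory root the request starts from.
ROOTS = {
    "%": "/API/",                    # %GUI       -> /API/GUI
    "@": "/Users/GenghisKhan/",      # @Documents -> under the running user's root
    "$": "/Applications/Example/",   # $Cache     -> under the running app's root
}

def expand(uri):
    if uri.startswith("/"):
        return uri                   # "/" still means the full system root
    return ROOTS[uri[0]] + uri[1:]

for u in ("%GUI:4", "@Documents/notes.txt", "$Cache", "/API/GUI"):
    print(u, "->", expand(u))
```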
There may, of course, be a problem with readability depending on how you implement this, and I acknowledge this is a limitation, especially when dealing with the basic ASCII characters present on your average keyboard. Too many basic symbols look like each other or like letters; $ and S, the various bars like \|/, the various dots like .:;, and so on. Presumably if it became common to use specific Unicode or advanced characters for path operators, you would start finding them on keyboards, but one way or another, I'm sure that it's a solvable problem, if not one for which I have a genius answer prepared.
The point of these symbols is truly only to act as a bridge between three requirements: these path strings should be short, human-readable, and unambiguous. The various operating systems that exist today have largely eschewed the idea of advancing the language of path strings, and I think that's a mistake. Using long names for concepts that are unambiguous in a given context is not ideal for either readability or fast parsing. If you can represent the same concept with a single reserved character, that's preferable to matching long strings. Granted, it's only really critical here because of how often we'll be performing these parsing operations one after the other, but I still think it's important.
Honestly, if the perfectionist in me had my way, I'd also compile some path requests directly into domain-specific function calls, so that this parsing was done at compile time and not runtime, for instance, having versions of the ADA path search or call that are specific to various types as implied by the operators used on them, but I'm not going to insist on that. Goodness knows, this concept is complex enough as it is!
Shims, Type Conversions, and CODECs
One of the interesting consequences of everything I've said above is that the filesystem coexists with type information and objects containing methods. It is therefore plausible that when you can be assured a file on disk matches a system type, that file on disk can have methods as though it were a data block retrieved from a program. Which is convenient, because in order to retrieve the data from that file, you need a program which will return it as a data block.
In other words, it is to be assumed here that the system directory will allow you to make the border between data-at-rest and application data very fuzzy. If you wish to store data objects at rest on a disk, or you wish to translate a stored data object into a live version, you almost don't need to worry about it at all. You could, for example, have an image file with the same kind of methods that an uncompressed image in memory would have. (It might be difficult to make a clear distinction with syntax alone, between changing the at-rest version of that data, and changing a copy of the data, but I have faith it can be done.)
All of this comes from the public type directory, and specifically, the fact that these public types include the type's method code for distribution to Applications. The way programs currently work under modern operating systems presumes that each application will figure out its own way to translate data formats into live data, which means that for certain standard formats, you can have many applications each using different third-party libraries of various quality all to accomplish the same task. And while you can still have multiple providers of the same API under MOS, this is no different from dealing with live objects: the fact that these types are public means that you can have old programs use new code, or keep multiple programs sharing the same code, as long as the providers you are using offer the API shims that the old programs expect.
But there are more uses for API shims than this. I've said that you can use API shims to keep compatibility between new and old versions of the same software, and perhaps you read between the lines and might have realized that competing API vendors can create shims that let them work on each other's data objects, extracting and encoding information that is stored differently by the competitor. If that seems odd, it is; I am talking about Vendor A providing a shim compatible with Vendor B, such that if you are using Vendor A as your back-end and someone requests advanced functions using Vendor B's API, Vendor A can still provide them, while presenting the requesting program an object that claims to be from Vendor B. This only makes sense if these shims are completely compatible - but when the API, its requirements, its assumptions, and its guarantees are all public knowledge, that isn't an impossible task. Difficult and perhaps unlikely, but not impossible.
But perhaps more interesting than that: API shims can be used to convert various objects into various other objects, by allowing one data type to masquerade as another. This happens a fair bit when we are talking about serializing data objects to and from text, for instance, but in our case it's equally reasonable to convert a live data format into an at-rest data format, and vice versa. In that case, for example, a provider of an image manipulation API may provide a shim that allows you to treat JPEG images stored on disk as though they were an instance of that live data object. Or, the type for at-rest JPEG objects might include a shim that allows you to encode any live image of the standard type into that form. Or… it's possible that I have those two examples backwards. That sort of thing depends on implementation and finalized design, and to be frank, there's no reason for me to care which way it goes at this stage.
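A rough sketch of that kind of conversion shim, with every type and method name invented purely for illustration:

```python
class LiveImage:
    """Stand-in for the canonical live image type."""
    def __init__(self, pixels, width, height):
        self.pixels, self.width, self.height = pixels, width, height

    def resize(self, w, h):
        return LiveImage(self.pixels, w, h)   # placeholder behaviour

class JpegOnDisk:
    """At-rest JPEG whose shim lets it masquerade as a LiveImage."""
    def __init__(self, path):
        self.path = path

    def as_live_image(self):
        # A real shim would decode the file; here the decode step is faked.
        return LiveImage(pixels=b"\x00" * 12, width=2, height=2)

thumb = JpegOnDisk("/Users/GenghisKhan/photo.jpg").as_live_image().resize(64, 64)
print(thumb.width, thumb.height)
```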
One interesting side topic in API shims is one I already touched on in my first post: ADA applications can target an API provided by an embedded library or application (for example, $/libraryX%XLib, that is, embedded Library X within the current application providing an API called XLib), and that embedded library or application might in fact do nothing except serve as a translator for those API calls to another format. This kind of arrangement would be very useful for ensuring forwards compatibility; you are guaranteeing that your application has a specific place where you can add your own shims, or equivalently, where hackers and power users in the future can add their own shims to keep your application compatible with libraries and applications that have changed since your last update. This would be most useful when applications are being abandoned; it allows third parties who do not have access to the application code to change how your program interfaces with the system years after maintenance has ceased. If GUI standards have changed, if required libraries have security flaws, or if your preferred library vendor has also quit and now your application needs to target another, it would be trivial in all these cases to add shims.
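A tiny sketch of such an embedded translator; XLib, ModernGui, and the call names are all hypothetical:

```python
class ModernGui:
    """Whatever the system currently provides as a GUI back-end."""
    def create_window(self, *, title, decorated):
        return f"window '{title}' (decorated={decorated})"

class XLibShim:
    """Lives inside the application ($/libraryX%XLib in the notation above);
    it does nothing except translate the app's calls into the current API,
    and third parties can later replace its internals without the app's source."""
    def __init__(self, system_gui):
        self._gui = system_gui

    def open_window(self, title):
        return self._gui.create_window(title=title, decorated=True)

print(XLibShim(ModernGui()).open_window("Legacy App"))
```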
Obviously, for some more-secure applications and for things like multiplayer games, some things need to specifically not be shimmed or exposed externally, but that can be monitored by checking that the code remains in a known state. And because this choice to add internal shims to your application is done by the application developer themselves, it can provide as much or as little flexibility as the developer wishes.
Reaping the Benefits: Shell Scripts as Programs
So after having talked about what I want and why I think it's necessary, there's also an obvious question to be asked: is there really a significant benefit for all of this extra complexity? And I'd like to think that the question can already be answered “yes” from the above, but there's more to say about the benefits of this advanced system directory.
One thing that I have always assumed, but perhaps not made explicit yet, is that the system directory can be navigated by an interactive text-mode command shell, and the addition of all the type system information to the system directory makes it possible for shell scripts to become nearly as powerful as programs. I suppose I'm talking about something more akin to an interpreted language's REPL, like Python's, but with the added advantage that data from the filesystem and from external programs is just as accessible as data from your own program; either way, the result is the same.
Most importantly, this capability exists without needing to create specific code libraries and application versions dedicated to command-line tasks. For example, multiple people have tried to build shell- and script-compatible GUI frameworks for Linux, but they are all somewhat more awkward than doing the same tasks with a first-party library. Under MOS, in theory, you could use exactly the same GUI functions while scripting as you would while running a compiled application - though you would need to set up, for instance, Agents to handle your GUI memory.
Now, it's fair to frown and even balk at the idea of interactive shells having Agents, because Agents are explicitly and literally on another hardware machine, and multiple Agents will be running at the same time. But at the same time, most Agents are not actually the core of the application, and are not producing events apropos of nothing; they are awaiting external events and providing functions to the main application. So it makes a certain sense for the user of the Shell to simply be able to context-switch their command interpreter to be currently working from and executing commands on a different hardware node, with all of the other Agents, including what would otherwise be the application core, simply responding to events from wherever your command interpreter currently is. It only really gets complicated if more than one Agent is the source of new events at any given time.
In this way, you could build a complicated and interwoven application line-by-line, by switching to various Agents, defining code on them, then switching away and utilizing the functions you left behind. It would… perhaps be very slow. But it would allow you to explore the capabilities of your system, piece by piece.
Speaking of exploration, one of the nice things about having a directory of types as a feature of your operating system is that it allows programmers, power users, and shell scripters (some of whom are not properly programmers) to explore the system as it actually is, not merely trusting to theory. As such, I have always assumed that this interactive system mode comes with ways to translate APIs and type information into interactive text, a feature that is so obvious it barely needs to be said, but is also so exciting for power users and new programmers that its value cannot be overstated.
This “shell plus system API directory” allows you to simply interact with systems and data on your system, and to simply understand how your program's data works and what data you have been returned by various functions. Yes, you can write programs, but there are some tasks and some requests that should be commands, not programs. If you want to, say, read every temperature sensor on your system and output them all as text, that shouldn't require someone to make a program; the list of temperature API providers is a native function of the directory, and iterating through them, reading them and translating the result to text, are all standard shell operations. If you want to write a shell script that, for example, raises an alert every time a temperature sensor goes above a certain level, that's likely just as straightforward. You could write that as a program and compile it, but you could also just write a single command that performs the action for you, at least during your current session.
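As a sketch of what that temperature-alert command might boil down to (list_providers() and the sensor objects here are stand-ins, not real MOS calls):

```python
def list_providers(interface_uri):
    """Placeholder for the directory query; returns fake sensor objects."""
    class Sensor:
        def __init__(self, name, value):
            self.name, self.value = name, value
        def read(self):
            return self.value
    return [Sensor("cpu0", 71.5), Sensor("gpu0", 83.0)]

THRESHOLD = 80.0
for sensor in list_providers("%Sensors/Temperature"):
    reading = sensor.read()
    print(f"{sensor.name}: {reading:.1f} C")
    if reading > THRESHOLD:
        print(f"ALERT: {sensor.name} above {THRESHOLD} C")
```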
That is, ultimately, exactly what the premise of the Modular OS project always was. If the system has capabilities, you should simply be able to make use of them. If you need to read a sensor, the sensor should be providing an API that lets any user, and command interpreter, extract that data effortlessly. If you want to create a GUI window, you should be able to issue a command that creates a GUI window using the native system capabilities, even if making good use of it is somewhat more complicated than that. And the same with other things; if you wish to record sound, or play back sound, or convert text to sound, or convert sound to text, as long as those capabilities exist within the system, the existence of the capability should necessarily mean that you can issue a command to do that thing.
In fact, my ancient notes from the first years of the MOS/DCA project have random, difficult-to-parse thoughts on exactly this topic. I was thinking about narrowing the distinction between library functions and shell scripts; I thought, surely, if you just expanded the capabilities of your command parser, you could capture the data block being returned by a library function in a variable, and pass it as a parameter to another function. Those thoughts were naive; I was really thinking about reducing code duplication. The idea only began to make sense once the type directory was added to the Project, but my point is, the thought was there from nearly the beginning. (Actually, the papers aren't dated, so it may have been from some years afterward; I no longer remember, except that it is not recent.)
The Modular OS Unified System API
In this context, what exactly do I mean by the Unified API? Well, I mean that everything, from compiled applications, to shell scripts, to remote procedure calls, everything can be on the same page, in ways that current operating systems can't be. If you want a shell script under Linux to be able to read a JPG file and display it on screen, that's… complicated, unless you have the exact and specific tools already present on your system. It feels like it should be possible to just do, because other programs on your computer can do all of those things natively.
But the APIs between programs and the shell are not unified under Linux, nor under any other OS. The shell simply isn't meant to do anything and everything; it was meant to, first and foremost, load and launch other programs. Making a shell capable of doing ever more, making it more and more flexible, that makes less sense than making individual programs to do the specific things that you want.
Except that eventually, a shell that is considered feature-complete becomes the place where you make the system do what you want, and if you can't do what you want with the shell, it suddenly stops feeling complete. If the system can do something, but you can't simply command the system to do that thing, it feels like you should be able to, like the function should already exist. So administrators and power users sometimes feel cheated by the lack of a program doing the very specific thing they want, and they sometimes take for granted, or treat poorly, the people who create and maintain the programs they rely on. While using a shell, it feels like you should simply be able to do what the system can do.
Perhaps nowhere is this more obvious than with hardware. I am something of a tinkerer; I have many Arduino-compatible devices around, and an Adafruit MicroPython-based macro keypad sits constantly to the left of my keyboard. I have for many years followed with excitement peripherals such as keyboards with blinky lights, dials, and text displays, or side USB monitors. It feels like making use of these peripherals should be easy, but the systems were never meant to be played with on the kind of API level that I'm talking about. Peripherals on most operating systems still mostly depend on low-level busses, kernel drivers, and programs designed very specifically to interface with one piece or brand of hardware. They do not simply have their capabilities exposed.
In contrast, for example, suppose that a standard 16x2 text display is connected to your MOS system. This is a very cheap and very standard kind of external peripheral, and it usually connects to your system via one of a few low-level busses, such as SPI, I2C, or serial. On modern operating systems, if you want to make use of this display, you need a program that controls the low-level bus and sends the appropriate commands, and whenever that program or one like it isn't running, the display is useless.
Under MOS, I will assume that text-only displays in general have their own unique standard API, and potentially that small displays are a subset of that, and that the text display attached to the system can be recognized as being what it is. A small driver program would be required to translate the bus commands to the device, but once that is done, you can simply treat the text display as being a text display, and any application in your entire system that expects a text display can find it and output text to it. Obviously if they expect a lot of free space, that will be a bad idea, but if you want to, for example, use the display to output temperature data, that may be the kind of thing that doesn't need an explicit program, and can simply be done by issuing commands.
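As a small sketch of the "a text display is just a text display" idea, where the interface name and bus details are assumptions rather than a real API:

```python
class TextDisplay16x2:
    """Hypothetical standard small-text-display interface, backed by a stub bus write."""
    COLS, ROWS = 16, 2

    def write_line(self, row, text):
        assert 0 <= row < self.ROWS
        payload = text[: self.COLS].ljust(self.COLS)
        print(f"[bus write, row {row}] {payload!r}")  # a real driver talks to I2C/SPI/serial

display = TextDisplay16x2()
display.write_line(0, "cpu0: 71.5 C")
display.write_line(1, "gpu0: 83.0 C")
```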
The idea extends to other displays. Under current operating systems, there are only two metaphors for displays: either they are a part of the Desktop, or they are some specific device driver that has to be targeted specifically. But under MOS, if you attach a small display that isn't part of the desktop, it may still be treated as a display--though perhaps not a hardware-accelerated one--and all normal display functions should apply. If that display also has a touchscreen, then your normal display that is not on the desktop becomes a customizable button, or several. If you want to create that set of customizable buttons with commands instead of an explicit program, that can work.
In short, you can just play with the capabilities of the system. That is the goal, and it has always been. That is what gets me excited about something like the Modular OS project. It is why I have continued to return to it year after year. Because operating systems today simply do not let you play with the system's full capabilities. You can't interface that simply with hardware, or make use of system functions natively from the shell. You can't just treat your system like all of its abilities are at your fingertips.
But you should be able to. It's your system. You're in control. That's the ideal behind the MOS Project, and while that's difficult, and probably quite naive, I still think it's worth chasing after.