So I was originally going to write on another topic, and you know, I ended up going off on a side rant about something I otherwise wouldn't have really thought about, and I've come to realize that it deserves to be called out on its own. That topic is: The IP-Port-Protocol stack that we use to communicate with servers is really just about the worst form of remote procedure call that we could have, isn't it?
Now, when I say that, keep in mind I'm an author at heart - I like to make inflammatory statements to get people riled up, especially when the statement is true at heart but knowingly wrong in its particulars. You can always imagine, or find, worse ways than this to call a remote procedure. But I say it this way because most people wouldn't really think of IP:Port addressing as being a remote procedure call at all. It is how you consume a service, though, which ultimately means that you are calling one well-defined (well, maybe) function belonging to another program from within your own, frequently over a network.
Just on the merits of success, clearly the IP stack has done a lot for us. But we already expect more from it than it was designed for, and that's before I start talking about the many and varied requirements that I am coming to expect from an RPC mechanism, after exploring Project MAD.
The IP Suite: A Partisan Overview
So when I say “the IP-Port-Protocol stack”, it's not necessarily clear exactly what I mean. The Internet Protocol suite, which is the basis of literally the entire internet as we know it today, can be understood in a lot of ways and at various levels of depth, but I'd like to break the topic up into two pieces: Addressing, and communication.
The part that I'll summarize as Addressing contains numerous mechanisms, some of which are barely relevant to the point I'd like to make, but broadly, they let you translate a name into a series of numbers (which can be summarized as a single number, if you prefer), and then all of the hardware between you and the destination knows what to do to connect you to that number, and to connect others to your number. There is a service called DNS which translates friendly website names into an IP number, which is the official, canonical number you use to reach that website. Depending on how complex the service you're trying to reach is, that IP address may actually send you to a service that distributes packets and queries to a number of different handlers, to ensure that nothing gets overwhelmed, or to let you use a service more local to you instead of a single global server. (And I may be getting parts of this a little wrong, because it's complicated.)
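If it helps to make the addressing half concrete, here's a minimal sketch, using nothing but the Python standard library, of asking DNS to turn a friendly name into the numbers that everything downstream actually routes on (example.com is just a stand-in):

```python
# Ask the resolver for the addresses behind a name. getaddrinfo may return
# several results: IPv4 and IPv6, and possibly multiple hosts behind one name.
import socket

for family, _, _, _, sockaddr in socket.getaddrinfo("example.com", 443, proto=socket.IPPROTO_TCP):
    # sockaddr is (ip, port) for IPv4 and (ip, port, flowinfo, scope_id) for IPv6
    print(family.name, sockaddr[0])
```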
But it's interesting to talk about IP addresses in a local context, because for many home users, IP addresses aren't reserved for a given user or machine; they're temporary. And even when they aren't temporary, they're mostly arbitrary, meaning that unless someone tells you what an address is supposed to be, you have no context from the number itself. But it's inconvenient, and therefore rare, for local users to set up tools like DNS for their home network, so most of us end up dealing with numbers rather than descriptive names.
But the inconvenience goes a bit further: all network packets in an IP-switched network are targeted not only at a machine, but at a specific port on that machine. This port is simply another number, and although there are standards suggesting that several well-known ports should be used for specific protocols and applications, there's not actually anything stopping you from using the port normally reserved for websites to do almost anything else you want. Oh, it'll be inconvenient and confusing if someone tries to reach that server with a web browser, but the system itself doesn't care. This problem cuts both ways: you can use (almost) any port for (almost) anything you want, and you can't tell what a port is actually being used for just by the number in use. This becomes an even bigger problem when we start trying to manage complex data spaces with IP address hierarchies, but I'll get back to that in a minute.
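To put that in code: nothing below is special to the web except the number. This throwaway sketch binds a port (8080 here, since the "web" port 80 usually needs elevated privileges) and answers a made-up protocol of my own invention; swap in 80 and the system still won't object.

```python
# A server on a "well-known" port speaking a protocol nobody has ever heard of.
# The stack routes the bytes; it has no opinion about what they mean.
import socketserver

class NotActuallyAWebServer(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline().strip()
        if line == b"MARCO":
            self.wfile.write(b"POLO\n")   # the secret handshake succeeds
        else:
            self.wfile.write(b"WHO?\n")   # anything else gets a useless answer

if __name__ == "__main__":
    with socketserver.TCPServer(("0.0.0.0", 8080), NotActuallyAWebServer) as server:
        server.serve_forever()
```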
The second half of my description for the IP suite is “communication.” Once you've established a data channel to a remote server, you then need both computers to speak the same language across that channel, and this is known as a protocol. The Internet Protocol itself, of course, is one such language, one that is used to get your message to the right place; but once your message is in the right place, two pieces of software need to agree on how you make requests and how results are returned. There are several existing protocols, and the most used ones are well known, but it can still leave you feeling ignorant of how your own machine works. If you know the IP number and port of a service, but not the right protocol to use, you can do nothing with it - but, as I said, the IP number and port are in some sense arbitrary. From a certain perspective, the internet is full of billions of holes, and it is your job to cup your hands and yell code phrases into exactly the right hole. Yell the wrong code phrase or pick the wrong hole, and nothing (or nothing useful) happens. Pick the right hole, and you'll hear a code phrase yelled back at you from the other end of a pipe.
The end result of all of this, functionally, is that if you have the right combination of IP number, port number, and protocol, you can call one function on a remote machine. It is, in other words, the worst possible remote procedure call stack.
If that doesn't sound right to you, it's probably because many servers have come to expose a lot of functionality on a single port. That's all down to the parameters you pass the service when you make a request; the exposed function that you're calling is itself a directory of other functions. Perhaps, one might argue, that means that you are really exposing far more than a single function, but that argument is flawed.
In fact, generally speaking, it is the job of anything that wants to provide services to invent a new remote procedure call mechanism. It is your task to take a data block as input and decide what function to call, with what arguments, and, if data is returned, what to do with that return value. It is one of many places in modern computing where the solution to a problem is to force programmers to find their own solutions to that problem. One significant consequence: every programmer is forced to find their own ways to parse, validate, and verify incoming method parameters from that data block, which provides a massive and porous surface where programmer mistakes can become exploitable program vulnerabilities - and frequently, those vulnerabilities affect general-purpose services that are capable of doing a lot of things.
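Here's roughly what that looks like in practice - a sketch, with made-up names, of the parse/validate/dispatch boilerplate that every such service ends up writing for itself, and where every shortcut is a potential vulnerability:

```python
# "Bring your own RPC": one service's hand-rolled request handler.
# Every service reinvents these same steps, each in its own slightly different way.
import json

def get_balance(account_id: str) -> dict:
    return {"account": account_id, "balance": 0}

HANDLERS = {"get_balance": get_balance}   # the ad-hoc "directory" of callable functions

def handle_raw_request(data_block: bytes) -> bytes:
    try:
        request = json.loads(data_block)            # 1. parse the incoming blob
        func = HANDLERS[request["method"]]          # 2. pick a function to call
        args = request.get("args", {})
        if not isinstance(args, dict):              # 3. validate, by hand, hopefully thoroughly enough
            raise ValueError("args must be an object")
        result = func(**args)                       # 4. call it
        return json.dumps({"ok": True, "result": result}).encode()
    except Exception as exc:                        # 5. improvise an error format, too
        return json.dumps({"ok": False, "error": str(exc)}).encode()
```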
If you want to build a system on top of remote procedure calls, the first thing you must do is not require everyone to reinvent remote procedure calls with every program they write. It is in this context that I want to talk about how the MOS/ADA subsystems think about remote procedure calls.
RPCs Require Better Communication
The Agentic Distributed Applications model, and the Modular OS concept built on top of it, both talk about listing what remote procedure calls are available. The list of procedure calls is used for three purposes: one, it tells remote callers what is available; two, it is used to verify that incoming requests are valid; and three, it is used to route each incoming request to the function that will handle it. It's worth noting that the IP suite in fact does not tell people what functions are available to be called on a server, and the IP suite itself doesn't verify or validate any incoming requests; if there is a service which can receive the request, that service handles it.
But the MOS/ADA RPC stack is expected to do more than that. Ultimately, it is the function of the RPC mechanism, not of the individual service's back end, to convert packed data into arguments for the function to be called, and to convert the function's return value and/or error condition into a packed data type for the return trip. This is known as not requiring programmers to reinvent remote procedure call mechanisms, or if you prefer shorthand, it is known as an RPC mechanism.
In order to do that, the system must have an awareness of data types, which is why data types are an important part of the system directory. Not only must you be able to convert incoming arguments to the appropriate data type, but you must be able to verify that the incoming data type is an acceptable match for the type expected by the exposed function. And by having those types be explicitly listed, the calling procedure can do certain checks before sending the request.
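None of this is ADA code - it doesn't exist yet - but here is a small sketch of the shape of the idea: the RPC layer owns a directory of exported functions with explicit argument types, and it, not the application, validates and converts arguments before dispatch. Every name below is invented for illustration.

```python
# A toy RPC directory: exported calls are registered with their argument types,
# and the dispatcher checks and converts arguments before the function ever runs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExportedCall:
    name: str
    arg_types: dict[str, type]
    func: Callable

DIRECTORY: dict[str, ExportedCall] = {}

def export(name: str, **arg_types: type):
    def register(func: Callable) -> Callable:
        DIRECTORY[name] = ExportedCall(name, arg_types, func)
        return func
    return register

@export("thermostat.set_target", celsius=float)
def set_target(celsius: float) -> str:
    return f"target set to {celsius:.1f}"

def dispatch(name: str, raw_args: dict) -> object:
    call = DIRECTORY.get(name)
    if call is None:                                   # purpose one: what exists?
        raise LookupError(f"no such exported call: {name}")
    if set(raw_args) != set(call.arg_types):           # purpose two: is the request valid?
        raise TypeError(f"{name} expects arguments {sorted(call.arg_types)}")
    typed = {k: call.arg_types[k](v) for k, v in raw_args.items()}  # convert to declared types
    return call.func(**typed)                          # purpose three: route to the handler
```

The same listing that drives dispatch can be handed to a remote caller, which is what makes the protocol queryable rather than memorized.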
(I suppose you could expect every function to take an undefined, fully variable data type as a parameter, do no type checking ahead of time, and expect the end programmer to dedicate the first lines of every function to finding out whether or not they were passed the correct arguments, the way, for example, C programs do with their command-line arguments… or you could compartmentalize those checks by having them done in an explicit precondition function, which may be an overridable hook, allowing exported functions to contain only that function's code. One of those sounds like requiring programmers to reinvent remote procedure calls to me, while the other sounds like having an RPC mechanism.)
Already, we can see incredibly clear distinctions between what I am calling RPC mechanisms, and raw server sockets that get piped straight into application code. There is a language to RPC calls, and if you want to understand that language, it can be explicitly queried, telling you exactly what the protocol expects and how to format your requests. Rather than shouting memorized code phrases at one specifically chosen hole among billions in a wall, there is a piece of paper taped to the wall telling you what to say into which pipe to receive what reply.
Of course, someone developing their own RPC mechanism to respond to queries on a raw server socket can do all this… and they can do it separately for every single application they create, because it isn't a defined mechanism, it is a private standard that they have decided on. It makes far more sense for this mechanism to be standardized and this directory to be machine-generated, because most if not all of the information is already there when you compile your program (or run your script, if the language used doesn't get compiled). As long as you agree on how to translate the information your program already has to declare into this directory listing, it should be relatively simple to implement.
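As a toy illustration of "the information is already there," here's a sketch that builds a directory listing by introspecting ordinary Python type hints; a compiled language would have the same facts available at build time. The functions and field names are mine, not a proposed standard.

```python
# Generate a callable directory from information the program already declares:
# function names, parameter names, parameter types, return types, and docstrings.
import inspect
from typing import get_type_hints

def describe(exports: dict) -> dict:
    listing = {}
    for name, func in exports.items():
        hints = get_type_hints(func)
        listing[name] = {
            "args": {p: hints.get(p, object).__name__
                     for p in inspect.signature(func).parameters},
            "returns": hints.get("return", object).__name__,
            "doc": inspect.getdoc(func) or "",
        }
    return listing

def resize(width: int, height: int) -> bool:
    """Resize the main window."""
    return True

print(describe({"resize": resize}))
# {'resize': {'args': {'width': 'int', 'height': 'int'}, 'returns': 'bool',
#             'doc': 'Resize the main window.'}}
```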
Relatively simple, of course, does not mean easy, or even simple. There are a lot of questions about types that need definitive answers and canonical standards. Today, before any of those questions are answered and those standards canonized, it would be relatively difficult and complex. To put it another way, Project MAD has always, from the very beginning, been about doing the hard work ahead of time so that other people can simply walk in and do the fun parts. This is known in some circles as creating an operating system, though perhaps not everyone would agree.
I would argue that once you have all of those standards, you can do everything that the IP-Port-Protocol stack can do, but with names instead of numbers so that you can read what you're doing, and with documentation that tells you what's available and how to use it. A protocol, in other words, that is not only machine readable, but human readable. One that not only allows communication at runtime, between processes that already speak the same language, but allows the programmer who created a program to communicate with users and other programmers in the future, making explicit their assumptions, needs, and intentions.
The cost in exchange, of course, is complexity and speed. As with many aspects of Project MAD, that slowdown might be a real problem, but equally, the speed we have now comes with fragility and ignorance. We have built a lot of very important systems on top of tools that only barely allow us to communicate; we've discovered that we don't need much structure in order to make something like the Internet work. But as we have wanted to create more powerful and flexible tools, we've chafed at the restrictions. Better communication is needed.
A Word or Two on Addressing
So I promised to come back to a bit about the Addressing side of the IP-Port-Protocol stack. As I said at the beginning, this entire topic originated in a blog post about other things, and specifically, I was trying to list all of the problems that Project MAD tries to address with its solutions. One of those is the confusing morass that is IP addressing within application containers and private networks.
For those not aware, application containers are used to explicitly contain a piece of software and its dependencies, so that if two applications each require different versions of the same library or tool in order to function, they will not accidentally select the other app's incompatible version in place of their own. They likewise won't share some system configuration bits, which is useful if two apps each require the same embedded tool but configure it differently. There is another conversation to be had here about configuration, but we'll get back to it in a bit.
Suppose for the moment you had a home server farm, with several hardware servers, each running hypervisors that split the system into multiple virtual servers, and each virtual server running IP-based application containers, with some of those containers incorporating their own private IP networks containing multiple internal containers. It's fair to say that there would be a lot of IP addresses being used to direct traffic on this distributed system: each hardware server's network adapters have their own, each virtual server's network adapters have their own, each container and sub-container has its own.
Part of the very real problem with this setup is that all of these IP addresses are arbitrary, and due to the way their scopes work, it's entirely plausible that if you leave things to automated processes, random containers on different virtual servers may end up with the same internal IP address. They shouldn't be visible to each other unless you've done something wrong, and perhaps no part of the distributed system as a whole will malfunction, but as an administrator, programmer, debugger, or power user of the application containers, it is frustrating that what is supposed to be a unique identifier is neither unique nor identifying. An IP address alone tells you neither what is at that address, nor whether it is used exactly once, even within your own, privately operated server room. And changing those internal IPs so that everything is unique may not be as simple as an amateur would like, especially if those unique IDs are only useful for reading logs and chasing down problems in a complex and distributed application.
Part of what I hoped to guarantee with the MOS System Directory is that across an entire, theoretically infinite distributed system, it is still plausible to uniquely identify every resource, by having explicit nested scopes. Because systems list their internals, even privately, as long as you can distinguish peers from each other within their own context, you can create a correct, unique, descriptive identifier for every resource - meaning that at every level of context, you can uniquely identify all children of your peers (even ones you have no access to), and if you have permission to explore the peerages going up the stack and outwards as well as down, you can discover your system's parent's long-lost uncle's step-son's cousin's application container's database, and the fully qualified unique identifier for it and for you will tell you exactly what your relationship is and isn't.
At the same time, because the unique identifier is based on nested scopes, you don't need a fully qualified unique ID to describe anything, not unless you share no relationship except having the same system root. In fact, if you know the scope you're targeting, you can convert a local directory reference into an absolute one, and vice versa. It's still wise, of course, to use the fully qualified ID once you start going up in scope, in case there are subtleties you are missing, but it's not necessarily technically required. Once any two resources share a scope, that scope is the highest you need to go in order for the two to identify each other and communicate across the network. Even if there is a faster way to communicate, the two fully qualified IDs demonstrate the maximum number of scope changes required for one to reach the other.
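To make the nested-scope idea concrete, here's a small sketch; the path-style syntax and the scope names are my own invention for illustration, not anything canonical from Project MAD:

```python
# Scoped identifiers: a resource's full name is the chain of scopes above it,
# so it is readable, and unique by construction within a shared system root.
def qualify(local_ref: str, scope: list[str]) -> str:
    """Resolve a scope-relative reference ('..' climbs one scope) into an absolute ID."""
    if local_ref.startswith("/"):
        return local_ref                      # already fully qualified
    parts = list(scope)
    for step in local_ref.split("/"):
        if step == "..":
            parts.pop()
        else:
            parts.append(step)
    return "/" + "/".join(parts)

def shared_scope(a: str, b: str) -> str:
    """The deepest scope two absolute IDs have in common - the highest either must climb."""
    common = []
    for x, y in zip(a.strip("/").split("/"), b.strip("/").split("/")):
        if x != y:
            break
        common.append(x)
    return "/" + "/".join(common)

print(qualify("db/orders", ["home-farm", "vm2", "webapp"]))
# /home-farm/vm2/webapp/db/orders
print(shared_scope("/home-farm/vm2/webapp/db", "/home-farm/vm1/cache"))
# /home-farm
```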
It may be somewhat crude of me to say that I believe an addressing protocol should allow you to give, get, and understand the address of a thing. It may be a bit rude to suggest that the IP protocol's arbitrary identifiers make it difficult to feel like you are in charge of the layout of your system. It is unquestionably arrogant of me to suggest that I could do better than the technology that united and changed the world. But what I don't think is arrogant, rude, or crude, is pointing out the flaws in any technology, especially where those flaws have consequences. Any system built on my ideals will have its own flaws, and I have to trust that some person smarter than me will someday figure out a way to address them... no pun intended.
And IP has its flaws. All of this is without getting into the limitations of the still-ubiquitous IPv4, which hasn't had enough addresses for a very long time and has felt for a while like it should just be replaced by its successor, IPv6, to prevent some very real, and some mostly theoretical, problems. Frankly, that's not my problem with it, not as a power user of home networks, nor as the architect of Project MAD.
My problem is that the IP stack as a whole was a very early (starting in the 1970s) way to do remote procedure calls, and all of computing has advanced and changed since then. Even if you don't like my solutions, look at the general problems we are facing. We want to combine systems together, tightly, in ways the designers of the internet protocol did not foresee at the time. If you don't like my formulation of the general case problem that needs to be solved, find a better one. But ever since I've had this general case problem that needed solving, my mind has been seeking out solutions that fit it. You can build a distributed system on top of IP using yet another layer of abstraction, and maybe that'll be necessary. But for addressing parts of a system, there are better ways than depending on arbitrary and not-guaranteed-unique numeric identifiers.
One Last Topic: Distributed Configuration
So I just barely touched on the topic of distributed configuration when talking about application containers. It's an interesting side topic, one that deserves a full blog post of its own, but I believe I've teased that post before, so it's better to have a brief discussion now, in case the topic keeps slipping my mind in future posts.
Configuration in a distributed system is tricky, because at any given time, an application is under multiple, potentially conflicting policy and rule domains. The user running the application may have rules and policies. The hardware device that the application is running on may have certain specific configuration rules necessary to make it work. The distributed system may have an administrator that is enforcing certain rules and policies. The application itself may have multiple default policy sets, for example, one for power users and one for non-technical ones. And an application may be embedded in another application, and the parent application may have specific rules that it needs the embedded application to follow.
But also, general rules and policies are not the only items of interest. Anyone who has any claim over an application, such as the hardware, user, administrator, or parent application, may call out that specific application and make an explicit configuration change to it, in order to ensure proper functioning. A user may wish to start an instance of the application with a special configuration, whether that's part of a script, or a deliberate choice made by an interactive user in a shell.
It is my intention that the Modular OS, and the Agentic Distributed Applications model specifically, have an explicit mechanism for gathering and resolving all configurations present on the system. The resolution step may be accomplished by a callback hook - that is, an application can choose to apply configuration changes in a specific way, or even may choose to misunderstand or ignore configuration changes - but there needs to be a mechanism that presents the application with a list of all configurations, and there ought to be a standard way to resolve any disputes.
Although I'm not quite happy with it yet, the working name for this mechanism is the Standard Environment Variable Evaluator, or STEVE. STEVE is an important part of the ADA model, and arguably fundamentally necessary for ADA to work properly, just as STEVE makes little sense outside of a distributed context.
This topic, about how best to handle distributed configuration, is one that will be determined, in the end, by people who know the topic better than I. No matter what I try to say about it now, it is a topic that will get revisited and revised once good intentions meet practical concerns, and as such, feel free to take much of what I say below with a very large grain of salt. But naively, I believe that there is a hierarchy of default and explicit desired configuration states. The least important configuration, for example, is the program default built into the code, which is used only if nothing else is configured. On the other hand, an explicit decision made by the user when starting the program is the highest priority among its peers in the configuration hierarchy.
Beyond that, though, all defaults yield to all explicit configurations, and both defaults and explicit configurations are ordered by how specific the configuration is. Configurations are less specific if, for example, they refer to an application category broadly, like “Start all windows maximized.” More specific would be a configuration referencing a specific application in general, then a specific application when run by a particular user, and then a specific instance of an application when run by that user, such as when the application is passed a configuration parameter when it is launched.
But there is a major exception to this rule. One of the things that came out of Project MAD is that, in a distributed system, every component has the ability and authority to refuse service--indeed, it's frankly the only control you have in a distributed system--and as such, everything that has dominion over an application, such as the hardware it's operating on, the user, or an administrator, can kill a misbehaving application if it does not follow policy. Consequently, when we talk about configuration, there is a distinction between desires and rules. A program caught not following the rules may be killed, but a program that merely disrespects the desires of others will not be.
In the hierarchy of configuration options, then, any applicable rule should take precedence over a stated desire. If, say, the Administrator refuses to allow full-screen applications, and the user explicitly tries to start an application in full-screen mode, the dilemma when parsing the configuration is clear: do you allow the user to do something that may attract the administrator's ire, possibly causing the application to be forcefully quit, or do you prevent the user from doing what they have explicitly said they want to do?
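Naively, and with the caveat that the field names below are mine rather than anything STEVE has defined, the resolution order described above might look something like this sketch: rules outrank desires, explicit choices outrank defaults, and ties break toward the more specific source.

```python
# A naive configuration resolver: sort every applicable entry from weakest to
# strongest, then let stronger entries overwrite weaker ones, key by key.
from dataclasses import dataclass

@dataclass
class ConfigEntry:
    key: str
    value: object
    is_rule: bool      # a rule can be enforced (by killing the app); a desire cannot
    is_default: bool   # a built-in or category default, as opposed to an explicit choice
    specificity: int   # 0 = whole category of apps ... 3 = this particular instance

def resolve(entries: list[ConfigEntry]) -> dict:
    ranked = sorted(entries, key=lambda e: (e.is_rule, not e.is_default, e.specificity))
    resolved = {}
    for entry in ranked:            # later (stronger) entries overwrite earlier ones
        resolved[entry.key] = entry.value
    return resolved

print(resolve([
    ConfigEntry("fullscreen", True,  is_rule=False, is_default=False, specificity=3),  # user asked for it
    ConfigEntry("fullscreen", False, is_rule=True,  is_default=False, specificity=0),  # administrator forbids it
]))
# {'fullscreen': False} - the rule wins, whatever the user explicitly wanted
```

Whether the application honors that outcome quietly, warns the user, or refuses to launch at all is exactly the dilemma above; the sketch only shows the ordering, not the policy.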
Of course, it's even more complicated than that, because we're talking about a distributed system, which may have multiple competing sources for various software and hardware services. Suppose for example you have your private phone connected to a desktop machine, in order to use the desktop's monitor, CPU, GPU, and input devices to play games. The system administrator for that desktop machine forbids full-screen applications, but your phone is not subject to that system administrator; it is completely under your control. Your phone, however, may be configured not to display desktop-style applications on its own screen, and for good reason - it may be too small, too low-resolution, and in the wrong orientation, and its processors may be lackluster and prone to overheating. In that context, if you want to launch a full-screen application, do you try to launch it on the desktop screen where that isn't allowed, on the phone screen that it isn't suited to, or neither?
My thoughts on this matter are incomplete, but there is one thing that I am certain of: it should not be down to a programmer to find and read configuration options across a distributed system, nor to determine what the nominal hierarchy is. An operating system for a distributed system should find and request all relevant configuration options and present them to the application for parsing. This needs to be done before a distributed application selects what hardware it is going to use, because the rules and policies governing the hardware will be used to determine which hardware is selected.
A bonus that comes from the system handling these matters is that several lists of configurations can be made explicit: the total configuration on a system, with sources; the total configuration as applied to a specific application; and, depending on the configuration mechanism, a list of what configuration options the application itself requests, including requests from all internal libraries and embedded utilities, which tells you what configuration changes you can make to influence its behavior. You can see a list of what configuration changes a user has made, and which of them have and have not actually been queried by applications in the last day, week, month, or year--indicating that they may be useless, malformed, or out of date.
But likewise, a system-wide configuration parser can validate a list of configuration options, providing a list of warnings and errors that indicate some options are malformed, out of date, or nonexistent. It can simulate how configuration would be applied if you used it on a piece of hardware, on a system you're connected to. It can tell you how Administrator-applied policies have affected you and your programs, and how your own configurations have affected you. It can tell you when a configuration option made explicitly simply matches the default. It can tell you when rules and configuration options have changed, in ways that affect you.
These kinds of features are only possible in a system where handling configuration is a standard process. As with the other mechanisms I've described in this blog post, the current solution is “Bring your own,” or in other words, there is no configuration mechanism. Even centralized systems like the Windows Registry and Windows Active Directory don't give you all the information you could possibly want, and too many applications have completely internal and private configuration files that will never be explained to anyone, meaning the applications basically cannot be configured even though they were designed to be.
These are all systems that solved specific cases of the problem when they needed to, but nobody tried to solve the problem in the general case. Perhaps if they had tried, it would have worn out its welcome wherever and whenever application developers wanted something different. I believe that it is laudable to at least examine the problem's general case and search for solutions - but more than that, in the case of a distributed system, I think it will be necessary. To not have a standard solution to this problem will put undue burden on application developers.
Wrapping Up
I'll consider it fair if I get feedback on a lot of the points I've raised in this blog, not merely this post, as being incorrect on a technical level. Others know more than me about a lot of the things I've discussed. But the point of view of this blog, and the reason why it's worth writing, is that I see challenges ahead that we're not ready for. Building a system on top of remote procedure calls requires a lot, and our current model of clients and servers is a poor excuse for that. Arguably, the client-server model was never meant to stand in for general RPC mechanisms - but in the modern day, it's been a hammer tasked with far more than driving in and pulling out nails.
I feel like the various web services and web APIs prove this point as well as anything. Many of these utilize the tools meant for web pages as middleware to enable general remote procedure calls over the internet, and as such, they are awkward, insecure, and suffer from a lot of problems whose primary source is programmers needing to reinvent various wheels. Tasks that should be mechanical underpinnings of a system are being hacked together. That's before talking about using webpages as a display in non-web contexts because standards between various operating systems are massively inconsistent, or talking about webpages being the best answer we have for a remote server presenting a local interface to the user.
The tools are already stretched. Utilizing webpages on nonstandard ports is inconvenient, even if it's the best way to get multiple separate server applications to cooperate, rather than integrating them into a single app. Trying to understand the flow of data in a container stack is a headache, but it's still easier than trying to manage dependencies on a system not built to properly handle dependencies. Trying to understand what is going wrong when a client and server communicate over a network can be maddening, but it's the best way we have to make use of another machine's computing resources in parallel with your own machine's.
If you want to merge machines and utilize one's capabilities from another, you need a better mechanism. If you don't like mine, by all means, show me up. Do better. I'd love to hear about it.