Thursday, July 31, 2025

Remote Procedure Calls under MOS/ADA

On this blog, I have cast some aspersions against remote procedure calls as they currently exist, and it would be unfair of me not to admit that I have never actually used a remote procedure call mechanism myself.  I know that RPCs do not meet my requirements, because my requirements are insanely high and weirdly specific, but I will admit I don't really know a whole lot about the RPC mechanisms in existing operating systems.

Instead of trying to talk about existing RPC mechanisms, let's talk about calling and handling remote procedures under Project MAD.

MAD Procedures

I've already said that Agentic Distributed Applications expose a list of procedures available to be called remotely, both procedures intended only for internal use and those more publicly exported, and I've said that applications are supposed to deploy an Agent onto remote nodes in order to make use of resources.  I've heavily implied that resources are, therefore, not intended to be consumed except by Agents.  I'm happy to mostly leave it at that, with an asterisk saying that some very basic resources, such as sensors, might be accessible remotely.

If that sounds unrelated, it isn't quite.  The directory of available API calls that the ADA server builds must have a built-in distinction between the calls that can be made remotely and the calls that are consumable only by local Agents.  After all, it makes no sense to deploy Agents to consume your own application's internal API, which is itself provided by your own Agent already; the internal API is just one of possibly many remote APIs.  When you need to decide where a call should fall, there are a number of questions to ask, some more obvious than others.  For example, a purely remote API call only makes sense if either of the following is true (a rough sketch of how the directory might record this distinction follows the list):

  • The call is stateless; it does not store data or set internal state that will affect future calls; generally, if multiple calls are made by different applications, there is no way for them to interfere with one another except in reasonable, predictable ways.
  • Or, the procedure manages its internal state in such a way that no combination of remote calls can have unintended consequences, for example, internally managing its memory such that different callers have different memory dedicated to them.
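
To make that distinction concrete, here is a minimal sketch of what one entry in that directory might record, written in Python purely for illustration.  Every name in it (CallScope, stateless, isolates_callers) is a hypothetical of mine, not part of any MOS/ADA schema.

    from dataclasses import dataclass
    from enum import Enum, auto

    class CallScope(Enum):
        REMOTE = auto()       # callable by authorized remote applications and Agents
        LOCAL_AGENT = auto()  # consumable only by Agents deployed on this node

    @dataclass
    class ProcedureEntry:
        """One entry in the ADA server's procedure directory (hypothetical fields)."""
        name: str
        scope: CallScope
        stateless: bool          # calls cannot interfere with one another
        isolates_callers: bool   # per-caller state is kept separate internally

        def remotely_callable(self) -> bool:
            # A purely remote call only makes sense if it is stateless, or if its
            # internal state is managed so that callers cannot affect each other.
            return self.scope is CallScope.REMOTE and (self.stateless or self.isolates_callers)

    # Example entries mirroring the discussion above.
    directory = [
        ProcedureEntry("sensor.read_temperature", CallScope.REMOTE, True, False),
        ProcedureEntry("gui.create_window", CallScope.LOCAL_AGENT, False, False),
    ]

The point of the sketch is only that the remote/local distinction, and the statefulness questions behind it, live in the directory itself, decided when the procedure is registered rather than at call time.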

There are interesting side topics here, such as the need for a distributed configuration mechanism, but that's not what I'd like to focus on today.

The typical example I have in my mind when thinking about remote calls, and why Agents are necessary, is the GUI.  If you want to create a window to display information to the user, you are asking the graphics subsystem to set aside a bunch of memory for your application to manage.  Ultimately, this memory needs to be co-managed by the Application and the graphics subsystem; it can't be purely under the Application's control, because the back-end needs to read that data to actually display it, and may need to take control over the memory if something happens to the Application.  But, it can't be purely under the graphics subsystem's control, because the Application needs to freely make a lot of fast modifications to the memory, possibly not using the subsystem's helper functions at all to do so.

Calls like this should never be made without an Agent, because the subsystem needs something to take responsibility for that reserved memory.  If you aren't counting on Agents, then you need some other mechanism to notify the graphics subsystem when an Application quits or crashes, or the hardware that the Application was running on gets disconnected.  Even if you had very fast mechanisms for accessing the video memory remotely, and a dedicated network bus that lets you pipe in uncompressed video data straight into the buffers without causing network congestion, the need to manage the memory itself, and assign responsibility for it, remains.

Contrast that, however, with something like an atomic sensor read, setting a hardware light to on or off, or even creating a simple popup notification in the GUI.  Even when these actions have side effects, they aren't the kind of side effects that explicitly need management.  You could draw a distinction, say, if you wanted to be the only Application in control of that hardware light, and to lock out all other users, or if you wanted a modal dialog box in your GUI that the user must interact with, and which the application needs feedback from; those would require management and therefore an Agent.  But if you simply wish to flip a light on or off, or set its color value or intensity, or if you merely wish to put a bit of text in front of the user, that does not.  It may require authorization; you may need confirmation that the application is allowed to do these things.  But you don't necessarily need an ongoing presence.
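
To put that line-drawing in more concrete terms, here is a small sketch of how a dispatcher might apply it.  The procedure names and the two sets below are mine, standing in for flags that would really live in the directory.

    # Hypothetical procedure names; the sets stand in for flags in the directory.
    AGENT_REQUIRED = {"gui.create_window", "light.acquire_exclusive"}   # ongoing responsibility
    ONE_SHOT = {"light.set_color", "gui.post_notification", "sensor.read"}

    def dispatch(procedure: str, caller_has_local_agent: bool, authorized: bool) -> str:
        if not authorized:
            return "rejected: caller is not authorized for this call"
        if procedure in AGENT_REQUIRED and not caller_has_local_agent:
            # Something must take responsibility for the reserved resource.
            return "rejected: deploy an Agent to this node first"
        return "accepted"

    print(dispatch("gui.post_notification", caller_has_local_agent=False, authorized=True))
    print(dispatch("gui.create_window", caller_has_local_agent=False, authorized=True))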

But even when talking about actual remote procedure calls, my expectations of a remote procedure call mechanism are higher than average.  Some of the reasons why are better explained later; for instance, it is my hope that the MOS system directory is also a directory of types, and that type data is used in detailing APIs, and can be used for verification of the incoming parameters.  That one-sentence summary hides a number of other thoughts and details, but it's better not to get into them right this moment.

Decentralized Procedure Calls

A more important thing to say about MOS/ADA remote procedure calls is that the distributed and decentralized nature of the system requires some explicit handling.  Let's say that you have an application that uses a machine learning chip to monitor an external camera, and when the ML chip detects that a bird is in front of the camera, it takes a picture and puts it on your computer monitor.  For our purposes, assume that all of these are on separate hardware, and therefore remote to each other.  There are numerous data streams involved in this process; most notably, from the camera to the ML chip and from the camera to the display.  However, the trigger that puts the picture on-screen doesn't come from the camera; it comes from the ML chip.

There are a couple of valid ways for the ML chip to get an image from the camera onto the monitor: the ML chip could store the image frame and pass it to the display if it meets the requirements, or the ML chip could ask the display Agent to go and fetch an image frame, or the ML chip could ask the camera Agent to send a frame to the display.  But I will argue that the “correct” form of this request sends two requests: one to the camera, asking it to send a frame of data to the display, and one to the display, asking it to be ready to display the frame of data it is about to receive.  This solution spares the ML chip from having to store frames, but it also minimizes the overall wait by starting two operations in parallel.  The display process and the image send process both begin at roughly the same time, and by the time the image data makes it to the display, the display is ready to handle it.

But perhaps the most interesting reason to argue for this interpretation of inter-Agent communication is that you can imagine writing the request in a single line of highly readable code:

  • video.display( camera.capture() )

This line makes intuitive sense to a programmer; you are sending the camera data to the video display.  But what is perhaps most important is what this single line of code says about how remote procedure calls under the MOS/ADA specification should work.

Specifically, in all remote procedure calls, you eventually need to send raw data as parameters to a remote function.  Our common experience in programming is that you can only access data that is directly under your control, but any distributed application is going to need to refer to data that exists somewhere else when making a declarative statement of intent.  In this case, from the ML hardware node, we are calling a function on one remote hardware node, passing as a parameter data that lives on yet another hardware node.  All of this data is under the application's purview; there is no problem with the scope of the data involved, and no reason why this request should be impossible.  But with existing libraries and programming languages, it may be a very awkward three-way handshake to try to coordinate.
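
As a thought experiment, here is one way a runtime might lower that one-liner into the two parallel requests described above.  This is a Python toy; the rpc method, the return_to and args_from routing fields, and the ada:// naming scheme are all assumptions of mine, not features of any existing RPC library.

    import asyncio, uuid

    async def lower_compound_call(camera, display):
        """Lower video.display( camera.capture() ) into two parallel requests."""
        # A temporary, named destination on the display node for the camera's return value.
        slot = f"ada://display/tmp/{uuid.uuid4()}"

        # Both requests go out at once; neither waits for the other.
        await asyncio.gather(
            display.rpc("display", args_from={"frame": slot}),  # be ready to receive 'frame'
            camera.rpc("capture", return_to=slot),               # route the return value to the slot
        )

    class StubAgent:
        """Stands in for a remote Agent; it only prints what it was asked to do."""
        def __init__(self, name):
            self.name = name
        async def rpc(self, procedure, **routing):
            print(f"{self.name}.{procedure} requested, routing: {routing}")

    asyncio.run(lower_compound_call(StubAgent("camera"), StubAgent("display")))

The design choice worth noticing is that the ML node never touches the frame itself; it only names where the data should come from and where it should go.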

Rather than saying “This is the ideal form of this data stream,” I am saying that if we work hard to enable this kind of decentralized data command as a basic programming language feature, it will empower programmers to more easily perform operations in parallel.  And programming in parallel is something of a problem; computers have run multiple programs at once for, what, 35, 40 years?  More?  And yet many programmers still think, fundamentally, in single-threaded terms.  The biggest exception I can find to this rule is actually webpage programming, because the very nature of web programming depends on remote data requests.

Rather than reading like web code, though, and not coincidentally, the compound statement I wrote above reads more like a shell script, where two programs are started and the output of one is piped into the other.  You can easily see how the syntax of this compound remote procedure call could be extended to run dozens of operations in parallel: simply pass more parameters to the destination function, each sourced from a different Agent.  And if the chain of logic got more complicated, for example if the camera image were passed through a filter on another module to remove all image data except the bird (with another ML algorithm, perhaps on a different ML hardware node, or a different Agent on the same hardware node), the statement would remain a perfectly valid, single-line statement:

  • video.display( ml_filter.isolate_bird( camera.capture() ) )

And if this seems like overkill for a single image frame, consider that if we were setting up a video stream instead, this same chain of logic becomes a workflow in which each hardware node does its own job and nothing more.  Ideally, once this workflow is set up, it can operate at the fastest possible speed, making full use of the parallel architecture to get things done without overwhelming any single piece of hardware.

Looking Under the Hood

In the meantime, we are left with a question that people may find uncomfortable: supposing that this chain of logic is acceptable, how exactly do we phrase this remote procedure call in low-level code?  Even just using the earlier example, without the filter, it can seem complicated.  The video.display function needs to know it will be waiting for a data block parameter, one not included in the RPC call that starts the function.  And the camera needs to know that the return value of the incoming RPC request will not be sent back to the Agent that made the request, but sent to another Agent specifically to be used as part of a function call that it did not initiate.

The answer I have currently is that the MOS/ADA resource directory, which I said is used to expose API calls, can also expose chunks of memory for reading and writing; and, specific to this use of that mechanism, there needs to be a syntax for reserving memory for temporary variables.  There's a lot to unpack there, I admit.  I was under the impression that I had already suggested in my first post that the ADA indexes memory, but I look back and don't see it (I did refer to ‘application resources’, but I wasn't explicit), so it's worth taking a moment to justify that.

Recall that the ADA's exposed API is handled primarily by the ADA server, and that all requests come with a sending application, Agent, and user; as such, when I suggest that raw memory can be read or written, we are not talking about leaving memory access unprotected.  Many processes in a distributed system will be both data-heavy and latency-sensitive, such as parsing video, so it's generally best to have some mechanism to simply transfer memory; if there isn't a built-in mechanism, application programmers will simply write wrappers to do the same thing, leading to code that does nothing beyond circumventing the limitations of the system.  And while those explicit wrappers are good in some circumstances, especially when they add verification, validation, or similar checks, that's not a good justification for having no direct memory access mechanism.
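
Here is a minimal sketch of what a protected read might look like, with hypothetical names for the request and the exposed region; the only point it makes is that the ADA server always knows who is asking.

    from dataclasses import dataclass

    @dataclass
    class Request:
        application: str
        agent: str
        user: str

    @dataclass
    class MemoryRegion:
        owner_application: str
        readable_by_users: set   # e.g. the owning user, administrators, debuggers
        data: bytes

    def read_region(region: MemoryRegion, request: Request) -> bytes:
        # Raw access is never anonymous: the owning application can always read its
        # own memory, and anyone else must appear in the region's read permissions.
        if request.application == region.owner_application or request.user in region.readable_by_users:
            return region.data
        raise PermissionError(f"{request.user}/{request.application} may not read this region")

    region = MemoryRegion("bird_watcher", {"admin"}, b"\x00" * 64)
    print(len(read_region(region, Request("bird_watcher", "agent-7", "alice"))))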

Equally, having the ADA server provide direct access to application memory is good for debuggers, administrators, and power users.  Debugging a distributed application is a nasty business; you don't have all of your memory in the same scope all at once, meaning that if the system doesn't have some mechanism to expose any part of the application's memory on demand, you are left with only the uncomfortable dance familiar to all programmers, where you insert code randomly to check, log, or output values.  While programmers will inevitably do that anyway, it behooves us to have a proper solution to the problem.

Likewise, a power user may take a relatively standard application and want to get specific data out of it.  An easy example that comes to mind is video games; while there are many data values the game developer would not want to give you access to, it would not be hard to have, for instance, your player health (and maximum health) exposed for reading (but obviously not writing), and a power user could create a third-party application to display your health bar on a separate display, or as a video overlay that can be moved around the screen.  As trivial as that may sound, players may prefer their data in a different format (bars, dials, or raw numbers) than the game designer intended.  Likewise, access to application internals can be good for accessibility, turning what would normally be video into sound, or sound into text, or text into braille, without needing the developer's explicit permission or counting on them to provide software hooks.

Coming back to our remote procedure calls, however, it makes a certain sense to be able to pass ADA URI paths as parameters to functions, or as the destination for return values.  When we are talking about setting up a data workflow between two remote targets, it makes sense to have a syntax specific to creating temporary names, unambiguously describing a location that will await a single very specific bit of incoming data, possibly only from a specific source.  In our example case, the target display sets aside memory with that ID and waits for all parameters to be received; the camera source sends its procedure return value to that ID on the target Agent.  And once the data is received, the display function goes about its merry business.
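
Here's a toy of the receiver's side of that arrangement, again in Python, and again with names (the ada:// scheme, reserve_slot, deliver) that are purely my own invention: the display reserves a named temporary variable, waits for it to be filled, and only then runs the function.

    import asyncio

    class DisplayAgent:
        def __init__(self):
            self.pending: dict[str, asyncio.Future] = {}   # reserved temporary variables

        def reserve_slot(self, uri: str) -> asyncio.Future:
            fut = asyncio.get_running_loop().create_future()
            self.pending[uri] = fut
            return fut

        def deliver(self, uri: str, payload: bytes) -> None:
            if uri not in self.pending:
                # What happens here is a policy question, discussed below.
                raise KeyError(f"no reserved slot named {uri}")
            self.pending.pop(uri).set_result(payload)

        async def display(self, slot_uri: str) -> None:
            frame = await self.reserve_slot(slot_uri)      # wait for the camera's data to arrive
            print(f"displaying {len(frame)} bytes received at {slot_uri}")

    async def demo():
        display = DisplayAgent()
        task = asyncio.create_task(display.display("ada://display/tmp/frame-1"))
        await asyncio.sleep(0)                             # let display() reserve its slot first
        display.deliver("ada://display/tmp/frame-1", b"\x00" * 1024)
        await task

    asyncio.run(demo())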

Of course, in reality, things are more complicated than that.

If you create a mechanism for setting aside memory, that mechanism can be exploited for a denial of service attack.  Presuming for now that the sender of a request can at least be verified, this can be mitigated by detecting unreasonable requests and blacklisting the requester, not the data sender.  (It should not be assumed, unless it can be guaranteed, that the two are part of the same application; but I would assume that when a malicious actor is detected, the entire application gets blacklisted, for obvious reasons.  On some multi-user systems, it may be the entire user that gets blacklisted, with an Administrator notified.)  But what if the request gets through the sender's entire workflow before the memory gets set aside at the receiver, due to network congestion or some other slowdown?  Does the sender's data packet get lost in transit?  Does the sender get treated like a malicious actor if it asks to write into a data block that doesn't exist?

The answer to that, at least for now, is that these requests are handled by the ADA server on all three sides, meaning we have the opportunity and the obligation to handle these exceptions as a matter of policy.  For example, the data sender may not attempt to send data out unless the receiver acknowledges that the named data opening exists, and the data may not even be generated by the sender until the request is acknowledged at the receiver (because, again, generating data in response to a malicious request can create a denial of service exploit).  And part of the point of having Agents involved on all three sides is that if there is an error due to network congestion or something similar, that error should percolate up through the chain of logic, distributed across the system, until it is handled by all relevant Agents.

Suppose, for example, that the requester's request to the receiver disappears, and therefore, when the sender attempts to send data to the receiver, there is no such named temporary variable to write to.  This will cause an error at the sender, but the sender is only an intermediary; the error may be logged there, but the error condition needs to be reported to the requester, who actually set up the chain of events.  And because the sender's operation fails, that will generate its own error, and the error stack is sent back to the requester.  It is there that the error gets its full context: the receiver was not ready for the sender's data packet, despite the requester having definitively sent out both requests.  If this happens repeatedly, there may be a larger issue to investigate, one that requires the intervention of a programmer or administrator.
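
Pulling those policies together, here is a toy of the sender's side, with the error path reporting back to the requester rather than stopping at the sender.  As before, every name in it (confirm_slot, deliver, report_error) is an assumption of mine, not a specification.

    import asyncio

    class Receiver:
        def __init__(self):
            self.slots = {}                                 # reserved temporary variables

        def reserve(self, uri: str) -> None:
            self.slots[uri] = None

        async def confirm_slot(self, uri: str) -> bool:
            return uri in self.slots                        # acknowledge only if reserved

        async def deliver(self, uri: str, data: bytes) -> None:
            self.slots[uri] = data

    class Requester:
        async def report_error(self, message: str, uri: str) -> None:
            print("error reported to requester:", message, uri)

    async def sender_side(receiver, requester, uri, produce):
        # Policy at the sender's ADA server: generate and send nothing until the
        # receiver acknowledges that the named slot exists.
        if not await receiver.confirm_slot(uri):
            await requester.report_error("receiver did not acknowledge slot", uri)
            return
        await receiver.deliver(uri, await produce())

    async def demo():
        receiver, requester = Receiver(), Requester()
        async def capture() -> bytes:
            return b"frame"
        await sender_side(receiver, requester, "ada://display/tmp/f1", capture)  # fails: never reserved
        receiver.reserve("ada://display/tmp/f1")
        await sender_side(receiver, requester, "ada://display/tmp/f1", capture)  # succeeds this time

    asyncio.run(demo())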

This Isn't Everything

There are other interesting topics in this kind of distributed, remote API environment.  For example, callbacks and event delegates are a common mechanism in event-driven software models such as the GUI, and while these callbacks should be handled entirely by Agents (again, because it involves managing relationships, just as with memory as described above), it is worth having a discussion about event subscription in a distributed system.  And I do mean a discussion; I'm not sure I could put together another long-winded rant on the topic, but I'm sure there are pieces to the question that are more complicated than I know.  We are, after all, talking about users and applications and Agents, with possibly all of these being different between the event source and the subscribers.  There are matters of security, efficiency, and best practices that I may not be aware of.

Likewise, as I teased before, there is a question of distributed configuration, and although I'll go into that more in another post, I will say that distributed configuration is meant to be handled with an explicit mechanism in the MOS/ADA schema.  That configuration needs to account for:

  • Local hardware module policy.
  • System administrator policy.
  • User policy.
  • Application default configuration.
  • Stored configuration changes in the user files.
  • Temporary values specific to the user login session and/or parent application session (the “environment” as understood in existing operating systems).
  • Values specific to the user application session (which can be understood as application-local variables, except that they represent, and are best described as, configuration).

Having an explicit mechanism to gather and resolve contradictions in this wide field of configuration sources only makes sense, and the general goal is to be able to simply query the configuration and receive an answer, no matter where it comes from.
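
As a rough illustration of the kind of resolution I have in mind, here is a sketch of a layered lookup.  The layer names and the precedence order are placeholders of mine; the actual MOS/ADA precedence rules are exactly the sort of thing that still needs to be worked out.

    # Later layers override earlier ones, except that explicit policy always wins.
    POLICY_LAYERS = ["hardware_module_policy", "administrator_policy", "user_policy"]
    CONFIG_LAYERS = ["application_defaults", "user_stored_config",
                     "login_session_environment", "application_session"]

    def resolve(key: str, layers: dict) -> object:
        """Return the effective value of one configuration key, wherever it comes from."""
        for layer in POLICY_LAYERS:                 # policy wins outright if it has an opinion
            if key in layers.get(layer, {}):
                return layers[layer][key]
        value = None
        for layer in CONFIG_LAYERS:                 # otherwise the most specific source wins
            if key in layers.get(layer, {}):
                value = layers[layer][key]
        return value

    # Example: the application session temporarily overrides the user's stored preference.
    print(resolve("color_scheme", {
        "user_stored_config": {"color_scheme": "dark"},
        "application_session": {"color_scheme": "high_contrast"},
    }))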

There are also probably other questions I wouldn't have any immediate knowledge of or answer for.  Perhaps complications that arise where hardware APIs interface with the MOS/ADA APIs.  Perhaps complications where code libraries and embedded applications interface with the parent application, or where those embedded functions and parent application functions share (or arguably compete for) resources on remote nodes.  There are doubtless complications when it comes to truly confirming that applications are operating under the auspices of a user, or that an Agent is truly what it claims to be, running the code it is claiming to run.

The general Modular OS Agentic Distributed Applications model has a lot of nuances and complexity that I have no right to try to decide.  I can only, and have only, sketched out the broad strokes of this system.  I think I have answers, but only testing and implementation will determine how right, or wrong, I am.  All I can really say for sure is that if you are trying to build an entire operating system on top of remote procedure calls, you have a lot of work to do in order to ensure that the system is powerful, stable, and easy for programmers to understand and manage.  And… forgive me if I sully the name of remote procedure calls as they currently exist, but I really think it's going to take a lot more.

Most likely, the full list of what it does take won't begin to take shape until people who know these systems a lot better than me have their say.
