Friday, November 7, 2025

Beyond Unix: Application Folders

 Originally, as with the last blog post, I wanted this to be part of a longer post, but it seems wiser to separate posts by topic.  I… honestly wrote a long thing and then realized that it became even harder to understand than my average post, because it was just all over the place.  Part of the reason for this, as I realized after writing the last draft, is that this, like many parts of the MAD concept, can really stand on its own, and jamming concepts together, even if they would work together in MAD, is not a great way to treat your audience.

So for now, let's talk a little bit about MAD-style application folders.  But first, I'd better justify that post title.

Unix File Collections

I first got to play with a Unix-style OS when I went to college and installed a copy of Red Hat Linux on my (one and only, tower) PC, circa 2003.  I got that copy of Linux from a CD provided in the back of a big book about the operating system.  To this day, Linux derivatives have been my only real point of contact for the Unix philosophy, and of course famously, Linux Is Not UniX.  Even so, many of their central mechanisms are (in my understanding) derived from the Unix fundamentals.

I have vivid memories of prowling through the /usr/bin directory in that first Linux box, trying to run every executable in order to figure out what it was and how to use it.  Naturally, I left this exercise disappointed, if still starry-eyed and naively believing that someday I would understand it all.  Today the consolidated bin directories on my (relatively fresh) Arch Linux box have over 5500 executables in them, and I would never imagine trying to understand them all.  Even the GUI list has too many applications registered in it that I never wanted or explicitly installed, let alone the various helper commands and components that make them up or are tied to various system services, libraries, and …others.

I guess that sense of wonder and exploration has been beaten out of me, because it really is my instinct to want to understand and have control over my own system as best I can - what I can do with it, how to configure it, and how it works.  Unsurprisingly, MAD reflects that philosophy, and to that end, we're going to be talking about breaking up that /bin collection.

If it were put to me, I would divide programs up into six general categories: device drivers, services, applications, extensions, commands, and functions.  Of these, services are arguably background applications, and drivers are arguably services.  These three persist, waiting in a loop on something: a user input, a device event, or some other external impetus.  Commands are an interface into another system (usually a service or driver); functions generally transform input into output, for as long as there is input to transform.  Extensions are applications, services, commands, or functions that are only meant to be consumed within a particular context (another service or application); there is no point to them being accessible outside of that context.  They may be embedded, or they may be separately installed, but they are useless without the parent application or service.

The point of making these distinctions is that I consider it virtuous to collect all your system programs, but I do not consider it virtuous to keep them all in the same bucket.  If nothing else, commands and functions should be understood as part of a workflow (and extensions, within their context), while the rest should not be.  (If you have a workflow that ends up launching an application or starting a service, …well, you've basically made an extension, haven't you?  If you aren't extending the app, you're extending your shell.)  But rather than really being about the collection, this is about the system treating all of these executables the same, as nothing more than executables generally.
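To make the taxonomy concrete, here is a minimal Python sketch that tags each program with one of the six kinds and routes it to a collection.  The /cmd, /app, and /svc names come from later in this post; /drv is a hypothetical name for a driver collection, and the Extensions subfolder inside the parent is my own illustrative assumption, not a fixed part of MAD.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    DRIVER = "driver"
    SERVICE = "service"
    APPLICATION = "application"
    EXTENSION = "extension"
    COMMAND = "command"
    FUNCTION = "function"


# Drivers, services, and applications persist and wait on an external impetus.
PERSISTENT = {Kind.DRIVER, Kind.SERVICE, Kind.APPLICATION}

# Top-level collection for each non-extension kind (/drv is a hypothetical name).
COLLECTIONS = {
    Kind.DRIVER: "/drv",
    Kind.SERVICE: "/svc",
    Kind.APPLICATION: "/app",
    Kind.COMMAND: "/cmd",
    Kind.FUNCTION: "/cmd",
}


def collection_for(kind: Kind, parent: Optional[str] = None) -> str:
    """Return the folder a program of this kind should be collected in."""
    if kind is Kind.EXTENSION:
        # Extensions are useless outside their parent, so they live inside it.
        if parent is None:
            raise ValueError("an extension needs a parent application or service")
        return f"{parent}/Extensions"  # hypothetical subfolder name
    return COLLECTIONS[kind]


print(collection_for(Kind.COMMAND))                          # /cmd
print(collection_for(Kind.EXTENSION, parent="/app/editor"))  # /app/editor/Extensions
```

Commands and functions share one bucket here because both are meant to be searched while building a workflow; everything else gets its own.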

Among the reasons to draw a distinction: generally, commands and functions do not have their own application files, and they are less likely to have a configuration file; they may be nothing more than a single executable and any shared dependencies.  It makes sense to have a collection of these independent executables, because you will browse and search that collection in order to find commands to use in your workflow.  Likewise, it makes sense to have a collection of applications, a collection of services, and a collection of drivers.  And to a certain extent, Unix has all of these… just really not in a useful format.  It continues to grind my gears that there isn't simply a global folder in a standard location that just has all the system services and nothing else in it (and I don't mean under /etc).  Likewise, GUIs have to go out of their way to create collections of applications, and they get to decide on how they do that and where they keep it; it's not a central, standard directory.

Another problem with apps and services in Linux is that, by default, they are assumed to have a single, canonical configuration, plus (possibly) one additional configuration per user.  Anything more than that, and you get into the realm of passing in configuration files on the command line, or tweaking the environment before running an application.  Both of those work… but both abuse more general mechanisms without actually addressing the problem.  This global-centric system has another consequence: network ports are a global and limited resource, and because they are numerical rather than name-based, reserving and targeting ports becomes a problem when multiple copies of a service exist, or when you want to run a service but hide its port, among other things.  These problems have solutions… but they all arise from old and faulty (or at least suboptimal) assumptions.

Over and over again when trying to write about MAD I come back to the idea of assumptions.  MAD/ADA, the distributed app model, is one tool to help you control the assumptions of a given application, but I'd like to talk about a different tool today: the MAD Application Folder.  ADA tends to assume that the App Folder framework exists, but neither truly needs the other, as you'll see.

Application Folders For Fun and Profit

This topic is going to need a little technical overview before I get into the whys (and you know I love getting into the whys), simply because if I don't say this explicitly, I'm going to try to say it in the middle of some other paragraph and it won't come out well.

The App Folder framework is fundamentally three pieces.  First, it is a standard for packaging applications for distribution, as a folder; but unlike many packaging schemes, the application files are meant to stay together and be installed just as they were received.  This folder format is intended to be eminently portable; the same format works whether the application comes bundled with its dependencies or whether they are separated for more efficient distribution.  Likewise, the same format works whether the application is installed system-wide, stored in some arbitrary filesystem folder (say a user Download folder), or embedded in another application.  We'll get to that.

Second, the framework is a standard for configuring applications, by editing, replacing, moving, adding, or deleting files within the application folder.  This configuration can be kept separately and applied at runtime, so that the application distribution folder is kept unmodified - we'll get to that, too.
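One way to picture keeping configuration separate: treat the distribution folder as a read-only mapping of paths to file contents, and the configuration as a list of file operations replayed over a copy at launch.  The Python sketch below does exactly that in memory.  The operation names and file paths are hypothetical, and a real loader would work on actual folders rather than dictionaries.

```python
from typing import Dict, List, Tuple

Files = Dict[str, bytes]


def apply_config(distro: Files, ops: List[Tuple]) -> Files:
    """Return the effective file view; the distribution mapping is never modified."""
    view = dict(distro)  # work on a copy so the distro folder stays pristine
    for op in ops:
        action = op[0]
        if action in ("add", "replace"):  # editing a file is modeled as replacing it
            _, path, data = op
            view[path] = data
        elif action == "delete":
            view.pop(op[1], None)
        elif action == "move":
            _, src, dst = op
            view[dst] = view.pop(src)
        else:
            raise ValueError(f"unknown configuration action: {action!r}")
    return view


distro = {"Exports/Port/3306": b"", "Config/db.conf": b"cache=small\n"}
config = [
    ("move", "Exports/Port/3306", "Exports/Port/3307"),  # change the port number
    ("replace", "Config/db.conf", b"cache=large\n"),
    ("add", "Themes/dark.css", b"body { color: white; }\n"),
]
live_view = apply_config(distro, config)
```

Because the configuration is just data, the same distribution folder can be launched several times with different configurations, which is exactly the "more than one configuration" case that is awkward today.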

Third, it is a standard for representing running applications and services as live folders (similar to the Unix procfs), with a specific emphasis on representing dependencies, imports, and exports as files in this live directory.

If it helps you make sense of what I'm trying to say, the application live folder is modeled directly off of the distribution folder, as modified by the configuration.  The line between these three concepts is straight and direct - once you understand the purpose of the live folders.

One main point of live folders is to keep most non-shell applications from ever directly accessing the global filesystem - their dependencies, exports and outputs, inputs and configuration, some internal program state, and their collection of shared, user, and system files are all represented here.  Where an application would otherwise go out to the filesystem to reach, for example, a hardware device, the application loader does it instead, mounting the file in the live folder, and the application looks for the hardware device to be mounted in that specific position.  The same is true of configuration files, application extensions like themes, and so on.  The application distribution file tells the app loader what files to collect; when finally the application is running, everything it needs should all be collected in one place (even if only in link form).
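A minimal sketch of that loader step, assuming the distribution folder supplies a manifest that maps paths inside the live folder to resources elsewhere on the host.  The loader creates one symbolic link per entry, so the application never has to look outside its own live folder.  The manifest format and the Config/ and Devices/ names are hypothetical.

```python
import os
import tempfile
from typing import Dict


def assemble_live_folder(live_root: str, manifest: Dict[str, str]) -> str:
    """Link each resource named in the manifest into the live folder."""
    for rel_path, target in manifest.items():
        link = os.path.join(live_root, rel_path)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(target, link)  # the app sees the resource at a fixed, local path
    return live_root


# Example: stand-ins for a config file and a device node somewhere on the host.
host = tempfile.mkdtemp()
conf_path = os.path.join(host, "db.conf")
with open(conf_path, "w") as f:
    f.write("cache=large\n")

live = assemble_live_folder(
    os.path.join(host, "live"),
    {"Config/db.conf": conf_path, "Devices/null": os.devnull},
)

# Inside the application, every lookup is relative to its own live folder.
with open(os.path.join(live, "Config", "db.conf")) as f:
    settings = f.read()
```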

The other major part of the framework is meant to expose data in a way that the system can use.  Suppose, for example, you have a database which is nominally configured to use a network port.  In normal C code (and pardon me if I've misremembered or misunderstood), this is done as a two-stage process: opening a socket with socket() and then binding that socket to the network port with bind(), each a separate system call.  What the MAD application folders concept considers ideal in this situation is for the application to open a socket and mount it in its live application folder, for example as Exports/database.sock.  Separately, a workflow that lives embedded in the application folder and is managed by the application launcher (or core) gets triggered by the creation of this file.  Suppose that workflow lives in the distro folder as Exports/Port/3306.  Prior to the socket being created, the application live folder contains nothing in the Exports/Port folder; if the port binding succeeds, the same workflow creates that 3306 file in the live folder, making it point to the database.sock file, which is the real workhorse.  If you wanted to change the port number, in this example, you would only need to rename the 3306 file, and if you wanted to prevent the database from opening any port (we'll get back to that in a moment), you would only need to remove it entirely.
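Here is a rough Python sketch of the database example on a Linux box.  The application binds a Unix domain socket (AF_UNIX) at Exports/database.sock instead of a TCP port, and a hypothetical port workflow mirrors every Exports/Port/<number> entry from the distribution folder into the live folder as a link to that socket.  Actual port binding and the file-creation trigger are left out: the workflow is simply called after the socket exists, and it only records the binding.

```python
import os
import socket
import tempfile
from typing import List, Tuple


def export_socket(live_root: str, name: str = "database.sock") -> Tuple[socket.socket, str]:
    """Open a listening socket and mount it in the live folder's Exports."""
    exports = os.path.join(live_root, "Exports")
    os.makedirs(exports, exist_ok=True)
    path = os.path.join(exports, name)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)  # creates the socket file itself
    sock.listen()
    return sock, path


def port_workflow(distro_root: str, live_root: str, sock_name: str = "database.sock") -> List[int]:
    """Create Exports/Port/<n> in the live folder for each entry in the distro folder."""
    distro_ports = os.path.join(distro_root, "Exports", "Port")
    live_ports = os.path.join(live_root, "Exports", "Port")
    os.makedirs(live_ports, exist_ok=True)
    if not os.path.isdir(distro_ports):
        return []  # entry removed by configuration: no port is opened
    opened = []
    for entry in sorted(os.listdir(distro_ports), key=int):
        # A real system would bind the port here; the sketch only records success
        # as a relative link from Exports/Port/<n> to Exports/<sock_name>.
        os.symlink(os.path.join("..", sock_name), os.path.join(live_ports, entry))
        opened.append(int(entry))
    return opened


root = tempfile.mkdtemp()
distro, live = os.path.join(root, "distro"), os.path.join(root, "live")
os.makedirs(os.path.join(distro, "Exports", "Port"))
open(os.path.join(distro, "Exports", "Port", "3306"), "w").close()

server, sock_path = export_socket(live)
ports = port_workflow(distro, live)
```

Renaming 3306 in the distribution (or configuration) changes the port; deleting it leaves the socket exported but reachable only through the live folder.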

If you establish all of the above as the way things should work, then the system could maintain a collection of open network ports as files, each of which points to the appropriate application's live folder.  Incoming network requests on that port are forwarded to the database application's socket, using the filesystem to store the routing data.  That doesn't necessarily mean that the filesystem must be involved every time a packet crosses the network; it may suffice for “filesystem changed” notifications to cascade through the system, forcing updates in routing tables.  But this collection of ports as files would make for a very intuitive interface, allowing system administrators to use the filesystem itself to query what application is managing what port.
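The system-wide collection could be as simple as a directory of links named after port numbers.  This Python sketch, with a hypothetical directory layout, registers ports as links to live folders and rebuilds a routing table from them, which is the kind of refresh a “filesystem changed” notification might trigger.  Asking which application owns a port is then just reading a link.

```python
import os
import tempfile
from typing import Dict


def register_port(ports_dir: str, port: int, live_folder: str) -> None:
    """Record that `live_folder` owns `port`; fails if the port is already taken."""
    os.makedirs(ports_dir, exist_ok=True)
    os.symlink(live_folder, os.path.join(ports_dir, str(port)))


def port_table(ports_dir: str) -> Dict[int, str]:
    """Rebuild {port: owning live folder} from the links in ports_dir."""
    return {
        int(entry): os.readlink(os.path.join(ports_dir, entry))
        for entry in os.listdir(ports_dir)
    }


ports_dir = os.path.join(tempfile.mkdtemp(), "port")
register_port(ports_dir, 3306, "/svc/database/live")  # hypothetical live folder paths
register_port(ports_dir, 80, "/svc/web/live")
```

Reserving a port and checking for a conflict become the same filesystem operation: creating a link that already exists raises FileExistsError.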

It goes beyond that, however.  Assume for the moment that the exported database.sock file can be used just like a network port, but without involving the networking stack.  Thus if you had, for example, a program that wanted to embed this database application, it could configure the embedded database to open its socket but not bind it to a port, and then your program uses the database's live folder to attach to the socket; the networking stack isn't even consulted, much less used as an intermediary.  That's clean and efficient - but what if I said that the same mechanism could let you connect to any other database in the same system without using its port number, even if the port is open, just by doing the same thing with the database service's live folder?  What if I said that the same mechanism could let you connect to a remote database by mounting a database client in the same place that you might have mounted the database server, in your database-consuming application's live folder?  In all three cases, your application will expect to find an application live folder at, for example, Imports/Database, and in all three cases, it will specifically target Imports/Database/Exports/database.sock (though perhaps this could be simplified) as the socket it wishes to connect to.  Suddenly, your own application is much more portable, no matter whether it gets configured by the end user to use an embedded database, a system database, or a remote database.  More than that, any application living in that location which provides a compatible database.sock (or indeed, anything else you do to put a link to a database socket at that location) could be used by your application, even if it's not the exact vendor of database you were expecting (obviously, this is a little problematic for databases, which may have variances in their query language, but you get the general idea).
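From the consuming side, the application only needs to know one relative path.  In this Python sketch, a database's live folder is mounted (as a symbolic link) at Imports/Database inside the consumer's live folder; whether that link points at an embedded database, a system service, or a local client for a remote server makes no difference to the connecting code.  The folder names follow the text; running both sides in one process is just to keep the example self-contained.

```python
import os
import socket
import tempfile

# The one path the consuming application hard-codes, relative to its live folder.
DB_SOCKET = os.path.join("Imports", "Database", "Exports", "database.sock")


def connect_database(live_root: str) -> socket.socket:
    """Connect to whatever database is mounted at Imports/Database."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(os.path.join(live_root, DB_SOCKET))
    return client


root = tempfile.mkdtemp()

# Provider side: some database's live folder exporting a socket.
db_live = os.path.join(root, "db")
os.makedirs(os.path.join(db_live, "Exports"))
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(os.path.join(db_live, "Exports", "database.sock"))
server.listen()

# Consumer side: mount the provider at Imports/Database, then connect.
app_live = os.path.join(root, "app")
os.makedirs(os.path.join(app_live, "Imports"))
os.symlink(db_live, os.path.join(app_live, "Imports", "Database"))

client = connect_database(app_live)
conn, _ = server.accept()  # the pending connection is already queued
client.sendall(b"SELECT 1")
received = conn.recv(16)
```

Swapping providers means re-pointing one link; connect_database never changes.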

This one example shows the application live folder being used to export resources, import resources, configure those imports and exports, represent application state (e.g., whether the socket/port has been opened), make the application embeddable, automatically trigger workflows in response to events, provide a compatibility layer between competing service vendors, and make a collection of system resources.

But we aren't done.  Where does your database store its files?  Without googling, I was honestly unsure where in my system a MySQL database file, for example, would actually reside, according to its default configuration (and frankly, MySQL's default location annoys me - not that it matters).  But Project MAD is about a distributed system, and there's a problem with storage in a distributed system: there isn't guaranteed to be a single canonical file store.  In fact, there may not be any general-purpose persistent storage anywhere in the system (though in that case you probably wouldn't be creating a database).

Categorically, in a MAD distributed system, a given application will have three datapoints to start from when looking for storage: the location of the application distribution files (or application configuration; if this is embedded, it will be another application's files, and you can go up the stack if need be), the private folder for the user running the application, and a generic system collection or index.  Under MAD, none of these are guaranteed to provide you with a writable storage volume (it's a reasonable guess, but not guaranteed).  Disk access is a resource in the same sense as any other device driver, though it may be hard to put some of the constraints into words.  MAD generally assumes that arbitrary system components can be removed from the system at any moment - and that includes both the long-term storage you might otherwise use to store the database files, and the module on which the application files themselves are stored.  It would be unlikely to be a problem in practice, but it is a burden I chose to undertake.

That means developing a language to describe what you want to store and why.  For example, a “System Service-persistent storage” disk space request might be different from an “Embedded Service-Application Data” disk space request.  In the former, you would generally ask the system whether there are any disks configured for storing service data first, and if not, fall back on asking the user or storing the database files with the database configuration or application files.  In the latter, you would generally want to ask the parent application where it's storing its data, and failing that, you would ask the user running the application where they want your data (more properly their data, assuming there is a user running your application), then try the application configuration location, and only last would you want to query the system for a global storage location, because your embedded database and the application files may go separate ways without warning.
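The fallback orders in that paragraph translate naturally into data.  In this Python sketch, the request-kind names, the source names, and the provider callbacks are all hypothetical; each provider either returns a location that satisfies the requested constraints (such as encryption) or None, and the first location offered wins.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# A provider is asked for storage meeting some constraints; it returns a path or None.
Provider = Callable[[FrozenSet[str]], Optional[str]]

# Fallback order per request kind, following the two examples in the text.
FALLBACKS: Dict[str, List[str]] = {
    "system-service/persistent": ["system", "user", "app-config"],
    "embedded-service/app-data": ["parent-app", "user", "app-config", "system"],
}


def resolve_storage(
    kind: str,
    providers: Dict[str, Provider],
    constraints: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Walk the fallback order for `kind`; return the first location offered."""
    for source in FALLBACKS[kind]:
        provider = providers.get(source)
        if provider is None:
            continue  # that source is absent from this system right now
        location = provider(constraints)
        if location is not None:
            return location
    return None  # nobody could help: time to get users involved


providers: Dict[str, Provider] = {
    "parent-app": lambda c: None,  # the parent has no data folder to share
    "user": lambda c: "/home/alice/db-data",
    "system": lambda c: None if "encrypted" in c else "/srv/shared",
}
```

The database only states what it wants; which volume answers is decided entirely by whoever registered the providers.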

Whichever way you choose a storage location, once one is selected, a data storage folder will be mounted in your application's (or the database's) live folder, and the application targets that data folder within its own process space with all subsequent filesystem requests.  And to reiterate: the application should not care where the files go.  A distributed system cannot expect applications to understand the system topology.  Your database should not try to learn where it should keep files, and the people maintaining the database code shouldn't need to decide.  It should not be the developer's job to find some clever, out-of-the-way place to put it, even by default.  It suffices that the database asks for a place to keep its files, and one is provided (assuming of course, that if you ask again tomorrow, you will get access to the same volume as you got today).  If the data needs to be encrypted, add that constraint to the request.  Everything else is up to the user and system; either they provide what you want, or they don't.  And if they don't, well then, it's time to get users involved.

Now, you as the user or system administrator?  You get to care.  You set up a file access server and client so that the database can use it.  But the tyranny of the default remains; it matters that the database can install itself without user intervention.  It must, if it can.

I'll mostly finish this part of the blog post for now, though there is one quick aside to make.  When categorizing executables, I said that commands, distinct from functions, depend on services or applications.  Services and applications have managed exports from their live app directory.  Thus it stands to reason that if those commands are exported from an application or service live directory, then the system can create and manage a dynamic collection of commands, depending on what services are currently running.  Any services that are not running need not contribute their commands to the collection.
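A sketch of that dynamic collection, assuming a hypothetical layout where a running service has a Live folder under its entry in the service collection and exports its commands from Live/Exports/Commands.  Stopped services have no live folder, so they simply contribute nothing.

```python
import os
import tempfile
from typing import Dict


def command_collection(services_root: str) -> Dict[str, str]:
    """Map command name -> exported path, for every running service."""
    commands: Dict[str, str] = {}
    for service in sorted(os.listdir(services_root)):
        cmd_dir = os.path.join(services_root, service, "Live", "Exports", "Commands")
        if not os.path.isdir(cmd_dir):
            continue  # not running, or exports no commands
        for name in sorted(os.listdir(cmd_dir)):
            # On a name clash, the first service in sorted order keeps the name.
            commands.setdefault(name, os.path.join(cmd_dir, name))
    return commands


svc = tempfile.mkdtemp()
running = os.path.join(svc, "database", "Live", "Exports", "Commands")
os.makedirs(running)
open(os.path.join(running, "db-backup"), "w").close()
os.makedirs(os.path.join(svc, "webserver"))  # installed but stopped: no Live folder
```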

Now, not everyone may find this dynamism valuable; some people would prefer to know what all the potential commands are, irrespective of whether services are running or stopped.  But the whole point of separating the different kinds of executables is managing scope.  Going back to my 5500-file /bin folder on my home machine, it would take some legitimate effort to sort that cluttered heap of random words into categories that help me, the user and administrator, know what I can do with my own damn machine.  Making that collection easier to manage is virtuous in itself.

Where Does This Leave Us?

So I've described application folders.  What about this is really going beyond Unix's philosophy?  If anything, we're depending more on the “everything is a file” philosophy, which you could argue is more of a back-to-your-roots movement.  Well, let's go back and look at what I've said so far.

Suppose we have an ideal Linux box or similar, based on the above.  Instead of having a mixed /bin directory, it is split into /cmd, /app, and /svc folders.  The command directory is a dynamic collection of functions and commands that have been installed separately, or exposed by currently running apps and services.  It does not contain the entry points for applications and services; those are split into their respective directories, but those entry points can still be collected automatically.

The application and service directories contain all applications and services installed at the system level, as opposed to those installed in user storage.  They contain distribution folders for each of these applications, and under each distribution folder, you will also find a list of currently running instances of the application, if any.  (Alternately, they could be neighbors, e.g., AppFolder and AppFolder.ProcID, though that sounds sloppy to me.  Don't worry, MAD does this part differently, but we aren't talking about that for now.)

Given everything I've said, the /svc folder becomes a list of all system services that have been installed, and a list of all services currently running, and the configuration of those services, and each service provides a collection of commands that exist specifically to interact with that service, and each service may provide direct exports that you interact with, such as sockets, and each service points to any system resources that they are consuming, and each service can collect a list of any applications currently using this service's resources.

Thus for example, if the port management I talked about before was run by a service, its live folder would contain a collection of all currently opened ports on a given net interface, and would contain a link pointing to which net interface device it is regulating, among other bits of information.  Each of the opened ports in the collection would be a filesystem link to the live folder of the system service or application that has requested that port.  Because it links to a live folder, if you wanted to kill the process currently managing a specific port, you could very simply kill /svc/net.eth0/port/80, and the kill command would have no trouble getting its process ID implicitly.  And if you want to know who's using that socket or port, the app using the port should maintain a list of open connections in its live folder - because that is a portion of the application state that can and probably should be represented in the filesystem, once you've gotten all the rest of this up and running.
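The “kill by path” idea only needs one convention: that each live folder records its process ID.  Assuming a hypothetical pid file at the top of every live folder, a kill wrapper could resolve the port link and signal the owner, as in this Python sketch (it uses os.kill, so it is Unix-only).

```python
import os
import signal
import tempfile


def pid_for(path: str) -> int:
    """Follow links from a port entry to its live folder and read the process ID."""
    live = os.path.realpath(path)
    with open(os.path.join(live, "pid")) as f:  # hypothetical pid file
        return int(f.read().strip())


def kill_path(path: str, sig: int = signal.SIGTERM) -> None:
    """Like `kill`, but takes a filesystem path instead of a numeric process ID."""
    os.kill(pid_for(path), sig)


# Example: a port entry linking to a live folder that records this process's ID.
root = tempfile.mkdtemp()
live = os.path.join(root, "svc", "web", "live")
os.makedirs(live)
with open(os.path.join(live, "pid"), "w") as f:
    f.write(str(os.getpid()))
port_dir = os.path.join(root, "svc", "net.eth0", "port")
os.makedirs(port_dir)
os.symlink(live, os.path.join(port_dir, "80"))

kill_path(os.path.join(port_dir, "80"), 0)  # signal 0 only checks that the process exists
```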

Oh… and as a nice side effect, you get a system collection of applications.  Almost seems like an afterthought now, doesn't it?  Even though distributed applications are the whole reason the entire system was invented…

Anyway, unless I grossly misunderstand the Unix specifications, all of this goes well beyond them, and it goes beyond Linux and other operating systems that I've heard of.  It still inherits some of the general attitude of Unix; I'm not looking to make it more like Windows, for example.  But it goes well beyond Unix, which is all that I claimed in the post title.

Okay now, take a deep breath.  Almost done with this one, I promise.

If the above were to be read by any serious veterans in the operating system design space, I suspect they would say among other things that app folders are all unnecessary and inefficient.  Even disregarding the “What we already have works” argument (which is kind of a lame argument, even when it is totally correct)… they still have a point.  Reconfiguring an entire Linux or other *nix based distribution to run on top of this service-collection-folder instead of non-filesystem-based services would be a massive undertaking, and there are absolute loads of questions that would need to be answered about how all the standards around these services should be set up: how app distro folders and live folders work, how the configuration mechanisms work, how drivers work - all of that before you start talking about what exports specific services should have, or whether and how the underlying architecture of the system should be reworked to use ports less and direct sockets more, plus more and more questions that bubble up as we move forward.  And trust me, everything I've said in this blog post - everything - is simplified, with pieces left off because I don't want to overload anyone, or trip over my own metaphorical tongue any more than necessary.

Honestly, you could multiply the whole collection of questions that are rattling around in your head by ten or a hundred and probably still not come close to all the various problems and questions involved in understanding and/or creating Project MAD.  The MAD App Folder framework is itself just a fraction of the Agentic Distributed App model, which itself is a fraction of the operating system, which was always meant to be attached to a whole hardware project that will probably never come to be - but that hardware project informs the software design and the operating system.  It is the nominal model, the stress test; it is important, not something I intend to just set aside or forget about.

When I say that Project MAD is a tangled mess of thoughts in my head, that is what I'm talking about.  It is “Take this problem that should probably take an entire community to sort out, multiply it by a hundred, cram it into my mind, and then try to reach in with chopsticks to untangle and remove it piece by piece” levels of tangled.  Again: the whole point of Application Folders was to help create distributed applications, but by golly gee whiz it sure would really make it easier to keep track of system services in a Linux box.  And again, this entire blog post is a trimmed down version of the application folders framework.  It's meant to be integrated into the directory and used alongside other concepts.

Weird how when you have a good idea, you start seeing uses for it all over the place, even though many of those “good ideas” won't be when you actually test them.  And yes - some parts of this won't be, and goodness knows I can't tell which from here.  But I think that it's a worthy idea that needs exploring, even if the MAD/AF framework is the only part of the system you port out.  Maybe it wouldn't be the best single piece, out of all the pieces that you could chop off and use separately… but you could.  And it would be interesting.
