I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL. It would be much more flexible and extensible.
AFAIK, originally the reason why they were made as programs was that the available programming languages were too cumbersome for such use. Now we have plenty of experience making lightweight scripting languages that are pleasant to use in a live environment (vs premade script), so why give up the flexibility of ad-hoc scripting?
what if it were written as a library, with the traditional cli implementation as a thin layer over it?
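To make the "library plus thin CLI layer" idea concrete, here is a minimal sketch in Python. The names `Counts`, `wc` and `main` are made up for illustration, and this toy counts iterated lines rather than newline bytes, so it is not a faithful `wc(1)`. The point is only where the boundary sits: the library function neither parses arguments nor prints anything.

```python
import argparse
import sys
from typing import Iterable, NamedTuple


class Counts(NamedTuple):
    lines: int
    words: int
    chars: int


def wc(lines: Iterable[str]) -> Counts:
    """Library layer: count lines, words and characters. No parsing, no printing."""
    n_lines = n_words = n_chars = 0
    for line in lines:
        n_lines += 1
        n_words += len(line.split())
        n_chars += len(line)
    return Counts(n_lines, n_words, n_chars)


def main(argv=None) -> int:
    """CLI layer: only argument parsing and output formatting live here."""
    parser = argparse.ArgumentParser(description="toy wc built on the wc() function")
    parser.add_argument("files", nargs="*", help="files to count (default: stdin)")
    args = parser.parse_args(argv)
    if not args.files:
        c = wc(sys.stdin)
        print(f"{c.lines} {c.words} {c.chars}")
        return 0
    for name in args.files:
        with open(name, encoding="utf-8") as fh:
            c = wc(fh)
        print(f"{c.lines} {c.words} {c.chars} {name}")
    return 0

# A real script would finish with: sys.exit(main())
```

From a REPL you would call `wc(open("notes.txt"))` and get a tuple back to work with, instead of parsing text out of a pipe.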
I'm thinking about the wider FOSS ecosystem: for example, if Firefox were built as a GUI gluing together a modular collection of libraries, people could do all sorts of cool things with them.
Monolithic applications make sense for proprietary software, not so much for FOSS.
You're proposing abandoning the Unix/Posix standards and philosophy in favor of an untested strategy.
"Move fast and break things" makes sense for acquiring investors, not so much for core infrastructure that everyone everywhere depends on.
This could work, but you'd need at least a decade of widespread usage to work out all the problems before it would even be worth considering for core infrastructure tools.
Why would that be relevant if the thin wrapper is fully compliant with the POSIX/GNU standards?
Busybox, with its single-binary, dispatch-on-argv[0] hack of an implementation, was at one point an "untested strategy", yet look at it now. Rust's coreutils needs to offer a real unique value proposition if it wants to see real adoption. Providing coreutils as a library, without forking processes, would certainly qualify as that.
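For anyone who hasn't seen the argv[0] trick, here is a rough Python sketch of the dispatch idea. It is not Busybox's actual code; the applet table and function names are invented for illustration.

```python
import os
import sys


def applet_true(args):
    return 0


def applet_false(args):
    return 1


def applet_echo(args):
    print(" ".join(args))
    return 0


# Hypothetical applet table: one entry per command name the binary answers to.
APPLETS = {"true": applet_true, "false": applet_false, "echo": applet_echo}


def dispatch(argv):
    """Pick the applet from the name the program was invoked under."""
    name = os.path.basename(argv[0])
    args = argv[1:]
    # Also accept "multicall echo hi", where the applet is the first argument.
    if name not in APPLETS and args:
        name, args = args[0], args[1:]
    applet = APPLETS.get(name)
    if applet is None:
        print(f"{name}: applet not found", file=sys.stderr)
        return 127
    return applet(args)
```

Installed once and symlinked as `true`, `false` and `echo`, the same file behaves as three commands, because `argv[0]` carries the name of the link that was used.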
If stability is a concern, exposing a greater surface for user interaction would surely slow down development as these interfaces would have to be reworked with care.
If stability is not a concern, then any user tools built upon these interfaces would be subject to breakage at the rate of upstream development. That’s got to be frustrating.
One of the nice things about the POSIX command‐line interface is that the build systems that interact with them know what to expect, because the interface has been much the same for a very long time, while still providing hugely useful capability.
As stable as, say, golang's standard library.
Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.
In the case of coreutils, the problem space is fairly simple and well-understood, so it should be quite easy to commit to a stable interface. Even for something exceptionally complex like a web browser, I'd expect most components to be easily kept backwards-compatible in terms of public api.
> As stable as, say, golang's standard library. Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.
That's actually far less stable than is needed for core utils.
> what if it were written as a library, with the traditional cli implementation as a thin layer over it?
That's kind of the way it is. Most of the core utils are thin wrappers around C libraries.
- - - -
It sounds like you're thinking of things like the Oberon OS, where there were no separate applications, instead the system was extended by adding new commands to a unitary GUI. Or the Canon Cat.
nope. it's still a traditional binary meant to be used as traditional binaries from the posix shell. What I mean is, replace both the binaries and the shell with a library equivalent of coreutils running from a REPL.
Sometimes, hitting those roadblocks leads to a better solution.
Maybe the new model is slower, and somebody looks into it, and realizes if they add a caching layer between the "REPL" module and the kernel ioctl, or service or whatever, it will speed things up.
I run find and grep a lot. And I'm sure the kernel caches a lot of the FS stuff, but there are higher-level things that could be cached and shared with other "REPL" modules. Like predictive URL middleware in browsers. Pluggable middleware that can be enabled or disabled.
Available now on the OS module store:
Larry's Grep Count Document Prefetch Module. Certified Safe by BlahCorp.
This isn't a new idea, and I'm sure others have had it before me.
It's not just about forking processes. Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have many more atomic functions that users can mix and swap as needed, case by case.
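A small sketch of what "atomic functions you can mix and swap" might look like, assuming Python generators stand in for pipes. The function names mirror the familiar tools but are simplified stand-ins, not reimplementations.

```python
import itertools
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that match the regular expression."""
    rx = re.compile(pattern)
    return (line for line in lines if rx.search(line))


def head(lines: Iterable[str], n: int = 10) -> Iterator[str]:
    """Yield at most the first n lines."""
    return itertools.islice(lines, n)


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Collapse runs of adjacent identical lines, like uniq(1) without options."""
    return (key for key, _group in itertools.groupby(lines))
```

In a REPL, `list(head(uniq(grep("error", log)), 2))` plays the role of `grep error | uniq | head -n 2`, and any stage can be swapped for an ordinary function or lambda without touching the others.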
> Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have many more atomic functions that users can mix and swap as needed, case by case.
Maybe I'm missing something here (it's been a long time since I last looked at the busybox code), but isn't busybox a single file that has a lot of atomic functions that callers can mix and swap as needed, using the shell as a REPL?
IIRC (and please correct me if I am wrong), all those little functions in busybox are simply single functions. There's a `cat` function, and a `head` function, and a `cp` function, etc.
I don't see what can be gained by moving them into a library file, and using the shell to call those functions, instead of leaving them in the shell program and calling them.
which is not a bash builtin (on Mac or Linux); use type instead:
$ type echo
echo is a shell builtin
$ type cat
cat is /bin/cat
$ type which
which is /usr/bin/which
$ alias a=true
$ type a
a is aliased to `true'
$ function f { true; }
$ type f
f is a function
f ()
{
true
}
Incidentally, zsh, the current default Mac shell, has both type and which as internal commands, with different output:
% which echo
echo: shell built-in command
% type echo
echo is a shell builtin
% which cat
/bin/cat
% type cat
cat is /bin/cat
% which which
which: shell built-in command
% type which
which is a shell builtin
% alias a=true
% which a
a: aliased to true
% type a
a is an alias for true
% function f { true; }
% which f
f () {
true
}
% type f
f is a shell function
Note that, on zsh, the "native" command is actually whence; which and type are equivalent to "whence -c" and "whence -v", where
% man -W zshbuiltins \
| xargs groff -Tutf8 -mandoc -P -cbdu \
| awk '
/^ [^ ]/ { out = 0 }
/^ whence / { out = 1 }
{ if (out) print }
'
whence [ -vcwfpamsS ] [ -x num ] name ...
For each name, indicate how it would be interpreted if used as a
command name.
If name is not an alias, built-in command, external command,
shell function, hashed command, or a reserved word, the exit
status shall be non-zero, and -- if -v, -c, or -w was passed --
a message will be written to standard output. (This is differ‐
ent from other shells that write that message to standard er‐
ror.)
whence is most useful when name is only the last path component
of a command, i.e. does not include a `/'; in particular, pat‐
tern matching only succeeds if just the non-directory component
of the command is passed.
-v Produce a more verbose report.
-c Print the results in a csh-like format. This takes
precedence over -v.
-w For each name, print `name: word' where word is one of
alias, builtin, command, function, hashed, reserved or
none, according as name corresponds to an alias, a
built-in command, an external command, a shell function,
a command defined with the hash builtin, a reserved word,
or is not recognised. This takes precedence over -v and
-c.
-f Causes the contents of a shell function to be displayed,
which would otherwise not happen unless the -c flag were
used.
-p Do a path search for name even if it is an alias, re‐
served word, shell function or builtin.
-a Do a search for all occurrences of name throughout the
command path. Normally only the first occurrence is
printed.
-m The arguments are taken as patterns (pattern characters
should be quoted), and the information is displayed for
each command matching one of these patterns.
-s If a pathname contains symlinks, print the symlink-free
pathname as well.
-S As -s, but if the pathname had to be resolved by follow‐
ing multiple symlinks, the intermediate steps are
printed, too. The symlink resolved at each step might be
anywhere in the path.
-x num Expand tabs when outputting shell functions using the -c
option. This has the same effect as the -x option to the
functions builtin.
Finally, note that the bash type command also has many options,
$ info bash -n 'Bash Builtins' \
> | awk "
> /^'/ { out = 0 }
> /^'type'/ { out = 1 }
> { if (out) print }
> "
'type'
type [-afptP] [NAME ...]
For each NAME, indicate how it would be interpreted if used as a
command name.
If the '-t' option is used, 'type' prints a single word which is
one of 'alias', 'function', 'builtin', 'file' or 'keyword', if NAME
is an alias, shell function, shell builtin, disk file, or shell
reserved word, respectively. If the NAME is not found, then
nothing is printed, and 'type' returns a failure status.
If the '-p' option is used, 'type' either returns the name of the
disk file that would be executed, or nothing if '-t' would not
return 'file'.
The '-P' option forces a path search for each NAME, even if '-t'
would not return 'file'.
If a command is hashed, '-p' and '-P' print the hashed value, which
is not necessarily the file that appears first in '$PATH'.
If the '-a' option is used, 'type' returns all of the places that
contain an executable named FILE. This includes aliases and
functions, if and only if the '-p' option is not also used.
If the '-f' option is used, 'type' does not attempt to find shell
functions, as with the 'command' builtin.
The return status is zero if all of the NAMEs are found, non-zero
if any are not found.
Because most of the coreutils functionality is already available in the libraries of most languages. The article mentions that there are crates for the logic. The hard part is command-line parsing and output formatting, and your library should have neither of those.
I've seen plenty of shell scripts rewritten in Python because they grew too big, and most of the time coreutils commands just get replaced with standard library calls. There are exceptions (like sorting files which do not fit in memory), but otherwise the standard library is good enough.
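Roughly what that replacement tends to look like, as a sketch using only Python's standard library. The wrapper names are just for illustration; the flag equivalences in the comments are approximate.

```python
import os
import shutil
from pathlib import Path


def cp(src, dst):
    """Copy a file with its metadata, roughly `cp -p`."""
    return shutil.copy2(os.fspath(src), os.fspath(dst))


def mv(src, dst):
    """Move a file or directory, copying across filesystems if needed."""
    return shutil.move(os.fspath(src), os.fspath(dst))


def cat(*paths):
    """Concatenate files into one string."""
    return "".join(Path(p).read_text() for p in paths)


def sort_lines(path):
    """Sort a file's lines in memory. Unlike sort(1), this never spills to disk."""
    return sorted(Path(path).read_text().splitlines())
```

`sort_lines` is exactly the exception mentioned above: it works until the file no longer fits in memory, which is where the external merge sort inside `sort(1)` earns its keep.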
The problem is POSIX. It says operating systems must have mv, cp and all that stuff. This is the reason why people say Linux is not an operating system.
> I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL.
Funny you mention that. I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls. It's been a huge challenge trying to bootstrap and get the garbage collector working without any libc support, still haven't cracked it.
Languages like Python and Ruby already have system call capabilities. You can literally do anything with those calls. So this already exists in some form, albeit not in the extreme form I envisioned.
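As a small illustration of the Python side, here is a sketch that asks the kernel for the terminal size through `ioctl(2)`, using the standard `fcntl`, `termios` and `struct` modules. It is Unix-only, the helper names are made up, and `terminal_size` raises `OSError` if the descriptor is not a terminal.

```python
import fcntl
import struct
import termios


def decode_winsize(raw: bytes):
    """struct winsize is four unsigned shorts: rows, cols, xpixel, ypixel."""
    return struct.unpack("HHHH", raw)


def terminal_size(fd: int = 1):
    """Ask the kernel for the window size of the terminal on fd via ioctl(2)."""
    # The 8-byte buffer is filled in by the kernel and returned as bytes.
    raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, bytes(8))
    return decode_winsize(raw)
```

This still goes through the interpreter's C runtime and libc, so it is "system calls from a REPL" only in the loose sense, not the freestanding kind.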
> I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls.
Are you building something similar to babashka? Could you look at how babashka solved the problems you're stuck on, or are you deliberately challenging yourself?
Thanks, that's a nice project I didn't know about! Always happy to see more projects along these lines!! I'm not sure to what extent it permits systems programming though. I searched the repository for common system calls like mmap and didn't find anything. I assume it links to either libc or JVM.
I suppose I'm challenging myself. What I had in mind is much lower level: a Lisp thing where I can use the Linux system calls directly. It's gonna look like this:
; mmap some memory
(set memory (mmap 0 4096 '(read write) '(private anonymous) -1 0))
; query the kernel for some data
; terminal size for example
; have the kernel put the data at the start of that memory
(ioctl 1 'TIOCGWINSZ memory)
; memory now points to a struct winsize
; decode the 4 unsigned shorts
; first two unsigned shorts are the terminal's rows and columns
The language runtime is completely freestanding: it doesn't link to any library at all, not even libc. I made it so eval supports a special system-call function which executes a Linux system call from C, and I want to build literally everything else on top of that. I want to be able to run strace on any coreutils binaries, see what system calls they make and then implement the same thing on top of the system-call primitive. It should be possible to make a coreutils module that contains an mv function, for example.
; boils down to:
; (renameat2 'fd-cwd "file" 'fd-cwd "renamed" 'no-replace)
(mv "file" "renamed")
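For comparison, a no-replace `mv` can be approximated from Python today without `renameat2`. This is a sketch under clear assumptions, not an equivalent: it uses `link(2)` followed by `unlink(2)`, so it only works for regular files on the same filesystem, the filesystem must support hard links, and the two steps are not atomic as a pair. The name `mv_noreplace` is made up.

```python
import os


def mv_noreplace(src: str, dst: str) -> None:
    """Rename src to dst, failing with FileExistsError if dst already exists."""
    os.link(src, dst)   # link(2) refuses to overwrite an existing name
    os.unlink(src)      # drop the old name; the data now lives only under dst
```

With the `renameat2` call and its no-replace flag exposed directly, as in the Lisp sketch, the whole thing would be a single atomic system call instead of two.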
I had to use static allocation to pre-allocate a stack of Lisp cells when the process is loaded just to get it to evaluate at all. Now I'm trying to get the garbage collector to work so I can get it to bootstrap to a point where it can allocate memory, read files and load more code. I wish I had something real to show for all this effort but right now it's not real yet.
$ txr -i winsz.tl
TXR doesn't really whip the llama's ass so much as the lambda's.
1> (let ((ws (new winsize)))
(ioctl-winsz 0 TIOCGWINSZ ws)
ws)
#S(winsize ws-row 37 ws-col 80 ws-xpixel 0 ws-ypixel 0)
I'd like to have system call support as a feature built right into eval. Maybe a JIT compiler that emits code with Linux system call calling conventions whenever eval encounters a (system-call ...) form.
Basically all/most Common Lisp implementations have a foreign function interface. Those who run on some UNIX or Linux need support for low level access.
See for example the SBCL sources for mmap and ioctl.