I wonder why not reimplement coreutils as library functions, to be used within a...

pdpi · on Feb 10, 2023

The goal here is to provide a safer implementation of the core tools you need for a Unix system. The nature of these things is set in stone.

You’re describing an interesting project, but it’s not _this_ project.

unixgoddess · on Feb 10, 2023

what if it were written as a library, with the traditional cli implementation as a thin layer over it?

I'm thinking about the wider FOSS ecosystem: for example, if Firefox were built as a GUI gluing together a modular collection of libraries, people could do all sorts of cool things with them.

Monolithic applications make sense for proprietary software, not so much for FOSS.

kerkeslager · on Feb 10, 2023

You're proposing abandoning the Unix/Posix standards and philosophy in favor of an untested strategy.

"Move fast and break things" makes sense for acquiring investors, not so much for core infrastructure that everyone everywhere depends on.

This could work, but you'd need at least a decade of widespread usage to work out all the problems before it would even be worth considering for core infrastructure tools.

Zuiii · on Feb 11, 2023

Why would that be relevant if the thin wrapper is fully compliant with the POSIX/GNU standards?

Busybox, with its single-binary, depend-on-argv[0]-hack implementation, was at one point an "untested strategy", yet look at it now. Rust's coreutils needs to offer a real UVP if it wants to see real adoption. Providing coreutils as a library and not forking processes would certainly qualify as that.
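
To make the "library plus thin wrapper" idea concrete, here is a minimal sketch in Python. It is not how BusyBox or the Rust coreutils are actually structured; the names `head`, `cat`, `APPLETS`, and `dispatch` are hypothetical, and the argument handling is a toy (a bare count rather than the real `head -n` syntax).

```python
import itertools
import os
import sys

# Library layer: plain functions over iterables of lines; no argv, no printing.
def head(lines, n=10):
    """Return the first n lines as a list."""
    return list(itertools.islice(lines, n))

def cat(*sources):
    """Concatenate several iterables of lines into one list."""
    return list(itertools.chain.from_iterable(sources))

# CLI layer: thin adapters that turn argv-style arguments into library calls.
def run_head(args, stdin):
    n = int(args[0]) if args else 10  # toy syntax: "head 2", not "head -n 2"
    return head(stdin, n)

def run_cat(args, stdin):
    if not args:
        return cat(stdin)
    lines = []
    for path in args:
        with open(path) as f:
            lines.extend(f)
    return lines

APPLETS = {"head": run_head, "cat": run_cat}

def dispatch(argv, stdin):
    """Pick the applet named by argv[0]; return (exit_status, output_lines)."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        return 127, [f"{name}: applet not found\n"]
    return 0, applet(argv[1:], stdin)

def main(argv=None):
    """Multi-call entry point: symlink this script as 'head' or 'cat'."""
    status, lines = dispatch(argv or sys.argv, sys.stdin)
    (sys.stdout if status == 0 else sys.stderr).writelines(lines)
    return status
```

A caller that imports this module can use `head` and `cat` directly with no process spawned, while a script that calls `main()` and is symlinked under each applet name behaves like a conventional command.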

bentley · on Feb 10, 2023

How stable would this interface be?

If stability is a concern, exposing a greater surface for user interaction would surely slow down development as these interfaces would have to be reworked with care.

If stability is not a concern, then any user tools built upon these interfaces would be subject to breakage at the rate of upstream development. That’s got to be frustrating.

One of the nice things about the POSIX command‐line interface is that the build systems that interact with them know what to expect, because the interface has been much the same for a very long time, while still providing hugely useful capability.

unixgoddess · on Feb 10, 2023

As stable as, say, golang's standard library. Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.

In the case of coreutils, the problem space is fairly simple and well-understood, so it should be quite easy to commit to a stable interface. Even for something exceptionally complex like a web browser, I'd expect most components to be easily kept backwards-compatible in terms of public api.

kerkeslager · on Feb 10, 2023

> As stable as, say, golang's standard library. Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.

That's actually far less stable than is needed for core utils.

carapace · on Feb 10, 2023

> what if it were written as a library, with the traditional cli implementation as a thin layer over it?

That's kind of the way it is. Most of the core utils are thin wrappers around C libraries.

- - - -

It sounds like you're thinking of things like the Oberon OS, where there were no separate applications, instead the system was extended by adding new commands to a unitary GUI. Or the Canon Cat.

seanhunter · on Feb 10, 2023

This particular implementation uses various libraries (crates in rust) already which basically do the sort of thing you're looking for.

I.e., this is the thing you want; it's just not one library, it's a collection of libraries and a collection of CLIs which use those libraries.

ahepp · on Feb 10, 2023

Isn't what you're describing basically a unix "shell" like bash, csh, zsh, etc?

I agree that something like bash leaves a lot to be desired in REPL functionality. Its ubiquity is convenient, however.

westurner · on Feb 10, 2023

BusyBox has Ash sh and a number of other binaries all compiled into a multiply-symlinked executable.

BusyBox and Ash (and Bash) in Rust would be neat. IDK that docstring parity would be a good thing?

There's also RustPython.

lelanthran · on Feb 10, 2023

> I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL. It would be much more flexible and extensible.

Doesn't busybox do something similar? (Almost) everything is in a single binary.

unixgoddess · on Feb 10, 2023

nope. it's still a traditional binary meant to be used as traditional binaries from the posix shell. What I mean is, replace both the binaries and the shell with a library equivalent of coreutils running from a REPL.

ricardobeat · on Feb 10, 2023

What's the difference then? Especially if your REPL uses a shell-inspired scripting language.

LeonidasXIV · on Feb 10, 2023

The difference would be that you wouldn't fork processes for `cat` etc, instead treat them as builtins (like `echo` or `which`).
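
A rough way to see the difference is to time the same "cat" done in-process and done in a child process. This is only a sketch: `cat_builtin`, `cat_external`, and `time_call` are made-up names, and the child here is a Python interpreter rather than a real `/bin/cat`, so it overstates the cost of spawning a small native binary.

```python
import subprocess
import sys
import time

def cat_builtin(text):
    """In-process 'cat': hands the text back without creating a process."""
    return text

def cat_external(text):
    """Out-of-process 'cat': spawns a child that copies stdin to stdout."""
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read())"]
    done = subprocess.run(child, input=text, capture_output=True,
                          text=True, check=True)
    return done.stdout

def time_call(fn, arg, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_call(cat_builtin, "hello\n")` and `time_call(cat_external, "hello\n")` on your own machine gives the two numbers to compare; the absolute values depend heavily on the system.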

dmurray · on Feb 10, 2023

What's the advantage? Forking processes is more heavyweight than a typical function call, but that's hardly a concern if the use case is a REPL.

lelanthran · on Feb 10, 2023

> What's the advantage? Forking processes is more heavyweight than a typical function call, but that's hardly a concern if the use case is a REPL.

I was wondering the same thing, to be honest - if you're in a REPL, does it matter that it takes 200ms to "call" a function rather than 20ms?

jgerrish · on Feb 10, 2023

Sometimes, hitting those roadblocks leads to a better solution.

Maybe the new model is slower, and somebody looks into it and realizes that if they add a caching layer between the "REPL" module and the kernel ioctl, or service, or whatever, it will speed things up.

I run find and grep a lot. And I'm sure the kernel caches a lot of the FS stuff, but there are higher-level things that could be cached and shared with other "REPL" modules. Like predictive URL middleware in browsers. Pluggable middleware that can be enabled or disabled.

Available now on the OS module store:

Larry's Grep Count Document Prefetch Module. Certified Safe by BlahCorp.

This isn't a new idea, and I'm sure others have had it before me.

jgerrish · on Feb 10, 2023

That's actually pretty cool.

My post was a bit cynical, but the network effects would make it. The community would make it.

If it ever happened, I hope the contributors have fun.

unixgoddess · on Feb 10, 2023

it's not just about forking processes. Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have a lot of more atomic functions that users can mix and swap as needed, case by case.

lelanthran · on Feb 10, 2023

> Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have a lot of more atomic functions that users can mix and swap as needed, case by case.

Maybe I'm missing something here (it's been a long time since I last looked at the busybox code), but isn't busybox a single file that has a lot of atomic functions that callers can mix and swap as needed, using the shell as a REPL?

IIRC (and please correct me if I am wrong), all those little functions in busybox are simply single functions. There's a `cat` function, and a `head` function, and a `cp` function, etc.

I don't see what can be gained by moving them into a library file, and using the shell to call those functions, instead of leaving them in the shell program and calling them.

chrisjc · on Feb 10, 2023

I use Linux with ignorance.

But how do you know what is part of the shell vs whatever `cat` is (system/kernel function)?

macOS (prob diff from Linux obviously since based on BSD):

    $ which echo
    /bin/echo
    $ which cat
    /bin/cat
    $ which which
    /usr/bin/which

It's from these kinds of threads that I learn so much.

jasomill · on Feb 10, 2023

which is not a bash builtin (on Mac or Linux); use type instead:

   $ type echo
   echo is a shell builtin
   $ type cat
   cat is /bin/cat
   $ type which
   which is /usr/bin/which
   $ alias a=true
   $ type a
   a is aliased to `true'
   $ function f { true; }
   $ type f
   f is a function
   f () 
   { 
       true
   }

Incidentally, zsh, the current default Mac shell, has both type and which as internal commands, with different output:

    % which echo
    echo: shell built-in command
    % type echo
    echo is a shell builtin    
    % which cat
    /bin/cat
    % type cat
    cat is /bin/cat
    % which which
    which: shell built-in command
    % type which
    which is a shell builtin
    % alias a=true
    % which a
    a: aliased to true
    % type a
    a is an alias for true
    % function f { true; }
    % which f
    f () {
     true
    }
    % type f
    f is a shell function

Note that, on zsh, the "native" command is actually whence; which and type are equivalent to "whence -c" and "whence -v", where

    % man -W zshbuiltins \
      | xargs groff -Tutf8 -mandoc -P -cbdu \
      | awk '
          /^       [^ ]/ { out = 0 }
          /^       whence / { out = 1 }
          { if (out) print }
        '
           whence [ -vcwfpamsS ] [ -x num ] name ...
                  For each name, indicate how it would be interpreted if used as a
                  command name.
    
                  If name is not an alias,  built-in  command,  external  command,
                  shell  function,  hashed  command,  or a reserved word, the exit
                  status shall be non-zero, and -- if -v, -c, or -w was passed  --
                  a  message will be written to standard output.  (This is differ‐
                  ent from other shells that write that message  to  standard  er‐
                  ror.)
    
                  whence  is most useful when name is only the last path component
                  of a command, i.e. does not include a `/'; in  particular,  pat‐
                  tern  matching only succeeds if just the non-directory component
                  of the command is passed.
    
                  -v     Produce a more verbose report.
    
                  -c     Print the results  in  a  csh-like  format.   This  takes
                         precedence over -v.
    
                  -w     For  each  name,  print `name: word' where word is one of
                         alias, builtin, command, function,  hashed,  reserved  or
                         none,  according  as  name  corresponds  to  an  alias, a
                         built-in command, an external command, a shell  function,
                         a command defined with the hash builtin, a reserved word,
                         or is not recognised.  This takes precedence over -v  and
                         -c.
    
                  -f     Causes  the contents of a shell function to be displayed,
                         which would otherwise not happen unless the -c flag  were
                         used.
    
                  -p     Do  a  path  search  for name even if it is an alias, re‐
                         served word, shell function or builtin.
    
                  -a     Do a search for all occurrences of  name  throughout  the
                         command  path.   Normally  only  the  first occurrence is
                         printed.
    
                  -m     The arguments are taken as patterns  (pattern  characters
                         should  be  quoted), and the information is displayed for
                         each command matching one of these patterns.
    
                  -s     If a pathname contains symlinks, print  the  symlink-free
                         pathname as well.
    
                  -S     As  -s, but if the pathname had to be resolved by follow‐
                         ing  multiple  symlinks,  the  intermediate   steps   are
                         printed, too.  The symlink resolved at each step might be
                         anywhere in the path.
    
                  -x num Expand tabs when outputting shell functions using the  -c
                         option.  This has the same effect as the -x option to the
                         functions builtin.

Finally, note that the bash type command also has many options,

    $ info bash -n 'Bash Builtins' \
    >   | awk "
    >       /^'/ { out = 0 }
    >       /^'type'/ { out = 1 }
    >       { if (out) print }
    >     "
    'type'
              type [-afptP] [NAME ...]
    
         For each NAME, indicate how it would be interpreted if used as a
         command name.
    
         If the '-t' option is used, 'type' prints a single word which is
         one of 'alias', 'function', 'builtin', 'file' or 'keyword', if NAME
         is an alias, shell function, shell builtin, disk file, or shell
         reserved word, respectively.  If the NAME is not found, then
         nothing is printed, and 'type' returns a failure status.
    
         If the '-p' option is used, 'type' either returns the name of the
         disk file that would be executed, or nothing if '-t' would not
         return 'file'.
    
         The '-P' option forces a path search for each NAME, even if '-t'
         would not return 'file'.
    
         If a command is hashed, '-p' and '-P' print the hashed value, which
         is not necessarily the file that appears first in '$PATH'.
    
         If the '-a' option is used, 'type' returns all of the places that
         contain an executable named FILE.  This includes aliases and
         functions, if and only if the '-p' option is not also used.
    
         If the '-f' option is used, 'type' does not attempt to find shell
         functions, as with the 'command' builtin.
    
         The return status is zero if all of the NAMEs are found, non-zero
         if any are not found.

deadly_syn · on Feb 10, 2023

Echo is a builtin in bash.

https://www.gnu.org/software/bash/manual/html_node/Bash-Buil...

kasabali · on Feb 10, 2023

Busybox can do that: https://unix.stackexchange.com/a/274322

nibbleshifter · on Feb 10, 2023

I wish I'd known this a few years ago, when I wrote a fucking disgusting wrapper for busybox to do basically the same thing.

Maybe I'll revisit that project.

theamk · on Feb 10, 2023

Because most of the coreutils functionality is already available in the libraries of most languages. The article mentions that there are crates for the logic. The hard part is command-line parsing and output formatting, and your library should have neither of those.

I've seen plenty of shell scripts rewritten in Python because they grew too big, and most of the time coreutils commands just get replaced with standard library calls. There are exceptions (like sorting files which do not fit in memory), but otherwise the standard library is good enough.
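
As an illustration of that kind of rewrite, here is a small sketch that does a few coreutils-style steps with Python's standard library only. The file names and the `demo` function are invented for the example, and the mapping in the comments is approximate (for instance, `sorted` compares by code point, while `sort` follows the locale).

```python
import itertools
import os
import shutil
import tempfile
from pathlib import Path

def demo(workdir):
    """Run a few coreutils-like steps with standard-library calls only."""
    src = Path(workdir) / "notes.txt"
    src.write_text("pear\napple\nfig\n")          # printf ... > notes.txt

    copy = Path(workdir) / "copy.txt"
    shutil.copyfile(src, copy)                    # cp notes.txt copy.txt

    renamed = Path(workdir) / "renamed.txt"
    os.replace(copy, renamed)                     # mv copy.txt renamed.txt

    with open(renamed) as f:
        first_two = list(itertools.islice(f, 2))  # head -n 2 renamed.txt

    lines = renamed.read_text().splitlines()
    return {
        "head": first_two,
        "sort": sorted(lines),                    # sort renamed.txt (in memory)
        "wc_l": len(lines),                       # wc -l < renamed.txt
        "ls": sorted(os.listdir(workdir)),        # ls
    }

# Work in a throwaway directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as d:
    result = demo(d)
```

Note that the in-memory `sorted` call is exactly the case the exception above is about: it only works while the file fits in memory.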

matheusmoreira · on Feb 10, 2023

The problem is POSIX. It says operating systems must have mv, cp and all that stuff. This is the reason why people say Linux is not an operating system.

> I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL.

Funny you mention that. I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls. It's been a huge challenge trying to bootstrap and get the garbage collector working without any libc support, still haven't cracked it.

Languages like Python and Ruby already have system call capabilities. You can literally do anything with those calls. So this already exists in some form, albeit not in the extreme form I envisioned.

chrisjc · on Feb 10, 2023

> I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls.

Are you building something similar to babashka? Would you be able to figure out what they did with babashka to figure out what you've been unable to do, or are you challenging yourself?

https://github.com/babashka/babashka

matheusmoreira · on Feb 10, 2023

Thanks, that's a nice project I didn't know about! Always happy to see more projects along these lines!! I'm not sure to what extent it permits systems programming though. I searched the repository for common system calls like mmap and didn't find anything. I assume it links to either libc or JVM.

I suppose I'm challenging myself. What I had in mind is much lower level: a Lisp thing where I can use the Linux system calls directly. It's gonna look like this:

  ; mmap some memory
  (set memory (mmap 0 4096 '(read write) '(private anonymous) -1 0))

  ; query the kernel for some data
  ; terminal size for example
  ; have the kernel put the data at the start of that memory
  (ioctl 1 'TIOCGWINSZ memory)

  ; memory now points to a struct winsize
  ; decode the 4 unsigned shorts
  ; first two unsigned shorts are the terminal's rows and columns
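
For comparison, the same query can be sketched in Python, which goes through libc rather than issuing the system call directly as the Lisp above intends. This assumes a Unix-like system (the `fcntl` and `termios` modules are not available elsewhere); `WINSIZE`, `decode_winsize`, and `terminal_size` are names made up for the example.

```python
import fcntl
import struct
import termios

# struct winsize: four unsigned shorts (ws_row, ws_col, ws_xpixel, ws_ypixel).
WINSIZE = struct.Struct("HHHH")

def decode_winsize(buf):
    """Decode the raw bytes of a struct winsize into a 4-tuple of ints."""
    return WINSIZE.unpack(buf[:WINSIZE.size])

def terminal_size(fd=1):
    """Ask the kernel for the size of the terminal on fd.

    Returns (rows, cols), or None if the ioctl fails, for example
    because fd is a pipe or a file rather than a terminal.
    """
    try:
        # Passing bytes makes ioctl return the buffer as filled in by the kernel.
        raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, bytes(WINSIZE.size))
    except OSError:
        return None
    rows, cols, _xpixel, _ypixel = decode_winsize(raw)
    return rows, cols
```

Calling `terminal_size()` in an interactive terminal returns the current rows and columns; when output is redirected it returns `None`.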

The language runtime is completely freestanding: it doesn't link to any library at all, not even libc. I made it so eval supports a special system-call function which executes a Linux system call from C, and I want to build literally everything else on top of that. I want to be able to run strace on any coreutils binaries, see what system calls they make and then implement the same thing on top of the system-call primitive. It should be possible to make a coreutils module that contains an mv function, for example.

  ; boils down to:
  ; (renameat2 'fd-cwd "file" 'fd-cwd "renamed" 'no-replace)
  (mv "file" "renamed")

I had to use static allocation to pre-allocate a stack of Lisp cells when the process is loaded just to get it to evaluate at all. Now I'm trying to get the garbage collector to work so I can get it to bootstrap to a point where it can allocate memory, read files and load more code. I wish I had something real to show for all this effort but right now it's not real yet.

kazinator · on Feb 11, 2023

TXR Lisp:

  (defvarl TIOCGWINSZ #x5413)

  (typedef winsize (struct winsize
                     (ws-row ushort)
                     (ws-col ushort)
                     (ws-xpixel ushort)
                     (ws-ypixel ushort)))

  (with-dyn-lib nil
    (deffi ioctl-winsz "ioctl" int (int ulong : (ptr-out winsize))))

Then:

  $ txr -i winsz.tl 
  TXR doesn't really whip the llama's ass so much as the lambda's.
  1> (let ((ws (new winsize)))
       (ioctl-winsz 0 TIOCGWINSZ ws)
       ws)
  #S(winsize ws-row 37 ws-col 80 ws-xpixel 0 ws-ypixel 0)

matheusmoreira · on Feb 11, 2023

That's amazing! I'll definitely look at it for inspiration.

Am I correct in assuming that it depends on the C library for its system call support? The code seems to be a Lisp equivalent of:

  self  = dlopen(NULL, flags);
  ioctl = dlsym(self, "ioctl");
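
The same dlopen/dlsym pattern can be sketched in Python with `ctypes`, assuming CPython on Linux or macOS. It looks up `getpid` instead of `ioctl` only because `ioctl` is variadic and awkward to call through `ctypes`; it says nothing about how TXR implements `with-dyn-lib`.

```python
import ctypes
import os

# Equivalent of dlopen(NULL, ...): a handle to the symbols already loaded
# into this process, which includes the C library.
this_process = ctypes.CDLL(None, use_errno=True)

# Equivalent of dlsym(handle, "getpid"): attribute access does the lookup.
getpid = this_process.getpid
getpid.restype = ctypes.c_int
getpid.argtypes = []

# Call the C function directly, bypassing Python's own os.getpid wrapper.
pid_via_libc = getpid()

# The same value obtained through Python's standard wrapper, for comparison.
pid_via_python = os.getpid()
```

Both paths still end up in libc, which is the dependency the question above is asking about.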

I'd like to have system call support as a feature built right into eval. Maybe a JIT compiler that emits code with Linux system call calling conventions whenever eval encounters a (system-call ...) form.

Do you know more low level Lisp systems?

_19qg · on Feb 11, 2023

Basically all/most Common Lisp implementations have a foreign function interface. Those that run on some UNIX or Linux need support for low-level access.

See for example the SBCL sources for mmap and ioctl.

cleanchit · on Feb 10, 2023

Inertia