If you really know your GNU coreutils, you probably don't need as many extra tools as you think.
-

KISS isn't just a design principle — it's already built into your system.
A comprehensive review of every coreutils command, with examples and honest opinions. The lobste.rs discussion is also worth reading.
Article: https://ratfactor.com/slackware/pkgblog/coreutils
Discussion: https://lobste.rs/s/xqf5ex/coreutils_comprehensive_review_2023
#linux #unix #coreutils #gnu
-
@r1w1s1 If I am not mistaken, grep and sed are not part of coreutils, are they?
I am not really sure whether they can be replaced with coreutils.
-
On a tangent: For the specific task of (what's now called) word processing, there's a mighty fine book showing how the tools on a UNIX system can add up to something greater than the sum of the parts. See https://www.oreilly.com/openbook/utp/
-
@r1w1s1 If I am not mistaken, grep and sed are not part of coreutils, are they?
I am not really sure whether they can be replaced with coreutils.
True, but it's also interesting how some tools are rarely used or get replaced by others (awk, diff, etc.).
Knowing when to use coreutils is just as important as knowing them.
-
This post is deleted!
-
Slackware also uses another implementation of hostname:
https://mirrors.slackware.com/slackware/slackware64-current/source/a/hostname/
-
@r1w1s1@snac.bsd.cafe Thank you so much for this article, it was a really fun read. I learned about csplit, which is much nicer than the awk state machines I usually write to handle multi-line, pattern-separated records. I wish csplit patterns could be multi-line, though; it doesn't seem possible. I also use the paste command (e.g. paste - - -) to work around this when the multi-line record has a fixed number of lines. I found ptx fascinating, and I'm having lots of fun running it over all my writings and journals. Anyway, what follows are some random thoughts, comments, and ideas I had while reading the article.

One aspect I would suggest looking into is the atomicity of mv and how useful it can be for writing reliable scripts or for deploying new versions of running services without downtime. In particular, `mv -T --exchange deployment-standby deployment-current` swaps the two directories atomically. I think I will also be using it soon as part of a script to "garbage collect" millions of files, roughly like this:
1. mkdir store2
2. for each file we want to remain, we hardlink with ln store1/filename store2/
3. when all the files we want are hard-linked, mv -T --exchange store2 store1
4. rm -r store2
This way, we "garbage collect" all the files that we don't have references to in our list, atomically, deleting them without the store itself being "offline" for reads at any point, and without using any extra disk space for copies of the files. If the garbage collection process is interrupted at any point, store1 remains as is without any change.
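The steps above can be sketched as a script. Directory and file names here are illustrative, and note that `mv --exchange` requires coreutils 9.4 or newer with kernel renameat2 support, so this sketch includes a non-atomic two-rename fallback for older systems:

```shell
set -e

# Demo setup (hypothetical store: two files, only a.txt is referenced
# by the keep.list):
mkdir store1
echo keep > store1/a.txt
echo drop > store1/b.txt
printf 'a.txt\n' > keep.list

mkdir store2

# Hard-link every referenced file into the new store; hard links share
# the inode, so no file data is copied.
while IFS= read -r name; do
    ln "store1/$name" "store2/$name"
done < keep.list

# Atomically swap the directories (coreutils >= 9.4). The fallback
# below works on older systems but is NOT atomic.
mv -T --exchange store2 store1 2>/dev/null || {
    mv -T store1 store1.old
    mv -T store2 store1
    mv -T store1.old store2
}

# store2 is now the old store, holding only unreferenced files; drop it.
rm -r store2
```

After the swap, store1 contains only the referenced files and reads against it never saw a missing directory.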
Additionally, nproc --ignore=N is useful in Make build scripts when passed to make -j, especially when the same makefiles are used on machines with different core counts. For example, make -j $(nproc --ignore=2) uses the maximum number of cores on that machine while leaving at least 2 free, so the build doesn't overload the machine and degrade its service. And nproc still returns at least 1 if the machine has only 1 or 2 cores.
printf is a lot more flexible than people think. I use it for making separators, for example, combined with tr, like this:
printf '+%78s+\n' | tr ' ' -
Or, to pad and surround text, with the help of xargs, like this:
cat my.txt | xargs -L1 -d '\n' printf " | %-52s|\n"
Where my.txt is pre-formatted text, hard-wrapped at 50 columns (I usually use par for this instead of fmt, as it allows me to justify text and is smarter about indentation and line prefixes of various kinds, but none of these CLI formatting tools seems to handle multi-byte UTF-8 text).
seq is nice for doing loops with a specific number of iterations in a POSIX-y way: for i in $(seq 10); do ...; done loops 10 times. The {1..10} form is not POSIX sh compliant.
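A minimal sketch of such a loop (seq itself is a coreutils command rather than POSIX, but the for syntax runs in any POSIX sh, unlike the {1..10} bashism):

```shell
# Loop a fixed number of times in portable POSIX sh syntax.
for i in $(seq 3); do
    printf 'iteration %s\n' "$i"
done
```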
The shuf command is indeed fun; it sometimes helps with brainstorming names and cool nonsense phrases. Try the following for some random words:
shuf /usr/share/dict/words | head
I came up with those just now:
tight ambiguity
unsound lavender
despairing ravages
median ocean
delegate troll
The sleep command is nice for reminders, if you have a notification daemon. On Slackware I put /usr/lib64/xfce4/notifyd/xfce4-notifyd& in my .xinitrc to use it with my tiling window manager:
sleep 5m && notify-send -u critical 'The Tea' "It's boiling!"
sort is nice with du -sh; when cleaning up some disk space, I use this shell function:
cdd () { cd "$1" && du -sh ./* | sort -h; }
Example:
cdd ~/Downloads/
120M ./palemoon
147M ./renpy-8.5.2-sdk.tar.bz2
173M ./slack-wallpapers-1.0.tar.gz
181M ./slack-wallpapers-deviantart-1.0.tar.gz
On a network with good bandwidth but very high packet loss, I used split combined with lftp parallel downloads over sftp to greatly speed up downloads of big files: each new connection restarts the TCP slow-start algorithm, so each chunk begins with a full ~15K initial window. Instead of the download speed grinding to a halt due to packet loss, the link stayed almost saturated. All it took was splitting the big file into 14K chunks and opening a new TCP connection for each chunk.
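The lftp side depends on the server setup, but the splitting itself is plain coreutils. A minimal sketch of the chunk-and-reassemble round trip (file names and the 14K chunk size are illustrative; -d asks GNU split for numeric suffixes):

```shell
# Create a 100 KiB test file, split it into 14 KiB chunks, then
# reassemble and verify the result is byte-identical.
dd if=/dev/urandom of=bigfile bs=1024 count=100 2>/dev/null
split -b 14K -d bigfile chunk.     # produces chunk.00, chunk.01, ...
cat chunk.?? > rebuilt             # suffixes sort in the right order
cmp -s bigfile rebuilt && echo 'round trip OK'
```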
I think your stdbuf example doesn't work because you used echo, which ends with a newline, and newlines flush the output; a process exiting and closing its pipe or redirection also flushes, if I recall correctly. So you need a process that doesn't exit and doesn't emit newlines, but keeps writing to the file. Apache logs may do that, and I've noticed it with other things too, like netstat in continuous mode. In such cases you may find the last line in the log partially missing, because some chunks of it haven't been flushed yet. I've seen cases where newlines were not used in the stream at all, so buffering was really annoying when trying to tail -f for debugging.
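For reference, the usual stdbuf fix goes on a filter in the middle of a pipeline, because stdio block-buffers output when stdout is a pipe rather than a terminal. The short-lived example below flushes on exit anyway, so it only illustrates the syntax; the effect matters with long-running producers such as tail -f:

```shell
# Force grep to line-buffer its stdout even though it writes into a
# pipe (default would be a block buffer of several KiB).
printf 'foo\nbar\nfoo\n' | stdbuf -oL grep foo | cat
```

Note that stdbuf works by preloading a shim into the child's C library, so it has no effect on programs that bypass stdio or do their own buffering.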
Anyway, thanks again for this interesting article; a lot of effort clearly went into it. Your site is now in my RSS feed.
-
Thanks so much for the detailed comment, lots of great tips here!
The mv -T --exchange pattern for atomic GC is really elegant; I hadn't thought of using hardlinks that way. And the TCP slow-start trick with split + parallel downloads is wild.
Just one quick clarification: the blog isn't mine, I just shared the link because I enjoyed the article. The author is Dave (ratfactor.com), and he's on Mastodon at @ratfactor@mastodon.art. If you have corrections or things you'd like to see added to the article itself, reaching out to him there would be the best way.
But please keep the comments coming here, I'm learning a lot from them!
-
@ratfactor@mastodon.art @r1w1s1@snac.bsd.cafe ah, oops, thanks for the correction, and thanks for sharing anyway
