The 'Unix Way' Has a Right Way That's Almost a Lost Way


I've often extolled the philosophy of Unix, and as the title implies, I'm not about to stop. Before I learned computer science, I thought all computers were impenetrably arcane. But when I grasped Unix, through the imperfect medium of Linux, it made intuitive sense to me. Through all its evolution, at its heart Unix retains the charm that I have previously remarked on.

To touch on one such trait that is relevant to the point I want to make, I love that Unix's simplest tools are also its most versatile. This is because its creators believed that a handful of default tools should allow users to do anything imaginable. To that end, Unix's brain-parents also ensured effortless interoperation via the common interface of textual data. All these design choices consciously facilitated user freedom.

But this bears an important caveat: freedom has reasonable implied limits. Philosophies with the cardinal virtue of liberating adherents can never afford to shed all limitations for the simple fact that philosophies without doctrines are self-effacing. A philosophy, by existing, defines what it is and thus implicitly delineates what it is not.

This is what I call the "Daoism Paradox." Without getting too esoteric, Daoist philosophy holds that all is a perfectly effortless way. Existence is as it must be. In fact, its nature is so all-encompassing that defining it is impossible. So how does Daoism express this if expressing Daoism is impossible?

So where am I going with this, exactly?

Unix deliberately lets you go whatever way you want, yes, but some ways you shouldn't want to go. As I study tech sector innovations, I see signs that the old traditions are fading. I'm not one to sanctify tradition for tradition's sake, but I see merit in maintaining a traditional approach to computing tasks that encourages shrewdness.

To illustrate what I mean, these are some ways we are straying from the Unix way, and my view on why we should return to the path.

Use 'Cats' Responsibly

The 'cat' command is a candidate poster child for elegantly simple Unix tools. It does one thing, dump out the contents of all passed arguments, and does it brilliantly. However, useful as any tool is, it can be overused. Few commands are as egregiously wronged by overuse as 'cat'. There's even a name for it: useless use of cat (UUOC).

The idea is that you shouldn't use 'cat' every time you need to operate on the contents of a file because many tools can read file contents themselves. The most common example of this faux pas is running 'cat' on a file and then piping that into 'grep' to do pattern searching. This is unnecessary because 'grep' can already read files: simply pass the file as an additional argument after the regular expression. So you should opt for the second of these two commands over the first.

$ cat file | grep regex

$ grep regex file

Why does this matter? Fewer commands mean fewer processes and fewer CPU cycles. True, this doesn't usually matter, but if you fall into this pattern, you'll pay for it when it does.
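To make the comparison concrete, here is a minimal sketch that runs both forms against a throwaway file. The file contents and the variable names ('sample', 'piped', 'direct', 'count') are my own placeholders, not anything from a real system.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'alpha\nbeta\nalphabet\n' > "$sample"

# Two processes and a pipe: cat reads the file, grep reads the pipe.
piped=$(cat "$sample" | grep 'alpha')

# One process: grep opens the file itself.
direct=$(grep 'alpha' "$sample")

# Both forms select exactly the same lines.
[ "$piped" = "$direct" ] && echo "same output"

# Because grep was handed the file, options such as -c work per file.
count=$(grep -c 'alpha' "$sample")
echo "$count matching lines"

rm -f "$sample"
```

The output is identical either way. The only difference is how many processes it took to get there.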

More importantly, committing a UUOC entrenches the bad habit of not fully understanding a tool before combining it with another. If you didn't use 'grep' to read the file, it shows you didn't adequately learn 'grep'.

This isn't an academic exercise. In a tutorial for a widely deployed Web service that frequently involves Linux configuration, the author instructs users to pipe 'cat' into 'grep' in this exact way. Readers who don't know better ingest a bad habit without recognizing it as such.

Verify the Target Before Going in for the Kill

Things don't always go as planned. The more things going, the more they go awry. Suffice it to say, computers have a lot going on.

The Unix kill signals are still the gold standard for stopping frozen programs dead in their tracks. The main command that transmits these signals, 'kill', is not at issue; the terrifyingly convenient 'pkill' is. In best Unix practice, one looks up a program's process ID (PID) and then passes that to 'kill'. For instance, if your browser is frozen, list your processes with the following command and find your browser's entry.

$ ps aux

Let's say your browser's PID is 5000. We then hand that to 'kill' and it carries out the hit.

$ kill 5000

The popular way to do this now is to just give the name of the program to 'pkill' and let it figure out who to off.

$ pkill program

But what happens if you mistype the program name? What if the name of another running program contains the string you typed? 'pkill' treats its argument as a pattern, so every process whose name matches receives the signal. It's easy to kill the wrong process when specifying it purely by name. You wouldn't tell a U.S. mail carrier to deliver a package to "John Smith" because who knows where it would end up? You shouldn't use 'pkill' this way for the same reason.
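If you do want to work by name, one hedged middle ground is to look before you shoot: 'pgrep' accepts the same matching options as 'pkill' but only lists what it finds. The sketch below uses two disposable 'sleep' processes as stand-in targets and restricts matching to children of the current shell. Both of those choices are my assumptions for the sake of a safe demonstration.

```shell
# Start two throwaway processes so there is something to match.
sleep 30 &
sleep 30 &

# Preview first. pgrep takes the same filters as pkill but only lists:
#   -x        the whole process name must equal "sleep", not merely contain it
#   -l        print the process name next to each PID
#   -P "$$"   consider only children of this shell
matches=$(pgrep -x -l -P "$$" sleep)
printf '%s\n' "$matches"

# Only after reading that list, send the signal with the same filters.
pkill -x -P "$$" sleep

# Reap the terminated children so they do not linger as zombies.
wait 2>/dev/null || true
```

If the list printed by 'pgrep' contains anything you didn't expect, you have lost nothing. Fall back to 'kill' with the one PID you actually meant.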


When All Is 'Sed' and Done

Another puzzling practice I saw while reviewing this same Web service guide was that of editing configuration files using 'sed'. I've got nothing against 'sed'. Quite the contrary, it's my go-to tool for text transformation. However, it's not something to rely on for software configuration.

I'm not about to say with 100 percent conviction that using 'sed' to tweak configurations is un-Unix, but it is unwise. When editing configuration files, it is usually safer to do so manually with a text editor. There are a few reasons for this.

First, it's easier for you to review your changes, since you're in the file with the altered lines in view. When 'sed' edits a file in place, it prints nothing, so you can't immediately verify what changed.

Second, developers occasionally retool the format of configuration files, or change the options enabled and commented out by default, in a program's software updates. When you apply a 'sed' command to a configuration file, because it specifies what to find and what to replace it with, you are making assumptions about the contents of that file. If those assumptions no longer hold, the command can quietly change nothing, or change the wrong line, without reporting an error.
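For the times 'sed' is used on a configuration file anyway, here is a rough sketch of how both concerns could be softened: preview the change as a diff, and confirm the expected line exists before editing. The file contents, the 'mode=dev' setting, and the variable names are hypothetical. Note that '-i' with a backup suffix is supported by GNU and BSD 'sed' but is not part of POSIX.

```shell
# Hypothetical configuration file, created only for this demonstration.
conf=$(mktemp)
printf 'port=8080\nmode=dev\n' > "$conf"

# Dry run: sed writes the would-be result to stdout, diff compares it
# with the current file. diff exits 1 when it finds differences, hence "|| true".
preview=$(sed 's/^mode=dev$/mode=prod/' "$conf" | diff "$conf" - || true)
printf '%s\n' "$preview"

# Guard against format drift: edit only if the expected line is really there.
if grep -q '^mode=dev$' "$conf"; then
    # -i.bak edits in place and keeps the original as "$conf.bak".
    sed -i.bak 's/^mode=dev$/mode=prod/' "$conf"
else
    echo "expected line not found; leaving $conf untouched" >&2
fi

after=$(grep '^mode=' "$conf")
echo "$after"

rm -f "$conf" "$conf.bak"
```

This is still more ceremony than opening the file in an editor, which is rather the point.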

I fear adept use of 'sed' is becoming a lost art because tutorial authors don't always seem to perceive the possible edge cases for their commands. For instance, I see tons of materials using the "g" flag when it isn't necessarily appropriate or a good idea.

To give the bumpiest of 'sed' crash courses, 'sed' requires a script string and some text to operate on. There's a lot 'sed' can do, but a common use is finding and replacing, which is done with a script in this format.

sed 's/find_exp/replace_lit/'

This finds every line with the regular expression find_exp and replaces the first occurrence of that expression in the line with the literal replace_lit.

Appending "g" after the trailing slash directs 'sed' to replace all occurrences of find_exp with replace_lit on any line that contains find_exp.

The 'sed' configuration file editing commands I've seen use the "g" flag, despite the fact that one may not want to change all instances in a line and shouldn't assume that there is only one instance per line. When I first learned 'sed', I learned to append "g" as "just the way it's done," but although I know better now, there's no way I'm the only one to internalize that faulty assumption.
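A small sketch shows how the two forms diverge on a line that mentions the search term more than once. The configuration line is a made-up example, not taken from any real program.

```shell
# Hypothetical config line: a setting followed by a comment that repeats the value.
line='enabled=yes # set to yes or no'

# Without "g": only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/yes/no/')

# With "g": every match on the line is replaced, including the one in the comment.
every=$(printf '%s\n' "$line" | sed 's/yes/no/g')

echo "$first_only"   # enabled=no # set to yes or no
echo "$every"        # enabled=no # set to no or no
```

Here the "g" version mangles the comment. In a less forgiving file, the second match could just as easily be a second setting on the same line.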

Keeping the Unix Way from the Dodo's Way

If I've done my job, I've convinced you that these practices, at least, aren't the product of stodgy custom but of an intentional focus on optimal outcomes. Abstract as it can be at times, the Unix way is the best one I know of for elevating one's command of such systems, programming efficiently, or applying any principles of computer science practically.