Nerdbox Hack: April 2009

from http://billharlan.com/pub/papers/Bourne_shell_idioms.html

§ Text filtering commands

Administration from scripts and the
command line often benefits from pipes of text
filtering commands. Here are some that are
easy to overlook or forget.

mmencode converts to and from
base64 and "quoted-printable" formats for email.
Search for the metamail package.
Unfortunately, this has become hard to find.

Alternatively uuencode -m converts to base64,
and uudecode -m converts from base64.

Or decode and encode quoted-printable and
base64 with

perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);'
perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::encode($_);'
perl -pe 'use MIME::Base64; $_=MIME::Base64::encode($_);'
perl -pe 'use MIME::Base64; $_=MIME::Base64::decode($_);'

uniq lets you remove duplicated lines
from a sorted file.

Count the number of times a given line
occurs with

sort | uniq -c | sort -n
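
As a small illustration of the counting pipeline above, the sketch below wraps it in a function and feeds it a few lines. The function name count_lines and the sample words are made up for the example.

```shell
# Hypothetical helper: count how often each input line occurs,
# least frequent first.
count_lines() {
  sort | uniq -c | sort -n
}

# fig occurs once, apple twice, pear three times.
printf 'pear\napple\npear\nfig\npear\napple\n' | count_lines
```

The first sort is what makes uniq -c work, since uniq only compares adjacent lines.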

Break text into one word per line with

perl -pe 's/\s+/\n/g'

Combine separate lines into a single line
of words with

paste -s -d" "

comm lets you suppress lines unique to
one or both of two files.
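
A minimal sketch of the three comm columns, assuming two small sorted files created in a scratch directory. The file names and contents are invented for the illustration.

```shell
# Work in a scratch directory so the example leaves nothing behind.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/a.txt"
printf 'banana\ncherry\ndate\n'  > "$tmp/b.txt"

# comm expects sorted input. Column 1: only in the first file,
# column 2: only in the second, column 3: in both.
only_in_a=$(comm -23 "$tmp/a.txt" "$tmp/b.txt")   # suppress columns 2 and 3
only_in_b=$(comm -13 "$tmp/a.txt" "$tmp/b.txt")   # suppress columns 1 and 3
in_both=$(comm -12 "$tmp/a.txt" "$tmp/b.txt")     # suppress columns 1 and 2

echo "only in a: $only_in_a"
echo "only in b: $only_in_b"
echo "in both:" $in_both
rm -r "$tmp"
```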

cat -s never prints more than one blank
line in a row.

Remove all blank lines with

perl -ne 'print if /\S/'

Print lines starting with one containing
FOO and ending with one containing BAR.

sed -n '/FOO/,/BAR/p'

merge and diff3 for merging changes
in files edited from a common ancestor.

fold breaks lines to proper width, and
fmt will reformat lines into paragraphs.

dirname and basename let you
extract the directory and filenames from a
full path to a file.

namei breaks a pathname into pieces and
follows symbolic links.

expand and col -x replace tabs by
spaces.

col -b removes backspaces from a file.

cat -v shows non-printing characters as
visible ASCII escapes (^X and M-x notation).

sed '1,10d' deletes the first 10
lines.

sed -n '3p' and sed -n '3{p;q}'

both print the third line, but the latter is
more efficient.

sed '/foo/q' truncates a file after
the line containing foo.

sed -ne '/foo/,/bar/p' prints
everything from the line containing foo
to the line containing bar.
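
The sed one-liners above can be tried side by side on a tiny input. This is only a sketch: the sample function and its five lines are made up, and the extra semicolon before the closing brace is there for seds that are stricter than GNU sed.

```shell
# Sample input, one word per line (invented for the illustration).
sample() { printf 'one\nfoo\ntwo\nbar\nthree\n' ; }

sample | sed -n '3{p;q;}'       # third line only: two
sample | sed '/foo/q'           # up to the first line matching foo: one, foo
sample | sed -n '/foo/,/bar/p'  # from foo to bar inclusive: foo, two, bar
sample | sed '1,2d'             # drop the first two lines: two, bar, three
```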

Align space-delimited fields into orderly
columns with column -t.

Right justify queries with

printf "%40s" "Do you want to delete? [y/N] "

Convert dos text files to unix, and vice
versa:

dos2unix file.txt
unix2dos file.txt
tr -d \\r < win.txt > unix.txt  # if you can't find dos2unix
sed -e 's/$/\r/' < unix.txt > win.txt  # if you can't find unix2dos

cat -n and nl number lines.

Both of these perform string substitution,
but the latter allows more general regular
expressions:

sed -e 's/oldtext/newtext/g'
perl -pe 's/oldtext/newtext/g'

Here's how to replace straight double quotes
with TeX-style opening (``) and closing ('')
quotes:

< in.tex perl -pe 's%\B"\b%``%g' |
  perl -pe "s%\b\"\B%''%g" > out.tex

Use iconv to convert between character
encodings.

Here are two ways to find string patterns
in a file:

grep 'pattern' filename [file] [< file]
perl -ne 'print if /pattern/' [file] [< file]

Print the first and third columns of each
line:

awk '{print $1,$3}'
perl -alne 'print "$F[0] $F[2]"'

Convert to lower-case:

tr '[A-Z]' '[a-z]'
perl -pe 'tr/[A-Z]/[a-z]/'
perl -pe '$_ =lc'

Simple character substitutions and
deletions may be simplest with tr.

tr -d '\n'  # delete newlines
tr '\n' '\0' # replace newlines by null characters.

$ echo 1-2a-3b | tr '1-8' '2-9' | tr '-' '_' | tr -d 'a'
2_3_4b

You can pipe into a loop with read.
Here is a complicated way to cat a text file,
piping in and out of a loop.

cat file | while read a; do echo "$a" ; done | cat

To read lines in pairs from two files try

paste file1 file2 | while read a b ; do echo "$a $b" ; done
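
A plain read strips leading whitespace and treats backslashes as escapes. When lines must pass through a loop unchanged, a common variant is IFS= read -r. The sketch below uses a made-up function name, show_lines, and made-up input to show the difference.

```shell
# Print each input line between bars so whitespace is visible.
show_lines() {
  while IFS= read -r line ; do
    printf '|%s|\n' "$line"
  done
}

# Prints "|  indented|" and "|back\slash|". With a plain "read line"
# the leading spaces and the backslash would both be lost.
printf '  indented\nback\\slash\n' | show_lines
```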

Sort a list of dependencies with tsort.
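
As a sketch of what tsort expects: each input pair "A B" says that A must come before B. The task names here are invented for the example.

```shell
# A simple chain of dependencies; tsort prints configure, compile,
# link, install, one per line.
printf 'configure compile\ncompile link\nlink install\n' | tsort
```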

Reverse lines with tac and words with
rev.

§ Files and directories

Select text (non-binary) files with
one of these

\ls | perl -lne 'print if -T'
perl -le 'for (glob "*") {print if -T }'
perl -le 'print for grep -T, <*>'

The perl algorithm for detecting text files
is very good.

To do something to files with goofy names,
including spaces and dashes, delimit the
files with null characters instead of
whitespace or newlines.

find . -type f -print0 | xargs -r0 ls
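
To see why the null delimiter matters, the following sketch creates three files in a scratch directory, two of them with spaces in their names, and lists them both ways. The file names are made up, and the example assumes the current directory has no files named like the broken fragments.

```shell
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/has space.txt" "$tmp/two  spaces.txt"

# With -print0 / -0 each name arrives as one argument, so ls lists 3 files.
safe=$(find "$tmp" -type f -print0 | xargs -0 ls | wc -l)

# Splitting on whitespace breaks the names apart, so ls finds fewer files.
unsafe=$(find "$tmp" -type f | xargs ls 2>/dev/null | wc -l)

echo "null-delimited:" $safe "whitespace-delimited:" $unsafe
rm -r "$tmp"
```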

See if a directory contains any accessible
contents with

  if ls -A "${dir}"/* >& /dev/null ; then echo "has files" ; else echo "no files" ; fi
[or]
  if [ "`ls -A "${dir}"`" ]  ; then echo "has files" ; else echo "no files" ; fi

See if files of a certain type exist:

  if ls *.par >& /dev/null ; then echo "has par files" ; fi
[or]
  ls *.par >& /dev/null && echo "has pars" || echo "no pars"

readlink -f will fully resolve what a
symbolic link points to.

Find all bad symbolic links with readlink -e,
which, unlike -f, fails when the final target
does not exist:

find . -type l |
  while read -r f ; do if ! readlink -e "$f" >/dev/null 2>&1
  then echo "$f" ; fi ; done

§ Variables

To see if the value of a variable matches a
regular expression, combine if and grep. For
example, to see if the name of a file begins
with a dot, try

 if echo "$filename" | grep '^[.]' >/dev/null 
 then echo yes ; else echo no ; fi

expr also has limited support for regular
expressions. The pattern is implicitly
anchored at the start of the string.

if [ `expr "$filename" : '[.].*'` -ne 0 ] 
then echo yes ; else echo no ; fi
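
When a shell glob is enough, a case statement avoids starting grep or expr at all. This is an alternative to the two tests above, not a replacement for real regular expressions, and the helper name starts_with_dot is hypothetical.

```shell
# Hypothetical helper: succeed if the name begins with a dot.
starts_with_dot() {
  case "$1" in
    .*) return 0 ;;   # glob pattern, not a regular expression
    *)  return 1 ;;
  esac
}

for name in .profile notes.txt ; do
  if starts_with_dot "$name" ; then echo "$name: yes" ; else echo "$name: no" ; fi
done
```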

Use read to avoid tokenizing filenames
with spaces. Here's how to find all files
whose names contain a space, and print the mv
commands that would replace the spaces with
underscores. Remove the echo to actually rename.

find . -iname '* *' | 
  while read -r f ; do 
    echo mv "$f" "`echo "$f" | sed 's/  */_/g'`"
  done

For simple integer arithmetic use expr:

N=`expr "$N" + 3`
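
POSIX shells, bash included, also offer arithmetic expansion, which does the same integer math without starting a subprocess. A minimal sketch with arbitrary values:

```shell
N=4
N=$((N + 3))          # arithmetic expansion instead of expr
echo "$N"             # 7
echo $((N * 6 / 4))   # integer division truncates: 42 / 4 gives 10
```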

For arbitrary-precision floating-point
math, use bc -l

# Get pi to 10 places with arctangent (bc man page)
PI=`echo "scale=10; 4*a(1)" | bc -l`
# Expensive calculation of zero (Craig Artley):
ZERO=`echo "c($PI/4)-sqrt(2)/2" | bc -l`

seq 1 100 generates all integers
between 1 and 100. To iterate a loop 100
times, try

for i in `seq 1 100` ; do ... ; done

You can set the environment of a subprocess
by defining a variable on the same line. The
current shell is not affected.

$ x=doggie sh -c 'echo x=$x'
x=doggie
$ x=pig ; x=doggie echo x=$x
x=pig

Test that a string has non-zero length with

if [ -n "$string" ] ; then echo "not empty" ; fi

The -n is actually the default for a
string expression, so you can omit it:

if [ "$string" ] ; then echo "not empty" ; fi

There are several good ways to set default
values for environment variables. Many do
this

if [ ! "$VARIABLE" ] ; then VARIABLE="default value" ; fi
export VARIABLE

A simple alternative is

: ${VARIABLE:="default value"} 
export VARIABLE

The colon at the beginning of the line is
necessary as a no-op that allows its
arguments to be evaluated.

Rarely you may want to accept a variable
defined as an empty string. If so, then omit
the colon before the equals when setting the
default.

: ${VARIABLE="default value"} 
export VARIABLE

To test whether a variable is defined, even if
empty, use

if [ "${VARIABLE+x}" ] ; then echo DEFINED ; fi
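
The difference between the expansions above is easiest to see side by side. In this sketch the variable name COLOR and its default are arbitrary, and the results are saved in throwaway variables so they can be printed together.

```shell
unset COLOR
: ${COLOR:="blue"}              # unset, so the default is assigned
first=$COLOR                    # blue

COLOR=""
: ${COLOR="blue"}               # defined but empty: left alone
second=$COLOR                   # empty

: ${COLOR:="blue"}              # := also replaces an empty value
third=$COLOR                    # blue

unset COLOR
if [ "${COLOR+x}" ] ; then fourth=defined ; else fourth=undefined ; fi
COLOR=""
if [ "${COLOR+x}" ] ; then fifth=defined ; else fifth=undefined ; fi

echo "first=[$first] second=[$second] third=[$third] fourth=$fourth fifth=$fifth"
```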

To echo the names of all variables starting with X (bash only):
echo ${!X*}

§ Running commands

Use "$@" when passing command-line
arguments unaltered to subprocesses. This is
equivalent to passing "$1" "$2" ..., but
the first version works properly for no
arguments.

Test the processing of arguments, like this

$ set a 'b c' d
$ for i in "$@" ; do echo "|$i|" ; done
|a|
|b c|
|d|
$ for i in "$*" ; do echo "|$i|" ; done
|a b c d|
$ for i in $* ; do echo "|$i|" ; done
|a|
|b|
|c|
|d|
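
The same comparison can be made for a wrapper that forwards its arguments to another command. In this sketch count_args is a hypothetical stand-in for the real command, and the three forward_ functions are made-up names.

```shell
# Hypothetical stand-in for a real command: report how many arguments arrive.
count_args() { echo $# ; }

# Three ways a wrapper might forward its arguments.
forward_quoted_at()   { count_args "$@" ; }   # each argument preserved
forward_quoted_star() { count_args "$*" ; }   # everything joined into one
forward_bare_star()   { count_args $* ; }     # re-split on whitespace

forward_quoted_at   a 'b c' d   # 3
forward_quoted_star a 'b c' d   # 1
forward_bare_star   a 'b c' d   # 4
forward_quoted_at               # 0: "$@" expands to nothing at all
```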

See what runtime options you may have set
with set -o, bind -p, shopt -p, and
stty -a.

For example, bash edits the command line in
emacs mode by default. Change to vi mode with

set -o vi

In emacs mode, you can edit your command in
the editor named by $EDITOR with Ctrl-x Ctrl-e.

In vi mode, use Esc v. See help fc
for more.

Repeat the last argument of the previous
command with !$. Repeat all arguments
without the command with !*.

To guarantee that a background process
outlives the current shell, add extra
parentheses like this:

( command & )

Otherwise, when your current shell exits (for
example when you leave X or close an ssh
session), it may terminate all processes that
have your shell as the parent process. The extra
parentheses start a subshell that exits as
soon as the command is spawned in the
background. The background process is then
re-parented, normally to process ID 1. This is a
command-line version of the "double fork."

Repeat until a command succeeds:

while ! cvs -z 3 -q update -dPA ; do echo -n . ; sleep 60 ; done
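
An unbounded loop like the one above never gives up. A bounded variant might look like the sketch below. The helper name retry and its argument order (maximum attempts, delay in seconds, then the command) are assumptions made for this example.

```shell
# Hypothetical helper: retry a command up to MAX times, pausing DELAY
# seconds between attempts. Returns 0 on the first success, 1 if every
# attempt fails. Usage: retry MAX DELAY command [args...]
retry() {
  max=$1 ; delay=$2 ; shift 2
  attempt=1
  while ! "$@" ; do
    if [ "$attempt" -ge "$max" ] ; then return 1 ; fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

retry 3 0 true  && echo "succeeded"
retry 3 0 false || echo "gave up after 3 attempts"
```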

Make a progress bar (loop while waiting on
a process)

sleep 10 & while ps -p $! >/dev/null; do echo -n . ; sleep 1 ; done ; echo 
or
while pidof mozilla-bin > /dev/null ; do echo -n . ; sleep 1 ; done ; echo

pgrep -f or killall -0 are alternatives to pidof for this purpose.

§ Manipulating paths

Loop over the elements of a PATH by
tokenizing with the character ':'.

IFS=':' ; for dir in $PATH ; do echo $dir ; done
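
Note that the one-liner above leaves IFS set to ':' in the current shell. One way to keep the change contained is to do the splitting inside a subshell, as in this sketch. The function name path_elements is made up.

```shell
# Hypothetical helper: print each element of a colon-separated list on its
# own line. The parentheses make the body a subshell, so the changed IFS
# and the disabled globbing never reach the caller.
path_elements() (
  IFS=':'
  set -f                      # do not expand wildcards in the elements
  for dir in $1 ; do
    printf '%s\n' "$dir"
  done
)

path_elements "/usr/local/bin:/usr/bin:/bin"
```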

Check for the existence of an executable
version of a command in your PATH:

checkPath() { local IFS=':' ; for dir in $PATH ; do if [ -x "$dir/$1" ] ;
              then return 0 ; fi ; done ; return 1 ; }
if checkPath commandName ; then ... ; fi

Here is a lovely function to modify a PATH
that I found in /etc/profile:

pathmunge () {
 if ! echo $PATH | egrep "(^|:)$1($|:)" >/dev/null ; then
    if [ "$2" = "after" ] ; then
       PATH=$PATH:$1
    else
       PATH=$1:$PATH
    fi
 fi
}

§ Common script chores

Debug the script with set -x.

Make a script exit immediately after any failed
command with set -e.

Process flags in a script:

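# Loop on "$1" rather than a fixed list, so each shift
# stays in step with the argument being examined.
while [ $# -gt 0 ] ; do
        case $1 in 
                -a) FLAG_A=1
                        shift ;;
                -b) FLAG_B="$2"
                        shift ; shift ;;
                --) shift ; break ;;
                *) break ;;
        esac
done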
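
The getopts builtin is another way to handle the same flags, and it also copes with combined flags such as -ab value. This is a sketch rather than a drop-in replacement: the function name parse_flags and the REMAINING variable are hypothetical, while the flag names mirror the loop above.

```shell
# Hypothetical function using the POSIX getopts builtin: -a is a switch,
# -b takes a value.
parse_flags() {
  FLAG_A=0 ; FLAG_B="" ; OPTIND=1
  while getopts "ab:" opt ; do
    case $opt in
      a) FLAG_A=1 ;;
      b) FLAG_B="$OPTARG" ;;
      *) return 2 ;;           # unknown flag or missing value
    esac
  done
  shift $((OPTIND - 1))        # drop the flags that were consumed
  REMAINING="$*"
}

parse_flags -a -b "some value" file1 file2
echo "FLAG_A=$FLAG_A FLAG_B=$FLAG_B REMAINING=$REMAINING"
```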

Print help from a script:

if [ $# -lt 1 -o "$1" = "-h" -o "$1" = "-help" -o "$1" = "--help" ] ; then
     cat <<-END
 Usage: `basename $0` [-flag] arg1 [arg2]
 More information.
END
     exit
fi

Handle errors with functions:

Often an error exit is handled most cleanly
with a function.

print_usage_and_exit() {
 # Write the message to standard error and exit with a failure status.
 cat <<-END >&2
Usage: `basename "$0"` arg1 arg2 [arg3]
The first two arguments are required.
END
 exit 1
}

if [ $# -lt 2 ] ; then
 print_usage_and_exit
fi

Here's a robust way to locate the directory
containing a script, following symbolic
links. (Taken from the launch script of
FindBugs.)

program="$0"
while [ -h "$program" ]; do
        link=`ls -ld "$program"`
        link=`expr "$link" : '.*-> \(.*\)'`
        if [ "`expr "$link" : '/.*'`" = 0 ]; then
                dir=`dirname "$program"`
                program="$dir/$link"
        else
                program="$link"
        fi
done
script_directory=`dirname "$program"`
script_directory=`cd "$script_directory" && /bin/pwd`

Trapping signals to stop scripts:

Ever try to interrupt your script, then
discover that it killed only one command and
continued to the next? Force a complete exit
by adding the following line early in your
script.

trap "exit 1" 1 2 3 15

You can also trap normal and error exits:

# force script to exit when any command fails
set -e 

# Trap on any exit
trap "echo Always called before exit" 0

# Trap on error exit only
trap "echo Error exit was called " ERR

echo "Next command will fail"

# Returns error code of 1
false

echo "Will not see this comment"
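
A common use of the exit trap is removing temporary files no matter how the script ends. The sketch below runs the "script" in a subshell so that the trap can be observed from outside. The variable names are made up.

```shell
scratch=$(mktemp)
status=0
# The subshell plays the role of a script; its exit trap fires when the
# parentheses close.
(
  trap 'rm -f "$scratch"' 0        # 0 is the EXIT pseudo-signal
  echo "working data" > "$scratch"
  false                            # a failing step does not skip the cleanup
  exit 1
) || status=$?

if [ -e "$scratch" ] ; then
  echo "left behind"
else
  echo "cleaned up, status $status"
fi
```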

Process IDs

Get the process ID of the current shell with
$$, of its parent process with $PPID,
and of the most recently backgrounded
child process with $!.

Interactively, you can see child PIDs
with jobs -p.

Here's how to ask a yes or no question,
with a default of no. It checks whether the
first letter is a y or Y and ignores leading
spaces.

echo -n "Do you want to continue? [y/N]: "
read answer
if expr "$answer" : ' *[yY].*' > /dev/null; then 
   echo Continuing 
else 
   echo Quitting
   exit
fi
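
The test itself can be pulled into a small function, which makes it easy to try against different answers without typing them. The helper name is_yes is hypothetical. It uses the same expr pattern as above.

```shell
# Hypothetical helper: succeed only if the answer starts with y or Y,
# ignoring leading spaces. Anything else, including an empty answer,
# counts as "no".
is_yes() {
  expr "$1" : ' *[yY]' > /dev/null
}

for answer in y "  Yes" n "" maybe ; do
  if is_yes "$answer" ; then echo "[$answer] -> yes" ; else echo "[$answer] -> no" ; fi
done
```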

Here's how to ask for a password without
echoing the characters. The trapping ensures
that an interrupt does not leave the echoing
off.

stty -echo
trap "stty echo ; echo 'Interrupted' ; exit 1" 1 2 3 15
echo -n "Enter password: "
read password
echo "Your password is \"$password\""
stty echo

Gnome and other frameworks often allow simple
scripting of GUIs:

password=`zenity --entry --text "Enter password:"`

§ File descriptors

Redirecting output file descriptors

Here are common ways to capture the
standard output and standard error
of a single command in a log file:

command >file.log 2>&1 
command 2>&1 | tee file.log

If you have a script with many commands,
you can have them all write to the same log
file by default:

# save default standard output in file descriptor 10
exec 10>&1
# redirect standard output to a log file.
exec >file.log
# redirect standard error to same log file
exec 2>&1
# close stdin
exec 0<&-
# This command will write to log file
command
# echo to default standard output instead of log file
echo "Visible message" 1>&10
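
The same pattern can be tried safely inside a command substitution, where the redirections end with the closing parenthesis. This sketch uses descriptor 3 instead of 10, because descriptors above 9 are not available in every Bourne-style shell. The log file is a temporary file created for the example.

```shell
log=$(mktemp)
# Hypothetical script body. Whatever reaches the original stdout is
# captured in $visible; everything else lands in the log.
visible=$(
  exec 3>&1            # keep the original stdout on descriptor 3
  exec >"$log" 2>&1    # from here on, stdout and stderr go to the log
  echo "to the log"
  echo "error to the log" >&2
  echo "to the caller" >&3
)
log_lines=$(wc -l < "$log")
echo "caller saw: $visible"
echo "lines in log:" $log_lines
rm -f "$log"
```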

Avoid file descriptor 5, which bash already
uses. (ulimit -n should show many
available file descriptors.)

Avoid writing to stdout if it is not connected to a terminal:

  test -t 1 && echo "Connected to a terminal"

Open a socket

Associate a file descriptor, say 4, with a
socket (a bash feature), and close it, with

exec 4<>/dev/tcp/$hostname/$port
exec 4<&-

A more portable solution is to use nc.
Listen on a port with

nc -l -p 3535  # traditional netcat; the OpenBSD variant wants: nc -l 3535

Connect to a remote host port like

echo 'GET /' | nc hostname 80

An even more general utility is socat,
which also handles Unix sockets.

Tuesday, 14 April 2009

More bash shell tips and tricks
