Tuesday, 14 April 2009

More bash shell tips and tricks

from http://billharlan.com/pub/papers/Bourne_shell_idioms.html

§    Text filtering commands



Administration from scripts and the
command line often benefits from pipes of text
filtering commands. Here are some that are
easy to overlook or forget.


  • mmencode converts to and from
    base64 and "quoted-printable" formats for email.
    Search for the metamail package.
    Unfortunately, this has become hard to find.


    Alternatively uuencode -m converts to base64,
    and uudecode -m converts from base64.


    Or decode and encode quoted-printable and
    base64 with

    perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);'
    perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::encode($_);'
    perl -pe 'use MIME::Base64; $_=MIME::Base64::encode($_);'
    perl -pe 'use MIME::Base64; $_=MIME::Base64::decode($_);'


  • uniq lets you remove duplicated lines
    from a sorted file.
  • Count the number of times a given line
    occurs with
    sort | uniq -c | sort -n

  • Break text into one word per line with
    perl -pe 's/\s+/\n/g'


  • Combine separate lines into a single line
    of words with
    paste -s -d" " 

  • comm lets you suppress lines unique to
    one or both of two files.

  • cat -s never prints more than one blank
    line in a row.
  • Remove all blank lines with
    perl -ne 'print if /\S/'

  • Print lines starting with one containing
    FOO and ending with one containing BAR.
    sed -n '/FOO/,/BAR/p'


  • merge and diff3 for merging changes
    in files edited from a common ancestor.
  • fold breaks lines to proper width, and
    fmt will reformat lines into paragraphs.

  • dirname and basename let you
    extract the directory and filenames from a
    full path to a file.
  • namei breaks a pathname into pieces and
    follows symbolic links.
  • expand and col -x replace tabs by
    spaces.

  • col -b removes backspaces from a file.
  • cat -v shows non-printing characters as
    ascii escapes.
  • sed '1,10d' deletes the first 10
    lines.
  • sed -n '3p' and sed -n '3{p;q}'

    both print the third line, but the latter is
    more efficient.
  • sed '/foo/q' truncates a file after
    the line containing foo.
  • sed -ne '/foo/,/bar/p' prints
    everything from the line containing foo
    to the line containing bar.
  • Align space-delimited fields into orderly
    columns with column -t.

  • Right justify queries with
    printf "%40s" "Do you want to delete? [y/N] "

  • Convert dos text files to unix, and vice
    versa:
    dos2unix file.txt
    unix2dos file.txt
    tr -d \\r < win.txt > unix.txt # if you can't find dos2unix
    sed -e 's/$/\r/' < unix.txt > win.txt # if you can't find unix2dos


  • cat -n and nl number lines.
  • Both of these perform string substitution,
    but the latter allows more general regular
    expressions:
    sed -e 's/oldtext/newtext/g'
    perl -pe 's/oldtext/newtext/g'



    Here's how to replace straight double quotes
    by TeX-style opening (``) and closing ('')
    quotes:

    < in.tex perl -pe 's%\B"\b%``%g' |
    perl -pe "s%\b\"\B%''%g" > out.tex


  • Use iconv to convert between character
    encodings.
  • Here are two ways to find string patterns
    in a file:
    grep 'pattern' [file] [< file]
    perl -ne 'print if /pattern/' [file] [< file]


  • Print the first and third columns of each
    line:
    awk '{print $1,$3}'
    perl -alne 'print "$F[0] $F[2]"'

  • Convert to lower-case:
    tr '[A-Z]' '[a-z]'
    perl -pe 'tr/[A-Z]/[a-z]/'
    perl -pe '$_ =lc'


  • Simple character substitutions and
    deletions may be simplest with tr.
    tr -d '\n'  # delete newlines
    tr '\n' '\0' # replace newlines by null characters.

    $ echo 1-2a-3b | tr "[1-9]" "[2-9]" | tr '-' '_' | tr -d 'a'
    2_3_4b


  • You can pipe into a loop with read.
    Here is a complicated way to cat a text file,
    piping in and out of a loop.
    cat file | while read a; do echo "$a" ; done | cat

  • To read lines in pairs from two files try
    paste file1 file2 | while read a b ; do echo "$a $b" ; done


  • Sort a list of dependencies with tsort.
  • Reverse lines with tac and words with
    rev.
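
The counting, word-splitting, and sorting tips above combine naturally. Here is a minimal sketch of a word-frequency counter. The function name word_counts and the sample text are made up for illustration, and tr stands in for the perl one-liner.

```shell
# Print each distinct word on stdin with its count, least frequent first.
word_counts() {
    tr -s '[:space:]' '\n' |   # one word per line
    grep . |                   # drop an empty first line, if any
    sort | uniq -c | sort -n   # tally identical lines, order by count
}

printf 'the cat saw the dog\nthe dog ran\n' | word_counts
```

The last line of output should be the most frequent word, here "the" with a count of 3.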



§    Files and directories




  • Select text (non-binary) files with
    one of these
    \ls | perl -lne 'print if -T'
    perl -le 'for (glob "*") {print if -T }'
    perl -le 'print for grep -T, <*>'


    The perl algorithm for detecting text files
    is very good.

  • To do something to files with goofy names,
    including spaces and dashes, delimit the
    files with null characters instead of
    whitespace or newlines.

    find . -type f -print0 | xargs -r0 ls

  • See if a directory contains any accessible
    contents with
      if ls -A "${dir}"/* >& /dev/null ; then echo "has files" ; else echo "no files" ; fi
    [or]
    if [ "`ls -A "${dir}"`" ] ; then echo "has files" ; else echo "no files" ; fi



    See if files of a certain type exist:

      if ls *.par >& /dev/null ; then echo "has par files" ; fi
    [or]
    ls *.par >& /dev/null && echo "has pars" || echo "no pars"


  • readlink -f will fully resolve what a
    symbolic link points to.


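    Find all bad symbolic links with

    find . -type l |
    while read f ; do if ! readlink -e "$f" >&/dev/null
    then echo "$f" ; fi ; done

    (GNU readlink -e fails unless every component
    of the resolved path exists; readlink -f
    succeeds even when the final component is
    missing.)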
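
As a variation on the symbolic-link tip above, find can do the test itself without a read loop. This is a sketch only: the function name broken_links and the scratch file names are invented for the example, and it assumes an external test command is installed, as it is on typical Linux and BSD systems.

```shell
# List symbolic links under a directory (default: .) whose targets
# do not exist. 'test -e' follows the link, so it fails for dangling links.
broken_links() {
    find "${1:-.}" -type l ! -exec test -e {} \; -print
}

# Demonstration in a scratch directory (all names are arbitrary).
demo=$(mktemp -d)
touch "$demo/target"
ln -s target "$demo/good"
ln -s missing "$demo/dangling"
broken_links "$demo"      # lists only the dangling link
rm -rf "$demo"
```

Because find passes each name to test directly, names containing spaces need no special handling.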




§    Variables



  • To see if a variable matches a regular
    expression, combine if and grep. For
    example to see if the name of a file begins
    with a dot, try
     if echo "$filename" | grep '^[.]' >/dev/null 
    then echo yes ; else echo no ; fi



    expr also has limited support for
    regular expressions (the pattern is implicitly
    anchored at the start of the string).

    if [ `expr "$filename" : '[.].*'` -ne 0 ] 
    then echo yes ; else echo no ; fi

  • Use read to avoid tokenizing filenames
    with spaces. Here's how to find all files
    whose names contain a space, and replace the
    spaces by underscores. (The echo only prints
    the mv commands; remove it to rename.)

    find . -name '* *' |
    while read -r f ; do
    echo mv "$f" "`echo "$f" | sed 's/  */_/g'`"
    done

  • For simple integer arithmetic use expr:
    N=`expr "$N" + 3`


  • For arbitrary-precision floating-point
    math, use bc -l
    # Get pi to 10 places with arctangent (bc man page)
    PI=`echo "scale=10; 4*a(1)" | bc -l`
    # Expensive calculation of zero (Craig Artley):
    ZERO=`echo "c($PI/4)-sqrt(2)/2" | bc -l`

  • seq 1 100 generates all integers
    between 1 and 100. To iterate a loop 100
    times, try
    for i in `seq 1 100` ; do ... ; done


  • You can set the environment of a subprocess
    by defining a variable on the same line. The
    current shell is not affected.
    $ x=doggie sh -c 'echo x=$x'
    x=doggie
    $ x=pig ; x=doggie echo x=$x
    x=pig

  • Test that a string has non-zero length with
    if [ -n "$string" ] ; then echo "not empty" ; fi



    The -n is actually the default for a
    string expression, so you can omit it:

    if [ "$string" ] ; then echo "not empty" ; fi


  • There are several good ways to set default
    values for environment variables. Many do
    this
    if [ ! "$VARIABLE" ] ; then VARIABLE="default value" ; fi
    export VARIABLE


    A simple alternative is

    : ${VARIABLE:="default value"} 
    export VARIABLE



    The colon at the beginning of the line is
    necessary as a no-op that allows its
    arguments to be evaluated.

  • Rarely you may want to accept a variable
    defined as an empty string. If so, then omit
    the colon before the equals when setting the
    default.
    : ${VARIABLE="default value"} 
    export VARIABLE


    To test whether a variable is defined, even if
    empty, test

    if [ "${VARIABLE+x}" ] ; then echo DEFINED ; fi


  • To echo the names of all variables starting with X:
    echo ${!X*}
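
The difference between the default-value forms above is easiest to see side by side. The following is a small sketch; DEMO_VAR and the three function names are invented for the example, and each function body runs in a ( ... ) subshell so the assignment never leaks into the calling shell.

```shell
# ${VAR=default} assigns only when VAR is unset.
default_if_unset()          ( : "${DEMO_VAR=fallback}";  printf '%s\n' "$DEMO_VAR" )
# ${VAR:=default} assigns when VAR is unset or empty.
default_if_unset_or_empty() ( : "${DEMO_VAR:=fallback}"; printf '%s\n' "$DEMO_VAR" )
# ${VAR+x} expands to x whenever VAR is defined, even if empty.
is_defined()                ( [ "${DEMO_VAR+x}" ] )

unset DEMO_VAR
default_if_unset            # prints: fallback
DEMO_VAR=""
default_if_unset            # prints an empty line
default_if_unset_or_empty   # prints: fallback
is_defined && echo "defined, though empty"
```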



§    Running commands



  • Use "$@" when passing command-line
    arguments unaltered to subprocesses. This is
    equivalent to passing "$1" "$2" ..., but
    the first version works properly for no
    arguments.

  • Test the processing of arguments, like this
    $ set a 'b c' d
    $ for i in "$@" ; do echo "|$i|" ; done
    |a|
    |b c|
    |d|
    $ for i in "$*" ; do echo "|$i|" ; done
    |a b c d|
    $ for i in $* ; do echo "|$i|" ; done
    |a|
    |b|
    |c|
    |d|

  • See what runtime options you may have set
    with set -o, bind -p, shopt -p, and
    stty -a.


    For example, bash edits command lines
    in emacs mode by default. Change to vi with

    set -o vi


    In emacs mode, you can edit your command in
    the editor named by $EDITOR with ctrl-x
    ctrl-e.

    In vi mode, use esc-v. See help fc
    for more.

  • Repeat the last argument of the previous
    command with !$. Repeat all arguments
    without the command with !*.
  • To guarantee that a background process
    outlives the current shell, add extra
    parentheses like this:
    ( command & )



    Otherwise, when your current shell exits (for
    example, when you leave X or close an ssh
    session), it may terminate all processes that
    have your shell as the parent process. The extra
    parentheses start a subshell that exits as
    soon as the command is spawned in the
    background. The background process changes
    its parent process ID to 1. This is a
    command-line version of the "double fork."

  • Repeat until a command succeeds:
    while ! cvs -z 3 -q update -dPA ; do echo -n . ; sleep 60 ; done

  • Make a progress bar (loop while waiting on
    a process)

    sleep 10 & while ps -p $! >/dev/null; do echo -n . ; sleep 1 ; done ; echo 
    or
    while pidof mozilla-bin > /dev/null ; do echo -n . ; sleep 1 ; done ; echo


    pgrep -f or killall -0 are alternatives to pidof for this purpose.
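
The "repeat until a command succeeds" loop above runs forever if the command never succeeds. Here is a sketch of a bounded version; the function name retry, its argument order, and the retry_* variable names are assumptions made for this example.

```shell
# retry MAX DELAY COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most MAX times, sleeping DELAY
# seconds between attempts. Returns 1 if every attempt fails.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_count=1
    until "$@"; do
        if [ "$retry_count" -ge "$retry_max" ]; then
            return 1
        fi
        retry_count=$((retry_count + 1))
        sleep "$retry_delay"
    done
}

retry 3 0 false || echo "gave up after 3 attempts"
```

With this helper the cvs loop could be written as retry 10 60 cvs -z 3 -q update -dPA, where the limit of 10 is an arbitrary choice.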



§    Manipulating paths



  • Loop over the elements of a PATH by
    tokenizing with the character ':'. The
    parentheses keep the change to IFS out of
    the current shell.
    ( IFS=':' ; for dir in $PATH ; do echo "$dir" ; done )


  • Check for the existence of an executable
    version of a command in your PATH:
    function checkPath() { local IFS=':' ; for dir in $PATH ; do if [ -x "$dir/$1" ] ;
    then return 0; fi ; done; return 1; }
    if checkPath commandName ; then ... ; fi

  • Here is a lovely function to modify a PATH
    that I found in /etc/profile:
    pathmunge () {
        if ! echo $PATH | egrep "(^|:)$1($|:)" >/dev/null ; then
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
        fi
    }
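
Setting IFS for a PATH loop changes word splitting for the rest of the shell unless the change is confined. One way to confine it, sketched below, is to give the function a ( ... ) subshell body. The name in_path is invented for this example, and the extra -f test is an addition that keeps directories from counting as commands.

```shell
# Return 0 if an executable file named $1 exists in a PATH directory.
# The body is a subshell, so the caller's IFS is never modified.
in_path() (
    IFS=':'
    for dir in $PATH; do
        # An empty PATH element means the current directory.
        if [ -f "${dir:-.}/$1" ] && [ -x "${dir:-.}/$1" ]; then
            exit 0
        fi
    done
    exit 1
)

in_path sh && echo "sh is on the PATH"
```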




§    Common script chores



  • Debug the script with set -x.
  • Make a script exit immediately after any failed
    command with set -e.

  • Process flags in a script:
    while [ $# -gt 0 ] ; do
        case "$1" in
            -a) FLAG_A=1
                shift ;;
            -b) FLAG_B="$2"
                shift ; shift ;;
            --) shift ; break ;;
            *)  break ;;
        esac
    done

  • Print help from a script:
    if [ $# -lt 1 -o "$1" = "-h" -o "$1" = "-help" -o "$1" = "--help" ] ; then
    cat <<-END
    Usage: `basename $0` [-flag] arg1 [arg2]
    More information.
    END
    exit
    fi


  • Handle errors with functions:


    Often an error exit is handled most cleanly
    with a function.

    print_usage_and_exit() {
    cat <<-END
    Usage: `basename $0` arg1 arg2 [arg3]
    The first two arguments are required.
    END
    exit 1
    }

    if [ $# -lt 2 ] ; then
    print_usage_and_exit
    fi


  • Here's a robust way to locate the directory
    containing a script, following symbolic
    links. (Taken from the launch script of
    FindBugs.)
    program="$0"
    while [ -h "$program" ]; do
        link=`ls -ld "$program"`
        link=`expr "$link" : '.*-> \(.*\)'`
        if [ "`expr "$link" : '/.*'`" = 0 ]; then
            dir=`dirname "$program"`
            program="$dir/$link"
        else
            program="$link"
        fi
    done
    script_directory=`dirname "$program"`
    script_directory=`cd "$script_directory" && /bin/pwd`

  • Trapping signals to stop scripts:


    Ever try to interrupt your script, then
    discover that it killed only one command and
    continued to the next? Force a complete exit
    by adding the following line early in your
    script.

    trap "exit 1" 1 2 3 15


    You can also trap normal and error exits:

    # force script to exit when any command fails
    set -e

    # Trap on any exit
    trap "echo Always called before exit" 0

    # Trap on error exit only
    trap "echo Error exit was called " ERR

    echo "Next command will fail"

    # Returns error code of 1
    false

    echo "Will not see this comment"


  • Process IDs


    Get the process ID of the current shell with
    $$, of the parent shell with $PPID,
    and of the most recently backgrounded
    child process with $!.


    Interactively, you can see child PIDs
    with jobs -p.

  • Here's how to ask a yes or no question,
    with a default of no. It checks whether the
    first letter is a y or Y and ignores leading
    spaces.

    echo -n "Do you want to continue? [y/N]: "
    read answer
    if expr "$answer" : ' *[yY].*' > /dev/null; then
    echo Continuing
    else
    echo Quitting
    exit
    fi

  • Here's how to ask for a password without
    echoing the characters. The trapping ensures
    that an interrupt does not leave the echoing
    off.
    trap "stty echo ; echo 'Interrupted' ; exit 1" 1 2 3 15
    stty -echo
    echo -n "Enter password: "
    read password
    stty echo
    echo
    echo "Your password is \"$password\""



    Gnome and other frameworks often allow simple
    scripting of GUIs:

    password=`zenity --entry --hide-text --text "Enter password:"`
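
A common use of the exit trap described above is removing scratch files however the script ends. This is a minimal sketch, with the function name scratch_demo invented for the example; the body runs in a ( ... ) subshell so the trap fires when that subshell exits rather than when the whole script does.

```shell
# Create a scratch file, report its name, then exit with an error.
# The trap on 0 (EXIT) removes the file on the way out.
scratch_demo() (
    tmp=$(mktemp)
    trap 'rm -f "$tmp"' 0
    echo "$tmp"     # report the name so the caller can check it
    exit 1          # simulated error exit
)

name=$(scratch_demo) || echo "scratch_demo exited with an error"
[ -e "$name" ] || echo "scratch file was removed"
```

The single quotes around the trap command delay expansion of $tmp until the trap runs.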



§    File descriptors




  • Redirecting output file descriptors


    Here are common ways to capture the
    standard output and standard error
    of a single command in a log file:

    command >file.log 2>&1 
    command 2>&1 | tee file.log

  • If you have a script with many commands,
    you can have them all write to the same log
    file by default:

    # save default standard output in file descriptor 10
    exec 10>&1
    # redirect standard output to a log file.
    exec >file.log
    # redirect standard error to same log file
    exec 2>&1
    # close stdin
    exec 0<&-
    # This command will write to log file
    command
    # echo to default standard output instead of log file
    echo "Visible message" 1>&10


    Avoid file descriptor 5, which bash already
    uses. (ulimit -n should show many
    available file descriptors.)

  • Avoid writing to stdout if it is not connected to a terminal:
      test -t 1 && echo "Connected to a terminal"

  • Open a socket


    Associate a file descriptor, say 4, with a
    socket, and close with

    exec 4<> /dev/tcp/$hostname/$port
    exec 4<&-


    A more portable solution is to use nc.
    Listen on a port with

    nc -l -p 3535



    Connect to a remote host port like

    echo 'GET /' | nc hostname 80


    An even more general utility is socat,
    which also handles Unix sockets.
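
The log-file recipe above saves the original standard output but never puts it back. Here is a sketch that saves and restores both output streams. The function name log_demo is invented, and descriptors 3 and 4 are used on the assumption that nothing else in the script has them open; single-digit descriptors are also the only ones POSIX guarantees for redirections.

```shell
# Send stdout and stderr to the file named by $1, write one message
# around the log, then restore both streams.
log_demo() {
    exec 3>&1 4>&2             # save stdout and stderr
    exec >"$1" 2>&1            # redirect both to the log file
    echo "logged line"
    echo "visible line" >&3    # bypass the log
    exec 1>&3 2>&4 3>&- 4>&-   # restore, then close the spares
    echo "stdout restored"
}

log=$(mktemp)
log_demo "$log"
cat "$log"
rm -f "$log"
```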
