Path Functions
The Problem
Is there a problem? After all
PATH=$PATH:/usr/local/bin
is not that hard to do.
It's true but there is a subtle problem with the above for a start. If the current value of PATH is empty or unset then the result will be
:/usr/local/bin
where the empty value in front of the colon means the current directory. That's not good if there are unexpected commands in the current directory.
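One common way to sidestep the empty-element problem is the ${VAR:+...} expansion, which emits the old value and its separator only when the variable is non-empty. The following is a sketch only; MYPATH is a scratch variable invented for the example so that the real PATH is left alone.

```shell
#!/usr/bin/env bash
# MYPATH is a hypothetical scratch variable standing in for PATH.

# Naive append: leaves a leading empty element when MYPATH is empty.
MYPATH=
naive="${MYPATH}:/usr/local/bin"

# Guarded append: the "old value plus colon" prefix is only emitted
# when MYPATH is set and non-empty.
guarded="${MYPATH:+${MYPATH}:}/usr/local/bin"

echo "naive:   ${naive}"
echo "guarded: ${guarded}"

# With a non-empty value the guarded form behaves like the naive one.
MYPATH=/bin
guarded_nonempty="${MYPATH:+${MYPATH}:}/usr/local/bin"
echo "guarded: ${guarded_nonempty}"
```

This fixes the empty case but does nothing about duplicates or missing directories, which is where the functions come in.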
Besides, how many times do you have /usr/local/bin on your PATH?
Hmm, you still aren't convinced. Well, let's write some functions anyway. We might be able to do some useful things on the side: check the directory exists; remove duplicates; insert elements before or after another. They might begin to look useful after all.
The Next Problem
We'll not just be playing with PATH but any similarly formed list:
- MANPATH and LD_LIBRARY_PATH
- PERL5LIB
- CLASSPATH is a list of jar files as well as directories
- TCLLIBPATH is a SPACE separated list of directories
- Windows systems will use a SEMICOLON separator
So we'll need to pass the name of the PATH we want to manipulate and handle different path SEPARATORS.
First things first. If we're passing the name of the path we want to manipulate we have two problems: how do we get the value given the name and how do we set the value?
Variable Indirection
Variable indirection is our friend:
% var=PATH
% echo ${var}
PATH
% echo ${!var}
/bin:/usr/bin
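The same indirection works inside a function, which is where we will be using it. A minimal sketch, assuming Bash (the ${!name} form is not POSIX); show_value and MYPATH are names invented for this example.

```shell
#!/usr/bin/env bash
# show_value is a hypothetical helper: print the value of the variable
# whose NAME is passed as the first argument.
show_value ()
{
    typeset name=$1
    echo "${!name}"
}

MYPATH=/bin:/usr/bin
result=$(show_value MYPATH)
echo "${result}"
```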
We could do with all those elements separately. For that we want to play with IFS. Given that it is used as the field separator during the Word Splitting stage of expansion it looks like we can trivially split the value of the path into words by setting IFS to the value of the separator.
Note
Whenever you manipulate IFS you must remember to save the original value and put it back afterwards.
% OIFS="${IFS}"
% IFS=:
% echo ${!var}
/bin /usr/bin
which looks good. If we use that in an array initialization we'll be looking very good:
% dirs=( ${!var} )
% IFS="${OIFS}"
% echo "${#dirs[*]}: ${dirs[@]}"
2: /bin /usr/bin
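It is worth checking what the split does with empty elements, since an empty element in PATH means the current directory. The sketch below wraps the split in a hypothetical helper, split_path, which fills a global array called parts (both names are invented here). It assumes the value contains no glob characters such as * or ?, because an unquoted expansion is subject to filename expansion as well as word splitting.

```shell
#!/usr/bin/env bash
# split_path is a hypothetical helper: split VALUE on SEP into the
# global array "parts", restoring IFS afterwards.
split_path ()
{
    typeset value=$1
    typeset sep="${2:-:}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    parts=( ${value} )      # deliberately unquoted so that it is split
    IFS="${OIFS}"
}

split_path "/bin::/usr/bin"
middle_count=${#parts[*]}   # the empty middle element is kept

split_path ":/bin"
leading_count=${#parts[*]}  # a leading empty element is kept

split_path "/bin:"
trailing_count=${#parts[*]} # a trailing separator adds no element

echo "${middle_count} ${leading_count} ${trailing_count}"
```

Note the asymmetry: a trailing separator does not produce a trailing empty element, so a PATH ending in a colon loses its implied current directory when split this way.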
Modifying the PATH
To manipulate the PATH we can manipulate the elements of the array.
Note
Note the quoting in "${dirs[@]}" in the following sections to preserve the whitespace.
Prepending
dirs=( new value "${dirs[@]}" )
Appending
dirs=( "${dirs[@]}" new value )
Removing Elements
This is slightly more subtle as we can't simply walk over the list with:
for d in "${dirs[@]}" ; do
because, whilst we have the value ${d} we don't have a reference into the array. We'll have to walk the array the long way:
max=${#dirs[*]}
for ((i=0; i < max; i++ )) ; do
    if [[ ! -d "${dirs[i]}" ]] ; then
        unset dirs[i]
    fi
done
Note
We have to calculate the length of the array before we start. ${#dirs[*]} shrinks every time we remove an element, so the loop would stop before reaching the last elements if we used the length calculation in the for loop itself:
for ((i=0; i < ${#dirs[*]}; i++ )) ; do
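A side effect of unset is that the array becomes sparse: the remaining elements keep their old indices. The sketch below shows the gap and one way to close it. It assumes that / and /usr exist and that /nonexistent-example-dir does not.

```shell
#!/usr/bin/env bash
# Two directories assumed to exist and one assumed not to.
dirs=( / /nonexistent-example-dir /usr )

max=${#dirs[*]}
for (( i=0 ; i < max ; i++ )) ; do
    if [[ ! -d "${dirs[i]}" ]] ; then
        unset "dirs[i]"
    fi
done

count=${#dirs[*]}       # number of elements left
indices="${!dirs[*]}"   # the indices still in use, note the gap

# Re-pack the array so that the indices are contiguous again.
dirs=( "${dirs[@]}" )
packed_indices="${!dirs[*]}"

echo "count=${count} indices=${indices} packed=${packed_indices}"
```

Expanding with "${dirs[@]}" skips the gaps, which is why the later functions can rebuild their arrays without caring about them.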
Getting the new value
Converting an array into a SEPARATOR separated string can be done with a similar IFS trick:
% echo "${dirs[*]}"
new value /bin /usr/bin
% IFS=:
% echo "${dirs[*]}"
new:value:/bin:/usr/bin
Note
Remember to set IFS back.
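Because typeset makes a variable local to a function, one way to avoid forgetting to restore IFS is to change it only inside a small helper. This is a sketch; join_by is a name invented for the example. Only the first character of IFS is used when joining with "$*" or "${dirs[*]}".

```shell
#!/usr/bin/env bash
# join_by is a hypothetical helper: join the remaining arguments
# with the separator given as the first argument.
join_by ()
{
    typeset IFS="$1"    # local to the function, nothing to restore
    shift
    echo "$*"
}

dirs=( new value /bin /usr/bin )
colon=$(join_by : "${dirs[@]}")
semi=$(join_by ';' "${dirs[@]}")

echo "${colon}"
echo "${semi}"
```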
Setting the path
This turns out to be a real problem. The obvious thing to do is:
% ${var}=new:value
PATH=new:value: command not found
Hmm, it seems the shell isn't fooled by our hijinks and doesn't believe that ${var}= is a variable assignment because ${var} has several characters ($, { and }) that aren't allowed in an identifier.
eval considered bad
We could say:
% eval ${var}=new:value
% echo ${!var}
new:value
However, eval introduces a whole new world of pain.
% new='more;echo oops'
% echo ${new}
more;echo oops
% PATH=/bin:/usr/bin
% eval ${var}=${!var}:${new}
oops
% echo ${!var}
/bin:/usr/bin:more
The problem is that the eval line has been expanded to:
PATH=/bin:/usr/bin:more;echo oops
and you can see why oops is printed and the value of PATH isn't what you expect.
We can delay the inevitable by escaping, ie. preventing the expansion of ${new}:
% eval ${var}=${!var}:\${new}
Here, the expanded line that eval will evaluate is:
PATH=/bin:/usr/bin:${new}
which is safe enough:
% echo ${!var}
/bin:/usr/bin:more;echo oops
However, when we run this again:
% eval ${var}=${!var}:\${new}
oops:more;echo oops
% echo ${!var}
/bin:/usr/bin:more
This time it is the value expanded from ${!var}, /bin:/usr/bin:more;echo oops, which is causing the problem, not the escaped addition (\${new}). The expanded line eval will evaluate looks like:
PATH=/bin:/usr/bin:more;echo oops:${new}
We could start trying to enclose the variables in escaped double-quotes:
eval ${var}=\"${!var}:\${new}\"
but we will quickly run out of the will to live when attempting to escape all possible dangerous strings. As a trivial example, consider if ${new} contains a double-quote character.
declare
declare looks like a promising candidate:
declare ${var}="${!var}:${new}"
which it would be except for one small thing. If declare is used in a function it acts like local and the NAME takes on local scope, ie. we won't be setting the value outside the function.
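The scoping problem is easy to demonstrate. In this sketch try_declare and MYPATH are names invented for the example; the function sees its new value but the caller does not.

```shell
#!/usr/bin/env bash
MYPATH=/bin
var=MYPATH

# try_declare is a hypothetical function showing the scoping problem.
try_declare ()
{
    # Inside a function, declare creates a LOCAL variable MYPATH.
    declare ${var}="${!var}:/usr/bin"
    inside="${!var}"    # what the function itself sees
}

try_declare
outside="${MYPATH}"     # what the caller sees afterwards

echo "inside=${inside} outside=${outside}"
```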
read and <<<
In the end we need to use a couple of shell tricks. read lets us set variables:
read NAME
but it is reading from its stdin. To forge a stdin we need to use the string version of a here document, <<<:
read NAME <<< VALUE
putting it together:
read ${var} <<< "${!var}:${new}"
or
read ${var} <<< "${dirs[*]}"
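To see that this form survives the string that broke eval, here is a sketch wrapped in a hypothetical helper, set_by_name. The IFS= prefix and the -r flag are additions that are not in the text above: they stop read trimming whitespace and interpreting backslashes. The helper would not work for a variable called name or value, as those are its own locals, and read only takes the first line of a multi-line value.

```shell
#!/usr/bin/env bash
# set_by_name is a hypothetical helper: assign VALUE to the variable
# whose NAME is given, without using eval.
set_by_name ()
{
    typeset name=$1
    typeset value=$2

    # IFS= and -r are additions to the plain "read NAME" form:
    # they keep whitespace and backslashes in the value intact.
    IFS= read -r "${name}" <<< "${value}"
}

MYPATH=/bin:/usr/bin
new='more;echo oops'
set_by_name MYPATH "${MYPATH}:${new}"

echo "${MYPATH}"
```

Nothing is evaluated, so the semicolon is just another character in the value.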
Prototype path_append
Putting all the parts together will give us something like:
path_append ()
{
    typeset var=$1
    typeset val="$2"
    typeset sep="${3:-:}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    typeset origdirs
    origdirs=( ${!var} )
    typeset newdirs
    newdirs=( ${val} )

    typeset vardirs
    vardirs=( "${origdirs[@]}" "${newdirs[@]}" )

    read ${var} <<< "${vardirs[*]}"

    IFS="${OIFS}"
}
Options
As suggested, we might want to check that entries exist before adding them (YMMV). Shell functions can use getopts just like shell scripts can:
typeset opt_op

OPTIND=1
while getopts "def" opt ; do
    case "${opt}" in
    d|e|f)
        opt_op=${opt}
        ;;
    ?)
        error "Unexpected argument"
        ;;
    esac
done

shift $(( $OPTIND - 1 ))
and
if [[ ${opt_op} ]] ; then
    typeset n
    typeset maxn=${#newdirs[*]}
    for (( n=0 ; n < ${maxn} ; n++ )) ; do
        # if ... ; then
        # where ... is a case statement!
        if case "${opt_op}" in
            d) [[ ! -d "${newdirs[n]}" ]] ;;
            e) [[ ! -e "${newdirs[n]}" ]] ;;
            f) [[ ! -f "${newdirs[n]}" ]] ;;
           esac
        then
            unset newdirs[n]
        fi
    done
fi
Note
We have to do this complex if + case statement (or something similar) because we're using [[ which insists on conditional operators being unquoted (and certainly not the result of variable expansion).
If we were using the non-preferred [ (or test) then we could have simply said:
if [ ! -${opt_op} "${newdirs[n]}" ]
although we definitely have to double quote the ${newdirs[n]} expression.
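The two pieces, getopts in a function and a case statement used as the condition of an if, can be tried together in isolation. The sketch below uses a hypothetical function, keep_existing, that prints the arguments passing the chosen test. Unlike the fragments above it makes OPTIND local with typeset and defaults to -e; both are choices made for this example. It assumes / exists and /nonexistent-example-path does not.

```shell
#!/usr/bin/env bash
# keep_existing is a hypothetical function: print, one per line, those
# arguments that pass the test chosen by -d, -e or -f (default -e).
keep_existing ()
{
    typeset opt opt_op=e
    typeset OPTIND=1        # getopts state must be reset on every call

    while getopts "def" opt ; do
        case "${opt}" in
        d|e|f) opt_op=${opt} ;;
        *) echo "Unexpected argument" >&2 ; return 1 ;;
        esac
    done
    shift $(( OPTIND - 1 ))

    typeset p
    for p in "$@" ; do
        # The exit status of the case statement is the status of the
        # [[ ]] that ran, so it can be used directly as the condition.
        if case "${opt_op}" in
            d) [[ -d "${p}" ]] ;;
            e) [[ -e "${p}" ]] ;;
            f) [[ -f "${p}" ]] ;;
           esac
        then
            echo "${p}"
        fi
    done
}

as_dirs=$(keep_existing -d / /nonexistent-example-path)
as_files=$(keep_existing -f / /nonexistent-example-path)

echo "dirs:  ${as_dirs}"
echo "files: ${as_files}"
```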
Prototype 2 path_append
path_append ()
{
    typeset opt_op

    OPTIND=1
    while getopts "def" opt ; do
        case "${opt}" in
        d|e|f)
            opt_op=${opt}
            ;;
        ?)
            error "Unexpected argument"
            ;;
        esac
    done

    shift $(( $OPTIND - 1 ))

    typeset var=$1
    typeset val="$2"
    typeset sep="${3:-:}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    typeset origdirs
    origdirs=( ${!var} )
    typeset newdirs
    newdirs=( ${val} )

    if [[ ${opt_op} ]] ; then
        typeset n
        typeset maxn=${#newdirs[*]}
        for (( n=0 ; n < ${maxn} ; n++ )) ; do
            if case "${opt_op}" in
                d) [[ ! -d "${newdirs[n]}" ]] ;;
                e) [[ ! -e "${newdirs[n]}" ]] ;;
                f) [[ ! -f "${newdirs[n]}" ]] ;;
               esac
            then
                unset newdirs[n]
            fi
        done
    fi

    if [[ ${#newdirs[*]} -eq 0 ]] ; then
        # Nothing to add: put IFS back before the early return
        IFS="${OIFS}"
        return 0
    fi

    typeset vardirs
    vardirs=( "${origdirs[@]}" "${newdirs[@]}" )

    read ${var} <<< "${vardirs[*]}"

    IFS="${OIFS}"
}
Other functions
path_append (ie. insert at the end), path_prepend (insert at the start) and path_insert all sound very similar. So do path_remove, path_replace and path_verify. It sounds like we want an all singing all dancing path_modify function and some wrapper functions to it.
path_remove and path_replace will probably want an option to do their action only once.
path_modify
path_modify, then, wants to do stuff:
path_modify ()
{
    typeset opt_op opt_once

    OPTIND=1
    while getopts "1def" opt ; do
        case "${opt}" in
        1)
            opt_once=1
            ;;
        d|e|f)
            opt_op=${opt}
            ;;
        ?)
            error "Unexpected argument"
            ;;
        esac
    done

    shift $(( $OPTIND - 1 ))

    typeset var=$1
    typeset val="$2"
    typeset act="$3"
    typeset wrt="$4"
    typeset sep="${5:-:}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    typeset origdirs
    origdirs=( ${!var} )
    typeset newdirs
    newdirs=( ${val} )

    if [[ ${opt_op} ]] ; then
        typeset n
        typeset maxn=${#newdirs[*]}
        for (( n=0 ; n < ${maxn} ; n++ )) ; do
            if case "${opt_op}" in
                d) [[ ! -d "${newdirs[n]}" ]] ;;
                e) [[ ! -e "${newdirs[n]}" ]] ;;
                f) [[ ! -f "${newdirs[n]}" ]] ;;
               esac
            then
                unset newdirs[n]
            fi
        done
    fi

    if [[ ${#newdirs[*]} -eq 0 ]] ; then
        case "${act}" in
        verify|replace|remove)
            ;;
        *)
            IFS="${OIFS}"
            return 0
            ;;
        esac
    fi

    typeset vardirs
    case "${act}" in
    first|start)
        vardirs=( "${newdirs[@]}" "${origdirs[@]}" )
        ;;
    last|end)
        vardirs=( "${origdirs[@]}" "${newdirs[@]}" )
        ;;
    verify)
        vardirs=( "${newdirs[@]}" )
        ;;
    after|before|replace|remove)
        typeset todo=1
        typeset o
        typeset maxo=${#origdirs[*]}
        for (( o=0 ; o < ${maxo} ; o++ )) ; do
            if [[ "${todo}" && "${origdirs[o]}" = "${wrt}" ]] ; then
                case "${act}" in
                after)
                    vardirs=( "${vardirs[@]}" "${origdirs[o]}" "${newdirs[@]}" )
                    ;;
                before)
                    vardirs=( "${vardirs[@]}" "${newdirs[@]}" "${origdirs[o]}" )
                    ;;
                replace)
                    vardirs=( "${vardirs[@]}" "${newdirs[@]}" )
                    ;;
                remove)
                    ;;
                esac
                if [[ "${opt_once}" ]] ; then
                    todo=
                fi
            else
                vardirs=( "${vardirs[@]}" "${origdirs[o]}" )
            fi
        done
        ;;
    *)
        vardirs=( "${origdirs[@]}" )
        ;;
    esac

    read ${var} <<< "${vardirs[*]}"

    IFS="${OIFS}"
}
and therefore path_append can become a wrapper to path_modify:
path_append ()
{
    typeset opt_flags

    OPTIND=1
    while getopts "def" opt ; do
        case "${opt}" in
        d|e|f)
            opt_flags=-${opt}
            ;;
        ?)
            error "Unexpected argument"
            ;;
        esac
    done

    shift $(( $OPTIND - 1 ))

    path_modify ${opt_flags} "$1" "$2" last '' "${3:-:}"
}
and (with option handling removed) the other path functions look like:
path_prepend ()
{
    ...
    path_modify ${opt_flags} "$1" "$2" first '' "${3:-:}"
}

path_verify ()
{
    ...
    # As path_modify checks the paths to be added we pass the
    # expansion of NAME, ie our own value
    path_modify ${opt_flags} "$1" "${!1}" verify '' "${2:-:}"
}

path_replace ()
{
    ...
    # The expression is path_replace OLD NEW but path_modify takes
    # the arguments the other way round
    path_modify ${opt_flags} "$1" "$3" replace "$2" "${4:-:}"
}

path_remove ()
{
    ...
    path_modify ${opt_flags} "$1" '' remove "$2" "${3:-:}"
}
path_trim
Depending on how we've gotten to where we are, we might well have /usr/local/bin, say, on our PATH more than once. We could do with trimming the cruft.
To do this we would use a set or a map in other languages but we're a bit short of those in the shell (Bash 4 does have associative arrays but we should look for something more portable). What we can do is string comparisons, in particular with case. For example:
case "${string}" in *a*) ;; esac
To make this work with our paths we have to construct a ${string} such that it can be matched against each individual element and if we've seen the element before do nothing and if we've not seen it before then add it to the path.
The constructed path itself is an obvious such string. However, we need to be very careful when matching, as /bin can be matched against /bin, /usr/bin, /usr/local/bin etc. We'll need to include the separator as part of the match:
case "${path}" in *${sep}${dir}${sep}*) ;; esac
But that's not quite all as there are a couple of other issues:
- quoting - we need to quote the patterns:
  *"${sep}${dir}${sep}"*)
  as both ${sep} and ${dir} can contain whitespace
- if the element we are trying to match against is the first (or last) element in the path then ${sep}${dir}${sep} won't match it (as the path (probably) won't have a leading/trailing ${sep}). To fix that we need to augment the string being compared to:
  case "${sep}${path}${sep}" in
This leads us to the visually confusing (but quite simple):
case "${sep}${path}${sep}" in *"${sep}${dir}${sep}"*) ;; esac
Brought together it looks like:
path_trim ()
{
    typeset var=$1
    typeset sep="${2:-:}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    typeset origdirs
    origdirs=( ${!var} )
    IFS="${OIFS}"

    typeset o
    typeset maxo=${#origdirs[*]}
    typeset seen=

    for (( o=0 ; o < ${maxo} ; o++ )) ; do
        case "${sep}${seen}${sep}" in
        *"${sep}${origdirs[o]:-.}${sep}"*)
            unset origdirs[o]
            ;;
        *)
            # :+ (not +) as seen is set but empty the first time round
            seen="${seen:+${seen}${sep}}${origdirs[o]:-.}"
            ;;
        esac
    done

    IFS="${sep}"
    read ${var} <<< "${origdirs[*]}"
    IFS="${OIFS}"
}
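The membership test at the heart of path_trim can be tried on its own. In this sketch path_contains is a hypothetical helper, not one of the functions developed here; it applies the same augmented, quoted pattern to a fixed string.

```shell
#!/usr/bin/env bash
# path_contains is a hypothetical helper: succeed if DIR is an
# element of the SEP-separated list PATHVALUE.
path_contains ()
{
    typeset pathvalue=$1
    typeset dir=$2
    typeset sep="${3:-:}"

    case "${sep}${pathvalue}${sep}" in
    *"${sep}${dir}${sep}"*) return 0 ;;
    *) return 1 ;;
    esac
}

p=/usr/bin:/usr/local/bin

# /bin is only a substring of the elements, not an element itself.
path_contains "${p}" /bin && partial=yes || partial=no
# First and last elements match thanks to the added separators.
path_contains "${p}" /usr/bin && first=yes || first=no
path_contains "${p}" /usr/local/bin && last=yes || last=no

echo "partial=${partial} first=${first} last=${last}"
```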
Convenience Wrappers
When we're adding distributions of code we're quite likely to be performing the same steps repeatedly:
path_append PATH /usr/local/bin
path_append MANPATH /usr/local/man
and in some cases
path_append LD_LIBRARY_PATH /usr/local/lib
It's fairly obvious we should be writing some shortcuts: std_paths_append, say, for PATH and MANPATH, and all_paths_append for the same plus LD_LIBRARY_PATH.
path_append and path_prepend behave in much the same way and it would be a shame to have to write both std_paths_append and std_paths_prepend and thanks to the way the shell processes lines we don't:
% base=/usr/local
% act=prepend
% path_${act} PATH "${base}"/bin
Variable expansion occurs before Word Splitting and therefore before the shell decides what the command name is [1] (or even if there is a command). path_${act} is expanded to path_prepend and away we go.
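The dispatch is easy to check with a pair of throwaway functions. Everything in this sketch (list_append, list_prepend, list_both) is invented for the example and simply echoes its result rather than touching any path variable.

```shell
#!/usr/bin/env bash
# Two hypothetical functions sharing a prefix.
list_append ()  { echo "$1:$2" ; }
list_prepend () { echo "$2:$1" ; }

# list_both is a hypothetical wrapper: the action is chosen by its
# first argument and spliced into the command name.
list_both ()
{
    typeset act=$1
    list_${act} /bin /usr/local/bin
}

appended=$(list_both append)
prepended=$(list_both prepend)

echo "${appended}"
echo "${prepended}"
```

An unknown action just produces a "command not found" error, so a real wrapper might want to validate the action first.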
std_paths (and all_paths) should therefore take an argument which is the action it should be performing. On top of which they both can do some small checks that the relevant directories exist or perhaps check a few (eg. man or share/man). For example:
std_paths ()
{
    typeset act="$1"
    typeset val="$2"
    typeset sep="${3:-:}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    typeset origdirs
    # val holds the list of base directories itself, not a variable name
    origdirs=( ${val} )
    IFS="${OIFS}"

    typeset dir
    for dir in "${origdirs[@]}" ; do
        path_${act} PATH "${dir}/bin"

        typeset md
        for md in man share/man ; do
            if [[ -d "${dir}/${md}" ]] ; then
                path_${act} MANPATH "${dir}/${md}"
            fi
        done
    done
}
pathname_flatten
While we're messing about with paths we could write a useful little function to flatten pathnames. As we play with pathnames automatically, particularly with wrappers, we're likely to encounter pathnames with embedded ., .. directories and multiple / separators, eg. ///full//./path/to/../to/bin/.
There's nothing wrong with that, it is a perfectly valid pathname, but if we're going to set some environment variables with it then it is hard to scan and wastes a few bytes.
What do we need to look at?
/ - we don't need multiple ones. We're going to be using an IFS trick again and split the pathname up by the directory separator, /. In the resultant array, the / will disappear but if you have two adjacent separators in your pathname then you will get an empty string element in the array, ie. foo//bar would become a three element array, ( foo '' bar ).
There is one special case, if the pathname is an absolute pathname, ie. begins with a /, then we need to preserve the empty string in the array.
While we're here, if flattening the pathname results in just the empty string in the array (eg., /.. would become ( '' )) then the usual IFS trick of recombining array elements will not give us a /. We'll have to handle that case specially.
. - we can simply junk this element
.. - we need to remove the last element in the array, ie. go back up a directory, unless:
- it is the first element, eg. ../bin, in which case there is no directory (in the pathname) to go back up
- you are already at the top of the directory tree, ie. /.. is /
The resultant function looks like:
pathname_flatten ()
{
    typeset val=$1
    typeset sep="${2:-/}"

    typeset OIFS
    OIFS="${IFS}"
    IFS="${sep}"
    typeset origdirs
    origdirs=( ${val} )
    IFS="${OIFS}"

    typeset newdirs
    newdirs=()

    typeset o
    typeset maxo=${#origdirs[*]}
    typeset last

    for (( o=0 ; o < ${maxo} ; o++ )) ; do
        case "${origdirs[o]}" in
        '')
            # ///foo -> ( '' '' '' foo )
            # but we still need the first!
            if [[ $o -eq 0 ]] ; then
                newdirs=( '' )
            fi
            ;;
        .)
            ;;
        ..)
            last=$(( ${#newdirs[*]} - 1 ))
            if [[ ${last} -lt 0 || "${newdirs[last]}" = ".." ]] ; then
                # .. at the start (or after another leading ..)
                # cannot be flattened
                newdirs=( "${newdirs[@]}" "${origdirs[o]}" )
            elif [[ ${last} -gt 0 || "${newdirs[0]}" != "" ]] ; then
                # remove the last element unless it is the leading ''
                # of an absolute pathname, ie. /.. is /
                unset "newdirs[last]"
            fi
            ;;
        *)
            newdirs=( "${newdirs[@]}" "${origdirs[o]}" )
            ;;
        esac
    done

    # If all we are left with in newdirs is '' (ie /) then the IFS
    # trick fails us, we need to handle this case specially
    if [[ ${#newdirs[*]} -eq 1 && "${newdirs[0]}" = "" ]] ; then
        echo /
    else
        IFS="${sep}"
        echo "${newdirs[*]}"
        IFS="${OIFS}"
    fi
}
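The splitting behaviour the function relies on can be checked in isolation. In this sketch count_parts is a hypothetical helper that only reports how many elements a pathname splits into; the last few lines show why an array holding just the empty string needs special handling.

```shell
#!/usr/bin/env bash
# count_parts is a hypothetical helper: split a pathname on / and
# report how many elements the resulting array has.
count_parts ()
{
    typeset IFS=/
    typeset parts
    parts=( $1 )
    echo "${#parts[*]}"
}

relative=$(count_parts foo//bar)    # foo, '', bar
absolute=$(count_parts ///foo)      # '', '', '', foo

# Joining an array that holds only the empty string yields the empty
# string, not /, hence the special case at the end of the function.
root=( '' )
joined=$( IFS=/ ; echo "${root[*]}" )

echo "${relative} ${absolute} [${joined}]"
```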
[1] Unfortunately, the shell does identify and handle Variable Assignments separately, which is why we couldn't use the ${var}="${dirs[*]}" trick earlier.