Wednesday, January 19, 2011

Regular expressions and using find

The Unix find command can search using several different regular expression syntaxes, but how does one actually use them? What if we want to search for a specific pattern (e.g. 2010-01-19.12.34.56)?

I decided to try out a few combinations:

MYDIR="/home/mydir/"
# Note: -regex matches against the whole path, so each pattern starts with ${MYDIR}.
# The patterns are quoted so the shell does not try to glob-expand them.
# posix-egrep: match the date portion, then anything after it
find "${MYDIR}" -maxdepth 1 -mindepth 1 -regextype posix-egrep -regex "${MYDIR}[0-9]{4}-[0-9]{2}-[0-9]{2}.*" -type d -print
# posix-awk: [[:digit:]] matches a digit, and {4} repeats it four times
find "${MYDIR}" -maxdepth 1 -mindepth 1 -regextype posix-awk -regex "${MYDIR}[[:digit:]]{4}.*" -type d -print
# Emacs regex (the default): [0-9] works, but a repetition count such as {4} did not seem to work for me
find "${MYDIR}" -maxdepth 1 -mindepth 1 -regex "${MYDIR}[0-9].*" -type d -print
# posix-extended: [0-9] and {4} both work
find "${MYDIR}" -maxdepth 1 -mindepth 1 -regextype posix-extended -regex "${MYDIR}[0-9]{4}.*" -type d -print
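The commands above only pin down the date part and let .* absorb the rest. To match the whole 2010-01-19.12.34.56 style name, something like the following might work. This is only a sketch, assuming GNU find (for -regextype); the scratch directory, the sample names, and the variables demo_dir, stamp, and matches are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo tree: two directories, one partial name, and one plain file.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/2010-01-19.12.34.56" "$demo_dir/2010-01-19" "$demo_dir/notes"
touch "$demo_dir/2011-02-03.01.02.03"   # a file, so -type d filters it out

# -regex has to match the whole path, so the pattern is prefixed with the directory.
# The dots are escaped so they match a literal "." instead of any character.
stamp='[0-9]{4}-[0-9]{2}-[0-9]{2}(\.[0-9]{2}){3}'
matches=$(find "$demo_dir" -mindepth 1 -maxdepth 1 -type d \
    -regextype posix-egrep -regex "$demo_dir/$stamp")

echo "$matches"
rm -rf "$demo_dir"
```

Only the directory with the full date and time should be printed; the bare 2010-01-19 directory fails because the three .NN groups are required.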

The link below summarizes the differences between the various regular expression syntaxes:
http://www.greenend.org.uk/rjk/2002/06/regexp.html

According to that document, \d matching is only available in Python, Perl, and Tcl. To write a repetition count as {4} (instead of \{4\}), we need egrep rather than regular grep. Within Emacs itself repetition counts work, but on the command line with find's default regex type they did not seem to work for me.
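The grep versus egrep difference is easy to check by hand. A small sketch (the sample text and the variable names sample, bre, and ere are my own, just for illustration):

```shell
#!/bin/sh
# Sample input: one line with a four-digit run, one without.
sample='build 2010
build 19'

# Basic regex (plain grep): the repetition braces need backslashes.
bre=$(printf '%s\n' "$sample" | grep '[0-9]\{4\}')

# Extended regex (egrep, i.e. grep -E): the braces are written bare.
ere=$(printf '%s\n' "$sample" | grep -E '[0-9]{4}')

echo "BRE: $bre"
echo "ERE: $ere"
```

Both commands should pick out the same line; only the way the repetition count is written differs.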
