Linux shell commands 104

Chapter 4 Text Editing

1 Regular expressions

1.1 Basic regular expressions

regex    Description
^        start of line
$        end of line
.        matches any one character
[]       matches any one char in [chars]
[^]      matches any one char EXCEPT those in [^chars]
[-]      matches any one char within a range, e.g. [a-z]
?        matches the preceding item zero or one time
+        matches the preceding item one or more times
*        matches the preceding item zero or more times
()       groups a subexpression so it is matched as one item
{n}      matches the preceding item exactly n times
{n,}     matches the preceding item at least n times
{n,m}    matches the preceding item n to m times
|        alternation (OR)
\        escapes the next character
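
A few of the operators above can be tried with grep -E. This is only a small sketch; the word list is made-up sample data, not something from the notes.

```shell
# Hypothetical sample data, one word per line.
words=$(printf '%s\n' color colour colouur cat cot dog)

# ? makes the preceding "u" optional: color and colour match.
optional=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# {2,} requires at least two "u": only colouur matches.
repeated=$(printf '%s\n' "$words" | grep -E '^colou{2,}r$')

# [ao] picks one character from the set; () groups; | adds an alternative.
either=$(printf '%s\n' "$words" | grep -E '^(c[ao]t|dog)$')

printf '%s\n' "$optional" "$repeated" "$either"
```

The ^ and $ anchors keep each pattern from matching in the middle of a longer word.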

1.2 POSIX character classes

regex      Description
[:alpha:]  letters
[:digit:]  digits
[:alnum:]  letters & digits
[:lower:]  lowercase letters
[:upper:]  uppercase letters
[:punct:]  punctuation
[:blank:]  space & tab
[:space:]  any whitespace (space, tab, newline, ...)
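
A short sketch of how these classes are typically used. Note that tr takes the class directly, while grep needs it inside a bracket expression, hence the double brackets. The input line is a hypothetical example.

```shell
# Hypothetical input line containing letters, digits and punctuation.
line='Hello, World 42!'

# tr -cd deletes everything that is NOT in the class, keeping the digits.
digits=$(printf '%s' "$line" | tr -cd '[:digit:]')

# tr -d deletes every punctuation character.
no_punct=$(printf '%s' "$line" | tr -d '[:punct:]')

# In grep the class sits inside a bracket expression: [[:upper:]].
uppers=$(printf '%s\n' "$line" | grep -oE '[[:upper:]]+')

printf '%s\n' "$digits" "$no_punct" "$uppers"
```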

1.3 Perl-style

regex  Description
\b     word boundary
\B     non-word boundary
\d     single digit
\D     single non-digit
\w     single word character (letter, digit or underscore)
\W     single non-word character
\n     newline
\s     single whitespace character
\S     single non-whitespace character
\r     carriage return

# IP address
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}
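
As a rough check, the IP pattern above can be run through grep -oE. The sample text is invented for illustration. The pattern only checks the shape of an address (four groups of one to three digits), so it also accepts out-of-range values such as 999.300.1.1.

```shell
# The pattern from the notes, stored in a variable (the name ip is illustrative).
ip='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Hypothetical text: two addresses and one three-part version number.
text='host 192.168.1.10 pinged 10.0.0.1, version 1.2.3 is not an address'

# -o prints each match on its own line; 1.2.3 has only three groups.
found=$(printf '%s\n' "$text" | grep -oE "$ip")

# The pattern is loose: it counts this out-of-range line as a match too.
loose=$(printf '999.300.1.1\n' | grep -cE "$ip")

printf '%s\n' "$found" "$loose"
```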

2 grep

The standard Unix utility for searching text.

grep PATTERN FILE
# Extended regular expression
grep -E PATTERN
egrep PATTERN
# Only matched portion
grep -o -E PATTERN
# Except lines containing PATTERN
grep -v PATTERN FILE
# Count number of lines
grep -c PATTERN FILE
# Recursively, with line numbers
grep -R -n PATTERN DIRE
# Byte Offset
grep -b -o PATTERN FILE
# List files that contain a match
grep -l PATTERN FILE
# List files that contain no match
grep -L PATTERN FILE
# Ignore case of pattern
grep -i PATTERN FILE
# Multi patterns
grep -e PATTERN1 -e PATTERN2 FILE
# Print 4 lines After matched pattern
grep PATTERN -A 4
# Print 4 lines Before matched pattern
grep PATTERN -B 4
# Print 4 lines Before and After each match (match in the Center)
grep PATTERN -C 4
# Include & exclude files by name (quote the globs so the shell leaves them alone)
grep -r --include='*.c' --include='*.cpp' PATTERN DIRE
grep -r --exclude='readme' PATTERN DIRE
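
The options above can be tried on a throwaway file. This is a sketch only; the log lines and the file name app.log are hypothetical.

```shell
# Hypothetical log file in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'INFO start' 'error: disk full' 'INFO retry' \
    'ERROR: timeout' 'INFO done' > "$tmp/app.log"

# -c counts matching lines; -i ignores case, so both error lines count.
n_errors=$(grep -ci 'error' "$tmp/app.log")

# -v inverts the match; -n prefixes each line with its line number.
non_info=$(grep -vn '^INFO' "$tmp/app.log")

# -o prints only the matched part: here the last lowercase word of each line.
last_words=$(grep -oE '[a-z]+$' "$tmp/app.log")

# -A 1 prints one extra line after each match.
after=$(grep -A 1 'disk full' "$tmp/app.log")

# Several -e options are ORed together.
n_edges=$(grep -c -e 'start' -e 'done' "$tmp/app.log")

rm -r "$tmp"
printf '%s\n' "$n_errors" "$non_info" "$last_words" "$after" "$n_edges"
```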

3 cut & concatenate

Column-wise cutting of a file

# Field 2 in file (fields are TAB-separated by default)
cut -f2 FILE
# Bytes 1 to 5
cut -b1-5 FILE
# Characters from the 3rd to the end of the line
cut -c3- FILE
# Output delimiter placed between the selected ranges (GNU cut)
cut -c1-3,5- --output-delimiter ',' FILE

Column-wise concatenation of files

paste FILE1 FILE2
# Delimiter
paste FILE1 FILE2 -d ','
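
A small sketch of cut and paste working together. The file names and contents are made up for illustration.

```shell
# Hypothetical input files in a temporary directory.
tmp=$(mktemp -d)
printf 'alice\t30\tparis\nbob\t25\trome\n' > "$tmp/people.tsv"
printf 'red\nblue\n' > "$tmp/colors.txt"

# -f2 selects the second TAB-separated field.
ages=$(cut -f2 "$tmp/people.tsv")

# -d changes the input field delimiter, here to a colon.
user=$(printf 'root:x:0:0\n' | cut -d: -f1)

# -c1-3 keeps characters 1 to 3 of every line.
prefix=$(cut -c1-3 "$tmp/colors.txt")

# paste glues line N of each file together; -d sets the separator.
cut -f1 "$tmp/people.tsv" > "$tmp/names.txt"
joined=$(paste -d ',' "$tmp/names.txt" "$tmp/colors.txt")

rm -r "$tmp"
printf '%s\n' "$ages" "$user" "$prefix" "$joined"
```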

4 sed

Stream editor

# First occurrence of pattern in each line
sed 's/PATTERN/REPLACE/' FILE
# Write changes Into file
sed -i 's/PATTERN/REPLACE/' FILE
# Global replace
sed 's/PATTERN/REPLACE/g' FILE
sed 's:PATTERN:REPLACE:g' FILE
sed 's|PATTERN|REPLACE|g' FILE
# Replace from the Nth occurrence onward (here N=2; GNU sed)
sed 's/PATTERN/REPLACE/2g' FILE
# Delete blank lines
sed '/^$/d' FILE
# Delete lines matching PATTERN
sed '/PATTERN/d' FILE
# & stands for the whole matched string (here each word matched by \w\+)
sed 's/\w\+/[&]/g' FILE
# \1 stands for the substring matched by the first \(PATTERN\) group
sed 's/STRING\(PATTERN\)/\1/' FILE
# Multiple expressions
sed 'exp' | sed 'exp'
sed 'exp; exp'
# Quoting: double quotes let the shell expand $VAR inside the expression
sed "s/$VAR/REPLACE/"
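
The substitutions above, applied to a sample string. This is a sketch that assumes GNU sed, since the 2g flag and \w\+ are GNU extensions. The text and variable names are illustrative.

```shell
# Hypothetical input line.
text='one two two two'

# No flag: only the first occurrence on each line is replaced.
first=$(printf '%s\n' "$text" | sed 's/two/2/')

# g: every occurrence is replaced.
all=$(printf '%s\n' "$text" | sed 's/two/2/g')

# 2g (GNU sed): the 2nd occurrence and all later ones are replaced.
from_second=$(printf '%s\n' "$text" | sed 's/two/2/2g')

# & is the whole match, so each word gets wrapped in brackets.
wrapped=$(printf '%s\n' "$text" | sed 's/\w\+/[&]/g')

# \1 and \2 refer to the two \(...\) groups, swapping key and value.
swapped=$(printf 'key=value\n' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# /^$/d deletes empty lines.
no_blank=$(printf 'a\n\nb\n' | sed '/^$/d')

# Double quotes let the shell substitute $word before sed runs.
word=two
from_var=$(printf '%s\n' "$text" | sed "s/$word/2/")

printf '%s\n' "$first" "$all" "$from_second" "$wrapped" "$swapped" "$no_blank" "$from_var"
```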

5 awk

Processing of data streams, record by record and field by field

awk 'BEGIN {statements} {statements} END {end statements}'
awk "BEGIN {statements} {statements} END {end statements}"
# Print 3rd & 2nd field of every line
awk '{ print $3, $2 }' FILE
# Count number of lines, NR -- records, NF -- fields
awk 'END{ print NR}' FILE
# Variable passed from outside to awk (quote the shell variables)
awk -v VARI="$VAR" '{print VARI}'
awk '{print v1, v2}' v1="$var1" v2="$var2" FILE
# Filtering lines
awk '/START_PATTERN/, /END_PATTERN/' FILE
awk 'NR==1,NR==4' FILE
awk 'NR < 5' FILE
awk '!/linux/' FILE
# Setting delimiter
awk -F: '{ print $NF }' /etc/passwd
awk 'BEGIN { FS=":" } { print $NF }' /etc/passwd
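
A brief sketch of the awk forms above on made-up data. The file name, its columns (name, age, city) and the variable min are assumptions for illustration.

```shell
# Hypothetical space-separated data: name, age, city.
tmp=$(mktemp -d)
printf '%s\n' 'alice 30 paris' 'bob 25 rome' 'carol 41 oslo' > "$tmp/people.txt"

# Print the 3rd and 1st field of every line.
swapped=$(awk '{ print $3, $1 }' "$tmp/people.txt")

# NR holds the number of records read so far; in END it is the line count.
count=$(awk 'END { print NR }' "$tmp/people.txt")

# The main block runs once per line, END runs once after the last line.
total=$(awk '{ sum += $2 } END { print sum }' "$tmp/people.txt")

# -v passes a value in from the shell; the condition filters lines.
older=$(awk -v min=28 '$2 > min { print $1 }' "$tmp/people.txt")

# -F sets the field separator; $NF is the last field.
shell=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{ print $NF }')

rm -r "$tmp"
printf '%s\n' "$swapped" "$count" "$total" "$older" "$shell"
```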

6 Misc

6.1 Parsing email addresses or URLs from text

egrep -o '[A-Za-z0-9]+@[A-Za-z0-9]+\.[a-zA-Z]{2,4}' FILE
egrep -o "http://[A-Za-z0-9]+\.[a-zA-Z]{2,3}" FILE
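
A quick trial of the two patterns on an invented sentence, using grep -E (equivalent to egrep). The patterns are deliberately simple: the URL pattern allows only one label before the dot, so a host with a subdomain is cut short.

```shell
# Hypothetical text containing two addresses and two URLs.
text='Mail bob@example.com or alice99@mail.org; see http://example.com and http://docs.example.org/page'

# Email pattern from the notes.
emails=$(printf '%s\n' "$text" | grep -oE '[A-Za-z0-9]+@[A-Za-z0-9]+\.[a-zA-Z]{2,4}')

# URL pattern from the notes; the second URL is truncated after "docs.exa".
urls=$(printf '%s\n' "$text" | grep -oE 'http://[A-Za-z0-9]+\.[a-zA-Z]{2,3}')

printf '%s\n' "$emails" "$urls"
```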

6.2 Deleting a sentence that contains a word

# [^.]* -- any run of characters other than ".", possibly empty
sed 's/ [^.]*PATTERN[^.]*\.//g' FILE
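
A sketch of the command on a sample paragraph, with the word held in a shell variable (word and text are hypothetical). Because the expression starts with a space, it assumes the sentence is preceded by one; a sentence at the very start of a line is not removed cleanly.

```shell
# Hypothetical text with three sentences.
text='I like cats. Dogs are loud. Birds sing.'
word=loud

# Remove the sentence containing $word: a space, non-dot characters around
# the word, and the closing dot.
result=$(printf '%s\n' "$text" | sed "s/ [^.]*${word}[^.]*\.//g")

printf '%s\n' "$result"
```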