Chapter 4 Text Editing
1 Regular expressions
1.1 Basic regular expressions
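regex | Description |
---|---|
`^` | start of line |
`$` | end of line |
`.` | any single character |
`[]` | any one character listed in the brackets, e.g. `[abc]` |
`[^]` | any one character NOT listed in the brackets, e.g. `[^abc]` |
`[-]` | any one character in a range, e.g. `[a-z]` |
`?` | preceding item zero or one time |
`+` | preceding item one or more times |
`*` | preceding item zero or more times |
`()` | group a subexpression so it is matched as one item |
`{n}` | preceding item exactly n times |
`{n,}` | preceding item at least n times |
`{n,m}` | preceding item n to m times (no spaces inside the braces) |
`\|` | alternation (OR) |
`\` | escape the next character so it is matched literally |

`?`, `+`, `{}`, `()` and `|` work as shown in extended regex (`grep -E`, `sed -E`); basic regex needs them backslash-escaped, e.g. `\{n\}`.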
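To see the anchors and quantifiers side by side, here is a minimal sketch that runs `grep -E` over a few made-up words (the word list is arbitrary test data). Extended mode is used so that `?`, `+`, `{}`, `()` and `|` need no backslashes.

```shell
# Arbitrary sample words, one per line (made-up test data).
words='cat
cart
ct
caat
dog'

# ^...$ anchors each pattern to the whole line.
zero_or_more=$(printf '%s\n' "$words" | grep -E '^ca*t$')   # cat, ct, caat
one_or_more=$(printf '%s\n' "$words" | grep -E '^ca+t$')    # cat, caat
zero_or_one=$(printf '%s\n' "$words" | grep -E '^ca?t$')    # cat, ct
exactly_two=$(printf '%s\n' "$words" | grep -E '^ca{2}t$')  # caat
either=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$')    # cat, dog
one_of=$(printf '%s\n' "$words" | grep -E '^c[ar]+t$')      # cat, cart, caat

printf '%s\n' "$one_of"
```

Note that `cart` fails `^ca*t$` because `a*` only repeats `a`; the bracket expression `[ar]+` is what lets it through in the last pattern.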
1.2 POSIX character classes
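regex | Description |
---|---|
`[:alpha:]` | letters |
`[:digit:]` | digits |
`[:alnum:]` | letters and digits |
`[:lower:]` | lowercase letters |
`[:upper:]` | uppercase letters |
`[:punct:]` | punctuation |
`[:blank:]` | space and tab |
`[:space:]` | whitespace (space, tab, newline, carriage return, vertical tab, form feed) |

In `grep` and `sed` patterns a class must sit inside a bracket expression, e.g. `[[:digit:]]+` or `[^[:space:]]`.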
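A short sketch of the classes in use, on made-up sample lines. It also shows the one place the bare form is correct: `tr` takes `[:lower:]` directly, without the extra pair of brackets.

```shell
# Made-up sample lines.
lines='abc123
ABC
42
a-b'

# In grep/sed the class goes inside a bracket expression: [[:digit:]].
digits=$(printf '%s\n' "$lines" | grep -E '^[[:digit:]]+$')   # 42
upper=$(printf '%s\n' "$lines" | grep -E '^[[:upper:]]+$')    # ABC
alnum=$(printf '%s\n' "$lines" | grep -E '^[[:alnum:]]+$')    # abc123, ABC, 42
punct=$(printf '%s\n' "$lines" | grep -E '[[:punct:]]')       # a-b

# tr takes the class name directly.
shout=$(printf 'hello\n' | tr '[:lower:]' '[:upper:]')        # HELLO
printf '%s\n' "$alnum" "$shout"
```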
1.3 Perl-style
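regex | Description |
---|---|
`\b` | word boundary |
`\B` | non-word-boundary position |
`\d` | single digit (`[0-9]`) |
`\D` | single non-digit |
`\w` | single word character (letter, digit or `_`) |
`\W` | single non-word character |
`\n` | newline |
`\s` | single whitespace character |
`\S` | single non-whitespace character |
`\r` | carriage return |

All of these work with `grep -P` (PCRE). GNU `grep` and GNU `sed` also accept `\b`, `\B`, `\w`, `\W`, `\s` and `\S` in basic and extended mode, but not `\d`.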
Example: matching an IP address.
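A minimal sketch of an IPv4 pattern, assuming a plain dotted-quad match is good enough. It uses the extended-regex `[0-9]{1,3}` rather than `\d`, so it runs with plain `grep -E`; it does not range-check octets, so `999.1.1.1` would match too. The log line is made up.

```shell
# Made-up log line containing two addresses.
log='client 192.168.1.20 connected via 10.0.0.1'

# Three "1-3 digits followed by a literal dot" groups, then a final octet.
# -o prints each match on its own line instead of the whole input line.
ips=$(printf '%s\n' "$log" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}')
printf '%s\n' "$ips"
```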
2 grep
The classic Unix utility for searching text: `grep PATTERN FILE` prints every line of FILE that matches PATTERN.
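A sketch of the most common `grep` options against a small made-up log file written to a temporary path.

```shell
# Made-up log file in a temporary location.
file=$(mktemp)
printf '%s\n' 'Error: disk full' 'warning: low memory' 'error: timeout' 'info: ok' > "$file"

ci=$(grep -i 'error' "$file")          # -i: ignore case
inverted=$(grep -iv 'error' "$file")   # -v: print lines that do NOT match
count=$(grep -ic 'error' "$file")      # -c: count matching lines
numbered=$(grep -n 'warning' "$file")  # -n: prefix "LINE:" to each match

rm -f "$file"
printf '%s\n' "$count"
```

`grep` exits with status 0 when a line matched, 1 when none did and 2 on error, which is why `if grep -q PATTERN FILE; then ...; fi` is a common idiom in scripts.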
3 cut & concatenate
Column-wise cutting of a file with `cut`, e.g. extracting field 2 of each line.
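A sketch of `cut` on made-up comma-separated data. `-d` sets the field delimiter (TAB by default), `-f` takes a 1-based field list, and `-c` selects character positions instead of fields.

```shell
# Made-up comma-separated records.
csv='alice,30,paris
bob,25,berlin'

field2=$(printf '%s\n' "$csv" | cut -d',' -f2)       # 30 / 25
fields13=$(printf '%s\n' "$csv" | cut -d',' -f1,3)   # alice,paris / bob,berlin
chars=$(printf '%s\n' "$csv" | cut -c1-3)            # ali / bob
printf '%s\n' "$field2"
```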
Column-wise concatenation of files: `paste FILE1 FILE2` joins line N of FILE1 and line N of FILE2, separated by a TAB.
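A sketch of `paste` with two made-up one-column files, showing the default TAB separator and a custom one set with `-d`.

```shell
# Two made-up files, one column each.
names=$(mktemp)
ages=$(mktemp)
printf '%s\n' alice bob > "$names"
printf '%s\n' 30 25 > "$ages"

tabbed=$(paste "$names" "$ages")         # alice<TAB>30 / bob<TAB>25
commas=$(paste -d',' "$names" "$ages")   # alice,30 / bob,25

rm -f "$names" "$ages"
printf '%s\n' "$commas"
```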
4 sed
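Stream editor: reads its input line by line and applies editing commands to each line. The substitution command `s/PATTERN/REPLACEMENT/` replaces only the first occurrence of PATTERN in each line.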
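A sketch of the substitution flags on a made-up line: no flag changes the first match, `g` changes every match, and a number N changes only the Nth match.

```shell
line='one cat, two cat, red cat'

first=$(printf '%s\n' "$line" | sed 's/cat/dog/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')      # g: every match
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')   # 2: only the 2nd match
printf '%s\n' "$first" "$all" "$second"
```

To edit a file in place, GNU sed uses `sed -i 's/cat/dog/' FILE`, while BSD/macOS sed needs an explicit (possibly empty) backup suffix: `sed -i '' 's/cat/dog/' FILE`.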
5 awk
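Pattern scanning and processing of data streams, usually line by line and field by field.
`awk 'BEGIN {statements} {statements} END {end statements}'`
The `BEGIN` block runs once before any input is read, the middle block runs for every input line, and the `END` block runs once after the last line; each of the three parts is optional.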
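A minimal sketch of the three-part structure on made-up "name score" records: the middle block accumulates a running total, `END` reports it, and a second one-liner shows a pattern restricting which lines an action runs on.

```shell
# Made-up whitespace-separated records.
scores='alice 90
bob 75
carol 85'

# $2 is the 2nd field; NR is the number of records read so far.
summary=$(printf '%s\n' "$scores" | awk '
  BEGIN { total = 0 }
        { total += $2 }
  END   { printf "%d lines, total %d, mean %.1f\n", NR, total, total / NR }
')

# A pattern before the block limits it to matching lines.
high=$(printf '%s\n' "$scores" | awk '$2 > 80 { print $1 }')   # alice, carol

printf '%s\n' "$summary"
```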
6 Misc
6.1 Parsing email addresses or URLs from text
`grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' FILE`
(`-o` prints only the matched addresses; `egrep` is an obsolescent alias for `grep -E`.)
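A quick check of an email pattern on made-up text. Besides letters and digits, this pattern accepts `.`, `_`, `%`, `+` and `-` in the user name, dotted or hyphenated domains, and top-level domains of two or more letters. It is a heuristic, not a full address validator.

```shell
# Made-up text containing two addresses.
text='Contact jane.doe@example.com or ops-team+alerts@mail.example.org today.'

# -E: extended regex; -o: print only the matched text, one match per line.
emails=$(printf '%s\n' "$text" |
  grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
printf '%s\n' "$emails"
```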
6.2 Deleting a sentence that contains a word
`[^.]*` matches any run of characters other than `.`, so a match cannot extend past the end of the current sentence.
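A sketch of deleting such a sentence with `sed`, assuming sentences end with `.` and are separated by a space; the word `phones` and the sample text are made up for illustration. The pattern is a space, then `[^.]*`, the word, `[^.]*` again and the closing `\.`, so the whole match stays inside one sentence.

```shell
# Made-up text; "phones" is the hypothetical word to filter on.
text='Tea is nice. Mobile phones distract drivers. Walk more.'

# Delete every " ...phones...." sentence; g handles several per line.
cleaned=$(printf '%s\n' "$text" | sed 's/ [^.]*phones[^.]*\.//g')
printf '%s\n' "$cleaned"   # Tea is nice. Walk more.
```

Because the pattern starts with a space, a matching sentence at the very beginning of a line is left alone, and the match is case-sensitive.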