Chapter 4 About Text Editing
1 regular expression
1.1 Basic regular expressions
| regex | Description |
|---|---|
| ^ | start of line |
| $ | end of line |
| . | matches any one character |
| [] | matches any one char in [chars] |
| [^] | matches any one char EXCEPT in [chars] |
| [-] | matches any char within range [chars] |
| ? | matches one or zero times |
| + | matches one or more times |
| * | matches zero or more times |
| () | groups a subexpression so it is matched as one item |
| {n} | matches exactly n times |
| {n,} | matches at least n times |
| {n,m} | matches n to m times |
| \| | alternation (OR) |
| \ | escapes the next character |
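As a quick illustration of the anchors, sets, and quantifiers in the table, the sketch below runs `grep -E` over a few sample words. The words and variable names are made up for the example.

```shell
# Sample words, one per line (made up for illustration).
words=$(printf '%s\n' color colour colouur cat cot)

# 'u?' matches zero or one 'u'; ^ and $ anchor the whole line.
optional_u=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# '[ao]' matches exactly one character from the set.
cat_or_cot=$(printf '%s\n' "$words" | grep -E '^c[ao]t$')

# 'u{2}' matches exactly two consecutive 'u' characters.
double_u=$(printf '%s\n' "$words" | grep -E 'u{2}')

printf '%s\n' "$optional_u" "$cat_or_cot" "$double_u"
```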
1.2 POSIX character classes
| regex | Description |
|---|---|
| [:alpha:] | letters |
| [:digit:] | digits |
| [:alnum:] | letters and digits |
| [:lower:] | lowercase |
| [:upper:] | uppercase |
| [:punct:] | punctuation |
| [:blank:] | space & tab |
| [:space:] | whitespace |
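One detail worth a sketch: in a regex these class names are only recognized inside a bracket expression, so the pattern is written `[[:digit:]]`, not `[:digit:]`. The sample line below is made up.

```shell
# Sample line (made up for illustration).
line='Order 66, shipped on 2024-05-01!'

# Inside a regex the class sits within brackets: [[:digit:]]+
digit_runs=$(printf '%s\n' "$line" | grep -E -o '[[:digit:]]+')

# tr accepts the same class names; here every punctuation character is deleted.
no_punct=$(printf '%s\n' "$line" | tr -d '[:punct:]')

printf '%s\n' "$digit_runs" "$no_punct"
```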
1.3 Perl-style
| regex | Description |
|---|---|
| \b | word boundary |
| \B | non-word boundary |
| \d | single digit |
| \D | single non-digit |
| \w | single word character |
| \W | single non-word character |
| \n | newline |
| \s | single whitespace character |
| \S | single non-whitespace character |
| \r | carriage return |
# IP address
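A minimal sketch of a dotted-quad IPv4 pattern, written with `grep -E` and `[0-9]` so it does not depend on Perl-style `\d` support. The pattern and the sample text are assumptions for illustration; the pattern checks only the shape of an address, not that each number is at most 255.

```shell
# Made-up sample text containing two dotted-quad values.
text='inet 192.168.0.12 netmask 255.255.255.0'

# Four groups of 1-3 digits separated by literal dots.
# Shape check only: it would also accept 999.999.999.999.
ip_pattern='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

addresses=$(printf '%s\n' "$text" | grep -E -o "$ip_pattern")
printf '%s\n' "$addresses"
```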
2 grep
The standard Unix utility for searching text.
grep PATTERN FILE
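A few commonly used options, sketched against a throwaway file. The file contents and variable names are made up for the example.

```shell
# Throwaway sample file (contents made up for illustration).
log=$(mktemp)
printf '%s\n' 'INFO start' 'error: disk full' 'INFO done' 'Error: retry' > "$log"

# -i ignores case, -n prefixes each matching line with its line number.
matches=$(grep -i -n 'error' "$log")

# -v inverts the match, -c prints a count of selected lines instead of the lines.
non_info=$(grep -c -v '^INFO' "$log")

printf '%s\n' "$matches" "$non_info"
rm -f "$log"
```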
3 cut & concatenate
Column-wise cutting of a file
# Field 2 in file
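A minimal sketch of field selection with `cut`, assuming a tab-separated file (tab is the default delimiter). The sample rows are made up.

```shell
# Tab-separated sample rows (made up): id, name, score.
table=$(mktemp)
printf '1\talice\t90\n2\tbob\t85\n' > "$table"

# -f 2 selects the second field of every line.
names=$(cut -f 2 "$table")

# A comma-separated list selects several fields.
id_score=$(cut -f 1,3 "$table")

# -d sets a different delimiter, here a comma.
first_last=$(printf 'a,b,c\n' | cut -d ',' -f 1,3)

printf '%s\n' "$names" "$id_score" "$first_last"
rm -f "$table"
```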
Column-wise concatenation of files
paste FILE1 FILE2
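To show what the joined output looks like, the sketch below pastes two small made-up files, first with the default tab delimiter and then with a comma.

```shell
# Two throwaway files with three lines each (contents made up).
left=$(mktemp)
right=$(mktemp)
printf '1\n2\n3\n' > "$left"
printf 'one\ntwo\nthree\n' > "$right"

# Default: line N of each file is joined with a tab.
joined=$(paste "$left" "$right")

# -d picks a different delimiter.
joined_csv=$(paste -d ',' "$left" "$right")

printf '%s\n' "$joined" "$joined_csv"
rm -f "$left" "$right"
```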
4 sed
Stream editor
# First occurrence of pattern in each line
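A minimal sketch of the substitute command, assuming the usual `s/pattern/replacement/` form. Without a flag only the first match on each line is replaced; `g` replaces every match and a number replaces only that occurrence. The sample text is made up.

```shell
# Sample line (made up for illustration).
line='cat cat cat'

# No flag: only the first occurrence on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# g flag: every occurrence on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

# Numeric flag: only the 2nd occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')

printf '%s\n' "$first_only" "$every" "$second"
```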
5 awk
A tool for processing data streams line by line
awk 'BEGIN {statements} {statements} END {end statements}'
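The three-part structure can be sketched as follows: the `BEGIN` block runs once before any input, the middle block runs for every line, and the `END` block runs once after the last line. The file contents and variable names are made up for the example.

```shell
# Throwaway sample file: name and score per line (made up).
scores=$(mktemp)
printf 'alice 90\nbob 85\ncarol 80\n' > "$scores"

# BEGIN initializes the sum, the middle block adds field 2 of each line,
# END prints the line count (NR), the sum, and the average.
summary=$(awk 'BEGIN { total = 0 } { total += $2 } END { print NR, total, total / NR }' "$scores")

printf '%s\n' "$summary"
rm -f "$scores"
```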
6 Misc
6.1 Parsing email addresses or URLs from text
grep -E -o '[A-Za-z0-9]+@[A-Za-z0-9]+\.[a-zA-Z]{2,4}' FILE
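A short sketch of the email pattern in use, written with `grep -E -o` (equivalent to `egrep -o`). The sample text is made up, and the URL pattern is an assumption added for illustration rather than something taken from these notes. Note that the email pattern has no dot in its character sets, so a dotted local part is only partly matched.

```shell
# Sample text (made up for illustration).
text='Contact admin@example.com or sales@example.org for details.'
email_pattern='[A-Za-z0-9]+@[A-Za-z0-9]+\.[a-zA-Z]{2,4}'

# -o prints only the matched part, one match per line.
emails=$(printf '%s\n' "$text" | grep -E -o "$email_pattern")

# Limitation: '.' is not in the local-part set, so only the tail is matched.
partial=$(printf '%s\n' 'first.last@example.com' | grep -E -o "$email_pattern")

# Hypothetical URL pattern (an assumption): scheme plus host name only.
urls=$(printf '%s\n' 'See https://www.example.com/docs' |
  grep -E -o 'https?://[A-Za-z0-9.-]+\.[A-Za-z]{2,4}')

printf '%s\n' "$emails" "$partial" "$urls"
```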
6.2 delete a sentence containing a word
# [^.]* matches any run of characters other than '.', zero or more times
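One way to apply `[^.]*` here, sketched with `sed`: match a leading space, any non-period characters around the word, and the closing period, then replace the whole match with nothing. The command, the sample text, and the phrase `mobile phones` are assumptions for illustration. Because the pattern starts with a space, a sentence at the very beginning of a line would not be removed.

```shell
# Sample text (made up for illustration).
text='Linux is fun. I dislike mobile phones a lot. Shell is handy.'

# ' [^.]*'         a space, then characters up to the phrase without crossing a period
# 'mobile phones'  the phrase that marks the sentence to delete
# '[^.]*\.'        the rest of the sentence and its closing period
cleaned=$(printf '%s\n' "$text" | sed 's/ [^.]*mobile phones[^.]*\.//g')

printf '%s\n' "$cleaned"
```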