text-processing

reference:

charset

[!NOTE|label:references:]

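| HEX  | NAME            | ABBREVIATION | ESCAPE | CODE |
|------|-----------------|--------------|--------|------|
| 0x00 | null            | NUL          | `\0`   | `^@` |
| 0x07 | bell            | BEL          | `\a`   | `^G` |
| 0x08 | backspace       | BS           | `\b`   | `^H` |
| 0x09 | horizontal tab  | HT           | `\t`   | `^I` |
| 0x0a | line feed       | LF           | `\n`   | `^J` |
| 0x0b | vertical tab    | VT           | `\v`   | `^K` |
| 0x0c | form feed       | FF           | `\f`   | `^L` |
| 0x0d | carriage return | CR           | `\r`   | `^M` |
| 0x1a | Control-Z       | SUB          | -      | `^Z` |
| 0x1b | escape          | ESC          | `\e`   | `^[` |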

echo ascii in bash

escape

[!NOTE|label:references:]

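| ESCAPING SEQUENCES | COMMENTS          |
|--------------------|-------------------|
| `\'`               | single quote      |
| `\"`               | double quote      |
| `\\`               | backslash         |
| `\n`               | new line          |
| `\t`               | horizontal tab    |
| `\r`               | carriage return   |
| `\?`               | question mark     |
|                    | control character |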

read without \001 and \002
read with \001 and \002
| CHAR/SEQUENCE | ASCII | USAGE/MEANING | EXAMPLE | TYPICAL SCENARIO |
|---------------|-------|---------------|---------|------------------|
| `\001` | SOH (Start Of Heading) (0x01) | start of non-printing section (== `\[` in PS1) | `\001\033[31m\002` | Wrap ANSI color codes to avoid cursor position miscalculation |
| `\002` | STX (Start Of Text) (0x02) | end of non-printing section (== `\]` in PS1) | `\001\033[0m\002` | Wrap ANSI color codes to avoid cursor position miscalculation |
| `\e` `\033` | ESC (Escape) (0x1B) | begins an ANSI escape sequence | `\e[31m` (set text red) | Control text color, cursor position, terminal behavior |
| `\a` | BEL (Bell) (0x07) | triggers terminal bell sound | `echo -e "\a"` | Alert user with sound |
| `\r` | CR (Carriage Return) (0x0D) | returns cursor to beginning of line | `printf "Progress: 50%%\r"` | Overwrite text on the same line (e.g. spinner/progress bar) |
| `\n` | LF (Line Feed) (0x0A) | newline | `echo -e "Line1\nLine2"` | Standard line break |
| `\t` | TAB (Horizontal Tab) (0x09) | horizontal tab | `echo -e "Name:\tJohn"` | Column-style output, align text |
| `\b` | BS (Backspace) (0x08) | moves the cursor back one character, so the next character overwrites it | `echo -e "abc\b\bxyz"` (result: axyz) | Inline deletions |
| `\xHH` | Hex Byte | represents a character by its hex value | `\x1B` == `\e`, `\x41` → A | Injecting arbitrary bytes |
| `\uXXXX` | Unicode Char | 4-digit Unicode character (Bash extension, requires terminal support) | `\u2714` → ✔️ | Emojis or symbols in terminal |
| `\nnn` | Octal Char | 3-digit octal character | `\101` → A | Injecting arbitrary bytes |
| `\033[XXm` | SGR (Select Graphic Rendition) | text color/style control (e.g. 31m=red, 1m=bold, 0m=reset) | `\033[32;1mSuccess!\033[0m` | Highlight key information |
| `\033[K` | - | erase to end of line | `echo -e "Loading...\033[K"` | Clear the rest of the line when redrawing it |
| `\033[nA` | - | cursor up n lines | `\033[2A` | Multi-line output control |
| `\033[nB` | - | cursor down n lines | `\033[3B` | Multi-line output control |
| `\033[nC` | - | cursor forward n columns | `\033[10C` | Align complex output |
| `\033[nD` | - | cursor backward n columns | `\033[5D` | Roll back an edit |
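The sequences in the table can be combined as in the sketch below. The helper names `colorize` and `prompt_colorize` are made up for this illustration; only `printf` and the escape sequences listed above are used.

```shell
# Wrap text in an SGR sequence and reset the style afterwards.
# $1: SGR parameters (e.g. "32;1" = bold green), $2: text
colorize() {
  printf '\033[%sm%s\033[0m' "$1" "$2"
}

# Same idea for readline prompts: \001 and \002 mark the color codes
# as non-printing so the cursor position is computed correctly.
prompt_colorize() {
  printf '\001\033[%sm\002%s\001\033[0m\002' "$1" "$2"
}

colorize '32;1' 'Success!'; printf '\n'

# Redraw a status line: \r returns to column 0, \033[K erases the leftovers
printf 'Loading...\r\033[KDone\n'
```

Using `printf` rather than `echo -e` keeps the behavior the same across shells, since `echo -e` is not portable.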

encryption

base64

  • decryption
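A minimal round trip, assuming the GNU coreutils `base64` (older macOS versions spell the decode flag `-D`). Note that base64 is an encoding, not encryption: anyone can decode it. The helper names are hypothetical.

```shell
# encode / decode a string; printf '%s' avoids encoding a trailing newline
b64_encode() { printf '%s' "$1" | base64; }
b64_decode() { printf '%s' "$1" | base64 -d; }

b64_encode 'hello'                    # aGVsbG8=
b64_decode 'aGVsbG8='; printf '\n'    # hello
```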

show

align

[!NOTE|label:see also]

numfmt

[!NOTE|label:references:]

  • setup

  • usage

  • convert format

  • padding

  • field
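A few `numfmt` calls covering the topics above (convert format, padding, field), assuming GNU coreutils `numfmt` is installed. The file name in the last line is only sample data.

```shell
numfmt --to=iec 1048576                               # 1.0M
numfmt --from=iec 1K                                  # 1024
numfmt --to=iec --padding=8 2048                      # right-aligned in 8 columns
echo 'backup.tar 2048' | numfmt --field=2 --to=iec    # convert the 2nd field only
```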

[!NOTE|label:references:]

combinations

single line to multiple lines

[!TIP]

execute commands from file

combine every 2 lines

[!NOTE|label:references:]

sample output

can also be applied to sed output:

  • xargs

  • paste

  • sed

  • awk

  • while
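The tools listed above can each join pairs of lines. A small sketch on six sample lines; every command prints `1 2`, `3 4`, `5 6`:

```shell
nums() { printf '%s\n' 1 2 3 4 5 6; }   # sample input (hypothetical helper)

nums | paste -d' ' - -                   # two stdin columns per output row
nums | xargs -n2                         # two arguments per echo
nums | sed 'N;s/\n/ /'                   # append next line, replace the newline
nums | awk 'NR % 2 { printf "%s ", $0; next } 1'
```

`paste` and `awk` keep the original spacing inside each line, while `xargs` re-splits on whitespace and interprets quotes, so it is best kept for simple tokens.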

combine every 3 lines

  • paste

  • awk

  • xargs
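The same approach extends to groups of three. A sketch with sample letters; each command prints `a b c` and `d e f`:

```shell
letters() { printf '%s\n' a b c d e f; }   # sample input (hypothetical helper)

letters | paste -d' ' - - -
letters | xargs -n3
# switch the output record separator: space inside a group, newline at its end
letters | awk '{ ORS = (NR % 3 ? " " : "\n") } 1'
```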

format output

[!TIP|label:sample data]

echo

[!TIP]

echo -e
  • print file with ansicolor

    echo -ne file with ansicolor

diff

[!NOTE|label:references:]

comm

  • diff

  • common
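A short sketch of both uses of `comm`. It expects sorted input; the wrapper function names and the fruit data are made up for the example.

```shell
only_in_first()  { comm -23 "$1" "$2"; }   # suppress columns 2 and 3
only_in_second() { comm -13 "$1" "$2"; }   # suppress columns 1 and 3
in_both()        { comm -12 "$1" "$2"; }   # suppress columns 1 and 2

a=$(mktemp); b=$(mktemp)
printf '%s\n' apple banana cherry > "$a"
printf '%s\n' banana cherry date  > "$b"

only_in_first  "$a" "$b"    # apple
only_in_second "$a" "$b"    # date
in_both        "$a" "$b"    # banana, cherry
rm -f "$a" "$b"
```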

join

[!NOTE|label:references:]

alignment

[!TIP]

expand (POSIX)

pr (POSIX)

rs (BSD)

column (BSD)

[!NOTE|label:references:]

sort

[!NOTE|label:references:]

sort the last column

  • awk: print( $NF" "$0 ) | sort | cut -f2- -d' '

  • awk: similar to rev, but reversing words instead of characters
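The first recipe above (prefix each line with its last field, sort, then cut the prefix off) can be wrapped as follows. The function name and sample data are illustrative, and fields are assumed to be space-separated.

```shell
sort_by_last_column() {
  # 1. copy the last field to the front  2. sort on it  3. drop the copy
  awk '{ print $NF " " $0 }' | sort | cut -d' ' -f2-
}

printf '%s\n' 'bob 30' 'alice 25' 'carol 41' | sort_by_last_column
# alice 25
# bob 30
# carol 41
```

The sort is lexical; add `-n` to `sort` when the last column should compare numerically.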

get lines

get second-to-last line

[!NOTE|label:references:]

  • sed

  • tail & head
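Both approaches in one sketch, on three sample lines:

```shell
# sed: swap each line into the hold space; on the last line the pattern
# space then contains the previous line, which is the only one not deleted
printf '%s\n' first second third | sed 'x;$!d'              # second

# tail & head: keep the last two lines, then the first of those
printf '%s\n' first second third | tail -n 2 | head -n 1    # second
```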

get next line by the pattern

  • awk

    • or

    • or

    • or get second column of next line of pattern

  • sed

  • to get docker registry mirrors
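A minimal sketch of printing the line that follows each match. The function name `next_line_after` and the `key:` sample data are assumptions for the example.

```shell
# awk: remember that the previous line matched, print the current one
next_line_after() {   # $1: regex; input on stdin
  awk -v pat="$1" 'found { print; found = 0 } $0 ~ pat { found = 1 }'
}

printf '%s\n' 'key:' 'v1' 'other' 'key:' 'v2' | next_line_after '^key:'
# v1
# v2

# sed: on a match, load the next line (n) and print it (p)
printf '%s\n' 'key:' 'v1' 'other' 'key:' 'v2' | sed -n '/^key:/{n;p;}'
```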

change next line of pattern

[!NOTE|label:references:]

  • replace dollars with $ on the line right after /Quarter [1-4]/

    • sed

    • awk

  • replace dollars with $ on every 3rd line after /Quarter [1-4]/
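A sketch of the first case (change only the line right after the match), with made-up sample data:

```shell
report() { printf '%s\n' 'Quarter 1' '100 dollars' '200 dollars'; }

# sed: on a match, move to the next line (n) and substitute there
report | sed '/Quarter [1-4]/{n;s/dollars/$/;}'

# awk: flag the match, apply the substitution on the following line
report | awk 'hit { sub(/dollars/, "$"); hit = 0 } /Quarter [1-4]/ { hit = 1 } 1'
```

Both print `Quarter 1`, `100 $`, `200 dollars`: only the line directly after the match changes.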

get lines between 2 patterns

[!NOTE|label:reference:]

[!TIP] sample data:

sed

[!NOTE|label:references:]

  • include all patterns

  • exclude both patterns

  • exclude single pattern
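A sketch of the inclusive and exclusive variants. The function names and the `START`/`END` markers are placeholders, and the sed version assumes the patterns contain no `/`.

```shell
# include both boundary lines
between_inclusive() { sed -n "/$1/,/$2/p"; }

# exclude both boundary lines: switch off on END before printing,
# switch on at START after printing
between_exclusive() {
  awk -v s="$1" -v e="$2" '$0 ~ e { on = 0 } on; $0 ~ s { on = 1 }'
}

printf '%s\n' a START b c END d | between_inclusive START END   # START b c END
printf '%s\n' a START b c END d | between_exclusive START END   # b c
```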

with empty line

[!NOTE]

get line from pattern to the end

[!TIP|label:references:]

  • sample content:

get from first empty line to the end

[!NOTE|label:references:]

  • including pattern

    [!TIP]

    • solution: to print from pattern to end ,$ == ,$p

    • for both CRLF and LF

  • not including pattern

    [!TIP]

    • solution: to delete / not print from first line to pattern

      • delete: /d

      • not print: -n /!p

    • for both CRLF and LF

    [!TIP]

    • solution: with matched line number + 1 : "$(( n+1 ))"',$p'

      • head -n1 : line number of the first matching line

      • tail -n1 : line number of the last matching line
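The two solutions above, sketched as small functions. The names and the `MARK` pattern are placeholders, and the pattern is assumed to contain no `/`.

```shell
# including the pattern line: print from the first match to the end ($)
from_pattern()  { sed -n "/$1/,\$p"; }

# not including it: delete from line 1 through the first match
after_pattern() { sed "1,/$1/d"; }

printf '%s\n' a MARK b c | from_pattern MARK    # MARK b c
printf '%s\n' a MARK b c | after_pattern MARK   # b c
```

One caveat: with `1,/re/d` the search for the end of the range starts at line 2, so a match on the very first line is not treated as the end. GNU sed offers `0,/re/` for that case.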

get from last empty line ( ^$ ) to end

[!NOTE|label:references:]

  • awk

  • tac + awk

  • sed

    [!WARNING]

    • works for LF line endings only; CRLF (\r\n) is not supported

reverse search empty line

[!NOTE|label:for show TODO]

return first matching pattern

{% hint style='tip' %}

references:

[!TIP]

sed

awk
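A minimal sketch for both tools: print the first matching line and stop reading. The `id:` data is a made-up sample.

```shell
records() { printf '%s\n' 'id: 1' 'name: a' 'id: 2'; }

records | sed -n '/^id:/{p;q;}'          # id: 1
records | awk '/^id:/ { print; exit }'   # id: 1
```

Quitting early (`q` / `exit`) also avoids scanning the rest of a large file.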

return second matching pattern search range

{% hint style='tip' %}

references:

[!TIP]

return the last matching pattern search range

[!NOTE|label:references:]

sed

awk

replace the last matching pattern

xargs

{% hint style='tip' %}

references:

complex commands with xargs

[!NOTE|label:references:]

sort all shell scripts by number of lines

[!TIP] Pipe xargs into find

diff every git commit against its parent

[!TIP] precondition:

compress sub-folders

ping multiple IPs

[!TIP]

  • or

find

[!NOTE|label:reference:]

output format

[!TIP|label:man find:]

  • man find

    • %P File's name with the name of the command line argument under which it was found removed.

    • %f File's name with any leading directories removed (only the last element).
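The difference between the name directives is easiest to see side by side. This sketch assumes GNU find (for `-printf`) and uses a throwaway directory:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/sub" && touch "$demo/sub/sample.txt"

find "$demo" -type f -printf '%p\n'   # full path, starting point included
find "$demo" -type f -printf '%P\n'   # sub/sample.txt (starting point removed)
find "$demo" -type f -printf '%f\n'   # sample.txt     (last element only)

rm -rf "$demo"
```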

output file name only

cat the config file in every .git folder

  • xargs && cat

  • find && -exec

exec and sed

  • change IP address in batch processing

find and rename

find && tar

[!TIP] more can be found in imarslo: find and tar

  • backup all config.xml in JENKINS_HOME

  • back up build history

find by timestamp

[!NOTE|label:references:]

via mtime

[!TIP|label:tricky on -mtime:]

via newermt

[!TIP|label:tips for -newerXY]

inject commands inside find

[!NOTE|label:references:]

printf

[!NOTE|label:references:]

formats

[!NOTE|label:references:]

  • time format

    | FORMAT | DESCRIPTION                             | EXAMPLE                               |
    |--------|-----------------------------------------|---------------------------------------|
    | `%A`   | last access time                        | `%A+`: 2023-02-20+05:19:18.0000000000 |
    | `%T`   | last modification time                  | `%T@`: 1676899158.0000000000          |
    | `%t`   | last modification time in ctime format  | Mon Feb 20 05:19:18.0000000000 2023   |
    | `%C`   | last status change time                 | `%C+`: 2024-11-20+03:30:18.8905999140 |
    | `%c`   | last status change time in ctime format | Wed Nov 20 03:30:18.8905999140 2024   |
    | `%B`   | birth time                              | `%B@`: 1676899158.0000000000          |

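    | TIME FIELD | DESCRIPTION                           | EXAMPLE                        |
    |------------|---------------------------------------|--------------------------------|
    | `+`        | date time                             | 2023-02-20+05:19:18.0000000000 |
    | `@`        | unix epoch                            | 1676899158.0000000000          |
    | `H`        | hour                                  | 00..23                         |
    | `k` / `I`  | hour in 24-hour / 12-hour format      | 0..23 / 01..12                 |
    | `M`        | minute                                | 00..59                         |
    | `S`        | second                                | 00..60                         |
    | `p`        | AM/PM                                 | AM / PM                        |
    | `T` / `X`  | time in 24-hour format / locale time  | hh:mm:ss.xxxxxxxxxx            |
    | `Z`        | timezone                              | PST / PDT                      |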

    | DATE FIELD   | DESCRIPTION                               | EXAMPLE            |
    |--------------|-------------------------------------------|--------------------|
    | `a` / `A`    | abbreviated / full weekday                | Wed / Wednesday    |
    | `b(h)` / `B` | abbreviated / full month name             | Jan / January      |
    | `m`          | month                                     | 01..12             |
    | `d`          | day of month                              | 01..31             |
    | `w`          | day of week                               | 0..6 (0 = Sunday)  |
    | `j`          | day of year                               | 001..366           |
    | `U` / `W`    | week number: Sunday / Monday as first day | 00..53             |
    | `y` / `Y`    | last 2 digits of year / 4-digit year      | 00..99 / 1970..    |
    | `r`          | time in 12-hour format                    | hh:mm:ss [A/P]M    |
    | `F`          | full date; same as %Y-%m-%d               | 2023-02-20         |
    | `D`          | date; same as %m/%d/%y                    | 02/20/23           |
    | `x`          | locale date                               | 02/20/23           |

  • name format

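    | FORMAT | DESCRIPTION                        | EXAMPLE      |
    |--------|------------------------------------|--------------|
    | `%p`   | file's name                        | ./sample.txt |
    | `%P`   | file's name without starting-point | sample.txt   |
    | `%f`   | basename                           | sample.txt   |
    | `%h`   | leading directories of file's name | .            |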

  • permission format

    | FORMAT | DESCRIPTION                          | EXAMPLE    |
    |--------|--------------------------------------|------------|
    | `%m`   | file's mode                          | 644        |
    | `%M`   | file's mode in human-readable format | -rw-r--r-- |

  • T : time in 24-hour format : hh:mm:ss.xxxxxxxxxx

  • X : locale time : hh:mm:ss.xxxxxxxxxx

  • c : locale time in ctime format

  • D : date : mm/dd/yy

  • F : date : yyyy-mm-dd

  • x : locale date : mm/dd/yy

  • R : hour and minute in 24 hour format : HH:MM

  • + : date and time

  • Formatting Tips

    • center-align

    • left-align

    • mixed align
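The time, name and permission directives can be mixed in one format string. A sketch assuming GNU find and GNU `touch -d`; the file and its timestamp are sample values:

```shell
demo=$(mktemp -d)
touch -d '2023-02-20 05:19:18' "$demo/sample.txt"
chmod 644 "$demo/sample.txt"

# date and time fields, octal and symbolic mode, then the basename
# left-aligned in a 12-character column (%-12f)
find "$demo" -type f -printf '%TY-%Tm-%Td %TH:%TM  %m  %M  %-12f|\n'

rm -rf "$demo"
```

A width between `%` and the directive pads the field, and a leading `-` left-aligns it, as in C `printf`.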

tips

[!TIP|label:rules:] size first, then md5 hash

trim

trim trailing chars

  • awk + rev

  • ${var:: -x}

remove empty lines

[!NOTE|label:references:]

  • Delete empty lines using sed

    • sed

      • '/^[[:space:]]*$/d'

      • '/^\s*$/d'

      • '/^$/d'

      • -n '/^\s*$/!p'

    • grep

      • .

      • -v '^$'

      • -v '^\s*$'

      • -v '^[[:space:]]*$'

    • awk

      • /./

      • 'NF' or 'NF > 0'

      • '!/^$/'

      • 'length'

      • '/^[ \t]*$/ {next;} {print}'

      • '!/^[ \t]*$/'
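The patterns above differ mainly in whether whitespace-only lines count as empty. A sketch with a hypothetical `sample` helper that emits one empty and one whitespace-only line:

```shell
sample() { printf 'a\n\n   \nb\n'; }

# these treat whitespace-only lines as empty and print: a, b
sample | sed '/^[[:space:]]*$/d'
sample | grep -v '^[[:space:]]*$'
sample | awk 'NF'

# this removes only truly empty lines and keeps the whitespace-only one
sample | sed '/^$/d'
```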

remove empty line at the end of file

[!NOTE|label:references:]

remove duplicate empty lines

[!NOTE|label:references:]

[!NOTE|label:reference]

  • ${variable//search/replace}

  • sed

    or

echo "${string:0:$(( position - 1 ))}${replacement}${string:position}"

or

    $ sed 's:\s\s*:|:g' <<< "${str}"
    aa|bb|cc

  • or

check line ending

[!NOTE|label:references:]

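| OS                       | CHARACTER ENCODING | ABBREVIATION | HEX   | DEC   | ESCAPE SEQUENCE |
|--------------------------|--------------------|--------------|-------|-------|-----------------|
| UNIX or Unix-like        | ASCII              | LF           | 0A    | 10    | `\n`            |
| MS-DOS                   | ASCII              | CR LF        | 0D 0A | 13 10 | `\r\n`          |
| Commodore 8-bit machines | ASCII              | CR           | 0D    | 13    | `\r`            |
| QNX pre-POSIX            | ASCII              | RS           | 1E    | 30    | `\036`          |
| Acorn BBC and RISC OS    | ASCII              | LF CR        | 0A 0D | 10 13 | `\n\r`          |
| Atari 8-bit machines     | ATASCII            | -            | 9B    | 155   | -               |
| IBM mainframe systems    | EBCDIC             | NL           | 15    | 21    | `\025`          |
| ZX80 and ZX81            | non-ASCII          | NEWLINE      | 76    | 118   | -               |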

  • od -c

  • hexdump -c

  • hexdump -C

  • vim
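A sketch of checking a file for carriage returns from a script. The `has_cr` helper is hypothetical; it compares the file with a copy that has every `\r` stripped.

```shell
has_cr() {
  # succeeds when removing \r changes the content, i.e. the file contains CR
  ! tr -d '\r' < "$1" | cmp -s - "$1"
}

f=$(mktemp)
printf 'dos line\r\n' > "$f"

od -c "$f"      # the carriage return shows up as \r right before \n
has_cr "$f" && echo 'CR found (CRLF or CR line endings)'
rm -f "$f"
```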

remove the ending '\n'

[!NOTE|label:references:]

add '\n' to line-ending

[!TIP]

  • check last char in the file

  • add new line

  • add new line ending without modifying the file
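A minimal sketch of the check-then-append approach. The function name is made up; it relies on command substitution stripping a trailing newline, so `$(tail -c1 file)` is empty when the file already ends with one.

```shell
ensure_trailing_newline() {
  # only touch non-empty files whose last byte is not a newline
  if [ -s "$1" ] && [ -n "$(tail -c1 "$1")" ]; then
    printf '\n' >> "$1"
  fi
}

f=$(mktemp)
printf 'no newline at end' > "$f"
ensure_trailing_newline "$f"     # appends \n
ensure_trailing_newline "$f"     # second call is a no-op
tail -c1 "$f" | od -An -c        # shows \n
rm -f "$f"
```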

fold

check that the params are valid

{% hint style='tip' %}

valid params may contain only characters from 'iwfabcem' {% endhint %}

insert

insert new line

insert right after the second match string

{% codetabs name="original", type="bash" -%}
DCR
DCR
DCR
{%- language name="expected", type="bash" -%}
DCR
DCR
check
DCR
{%- endcodetabs %}

insert after the matched string

insert new line based on pattern

[!NOTE|label:references:]

  • awk

  • sed

write a file without indent space

  • or

{% codetabs name="example", type="bash" -%}
$ sed -e 's:^\s*::' <<-'EOF'
    items.find ({
      "repo": "${product}-${stg}-local",
      "type" : "folder" ,
      "depth" : "1",
      "created" : { "${opt}": "4mo" }
    })
EOF
items.find ({
"repo": "${product}-${stg}-local",
"type" : "folder" ,
"depth" : "1",
"created" : { "${opt}": "4mo" }
})
{%- endcodetabs %}

cat

<<- and <<

Here Documents:

This type of redirection instructs the shell to read input from the current source until a line containing only delimiter (with no trailing blanks) is seen. All of the lines read up to that point are then used as the standard input for a command.

The format of here-documents is:

    [n]<<[-]word
            here-document
    delimiter

  • with <<- (leading tabs are stripped, so the indented END_TEXT ends the here-document)

    $ cat -A sample.sh
    LANG=C tr a-z A-Z <<- END_TEXT$
    Here doc with <<-$
     A single space character (i.e. 0x20 ) is at the beginning of this line$
    ^IThis line begins with a single TAB character i.e 0x09 as does the next line$
    ^IEND_TEXT$
    $
    echo The intended end was before this line$

    $ bash sample.sh
    HERE DOC WITH <<-
     A SINGLE SPACE CHARACTER (I.E. 0X20 ) IS AT THE BEGINNING OF THIS LINE
    THIS LINE BEGINS WITH A SINGLE TAB CHARACTER I.E 0X09 AS DOES THE NEXT LINE
    The intended end was before this line

  • with << (tabs are kept, so the indented END_TEXT is not recognized as the delimiter)

    $ cat -A sample.sh
    LANG=C tr a-z A-Z << END_TEXT$
    Here doc with <<$
     A single space character (i.e. 0x20 ) is at the beginning of this line$
    ^IThis line begins with a single TAB character i.e 0x09 as does the next line$
    ^IEND_TEXT$
    $
    echo The intended end was before this line$

    $ bash sample.sh
    sample.sh: line 7: warning: here-document at line 1 delimited by end-of-file (wanted `END_TEXT')
    HERE DOC WITH <<
     A SINGLE SPACE CHARACTER (I.E. 0X20 ) IS AT THE BEGINNING OF THIS LINE
            THIS LINE BEGINS WITH A SINGLE TAB CHARACTER I.E 0X09 AS DOES THE NEXT LINE
            END_TEXT

    ECHO THE INTENDED END WAS BEFORE THIS LINE
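The same difference can be reproduced without a script file. In this sketch the two tiny scripts are generated with `printf`, so the tab characters are explicit (`\t`), and piped to `sh`:

```shell
# <<- strips leading tabs from the body and from the delimiter line
printf 'cat <<-EOF\n\tindented with a tab\n\tEOF\n' | sh
# prints: indented with a tab   (no leading tab)

# << keeps the tab in the body; the delimiter must start in column 1
printf 'cat <<EOF\n\tindented with a tab\nEOF\n' | sh | od -c | head -n 1
# the od output starts with \t
```

Only tabs are stripped by `<<-`; leading spaces are always kept, which is why the space-indented line in the sample above keeps its indentation in both runs.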
