Text processing commands in Linux

Filters/Text processing commands

  • cut

    • cuts selected parts (characters, bytes, or fields) out of each line of a file or of another command's output
  • awk

    • extracts and prints data by column (field)
  • grep and egrep

    • search for lines that contain a keyword or pattern.
  • sort

    • sorts lines of a file or of command output, alphabetically by default
  • uniq

    • hides duplicate lines in a file. It only compares adjacent lines, so sort the input first.
  • wc

    • word count command. Tells us how many lines, words, and bytes a file or a command's output has.

cut - Text processors commands

  • Cut is a command line utility that allows us to cut parts of lines from specified files or piped data and print the result to stdout. It can be used to cut parts of a line by delimiter, byte position, and character.

  • cut filename = doesn't work; you have to specify an option such as -c, -b, or -f

  • cut --version = check utility version.

  • cut -c1 filename = print the first character of each line

  • cut -c1,2,4 filename = pick and choose individual characters (1st, 2nd, and 4th)

  • cut -c1-3 filename = print a range of characters (1 through 3)

  • cut -c1-3,6-8 filename = print several ranges of characters (1 through 3 and 6 through 8)

  • cut -b1-3 filename = print bytes 1 through 3 of each line

  • cut -d: -f 6 /etc/passwd = print the 6th field (the home directory), using : as the delimiter

  • ls -l | cut -c2-4 = print only the owner's permission bits of each file/dir
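The character and field options above can be tried safely on a throwaway file. The following is a minimal sketch, assuming a hypothetical passwd-style sample (the users alice and bob are made up) written to a temporary file with mktemp:

```shell
# Build a small colon-delimited sample file (hypothetical data, not a real /etc/passwd).
sample=$(mktemp)
printf '%s\n' 'alice:x:1001:1001:Alice:/home/alice:/bin/bash' \
              'bob:x:1002:1002:Bob:/home/bob:/bin/sh' > "$sample"

# Characters 1 through 3 of every line.
first_chars=$(cut -c1-3 "$sample")

# Field 6 (the home directory), using ":" as the delimiter.
homes=$(cut -d: -f6 "$sample")

# Fields 1 and 7 together; cut keeps the delimiter between them.
name_shell=$(cut -d: -f1,7 "$sample")

printf '%s\n' "$first_chars" "$homes" "$name_shell"
rm -f "$sample"
```

Storing each result in a variable is only for illustration; in day-to-day use the cut command is usually run directly or at the end of a pipe.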

awk - Text processors commands

  • awk is a utility/language designed for data extraction. Most of the time it is used to extract fields from a file or from an output.

  • check version : awk --version

  • print the 1st column (field) of each line in a file: awk '{print $1}' file

  • second column/field: awk '{print $2}' file

  • output specific columns: ls -l | awk '{print $1,$3}'

  • last field of the output: ls -l | awk '{print $NF}'

  • search for specific keyword: awk '/dev/ {print}' tx

  • output only the first field of each line, using : as the field separator: awk -F: '{print $1}' /etc/passwd

  • replace a field with another word: echo "Hello Tushar" | awk '{$2="Rajpoot"; print $0}'

    • $0 is the whole current line, so print $0 prints the line after the field has been changed.
  • replace the first field of every line of a file: cat tx | awk '{$1="Tushar"; print $0}'

  • get lines that are longer than 3 characters: awk 'length($0) > 3' tx

  • Get the line whose 9th field (the file name) is seinfeld, for example in /home/tushar: ls -l | awk '{if($9 == "seinfeld") print $0;}'

  • Number of fields in each line: ls -l | awk '{print NF}'
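To see the field handling above in one place, here is a small sketch that runs against a temporary file. The file contents (names, departments, numbers) are hypothetical and exist only for this example:

```shell
# Hypothetical sample: name, department, and a number, separated by spaces.
data=$(mktemp)
printf '%s\n' 'tushar dev 500' 'maya ops 700' 'li dev 650' > "$data"

# First field of every line.
names=$(awk '{print $1}' "$data")

# Last field of every line, whatever the field count is.
last=$(awk '{print $NF}' "$data")

# Only lines that contain "dev", printing fields 1 and 3.
devs=$(awk '/dev/ {print $1, $3}' "$data")

# Whole line ($0) where the second field is exactly "ops".
ops=$(awk '$2 == "ops" {print $0}' "$data")

# Replace field 2, then print the rebuilt line.
replaced=$(echo "Hello Tushar" | awk '{$2="Rajpoot"; print $0}')

printf '%s\n' "$names" "$last" "$devs" "$ops" "$replaced"
rm -f "$data"
```

The `$2 == "ops"` pattern is a shorter way of writing the `if(...)` form used in the list; both select lines by comparing one field.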

grep/egrep - Text processors commands

  • The grep command, whose name comes from "global regular expression print", processes text line by line and prints every line that matches a given pattern.

  • Check version or help: grep --version, grep --help

  • Search for a keyword from a file: grep keyword file

  • Search for a keyword and count: grep -c keyword file

  • Search for a keyword, ignoring case: grep -i KEYword file

  • Display the matched lines and their line numbers: grep -n keyword file

  • Display every line except those that contain the keyword (exclude a keyword): grep -v keyword file

  • Search for a keyword and then only give the 1st field: grep keyword file | awk '{print $1}'

  • Search for a keyword in the output of another command: ls -l | grep Desktop

  • Search for 2 keywords: egrep -i "keyword|keyword2" file (egrep is equivalent to grep -E, and recent GNU grep releases warn that egrep is obsolescent)
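The options above can be compared side by side on a tiny log file. This is a sketch under the assumption of a hypothetical four-line log created with mktemp; it uses grep -E in place of egrep, which behaves the same way:

```shell
# Hypothetical log lines, mixing upper and lower case on purpose.
log=$(mktemp)
printf '%s\n' 'Error: disk full' 'info: started' 'error: timeout' 'warning: low memory' > "$log"

# Count matching lines; the search is case-sensitive by default.
count=$(grep -c error "$log")

# Same count, ignoring case.
count_i=$(grep -c -i error "$log")

# Matching lines prefixed with their line numbers.
numbered=$(grep -n -i error "$log")

# Every line that does NOT match.
excluded=$(grep -v -i error "$log")

# Either of two keywords, using an extended regular expression.
either=$(grep -E -i 'error|warning' "$log")

printf '%s\n' "$count" "$count_i" "$numbered" "$excluded" "$either"
rm -f "$log"
```

Note that -c counts matching lines, not individual occurrences of the keyword.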

sort/uniq - Text processors commands

  • The sort command sorts lines, in alphabetical order by default.

  • The uniq command filters out repeated lines, but only when they are adjacent.

  • check version or help: sort --version or sort --help

  • sort file content in alphabetical order: sort file

  • sort file content in reverse order: sort -r file

  • sort by field number: sort -k2 file

  • remove adjacent duplicate lines: uniq file

  • sort command output: ls -l | sort

  • remove duplicates in command output: ls -l | uniq

  • Always run sort before uniq, because uniq only detects duplicates that sit next to each other: sort file | uniq

  • Sort first then uniq and list count: sort file | uniq -c

  • Only show repeated lines: sort file | uniq -d
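The reason for sorting before uniq is easiest to see on unsorted input. The sketch below assumes a hypothetical list of fruit names in a temporary file; the awk step at the end of the count pipeline only normalizes the leading spaces that uniq -c prints:

```shell
# Hypothetical unsorted list in which no two equal lines are adjacent.
list=$(mktemp)
printf '%s\n' 'pear' 'apple' 'pear' 'fig' 'apple' 'pear' > "$list"

# uniq alone removes nothing here, because duplicates are never neighbours.
plain_uniq=$(uniq "$list")

# Sorting first brings equal lines together, so uniq can collapse them.
sorted_uniq=$(sort "$list" | uniq)

# Count how often each line occurs; awk trims the padding uniq -c adds.
counts=$(sort "$list" | uniq -c | awk '{print $1, $2}')

# Only the lines that occur more than once.
repeated=$(sort "$list" | uniq -d)

printf '%s\n' "$plain_uniq" "$sorted_uniq" "$counts" "$repeated"
rm -f "$list"
```

Here plain_uniq still holds all six lines, while sorted_uniq holds three.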

wc - Text processors commands

  • The wc command reads either stdin or a list of files and prints the newline count, word count, and byte count.

  • check file line count, word count and byte count: wc file

  • Get the number of lines in a file: wc -l file

  • Number of words: wc -w file

  • Number of bytes: wc -c file

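  • NOT ALLOWED: wc DIRECTORY (wc counts file contents, so it cannot be pointed at a directory)

  • Number of files: ls | wc -l (ls -l | wc -l also counts the "total" line that ls -l prints first, so its result is one too high)

  • Count directories only: ls -ld */ | wc -l

  • Number of lines that contain a keyword: grep keyword file | wc -l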
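The counts above can be checked on a file whose contents are known in advance. This is a minimal sketch, assuming a hypothetical three-line file; it reads through stdin so that wc prints only the number, and it strips the padding that some wc implementations add:

```shell
# Hypothetical file: 3 lines, 6 words, 28 bytes including newlines.
notes=$(mktemp)
printf '%s\n' 'one two three' 'four five' 'six' > "$notes"

# Reading from stdin makes wc print the number without a file name.
lines=$(wc -l < "$notes" | tr -d ' ')
words=$(wc -w < "$notes" | tr -d ' ')
bytes=$(wc -c < "$notes" | tr -d ' ')

# Count the lines that contain a keyword by piping grep into wc.
matches=$(grep five "$notes" | wc -l | tr -d ' ')

printf '%s\n' "$lines" "$words" "$bytes" "$matches"
rm -f "$notes"
```

When the goal is only to count matching lines, grep -c keyword file gives the same number without the extra pipe.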

