Files

Things and Stuff Wiki - An organically evolving personal wiki knowledge base. An on-the-fly taxonomy containing a patchwork trail of topic outlines, descriptions, notes, stubs and breadcrumbs, with links to sites, systems, software, manuals, organisations, people, articles, guides, slides, papers, books, comments, videos, screencasts, webcasts, scratchpads and more. Content is orientated towards mostly free/libre/open, mostly Linux. Quality and age varies drastically. Sometimes old things are first, sometimes last. Use the Table of Contents menu to navigate long pages. Zoom in if text is too small. Dead link? Wayback Machine. I probably need to fix the theme CSS after an update. See also libreav.org. Chat to msg me (not checking tho atm). e

General

https://en.wikipedia.org/wiki/File_format - a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.Some file formats are designed for very particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats, however, are designed for storage of several different types of data: the Ogg format can act as a container for different types of multimedia including any combination of audio and video, with or without text (such as subtitles), and metadata. A text file can contain any stream of characters, including possible control characters, and is encoded in one of various character encoding schemes. Some file formats, such as HTML, scalable vector graphics, and the source code of computer software are text files with defined syntaxes that allow them to be used for specific purposes.

https://en.wikipedia.org/wiki/Digital_container_format - or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file. Among the earliest cross-platform container formats were Distinguished Encoding Rules and the 1985 Interchange File Format. Containers are frequently used in multimedia applications.

Best File Formats for Archiving - This guide compares common file formats for the purpose of digital archiving and preservation. It also discusses how to choose a resolution for images, and how to choose a sampling rate and a bit rate for MP3 audio files. Disclaimer: All of the below is provided as my personal opinion only, without guarantee for completeness or correctness. The current version of this text is 2018-01-13.

http://fileformats.archiveteam.org [1]

Turing complete formats - Snow B. Petrel, 2014

Types

https://en.wikipedia.org/wiki/List_of_file_formats

https://en.wikipedia.org/wiki/Filename_extension - or file extension is a suffix to the name of a computer file (for example, .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems it is separated with spaces. Other extension formats include dashes and/or underscores on early versions of Linux and some versions of IBM AIX. Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction.

https://en.wikipedia.org/wiki/List_of_filename_extensions

https://en.wikipedia.org/wiki/Media_type - formerly known as MIME type is a two-part identifier for file formats and format contents transmitted on the Internet. The Internet Assigned Numbers Authority (IANA) is the official authority for the standardization and publication of these classifications. Media types were originally defined in Request for Comments 2045 in November 1996 as a part of MIME (Multipurpose Internet Mail Extensions) specification, for denoting type of email message content and attachments; hence the original name, MIME type. Media types are also used by other internet protocols such as HTTP and document file formats such as HTML, for similar purpose.

https://wiki.archlinux.org/index.php/Default_Applications - Programs implement default application associations in different ways, while command-line programs traditionally use environment variables, graphical applications tend to use XDG MIME Applications through either the GIO API, the Qt API or by executing /usr/bin/xdg-open, which is part of xdg-utils. Because xdg-open and XDG MIME Applications are quite complex various alternative resource openers were developed. These alternatives replace /usr/bin/xdg-open, which obviously only affects applications that actually use the executable. The following table lists some example applications for each method.

https://en.wikipedia.org/wiki/File_association - associates a file with an application capable of opening that file. More commonly, a file association associates a class of files (usually determined by their filename extension, such as .txt) with a corresponding application (such as a text editor).

Association between MIME types and applications - freedesktop.org specs
- shared-mime-info-spec

https://github.com/mime-types/mime-types-data - provides a registry for information about MIME media type definitions. It can be used with the Ruby mime-types library or other software to determine defined filename extensions for MIME types, or to use filename extensions to look up the likely MIME type definitions.

desktop-entry-spec - The desktop entry specification describes desktop entries: files describing information about an application such as the name, icon, and description. These files are used for application launchers and for creating menus of applications that can be launched.

Configuration files:

/usr/share/applications/defaults.list
  # global
~/.local/share/applications/defaults.list
  # per user, overrides global

~/.local/share/applications/mimeapps.list

~/.local/share/applications/mimeinfo.cache

[Default Applications]
mimetype=desktopfile1;desktopfile2;...;desktopfileN

xdg-mime default firefox.desktop x-scheme-handler/https x-scheme-handler/http
  # add two lines to ~/.config/mimeapps.list

https://github.com/adrg/xdg - Go implementation of the XDG Base Directory Specification and XDG user directories

CLI

XdgUtils
- https://wiki.archlinux.org/index.php/Xdg-open

xdg-mime default Thunar.desktop inode/directory
  # make Thunar the default file-browser

xdg-mime default xpdf.desktop application/pdf
  # make xpdf the default PDF viewer


xdg-settings get default-web-browser
  # returns what app opens http/https

https://superuser.com/questions/688063/is-there-a-way-to-redirect-certain-urls-to-specific-web-browsers-in-linux

https://metacpan.org/release/File-MimeInfo

/usr/bin/vendor_perl/mimeopen -d $file.pdf

xdg-settings - get and set various desktop environment settings

Replacing the File Manager in Zsh

mimeo - uses MIME-type file associations to determine which application should be used to open a file. It can launch files or print information such as the command that it would use, the detected MIME-type, etc. It is also possible to use regular expressions to associate arguments with applications. The most common example is to open URLs in browsers or associate file extensions with applications irrespective of their MIME-type.Mimeo tries to adhere to the relevant standards on freedesktop.org and should therefore be compatible with other applications that set or read MIME-type associations, e.g. PCManFM.

https://github.com/AndyCrowd/fbrokendesktop - scan and check desktop files for broken exec lines in .desktop tiles

GUI

http://docs.xfce.org/xfce/xfce4-settings/mime - gui

File Associations

 kcmshell5 filetypes

Metadata

https://en.wikipedia.org/wiki/Extensible_Metadata_Platform

Container

https://en.wikipedia.org/wiki/Digital_container_format - or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file. Among the earliest cross-platform container formats were Distinguished Encoding Rules and the 1985 Interchange File Format. Containers are frequently used in multimedia applications.

IFF / RIFF

A Quick Introduction to IFF - AmigaOS Documentation Wiki

https://wiki.multimedia.cx/index.php/IFF

https://github.com/svanderburg/libiff - Portable, extensible parser for the Interchange File Format (IFF)

Other

Kaitai Struct - declarative binary format parsing language, a declarative language used to describe various binary data structures, laid out in files or in memory: i.e. binary file formats, network stream packet formats, etc.The main idea is that a particular format is described in Kaitai Struct language (.ksy file) and then can be compiled with ksc into source files in one of the supported programming languages. These modules will include a generated code for a parser that can read the described data structure from a file or stream and give access to it in a nice, easy-to-comprehend API.

shell - How do I remove the first 300 million lines from a 700 GB txt file on a system with 1 TB max disk space? - Unix & Linux Stack Exchange -

Files & directories

https://en.wikipedia.org/wiki/Computer_file

https://en.wikipedia.org/wiki/Directory_(computing) - a file system cataloging structure which contains references to other computer files, and possibly other directories. On many computers, directories are known as folders, or drawers to provide some relevancy to a workbench or the traditional office file cabinet. Files are organized by storing related files in the same directory. In a hierarchical filesystem (that is, one in which files and directories are organized in a manner that resembles a tree), a directory contained inside another directory is called a subdirectory. The terms parent and child are often used to describe the relationship between a subdirectory and the directory in which it is cataloged, the latter being the parent. The top-most directory in such a filesystem, which does not have a parent of its own, is called the root directory.

https://en.wikipedia.org/wiki/Unix_file_types - For normal files in the file system, Unix does not impose or provide any internal file structure. This implies that from the point of view of the operating system, there is only one file type. The structure and interpretation thereof is entirely dependent on how the file is interpreted by software. Unix does however have some special files. These special files can be identified by the ls -l command which displays the type of the file in the first alphabetic letter of the file system permissions field. A normal (regular) file is indicated by a hyphen-minus '-'.

https://en.wikipedia.org/wiki/File_descriptor - In the traditional implementation of Unix, file descriptors index into a per-process file descriptor table maintained by the kernel, that in turn indexes into a system-wide table of files opened by all processes, called the file table. This table records the mode with which the file (or other resource) has been opened: for reading, writing, appending, reading and writing, and possibly other modes. It also indexes into a third table called the inode table that describes the actual underlying files. To perform input or output, the process passes the file descriptor to the kernel through a system call, and the kernel will access the file on behalf of the process. The process does not have direct access to the file or inode tables.

On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier. In Unix-like systems, file descriptors can refer to any Unix file type named in a file system. As well as regular files, this includes directories, block and character devices (also called "special files"), Unix domain sockets, and named pipes. File descriptors can also refer to other objects that do not normally exist in the file system, such as anonymous pipes and network sockets.

The FILE data structure in the C standard I/O library usually includes a low level file descriptor for the object in question on Unix-like systems. The overall data structure provides additional abstraction and is instead known as a file handle.

A history of S_IFMT [2]

http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

http://xahlee.info/UnixResource_dir/writ/unix_origin_of_dot_filename.html

YouTube: USENIX ATC '20 - Can Applications Recover from fsync Failures?

Directory structure

See LSB, etc.

https://en.wikipedia.org/wiki/Unix_filesystem

https://en.wikipedia.org/wiki/inode

https://en.wikipedia.org/wiki/stat_(system_call) - a Unix system call that returns file attributes about an inode. The semantics of stat() vary between operating systems. stat appeared in Version 1 Unix. It is among the few original Unix system calls to change, with Version 4's addition of group permissions and larger file size. As an example, Unix command ls uses this system call to retrieve information on files that includes:

atime: time of last access (ls -lu)
mtime: time of last modification (ls -l)
ctime: time of last status change (ls -lc)

https://en.wikipedia.org/wiki/Virtual_file_system

https://en.wikipedia.org/wiki/Directory_(computing)

https://en.wikipedia.org/wiki/Directory_structure

https://en.wikipedia.org/wiki/Root_directory

http://man7.org/linux/man-pages/man7/file-hierarchy.7.html

https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard - defines the directory structure and directory contents in Unix[citation needed] and Unix-like operating systems. It is maintained by the Linux Foundation. The latest version is 3.0, released on 3 June 2015.[1] Currently it is only used by Linux distributions.

http://refspecs.linuxbase.org/FHS_3.0/fhs/index.html

http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html

Linux Directory Structure (File System Structure) Explained with Examples

https://wiki.archlinux.org/index.php/arch_filesystem_hierarchy

http://hivelogic.com/articles/using_usr_local/
Point/Counterpoint - /opt vs. /usr/local - March 2010
Understanding the bin, sbin, usr/bin , usr/sbin split [3] - Dec 9, 2010
-arch-dev-public- -RFC- merge /bin, /sbin, /lib into /usr/bin and /usr/lib - Mar 2nd, 2012

on ., .., .dotfiles - Rob Pike, Aug 3rd, 2012

https://github.com/fuchsia-mirror/docs/blob/master/the-book/../../master/the-book/dotdot.md [4]

Understanding the bin, sbin, usr/bin , usr/sbin split [5]

rmshit - Keep $HOME or other dir clean from unwanted tempfiles, configs and other crap you'll never use that's autocreated upon execution of bad behaving applications

BleachBit quickly frees disk space and tirelessly guards your privacy. Free cache, delete cookies, clear Internet history, shred temporary files, delete logs, and discard junk you didn't know was there. Designed for Linux and Windows systems, it wipes clean a thousand applications including Firefox, Internet Explorer, Adobe Flash, Google Chrome, Opera, Safari,and more. Beyond simply deleting files, BleachBit includes advanced features such as shredding files to prevent recovery, wiping free disk space to hide traces of files deleted by other applications, and vacuuming Firefox to make it faster.
- https://github.com/bleachbit/bleachbit

 bleachbit --clean system.cache system.localizations system.trash system.tmp

http://www.2uo.de/myths-about-urandom/ [6]

File permissions

https://wiki.archlinux.org/index.php/File_permissions_and_attributes

http://serverfault.com/questions/124800/how-to-setup-linux-permissions-for-the-www-folder

Unix Permissions Calculator

stat -c '%A %a %n' *
  list permissions in octal

ulimit

ulimit - provides control over the resources available to the shell and to processes started by it, on systems that allow such control.

https://www.networkworld.com/article/2693414/operating-systems/setting-limits-with-ulimit.html

limits.conf - configuration file for the pam_limits module, /etc/security/limits.conf

chmod

chmod
  # change file mode bits

Chmod, Umask, Stat, Fileperms, and File Permissions - 0000 to 0777 list, etc.

 # Before: drwxr-xr-x 6 archie users  4096 Jul  5 17:37 Documents
chmod g= Documents
chmod o= Documents
  # After: drwx------ 6 archie users  4096 Jul  6 17:32 Documents

Users;

u - the user who owns the file.
g - other users who are in the file's group.
o - all other users.
a - all users; the same as ugo.

Operation;

+ - to add the permissions to whatever permissions the users already have for the file.
- - to remove the permissions from whatever permissions the users already have for the file.
= - to make the permissions the only permissions that the users have for the file.

Permissions; r - the permission the users have to read the file. w - the permission the users have to write to the file. x - the permission the users have to execute the file, or search it if it is a directory.

http://serverfault.com/questions/364677/why-is-chmod-r-777-destructive [7]

chown

https://linux.die.net/man/1/chown - change file owner and group

chown -R user:group .
  change all files and directories to a specific use and group [8]

Access Control Lists

https://savannah.nongnu.org/projects/acl

https://wiki.archlinux.org/index.php/ACL
HOWTO: Access Control Lists (ACLs) in Linux - basics video

Partition must be mounted with acl option. "Operation not supported" from setfacl means this is not so.

mount -o remount,acl /

Set in /etc/fstab for persistence across boots.

Opening a directory requires execute permission.

setfacl -m "u:username:permissions"
setfacl -m "u:uid:permissions"
  add permissions for user

setfacl -m "g:groupname:permissions"
setfacl -m "g:gid:permissions"
  add permissions for group

setfacl -m "u:user:rwx" file
  add read, write, execure perms for user for file
setfacl -Rm "u:user:rw" /dir
  add recursive read, write perms for user for dir
setfacl -Rdm "u:user:rw" /dir
  add recursive read, write perms for user for dir and make them default for future changes

getfacl ./
  shows access control list for directory or file

Copying

cp

cp - copy files and directories

https://www.gnu.org/software/coreutils/manual/html_node/cp-invocation.html#cp-invocation

cp [option]… [-T] source dest

cp target --parents a/b/c existing_dir
  # copies the file `a/b/c' to `existing_dir/a/b/c', creating any missing intermediate directories. [10]

cp ubuntu.img /dev/sdb && sync

My experience with using cp to copy a lot of files (432 millions, 39 TB) [11]

scp

scp -P 2264 foobar.txt your_username@remotehost.edu:/some/remote/directory
scp -rP 2264 folder your_username@remotehost.edu:/some/remote/directory

dcp is a distributed file copy program that automatically distributes and dynamically balances work equally across nodes in a large distributed system without centralized state. http://filecopy.org/

rsync

rsync(1) - a fast, versatile, remote (and local) file-copying tool.

rsync is a software application and network protocol for Unix-like systems with ports to Windows that synchronizes files and directories from one location to another while minimizing data transfer by using delta encoding when appropriate. Quoting the official website: "rsync is a file transfer program for Unix systems. rsync uses the 'rsync algorithm' which provides a very fast method for bringing remote files into sync." An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.
- https://en.wikipedia.org/wiki/rsync

https://wiki.archlinux.org/index.php/rsync

How Rsync Works: A Practical Overview [12]

Easy Automated Snapshot-Style Backups with Rsync - This document describes a method for generating automatic rotating "snapshot"-style backups on a Unix-based system, with specific examples drawn from the author's GNU/Linux experience. Snapshot backups are a feature of some high-end industrial file servers; they create the illusion of multiple, full backups per day without the space or processing overhead. All of the snapshots are read-only, and are accessible directly by users as special system directories. It is often possible to store several hours, days, and even weeks' worth of snapshots with slightly more than 2x storage. This method, while not as space-efficient as some of the proprietary technologies (which, using special copy-on-write filesystems, can operate on slightly more than 1x storage), makes use of only standard file utilities and the common rsync program, which is installed by default on most Linux distributions. Properly configured, the method can also protect against hard disk failure, root compromises, or even back up a network of heterogeneous desktops automatically.

https://gergap.wordpress.com/2013/08/10/rsync-and-sparse-files

"Unfortunately “--sparse” and “--inplace” cannot be used together. Solution: When copying the file the first time, which means it does not exist on the target server use “rsync --sparse“. This will create a sparse file on the target server and copies only the used data of the sparse file. When the file already exists on the target server and you only want to update it use “rsync --inplace“. This will only transmit the changed blocks and can also append to the existing sparse file."

http://unix.stackexchange.com/questions/43777/what-keeps-one-side-of-an-rsync-so-busy

#rsync FAQ

http://www.jwz.org/doc/backups.html

rsync [OPTION...] SRC... [DEST]
  # copy source file or directory to destination

rsync local-file user@remote-host:remote-file

rsync -e='ssh -p8023' file remotehost:~/
  # non-standard remote shell command, copy to remote users home directory

rsync -r --partial --progress srcdirectory destdirectory
  # recursive
  # --partial - resume partial files,
  # --progress - show progress during transfer, equiv to --info=flist2,name,progress
  # -P                          same as --partial --progress

rsync -rP srcdirectory/ destdirectory
  # recursive, resume partial files with progress bar, don't copy root source folder


 rsync -ahrH --info=progress2 --no-i-r --partial --partial-dir=.rsync-partial

-a, --archive
  #  archive mode; equals -rlptgoD (no -A,-X,-H)
    # recursive, links, preserve permissions/times/groups/owner/device files.

-r, --recursive             recurse into directories
-l, --links                 copy symlinks as symlinks
-p, --perms                 preserve permissions
-t, --times                 preserve modification times
-g, --group                 preserve group
-o, --owner                 preserve owner (super-user only)
-D  --devices --specials.
    --devices               preserve device files (super-user only)
    --specials              preserve special files
 
-h, --human-readable        output numbers in a human-readable format

-v, --verbose               list files transfered
-vv

    --info=progress2       displays where in the list of files to be transfered rsync is
    --no-i-r               disable incremental recursion so rsync create list of all files
    --partial              resume partially transfered files
    --partial-dir=.tmpdir  use a dir to separate partially transfered files

    --inplace               This option is useful for transferring large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. It can also help keep a copy-on-write filesystem snapshot from diverging the entire contents of a file that only has minor changes.
    --no-whole-file         incremental delta-xfer

-A, --acls                  preserve ACLs (implies -p, save permissions)
-X, --xattrs                preserve extended attributes
-H, --hard-links            preserve hard links

    --exclude-from=FILE     read exclude patterns from FILE
    --sparse                handle sparse files efficiently
-W, --whole-file            copy files whole (w/o delta-xfer algorithm)
-d, --dirs                  transfer directories without recursing
    --numeric-ids           transfer numeric group and user IDs rather than mapping user and group name
-x, --one-file-system       don't cross filesystem boundaries (mount points). "/*" as dest would bypass -x
-xx                         as above, but don't create empty directories for mount points

    --delete                delete extraneous files from dest dirs
    --delete-after          receiver deletes after transfer, not during
    --delete-excluded       also delete excluded files from dest dirs

    --ignore-errors         go ahead even when there are IO errors instead of regarding as fatal
    --stats                 give some file-transfer stats
-z                          compress files during transfer

https://wiki.archlinux.org/index.php/Full_System_Backup_with_rsync

rsync -aAXv --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} /* /path/to/backup/directory
  # archive, with permissions/ACL, and attributes, exclude directories

rsync -ahrHP --info=progress2 --no-i-r --partial-dir=.rsync-partial
  # archive, human readable, recursive, don't flatten hard links, progress bar, no incremental recursion, store partial transfers in temp dir

https://serverfault.com/questions/219013/showing-total-progress-in-rsync-is-it-possible

#!/bin/sh
if [ $# -lt 1 ]; then 
    echo "No destination defined. Usage: $0 destination" >&2
    exit 1
elif [ $# -gt 1 ]; then
    echo "Too many arguments. Usage: $0 destination" >&2
    exit 1
fi

if [ $2 = "new" ]; then
   SPARCEINPLACE="--sparse"
elif [ $2 = "rerun" ]; then
   SPARCEINPLACE="--inplace"
fi

RSYNC="ionice -c3 rsync"
RSYNC_ARGS="-aAXv --no-whole-file --stats --ignore-errors --human-readable --delete"
# -a = --recursive, --links, --perms, --times, --group, --owner, --devices, --specials
# -A = --acls, -X = --xattrs, -v = --verbose
# --no-whole-file = delta-xfer algorithm
# --delete = delete extraneous destination files
START=$(date +%s)
# SOURCES="/dir1 /dir2 /file3"
TARGET=$1

echo "Executing dry-run to see how many files must be transferred..."
TODO=$(${RSYNC} --dry-run ${RSYNC_ARGS} ${SOURCES} ${TARGET}|grep "^Number of files transferred"|awk '{print $5}')

${RSYNC} ${RSYNC_ARGS} ${SPARCEINPLACE} --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} \
--exclude={/home/*/.gvfs,/home/*/.thumbnails,/home/*/.cache,/home/*/.cache/mozilla/*,/home/*/.cache/chromium/*} \
--exclude={/home/*/.local/share/Trash/,/home/*/.macromedia,/var/lib/mpd/*,/var/lib/pacman/sync/*, /var/tmp*} \
 / $TARGET | pv -l -e -p -s "$TODO"

FINISH=$(date +%s)
echo "-- Total backup time: $(( ($FINISH-$START) / 60 ))m, $(( ($FINISH-$START) % 60 ))s"
touch $1/backup-from-$(date '+%a_%d_%m_%Y_%H%M%S')
# after refining excluded, --delete-excluded

0 5 * * * rsync-backup.sh /path/to/backup/directory rerun

https://github.com/kristapsdz/openrsync - a clean-room implementation of rsync with a BSD (ISC) license. It's compatible with a modern rsync (3.1.3 is used for testing, but any supporting protocol 27 will do), but accepts only a subset of rsync's command-line arguments. At this time, openrsync runs only on OpenBSD. If you want to port to your system (e.g. Linux, FreeBSD), read the Portability section first. [13]

Daemon

rsync --daemon
  # run as a daemon

https://download.samba.org/pub/rsync/rsyncd.conf.html

http://www.jveweb.net/en/archives/2011/01/running-rsync-as-a-daemon.html

http://unix.stackexchange.com/questions/26182/what-is-the-need-for-rsync-server-in-daemon-mode

lsyncd

https://code.google.com/p/lsyncd/

Services

http://www.rsync.net
- http://www.rsync.net/products/zfsintro.html

dd

dd - Copy a file, converting and formatting according to the options.
- dd is a common Unix program whose primary purpose is the low-level copying and conversion of raw data.

http://www.vidarholen.net/contents/blog/?p=479

http://wiki.linuxquestions.org/wiki/Dd

https://eklitzke.org/the-cult-of-dd [14]

if stands for input file and of for output file.

dd if=/dev/sda of=/mnt/sdb1/backup.img
  create a backup

dd if=/mnt/sdb1/backup.img of=/dev/sda
  restore a backup

dd if=/dev/sdb of=/dev/sdc
  clone a drive

dd if=/dev/sdb | ssh root@target "(cat >backup.img)"
  backup over network

dd if=/dev/cdrom of=cdimage.iso
  backup a cd

dd if=/dev/sr0 of=myCD.iso bs=2048 conv=noerror,sync
  create an ISO disk image from a CD-ROM.

dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=noerror
  Clone one partition to another

dd if=/dev/ad0 of=/dev/ad1 bs=1M conv=noerror
  Clone a hard disk "ad0" to "ad1".

dd if=/dev/zero bs=1024 count=1000000 of=file_1GB

dd if=file_1GB of=/dev/null bs=64k
  drive benchmark test and analyze the sequential read and write performance for 1024 byte blocks

http://allgood38.github.io/a-snapshot-of-your-computer-with-dd-pv-and-gzip-part-1.html

dd if=local-file | ssh <host> "dd of=/path/to/remote-file"

hot-clone

Imaging mounted disk volumes under duress
- https://github.com/benjojo/hot-clone

tar pipe

The Tar Pipe - It basically means "copy the src directory to dst, preserving permissions and other special stuff." It does this by firing up two tars – one tarring up src, the other untarring to dst, and wiring them together. [15]

File information

ls

ls
  list files in current directory
ls -l
  long list, each file on a new line with information

ls *
  files in directory and immediate subdiretories

ls -a
  # show hidden files
ls  -A
  # show hidden files, exclude . and ..

ls -m1
  -m fill width with a comma separated list of entries ??
ls --format single-column
  column of names only
ls -l | grep - | awk '{print $9}'
  using awk to show the 9th word (name). strips colour.
ls -l | cut -f9 -s -d" "
  using cut to cut from the 9th word, using space as a delimiter. strips colour.

ls | cat
  # separate words into new lines

Greg's Wiki: ParsingLs - Why you shouldn't parse the output of ls(1)

https://github.com/trapd00r/ls--

https://github.com/hackerb9/lsix - Like "ls", but for images. Shows thumbnails in terminal using sixel graphics.

exa

https://github.com/ogham/exa

even-better-ls

https://github.com/mnurzia/even-better-ls - LS + Icons + Formatting

stat

stat .
  display file or file system status
stat -c "%n %a" * | column -t
  directory files + octal

chattr/lsattr

https://en.wikipedia.org/wiki/Chattr

lsattr .

append only (a), compressed (c), no dump (d), immutable (i), data journaling (j), secure deletion (s), no tail-merging (t), undeletable (u), no atime updates (A), synchronous directory updates (D), synchronous updates (S), and top of directory hierarchy (T).

file

file [filename]

binwalk

https://github.com/ReFirmLabs/binwalk - a fast, easy to use tool for analyzing, reverse engineering, and extracting firmware images.

Directory navigation

pwd

cd

cd change/directory/path

http://www.thegeekstuff.com/2008/10/6-awesome-linux-cd-command-hacks-productivity-tip3-for-geeks/

https://github.com/spencertipping/cd

Autojump

Fast Navigation in Terminal: Coming Full Circle - [16]

z

https://github.com/rupa/z - jump around, Bash. [17]

https://github.com/agkozak/zsh-z - Jump quickly to directories that you have visted "frecently." A native ZSH port of z.sh.

fasd

https://github.com/clvv/fasd - Command-line productivity booster, offers quick access to files and directories, inspired by autojump, z and v. Bash. [18]

v def conf       =>     vim /some/awkward/path/to/type/default.conf
j abc            =>     cd /hell/of/a/awkward/path/to/get/to/abcdef
m movie          =>     mplayer /whatever/whatever/whatever/awesome_movie.mp4
o eng paper      =>     xdg-open /you/dont/remember/where/english_paper.pdf
vim `f rc lo`    =>     vim /etc/rc.local
vim `f rc conf`  =>     vim /etc/rc.conf

alias defaults;

alias a='fasd -a'        # any
alias s='fasd -si'       # show / search / select
alias d='fasd -d'        # directory
alias f='fasd -f'        # file
alias sd='fasd -sid'     # interactive directory selection
alias sf='fasd -sif'     # interactive file selection
alias z='fasd_cd -d'     # cd, same functionality as j in autojump
alias zz='fasd_cd -d -i' # cd with interactive selection

autojump

https://github.com/joelthelion/autojump - very handy for navigation, Python.

pazi

https://github.com/euank/pazi - An autojump "zap to directory" helper, Rust.

zoxide

https://github.com/ajeetdsouza/zoxide - A fast cd command that learns your habits

z.lua

https://github.com/skywind3000/z.lua - zap A new cd command that helps you navigate faster by learning your habits.

zfm

https://github.com/pabloariasal/zfm - a minimal command line bookmark manager for zsh built on top of fzf. It lets you bookmark files and directories in your system and rapidly access them.It's intended to be a less intrusive alternative to z, autojump or fasd that doesn't pollute your prompt command or create bookmarks behind the scenes: you have full control over what gets bookmarked and when, like bookmarks on a web browser.

zz

zz - a smart and efficient directory changer [19]

cx

https://github.com/colfrog/cx - a very simple directory history utility written in C, a shell-agnostic clone of z [20]

to sort

http://jeroenjanssens.com/2013/08/16/quickly-navigate-your-filesystem-from-the-command-line.html [21]

https://aur.archlinux.org/packages/z.go-git - baskerville

https://github.com/skywind3000/z.lua - A new cd command that helps you navigate faster by learning your habits zap

https://github.com/huyng/bashmarks

https://github.com/vigneshwaranr/bd [22]

http://bsago.me/exa [23]

https://github.com/jamesob/desk - Lightweight workspace manager for the shell. Desk makes it easy to flip back and forth between different project contexts in your favorite shell. [24]

http://detox.sourceforge.net

https://github.com/Peltoche/lsd - next gen ls command

https://gitlab.com/chanceboudreaux/rx - Open files and surf through directories with ease.

Creating files

touch

touch filename
  create a file or update timestamp

mkdir

mkdir directory
mkdir directory -p
  no error if existing, make parent directories as needed

ln

symlink

ln -s {target-filename}
ln -s {target-filename} {symbolic-filename}
  create soft link

fallocate

fallocate - used to manipulate the allocated disk space for a file, either to deallocate or preallocate it. For filesystems which support the fallocate system call, preallocation is done quickly by allocating blocks and marking them as uninitialized, requiring no IO to the data blocks. This is much faster than creating a file by filling it with zeroes. The exit code returned by fallocate is 0 on success and 1 on failure.

Test files

yes abcdefghijklmnopqrstuvwxyz0123456789 > largefile
  # about 10 times faster than running dd if=/dev/urandom of=largefile, about as fast as using dd if=/dev/zero of=filename bs=1M. [25]

Editing files

http://www.gnu.org/software/ed/ed.html

See Vim, Emacs

echo "hello" >> greetings.txt

cat temp.txt >> data.txt
  To append the contents of the file temp.txt to file data.txt

date >> dates.txt
  To append the current date/time timestamp to the file dates.txt

Patching files

patch(1) - apply a diff file to an original

patch runs ed, and ed can run anything [26]

https://github.com/loyso/Scarab - A system to patch your content files. Basically, Scarab is a C++ library to build and apply binary diff package for directories. Scarab is built to be embeddable into your application or service.

Viewing files

<filename

cat

http://harmful.cat-v.org/cat-v/

cat filename
  output file to screen
cat -n filename
  output file to screen w/ line numbers
cat filename1 filename2
  output two files (concatinate)
cat filename1 > filename2
  overwrite filename2 with filename1
cat filename1 >> filename2
  append filename1 to filename2
cat filename{1,2} > filename2
  add filename1 and filename2 together into filename3

http://en.digipedia.org/man/doc/view/dog.1/

head

head filename
  top 10 lines of file
head -23 filename
  top 23 lines of file

tail

tail filename
  bottom 10 lines of file
tail -23 filename
  bottom 23 lines of file

Other

sed -n 20,30p filename
  print lines 20..30 of file [27]

bat

https://github.com/sharkdp/bat - A cat(1) clone with syntax highlighting and Git integration.

look

look(1) - displays any lines in file which contain string as a prefix. As look performs a binary search, the lines in file must be sorted. If file is not specified, the file /usr/share/dict/words is used, only alphanumeric characters are compared and the case of alphabetic charac- ters is ignored.

twf

https://github.com/wvanlint/twf - a standalone tree view explorer inspired by fzf.

File pagers

more

more is a filter for paging through text one screenful at a time. This version is especially primitive. Users should realize that less(1) provides more(1) emulation plus extensive enhancements.

less

less is an improvement on more and a funny name.

http://garrett.damore.org/2014/09/modernizing-less.html [28]

lesspipe.sh - a preprocessor for less

Moving files

mv

mv position1 ~/position2
  basic move

https://github.com/jarun/advcpmv - A patch for GNU Core Utilities cp, mv to add progress bars

Rename files in linux / bash using mv command without typing the full name two times - [30]

renameutils

renameutils - a set of programs designed to make renaming of files faster and less cumbersome. The file renaming utilities consists of five programs - qmv, qcp, imv, icp and deurlname.

The qmv ("quick move") program allows file names to be edited in a text editor. The names of all files in a directory are written to a text file, which is then edited by the user. The text file is read and parsed, and the changes are applied to the files.

The qcp ("quick cp") program works like qmv, but copies files instead of moving them.

The imv ("interactive move") program, is trivial but useful when you are too lazy to type (or even complete) the name of the file to rename twice. It allows a file name to be edited in the terminal using the GNU Readline library. icp copies files.

The deurlname program removes URL encoded characters (such as %20 representing space) from file names. Some programs such as w3m tend to keep those characters encoded in saved files.

Métamorphose

Métamorphose - a batch renamer, a program to rename large sets of files and folders quickly and easily.With its extensive feature set, flexibility and powerful interface, Métamorphose is a profesional's tool. A must-have for those that need to rename many files and/or folders on a regular basis.In addition to general usage renaming, it is very useful for photo and music collections, webmasters, programmers, legal and clerical, etc.

nmly

https://github.com/Usbac/nmly - Mass file rename utility with useful functions and written in C

Detox

Detox - a utility designed to clean up filenames. It replaces difficult to work with characters, such as spaces, with standard equivalents. It will also clean up filenames with UTF-8 or Latin-1 (or CP-1252) characters in them.
- https://github.com/dharple/detox

Removing files

rm

rm file

rm -rf directory

find * -maxdepth 0 -name 'keepthis' -prune -o -exec rm -rf '{}' ';'
  # remove all but keepthis [31]

fd somefiletype | xargs rm
  # remove all files returned by fd

scrub

https://github.com/romanhargrave/scrub - a utility that behaves like rm -r, except that rather than simply removing everything it will only remove what you tell it to.

Managing files

See File managers

Finding files

See Regex, Search

ls | sort -f | uniq -i -d
  # list duplicate files taking upper/lower case into account [32]

whereis

GNU findutils

http://www.gnu.org/software/findutils/

https://en.wikipedia.org/wiki/GNU_Find_Utilities

GNU Find Utilities are the basic directory searching utilities of the GNU operating system. These programs are typically used in conjunction with other programs to provide modular and powerful directory search and file locating capabilities to other commands.

The tools supplied with this package are:

find - search for files in a directory hierarchy
locate - list files in databases that match a pattern
updatedb - update a file name database
xargs - build and execute command lines from standard input

find

find /usr/share -name README
find ~/Journalism -name '*.txt'
find ~/Programming -path '*/src/*.c'

find ~/Journalism -name '*.txt' -exec cat {} ;
  exec command on result path (aliases don't work in exec argument)

find ~/Images/Screenshots -size +500k -iname '*.jpg'
find ~/Journalism -name '*.txt' -print0 | xargs -0 cat   (faster than above)

find / -group [group]
find / -user [user]

find . -mtime -[n]
  File's data was last modified n*24 hours ago

find . -mtime +5 -exec rm {} \;
  remove files older than 5 days

find . -type f -links +1
  list hard links

https://www.eriwen.com/productivity/find-is-a-beautiful-tool [33]

-9fans- find(1) revisited.

https://github.com/jaimebuelta/ffind
- http://wrongsideofmemphis.com/2012/11/19/ffind-a-sane-replacement-for-command-line-file-search/

xargs

http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
- https://en.wikipedia.org/wiki/Xargs

xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

Because Unix filenames can contain blanks and newlines, this default behaviour is often problematic; filenames containing blanks and/or newlines are in‐correctly processed by xargs. In these situations it is better to use the -0 option, which prevents such problems. When using this option you will need to ensure that the program which produces the input for xargs also uses a null character as a separator. If that program is GNU find for example, the -print0 option does this for you.

If any invocation of the command exits with a status of 255, xargs will stop immediately without reading any further input. An error message is issued on stderr when this happens.

Linux and Unix xargs command tutorial with examples

echo 'one two three' | xargs mkdir
ls
$ one two three

echo 'one two three' | xargs -t rm
$ rm one two three

find . -name '*.py' | xargs wc -l
  # Recursively find all Python files and count the number of lines

find . -name '*~' | xargs -0 rm
  # Recursively find all Emacs backup files and remove them, alloging for filenames with whitespace

find . -name '*.py' | xargs grep 'import'
  # Recursively find all Python files and search them for the word ‘import’

find . -type f -print0 | xargs -0 stat -c "%y %s %n"
  # prints permissions in octal (0775, etc.)

http://www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument-lists-utility/

http://offbytwo.com/2011/06/26/things-you-didnt-know-about-xargs.html

find . -name "*.ext" -print0 | xargs -n 1000 -I '{}' mv '{}' ../..
  # for a number of files to high for the mv command (move to two directories up)

bfs

https://github.com/tavianator/bfs - a variant of the UNIX find command that operates breadth-first rather than depth-first. It is otherwise intended to be compatible with many versions of find, including POSIX find, GNU find, {Free,Open,Net}BSD find, macOS find

fd

https://github.com/sharkdp/fd - a simple, fast and user-friendly alternative to find. While it does not seek to mirror all of find's powerful functionality, it provides sensible (opinionated) defaults for 80% of the use cases.

hf

https://github.com/hugows/hf - a command line utility to quickly find files and execute a command - something like Helm/Anything/CtrlP for the terminal. It tries to find the best match, like other fuzzy finders (Sublime, ido, Helm). [34]

locate

http://en.wikipedia.org/wiki/Locate_(Unix)

locate fileordirectory

locate /

locate / | xargs -i echo 'test -f "{}" && echo "{}"' | sh
  # only files

locate / | xargs -i echo 'test -f "{}" && echo "{}"' | sh
  # only directories [35]

https://wiki.archlinux.org/index.php/locate
- https://www.archlinux.org/news/slocate-replaced-by-mlocate/

"Although in other distros locate and updatedb are in the findutils package, they are no longer present in Arch's package. To use it, install the mlocate package. mlocate is a newer implementation of the tool, but is used in exactly the same way."

mlocate is a locate/updatedb implementation. The 'm' stands for "merging": updatedb reuses the existing database to avoid rereading most of the file system, which makes updatedb faster and does not trash the system caches as much. The locate(1) utility is intended to be completely compatible to slocate. It also attempts to be compatible to GNU locate, when it does not conflict with slocate compatibility.

Before locate can be used, the database will need to be created. To do this, simply run updatedb as root.

sudo updatedb
  # creates/updates a db file of paths that is queried by locate

http://linux.die.net/man/8/updatedb

/etc/updatedb.conf

PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset cramfs debugfs devpts devtmpfs ecryptfs exofs ftpfs fuse fuse.encfs fuse.sshfs fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs shfs smbfs sockfs sshfs sysfs tmpfs ubifs udf usbfs vboxsf"
PRUNENAMES = ".git .hg .svn"
PRUNEPATHS = "/afs /media /mnt /net /sfs /tmp /udev /var/cache /var/lib/pacman/local /var/lock /var/run /var/spool /var/tmp"

http://linux.die.net/man/5/mlocate.db

/var/lib/mlocate/mlocate.db

http://jvns.ca/blog/2015/03/05/how-the-locate-command-works-and-lets-rewrite-it-in-one-minute

strings /var/lib/mlocate/mlocate.db | grep -E '^/.*config'
  # query db directly, needs sudo or sudoers or acl

fsearch

https://github.com/cboxdoerfer/fsearch - A fast file search utility for Unix-like systems based on GTK [36]

fselect

https://github.com/jhspetersson/fselect - Find files with SQL-like queries

to sort

http://arstechnica.com/information-technology/2011/07/ask-ars-how-to-use-the-find-command-in-a-pipeline/

http://www.pixelbeat.org/fslint/

Diffing files

File and directory comparison.

diff(1)

https://github.com/spcau/godiff - A File/Directory diff-like comparison tool with HTML output.

diffoscope - in-depth comparison of files, archives, and directories. diffoscope will try to get to the bottom of what makes files or directories different. It will recursively unpack archives of many kinds and transform various binary formats into more human readable form to compare them. It can compare two tarballs, ISO images, or PDF just as easily.
- https://salsa.debian.org/reproducible-builds/diffoscope

https://github.com/renardchien/Differnator - a tool for comparing two similar codebases leveraging diff command operations and other comparative operations. Differnator can do large scale comparisons across several directories or numerous pairs of similar codebases.

GUI

Meld

KDiff3

https://en.wikipedia.org/wiki/Kompare

Krusader - Twin panel file management for your desktop - an advanced twin panel (commander style) file manager for KDE Plasma and other desktops in the *nix world, similar to Midnight or Total Commander.

xxdiff - a graphical file and directories comparator and merge tool, provided under the GNU GPL open source license. It has reached stable state, and is known to run on many popular unices, including IRIX, Linux, Solaris, HP/UX, DEC Tru64. It has been deployed inside many large organizations and is being actively maintained by its author (Martin Blais).

Diffuse - graphical tool for merging and comparing text files

Finding duplicate files

md5deep and hashdeep - md5deep is a set of programs to compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files. md5deep is similar to the md5sum program found in the GNU Coreutils package, but has the following additional features: Recursive operation - md5deep is able to recursive examine an entire directory tree. That is, compute the MD5 for every file in a directory and for every file in every subdirectory. Comparison mode - md5deep can accept a list of known hashes and compare them to a set of input files. The program can display either those input files that match the list of known hashes or those that do not match. Hashes sets can be drawn from Encase, the National Software Reference Library, iLook Investigator, Hashkeeper, md5sum, BSD md5, and other generic hash generating programs. Users are welcome to add functionality to read other formats too! Time estimation - md5deep can produce a time estimate when it's processing very large files. Piecewise hashing - Hash input files in arbitrary sized blocks File type mode - md5deep can process only files of a certain type, such as regular files, block devices, etc. hashdeep is a program to compute, match, and audit hashsets. With traditional matching, programs report if an input file matched one in a set of knows or if the input file did not match. It's hard to get a complete sense of the state of the input files compared to the set of knowns. It's possible to have matched files, missing files, files that have moved in the set, and to find new files not in the set. Hashdeep can report all of these conditions. It can even spot hash collisions, when an input file matches a known file in one hash algorithm but not in others. The results are displayed in an audit report.

FreeFileSync - a folder comparison and synchronization software that creates and manages backup copies of all your important files. Instead of copying every file every time, FreeFileSync determines the differences between a source and a target folder and transfers only the minimum amount of data needed. FreeFileSync is Open Source software, available for Windows, Linux and macOS.

dupeGuru - a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It’s written mostly in Python 3 and has the peculiarity of using multiple GUI toolkits, all using the same core Python code. On OS X, the UI layer is written in Objective-C and uses Cocoa. On Linux 7 Windows, it’s written in Python and uses Qt5. dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same. dupeGuru runs on Mac OS X and Linux. dupeGuru is efficient. Find your duplicate files in minutes, thanks to its quick fuzzy matching algorithm. dupeGuru not only finds filenames that are the same, but it also finds similar filenames.
- https://github.com/hsoft/dupeguru

https://github.com/adrianlopezroche/fdupes - a program for identifying or deleting duplicate files residing within specified directories.
- http://en.wikipedia.org/wiki/Fdupes

https://github.com/jbruchon/jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'. What jdupes is not: a similar (but not identical) file finding tool.

https://github.com/sahib/rmlint - finds space waste and other broken things on your filesystem and offers to remove it.

Duff - a Unix command-line utility for quickly finding duplicates in a given set of files.Duff is written in C, uses gettext where available, is licensed under the zlib/libpng license and should compile on most modern Unices.

https://freshmeat.sourceforge.net/projects/ftwin - a tool useful to find duplicate files or pictures according to their content on your file system.

dupd - A fast and convenient CLI tool for finding duplicate files
- https://github.com/jvirkki/dupd

https://github.com/IgnorantGuru/rmdupe - uses standard linux commands to search within specified folders for duplicate files, regardless of filename or extension. Before duplicate candidates are removed they are compared byte-for-byte. rmdupe can also check duplicates against one or more reference folders, can trash files instead of removing them, allows for a custom removal command, and can limit its search to files of specified size. rmdupe includes a simulation mode which reports what will be done for a given command without actually removing any files.

https://github.com/sebastien/sink - Swiss army knife for directory comparison and synchronization. Python.

https://github.com/tbores/fiche - Directory comparison tools. Fiche uses hashes based comparison. Python.

https://github.com/stephen322/dircmp - directory comparison utility. D.

https://github.com/l00g33k/dirtree - This is a command line directory utility that works on both Windows and Linux. It scans the directory information and store in a file. This allows two computers to be scanned at a different time and even in two different country and allows the comparison to be carried out at a different time and place. This allows terabytes of disk with millions…

https://github.com/bmaia/binwally - Binary and Directory tree comparison tool using Fuzzy Hashing
- w00tsec: Binwally: Directory tree diff tool using Fuzzy Hashing

https://github.com/drrlvn/fscmp - Directory/file comparison utility. Rust.

https://github.com/lordmulder/DoubleFilerScanner - Scan for duplicate files.

https://github.com/qwertz19281/dupion - Duplicate file/folder finder, can also scan in archives, HDD optimized

Archiving

https://en.wikipedia.org/wiki/Archive_file

https://en.wikipedia.org/wiki/File_archiver

ar

https://en.wikipedia.org/wiki/ar_(Unix) - a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

shar

http://www.gnu.org/software/sharutils/

http://en.wikipedia.org/wiki/shar - an abbreviation of shell archive, is an archive format. A shar file is a shell script, and executing it will recreate the files. This is a type of self-extracting archive file. It can be created with the Unix shar utility. To extract the files, only the standard Unix Bourne shell sh is usually required. Note that shar is not specified by the Single Unix Specification, so it is not formally a component of Unix, but a legacy utility.

unshar programs have been written for other operating systems but are not always reliable; shar files are shell scripts and can theoretically do anything that a shell script can do (including using incompatible features of enhanced or workalike shells), limiting their utility outside the Unix world.

The drawback of self-extracting shell scripts (any kind, not just shar) is that they rely on a particular implementation of programs; shell archives created with older versions of makeself

GNU paxutils

http://www.gnu.org/software/paxutils/

cpio

https://en.wikipedia.org/wiki/cpio - The cpio archive format has several basic limitations: It does not store user and group names, only numbers. As a result, it cannot be reliably used to transfer files between systems with dissimilar user and group numbering.

cd /old-dir
find . -name '*.mov' -print | cpio -pvdumB /new-dir [37]

tar

tar

TAR versus Portability - [38]

-z: Compress archive using gzip program
-c: Create archive
-v: Verbose i.e display progress while creating archive
-f: Archive File name

tar -zcvf archive-name.tar.gz directory-name

tar -cjf foo.tar.bz2 bar/
  create bzipped tar archive of the directory bar called foo.tar.bz2

tar -xvf foo.tar
  verbosely extract foo.tar
tar -xzf foo.tar.gz
  extract gzipped foo.tar.gz

tar -xjf foo.tar.bz2 -C bar/
  extract bzipped foo.tar.bz2 after changing directory to bar
tar -xzf foo.tar.gz blah.txt
  extract the file blah.txt from foo.tar.gz

http://hacktux.com/tar/removing/leading/from/member/names

https://github.com/rxi/microtar - A lightweight tar library written in ANSI C

pax

pax - will read, write, and list the members of an archive file, and will copy directory hierarchies. pax operation is independent of the specific archive format, and supports a wide variety of different archive formats. A list of supported archive formats can be found under the description of the -x option. [39]
- https://en.wikipedia.org/wiki/Pax_(Unix)

http://www.linuxjournal.com/content/make-peace-pax

http://h3manth.com/content/pax-powerful-tool-read-and-write-file-archives-and-copy-directory-hierarchies

mkdir newdir
cd olddir
pax -rw . newdir

pak

PAK File - an archive used by video games such as Quake, Hexen, Crysis, Far Cry, Half-Life, and Exient XGS Engine games. It may include graphics, objects, textures, sounds, and other game data "packed" into a single file. PAK files are often just a renamed .ZIP file.

https://wiki.arx-libertatis.org/PAK_file_format - very simple archive format with a file table that contains offsets to the file data, which can be stored at arbitrary positions and in arbitrary order but must be contiguous for each individual file in the archive.

https://github.com/jmaselbas/pak - a simple .pak archive utility to list, extract and create pak archives

libarchive

libarchive - Multi-format archive and compression library
- https://github.com/libarchive/libarchive

Afio

Afio - makes cpio-format archives. It deals somewhat gracefully with input data corruption, supports multi-volume archives during interactive operation, and can make compressed archives that are much safer than compressed tar or cpio archives. Afio is best used as an `archive engine' in a backup script.
- https://github.com/kholtman/afio

makeself

https://github.com/megastep/makeself - A self-extracting archiving tool for Unix systems, in 100% shell script.

Trash can

https://www.freedesktop.org/wiki/Specifications/trash-spec/

https://github.com/andreafrancia/trash-cli

https://github.com/robrwo/bashtrash

Caches

https://github.com/xtaran/unburden-home-dir - Automatically unburden $HOME from caches, etc. Useful for $HOME on SSDs, small disks or slow NFS homes. Can be triggered via an hook in /etc/X11/Xsession.d/.

Storage space

CLI tools

du (disk usage)

du -sh
  size of a folder
du -S
  size of files in a folder

du -aB1m|awk '$1 >= 100'
  everything over 100Mb

cd / | sudo du -khs *
  show root folder size

sudo du -a --max-depth=1 /usr/lib | sort -n -r | head -n 20
  size of program folders /usr/lib

du -sk ./* | sort -nr | awk 'BEGIN{ pref[1]="K"; pref[2]="M"; pref[3]="G";} { total = total + $1;
x = $1; y = 1;  while( x > 1024 ) { x = (x + 1023)/1024; y++; }
printf("%g%s\t%s\n",int(x*10)/10,pref[y],$2); } END { y = 1; while( total > 1024 )
{ total = (total + 1023)/1024; y++; } printf("Total: %g%s\n",int(total*10)/10,pref[y]); }'

diskus

https://github.com/sharkdp/diskus - a very simple program that computes the total size of the current directory. It is a parallelized version of du -sh. On my 8-core laptop, it is about ten times faster than du with a cold disk cache and more than three times faster with a warm disk cache. [40]

dust

https://github.com/bootandy/dust - A more intuitive version of du in rust [41]

df

df - report file system disk space usage
- http://en.wikipedia.org/wiki/Df_(Unix) - a standard Unix command used to display the amount of available disk space for file systems on which the invoking user has appropriate read access. df is typically implemented using the statfs or statvfs system calls.

df -h
  human readable

di

di - a disk information utility, displaying everything (and more) that your 'df' command does. It features the ability to display your disk usage in whatever format you prefer. It also checks the user and group quotas, so that the user sees the space available for their use, not the system wide disk space.

dfc

https://github.com/Rolinh/dfc - a tool to report file system space usage information. When the output is a terminal, it uses color and graphs by default. It has a lot of features such as HTML, JSON and CSV export, multiple filtering options, the ability to show mount options and so on.

dv

https://github.com/ARM-DOE/dv - a command line tool for visualizing disk data usage on systems that don't have access to a graphical environment. After scanning a directory, dv generates an interactive webpage that displays useful information about the disk's contents. This webpage is a sunburst partition layout, where each subdirectory is displayed as a portion of the scanned directory. Hovering over a directory shows its size and percentage of the whole. dv is multiprocessed, and can be used to quickly visualize what is taking up space on a system.

dfrs

https://github.com/anthraxx/dfrs - Display file system space usage using graphs and colors

pdu

https://github.com/KSXGitHub/parallel-disk-usage - Highly parallelized, blazing fast directory tree analyzer

TUI

tdu

tdu - a text-mode disk-usage utility
- https://github.com/dse/tdu

cdu

cdu - Color du, a perl script which calls du and display a pretty histogram with optional colors which allow to imediatly see the directories which take disk space.

ncdu

ncdu - ncurses disk usage

ncdu / --exclude /home --exclude /media --exclude /run/media
  check everything apart from home and external drives

ncdu / --exclude /home --exclude /media --exclude /run/media
  check everything apart from external drives

ncdu / --exclude /home --exclude /media --exclude /run/media --exclude /boot
--exclude /tmp --exclude /dev --exclude /proc
  just the root partition

godu

https://github.com/viktomas/godu - Simple golang utility helping to discover large files/folders.

gdu

https://github.com/dundee/gdu - Disk usage analyzer with console interface written in Go

Broot

https://github.com/Canop/broot - An interactive tree view, a fuzzy search, a balanced BFS descent and customizable commands. [42]

ohmu

https://gitlab.com/paul-nechifor/ohmu - View space usage in your terminal.

duff

https://github.com/muesli/duf - Disk Usage/Free Utility (Linux, BSD & macOS) [43]

GUI

Baobab

Baobab - aka Disk Usage Analyzer, is a graphical, menu-driven viewer that you can use to view and monitor your disk usage and folder structure. It is part of every GNOME desktop.
- https://gitlab.gnome.org/GNOME/baobab

QDirStat

https://github.com/shundhammer/qdirstat - a graphical application to show where your disk space has gone and to help you to clean it up. This is a Qt-only port of the old Qt3/KDE3-based KDirStat, now based on the latest Qt 5. It does not need any KDE libs or infrastructure. It runs on every X11-based desktop on Linux, BSD and other Unix-like systems.

Filelight

Filelight - an application to visualize the disk usage on your computerFeatures: Scan local, remote or removable disks; Configurable color schemes; File system navigation by mouse clicks; Information about files and directories on hovering; Files and directories can be copied or removed directly from the context menu Integration into Konqueror and Krusader

xdiskusage

xdiskusage - a user-friendly program to show you what is using up all your disk space. It is based on the design of xdu written by Phillip C. Dykstra <dykstra at ieee dot org>. Changes have been made so it runs "du" for you, and can display the free space left on the disk, and produce a PostScript version of the display.

Filelight

Filelight - creates an interactive map of concentric, segmented rings that help visualise disk usage on your computer.

KDiskFree

KDiskFree - displays the available file devices (hard drive partitions, floppy and CD drives, etc.) along with information on their capacity, free space, type and mount point. It also allows you to mount and unmount drives and view them in a file manager.
- https://invent.kde.org/utilities/kdf

Multi

Duc

Duc - a collection of tools for indexing, inspecting and visualizing disk usage. Duc maintains a database of accumulated sizes of directories of the file system, and allows you to query this database with some tools, or create fancy graphs showing you where your bytes are.

Web

diskover

diskover - File system crawler, storage search engine and storage analytics software powered by Elasticsearch to help visualize and manage your disk space usage.

Harnessing Your Data with Diskover

Tagging

TMSU - a tool for tagging your files. It provides a simple command-line tool for applying tags and a virtual filesystem so that you can get a tag-based view of your files from within any other program.TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount, based upon the tags you set up. The only commitment required is your time and there's absolutely no lock-in.
- https://github.com/oniony/TMSU

https://github.com/alvatar/dfym

https://bitbucket.org/majo/oyepa/src/default/oyepa - a way to organize (and locate) your documents through ----- the use of tags

Files

General

Types

CLI

GUI

Metadata

Container

IFF / RIFF

Other

Files & directories

Directory structure

File permissions

ulimit

chmod

chown

Access Control Lists

Copying

cp

scp

rsync

Daemon

lsyncd

Services

dd

hot-clone

tar pipe

File information

ls

exa

even-better-ls

stat

chattr/lsattr

file

binwalk

Directory navigation

cd

Autojump

z

fasd

autojump

pazi

zoxide

z.lua

zfm

zz

cx

to sort

Creating files

touch

mkdir

ln

fallocate

Test files

Editing files

Patching files

Viewing files

cat

head

tail

Other

bat

look

twf

File pagers

more

less

most

vimpager

other

Moving files

mv

renameutils

Métamorphose

nmly

Detox

Removing files

rm

scrub

Managing files

Finding files