Storage

Things and Stuff Wiki - An organically evolving personal wiki knowledge base. An on-the-fly taxonomy containing a patchwork trail of topic outlines, descriptions, notes, stubs and breadcrumbs, with links to sites, systems, software, manuals, organisations, people, articles, guides, slides, papers, books, comments, videos, screencasts, webcasts, scratchpads and more. Content is orientated towards mostly free/libre/open, mostly Linux. Quality and age vary drastically. Sometimes old things are first, sometimes last. Use the Table of Contents menu to navigate long pages. Zoom in if text is too small. Dead link? Wayback Machine. I probably need to fix the theme CSS after an update. See also libreav.org. Chat to msg me (not checking tho atm).

Media

https://en.wikipedia.org/wiki/Create,_read,_update_and_delete - the four basic functions of persistent storage. Alternate words are sometimes used when defining the four basic functions of CRUD, such as retrieve instead of read, modify instead of update, or destroy instead of delete. CRUD is also sometimes used to describe user interface conventions that facilitate viewing, searching, and changing information; often using computer-based forms and reports. The term was likely first popularized by James Martin in his 1983 book Managing the Data-base Environment. The acronym may be extended to CRUDL to cover listing of large data sets which bring additional complexity such as pagination when the data sets are too large to be easily held in memory.
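As a concrete illustration of the four operations (plus listing, the "L" in CRUDL), here is a minimal sketch of a file-backed key/value store in bash. The `kv_*` function names and the `STORE_DIR` variable are hypothetical; it assumes one file per key and keys that are valid file names containing no `/`.

```shell
#!/usr/bin/env bash
# Minimal file-backed key/value store illustrating CRUD(L).
# Assumption: one file per key under $STORE_DIR; keys contain no "/".
STORE_DIR=${STORE_DIR:-./kvstore}

# Create: fails if the key already exists.
kv_create() {
    mkdir -p "$STORE_DIR" || return
    [ ! -e "$STORE_DIR/$1" ] && printf '%s' "$2" > "$STORE_DIR/$1"
}

# Read: print the stored value.
kv_read() { cat "$STORE_DIR/$1"; }

# Update: fails if the key does not exist yet.
kv_update() {
    [ -e "$STORE_DIR/$1" ] && printf '%s' "$2" > "$STORE_DIR/$1"
}

# Delete: remove the key.
kv_delete() { rm -- "$STORE_DIR/$1"; }

# List: a real store would paginate large listings.
kv_list() { ls -- "$STORE_DIR"; }
```

Keeping create and update separate mirrors the CRUD distinction; a store that always overwrites would merge them into a single "upsert".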

Non-volatile Storage - ACM Queue - Implications of the Datacenter's Shifting Center [1]

https://en.wikipedia.org/wiki/Nearline_storage - a portmanteau of "near" and "online storage"; a term used in computer science to describe an intermediate type of data storage that represents a compromise between online storage (supporting frequent, very rapid access to data) and offline storage/archiving (used for backups or long-term storage, with infrequent access to data). Nearline storage dates back to the IBM 3850 Mass Storage System (MSS) tape library, which was announced in 1974.

Tape

https://en.wikipedia.org/wiki/Linear_Tape-Open

Optical

CD

CD ISO

http://wiki.osdev.org/ISO_9660 - the standard file system for CD-ROMs. It is also widely used on DVD and BD media and may also be present on USB sticks or hard disks. Its specifications are available for free under the name ECMA-119.
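ISO 9660 reserves the first 16 logical sectors (2048 bytes each) as a system area, and the volume descriptors start at sector 16; each descriptor begins with a type byte followed by the identifier `CD001`. A minimal sketch that checks a file for that signature, assuming 2048-byte logical sectors (the usual case); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# is_iso9660 FILE: succeed if FILE carries an ISO 9660 volume descriptor.
# Sectors 0-15 (16 x 2048 = 32768 bytes) are the system area; the first
# volume descriptor starts at byte 32768. Its byte 0 is the descriptor type
# and bytes 1-5 hold the standard identifier "CD001".
is_iso9660() {
    local sig
    # Read the 5 identifier bytes at offset 32768 + 1; strip NUL bytes so
    # bash does not warn when that region is empty.
    sig=$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null | tr -d '\000')
    [ "$sig" = "CD001" ]
}
```

Hybrid ISO images that also boot from USB keep their MBR inside the system area, so the check still applies to them; `blkid` reports such images as `TYPE="iso9660"`.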

Bootable CD + retro game in a tweet - [2]

http://wiki.osdev.org/Mkisofs - a utility that creates an ISO 9660 image from files on disk. "mkisofs is effectively a pre-mastering program to generate the iso9660 filesystem - it takes a snapshot of a given directory tree, and generates a binary image which will correspond to an iso9660 filesystem when written to a block device." Developers of operating systems will mainly be interested in creating ISO filesystems for bootable CD, DVD, or BD via El-Torito. Nevertheless, ISO filesystems may also be booted from hard disk or USB stick.

Libburnia - a project for reading, mastering and writing optical discs. Currently it comprises libraries named libisofs, libburn, libisoburn, a cdrecord emulator named cdrskin, and an integrated multi-session tool named xorriso. The software runs on GNU/Linux, FreeBSD, Solaris, NetBSD, OpenBSD. It is the basis of the GNU xorriso package.

http://libburnia-project.org/wiki/Xorriso - xorriso is a command line and dialog application, which creates, loads, manipulates and writes ISO 9660 filesystem images with Rock Ridge extensions. It is part of the libisoburn release tarball. It copies file objects from POSIX compliant filesystems into Rock Ridge enhanced ISO 9660 filesystems and performs session-wise manipulation of such filesystems. It can load the management information of existing ISO images and it writes the session results to optical media or to filesystem objects. If linked with zlib then it is able to produce the zisofs compression format. Directory tree, whole session, and single data files may be equipped with MD5 checksums.

Burning

Xfburn - a simple CD/DVD burning tool based on libburnia libraries. It can blank CD/DVD(-RW)s, burn and create iso images, audio CDs, as well as burn personal compositions of data to either CD or DVD. It is stable, and under ongoing development.

cdrdao - writes audio CD-Rs in disc-at-once mode
- http://www.linuxcommand.org/man_pages/cdrdao1.html

QPxTool - the linux way to get full control over your CD/DVD drives. It is the Open Source Solution which intends to give you access to all available Quality Checks (Q-Checks) on written and blank media, that are available for your drive. This will help you to find the right media and the optimized writing speed for your hardware, which will increase the chance for a long data lifetime.

https://github.com/sonejostudios/CDMasterTool - a tool for audio CD creation, TOC and CUE files manipulation, CD burning with CD-TEXT, drive(s) and CD analysis and Commandline launcher. It is mainly based on Cdrdao and libcdio, as well as a couple of other GNU/Linux tools. The main goal is to burn Audio CDs with CD-TEXT, out of a DAW's Red Book export WAV/TOC/CUE combination.

DVD

dvdisaster - a tool for creating error correction data (“ecc data”) for optical media such as CD, DVD and BD discs. Use cases: creating ecc data, recovering defective media using ecc data, and general maintenance of optical media.
- http://dvdisaster.net/downloads/manual.pdf

https://github.com/ldo/dvdauthor - a program that will generate a DVD-Video movie from a valid MPEG-2 stream that should play when you put it in a DVD player. To start you need MPEG-2 files that contain the necessary DVD-Video VOB packets. These can be generated with FFmpeg, or by passing `-f 8` to `mplex`.

https://superuser.com/questions/39942/using-udf-on-a-usb-flash-drive

YouTube: DVD+R and DVD-R; What was that about?

Blu-Ray

https://www.zdnet.com/article/panasonics-blu-ray-raid-archive/

The Blu-Ray Reauthoring Project - [3]

https://en.wikipedia.org/wiki/Blu-ray_Disc_recordable

OCD

https://en.wikipedia.org/wiki/Optical_Disc_Archive - a storage technology that was introduced by the Sony Corporation. It uses removable cartridges, where each cartridge holds 11 optical discs. Each of the internal optical discs is similar to, but not compatible with, a Blu-ray disc. The latest version of the cartridge, that has a total capacity of about 5.5TB, uses discs that hold about 500GB each. The technology was publicly announced on 16 April 2012 during the NAB Show with the first units shipping in February 2013.

https://www.pro.sony.eu/pro/lang/en/eu/products/archiving-storage-optical-disc-archive

https://pro.sony.com/bbsc/ssr/cat-datastorage/cat-opticaldiscarchive/

https://cvp.com/catalogue/search/odc

M-DISC

https://en.wikipedia.org/wiki/M-DISC - a write-once optical disc technology introduced in 2009 by Millenniata, Inc. and available as DVD and Blu-ray discs. M-DISC's design is intended to provide greater archival media longevity. Millenniata claims that properly stored M-DISC DVD recordings will last 1000 years. The M-DISC DVD looks like a standard disc, except it is slightly thicker and almost transparent.

HDD

https://en.wikipedia.org/wiki/History_of_hard_disk_drives

https://en.wikipedia.org/wiki/Gibi-#Specific_units_of_IEC_60027-2_A.2_and_ISO.2FIEC_80000
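Drive vendors quote capacity in decimal (SI) units while many tools report binary (IEC) units, which is why a "1 TB" drive shows up as roughly 931 GiB. A small sketch using awk for the floating-point division; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Convert a byte count to binary IEC units, rounded to two decimals.
# 1 GiB = 2^30 bytes, 1 TiB = 2^40 bytes (vs. 1 GB = 10^9, 1 TB = 10^12).
to_gib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 2^30 }'; }
to_tib() { awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 2^40 }'; }

# A drive sold as "1 TB" holds 10^12 bytes:
to_gib 1000000000000   # prints 931.32
to_tib 1000000000000   # prints 0.91
```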

http://forre.st/storage - Storage Analysis - GB/$ (New Egg) for different sizes and media

http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/ [4]

Hard disk hacking

https://news.ycombinator.com/item?id=11370840

https://news.ycombinator.com/item?id=12959457

Failure

What is the Best Hard Drive? - a table of annual failure rates through the year 2014; only models with 45 or more drives are shown.
PDF: Failure Trends in a Large Disk Drive Population

New Hard Drive rituals - [5]

Predicting Hard Drive Failures with SMART Stats [6]

Flash

https://en.wikipedia.org/wiki/Flash_memory - an electronic (solid-state) non-volatile computer storage medium that can be electrically erased and reprogrammed. Toshiba developed flash memory from EEPROM (electrically erasable programmable read-only memory) in the early 1980s and introduced it to the market in 1984. The two main types of flash memory are named after the NAND and NOR logic gates. The individual flash memory cells exhibit internal characteristics similar to those of the corresponding gates. While EPROMs had to be completely erased before being rewritten, NAND-type flash memory may be written and read in blocks (or pages) which are generally much smaller than the entire device. NOR-type flash allows a single machine word (byte) to be written – to an erased location – or read independently.

The NAND type is found primarily in memory cards, USB flash drives, solid-state drives (those produced in 2009 or later), and similar products, for general storage and transfer of data. NAND or NOR flash memory is also often used to store configuration data in numerous digital products, a task previously made possible by EEPROM or battery-powered static RAM. One key disadvantage of flash memory is that it can only endure a relatively small number of write cycles in a specific block.

Example applications of both types of flash memory include personal computers, PDAs, digital audio players, digital cameras, mobile phones, synthesizers, video games, scientific instrumentation, industrial robotics, and medical electronics. In addition to being non-volatile, flash memory offers fast read access times, although not as fast as static RAM or ROM. Its mechanical shock resistance helps explain its popularity over hard disks in portable devices, as does its high durability, ability to withstand high pressure, temperature and immersion in water, etc.

Although flash memory is technically a type of EEPROM, the term "EEPROM" is generally used to refer specifically to non-flash EEPROM which is erasable in small blocks, typically bytes. Because erase cycles are slow, the large block sizes used in flash memory erasing give it a significant speed advantage over non-flash EEPROM when writing large amounts of data. As of 2013, flash memory cost much less than byte-programmable EEPROM and had become the dominant memory type wherever a system required a significant amount of non-volatile solid-state storage.

SSD

YouTube: Velocity 2011: Artur Bergman, "Artur on SSD's"

Anatomy of a Solid-state Drive - ACM Queue - While the ubiquitous SSD shares many features with the hard-disk drive, under the surface they are completely different.

Howto HDD and SSD Alignment - Nottingham Linux Users Group

http://bcache.evilpiepirate.org/

Why SSD Drives Destroy Court Evidence, and What Can Be Done About It | Forensic Focus - Articles

Analysis of SSD Reliability during power-outages

https://news.ycombinator.com/item?id=9052925

https://news.ycombinator.com/item?id=10764556

Everything I Know About SSDs 2019 - [7]

You can FAT32 format hard drive partitions larger than 32 GB with one of the tools from http://www.uwe-sieber.de/usbtrouble_e.html#format

F3 by Digirati - GPLv3 implementation of the algorithm of H2testw, and other tools that I have been implementing to speed up the identification of fake drives as well as making them usable: f3probe, f3fix, and f3brew. My implementation of H2testw, which I've broken into two applications named f3write and f3read, runs on Linux, Macs, Windows/Cygwin, and FreeBSD. f3probe is the fastest way to identify fake drives and their real sizes. f3fix enables users to use the real capacity of fake drives without losing data. f3brew helps developers to infer how fake drives work. f3probe, f3fix, and f3brew currently run only on Linux.
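The idea behind H2testw (and f3write/f3read) is to fill the drive with data that can be regenerated from its position, then read it all back and compare; a fake drive that wraps around or silently drops writes fails the comparison. A minimal sketch of that idea in bash, not f3's actual file format or algorithm: the function names, the `.dat` file naming and the text pattern are hypothetical, and it writes a fixed number of files rather than filling the drive. Reading back right after writing may be served from the page cache, so a real test should unmount and re-plug the drive before verifying.

```shell
#!/usr/bin/env bash
# Sketch of the H2testw/f3 idea: write deterministic files, read back, compare.

# pattern INDEX KIB: emit exactly KIB KiB of bytes derived only from INDEX.
pattern() {
    awk -v tag="test block $1" -v n="$(( $2 * 1024 ))" 'BEGIN {
        line = tag "\n"; len = length(line)
        for (w = 0; w + len <= n; w += len) printf "%s", line
        printf "%s", substr(line, 1, n - w)   # partial last line, if any
    }'
}

# write_files DIR COUNT KIB: create COUNT files of KIB KiB each.
write_files() {
    local dir=$1 count=$2 kib=$3 i
    for (( i = 1; i <= count; i++ )); do
        pattern "$i" "$kib" > "$dir/$i.dat" || return
    done
    sync
}

# verify_files DIR COUNT KIB: regenerate each pattern and compare byte-for-byte.
# Prints each bad file; succeeds only if every file matches.
verify_files() {
    local dir=$1 count=$2 kib=$3 i bad=0
    for (( i = 1; i <= count; i++ )); do
        if ! pattern "$i" "$kib" | cmp -s - "$dir/$i.dat"; then
            echo "corrupt or missing: $i.dat"
            bad=$(( bad + 1 ))
        fi
    done
    (( bad == 0 ))
}
```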

https://en.wikipedia.org/wiki/Disk_sector - a subdivision of a track on a magnetic disk or optical disc. Each sector stores a fixed amount of user-accessible data, traditionally 512 bytes for hard disk drives (HDDs) and 2048 bytes for CD-ROMs and DVD-ROMs. Newer HDDs use 4096-byte (4 KiB) sectors, which are known as the Advanced Format (AF). The sector is the minimum storage unit of a hard drive. Most disk partitioning schemes are designed to have files occupy an integral number of sectors regardless of the file's actual size. Files that do not fill a whole sector will have the remainder of their last sector filled with zeroes. In practice, operating systems typically operate on blocks of data, which may span multiple sectors.
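The rounding described above is a ceiling division. A short sketch (hypothetical function names) computing how many sectors or blocks a file occupies and how much of the last one is left unused; the optional second argument takes 4096 for Advanced Format sectors or a filesystem block size. It ignores the tail packing and inline small-file storage that some filesystems use.

```shell
#!/usr/bin/env bash
# units_used SIZE [UNIT]: whole sectors/blocks needed for SIZE bytes (ceiling).
units_used() {
    local size=$1 unit=${2:-512}
    echo $(( (size + unit - 1) / unit ))
}

# slack_bytes SIZE [UNIT]: unused bytes at the end of the last sector/block.
slack_bytes() {
    local size=$1 unit=${2:-512}
    echo $(( $(units_used "$size" "$unit") * unit - size ))
}

units_used 1000          # 2 sectors of 512 bytes
slack_bytes 1000         # 24 bytes left over
units_used 1000 4096     # 1 block of 4 KiB
slack_bytes 1000 4096    # 3096 bytes left over
```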

https://en.wikipedia.org/wiki/Advanced_Format - any disk sector format used to store data on magnetic disks in hard disk drives (HDDs) that exceeds 512, 520, or 528 bytes per sector, such as the 4096, 4112, 4160, and 4224-byte (4 KB) sectors of an Advanced Format Drive (AFD). Larger sectors enable the integration of stronger error correction algorithms to maintain data integrity at higher storage densities. Advanced Format is also considered a milestone technology in the history of HDD storage, where data has been generally processed in 512-byte segments since at least the introduction of consumer-grade HDDs in the early 1980s, and in similar or smaller chunks in the professional field since the invention of HDDs in 1956.
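On an Advanced Format drive, a partition whose start is not a multiple of the physical sector size makes each 4 KiB filesystem block straddle two physical sectors, so writes turn into read-modify-write cycles. A minimal check, assuming the start sector is given in 512-byte units (sysfs reports `/sys/block/<disk>/<part>/start` that way, whereas fdisk prints logical sectors, which are 4096 bytes on 4Kn drives); the function name is hypothetical:

```shell
#!/usr/bin/env bash
# is_aligned START [ALIGN]: succeed if a partition starting at sector START
# (512-byte units) begins on an ALIGN-byte boundary (default 4096).
is_aligned() {
    local start=$1 align=${2:-4096}
    (( start * 512 % align == 0 ))
}

# On a live system (device names are examples):
#   is_aligned "$(cat /sys/block/sda/sda1/start)" \
#              "$(cat /sys/block/sda/queue/physical_block_size)"
```

Current partitioning tools start the first partition at sector 2048 (1 MiB), which satisfies both 4 KiB and 1 MiB alignment; the old DOS default of sector 63 does not.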

tmpfs

https://en.wikipedia.org/wiki/tmpfs - a temporary file storage paradigm implemented in many Unix-like operating systems. It is intended to appear as a mounted file system, but data is stored in volatile memory instead of a persistent storage device. A similar construction is a RAM disk, which appears as a virtual disk drive and hosts a disk file system.

https://wiki.archlinux.org/index.php/tmpfs - a temporary filesystem that resides in memory and/or swap partition(s). Mounting directories as tmpfs can be an effective way of speeding up accesses to their files, or to ensure that their contents are automatically cleared upon reboot.

Partitions

https://en.wikipedia.org/wiki/Disk_partitioning

https://en.wikipedia.org/wiki/Partition_table - a table maintained on disk by the operating system describing the partitions on that disk. The terms partition table and partition map are most commonly associated with the MBR partition table of a Master Boot Record (MBR) in IBM PC compatibles, but it may be used generically to refer to other "formats" that divide a disk drive into partitions, such as: GUID Partition Table (GPT), Apple partition map (APM), or BSD disklabel.

Partition types

Arch Wiki: Partitioning

GNU Parted manipulates partition tables. This is useful for creating space for new operating systems, reorganizing disk usage, copying data on hard disks and disk imaging. The package contains a library, libparted, as well as a command-line frontend, parted, which can also be used in scripts.

http://gparted.sourceforge.net - still buggy when formatting (doesn't manage mounting right)

gnome-disks

MBR

http://en.wikipedia.org/wiki/Master_boot_record - a special type of boot sector at the very beginning of partitioned computer mass storage devices like fixed disks or removable drives intended for use with IBM PC-compatible systems and beyond. The concept of MBRs was publicly introduced in 1983 with PC DOS 2.0.

The MBR holds the information on how the partitions, containing file systems, are organized on that medium. The MBR also contains executable code to function as a loader for the installed operating system—usually by passing control over to the loader's second stage, or in conjunction with each partition's volume boot record (VBR). This MBR code is usually referred to as a boot loader.

https://en.wikipedia.org/wiki/Partition_type - a partition's entry in the partition table inside a master boot record (MBR) is a byte value intended to specify the file system the partition contains and/or to flag special access methods used to access these partitions (e.g. special CHS mappings, LBA access, logical mapped geometries, special driver access, hidden partitions, secured or encrypted file systems, etc.).

GPT

http://en.wikipedia.org/wiki/GUID_Partition_Table

http://www.petri.co.il/gpt-vs-mbr-based-disks.htm

Cache

https://bcache.evilpiepirate.org/

https://wiki.archlinux.org/index.php/Bcache

https://en.wikipedia.org/wiki/Bcache

https://lkml.org/lkml/2015/8/21/22 [9]

Loopback

https://en.wikipedia.org/wiki/Loop_device

https://www.mankier.com/4/loop - The loop device is a block device that maps its data blocks not to a physical device such as a hard disk or optical disk drive, but to the blocks of a regular file in a filesystem or to another block device. This can be useful for example to provide a block device for a filesystem image stored in a file, so that it can be mounted with the mount(8) command.
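A typical loop-device workflow is: create a file, put a filesystem in it, attach it to a free loop device and mount it. A sketch using standard util-linux and e2fsprogs commands (`truncate`, `mkfs.ext4`, `losetup --find --show`, `mount`); the image size, paths and function names are hypothetical, and attaching and mounting need root.

```shell
#!/usr/bin/env bash
# make_image FILE SIZE: create a sparse backing file (e.g. SIZE=64M).
make_image() { truncate -s "$2" "$1"; }

# format_and_mount IMAGE MOUNTPOINT: needs root for losetup and mount.
# Prints the loop device it attached, e.g. /dev/loop0.
format_and_mount() {
    local img=$1 mnt=$2 dev
    mkfs.ext4 -F -q "$img" || return   # -F: proceed although it is a regular file
    dev=$(losetup --find --show "$img") || return
    mount "$dev" "$mnt" && echo "$dev"
}

# unmount_and_detach MOUNTPOINT DEVICE: reverse the steps above.
unmount_and_detach() { umount "$1" && losetup --detach "$2"; }
```

`mount -o loop disk.img /mnt/image` collapses the attach and mount steps into one command.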

losetup - set up and control loop devices

The Loopback Root Filesystem HOWTO

http://blog.m8t.in/2009/05/making-use-of-loop-devices-on-linux.html

Create Linux Loopback File System On Disk File

Re: Using loop devices in Debian

Utils

https://github.com/g2p/blocks/

Repair

SMART

https://en.wikipedia.org/wiki/S.M.A.R.T.

smartmontools - contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology System (SMART) built into most modern ATA and SCSI hard disks. In many cases, these utilities will provide advance warning of disk degradation and failure. It is derived from smartsuite.
- http://sourceforge.net/apps/trac/smartmontools/wiki

GSmartControl - a graphical user interface for smartctl (from smartmontools package), which is a tool for querying and controlling SMART (Self-Monitoring, Analysis, and Reporting Technology) data on modern hard disk and solid-state drives. It allows you to inspect the drive's SMART data to determine its health, as well as run various tests on it.

https://wiki.lime-technology.com/Understanding_SMART_Reports

https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/ [10]
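The Backblaze post above focuses on SMART attributes 5, 187, 188, 197 and 198 as failure indicators. A sketch that pulls just those rows out of `smartctl -A` output, assuming the usual ATA attribute table layout where RAW_VALUE is the tenth column; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# smart_watch: read `smartctl -A` output on stdin and print
# "ID NAME RAW_VALUE" for attributes 5, 187, 188, 197 and 198,
# prefixed with WARN when the raw value is non-zero.
smart_watch() {
    awk '$1 ~ /^(5|187|188|197|198)$/ {
        flag = ($10 + 0 > 0) ? "WARN " : ""
        print flag $1, $2, $10
    }'
}

# Usage (needs root and smartmontools):
#   sudo smartctl -A /dev/sda | smart_watch
```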

Virtualization

https://en.wikipedia.org/wiki/Storage_virtualization

https://en.wikipedia.org/wiki/Logical_disk - a device that provides an area of usable storage capacity on one or more physical disk drive components in a computer system. Other terms that are used to mean the same thing are partition, logical volume, and in some cases a virtual disk (vdisk).

LVM

https://en.wikipedia.org/wiki/Logical_volume_management - provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes. In particular, a volume manager can concatenate, stripe together or otherwise combine partitions into larger virtual ones that administrators can re-size or move, potentially without interrupting system use. Volume management represents just one of many forms of storage virtualization; its implementation takes place in a layer in the device-driver stack of an OS (as opposed to within storage devices or in a network).

LVM2 - refers to the userspace toolset that provides logical volume management facilities on Linux. It is reasonably backwards-compatible with the original LVM toolset.
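The usual LVM2 layering is physical volumes (PVs) pooled into a volume group (VG), from which logical volumes (LVs) are carved. A sketch of that sequence with the standard LVM2 commands; it only prints the commands unless `DRY_RUN=0` is set (a hypothetical safety switch, as are the function names), since running them for real needs root and overwrites the listed devices.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# lvm_setup VG LV SIZE DEVICE...: PVs -> VG -> LV -> filesystem.
lvm_setup() {
    local vg=$1 lv=$2 size=$3
    shift 3
    run pvcreate "$@"                       # label each device as a PV
    run vgcreate "$vg" "$@"                 # pool the PVs into one VG
    run lvcreate -L "$size" -n "$lv" "$vg"  # carve out an LV of SIZE
    run mkfs.ext4 "/dev/$vg/$lv"            # LVs appear as /dev/VG/LV
}

# lvm_grow VG LV DELTA: extend the LV and (-r) resize its filesystem too.
lvm_grow() { run lvextend -r -L "+$3" "/dev/$1/$2"; }
```

For example, `lvm_setup vg0 data 20G /dev/sdb /dev/sdc` prints the four commands for two hypothetical disks; review them, then rerun with `DRY_RUN=0` as root.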

https://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)

https://wiki.gentoo.org/wiki/LVM#Different_storage_allocation_methods

https://wiki.archlinux.org/index.php/LVM

http://tldp.org/HOWTO/LVM-HOWTO/

RedHat - LVM cheatsheet

4.8. Customized Reporting for LVM Red Hat Enterprise Linux 7 | Red Hat Customer Portal

Volume group - the highest level abstraction used within the Logical Volume Manager (LVM). It gathers together a collection of Logical Volumes (LV) and Physical Volumes (PV) into one administrative unit.

LVM2 defragmenter - defragments or rearranges an LVM2 volume group using pvmove.

Snapshots

LVM Snapshots | Programster's Blog

RAID

https://wiki.archlinux.org/index.php/Software_RAID_and_LVM

GUI

Kvpm - KDE Volume and Partition Manager is a GUI front end for Linux LVM and Gnu parted. LVM2 groups and volumes can be created, removed and manipulated using most of the options supported by the standard LVM2 tools. Some support for creating and operating on partitions is also provided. It also handles creating and mounting file systems.

https://fedoraproject.org/wiki/SystemConfig/lvm - a utility for graphically configuring Logical Volumes.

system-config-lvm

Caching

http://man7.org/linux/man-pages/man7/lvmcache.7.html

LVM SSD Caching - LVM has a cool feature that allows you to cache slow LVs on faster media. It has some limitations, like not being able to cache an entire volume group, but it does work. I had an extra Intel 520 240GB SSD that I was not using, so I decided to add it to one of my servers to speed up an mdadm array. I can probably say that using GitLab through the web interface is a bit snappier after implementing the SSD cache.

RAID

https://en.wikipedia.org/wiki/RAID

https://en.wikipedia.org/wiki/Standard_RAID_levels

https://en.wikipedia.org/wiki/Non-standard_RAID_levels

https://en.wikipedia.org/wiki/Mdadm
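The standard levels trade capacity for redundancy in a simple way. A sketch computing usable capacity for n equal-sized disks (with mixed sizes, count every disk as the smallest one); the function name is hypothetical, and RAID 10 here means classic striped mirror pairs, even though md's raid10 driver also accepts odd device counts.

```shell
#!/usr/bin/env bash
# raid_usable LEVEL N SIZE: usable capacity of N equal disks of SIZE each.
# Fails (non-zero status) if N is below the minimum for the level.
raid_usable() {
    local level=$1 n=$2 size=$3
    case $level in
        0)  (( n >= 2 )) && echo $(( n * size )) ;;        # striping, no redundancy
        1)  (( n >= 2 )) && echo "$size" ;;                # every disk is a full copy
        5)  (( n >= 3 )) && echo $(( (n - 1) * size )) ;;  # one disk's worth of parity
        6)  (( n >= 4 )) && echo $(( (n - 2) * size )) ;;  # two disks' worth of parity
        10) (( n >= 4 && n % 2 == 0 )) && echo $(( n / 2 * size )) ;;  # mirrored pairs
        *)  return 1 ;;
    esac
}

raid_usable 5 4 4    # four 4 TB disks in RAID 5 -> 12
raid_usable 6 4 4    # same disks in RAID 6 -> 8
```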

dmsetup - low level logical volume management

https://blogs.oracle.com/ahl/entry/what_is_raid_z - ZFS

SnapRAID - one of the available non-standard RAID solutions for disk arrays.
- https://github.com/amadvance/snapraid

https://github.com/Smurf-IV/Elucidate

Management

System Storage Manager

System Storage Manager - provides an easy-to-use command line interface to manage your storage using various technologies like lvm, btrfs, encrypted volumes and possibly more. In more sophisticated enterprise storage environments, management with Device Mapper (dm), Logical Volume Manager (LVM), or Multiple Devices (md) is becoming increasingly more difficult. With file systems added to the mix, the number of tools needed to configure and manage storage has grown so large that it is simply not user friendly. With so many options for a system administrator to consider, the opportunity for errors and problems is large. The btrfs administration tools have shown us that storage management can be simplified, and we are working to bring that ease of use to Linux filesystems in general.

Snapper

Snapper - a tool for Linux filesystem snapshot management. Apart from the obvious creation and deletion of snapshots, it can compare snapshots and revert differences between snapshots. In simple terms, this allows root and non-root users to view older versions of files and revert changes. The features include: Manually create snapshots, Automatically create snapshots, e.g. with YaST and zypp, Automatically create timeline of snapshots, Show and revert changes between snapshots, Works with btrfs, ext4 and thin-provisioned LVM volumes, Supports Access Control Lists and Extended Attributes, Automatic cleanup of old snapshots, Command line interface, D-Bus interface, PAM module to create snapshots during login and logout.

https://github.com/openSUSE/snapper

https://wiki.archlinux.org/index.php/Snapper

https://github.com/ricardomv/snapper-gui - a graphical user interface for the tool snapper for Linux filesystem snapshot management. It can compare snapshots and revert differences between snapshots. In simple terms, this allows root and non-root users to view older versions of files and revert changes. Currently works with btrfs, ext4 and thin-provisioned LVM volumes

Blivet

Blivet - A python module for configuration of block devices
- https://github.com/storaged-project/blivet

https://github.com/storaged-project/blivet-gui - GUI tool for storage configuration using blivet library

File systems

https://en.wikipedia.org/wiki/File_system

https://en.wikipedia.org/wiki/Comparison_of_file_systems

https://wiki.archlinux.org/index.php/File_Systems

A Visual Expedition Inside the Linux File Systems [11]

http://en.wikipedia.org/wiki/Virtual_file_system - or virtual filesystem switch is an abstraction layer on top of a more concrete file system. The purpose of a VFS is to allow client applications to access different types of concrete file systems in a uniform way. A VFS can, for example, be used to access local and network storage devices transparently without the client application noticing the difference. It can be used to bridge the differences in Windows, Mac OS and Unix filesystems, so that applications can access files on local file systems of those types without having to know what type of file system they are accessing.

A tour of the Linux VFS

http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/ [12]

wipefs - wipe a filesystem signature from a device

https://github.com/pcd1193182/cursedfs - Make a disk image formatted with ZFS, ext2, and FAT all at once. [13]

/etc/fstab

http://en.wikipedia.org/wiki/fstab - or file systems table, a system configuration file commonly found on Unix systems. On Linux, it is part of the util-linux package. The fstab file typically lists all available disks and disk partitions, and indicates how they are to be initialized or otherwise integrated into the overall system's file system. fstab is still used for basic system configuration, notably of a system's main hard drive and startup file system, but for other uses has been superseded in recent years by automatic mounting. The fstab file is most commonly used by the mount command, which reads the fstab file to determine which options should be used when mounting the specified device.
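Each non-comment fstab line has six whitespace-separated fields: device (spec), mount point, filesystem type, mount options, dump flag, and fsck pass number. A sketch that lists mount point, type and options per entry, skipping comments and blank lines; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# list_fstab [FILE]: print "MOUNTPOINT TYPE OPTIONS" for each entry.
# Fields: 1 spec, 2 mount point, 3 type, 4 options, 5 dump, 6 fsck pass.
list_fstab() {
    awk '/^[ \t]*#/ || NF == 0 { next }   # skip comments and blank lines
         { print $2, $3, $4 }' "${1:-/etc/fstab}"
}
```

util-linux's `findmnt --fstab` shows the same entries as a table.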

https://wiki.archlinux.org/index.php/fstab

Formatting

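sudo fdisk -l
  # identify the target device first (here /dev/sdc1); formatting destroys its contents

sudo umount /dev/sdc1

# FAT (-n sets the volume label, at most 11 characters)
sudo mkfs.vfat -n 'LABEL' -I /dev/sdc1

# NTFS (-f quick format skips zeroing the whole partition; -L sets the label)
sudo mkfs.ntfs -f -L 'LABEL' /dev/sdc1

# EXT4 (-L sets the label)
sudo mkfs.ext4 -L 'LABEL' /dev/sdc1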

http://man7.org/linux/man-pages/man8/mke2fs.8.html

Swap

https://wiki.archlinux.org/index.php/Swap

http://linux.die.net/man/8/swapon

swapon -s
  # equivalent to cat /proc/swaps

free -m
  # shows memory used
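/proc/swaps lists one swap area per line after a header, with the Size and Used columns in KiB (newer util-linux prefers `swapon --show` over `swapon -s` for a readable view). A sketch that totals them, assuming that column layout; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# swap_usage [FILE]: print "TOTAL_KIB USED_KIB" summed over all swap areas.
# /proc/swaps columns: Filename Type Size Used Priority (sizes in KiB).
swap_usage() {
    awk 'NR > 1 { total += $3; used += $4 }
         END { print total + 0, used + 0 }' "${1:-/proc/swaps}"
}
```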

Linux Performance: Why You Should Almost Always Add Swap Space - [14]

Ext2/3/4

http://en.wikipedia.org/wiki/Ext3 - w/ journaling
http://en.wikipedia.org/wiki/Ext4 - ("ext3.5")

kjournald is responsible for the journal of ext3 [15]

http://extundelete.sourceforge.net/ - undelete in emergencies

http://www.ext2fsd.com - Open source ext3/4 file system driver for Windows (2K/XP/WIN7/WIN8)

https://github.com/gerard/ext4fuse - a read-only implementation of ext4 for FUSE. The main reason this exists is to be able to read linux partitions from OSX. However, it should work on top of any FUSE implementation. Linux and FreeBSD have been tested to some point and I've heard that OpenSolaris should also work.

ZFS

http://en.wikipedia.org/wiki/ZFS - GPL incompatibility, CDDL license, Sun

http://zfsonlinux.org/
- https://wiki.archlinux.org/index.php/ZFS
- https://wiki.archlinux.org/index.php/ZFS_on_FUSE

https://wiki.ubuntu.com/ZFS/ZPool

"FreeBSD ZFS tuning guide wiki indicates you'll need about 5GB of ram per 1TB of saved disk space"

http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm

http://etbe.coker.com.au/2012/04/17/zfs-btrfs-cheap-servers/

http://www.open-zfs.org/wiki/Main_Page [16]

https://github.com/j-keck/zfs-snap-diff [20]

https://news.ycombinator.com/item?id=11695463

Btrfs

General

https://btrfs.wiki.kernel.org/index.php/Main_Page

http://www.funtoo.org/wiki/BTRFS_Fun

https://help.ubuntu.com/community/btrfs

https://wiki.debian.org/Btrfs

Setting up Btrfs on Arch Linux

Arch Linux step-by-step installation on btrfs

https://github.com/egara/arch-btrfs-installation

http://www.rkeene.org/projects/info/wiki.cgi/165 - terminology

Subvolumes appear like directories, but each has its own inode number space (the top directory of a subvolume has inode number 256).

"Btrfs support is included in the linux package (as a module). Needs a reboot after installing before btrfs recognised. User space utilities are available in btrfs-progs. For multi-devices support (RAID like feature of btrfs) aka btrfs volume in early boot, you have to enable btrfs mkinitcpio hook (provided by mkinitcpio package) to be able to use, for example, a root btrfs volume. If the btrfs volume is a non-system volume, one only needs to set USEBTRFS="yes" in /etc/rc.conf. However, if you only use bare btrfs partition, such options are not needed."

"The btrfs scrub command reads redundant data and validates all the checksums, correcting any errors it finds along the way, using the checksum to determine which copy is the valid one. But with a single drive, how can it correct anything? The metadata - the file system overhead that is used to manage your data - is always stored in a redundant manner by default, even on a single drive. As a result, any corrupted metadata can be corrected, on the fly."

"EXT4 checksums its journal, which AFAIK will protect against errors caused by sync failures (ie. power failure during disk I/O). But it’s not going to protect against latent sector errors. To do that, you need checksumming on all the file data, along the lines of what ZFS or BTRFS provides."

A cross-subvolume copy patch has made it into Linux 3.6-rc. This patch will allow cp --reflink across subvolumes, as long as the copy does not cross mount points.

I Can't Believe This is Butter! A tour of btrfs. - Avi Miller - Jan 19, 2012

copy-on-write, without the RAM requirement of ZFS
snapshots every 30 seconds, ability to mount from previous gen

https://news.ycombinator.com/item?id=22159204

Commands

mkfs.btrfs -L [label] /dev/[device]

mount -t btrfs /dev/sdg /mnt/drivename

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-device

btrfs device add /dev/sdc /mnt/btrfs

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-filesystem

btrfs filesystem df /media/drivename

btrfs filesystem show

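btrfs filesystem defragment /
  # add -r to recurse into the files below; defragmenting can unshare extents shared with snapshots or reflinked copies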

btrfs-debug-tree -R /dev/sdg
  # show drive/subvolume info; works on an unmounted device

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-subvolume

btrfs subvolume create [<dest>/]<name>

btrfs subvolume snapshot /mnt/btrfs /mnt/btrfs/snapshot_of_root

btrfs subvolume delete [<dest>/]<name>
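A common pattern is one read-only, timestamped snapshot per run (e.g. from cron or a systemd timer). A sketch using `btrfs subvolume snapshot -r`; the snapshot directory, naming scheme and function names are hypothetical, and it only prints the command unless `DRY_RUN=0` is set, since the real command needs root.

```shell
#!/usr/bin/env bash
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# snapshot_name PREFIX: PREFIX-YYYY-MM-DD_HHMMSS
snapshot_name() { echo "$1-$(date +%Y-%m-%d_%H%M%S)"; }

# snapshot SUBVOL DESTDIR PREFIX: read-only (-r) snapshot of SUBVOL into DESTDIR.
snapshot() {
    local subvol=$1 dest=$2 prefix=$3
    run btrfs subvolume snapshot -r "$subvol" "$dest/$(snapshot_name "$prefix")"
}
```

For example, `DRY_RUN=0 snapshot /mnt/btrfs /mnt/btrfs/.snapshots root` as root. Read-only snapshots are also what `btrfs send` requires for incremental backups.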

Cloning a file between subvolumes:

cp --reflink /mnt/MYFILES/myfile1 /mnt/MYFILES/myfile3

https://btrfs.wiki.kernel.org/index.php/Compression

mount -t btrfs -o compress=lzo /dev/sdg /mnt/drivename

https://btrfs.wiki.kernel.org/index.php/Btrfsck

In a nutshell, you should look at:

btrfs scrub to detect issues on live filesystems
look at btrfs detected errors in syslog
mount -o ro,recovery to mount a filesystem with issues
btrfs-zero-log might help in specific cases.
btrfs restore will help you copy data off a broken btrfs filesystem.
btrfs check --repair, aka btrfsck is your last option if the ones above have not worked.

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-scrub

btrfs scrub start [options] <path>|<device>
  # scrub btrfs filesystem, verify block checksums (check progress with btrfs scrub status <path>)

Tools

btrfs-gui is a graphical user interface tool for inspecting and managing btrfs filesystems. It is capable of managing filesystems on the local machine, and filesystems on remote network-accessible machines. It requires root access to the machine to perform most of its tasks (but separates the root-access part from the GUI).

Snapper is a tool for managing btrfs snapshots. Apart from the obvious creation and deletion of snapshots it can compare snapshots and revert differences between snapshots. In simple terms, this allows users to view older versions of files and revert changes. Snapper is available as a command line interface tool and a YaST module. Both make use of the C++ library libsnapper which is also available to other programs.
- https://wiki.archlinux.org/index.php/Snapper
- http://kuther.net/blog/using-opensuses-snapper-archlinux-manage-btrfs-snapshots

Articles

XFS

http://en.wikipedia.org/wiki/XFS

http://lwn.net/Articles/476263/ - XFS
- http://www.youtube.com/watch?v=FegjLbCnoBw

https://lwn.net/Articles/638546/ [23]

bcachefs

https://bcache.evilpiepirate.org/
- https://www.patreon.com/bcachefs [24]

ZUFS

https://github.com/NetApp/zufs-zus - zero-copy user-mode server. User-mode server which delegates zufs commands to specific file-system implementation.

LWN.net: The ZUFS zero-copy filesystem

LWN.net: Taking ZUFS upstream

WAFL

https://en.wikipedia.org/wiki/Write_Anywhere_File_Layout - a file layout that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure, and growing the file system's size quickly. It was designed by NetApp for use in its storage appliances like NetApp FAS, AFF, Cloud Volumes ONTAP and ONTAP Select. Its author claims that WAFL is not a file system, although it includes one. It tracks changes, similarly to journaling file systems, as logs (known as NVLOGs) in a dedicated non-volatile random access memory device, referred to as NVRAM or NVMEM. WAFL provides mechanisms that enable a variety of file systems and technologies that want to access disk blocks.

NILFS

https://en.wikipedia.org/wiki/NILFS - a log-structured file system implementation for the Linux kernel. It is being developed by Nippon Telegraph and Telephone Corporation (NTT) CyberSpace Laboratories and a community from all over the world. NILFS was released under the terms of the GNU General Public License (GPL).

DOS / Windows

FAT

https://en.wikipedia.org/wiki/File_Allocation_Table

FAT is a family of filesystems, comprising at least, in chronological order:

FAT12, a filesystem used on floppies since the early 1980s, in particular by MS-DOS;
FAT16, a small modification of FAT12 supporting larger media, introduced to support hard disks;
vFAT, which is backward compatible with FAT, but allows files to have longer names which only vFAT-aware applications running on vFAT-aware operating systems can see;
FAT32, another modification of FAT16 designed to support larger disk sizes. In practice FAT32 is almost always used with vFAT long file name support, but technically 16/32 and long-file-names-yes/no are independent.

Because those filesystems are very similar, they're usually handled by the same drivers and tools. mkfs.vfat and mkfs.fat are the same tool; an empty FAT16 filesystem and an empty vFAT filesystem look exactly the same, so mkfs doesn't need to distinguish between them. (You can think of FAT16 and vFAT as two different ways of seeing the same filesystem rather than two separate filesystem formats.) [25]

https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system

Petit FAT

Petit FAT File System Module - a subset of the FatFs module for tiny 8-bit microcontrollers. It is written in compliance with ANSI C and completely separated from the disk I/O layer. It can be incorporated into tiny microcontrollers with limited memory, even if the RAM size is less than the sector size. A full-featured FAT file system module (FatFs) is also available.

NTFS

http://en.wikipedia.org/wiki/NTFS - Windows

http://linux.die.net/man/8/ntfsfix

http://www.tuxera.com/community/ntfs-3g-download/
- https://en.wikipedia.org/wiki/NTFS-3G

ReFS

https://en.wikipedia.org/wiki/ReFS - a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS. ReFS was designed to overcome problems that had become significant over the years since NTFS was conceived, which are related to how data storage requirements had changed. The key design advantages of ReFS include automatic integrity checking and data scrubbing, removal of the need for running chkdsk, protection against data degradation, built-in handling of hard disk drive failure and redundancy, integration of RAID functionality, a switch to copy/allocate on write for data and metadata updates, handling of very long paths and filenames, and storage virtualization and pooling, including almost arbitrarily sized logical volumes (unrelated to the physical sizes of the used drives).

Apple

HFS

https://en.wikipedia.org/wiki/Hierarchical_File_System - a proprietary file system developed by Apple Inc. for use in computer systems running Mac OS. Originally designed for use on floppy and hard disks, it can also be found on read-only media such as CD-ROMs. HFS is also referred to as Mac OS Standard (or, erroneously, "HFS Standard"), while its successor, HFS Plus, is also called Mac OS Extended (or, erroneously, "HFS Extended"). With the introduction of Mac OS X 10.6, Apple dropped support for formatting or writing HFS disks and images, which remain supported as read-only volumes.

HFS+

https://en.wikipedia.org/wiki/HFS_Plus - or HFS+ is a file system developed by Apple Inc. It replaced the Hierarchical File System (HFS) as the primary file system of Apple computers with the 1998 release of Mac OS 8.1. HFS+ continued as the primary Mac OS X file system until it was itself replaced with the release of the Apple File System (APFS) with macOS High Sierra in 2017. HFS+ is also one of the formats used by the iPod digital music player. It is also referred to as Mac OS Extended or HFS Extended, while its predecessor, HFS, is also referred to as Mac OS Standard or HFS Standard. During development, Apple referred to this file system with the codename Sequoia. HFS Plus is an improved version of HFS, supporting much larger files (block addresses are 32 bits long instead of 16 bits) and using Unicode (instead of Mac OS Roman or any of several other character sets) for naming items. Like HFS, HFS Plus uses B-trees to store most volume metadata, but unlike most other file systems, HFS Plus supports hard links to directories. HFS Plus permits filenames up to 255 characters in length, and n-forked files similar to NTFS, though until 2005 almost no system software took advantage of forks other than the data fork and resource fork. HFS Plus also uses a full 32-bit allocation mapping table rather than HFS's 16 bits, significantly improving space utilization with large disks.

APFS

https://en.wikipedia.org/wiki/Apple_File_System - a proprietary file system for macOS High Sierra and later, iOS 10.3 and later, tvOS 10.2 and later, and watchOS 3.2 and later, developed and deployed by Apple Inc. It aims to fix core problems of HFS+ (also called Mac OS Extended), APFS's predecessor on these operating systems. Apple File System is optimized for flash and solid-state drive storage, with a primary focus on encryption.

http://dtrace.org/blogs/ahl/2016/06/19/apfs-part1/ [26] - Apple, 2016

http://arstechnica.com/apple/2016/06/a-zfs-developers-analysis-of-the-good-and-bad-in-apples-new-apfs-file-system/ [27]

Flash

https://en.wikipedia.org/wiki/Flash_file_system - a file system designed for storing files on flash memory–based storage devices. While the flash file systems are closely related to file systems in general, they are optimized for the nature and characteristics of flash memory (such as to avoid write amplification), and for use in particular operating systems.

https://www.ibm.com/developerworks/linux/library/l-flash-filesystems/

https://www.micron.com/~/media/documents/products/technical-note/nand-flash/tn2961_wear_leveling_in_nand.pdf

flashdba - Oracle databases, storage and the high-performance world of flash memory
- Understanding Flash: The Flash Translation Layer

exFAT

https://en.wikipedia.org/wiki/exFAT - a Microsoft file system introduced in 2006 optimized for flash memory such as USB flash drives and SD cards. It is proprietary and Microsoft owns patents on several elements of its design. exFAT can be used where the NTFS file system is not a feasible solution (due to data structure overhead), but where a file size limit larger than that of the standard FAT32 file system (i.e. 4 GiB) is needed. exFAT has been adopted by the SD Card Association as the default file system for SDXC cards larger than 32 GiB.

The exFAT filesystem is coming to Linux—Paragon software’s not happy about it | Ars Technica - [28]

SFFS

SFFS Flash File System - a Safe Flash File System that can support almost any NOR or NAND flash device. It provides a high degree of reliability and complete protection against unexpected power failure or reset events. SFFS provides wear leveling, bad block handling and ECC algorithms to ensure you get optimal use out of a flash device. SFFS is pre-integrated with the MQX RTOS and allows you to quickly create a robust file system for an embedded device using on-chip or on-board flash devices. The SFFS Flash File System was specifically designed for embedded systems.

F2FS

F2FS Wiki - a file system that, from the start, takes into account the characteristics of NAND flash memory-based storage devices (such as solid-state disks, eMMC, and SD cards), which are widely used in computer systems ranging from mobile devices to servers. F2FS was designed on the basis of a log-structured file system approach, which it adapted to newer forms of storage. Jaegeuk Kim, the principal F2FS author, has stated that it remedies some known issues of the older log-structured file systems, such as the snowball effect of wandering trees and high cleaning overhead. In addition, since a NAND-based storage device shows different characteristics according to its internal geometry or flash memory management scheme (such as the Flash Translation Layer or FTL), it supports various parameters not only for configuring on-disk layout, but also for selecting allocation and cleaning algorithms.
- https://en.wikipedia.org/wiki/F2FS - a flash file system initially developed by Samsung Electronics for the Linux kernel.
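The log-structured idea F2FS (and NILFS above) builds on can be shown with a toy in-memory store: writes never overwrite in place, they append a new record and update an index, and a cleaner later copies the still-live records so the space held by stale ones can be reused. That copying is the "cleaning overhead" mentioned above. This is a hypothetical Python illustration, not F2FS's on-disk format.

```python
class LogStore:
    """Toy log-structured block store (hypothetical, for illustration)."""

    def __init__(self):
        self.log = []    # append-only list of (logical_block, data) records
        self.index = {}  # logical block -> position of its newest record

    def write(self, block: int, data: bytes) -> None:
        # Never overwrite in place: append a new record and repoint the index.
        self.index[block] = len(self.log)
        self.log.append((block, data))

    def read(self, block: int) -> bytes:
        return self.log[self.index[block]][1]

    def live_fraction(self) -> float:
        # Share of records that are still the newest copy of some block.
        return len(self.index) / len(self.log) if self.log else 1.0

    def clean(self) -> int:
        # Copy live records into a fresh log, dropping stale ones.
        # The number of copied records is the cleaning cost.
        new_log, new_index = [], {}
        for block, pos in sorted(self.index.items(), key=lambda kv: kv[1]):
            new_index[block] = len(new_log)
            new_log.append(self.log[pos])
        self.log, self.index = new_log, new_index
        return len(new_log)


store = LogStore()
store.write(0, b"a")
store.write(1, b"b")
store.write(0, b"c")  # supersedes the first record for block 0
print(store.live_fraction())  # 2 of the 3 records are live
print(store.clean(), store.read(0), store.read(1))  # 2 b'c' b'b'
```

The fuller the log is with live data, the more a cleaner has to copy to free the same amount of space, which is why allocation and cleaning policy matter so much for this design.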

LWN: An f2fs teardown

YAFFS

A Robust Flash File System Since 2002 | Yaffs - A Flash File System for embedded use - (Yet Another Flash File System) is an open-source file system specifically designed to be fast, robust and suitable for embedded use with NAND and NOR Flash. It is widely used with Linux, RTOSs, or no OS at all, in consumer devices, avionics, and critical infrastructure. It is available under the GNU General Public License (GPL), or on commercial terms from Aleph One.

NASA TESS Mission - The Transiting Exoplanet Survey Satellite (TESS) will discover thousands of exoplanets in orbit around the brightest stars in the sky.

spiffs

https://github.com/pellepl/spiffs - Wear-leveled SPI flash file system for embedded devices

log_fs.h

Piconomix FW Library: log_fs.h - Simple record-based file system for Serial Flash.

AXFS

https://github.com/jaredeh/axfs - The Advanced XIP File System is a Linux kernel filesystem driver that enables files to be executed directly from flash or ROM memory rather than being copied into RAM.

Networked

https://en.wikipedia.org/wiki/Category:Network_file_systems

https://wiki.archlinux.org/index.php/Diskless_system

https://en.wikipedia.org/wiki/Category:Distributed_file_systems

Bazil is a distributed file system designed for single-person disconnected operation. It lets you share your files across all your computers, with or without cloud services. Bazil's FUSE (bazil.org/fuse) is a library for writing file systems in userspace, in Go.

Ori - a distributed P2P file system built for offline operation that empowers the user with control over synchronization operations and conflict resolution. We provide history through lightweight snapshots and allow users to verify that the history has not been tampered with. Through replication, instances can be resilient and recover damaged data from other nodes. [29]
- https://bitbucket.org/orifs/ori/wiki/Home
- PDF: Replication, History, and Grafting in the Ori File System.

http://www.gluster.org/

NFS

http://datatracker.ietf.org/wg/nfsv4/charter

nfs - fstab format and options for the nfs file systems
mount.nfs

showmount -e server-ip-address
  # list the directories an NFS server exports

https://github.com/sahlberg/libnfs

http://querieslinux.blogspot.co.uk/2009/08/what-is-port-map-why-is-it-required.html

http://linux.die.net/man/8/mount.cifs

SMB / CIFS

http://en.wikipedia.org/wiki/Server_Message_Block - SMB, one version of which was also known as Common Internet File System (CIFS), operates as an application-layer network protocol mainly used for providing shared access to files, printers, and serial ports and miscellaneous communications between nodes on a network. It also provides an authenticated inter-process communication mechanism. Most usage of SMB involves computers running Microsoft Windows, where it was known as "Microsoft Windows Network" before the introduction of Active Directory. Corresponding Windows services are LAN Manager Server (for the server component) and LAN Manager Workstation (for the client component).

Samba - opening windows to a wider world - the standard Windows interoperability suite of programs for Linux and Unix. Samba is Free Software licensed under the GNU General Public License; the Samba project is a member of the Software Freedom Conservancy. Since 1992, Samba has provided secure, stable and fast file and print services for all clients using the SMB/CIFS protocol, such as all versions of DOS and Windows, OS/2, Linux and many others. Samba is an important component to seamlessly integrate Linux/Unix servers and desktops into Active Directory environments. It can function either as a domain controller or as a regular domain member.
- http://en.wikipedia.org/wiki/Samba_(software)

Using Samba - O'Reilly Media
- [Chapter 4] Disk Shares

YouTube: Samba Share for Beginners

NBD

Network Block Device - What is it: With this compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read /dev/nbd0, it sends a request to the server via TCP, which replies with the data requested. This can be used for stations with low disk space (or even diskless, if you use an initrd) to borrow disk space from other computers. Unlike NFS, it is possible to put any file system on it. But (also unlike NFS), if someone has mounted NBD read/write, you must ensure that no one else has it mounted.

https://en.wikipedia.org/wiki/Network_block_device - a device node whose content is provided by a remote machine. Typically, network block devices are used to access a storage device that does not physically reside in the local machine but on a remote one. As an example, a local machine can access a hard disk drive that is attached to another computer.
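Each read or write the client issues travels as a small fixed-size request header over the TCP connection. A Python sketch that packs the transmission-phase request layout from the NBD protocol document (32-bit magic 0x25609513, 16-bit command flags, 16-bit type, 64-bit handle, 64-bit offset, 32-bit length, big-endian). It leaves out the handshake and the server's replies, and the function names are hypothetical.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2

# magic, command flags, type, handle, offset, length; network (big-endian) byte order
REQUEST = struct.Struct(">IHHQQI")


def nbd_request(cmd: int, handle: int, offset: int, length: int, flags: int = 0) -> bytes:
    return REQUEST.pack(NBD_REQUEST_MAGIC, flags, cmd, handle, offset, length)


def parse_request(raw: bytes) -> dict:
    magic, flags, cmd, handle, offset, length = REQUEST.unpack(raw)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("not an NBD request")
    return {"flags": flags, "cmd": cmd, "handle": handle,
            "offset": offset, "length": length}


# Ask for the first 4 KiB of the export. A write request would be
# followed on the wire by `length` bytes of data.
req = nbd_request(NBD_CMD_READ, handle=1, offset=0, length=4096)
print(len(req), req[:4].hex())  # 28 25609513
```

The handle is echoed back in the server's reply, which is how a client matches replies to outstanding requests.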

xNBD - yet another NBD (Network Block Device) server program, which works with the NBD client driver of Linux Kernel.

9P

https://en.wikipedia.org/wiki/9P_(protocol) - a network protocol developed for the Plan 9 from Bell Labs distributed operating system as the means of connecting the components of a Plan 9 system. Files are key objects in Plan 9. They represent windows, network connections, processes, and almost anything else available in the operating system. 9P was revised for the 4th edition of Plan 9 under the name 9P2000, containing various improvements, including the removal of certain filename restrictions, the addition of a 'last modifier' metadata field for directories, and authentication files. The latest version of the Inferno operating system also uses 9P2000. The Inferno file protocol was originally called Styx, but technically it has always been a variant of 9P.
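9P messages share a simple framing: size[4] type[1] tag[2] followed by type-specific fields, all little-endian, with strings sent as a 2-byte length plus UTF-8 bytes. As a Python sketch, this builds the Tversion message a client sends first to negotiate the maximum message size (msize) and protocol version. The helper names are hypothetical; a Linux v9fs client would ask for "9P2000.L" rather than plain "9P2000".

```python
import struct

TVERSION = 100  # Tversion message type in 9P2000
NOTAG = 0xFFFF  # Tversion is sent with the special "no tag" value


def p9_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data  # s[2]: length-prefixed string


def p9_message(msg_type: int, tag: int, body: bytes) -> bytes:
    # size[4] counts the whole message, including the size field itself.
    size = 4 + 1 + 2 + len(body)
    return struct.pack("<IBH", size, msg_type, tag) + body


def tversion(msize: int, version: str = "9P2000") -> bytes:
    return p9_message(TVERSION, NOTAG, struct.pack("<I", msize) + p9_string(version))


msg = tversion(8192)
print(len(msg))  # 19
```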

https://github.com/nfs-ganesha/nfs-ganesha - an NFSv3, v4 and v4.1 file server that runs in user mode on most UNIX/Linux systems. It also supports the 9P2000.L protocol.

Union mount

https://en.wikipedia.org/wiki/Union_mount - a way of combining multiple directories into one that appears to contain their combined contents. Union mounting is supported in Linux, BSD and several of its successors, and Plan 9, with similar but subtly different behavior. As an example application of union mounting, consider the need to update the information contained on a CD-ROM or DVD. While a CD-ROM is not writable, one can overlay the CD's mount point with a writable directory in a union mount. Then, updating files in the union directory will cause them to end up in the writable directory, giving the illusion that the CD-ROM's contents have been updated.
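The CD-ROM example can be modelled in a few lines: lookups check the writable upper layer before the read-only lower one, every write lands in the upper layer, and deleting a file that exists below records a "whiteout" that hides it. A hypothetical Python sketch of those semantics using dicts as directories; real implementations such as OverlayFS do this in the kernel on actual directories and store whiteouts on disk.

```python
class UnionView:
    """Toy model of a union mount: a writable upper layer over read-only lower layers."""

    def __init__(self, upper: dict, *lowers: dict):
        self.upper = upper      # writable layer (e.g. a directory on disk)
        self.lowers = lowers    # read-only layers, highest priority first (e.g. a CD-ROM)
        self.whiteouts = set()  # names deleted in the union but still present below

    def read(self, name: str) -> bytes:
        if name in self.whiteouts:
            raise FileNotFoundError(name)
        for layer in (self.upper, *self.lowers):
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)

    def write(self, name: str, data: bytes) -> None:
        # All changes land in the upper layer; lower layers are never modified.
        self.whiteouts.discard(name)
        self.upper[name] = data

    def delete(self, name: str) -> None:
        self.read(name)  # raises if the name is not visible at all
        self.upper.pop(name, None)
        if any(name in layer for layer in self.lowers):
            self.whiteouts.add(name)  # hide the read-only copy

    def listdir(self) -> list:
        names = set(self.upper)
        for layer in self.lowers:
            names |= set(layer)
        return sorted(names - self.whiteouts)


cdrom = {"readme.txt": b"v1", "old.txt": b"obsolete"}
overlay = {}
view = UnionView(overlay, cdrom)
view.write("readme.txt", b"v2")  # "updates" the CD-ROM
view.delete("old.txt")
print(view.read("readme.txt"), view.listdir())  # b'v2' ['readme.txt']
print(cdrom)  # the read-only layer is unchanged
```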

Union mounting was implemented for Linux 0.99 in 1993; this initial implementation was called the Inheriting File System, but was abandoned by its developer because of its complexity. The next major implementation was UnionFS, which grew out of the FiST project at Stony Brook University. An attempt to replace UnionFS, aufs, was released in 2006, followed in 2009 by OverlayFS. Only in 2014 was this last union mount implementation added to the standard Linux kernel source code. Similarly, GlusterFS offers the possibility to mount different filesystems distributed across a network, rather than being located on the same machine.

UnionFS

https://en.wikipedia.org/wiki/UnionFS

aufs

http://aufs.sourceforge.net/
- https://en.wikipedia.org/wiki/Aufs

Overlayfs

https://en.wikipedia.org/wiki/Overlayfs

mhddfs

https://romanrm.net/mhddfs

mergerfs

https://github.com/trapexit/mergerfs - a union filesystem geared towards simplifying storage and management of files across numerous commodity storage devices. It is similar to mhddfs, unionfs, and aufs.

Distributed

https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems - a distributed file system (DFS) or network file system is any file system that allows access to files from multiple hosts sharing via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources. Distributed file systems differ in their performance, mutability of content, handling of concurrent writes, handling of permanent or temporary loss of nodes or storage, and their policy of storing content.

Ceph

Ceph - a distributed object store and file system designed to provide excellent performance, reliability and scalability. Ceph's main goals are to be POSIX-compatible, and completely distributed without a single point of failure. The data is seamlessly replicated, making it fault tolerant. Clients mount the file system using a Linux kernel client. On March 19, 2010, Linus Torvalds merged the Ceph client for Linux kernel 2.6.34 which was released on May 16, 2010. An older FUSE-based client is also available. The servers run as regular Unix daemons.

YouTube: Owen Synge: ceph: a gentle introduction
YouTube: A Gentle Introduction to Ceph
YouTube: Distributed storage is easier now: usability from Ceph Luminous to Nautilus

seaweedfs

https://github.com/chrislusf/seaweedfs - a simple and highly scalable distributed file system with two objectives: to store billions of files, and to serve the files fast. SeaweedFS implements an object store with O(1) disk seek and an optional Filer with a POSIX interface, supporting the S3 API, FUSE mounts and Hadoop compatibility.

LizardFS

LizardFS - Get your storage up and running in 28 minutes
- https://github.com/lizardfs/lizardfs

Clustered / parallel

https://en.wikipedia.org/wiki/Clustered_file_system - a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage for each node). Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

GlusterFS

Gluster - a free and open source software scalable network filesystem.
- https://en.wikipedia.org/wiki/Gluster#GlusterFS - a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks. GlusterFS was developed originally by Gluster, Inc. and then by Red Hat, Inc., as a result of Red Hat acquiring Gluster in 2011.
- https://github.com/gluster/glusterfs

OrangeFS

http://www.orangefs.org/
- https://en.wikipedia.org/wiki/OrangeFS

https://lwn.net/Articles/643165/

Greyhole

Greyhole - An application that uses Samba to create a storage pool of all your available hard drives, and allows you to create redundant copies of the files you store, in order to prevent data loss when part of your hardware fails.
- https://github.com/gboudreau/Greyhole

OpenDedup

OpenDedup - A clustered deduplicated file system.
- https://github.com/opendedup/sdfs

to sort

Opendedup develops SDFS, a file system that performs inline deduplication.

https://oss.oracle.com/~mason/seekwatcher/

http://murderfs.com/

https://github.com/philipl/pifs

http://www.shysecurity.com/posts/pingfs

http://live.gnome.org/OSTree
- http://blog.verbum.org/2013/08/26/ostree-v2013-6-released/ [https://news.ycombinator.com/item?id=6277518]

http://t-o-c-c.com/ [30]

https://github.com/unbit/spockfs [31] - HTTP

http://xtreemfs.org/ [32]

http://tmsu.org/ - tag based
- https://github.com/oniony/TMSU

https://github.com/cosmos72/fstransform - tool for in-place filesystem conversion (for example from jfs/xfs/reiser to ext2/ext3/ext4) without backup

GFS / GFS2

http://www.sourceware.org/cluster/gfs/ - a cluster file system. It allows a cluster of computers to simultaneously use a block device that is shared between them (with FC, iSCSI, NBD, etc.). GFS reads and writes to the block device like a local filesystem, but also uses a lock module to allow the computers to coordinate their I/O so filesystem consistency is maintained. One of the nifty features of GFS is perfect consistency: changes made to the filesystem on one machine show up immediately on all other machines in the cluster.

https://en.wikipedia.org/wiki/GFS2 - a shared-disk file system for Linux computer clusters. GFS2 differs from distributed file systems (such as AFS, Coda, InterMezzo, or GlusterFS) because GFS2 allows all nodes to have direct concurrent access to the same shared block storage. In addition, GFS or GFS2 can also be used as a local filesystem.

Embedded

https://github.com/ARMmbed/littlefs - A little fail-safe filesystem designed for embedded systems. [33]

Database

https://github.com/samzyy/DB-based-replicated-filesystem

YouTube: Database as Filesystem

LibreCat/Catmandu - data processing toolkit, a new institutional repository system developed by LibreCat Group. LibreCat was developed in 2013 in Bielefeld and was made available on GitHub from the start. Since 2015 the code has been in production at Bielefeld. In 2016 Ghent University joined the development and uses the cataloging backend in production.

Compressed filesystems

https://en.wikipedia.org/wiki/Category:Compression_file_systems - filesystems that support compression of some sort, both read only, changeable data and as an extra feature.

zisofs

https://www.kernel.org/pub/linux/utils/fs/zisofs/

https://dev.lovelyhq.com/libburnia/web/wikis/zisofs

https://ru.wikipedia.org/wiki/Zisofs

Squashfs

Squashfs - a compressed read-only filesystem for Linux. Squashfs is intended for general read-only filesystem use, for archival use (i.e. in cases where a .tar.gz file may be used), and in constrained block device/memory systems (e.g. embedded systems) where low overhead is needed.

https://github.com/vasi/squashfuse - FUSE filesystem to mount squashfs archives

Image files

http://en.wikipedia.org/wiki/Disk_image

http://en.wikipedia.org/wiki/ISO_image

Tools

http://en.wikipedia.org/wiki/AcetoneISO

http://he.fi/bchunk/
ccd2iso - CloneCD image to ISO image file converter
http://users.eastlink.ca/~doiron/bin2iso/

http://www.psychocats.net/ubuntu/partimage

http://clonezilla.org/

http://en.wikipedia.org/wiki/Remastersys

http://snapper.io/

https://wiki.gnome.org/Apps/Brasero

Quotas

https://en.wikipedia.org/wiki/Disk_quota

https://wiki.archlinux.org/index.php/disk_quota

http://cateee.net/lkddb/web-lkddb/QUOTA.html

https://sourceforge.net/projects/linuxquota/

http://www.yolinux.com/TUTORIALS/LinuxTutorialQuotas.html

Encryption

http://en.wikipedia.org/wiki/List_of_cryptographic_file_systems

http://en.wikipedia.org/wiki/Comparison_of_disk_encryption_software

https://www.freedesktop.org/software/systemd/man/crypttab.html - Configuration for encrypted block devices. The /etc/crypttab file describes encrypted block devices that are set up during system boot.

LUKS

http://en.wikipedia.org/wiki/Linux_Unified_Key_Setup - or LUKS is a disk-encryption specification created by Clemens Fruhwirth in 2004 and originally intended for Linux.

https://unix.stackexchange.com/questions/5017/ssh-to-decrypt-encrypted-lvm-during-headless-server-boot

EncFS

https://en.wikipedia.org/wiki/EncFS

http://tom.noflag.org.uk/cryptkeeper.html

securefs

https://github.com/netheril96/securefs - securefs is a filesystem in userspace (FUSE) that transparently encrypts and authenticates data stored. It is particularly designed to secure data stored in the cloud. securefs mounts a regular directory onto a mount point. The mount point appears as a regular filesystem, where one can read/write/create files, directories and symbolic links. The underlying directory will be automatically updated to contain the encrypted and authenticated contents.

eCryptfs

http://en.wikipedia.org/wiki/ECryptfs

TrueCrypt / CipherShed

http://www.truecrypt.org/
- http://en.wikipedia.org/wiki/TrueCrypt
- http://www.youtube.com/watch?v=HHoJ9pQ0cn8&t=57m

https://github.com/bwalex/tc-play - Free and simple TrueCrypt Implementation based on dm-crypt

CipherShed - free (as in free-of-charge and free-speech) encryption software for keeping your data secure and private. It started as a fork of the now-discontinued TrueCrypt Project.
- https://github.com/CipherShed/CipherShed

Rubberhose

http://marutukku.org/
- http://marutukku.org/current/src/doc/maruguide/t1.html
- http://en.wikipedia.org/wiki/Rubberhose_(file_system)

Emulation

https://wiki.archlinux.org/index.php/CDemu

Repair / recovery

https://wiki.archlinux.org/index.php/File_recovery

TestDisk / photorec

http://www.cgsecurity.org/wiki/TestDisk
- https://www.cgsecurity.org/wiki/TestDisk_Livecd

ddrescue

ddrescue - a data recovery tool. It copies data from one file or block device (hard disc, cdrom, etc) to another, trying to rescue the good parts first in case of read errors.
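The "good parts first" strategy can be sketched as two kinds of pass: one quick attempt at every block, skipping failures, then retries limited to the blocks that failed. A simplified Python illustration with a simulated device; GNU ddrescue itself is more elaborate (it keeps a mapfile and works in copying, trimming, scraping and retrying phases). The function names are hypothetical, and zero-filling unreadable blocks is an assumption borrowed from recoverdm's empty-sector behaviour.

```python
def rescue(read_block, nblocks: int, retries: int = 2, block_size: int = 512):
    """Copy every readable block first, then retry only the failures."""
    out = [None] * nblocks
    bad = []
    for i in range(nblocks):  # pass 1: one attempt per block, skip errors quickly
        try:
            out[i] = read_block(i)
        except OSError:
            bad.append(i)
    for _ in range(retries):  # later passes: revisit only the bad areas
        still_bad = []
        for i in bad:
            try:
                out[i] = read_block(i)
            except OSError:
                still_bad.append(i)
        bad = still_bad
    for i in bad:  # give up on these: fill with zeros
        out[i] = bytes(block_size)
    return b"".join(out), bad


attempts = {}


def flaky_read(i: int) -> bytes:
    """Simulated device: block 5 is dead, block 3 fails only on the first try."""
    attempts[i] = attempts.get(i, 0) + 1
    if i == 5 or (i == 3 and attempts[i] == 1):
        raise OSError(5, "Input/output error")
    return bytes([i]) * 512


image, unreadable = rescue(flaky_read, 8)
print(len(image), unreadable)  # 4096 [5]
```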

recoverdm

recoverdm - recover files/disks with damaged sectors. You can recover files as well as complete devices. If it finds sectors which simply cannot be recovered, it writes an empty sector to the output file and continues. If you're recovering a CD or a DVD and the program cannot read the sector in "normal mode", then the program will try to read the sector in "RAW mode" (without error checking etc.). This toolkit also has a utility called 'mergebad', which merges multiple images into one. This can be useful when you have, for example, multiple CDs with the same data which are all damaged. In such a case, you can first use recoverdm to retrieve the data from the damaged CDs into image files and then combine them into one image with mergebad.
- https://github.com/flok99/recoverdm
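The idea behind mergebad can be sketched as: for each sector, take it from the first image in which it was read successfully, and fall back to an empty sector only when every copy is damaged. In this hypothetical Python version an image is a list of sectors with None marking an unreadable one, and all images are assumed to be the same length; the real tool works on image files.

```python
def merge_images(*images, sector_size: int = 2048):
    """Combine damaged copies of the same data, sector by sector."""
    merged, missing = [], []
    for n, copies in enumerate(zip(*images)):
        good = next((s for s in copies if s is not None), None)
        if good is None:
            missing.append(n)
            good = bytes(sector_size)  # unreadable in every copy: empty sector
        merged.append(good)
    return merged, missing


disc1 = [b"A", None, b"C", None]
disc2 = [None, b"B", b"C", None]
merged, missing = merge_images(disc1, disc2, sector_size=1)
print(merged, missing)  # [b'A', b'B', b'C', b'\x00'] [3]
```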

Files & directories

https://en.wikipedia.org/wiki/Computer_file

https://en.wikipedia.org/wiki/Directory_(computing) - a file system cataloging structure which contains references to other computer files, and possibly other directories. On many computers, directories are known as folders, or drawers to provide some relevancy to a workbench or the traditional office file cabinet. Files are organized by storing related files in the same directory. In a hierarchical filesystem (that is, one in which files and directories are organized in a manner that resembles a tree), a directory contained inside another directory is called a subdirectory. The terms parent and child are often used to describe the relationship between a subdirectory and the directory in which it is cataloged, the latter being the parent. The top-most directory in such a filesystem, which does not have a parent of its own, is called the root directory.

https://en.wikipedia.org/wiki/Unix_file_types - For normal files in the file system, Unix does not impose or provide any internal file structure. This implies that from the point of view of the operating system, there is only one file type. The structure and interpretation thereof is entirely dependent on how the file is interpreted by software. Unix does however have some special files. These special files can be identified by the ls -l command which displays the type of the file in the first alphabetic letter of the file system permissions field. A normal (regular) file is indicated by a hyphen-minus '-'.
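The type that ls -l shows as the first character comes from the file-type bits of st_mode (masked with S_IFMT). A short Python sketch using the standard os and stat modules; the TYPE_LETTERS table is written out by hand for illustration, and stat.filemode() is printed alongside for comparison. It assumes a Unix-like system (os.mkfifo, /dev/null).

```python
import os
import stat
import tempfile

# File-type bits -> the letter `ls -l` shows first.
TYPE_LETTERS = {
    stat.S_IFREG: "-",   # regular file
    stat.S_IFDIR: "d",   # directory
    stat.S_IFLNK: "l",   # symbolic link
    stat.S_IFCHR: "c",   # character device
    stat.S_IFBLK: "b",   # block device
    stat.S_IFIFO: "p",   # named pipe (FIFO)
    stat.S_IFSOCK: "s",  # Unix domain socket
}


def type_letter(path: str) -> str:
    # lstat() so a symlink is reported as a link rather than as its target.
    mode = os.lstat(path).st_mode
    return TYPE_LETTERS.get(stat.S_IFMT(mode), "?")


with tempfile.TemporaryDirectory() as d:
    regular = os.path.join(d, "file.txt")
    open(regular, "w").close()
    link = os.path.join(d, "link")
    os.symlink(regular, link)
    fifo = os.path.join(d, "pipe")
    os.mkfifo(fifo)
    for p in (regular, d, link, fifo, "/dev/null"):
        print(type_letter(p), stat.filemode(os.lstat(p).st_mode), p)
```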

https://en.wikipedia.org/wiki/File_descriptor - In the traditional implementation of Unix, file descriptors index into a per-process file descriptor table maintained by the kernel, that in turn indexes into a system-wide table of files opened by all processes, called the file table. This table records the mode with which the file (or other resource) has been opened: for reading, writing, appending, reading and writing, and possibly other modes. It also indexes into a third table called the inode table that describes the actual underlying files. To perform input or output, the process passes the file descriptor to the kernel through a system call, and the kernel will access the file on behalf of the process. The process does not have direct access to the file or inode tables.

On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier. In Unix-like systems, file descriptors can refer to any Unix file type named in a file system. As well as regular files, this includes directories, block and character devices (also called "special files"), Unix domain sockets, and named pipes. File descriptors can also refer to other objects that do not normally exist in the file system, such as anonymous pipes and network sockets.

The FILE data structure in the C standard I/O library usually includes a low level file descriptor for the object in question on Unix-like systems. The overall data structure provides additional abstraction and is instead known as a file handle.
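On Linux the per-process descriptor table described above is visible under /proc/self/fd, where each entry is a symlink to what the descriptor refers to. A Linux-only Python sketch; Python's file objects wrap a descriptor (available via fileno()) much as C's FILE does, and the anonymous pipe shows a descriptor for an object with no name in the file system. The open_fds name is hypothetical.

```python
import os
import tempfile


def open_fds() -> dict:
    """Map this process's open descriptors to their targets (Linux: /proc/self/fd)."""
    table = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            table[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except FileNotFoundError:
            pass  # the descriptor os.listdir used for the directory itself is already closed
    return table


with tempfile.NamedTemporaryFile() as f:
    fd = f.fileno()   # low-level descriptor behind the buffered file object
    r, w = os.pipe()  # anonymous pipe: has descriptors but no name in the file system
    table = open_fds()
    print(fd, table[fd])  # descriptor number and the temporary file's path
    print(r, table[r])    # descriptor number and a pseudo-name such as pipe:[<inode>]
    os.close(r)
    os.close(w)
```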

A history of S_IFMT [34]

http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

http://xahlee.info/UnixResource_dir/writ/unix_origin_of_dot_filename.html

Storage devices

Block device and partition names:

sd[a,b,etc]
  drive

sda[1,2,etc]
  partition of drive

cat /proc/partitions

blkid
  # command-line utility to locate/print block device attributes
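A small Python sketch that parses the output of cat /proc/partitions and groups sdXN partitions under their sdX drive, following the naming convention listed above. The sample text is made up, only the sd naming scheme is handled (NVMe names such as nvme0n1p1 are not), and sizes are assumed to be in 1 KiB blocks as the kernel reports them.

```python
import re

# Made-up sample in the format of /proc/partitions; sizes are in 1 KiB blocks.
SAMPLE = """major minor  #blocks  name

   8        0  488386584 sda
   8        1     524288 sda1
   8        2  487861248 sda2
   8       16   15646720 sdb
"""


def parse_partitions(text: str) -> list:
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 4:
            continue  # blank separator line
        major, minor, blocks, name = fields
        rows.append({"major": int(major), "minor": int(minor),
                     "bytes": int(blocks) * 1024, "name": name})
    return rows


def group_by_drive(rows: list) -> dict:
    """Group sdXN partitions under their sdX drive (sd naming scheme only)."""
    drives = {}
    for row in rows:
        m = re.fullmatch(r"(sd[a-z]+)(\d+)", row["name"])
        if m:
            drives.setdefault(m.group(1), []).append(row["name"])
        elif re.fullmatch(r"sd[a-z]+", row["name"]):
            drives.setdefault(row["name"], [])
    return drives


print(group_by_drive(parse_partitions(SAMPLE)))  # {'sda': ['sda1', 'sda2'], 'sdb': []}
```

On a live system the same functions can be fed open("/proc/partitions").read().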

File system mounts

lsblk
  # list information about block devices.

findmnt

df -a

mounts

mount -l

mount

mount /dev/sdxY /some/directory

umount /some/directory

mount -o remount /
  # remount partition after /etc/fstab change

http://en.wikipedia.org/wiki/Loop_device

mount -o loop example.img /home/you/dir

pmount

https://linux.die.net/man/1/pmount - ("policy mount") is a wrapper around the standard mount program which permits normal users to mount removable devices without a matching /etc/fstab entry.

systemd-mount

https://www.freedesktop.org/software/systemd/man/systemd-mount.html - may be used to create and start a transient .mount or .automount unit of the file system WHAT on the mount point WHERE.

systemd-mount /dev/sdb1
  # mounts at an automatically chosen mount point based on the file system label

https://www.freedesktop.org/software/systemd/man/systemd.mount.html

https://wiki.archlinux.org/index.php/Fstab#Automount_with_systemd

autofs

https://wiki.archlinux.org/index.php/autofs

https://www.kernel.org/doc/Documentation/filesystems/autofs4-mount-control.txt

https://blogs.janestreet.com/how-does-automount-work-anyway/

PySDM

PySDM - a Storage Device Manager that allows full customization of hard disk mount points without manually editing fstab. It also allows the creation of udev rules for dynamic configuration of storage devices.

Automated

http://www.freedesktop.org/wiki/Software/udisks

https://wiki.archlinux.org/index.php/Udisks

https://github.com/coldfix/udiskie - udisks automount script with tray icon

https://github.com/h4tr3d/mount-tray - Application for mounting and unmounting removable storages via system tray using udisks

http://ignorantguru.github.io/udevil
http://igurublog.wordpress.com/downloads/script-devmon - now in udevil

https://github.com/tom5760/usermount - A simple C program to automatically mount removable drives using UDisks2 and D-Bus.

Directory structure

See LSB, etc.

https://en.wikipedia.org/wiki/Unix_filesystem

https://en.wikipedia.org/wiki/inode

https://en.wikipedia.org/wiki/stat_(system_call) - a Unix system call that returns file attributes about an inode. The semantics of stat() vary between operating systems. stat appeared in Version 1 Unix. It is among the few original Unix system calls to change, with Version 4's addition of group permissions and larger file size. As an example, Unix command ls uses this system call to retrieve information on files that includes:

atime: time of last access (ls -lu)
mtime: time of last modification (ls -l)
ctime: time of last status change (ls -lc)
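A minimal sketch of reading the three timestamps with GNU coreutils stat, assuming GNU stat and touch; demo.txt is a hypothetical file name. Setting mtime with touch -m is itself a status change, so ctime moves to the current time rather than to the value given.

```shell
#!/bin/sh
# Sketch: inspect atime, mtime and ctime with GNU coreutils stat.
# demo.txt is a hypothetical file name.
set -e
echo hello > demo.txt
# set only the modification time; the kernel still bumps ctime to "now"
touch -m -d '2020-01-02 03:04:05 UTC' demo.txt
stat -c 'atime: %x' demo.txt   # same information as ls -lu
stat -c 'mtime: %y' demo.txt   # same information as ls -l
stat -c 'ctime: %z' demo.txt   # same information as ls -lc
# epoch seconds are easier to compare in scripts
stat -c '%Y' demo.txt          # 1577934245
```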

https://en.wikipedia.org/wiki/Virtual_file_system

https://en.wikipedia.org/wiki/Directory_(computing)

https://en.wikipedia.org/wiki/Directory_structure

https://en.wikipedia.org/wiki/Root_directory

http://man7.org/linux/man-pages/man7/file-hierarchy.7.html

https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard - defines the directory structure and directory contents in Unix and Unix-like operating systems. It is maintained by the Linux Foundation. The latest version is 3.0, released on 3 June 2015. Currently it is only used by Linux distributions.

http://refspecs.linuxbase.org/FHS_3.0/fhs/index.html

http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html

Linux Directory Structure (File System Structure) Explained with Examples

https://wiki.archlinux.org/index.php/arch_filesystem_hierarchy

http://hivelogic.com/articles/using_usr_local/
Point/Counterpoint - /opt vs. /usr/local - March 2010
Understanding the bin, sbin, usr/bin , usr/sbin split [35] - Dec 9, 2010
-arch-dev-public- -RFC- merge /bin, /sbin, /lib into /usr/bin and /usr/lib - Mar 2nd, 2012

on ., .., .dotfiles - Rob Pike, Aug 3rd, 2012

https://github.com/fuchsia-mirror/docs/blob/master/the-book/../../master/the-book/dotdot.md [36]


rmshit - Keep $HOME or other dirs clean of unwanted temp files, configs and other crap you'll never use that's auto-created when badly behaving applications run

BleachBit quickly frees disk space and tirelessly guards your privacy. Free cache, delete cookies, clear Internet history, shred temporary files, delete logs, and discard junk you didn't know was there. Designed for Linux and Windows systems, it wipes clean a thousand applications including Firefox, Internet Explorer, Adobe Flash, Google Chrome, Opera, Safari, and more. Beyond simply deleting files, BleachBit includes advanced features such as shredding files to prevent recovery, wiping free disk space to hide traces of files deleted by other applications, and vacuuming Firefox to make it faster.
- https://github.com/bleachbit/bleachbit

 bleachbit --clean system.cache system.localizations system.trash system.tmp

http://www.2uo.de/myths-about-urandom/ [38]

File permissions

https://wiki.archlinux.org/index.php/File_permissions_and_attributes

http://serverfault.com/questions/124800/how-to-setup-linux-permissions-for-the-www-folder

Unix Permissions Calculator

stat -c '%A %a %n' *
  list permissions in symbolic and octal form, with file names

ulimit

ulimit - provides control over the resources available to the shell and to processes started by it, on systems that allow such control.

https://www.networkworld.com/article/2693414/operating-systems/setting-limits-with-ulimit.html

limits.conf - configuration file for the pam_limits module, /etc/security/limits.conf

chmod

chmod
  # change file mode bits

Chmod, Umask, Stat, Fileperms, and File Permissions - 0000 to 0777 list, etc.

 # Before: drwxr-xr-x 6 archie users  4096 Jul  5 17:37 Documents
chmod g= Documents
chmod o= Documents
  # After: drwx------ 6 archie users  4096 Jul  6 17:32 Documents

Users;

u - the user who owns the file.
g - other users who are in the file's group.
o - all other users.
a - all users; the same as ugo.

Operation;

+ - to add the permissions to whatever permissions the users already have for the file.
- - to remove the permissions from whatever permissions the users already have for the file.
= - to make the permissions the only permissions that the users have for the file.

Permissions;

r - the permission the users have to read the file.
w - the permission the users have to write to the file.
x - the permission the users have to execute the file, or search it if it is a directory.
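A short sketch combining the user, operation and permission letters above, assuming GNU coreutils (for stat -c); perm-demo.txt is a hypothetical file name. In octal, r, w and x are worth 4, 2 and 1, so r-x is 5 and rwx is 7.

```shell
#!/bin/sh
# Sketch: symbolic chmod modes and their octal equivalents.
# perm-demo.txt is a hypothetical file name.
set -e
touch perm-demo.txt
chmod u=rwx,g=rx,o= perm-demo.txt   # owner rwx, group r-x, others nothing
stat -c '%A %a %n' perm-demo.txt     # -rwxr-x--- 750 perm-demo.txt
chmod g-x,o+r perm-demo.txt          # drop x for group, add r for others
stat -c '%A %a %n' perm-demo.txt     # -rwxr--r-- 744 perm-demo.txt
```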

http://serverfault.com/questions/364677/why-is-chmod-r-777-destructive [39]

chown

https://linux.die.net/man/1/chown - change file owner and group

chown -R user:group .
  recursively change the owner and group of all files and directories under the current directory [40]

Access Control Lists

https://savannah.nongnu.org/projects/acl

https://wiki.archlinux.org/index.php/ACL
HOWTO: Access Control Lists (ACLs) in Linux - basics video

The partition must be mounted with the acl option; an "Operation not supported" error from setfacl means it is not.

mount -o remount,acl /

Set in /etc/fstab for persistence across boots.

Entering (traversing) a directory requires execute permission; listing its contents requires read permission.

setfacl -m "u:username:permissions" file
setfacl -m "u:uid:permissions" file
  add permissions for user

setfacl -m "g:groupname:permissions" file
setfacl -m "g:gid:permissions" file
  add permissions for group

setfacl -m "u:user:rwx" file
  add read, write, execute perms for user for file
setfacl -Rm "u:user:rw" /dir
  add recursive read, write perms for user for dir
setfacl -Rdm "u:user:rw" /dir
  recursively set a default ACL on dir, so files and directories created in it later inherit read, write perms for user

getfacl ./
  shows access control list for directory or file

Copying

cp

cp - copy files and directories

https://www.gnu.org/software/coreutils/manual/html_node/cp-invocation.html#cp-invocation

cp [option]… [-T] source dest

cp --parents a/b/c existing_dir
  # copies the file 'a/b/c' to 'existing_dir/a/b/c', creating any missing intermediate directories. [41]

cp ubuntu.img /dev/sdb && sync
  # write a disk image to a whole drive (destroys existing data on /dev/sdb), then flush buffers

My experience with using cp to copy a lot of files (432 millions, 39 TB) [42]

scp

scp -P 2264 foobar.txt your_username@remotehost.edu:/some/remote/directory
scp -rP 2264 folder your_username@remotehost.edu:/some/remote/directory

dcp is a distributed file copy program that automatically distributes and dynamically balances work equally across nodes in a large distributed system without centralized state. http://filecopy.org/

rsync

rsync(1) - a fast, versatile, remote (and local) file-copying tool.

rsync is a software application and network protocol for Unix-like systems with ports to Windows that synchronizes files and directories from one location to another while minimizing data transfer by using delta encoding when appropriate. Quoting the official website: "rsync is a file transfer program for Unix systems. rsync uses the 'rsync algorithm' which provides a very fast method for bringing remote files into sync." An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.
- https://en.wikipedia.org/wiki/rsync

https://wiki.archlinux.org/index.php/rsync

How Rsync Works: A Practical Overview [43]

Easy Automated Snapshot-Style Backups with Rsync - This document describes a method for generating automatic rotating "snapshot"-style backups on a Unix-based system, with specific examples drawn from the author's GNU/Linux experience. Snapshot backups are a feature of some high-end industrial file servers; they create the illusion of multiple, full backups per day without the space or processing overhead. All of the snapshots are read-only, and are accessible directly by users as special system directories. It is often possible to store several hours, days, and even weeks' worth of snapshots with slightly more than 2x storage. This method, while not as space-efficient as some of the proprietary technologies (which, using special copy-on-write filesystems, can operate on slightly more than 1x storage), makes use of only standard file utilities and the common rsync program, which is installed by default on most Linux distributions. Properly configured, the method can also protect against hard disk failure, root compromises, or even back up a network of heterogeneous desktops automatically.

https://gergap.wordpress.com/2013/08/10/rsync-and-sparse-files

"Unfortunately '--sparse' and '--inplace' cannot be used together. Solution: When copying the file the first time, which means it does not exist on the target server, use 'rsync --sparse'. This will create a sparse file on the target server and copies only the used data of the sparse file. When the file already exists on the target server and you only want to update it, use 'rsync --inplace'. This will only transmit the changed blocks and can also append to the existing sparse file."
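To see what --sparse preserves, a sparse file can be created locally. This sketch assumes GNU coreutils (truncate, du --apparent-size) and a filesystem that supports holes; sparse.img is a hypothetical name. Copied without --sparse, rsync writes the holes out as real zero blocks, so the destination occupies the full apparent size on disk.

```shell
#!/bin/sh
# Sketch: a sparse file has a large apparent size but few allocated blocks.
# sparse.img is a hypothetical file name; GNU coreutils assumed.
set -e
truncate -s 100M sparse.img                     # extend to 100 MiB without writing data
stat -c 'apparent size: %s bytes' sparse.img    # 104857600 bytes
du -k --apparent-size sparse.img                # 102400 KiB, the size ls -l reports
du -k sparse.img                                # allocated KiB, near 0 on most filesystems
```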

http://unix.stackexchange.com/questions/43777/what-keeps-one-side-of-an-rsync-so-busy

#rsync FAQ

http://www.jwz.org/doc/backups.html

rsync [OPTION...] SRC... [DEST]
  # copy source file or directory to destination

rsync local-file user@remote-host:remote-file

rsync -e 'ssh -p 8023' file remotehost:~/
  # use a non-standard remote shell command (ssh on port 8023), copy to the remote user's home directory

rsync -r --partial --progress srcdirectory destdirectory
  # recursive
  # --partial - resume partial files,
  # --progress - show progress during transfer, equiv to --info=flist2,name,progress
  # -P                          same as --partial --progress

rsync -rP srcdirectory/ destdirectory
  # recursive, resume partial files with progress; the trailing slash copies the contents of srcdirectory, not the directory itself

rsync -avh --inplace --no-whole-file SRC DEST

-a, --archive
  #  archive mode; equals -rlptgoD (no -A,-X,-H)
    # recursive, links, preserve permissions/times/groups/owner/device files.

-r, --recursive             recurse into directories
-l, --links                 copy symlinks as symlinks
-p, --perms                 preserve permissions
-t, --times                 preserve modification times
-g, --group                 preserve group
-o, --owner                 preserve owner (super-user only)
-D                          same as --devices --specials
    --devices               preserve device files (super-user only)
    --specials              preserve special files

-v, --verbose               list files transferred
-vv                         more verbose output
-h, --human-readable        output numbers in a human-readable format

    --inplace               This option is useful for transferring large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. It can also help keep a copy-on-write filesystem snapshot from diverging the entire contents of a file that only has minor changes.
    --no-whole-file         force the incremental delta-xfer algorithm (local-to-local copies default to --whole-file)

-A, --acls                  preserve ACLs (implies -p, save permissions)
-X, --xattrs                preserve extended attributes
-H, --hard-links            preserve hard links

    --exclude-from=FILE     read exclude patterns from FILE
-S, --sparse                handle sparse files efficiently
-W, --whole-file            copy files whole (w/o delta-xfer algorithm)
-d, --dirs                  transfer directories without recursing
    --numeric-ids           transfer numeric group and user IDs rather than mapping user and group names
-x, --one-file-system       don't cross filesystem boundaries (mount points). Using "/*" as the source bypasses -x, since the shell expands it into separate top-level source arguments
-xx                         as above, but don't create empty directories for mount points

    --delete                delete extraneous files from dest dirs
    --delete-after          receiver deletes after transfer, not during
    --delete-excluded       also delete excluded files from dest dirs

    --ignore-errors         go ahead even when there are IO errors instead of regarding as fatal
    --stats                 give some file-transfer stats
-z                          compress files during transfer

https://wiki.archlinux.org/index.php/Full_System_Backup_with_rsync

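rsync -aAXv --exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found} /* /path/to/backup/directory
  # archive mode plus ACLs and extended attributes, verbose; exclude pseudo-filesystems, temporary and mount-point directories (the {...} list relies on bash brace expansion)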

rsync -ahrHP --info=progress2 --no-i-r --partial-dir=.rsync-partial
  # archive, human readable, recursive, don't flatten hard links, progress bar, no incremental recursion, store partial transfers in temp dir
  # alias ccp

https://serverfault.com/questions/219013/showing-total-progress-in-rsync-is-it-possible

#!/bin/bash
# bash, not sh: the --exclude={...} lists below rely on brace expansion
if [ $# -lt 1 ]; then
    echo "No destination defined. Usage: $0 destination [new|rerun]" >&2
    exit 1
elif [ $# -gt 2 ]; then
    echo "Too many arguments. Usage: $0 destination [new|rerun]" >&2
    exit 1
fi

# "new": first copy, create sparse files; "rerun": update existing files in place
SPARSEINPLACE=""
if [ "$2" = "new" ]; then
   SPARSEINPLACE="--sparse"
elif [ "$2" = "rerun" ]; then
   SPARSEINPLACE="--inplace"
fi

set -f  # keep the shell from glob-expanding the exclude patterns
RSYNC="ionice -c3 rsync"
RSYNC_ARGS="-aAXv --no-whole-file --stats --ignore-errors --human-readable --delete"
# -a = --recursive, --links, --perms, --times, --group, --owner, --devices, --specials
# -A = --acls, -X = --xattrs, -v = --verbose
# --no-whole-file = delta-xfer algorithm
# --delete = delete extraneous destination files
EXCLUDES=(--exclude={/dev/*,/proc/*,/sys/*,/tmp/*,/run/*,/mnt/*,/media/*,/lost+found}
  --exclude={/home/*/.gvfs,/home/*/.thumbnails,/home/*/.cache,/home/*/.cache/mozilla/*,/home/*/.cache/chromium/*}
  --exclude={/home/*/.local/share/Trash/,/home/*/.macromedia,/var/lib/mpd/*,/var/lib/pacman/sync/*,/var/tmp*})
START=$(date +%s)
SOURCES="/"
TARGET=$1

echo "Executing dry-run to see how many files must be transferred..."
# newer rsync prints "Number of regular files transferred: 1,234"; older "Number of files transferred: 1234"
TODO=$(${RSYNC} --dry-run ${RSYNC_ARGS} "${EXCLUDES[@]}" ${SOURCES} "${TARGET}" \
  | grep -E "^Number of (regular )?files transferred" | awk -F': ' '{print $2}' | tr -d ',')

${RSYNC} ${RSYNC_ARGS} ${SPARSEINPLACE} "${EXCLUDES[@]}" ${SOURCES} "${TARGET}" | pv -l -e -p -s "$TODO"

FINISH=$(date +%s)
echo "-- Total backup time: $(( (FINISH-START) / 60 ))m, $(( (FINISH-START) % 60 ))s"
touch "${TARGET}/backup-from-$(date '+%a_%d_%m_%Y_%H%M%S')"
# after refining excluded, --delete-excluded

0 5 * * * rsync-backup.sh /path/to/backup/directory rerun
  # crontab entry: run the backup script daily at 05:00

https://github.com/kristapsdz/openrsync - a clean-room implementation of rsync with a BSD (ISC) license. It's compatible with a modern rsync (3.1.3 is used for testing, but any supporting protocol 27 will do), but accepts only a subset of rsync's command-line arguments. At this time, openrsync runs only on OpenBSD. If you want to port to your system (e.g. Linux, FreeBSD), read the Portability section first. [44]

Daemon

rsync --daemon
  # run as a daemon

https://download.samba.org/pub/rsync/rsyncd.conf.html

http://www.jveweb.net/en/archives/2011/01/running-rsync-as-a-daemon.html

http://unix.stackexchange.com/questions/26182/what-is-the-need-for-rsync-server-in-daemon-mode

lsyncd

https://code.google.com/p/lsyncd/

Services

http://www.rsync.net
- http://www.rsync.net/products/zfsintro.html

dd

dd - Copy a file, converting and formatting according to the options.
- dd is a common Unix program whose primary purpose is the low-level copying and conversion of raw data.

http://www.vidarholen.net/contents/blog/?p=479

http://wiki.linuxquestions.org/wiki/Dd

https://eklitzke.org/the-cult-of-dd [45]

if stands for input file and of for output file.
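The amount dd writes is bs multiplied by count. A quick check, assuming GNU dd (status=none suppresses the transfer summary) and a hypothetical output file zeros.bin. By the same arithmetic the file_1GB example below writes 1024 × 1,000,000 = 1,024,000,000 bytes, a little over 10^9 bytes and a little under 1 GiB (1,073,741,824 bytes).

```shell
#!/bin/sh
# Sketch: dd copies count blocks of bs bytes each.
# zeros.bin is a hypothetical output file; GNU dd assumed for status=none.
set -e
dd if=/dev/zero of=zeros.bin bs=1024 count=1000 status=none
stat -c %s zeros.bin   # 1024000 bytes = 1024 * 1000, not 1 MiB (1048576)
```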

dd if=/dev/sda of=/mnt/sdb1/backup.img
  create a backup

dd if=/mnt/sdb1/backup.img of=/dev/sda
  restore a backup

dd if=/dev/sdb of=/dev/sdc
  clone a drive

dd if=/dev/sdb | ssh root@target "(cat >backup.img)"
  backup over network

dd if=/dev/cdrom of=cdimage.iso
  backup a cd

dd if=/dev/sr0 of=myCD.iso bs=2048 conv=noerror,sync
  create an ISO disk image from a CD-ROM.

dd if=/dev/sda2 of=/dev/sdb2 bs=4096 conv=noerror
  Clone one partition to another

dd if=/dev/ad0 of=/dev/ad1 bs=1M conv=noerror
  Clone a hard disk "ad0" to "ad1".

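dd if=/dev/zero bs=1024 count=1000000 of=file_1GB conv=fdatasync
  # sequential write test: 1,024,000,000 bytes in 1024-byte blocks; conv=fdatasync flushes to disk before dd reports the speed

dd if=file_1GB of=/dev/null bs=64k
  # sequential read test in 64 KiB blocks; drop the page cache first (as root: sync; echo 3 > /proc/sys/vm/drop_caches) or the file is read from RAM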

http://allgood38.github.io/a-snapshot-of-your-computer-with-dd-pv-and-gzip-part-1.html

dd if=local-file | ssh <host> "dd of=/path/to/remote-file"

tar pipe

The Tar Pipe - It basically means "copy the src directory to dst, preserving permissions and other special stuff." It does this by firing up two tars – one tarring up src, the other untarring to dst, and wiring them together. [46]
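A minimal sketch of the tar pipe, assuming src and dst are hypothetical directory names in the current directory. The subshells let each tar run in its own directory; p on the extracting side restores the archived permissions instead of applying the umask.

```shell
#!/bin/sh
# Sketch of the tar pipe; src and dst are hypothetical directory names.
set -e
mkdir -p src/sub dst
echo data > src/sub/file.txt
chmod 640 src/sub/file.txt
ln -sf sub/file.txt src/link
# left tar archives src to stdout, right tar extracts stdin into dst
(cd src && tar cf - .) | (cd dst && tar xpf -)
ls -lR dst
```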

File information

ls

ls
  list files in current directory
ls -l
  long list, each file on a new line with information

ls *
  files in directory and immediate subdirectories

ls -a
  # show hidden files
ls  -A
  # show hidden files, exclude . and ..

ls -m
  fill width with a comma-separated list of entries
ls -1
  list one entry per line
ls --format single-column
  column of names only
ls -l | grep - | awk '{print $9}'
  using awk to show the 9th field (name); awk splits on runs of blanks. Strips colour.
ls -l | cut -f9 -s -d" "
  using cut to take the 9th field, with a single space as the delimiter; unreliable, since ls pads columns with extra spaces. Strips colour.

ls | cat
  # one entry per line (ls switches to single-column output when stdout is not a terminal)

Greg's Wiki: ParsingLs - Why you shouldn't parse the output of ls(1)

https://github.com/trapd00r/ls--

https://github.com/ogham/exa

stat

stat .
  display file or file system status
stat -c "%n %a" * | column -t
  file names with octal permissions, aligned in columns

chattr/lsattr

https://en.wikipedia.org/wiki/Chattr

lsattr .

append only (a), compressed (c), no dump (d), immutable (i), data journaling (j), secure deletion (s), no tail-merging (t), undeletable (u), no atime updates (A), synchronous directory updates (D), synchronous updates (S), and top of directory hierarchy (T).

file

file [filename]

binwalk

https://github.com/ReFirmLabs/binwalk - a fast, easy to use tool for analyzing, reverse engineering, and extracting firmware images.

Directory navigation

pwd

cd

cd change/directory/path

http://www.thegeekstuff.com/2008/10/6-awesome-linux-cd-command-hacks-productivity-tip3-for-geeks/

https://github.com/spencertipping/cd

Autojump

z

https://github.com/rupa/z - jump around, Bash. [47]

https://github.com/agkozak/zsh-z - Jump quickly to directories that you have visited "frecently." A native ZSH port of z.sh.

fasd

https://github.com/clvv/fasd - Command-line productivity booster, offers quick access to files and directories, inspired by autojump, z and v. Bash. [48]

v def conf       =>     vim /some/awkward/path/to/type/default.conf
j abc            =>     cd /hell/of/a/awkward/path/to/get/to/abcdef
m movie          =>     mplayer /whatever/whatever/whatever/awesome_movie.mp4
o eng paper      =>     xdg-open /you/dont/remember/where/english_paper.pdf
vim `f rc lo`    =>     vim /etc/rc.local
vim `f rc conf`  =>     vim /etc/rc.conf

alias defaults;

alias a='fasd -a'        # any
alias s='fasd -si'       # show / search / select
alias d='fasd -d'        # directory
alias f='fasd -f'        # file
alias sd='fasd -sid'     # interactive directory selection
alias sf='fasd -sif'     # interactive file selection
alias z='fasd_cd -d'     # cd, same functionality as j in autojump
alias zz='fasd_cd -d -i' # cd with interactive selection

autojump

https://github.com/joelthelion/autojump - very handy for navigation, Python.

pazi

https://github.com/euank/pazi - An autojump "zap to directory" helper, Rust.

zoxide

https://github.com/ajeetdsouza/zoxide - A fast cd command that learns your habits

zfm

https://github.com/pabloariasal/zfm - a minimal command line bookmark manager for zsh built on top of fzf. It lets you bookmark files and directories in your system and rapidly access them. It's intended to be a less intrusive alternative to z, autojump or fasd that doesn't pollute your prompt command or create bookmarks behind the scenes: you have full control over what gets bookmarked and when, like bookmarks on a web browser.

zz

zz - a smart and efficient directory changer [49]

cx

https://github.com/colfrog/cx - a very simple directory history utility written in C, a shell-agnostic clone of z [50]

to sort

http://jeroenjanssens.com/2013/08/16/quickly-navigate-your-filesystem-from-the-command-line.html [51]

https://aur.archlinux.org/packages/z.go-git - baskerville

https://github.com/skywind3000/z.lua - A new cd command that helps you navigate faster by learning your habits

https://github.com/huyng/bashmarks

https://github.com/vigneshwaranr/bd [52]

http://bsago.me/exa [53]

https://github.com/jamesob/desk - Lightweight workspace manager for the shell. Desk makes it easy to flip back and forth between different project contexts in your favorite shell. [54]

http://detox.sourceforge.net

Creating files

touch

touch filename
  create a file or update timestamp

mkdir

mkdir directory
mkdir -p directory
  no error if existing, make parent directories as needed

ln

symlink

ln -s {target-filename}
ln -s {target-filename} {symbolic-filename}
  create soft link

fallocate

fallocate - used to manipulate the allocated disk space for a file, either to deallocate or preallocate it. For filesystems which support the fallocate system call, preallocation is done quickly by allocating blocks and marking them as uninitialized, requiring no IO to the data blocks. This is much faster than creating a file by filling it with zeroes. The exit code returned by fallocate is 0 on success and 1 on failure.

Test files

yes abcdefghijklmnopqrstuvwxyz0123456789 | head -c 1G > largefile
  # yes runs until stopped, so head -c caps the file at 1 GiB. About 10 times faster than running dd if=/dev/urandom of=largefile, about as fast as using dd if=/dev/zero of=filename bs=1M. [55]

Editing files

http://www.gnu.org/software/ed/ed.html

See Vim, Emacs

echo "hello" >> greetings.txt

cat temp.txt >> data.txt
  To append the contents of the file temp.txt to file data.txt

date >> dates.txt
  To append the current date/time timestamp to the file dates.txt

Patching files

patch(1) - apply a diff file to an original

patch runs ed, and ed can run anything [56]

https://github.com/loyso/Scarab - A system to patch your content files. Basically, Scarab is a C++ library to build and apply binary diff package for directories. Scarab is built to be embeddable into your application or service.

Viewing files

<filename
  # zsh: a lone input redirection shows the file with $READNULLCMD (default: more)

cat

http://harmful.cat-v.org/cat-v/

cat filename
  output file to screen
cat -n filename
  output file to screen w/ line numbers
cat filename1 filename2
  output two files (concatenate)
cat filename1 > filename2
  overwrite filename2 with filename1
cat filename1 >> filename2
  append filename1 to filename2
cat filename{1,2} > filename3
  add filename1 and filename2 together into filename3 (redirecting into one of the input files would truncate it before cat reads it)

http://en.digipedia.org/man/doc/view/dog.1/

head

head filename
  top 10 lines of file
head -23 filename
  top 23 lines of file

tail

tail filename
  bottom 10 lines of file
tail -23 filename
  bottom 23 lines of file

Other

sed -n 20,30p filename
  print lines 20..30 of file [57]

bat

https://github.com/sharkdp/bat - A cat(1) clone with syntax highlighting and Git integration.

look

look(1) - displays any lines in file which contain string as a prefix. As look performs a binary search, the lines in file must be sorted. If file is not specified, the file /usr/share/dict/words is used, only alphanumeric characters are compared and the case of alphabetic characters is ignored.

File pagers

more

more is a filter for paging through text one screenful at a time. This version is especially primitive. Users should realize that less(1) provides more(1) emulation plus extensive enhancements.

less

less is an improvement on more (it can, for example, scroll backwards), and its name is a pun on more.

http://garrett.damore.org/2014/09/modernizing-less.html [58]

lesspipe.sh - a preprocessor for less

most

http://www.jedsoft.org/most/

vimpager

https://github.com/rkitover/vimpager

other

http://sourceforge.net/projects/w3m/

https://joeyh.name/code/scroll/ [59]

Moving files

mv

mv position1 ~/position2
  basic move

https://github.com/jarun/advcpmv - A patch for GNU Core Utilities cp, mv to add progress bars

Rename files in linux / bash using mv command without typing the full name two times [60]

renameutils

renameutils - a set of programs designed to make renaming of files faster and less cumbersome. The file renaming utilities consists of five programs - qmv, qcp, imv, icp and deurlname.

The qmv ("quick move") program allows file names to be edited in a text editor. The names of all files in a directory are written to a text file, which is then edited by the user. The text file is read and parsed, and the changes are applied to the files.

The qcp ("quick cp") program works like qmv, but copies files instead of moving them.

The imv ("interactive move") program is trivial but useful when you are too lazy to type (or even complete) the name of the file to rename twice. It allows a file name to be edited in the terminal using the GNU Readline library. icp copies files.

The deurlname program removes URL encoded characters (such as %20 representing space) from file names. Some programs such as w3m tend to keep those characters encoded in saved files.

Métamorphose

Métamorphose - a batch renamer, a program to rename large sets of files and folders quickly and easily. With its extensive feature set, flexibility and powerful interface, Métamorphose is a professional's tool. A must-have for those that need to rename many files and/or folders on a regular basis. In addition to general usage renaming, it is very useful for photo and music collections, webmasters, programmers, legal and clerical, etc.

nmly

https://github.com/Usbac/nmly - Mass file rename utility with useful functions and written in C

Detox

Detox - a utility designed to clean up filenames. It replaces difficult to work with characters, such as spaces, with standard equivalents. It will also clean up filenames with UTF-8 or Latin-1 (or CP-1252) characters in them.
- https://github.com/dharple/detox

Removing files

rm

rm file

rm -rf directory

find * -maxdepth 0 -name 'keepthis' -prune -o -exec rm -rf '{}' ';'
  # remove all but keepthis [61]

fd -0 somefiletype | xargs -0 rm
  # remove all files returned by fd (NUL-separated, so names with spaces are safe)

scrub

https://github.com/romanhargrave/scrub - a utility that behaves like rm -r, except that rather than simply removing everything it will only remove what you tell it to.

Managing files

mc

http://www.midnight-commander.org/

ranger

http://ranger.nongnu.org/ - ranger is a console file manager with VI key bindings. It provides a minimalistic and nice curses interface with a view on the directory hierarchy. It ships with "rifle", a file launcher that is good at automatically finding out which program to use for what file type.
- https://wiki.archlinux.org/index.php/Ranger

setup ~/.config/ranger/ with defaults;

ranger --copy-config=all

Vifm

Vifm - an ncurses based file manager with vi like keybindings/modes/options/commands/configuration, which also borrows some useful ideas from mutt.

If you use vi, Vifm gives you complete keyboard control over your files without having to learn a new set of commands.

- https://github.com/vifm/vifm

https://github.com/cirala/vifmimg - Image previews using Überzug for Vifm (vi file manager)

deer

https://github.com/Vifon/deer - a file navigator for zsh heavily inspired by ranger.

lscd

https://github.com/hut/lscd - ranger-clone in POSIX shell

nnn

https://github.com/jarun/nnn

noice

noice - small file browser

hunter

https://github.com/rabite0/hunter - a fast and lag-free file browser/manager for the terminal. It features a heavily asynchronous and multi-threaded design and all disk IO happens off the main thread in a non-blocking fashion, so that hunter will always stay responsive, even under heavy load on a slow spinning rust disk, even with all the previews enabled.

cfm

https://github.com/WillEccles/cfm - a TUI file manager with some features. It's quick and light. Whether or not you should use it depends on whether or not you like the name and/or the dev, like with all software.

Other

Worker file manager - MC clone for X11

https://github.com/D630/fzf-fs - acts like a very simple and configurable file browser/navigator for the command line by taking advantage of the general-purpose fuzzy finder fzf. Although coming without Miller columns, fzf-fs is inspired by tools like lscd and deer, which both follow the example set by ranger. [62]

https://github.com/psprint/zsh-navigation-tools - Curses-based tools for ZSH

https://github.com/b4b4r07/enhancd - next-generation cd command with an interactive filter

https://github.com/buffet/filet - a blazingly fast, lightweight file manager, with a focus on a clear and easy to understand code base.

Finding files

See Regex, Search

ls | sort -f | uniq -i -d
  # list file names that clash when upper/lower case is ignored (e.g. README and readme) [63]
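A quick demonstration with hypothetical file names. sort -f folds case so that README and readme end up adjacent, and uniq -i -d then prints one line per group of case-insensitive duplicates; which spelling is printed depends on the locale's collation.

```shell
#!/bin/sh
# Sketch: find names that differ only in letter case.
# case-demo and the file names are hypothetical.
set -e
mkdir -p case-demo
touch case-demo/README case-demo/readme case-demo/notes
ls case-demo | sort -f | uniq -i -d   # README or readme (one line per clash)
```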

whereis

GNU findutils

http://www.gnu.org/software/findutils/

https://en.wikipedia.org/wiki/GNU_Find_Utilities

GNU Find Utilities are the basic directory searching utilities of the GNU operating system. These programs are typically used in conjunction with other programs to provide modular and powerful directory search and file locating capabilities to other commands.

The tools supplied with this package are:

find - search for files in a directory hierarchy
locate - list files in databases that match a pattern
updatedb - update a file name database
xargs - build and execute command lines from standard input

find

find /usr/share -name README
find ~/Journalism -name '*.txt'
find ~/Programming -path '*/src/*.c'

find ~/Journalism -name '*.txt' -exec cat {} \;
  exec command on result path (aliases don't work in exec argument)

find ~/Images/Screenshots -size +500k -iname '*.jpg'
find ~/Journalism -name '*.txt' -print0 | xargs -0 cat   (faster than above)

find / -group [group]
find / -user [user]

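find . -mtime -[n]
  File's data was last modified less than n*24 hours ago (+n: more than, n: exactly; fractional days are ignored)

find . -mtime +5 -exec rm {} \;
  remove files modified more than 5 days ago (fractional days are ignored, so this means at least 6 full days ago)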
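A sketch of how -mtime rounds ages, assuming GNU find and GNU touch (which accepts @epoch-seconds timestamps); the directory and file names are hypothetical. A file 5 days and 1 hour old counts as 5 whole days, so +5 does not match it.

```shell
#!/bin/sh
# Sketch: find -mtime ignores fractional days when comparing ages.
# mtime-demo and the file names are hypothetical.
set -e
mkdir -p mtime-demo
now=$(date +%s)
touch -d "@$((now - 3*86400))" mtime-demo/three_days
touch -d "@$((now - 5*86400 - 3600))" mtime-demo/five_days_one_hour
touch -d "@$((now - 7*86400))" mtime-demo/seven_days
find mtime-demo -type f -mtime +5   # mtime-demo/seven_days only
find mtime-demo -type f -mtime -4   # mtime-demo/three_days only
```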

find . -type f -links +1
  list files with more than one hard link

https://www.eriwen.com/productivity/find-is-a-beautiful-tool [64]

-9fans- find(1) revisited.

https://github.com/jaimebuelta/ffind
- http://wrongsideofmemphis.com/2012/11/19/ffind-a-sane-replacement-for-command-line-file-search/

xargs

http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
- https://en.wikipedia.org/wiki/Xargs

xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.

Because Unix filenames can contain blanks and newlines, this default behaviour is often problematic; filenames containing blanks and/or newlines are incorrectly processed by xargs. In these situations it is better to use the -0 option, which prevents such problems. When using this option you will need to ensure that the program which produces the input for xargs also uses a null character as a separator. If that program is GNU find for example, the -print0 option does this for you.

If any invocation of the command exits with a status of 255, xargs will stop immediately without reading any further input. An error message is issued on stderr when this happens.
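To see the splitting problem described above, one can count how many arguments xargs passes on, assuming GNU find and xargs; xargs-demo and its file names are hypothetical. sh -c 'echo "$#"' sh simply prints the number of arguments it received.

```shell
#!/bin/sh
# Sketch: blank/newline-delimited vs NUL-delimited input to xargs.
# xargs-demo and the file names are hypothetical.
set -e
mkdir -p xargs-demo
touch 'xargs-demo/a file.txt' xargs-demo/plain.txt
# default splitting: "xargs-demo/a file.txt" becomes two arguments
find xargs-demo -name '*.txt' | xargs sh -c 'echo "$#"' sh             # 3
# -print0 / -0: each name arrives intact
find xargs-demo -name '*.txt' -print0 | xargs -0 sh -c 'echo "$#"' sh  # 2
```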

Linux and Unix xargs command tutorial with examples

echo 'one two three' | xargs mkdir
ls
  # one  three  two

echo 'one two three' | xargs -t rmdir
  # rmdir one two three   (-t prints each command to stderr before running it)

find . -name '*.py' | xargs wc -l
  # Recursively find all Python files and count the number of lines

find . -name '*~' -print0 | xargs -0 rm
  # Recursively find all Emacs backup files and remove them, allowing for filenames with whitespace

find . -name '*.py' | xargs grep 'import'
  # Recursively find all Python files and search them for the word ‘import’

find . -type f -print0 | xargs -0 stat -c "%y %s %n"
  # prints modification time, size in bytes and name of each file

http://www.cyberciti.biz/faq/linux-unix-bsd-xargs-construct-argument-lists-utility/

http://offbytwo.com/2011/06/26/things-you-didnt-know-about-xargs.html

find . -name "*.ext" -print0 | xargs -0 mv -t ../..
  # for too many files for a single mv command (moves them two directories up); mv -t is a GNU coreutils option
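The command above works because xargs splits long input into several command runs instead of one huge argument list. A sketch that makes the batching visible with -n (assumes seq from GNU coreutils):

```shell
#!/bin/sh
# -n 2 caps each echo at two arguments, so echo runs three times.
seq 1 5 | xargs -n 2 echo
# 1 2
# 3 4
# 5

# Without -n, xargs still splits its input into batches that fit the
# system's argument-size limit, which is what avoids
# "Argument list too long". The limit in bytes:
getconf ARG_MAX
```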

bfs

https://github.com/tavianator/bfs - a variant of the UNIX find command that operates breadth-first rather than depth-first. It is otherwise intended to be compatible with many versions of find, including POSIX find, GNU find, {Free,Open,Net}BSD find, macOS find
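For a rough feel of breadth-first order without installing bfs, the levels can be listed one at a time with find's -mindepth/-maxdepth (GNU/BSD extensions, not POSIX). bfs_list is a hypothetical helper, not part of bfs, and it is much slower on deep trees because it walks the tree again for every level, whereas bfs walks it once.

```shell
#!/bin/sh
# Hypothetical bfs_list: print entries level by level, shallowest first.
bfs_list() {
  root=${1:-.}
  depth=0
  while :; do
    level=$(find "$root" -mindepth "$depth" -maxdepth "$depth")
    # An empty level means nothing deeper exists either.
    [ -z "$level" ] && break
    printf '%s\n' "$level"
    depth=$((depth + 1))
  done
}

# Example on a throwaway tree: top.txt and a/ come before a/b and a/b/c.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c"
touch "$demo/top.txt"
bfs_list "$demo"
rm -rf "$demo"
```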

fd

https://github.com/sharkdp/fd - a simple, fast and user-friendly alternative to find. While it does not seek to mirror all of find's powerful functionality, it provides sensible (opinionated) defaults for 80% of the use cases.

hf

https://github.com/hugows/hf - a command line utility to quickly find files and execute a command - something like Helm/Anything/CtrlP for the terminal. It tries to find the best match, like other fuzzy finders (Sublime, ido, Helm). [65]

locate

http://en.wikipedia.org/wiki/Locate_(Unix)

locate fileordirectory

locate /

locate / | xargs -i echo 'test -f "{}" && echo "{}"' | sh
  # only files

locate / | xargs -i echo 'test -d "{}" && echo "{}"' | sh
  # only directories [66]

https://wiki.archlinux.org/index.php/locate
- https://www.archlinux.org/news/slocate-replaced-by-mlocate/

"Although in other distros locate and updatedb are in the findutils package, they are no longer present in Arch's package. To use it, install the mlocate package. mlocate is a newer implementation of the tool, but is used in exactly the same way."

mlocate is a locate/updatedb implementation. The 'm' stands for "merging": updatedb reuses the existing database to avoid rereading most of the file system, which makes updatedb faster and does not trash the system caches as much. The locate(1) utility is intended to be completely compatible with slocate. It also attempts to be compatible with GNU locate, when this does not conflict with slocate compatibility.

Before locate can be used, the database will need to be created. To do this, simply run updatedb as root.

sudo updatedb
  # creates/updates a db file of paths that is queried by locate

http://linux.die.net/man/8/updatedb

/etc/updatedb.conf

PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset cramfs debugfs devpts devtmpfs ecryptfs exofs ftpfs fuse fuse.encfs fuse.sshfs fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs shfs smbfs sockfs sshfs sysfs tmpfs ubifs udf usbfs vboxsf"
PRUNENAMES = ".git .hg .svn"
PRUNEPATHS = "/afs /media /mnt /net /sfs /tmp /udev /var/cache /var/lib/pacman/local /var/lock /var/run /var/spool /var/tmp"

http://linux.die.net/man/5/mlocate.db

/var/lib/mlocate/mlocate.db

http://jvns.ca/blog/2015/03/05/how-the-locate-command-works-and-lets-rewrite-it-in-one-minute

strings /var/lib/mlocate/mlocate.db | grep -E '^/.*config'
  # query db directly, needs sudo or sudoers or acl
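The linked post explains that a locate database is essentially a list of paths that gets searched instead of the live file system. A toy version of that idea, assuming a plain-text database at a made-up default location and pruning the same VCS directories as PRUNENAMES above. mini_updatedb and mini_locate are hypothetical names, and the format is not mlocate.db (which is binary).

```shell
#!/bin/sh
# Hypothetical toy locate: the "database" is one path per line.
DB=${DB:-"${TMPDIR:-/tmp}/mini-locate.db"}   # assumed location

mini_updatedb() {
  # Walk the tree once, skipping VCS directories like PRUNENAMES does.
  find "${1:-/}" \( -name .git -o -name .hg -o -name .svn \) -prune \
    -o -print 2>/dev/null > "$DB"
}

mini_locate() {
  # Search the saved list, not the disk; -F matches a fixed substring.
  grep -F -- "$1" "$DB"
}
```

Usage: run mini_updatedb "$HOME" once, then mini_locate config as often as needed. As with the real updatedb, results go stale until the database is rebuilt.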

fselect

https://github.com/jhspetersson/fselect - Find files with SQL-like queries

to sort

http://arstechnica.com/information-technology/2011/07/ask-ars-how-to-use-the-find-command-in-a-pipeline/

http://www.pixelbeat.org/fslint/

Diffing files

File and directory comparison.

diff(1)
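Plain diff(1) already compares whole trees: -r recurses into subdirectories and -q only reports which files differ or exist on one side. A sketch with two throwaway directories (the message format shown is GNU diff's; paths shortened):

```shell
#!/bin/sh
a=$(mktemp -d); b=$(mktemp -d)
echo same  > "$a/common.txt";  echo same > "$b/common.txt"
echo one   > "$a/changed.txt"; echo two  > "$b/changed.txt"
echo extra > "$a/only-in-a.txt"

# diff exits with status 1 when it finds differences.
diff -rq "$a" "$b"
# Files <a>/changed.txt and <b>/changed.txt differ
# Only in <a>: only-in-a.txt

# -u instead of -q prints unified diffs of the changed files.
diff -ru "$a" "$b"

rm -rf "$a" "$b"
```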

https://github.com/spcau/godiff - A File/Directory diff-like comparison tool with HTML output.

diffoscope - in-depth comparison of files, archives, and directories. diffoscope will try to get to the bottom of what makes files or directories different. It will recursively unpack archives of many kinds and transform various binary formats into more human readable form to compare them. It can compare two tarballs, ISO images, or PDF just as easily.
- https://salsa.debian.org/reproducible-builds/diffoscope

https://github.com/renardchien/Differnator - a tool for comparing two similar codebases leveraging diff command operations and other comparative operations. Differnator can do large scale comparisons across several directories or numerous pairs of similar codebases.

GUI

Meld

KDiff3

https://en.wikipedia.org/wiki/Kompare

Krusader - Twin panel file management for your desktop - an advanced twin panel (commander style) file manager for KDE Plasma and other desktops in the *nix world, similar to Midnight Commander or Total Commander.

xxdiff - a graphical file and directories comparator and merge tool, provided under the GNU GPL open source license. It has reached stable state, and is known to run on many popular unices, including IRIX, Linux, Solaris, HP/UX, DEC Tru64. It has been deployed inside many large organizations and is being actively maintained by its author (Martin Blais).

Diffuse - graphical tool for merging and comparing text files

Finding duplicate files

md5deep and hashdeep - md5deep is a set of programs to compute MD5, SHA-1, SHA-256, Tiger, or Whirlpool message digests on an arbitrary number of files. md5deep is similar to the md5sum program found in the GNU Coreutils package, but has the following additional features. Recursive operation: md5deep is able to recursively examine an entire directory tree, that is, compute the MD5 for every file in a directory and for every file in every subdirectory. Comparison mode: md5deep can accept a list of known hashes and compare them to a set of input files. The program can display either those input files that match the list of known hashes or those that do not match. Hash sets can be drawn from Encase, the National Software Reference Library, iLook Investigator, Hashkeeper, md5sum, BSD md5, and other generic hash generating programs. Users are welcome to add functionality to read other formats too! Time estimation: md5deep can produce a time estimate when it's processing very large files. Piecewise hashing: hash input files in arbitrary sized blocks. File type mode: md5deep can process only files of a certain type, such as regular files, block devices, etc.

hashdeep is a program to compute, match, and audit hash sets. With traditional matching, programs report if an input file matched one in a set of knowns or if the input file did not match. It's hard to get a complete sense of the state of the input files compared to the set of knowns. It's possible to have matched files, missing files, files that have moved in the set, and to find new files not in the set. hashdeep can report all of these conditions. It can even spot hash collisions, when an input file matches a known file in one hash algorithm but not in others. The results are displayed in an audit report.
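The core idea behind the hash-based duplicate finders in this section can be sketched with coreutils alone: hash every file, sort so equal digests sit next to each other, and keep only the repeated ones. find_dupes is a hypothetical helper. It assumes GNU sha256sum and uniq (for -w and --all-repeated) and file names without newlines or backslashes, which sha256sum would escape and thereby shift the columns.

```shell
#!/bin/sh
# Hypothetical find_dupes: print groups of identical files,
# one path per line, groups separated by a blank line.
find_dupes() {
  find "${1:-.}" -type f -exec sha256sum {} + |
    sort |                                 # equal digests become adjacent
    uniq -w 64 --all-repeated=separate |   # compare only the 64-hex-char digest
    cut -c 67-                             # drop "<digest>  ", keep the path
}
```

Usage: find_dupes ~/Pictures. Real tools such as fdupes and jdupes first group files by size so that only same-sized files need hashing.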

FreeFileSync - a folder comparison and synchronization software that creates and manages backup copies of all your important files. Instead of copying every file every time, FreeFileSync determines the differences between a source and a target folder and transfers only the minimum amount of data needed. FreeFileSync is Open Source software, available for Windows, Linux and macOS.

dupeGuru - a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It's written mostly in Python 3 and has the peculiarity of using multiple GUI toolkits, all using the same core Python code. On OS X, the UI layer is written in Objective-C and uses Cocoa. On Linux and Windows, it's written in Python and uses Qt5. dupeGuru can scan either filenames or contents. The filename scan features a quick fuzzy matching algorithm that finds not only identical filenames but also similar ones.
- https://github.com/hsoft/dupeguru

https://github.com/adrianlopezroche/fdupes - a program for identifying or deleting duplicate files residing within specified directories.
- http://en.wikipedia.org/wiki/Fdupes

https://github.com/jbruchon/jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'. What jdupes is not: a similar (but not identical) file finding tool.

https://github.com/sahib/rmlint - finds space waste and other broken things on your filesystem and offers to remove it.

Duff - a Unix command-line utility for quickly finding duplicates in a given set of files. Duff is written in C, uses gettext where available, is licensed under the zlib/libpng license and should compile on most modern Unices.

https://freshmeat.sourceforge.net/projects/ftwin - a tool useful to find duplicate files or pictures according to their content on your file system.

dupd - A fast and convenient CLI tool for finding duplicate files
- https://github.com/jvirkki/dupd

https://github.com/IgnorantGuru/rmdupe - uses standard linux commands to search within specified folders for duplicate files, regardless of filename or extension. Before duplicate candidates are removed they are compared byte-for-byte. rmdupe can also check duplicates against one or more reference folders, can trash files instead of removing them, allows for a custom removal command, and can limit its search to files of specified size. rmdupe includes a simulation mode which reports what will be done for a given command without actually removing any files.
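The byte-for-byte check and simulation mode described for rmdupe can be sketched with cmp -s, which compares two files and reports the result only through its exit status. dry_run_remove_dupe is a hypothetical helper, not rmdupe's code, and it deliberately only prints the rm command it would run.

```shell
#!/bin/sh
# Hypothetical dry_run_remove_dupe KEEP CANDIDATE:
# never deletes anything, only reports what it would do.
dry_run_remove_dupe() {
  keep=$1 candidate=$2
  if cmp -s -- "$keep" "$candidate"; then
    # Identical byte for byte: safe to remove the candidate.
    printf 'rm -- %s\n' "$candidate"
  else
    printf 'skip %s (content differs)\n' "$candidate" >&2
    return 1
  fi
}
```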

https://github.com/sebastien/sink - Swiss army knife for directory comparison and synchronization. Python.

https://github.com/tbores/fiche - Directory comparison tools. Fiche uses hashes based comparison. Python.

https://github.com/stephen322/dircmp - directory comparison utility. D.

https://github.com/l00g33k/dirtree - This is a command line directory utility that works on both Windows and Linux. It scans the directory information and stores it in a file. This allows two computers to be scanned at different times, even in two different countries, and the comparison to be carried out at yet another time and place. This allows terabytes of disk with millions…

https://github.com/bmaia/binwally - Binary and Directory tree comparison tool using Fuzzy Hashing
- w00tsec: Binwally: Directory tree diff tool using Fuzzy Hashing

https://github.com/drrlvn/fscmp - Directory/file comparison utility. Rust.

Archiving

https://en.wikipedia.org/wiki/Archive_file

https://en.wikipedia.org/wiki/File_archiver

ar

https://en.wikipedia.org/wiki/ar_(Unix) - a Unix utility that maintains groups of files as a single archive file. Today, ar is generally used only to create and update static library files that the link editor or linker uses and for generating .deb packages for the Debian family; it can be used to create archives for any purpose, but has been largely replaced by tar for purposes other than static libraries. An implementation of ar is included as one of the GNU Binutils.

shar

http://www.gnu.org/software/sharutils/

http://en.wikipedia.org/wiki/shar - an abbreviation of shell archive, is an archive format. A shar file is a shell script, and executing it will recreate the files. This is a type of self-extracting archive file. It can be created with the Unix shar utility. To extract the files, only the standard Unix Bourne shell sh is usually required. Note that shar is not specified by the Single Unix Specification, so it is not formally a component of Unix, but a legacy utility.
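To see why a shar file is just a shell script, here is a hand-rolled sketch that wraps each file in a here-document. mini_shar is hypothetical and far simpler than GNU shar: it assumes text files that end with a newline, relative paths without single quotes or newlines, and no line equal to the SHAR_EOF delimiter.

```shell
#!/bin/sh
# Hypothetical mini_shar FILE...: write a self-extracting script to stdout.
mini_shar() {
  echo '#!/bin/sh'
  for f in "$@"; do
    # Recreate the parent directory, then the file from a here-document.
    printf "mkdir -p '%s'\n" "$(dirname -- "$f")"
    printf "cat > '%s' <<'SHAR_EOF'\n" "$f"
    cat -- "$f"
    echo 'SHAR_EOF'
  done
}
```

Usage: mini_shar notes.txt docs/todo.txt > bundle.sh on one machine, then sh bundle.sh in an empty directory on another. Running the archive executes arbitrary shell code, which is exactly the reliability and trust concern described below.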

unshar programs have been written for other operating systems but are not always reliable; shar files are shell scripts and can theoretically do anything that a shell script can do (including using incompatible features of enhanced or workalike shells), limiting their utility outside the Unix world.

The drawback of self-extracting shell scripts (any kind, not just shar) is that they rely on a particular implementation of programs; shell archives created with older versions of makeself

GNU paxutils

http://www.gnu.org/software/paxutils/

cpio

https://en.wikipedia.org/wiki/cpio - The cpio archive format has several basic limitations: It does not store user and group names, only numbers. As a result, it cannot be reliably used to transfer files between systems with dissimilar user and group numbering.

cd /old-dir
find . -name '*.mov' -print | cpio -pvdumB /new-dir [67]

tar

-z: Compress archive using gzip program
-c: Create archive
-v: Verbose, i.e. display progress while creating archive
-f: Archive File name

tar -zcvf archive-name.tar.gz directory-name

tar -cjf foo.tar.bz2 bar/
  create bzipped tar archive of the directory bar called foo.tar.bz2

tar -xvf foo.tar
  verbosely extract foo.tar
tar -xzf foo.tar.gz
  extract gzipped foo.tar.gz

tar -xjf foo.tar.bz2 -C bar/
  extract bzipped foo.tar.bz2 after changing directory to bar
tar -xzf foo.tar.gz blah.txt
  extract the file blah.txt from foo.tar.gz

http://hacktux.com/tar/removing/leading/from/member/names
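A round trip tying the flags above together, plus -t to list members and -C to avoid storing absolute member names (the issue in the linked article). The paths are throwaway examples.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/project/docs"
echo 'hello' > "$work/src/project/docs/readme.txt"

# -C changes directory before archiving, so members are stored as
# "project/..." rather than under an absolute path. Archiving an
# absolute path instead makes GNU tar strip the leading "/" and warn.
tar -C "$work/src" -czf "$work/project.tar.gz" project

# -t lists the members without extracting anything.
tar -tzf "$work/project.tar.gz"

# Extract into another directory, again with -C.
mkdir "$work/out"
tar -xzf "$work/project.tar.gz" -C "$work/out"
cat "$work/out/project/docs/readme.txt"
# rm -rf "$work" when done
```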

pax

pax will read, write, and list the members of an archive file, and will copy directory hierarchies. pax operation is independent of the specific archive format, and supports a wide variety of different archive formats. A list of supported archive formats can be found under the description of the -x option. [68]
- https://en.wikipedia.org/wiki/Pax_(Unix)

http://www.linuxjournal.com/content/make-peace-pax

http://h3manth.com/content/pax-powerful-tool-read-and-write-file-archives-and-copy-directory-hierarchies

mkdir newdir
cd olddir
pax -rw . ../newdir
  # copy the whole olddir hierarchy into newdir (the target path is relative to olddir)

libarchive

libarchive - Multi-format archive and compression library
- https://github.com/libarchive/libarchive

Afio

Afio - makes cpio-format archives. It deals somewhat gracefully with input data corruption, supports multi-volume archives during interactive operation, and can make compressed archives that are much safer than compressed tar or cpio archives. Afio is best used as an `archive engine' in a backup script.
- https://github.com/kholtman/afio

Trash can

https://www.freedesktop.org/wiki/Specifications/trash-spec/

https://github.com/andreafrancia/trash-cli

https://github.com/robrwo/bashtrash
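The trash specification boils down to two directories: the trashed file goes into Trash/files, and a matching .trashinfo file in Trash/info records its original path and deletion time. trash_put is a hypothetical sketch of that layout for the home trash only. It skips things the spec and trash-cli handle: percent-encoding of Path=, name clashes, and per-volume trash directories.

```shell
#!/bin/sh
# Hypothetical trash_put FILE: move FILE into the home trash.
trash_put() {
  trash=${XDG_DATA_HOME:-$HOME/.local/share}/Trash
  mkdir -p "$trash/files" "$trash/info"
  # Absolute original path, recorded so the file can be restored later.
  src=$(cd "$(dirname -- "$1")" && pwd)/$(basename -- "$1")
  name=$(basename -- "$1")
  # Write the info file first, then move the file itself.
  {
    echo '[Trash Info]'
    echo "Path=$src"
    echo "DeletionDate=$(date +%Y-%m-%dT%H:%M:%S)"
  } > "$trash/info/$name.trashinfo"
  mv -- "$1" "$trash/files/$name"
}
```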

Caches

https://github.com/xtaran/unburden-home-dir - Automatically unburden $HOME from caches, etc. Useful for $HOME on SSDs, small disks or slow NFS homes. Can be triggered via a hook in /etc/X11/Xsession.d/.

Storage space

CLI tools

du (disk usage)

du -sh
  size of a folder
du -S
  # directory sizes excluding their subdirectories (-S, --separate-dirs)

du -aB1m | awk '$1 >= 100'
  # everything of 100 MiB or more (sizes counted in 1 MiB blocks)

sudo du -hs /*
  # show the size of each top-level folder in /

sudo du -a --max-depth=1 /usr/lib | sort -n -r | head -n 20
  # largest entries directly under /usr/lib in KiB (the first line is /usr/lib itself)

du -sk ./* | sort -nr | awk 'BEGIN{ pref[1]="K"; pref[2]="M"; pref[3]="G";} { total = total + $1;
x = $1; y = 1;  while( x > 1024 ) { x = (x + 1023)/1024; y++; }
printf("%g%s\t%s\n",int(x*10)/10,pref[y],$2); } END { y = 1; while( total > 1024 )
{ total = (total + 1023)/1024; y++; } printf("Total: %g%s\n",int(total*10)/10,pref[y]); }'
  # everything in the current folder, largest first, with K/M/G units and a grand total
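With GNU coreutils the unit conversion in the awk script can be left to du -h and sort -h, which understands the K/M/G suffixes (sort -h is a GNU extension).

```shell
#!/bin/sh
# Largest first, human-readable sizes.
du -sh ./* | sort -rh

# -c adds a grand total as the last line.
du -sch ./* | tail -n 1
```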

diskus

https://github.com/sharkdp/diskus - a very simple program that computes the total size of the current directory. It is a parallelized version of du -sh. On my 8-core laptop, it is about ten times faster than du with a cold disk cache and more than three times faster with a warm disk cache. [69]
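diskus gets its speed from walking the tree in parallel. A very rough shell approximation, not how diskus works internally, is one du per top-level entry run concurrently with xargs -P. par_du_k is a hypothetical helper; -P and -r are GNU/BSD xargs extensions. Hard links shared between entries are counted once per du process and the top directory's own entry is left out, so totals can differ slightly from du -sk.

```shell
#!/bin/sh
# Hypothetical par_du_k [DIR]: total disk usage of DIR's contents in KiB.
par_du_k() {
  find "${1:-.}" -mindepth 1 -maxdepth 1 -print0 |   # top-level entries
    xargs -0 -r -P 4 du -sk |                          # up to 4 du at once
    awk '{ total += $1 } END { print total + 0 }'      # sum the KiB column
}
```

Usage: par_du_k /var prints a single number in KiB.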

dust

https://github.com/bootandy/dust - A more intuitive version of du in rust [70]

df

df - report file system disk space usage
- http://en.wikipedia.org/wiki/Df_(Unix) - a standard Unix command used to display the amount of available disk space for file systems on which the invoking user has appropriate read access. df is typically implemented using the statfs or statvfs system calls.

df -h
  human readable
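For scripts, df -P (POSIX output, one line per file system) together with -k (1024-byte blocks) gives stable columns, and the fourth column of the second line is the available space. avail_kib is a hypothetical helper name and the 1 GiB threshold is an arbitrary example.

```shell
#!/bin/sh
# Hypothetical avail_kib [PATH]: free KiB on the file system holding PATH.
avail_kib() {
  df -Pk -- "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Example: warn when less than 1 GiB (1048576 KiB) is free on /tmp.
if [ "$(avail_kib /tmp)" -lt 1048576 ]; then
  echo "less than 1 GiB free on /tmp" >&2
fi
```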

di

di - a disk information utility, displaying everything (and more) that your 'df' command does. It features the ability to display your disk usage in whatever format you prefer. It also checks the user and group quotas, so that the user sees the space available for their use, not the system wide disk space.

dfc

https://github.com/Rolinh/dfc - a tool to report file system space usage information. When the output is a terminal, it uses color and graphs by default. It has a lot of features such as HTML, JSON and CSV export, multiple filtering options, the ability to show mount options and so on.

dv

https://github.com/ARM-DOE/dv - a command line tool for visualizing disk data usage on systems that don't have access to a graphical environment. After scanning a directory, dv generates an interactive webpage that displays useful information about the disk's contents. This webpage is a sunburst partition layout, where each subdirectory is displayed as a portion of the scanned directory. Hovering over a directory shows its size and percentage of the whole. dv is multiprocessed, and can be used to quickly visualize what is taking up space on a system.

TUI

tdu

tdu - a text-mode disk-usage utility
- https://github.com/dse/tdu

cdu

cdu - Color du, a perl script which calls du and displays a pretty histogram with optional colors, which lets you immediately see the directories that take up disk space.

ncdu

ncdu - ncurses disk usage

ncdu / --exclude /home --exclude /media --exclude /run/media
  check everything apart from home and external drives

ncdu / --exclude /media --exclude /run/media
  check everything apart from external drives

ncdu / --exclude /home --exclude /media --exclude /run/media --exclude /boot \
--exclude /tmp --exclude /dev --exclude /proc
  # just the root partition; ncdu -x / (do not cross file system boundaries) is an alternative

godu

https://github.com/viktomas/godu - Simple golang utility helping to discover large files/folders.

Broot

https://github.com/Canop/broot - An interactive tree view, a fuzzy search, a balanced BFS descent and customizable commands. [71]

GUI

Baobab

Baobab - aka Disk Usage Analyzer, is a graphical, menu-driven viewer that you can use to view and monitor your disk usage and folder structure. It is part of every GNOME desktop.
- https://gitlab.gnome.org/GNOME/baobab

qdirstat

https://github.com/shundhammer/qdirstat - a graphical application to show where your disk space has gone and to help you to clean it up. This is a Qt-only port of the old Qt3/KDE3-based KDirStat, now based on the latest Qt 5. It does not need any KDE libs or infrastructure. It runs on every X11-based desktop on Linux, BSD and other Unix-like systems.

xdiskusage

http://xdiskusage.sourceforge.net/

Filelight

Filelight - creates an interactive map of concentric, segmented rings that help visualise disk usage on your computer.

Duc

Duc: Dude, where are my bytes! - a collection of tools for indexing, inspecting and visualizing disk usage. Duc maintains a database of accumulated sizes of directories of the file system, and allows you to query this database with some tools, or create fancy graphs showing you where your bytes are.

Web

diskover

diskover - File system crawler, storage search engine and storage analytics software powered by Elasticsearch to help visualize and manage your disk space usage.

Harnessing Your Data with Diskover

Tagging

TMSU - a tool for tagging your files. It provides a simple command-line tool for applying tags and a virtual filesystem so that you can get a tag-based view of your files from within any other program. TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount, based upon the tags you set up. The only commitment required is your time and there's absolutely no lock-in.
- https://github.com/oniony/TMSU

https://github.com/alvatar/dfym

https://bitbucket.org/majo/oyepa/src/default/oyepa - a way to organize (and locate) your documents through the use of tags
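The key point of TMSU, tags kept in a separate database while the files stay untouched, can be sketched with a tab-separated file. This is a toy with hypothetical names (TAGDB, tag_add, tag_files), not TMSU's database format or command set, and it assumes paths and tags without tabs or newlines.

```shell
#!/bin/sh
# Hypothetical tag store: one "path<TAB>tag" pair per line.
TAGDB=${TAGDB:-"$HOME/.tags.tsv"}

tag_add() {   # tag_add FILE TAG...
  f=$1; shift
  for t in "$@"; do
    printf '%s\t%s\n' "$f" "$t" >> "$TAGDB"
  done
  sort -u -o "$TAGDB" "$TAGDB"   # drop duplicate file/tag pairs
}

tag_files() {  # tag_files TAG: list the files carrying TAG
  awk -F '\t' -v tag="$1" '$2 == tag { print $1 }' "$TAGDB"
}
```

Usage: tag_add ~/music/a.flac jazz live, then tag_files jazz. The tagged files themselves are never opened or modified.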
