Database
General
- https://en.wikipedia.org/wiki/Database - an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
Database management systems (DBMSs) are specially designed applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose database management system (DBMS) is a software system designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, SQLite, Microsoft SQL Server, Microsoft Access, Oracle, SAP, dBASE, FoxPro, IBM DB2, LibreOffice Base and FileMaker Pro. A database is not generally portable across different DBMS, but different DBMSs can inter-operate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one database.
- Disambiguating Databases - Use the database built for your access model. [1]
- https://en.wikipedia.org/wiki/Database_administrator - DBAs) use specialized software to store and organize data. The role may include capacity planning, installation, configuration, database design, migration, performance monitoring, security, troubleshooting, as well as backup and data recovery.
- https://en.wikipedia.org/wiki/Database_administration - the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM Db2 and Microsoft SQL Server need ongoing management. As such, corporations that use DBMS software often hire specialized information technology personnel called database administrators or DBAs.
- https://en.wikipedia.org/wiki/Database_tuning - describes a group of activities used to optimize and homogenize the performance of a database. It usually overlaps with query tuning, but refers to design of the database files, selection of the database management system (DBMS) application, and configuration of the database's environment (operating system, CPU, etc.).
- https://en.wikipedia.org/wiki/Query_language - an be classified according to whether they are database query languages or information retrieval query languages. The difference is that a database query language attempts to give factual answers to factual questions, while an information retrieval query language attempts to find documents containing information that is relevant to an area of inquiry.
- https://en.wikipedia.org/wiki/Database_schema - the structure of a database described in a formal language supported typically by a relational database management system (RDBMS). The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divided into database tables in the case of relational databases). The formal definition of a database schema is a set of formulas (sentences) called integrity constraints imposed on a database. These integrity constraints ensure compatibility between parts of the schema. All constraints are expressible in the same language. A database can be considered a structure in realization of the database language. The states of a created conceptual schema are transformed into an explicit mapping, the database schema. This describes how real-world entities are modeled in the database.
- https://en.wikipedia.org/wiki/Conceptual_schema - or conceptual data model is a high-level description of informational needs underlying the design of a database. It typically includes only the main concepts and the main relationships among them. Typically this is a first-cut model, with insufficient detail to build an actual database. This level describes the structure of the whole database for a group of users. The conceptual model is also known as the data model that can be used to describe the conceptual schema when a database system is implemented. It hides the internal details of physical storage and targets on describing entities, datatypes, relationships and constraints.
- https://en.wikipedia.org/wiki/Logical_schema - a data model of a specific problem domain expressed independently of a particular database management product or storage technology (physical data model) but in terms of data structures such as relational tables and columns, object-oriented classes, or XML tags. This is as opposed to a conceptual data model, which describes the semantics of an organization without reference to technology.
- https://en.wikipedia.org/wiki/Physical_schema - or database design is a representation of a data design as implemented, or intended to be implemented, in a database management system. In the lifecycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation. A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system.
- https://en.wikipedia.org/wiki/Column_(database) - a set of data values of a particular simple type, one for each row of the table. The columns provide the structure according to which the rows are composed. When a column allows data values of a single type, it does not essentially mean it only has simple text values. In relational database terminology, column's equivalent is called attribute.
- https://en.wikipedia.org/wiki/Row_(database) - also called a record or tuple — represents a single, implicitly structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns or fields. Each row in a table represents a set of related data, and every row in the table has the same structure.
- https://en.wikipedia.org/wiki/Field_(computer_science) - a record divided into fields. Relational databases arrange data as sets of database records, also called rows. Each record consists of several fields; the fields of all records form the columns. Examples of fields: name, gender, hair colour.
- https://en.wikipedia.org/wiki/View_(SQL) - the result set of a stored query on the data, which the database users can query just as they would in a persistent database collection object. This pre-established query command is kept in the database dictionary. Unlike ordinary base tables in a relational database, a view does not form part of the physical schema: as a result set, it is a virtual table computed or collated dynamically from data in the database when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view. In some NoSQL databases, views are the only way to query data.
Database practitioners can define views as read-only or updatable. If the database system can determine the reverse mapping from the view schema to the schema of the underlying base tables, then the view is updatable. INSERT, UPDATE, and DELETE operations can be performed on updatable views. Read-only views do not support such operations because the DBMS cannot map the changes to the underlying base tables. A view update is done by key preservation.
- https://en.wikipedia.org/wiki/Materialized_view - a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.
- https://en.wikipedia.org/wiki/Database_index - a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
- https://en.wikipedia.org/wiki/CAP_theorem - also named Brewer's theorem after computer scientist Eric Brewer, states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency: Every read receives the most recent write or an error
- Availability: Every request receives a (non-error) response – without guarantee that it contains the most recent write
- Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency guaranteed in ACID database transactions.
- https://en.wikipedia.org/wiki/PACELC_theorem - an extension to the CAP theorem. It states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).
- https://en.wikipedia.org/wiki/ACID_(computer_science) - Atomicity, Consistency, Isolation, Durability, is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc. In the context of databases, a sequence of database operations that satisfies the ACID properties (and these can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
- http://opensourceconnections.com/blog/2013/12/11/codds-relational-vision-has-nosql-come-full-circle/
- https://en.wikipedia.org/wiki/Correlation_database - a database management system (DBMS) that is data-model-independent and designed to efficiently handle unplanned, ad hoc queries in an analytical system environment. Because a correlation DBMS stores each unique data value only once, the physical database size is significantly smaller than relational or column-oriented databases, without the use of data compression techniques. Above approximately 30GB, a correlation DBMS may become smaller than the raw data set.
- YouTube: Database as Filesystem
- LibreOffice Base - a full-featured desktop database front end, designed to meet the needs of a broad array of users. Base caters to power users and enterprise requirements, providing native-support drivers for some of the most widely employed multi-user database engines: MySQL/MariaDB, Adabas D, MS Access and PostgreSQL. In addition, the built-in support for JDBC- and ODBC-standard drivers allows you to connect to virtually any other existing database engine as well.
Relational
- https://en.wikipedia.org/wiki/Relational_database_management_system - a database management system (DBMS) that is based on the relational model as invented by E. F. Codd, of IBM's San Jose Research Laboratory. In 2017, many of the databases in widespread use are based on the relational database model.
- https://en.wikipedia.org/wiki/Relational_calculus - consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries. This in contrast to the relational algebra which is also part of the relational model but provides a more procedural way for specifying queries.
- https://en.wikipedia.org/wiki/Relational_algebra - first created by E.F. Codd while at IBM, is a family of algebras with a well-founded semantics used for modelling the data stored in relational databases, and defining queries on it. The main application of relational algebra is providing a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL.
- etc.
- Jailer - a tool for database subsetting, schema and data browsing. It exports consistent, referentially intact row-sets from relational databases. It removes obsolete data without violating integrity. It is DBMS agnostic (by using JDBC), platform independent, and generates DbUnit datasets, hierarchically structured XML, and topologically sorted SQL-DML.
Object
Media
- https://en.wikipedia.org/wiki/Multimedia_database - a collection of related for multimedia data. The multimedia data include one or more primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations) animation sequences, audio and video. A Multimedia Database Management System (MMDBMS) is a framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for multimedia data types, and facilitate for creation, storage, access, query and control of a multimedia database.
Vector
- What is vector search? - Algolia Blog | Algolia - Vector search is a way to find related objects that have similar characteristics using machine learning models that detect semantic relationships between objects in an index. Solutions for vector search and recommendation are becoming more and more common. If you want to add a natural language text search on your site, create image search, or build a powerful recommendation system, you’ll want to look into using vectors.
- https://github.com/m1guelpf/tinyvector - a tiny embedding database in pure Rust
- https://github.com/typesense/typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
Embedded
- https://en.wikipedia.org/wiki/Embedded_database - a database management system which is tightly integrated with an application software; it is embedded in the application.
Chroma
- Chroma - the open-source embedding database. The fastest way to build Python or JavaScript LLM apps with memory!
SpacetimeDB
- SpacetimeDB - You can think of SpacetimeDB as a database that is also a server. It is a relational database system that lets you upload your application logic directly into the database by way of very fancy stored procedures called "modules". Instead of deploying a web or game server that sits in between your clients and your database, your clients connect directly to the database and execute your application logic inside the database itself. You can write all of your permission and authorization logic right inside your module just as you would in a normal server. This means that you can write your entire application in a single language, Rust, and deploy it as a single binary. No more microservices, no more containers, no more Kubernetes, no more Docker, no more VMs, no more DevOps, no more infrastructure, no more ops, no more servers.
Tarantool
- Tarantool - middleware for data
- https://github.com/tarantool/tarantool - an in-memory computing platform consisting of a database and an application server.
- https://github.com/tarantool/awesome-tarantool - A curated list of delightful Tarantool modules, connectors and other resources
SQL
ugh
- YouTube: SQL (Video versions) - Khan Academy
SELECT * from table;
- https://vimeo.com/56639635 - Michael 'Monty' Widenius - Author of the MySQL Server and MariaDB fork
- SQL-DK – an batch/terminal client for relational databases
- https://en.wikipedia.org/wiki/Information_schema - (information_schema) is an ANSI-standard set of read-only views which provide information about all of the tables, views, columns, and procedures in a database. It can be used as a source of the information which some databases make available through non-standard commands
- https://github.com/xo/usql - Universal command-line interface for SQL databases [9]
- https://github.com/forbesmyester/esqlate - Build minimum viable admin panels quickly with just SQL
- https://github.com/lerocha/chinook-database - Sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, DB2
MySQL
- Wikipedia:MySQL
- InnoDB is the default storage engine for MySQL
MariaDB is compatible with MySQL, you probably want to use that, if not Postgres
- YouTube: MySQL - A series covering working with MySQL including managing databases, tables and data.
- YouTube: Mysql Database - playlist
Connecting
mysql -p # connect with anonymous user, prompt for password
mysql -u username -ppassword -h nonlocalhost dbname # specify user, password and nonlocalhost address, USE dbname
Admin
mysqladmin - a client for performing administrative operations. You can use it to check the server's configuration and current status, to create and drop databases, and more.
- YouTube: 15 Mysql Database MySQL Admin
SHOW engines; SHOW processlist; Show variables;
Database management
SHOW databases;
USE [db name];
CREATE DATABASE [dbname];
CREATE DATABASE IF NOT EXISTS [dbname]
mysql -u username -p -e "CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci"; # drupal 7
select database(); # show which database is in use
DROP database dbname; # remove dbname from db;
User management
CREATE USER 'jeffrey'@'localhost' IDENTIFIED BY 'newpassword';
SELECT User FROM mysql.user; # show all database users
USE mysql; SET PASSWORD FOR 'user-name-here'@'hostname-name-here' = PASSWORD('new-password-here');
UPDATE mysql.user SET Password=PASSWORD('new-password-here') WHERE User='user-name-here' AND Host='host-name-here';
Passwords
SET PASSWORD FOR 'user'@'localhost' = PASSWORD('mypass');
mysqladmin -u user_name -h host_name password "newpwd"
- Resetting a forgotten MySQL root password, starting with --skip-grant-tables
Password=PASSWORD('NewPassword') WHERE User='root'; FLUSH PRIVILEGES;
Permissions
SHOW GRANTS; # show permissions for current db user
SHOW GRANTS FOR user@localhost; # show permissions for a user
SELECT CONCAT("SHOW GRANTS FOR '",user,"'@'",host,"';") FROM mysql.user WHERE host!='localhost'; # Create command list for showing user grants [10]
GRANT all on [dbname].* TO '[username]'; [11]
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE TEMPORARY TABLES ON databasename.* TO 'username'@'localhost' IDENTIFIED BY 'password'; # drupal 7
create database wikidb; grant index, create, select, insert, update, delete, alter, lock tables on wikidb.* to 'wikiuser'@'localhost' identified by 'password'; #mediawiki??
REVOKE ALL PRIVILEGES, GRANT OPTION FROM user [, user] ...
FLUSH privileges; # Reloads the privileges from the grant tables in the mysql database.
The server caches information in memory as a result of GRANT, CREATE USER, CREATE SERVER, and INSTALL PLUGIN statements. This memory is not released by the corresponding REVOKE, DROP USER, DROP SERVER, and UNINSTALL PLUGIN statements, so for a server that executes many instances of the statements that cause caching, there will be an increase in memory use. This cached memory can be freed with FLUSH PRIVILEGES.
Table management
SHOW tables; SELECT * from mysql.user; # return ascii table with rows from user table from mysql database SELECT * from mysql.user\G; # return vertical row information from user table from mysql database
pager less -SFX # info will now be returned via less, use arrow keys to navigate large tables, q to quit this mode
nopager # reset output from pager to stdout
SELECT table_schema "Data Base Name", sum( data_length + index_length ) / 1024 / 1024 "Data Base Size in MB" FROM information_schema.TABLES GROUP BY table_schema ;
13.8.1. DESCRIBE provides information about the columns in a table.
Troubleshooting
- 4.6.7. mysqlbinlog — Utility for Processing Binary Log Files
Backup and restore
mysql < database.sql # if database file was saved with CREATE DATABASE
mysql databasename < database.sql # specify database name to import database file
mysqldump
mysqldump [options] db_name [tbl_name ...]
mysqldump -u [username] -p -A -R -E --triggers --single-transaction > full_backup.sql # full database backup [12] # -A For all databases (you can also use --all-databases) # -R For all routines (stored procedures & triggers) # -E For all events # --single-transaction Without locking the tables i.e., without interrupting any connection (R/W).
mysqldump -u root -ppassword dbname | mysql -u root -ppassword --host=remote-server -C dbname # copy direct to new database instance, direct db connection # REMEMBER - zsh, space before the command to hide from history
mysqldump -u username -ppassword dbname | ssh user@remote.box.com mysql -u username -ppassword dbname # copy direct to new database instance, via ssh connection # REMEMBER - zsh, space before the command to hide from history # fails for large DBs
mysqldump -u username -ppassword dbname | gzip -c | ssh USERNAME@YOUR_TO_HOST 'cat > ~/dump.sql.gz' # gzip across the wire to a compressed file on the other end # REMEMBER - zsh, space before the command to hide from history
- https://github.com/sadreck/mysqldbsplit - This script breaks down a mysqldump file into one-file-per-table. [13]
mysqlbackup
Install the mysql-client package to access.
For a non-busy server;
mysqlbackup --port=3306 --protocol=tcp --user=root --password --backup-dir=/home/user/backupdir backup-and-apply-log
mysqlhotcopy
- http://dev.mysql.com/doc/refman/5.0/en/mysqlhotcopy.html - currently deprecated.
Xtrabackup
- Percona XtraBackup - Documentation - an open-source hot backup utility for MySQL - based servers that doesn’t lock your database during the backup. It can back up data from InnoDB, XtraDB, and MyISAM tables on MySQL 5.1, 5.5, 5.6 and 5.7 servers, as well as Percona Server with XtraDB.
Replication
Tools
- https://github.com/dbcli/mycli - A command line client for MySQL that can do auto-completion and syntax highlighting.
- dbdeploy is a Database Change Management tool.
- https://github.com/tanin47/php_db_migrate
- https://bitbucket.org/stepancheg/mysql-diff/wiki/Home
- dBug - "PHP version of ColdFusion’s cfdump. Outputs colored and structured tabular variable information. Variable types supported are: Arrays, Classes/Objects, Database and XML Resources."
- anywhereindb - Sometime we need to find out a small piece of string in big Database. Like where is the configuration is saved, or where is Jon's Date of birth is saved. This code is search all the tables and all the rows and columns in a MYSQL Database. The code is written in PHP. For faster result, we are only searching in the varchar field.
- http://sourceforge.net/projects/ajaxmytop/ - monitoring
Scripts
- Search Replace DB - This script was made to aid the process of migrating PHP and MySQL based websites. It has additional features for WordPress but works for most other similar CMSes.
./searchreplacedb2cli.php --host localhost --user root --database test --pass "pass" --charset utf\-8 --search "findMe" --replace "replaceMe" --dry-run
Performance
Native clients
- SQL Workbench/J - a free, DBMS-independent, cross-platform SQL query tool. It is written in Java and should run on any operating system that provides a Java Runtime Environment. Its main focus is on running SQL scripts (either interactively or as a batch) and export/import features. Graphical query building or more advanced DBA tasks are not the focus and are not planned
SQuirreL SQL Client
- SQuirreL SQL Client - a graphical SQL client written in Java that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands etc.
HeidiSQL
- HeidiSQL - a useful and reliable tool designed for web developers using the popular MySQL server, Microsoft SQL databases and PostgreSQL. It enables you to browse and edit data, create and edit tables, views, procedures, triggers and scheduled events. Also, you can export structure and data either to SQL file, clipboard or to other servers. [14]
DBeaver
- DBeaver - Free multi-platform database tool for developers, SQL programmers, database administrators and analysts. Supports all popular databases: MySQL, PostgreSQL, MariaDB, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Derby, etc.
Web clients
phpMyAdmin
Chive
SQL Buddy
- SQL Buddy - Web based MySQL administration
Adminer
- Adminer - single php file
wget http://www.adminer.org/latest-mysql-en.php -O adminer.php wget http://www.adminer.org/latest-en.php -O adminer.php
OmniDB
- OmniDB - Open Source Web Tool For Database Management
MariaDB
Fork of MySQL, drop in replacement.
Percona Server
PostgreSQL
- http://www.craigkerstiens.com/2012/04/30/why-postgres/
- http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL
- http://wekeroad.com/2012/07/19/postgresql-rising
- https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified
- https://github.com/dbcli/pgcli - Postgres CLI with autocompletion and syntax highlighting
- LiteCLI - a user-friendly CommandLine client for SQLite database. It is based on the popular pgcli and mycli projects. LiteCLI is written in python using the wonderful prompt-toolkit library. It is cross-platform compatible and it is tested on Linux, MacOS and Windows. [17]
- https://github.com/mgartner/pg_flame - A flamegraph generator for Postgres EXPLAIN ANALYZE output.
- https://github.com/aquametalabs/aquameta - Web development platform built entirely in PostgreSQL
Ingres
SQLite
- sqlpkg - SQLite Package Registry - the (unofficial) SQLite package registry
- https://github.com/nalgeon/sqlpkg-cli#readme - manages SQLite extensions, just like pip does with Python packages or brew does with macOS programs. It works primarily with the SQLite package registry, but is not limited to it. You can install SQLite extensions from GitHub repositories or other websites. All you need is a package spec file (more on that later,.
SQLiteStudio
- SQLiteStudio - A free, open source, multi-platform SQLite database manager written in C++, with use of Qt framework.
DB Browser for SQLite
- DB Browser for SQLite - a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite.It is for users and developers wanting to create databases, search, and edit data. It uses a familiar spreadsheet-like interface, and you don't need to learn complicated SQL commands.
sql_with_qt
- https://github.com/katecpp/sql_with_qt - This repository is a small example of how to set up sqlite database with Qt and perform some basic queries.
sqlite-tui
- https://github.com/mathaou/sqlite-tui - A TUI for viewing sqlite databases
sqlite-manager
- https://github.com/lunu-bounir/sqlite-manager - a browser extension to read, manipulate, plot and write SQLite databases
CockroachDB
- CockroachDB - Architected for the cloud, CockroachDB delivers resilient, consistent, distributed SQL at your scale
IBM DB2
Oracle Database
IBM System R
HyperSQL / HSQL
- HyperSQL Documentationa relational database engine written in Java. Version 2.7 offers many features and adheres closely to the latest SQL and JDBC 4 standards.
NoSQL
dbm
Redis
See also Server#Redis
Used in the BOA stack for caching.
CouchDB
- PouchDB was written to help web developers build applications that work as well offline as well as they do online, applications save data locally so the user can use all the features of an app even while offline and synchronise the data between clients so they have up to date data wherever they go.
Couchbase
MongoDB
- http://snmaynard.com/2012/10/17/things-i-wish-i-knew-about-mongodb-a-year-ago/
- http://blog.serverdensity.com/does-everyone-hate-mongodb/
JavaScript
MDBM
Riak
- Riak - product line of distributed databases is built on a set of core services providing a highly reliable, scalable distributed systems framework. Riak KV is a distributed NoSQL database. Riak TS is built on the same core foundation as Riak KV and is highly optimized for IoT and time series data. Riak also integrates with Riak S2 to optimize large object storage, and integrates with other data services including Apache Spark, Redis Caching, Apache Solr, and Apache Mesos.
Other
- Kinto - a minimalist JSON storage service with synchronisation and sharing abilities.
- https://github.com/kallaballa/Janosh - A json document database with a shell interface and lua scripting support.Janosh is written in C++11. It is used in the ScreenInvader project.
- replikativ - an open, scalable and distributive infrastructure for a data-driven community of applications. It can serve as a storage backend for your applications and make your application state always accessible on all your endpoints. For our applications it radically simplifies frontend development by streaming state changes directly into our reactive UI pipelines.
- Fauna - the only Mission-Critical NoSQL Database that guarantees data correctness without operational complexity from the team that scaled Twitter.
- https://github.com/fastio/pedis - NoSQL data store using the SEASTAR framework, compatible with Redis
- ArangoDB - From the ground up, ArangoDB is designed as a native multi-model database, supporting key/value, document and graph models. This means you can model your data and application in a very flexible way. ArangoDB can operate as a highly scalable database cluster for all data models. An ArangoDB cluster can be configured to serve various types of loads and runs on container orchestration systems like Kubernetes & DC/OS.
to sort
- http://radar.oreilly.com/2012/02/nosql-non-relational-database.html - http://news.ycombinator.com/item?id=3610844
- Trousseau is a gpg encrypted key-value store designed to be a simple, safe and relient place for your data. It stores data in a single multi-recipients encrypted file and can supports both local and remote storage sources (S3 and ssh so far) import/export.
gpu;
- https://wiki.postgresql.org/wiki/PGStrom
- http://istc-bigdata.org/index.php/mapd-a-way-to-map-big-data-faster/
- RocksDB - A persistent key-value store
- https://github.com/amirouche/hoply - a generic n-tuple store that can be used to create a triplestore or a quadstore or whatever.
- https://en.wikipedia.org/wiki/Bitmap_index - a special kind of database index that uses bitmaps.Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications.Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively.[1]Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema.
- Judy Arrays - a C library that provides a state-of-the-art core technology that implements a sparse dynamic array. Judy arrays are declared simply with a null pointer. A Judy array consumes memory only when it is populated, yet can grow to take advantage of all available memory if desired.Judy's key benefits are scalability, high performance, and memory efficiency. A Judy array is extensible and can scale up to a very large number of elements, bounded only by machine memory. Since Judy is designed as an unbounded array, the size of a Judy array is not pre-allocated but grows and shrinks dynamically with the array population.Judy combines scalability with ease of use. The Judy API is accessed with simple insert, retrieve, and delete calls that do not require extensive programming. Tuning and configuring are not required (in fact not even possible). In addition, sort, search, count, and sequential access capabilities are built into Judy's design.Judy can be used whenever a developer needs dynamically sized arrays, associative arrays or a simple-to-use interface that requires no rework for expansion or contraction.Judy can replace many common data structures, such as arrays, sparse arrays, hash tables, B-trees, binary trees, linear lists, skiplists, other sort and search algorithms, and counting functions.
Virtuoso Universal Server
- http://en.wikipedia.org/wiki/Virtuoso_Universal_Server - a middleware and database engine hybrid that combines the functionality of a traditional Relational database management system (RDBMS), Object-relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.
Dolt
- https://github.com/liquidata-inc/dolt - a relational database, i.e. it has tables, and you can execute SQL queries against those tables. It also has version control primitives that operate at the level of table cell. Thus Dolt is a database that supports fine grained value-wise version control, where all changes to data and schema are stored in commit log. It is inspired by RDBMS and Git, and attempts to blend concepts about both in a manner that allows users to better manage, distribute, and collaborate on, data.
- DoltHub - public data
Time series
- https://en.wikipedia.org/wiki/Time_series_database - a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. Ideally, repositories of time series are natively implemented using specialized database algorithms. However, it is possible to store time series as binary large objects (BLOBs) in a relational database or by using a VLDB approach coupled with a pure star schema.[citation needed] Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension.
InfluxDB
- InfluxDB - an open source time series database with no external dependencies. It's useful for recording metrics, events, and performing analytics.
Graph
See also Semantic and Semantic#Triplestore
- https://en.wikipedia.org/wiki/Graph_database - a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships within a graph database is fast because they are perpetually stored within the database itself. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.
Neo4j
- Neo4j - The Leader in Graph Databases
Cayley
- https://github.com/google/cayley - an open-source graph inspired by the graph database behind Freebase and Google's Knowledge Graph. Its goal is to be a part of the developer's toolbox where Linked Data and graph-shaped data (semantic webs, social networks, etc) in general are concerned. [32]
Gaffer
- https://github.com/gchq/Gaffer - A large-scale entity and relation database supporting aggregation of properties [33]
Grakn
- Grakn - an intelligent database: a knowledge graph engine to organise complex networks of data and making it queryable, by performing knowledge engineering. Rooted in Knowledge Representation and Automated Reasoning, Grakn provides the knowledge foundation for cognitive and intelligent (e.g. AI) systems, by providing an intelligent language for modelling, transactions and analytics. Being a distributed database, Grakn is designed to scale over a network of computers through partitioning and replication. Under the hood, Grakn has built an expressive knowledge representation system based on hypergraph theory (a subfield in mathematics that generalises an edge to be a set of vertices) with a transactional query interface, Graql. Graql is Grakn’s reasoning (through OLTP) and analytics (through OLAP) declarative query language.
TinkerPop
- Apache TinkerPop - a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
Linkurious
- Linkurious - an on-premises graph visualization and analysis platform. Fraud, intelligence or cyber analysts use it to detect and investigate threats in large and complex datasets. [34]
SurrealDB
- SurrealDB - The ultimate database for tomorrow's applications. With an SQL-style query language, real-time queries with highly-efficient related data retrieval, advanced security permissions for multi-tenant access, and support for performant analytical workloads, SurrealDB is the next generation serverless database.
- https://github.com/surrealdb - not libre
GraphQL
- GraphQL - a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.
- https://github.com/graphql
- https://github.com/graphql/graphql-spec
- https://en.wikipedia.org/wiki/GraphQL - an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data. GraphQL was developed internally by Facebook in 2012 before being publicly released in 2015. On 7 November 2018, the GraphQL project was moved from Facebook to the newly-established GraphQL Foundation, hosted by the non-profit Linux Foundation.
- https://github.com/graphql/graphiql - An in-browser IDE for exploring GraphQL.
- YouTube: GraphQL - playlist by Fun Fun Function
- GRANDstack - A new paradigm for building APIs, GraphQL is a way of describing data and enabling clients to query it.
- GitHub Developer Guide: GitHub GraphQL API v4
- https://github.com/imolorhe/altair - A beautiful feature-rich GraphQL Client for all platforms
- https://github.com/dgraph-io/dgraph - Native GraphQL Database with graph backend
- https://github.com/surrealdb - A scalable, distributed, collaborative, document-graph database, for the realtime web
Replication
- http://www.symmetricds.org/ - open source software for database and file synchronization with Multi-master replication, filtered synchronization, and transformation capabilities. It is designed to scale for a large number of nodes, work across low-bandwidth connections, and withstand periods of network outage. Data synchronization occurs asynchronously from a scheduled job, with data changes being sent over a push or pull operation.
- https://en.wikipedia.org/wiki/SymmetricDS
Distributed
H-Store
- H-Store - an experimental main-memory, parallel database management system that is optimized for on-line transaction processing (OLTP) applications. It is a highly distributed, row-store-based relational database that runs on a cluster on shared-nothing, main memory executor nodes. The H-Store project is a collaboration between MIT, Brown University, Carnegie Mellon University, Yale University, and Intel.
HBase
RethinkDB
FoundationDB
- FoundationDB - gives you the power of ACID transactions in a distributed database.[36]
to sort
TiDB
- https://github.com/pingcap/tidb - an open source distributed scalable Hybrid Transactional and Analytical Processing (HTAP) database built by PingCAP. Inspired by the design of Google F1 and Google Spanner, TiDB features infinite horizontal scalability, strong consistency, and high availability. The goal of TiDB is to serve as a one-stop solution for both OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing).
tidis
- https://github.com/yongman/tidis - a Distributed NoSQL database, providing a redis-protocal api(string,list,hash,set,sorted-set), written in Go. Tidis is like TiDB layer, providing protocol transform, powered by tikv backend distributed storage which use raft for data replication and 2PC for distributed transaction. [38]
Scuttlebot
- Scuttlebot - an open source peer-to-peer log store used as a database, identity provider, and messaging system. It features global replication, file-syncronization, and end-to-end encryption.
to sort
https://news.ycombinator.com/item?id=25871605
GUN
- GUN - a small, easy, and fast data sync and storage system that runs everywhere JavaScript does. The aim of GUN is to let you focus on the data that needs to be stored, loaded, and shared in your app without worrying about servers, network calls, databases, or tracking offline changes or concurrency conflicts. This lets you build cool apps fast
- https://github.com/amark/gun - a realtime, distributed, offline-first, graph database engine. Doing 20M+ ops/sec in just ~9KB gzipped.
- https://gun.eco/distributed/matters.html
Titan
- Titan - a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.
Social
Other
- https://github.com/cube2222/octosql - a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
- https://github.com/SemWebCentral/parliament - Standards-compliant triple store for RDF, OWL, and SPARQL
- Datanymizer - an open-source, GDPR-compliant, privacy-preserving data anonymization tool flexible about how the anonymization takes place. Written in Rust.
ORM
- https://en.wikipedia.org/wiki/Object-relational_mapping - ORM, O/RM, and O/R mapping tool, in computer science is a programming technique for converting data between incompatible type systems using object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language. There are both free and commercial packages available that perform object-relational mapping, although some programmers opt to construct their own ORM tools.
Prisma
- Prisma - provides a database-agnostic abstraction to be used from any programming language.