Data

From Things and Stuff Wiki
Revision as of 09:02, 28 February 2017 by Milk (talk | contribs) (→‎Open data)


See also Computing#Data structures, Free/open

General

data, noun

  • facts and statistics collected together for reference or analysis, e.g. "there is very little data available"
    • the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
    • (Philosophy) things known or assumed as facts, making the basis of reasoning or calculation.

Articles

Learning

Data

See also Database, Visualisation, Maths#Software





  • A Taxonomy of Data Science - Both within the academy and within tech startups, we’ve been hearing some similar questions lately: Where can I find a good data scientist? What do I need to learn to become a data scientist? Or more succinctly: What is data science?


  • School of Data works to empower civil society organizations, journalists and citizens with the skills they need to use data effectively in their efforts to create more equitable and effective societies.
  • Kaggle - Service - From Big Data to Big Analytics.




Formats

See also Documents, Video#Codecs, Audio#Audio formats

all moved, to reorder

See also Video#Container format

Encoding

Morse

ASCII / ANSI

  • https://en.wikipedia.org/wiki/ASCII - ASCII, abbreviated from American Standard Code for Information Interchange, is a character-encoding scheme. Originally based on the English alphabet, it encodes 128 specified characters into 7-bit binary integers. The characters encoded are the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes that originated with Teletype machines, and a space. For example, lowercase j becomes binary 1101010, which is decimal 106.
  • https://en.wikipedia.org/wiki/Extended_ASCII - eight-bit or larger character encodings that include the standard seven-bit ASCII characters as well as others. The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue.
  • https://en.wikipedia.org/wiki/Code_page_437
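The mapping described above can be checked with a few lines of Python. This is a minimal sketch; the helper name ascii_code is made up for illustration, and the cp437 codec is used only as one example of an "extended ASCII" encoding.

```python
def ascii_code(ch: str) -> tuple[int, str]:
    """Return the decimal code and the 7-bit binary string for one ASCII character."""
    code = ord(ch)
    if code > 127:
        # Outside the 128 characters that ASCII defines.
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code, format(code, "07b")


print(ascii_code("j"))  # (106, '1101010')

# An "extended ASCII" encoding such as code page 437 still uses one byte per
# character, but gives meaning to byte values above 127.
extended = "é".encode("cp437")
print(len(extended), extended[0] > 127)
```

The same byte value above 127 can mean a different character in a different code page, which is why "extended ASCII" does not identify a single encoding.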







Art

Unicode

mirroring char in brackets: (‮‮test ( 


Serialization

See also HTML/CSS#Markup, JavaScript#JSON


  • https://en.wikipedia.org/wiki/Delimiter-separated_values - formats that store two-dimensional arrays of data by separating the values in each row with specific delimiter characters. Most database and spreadsheet programs are able to read or save data in a delimited format. A delimited text file is a text file used to store data, in which each line represents a single book, company, or other thing, and each line has fields separated by the delimiter. Compared to the kind of flat file that uses spaces to force every field to the same width, a delimited file has the advantage of allowing field values of any length.
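As a rough illustration of delimiter-separated values, the sketch below writes and reads the same table with two different delimiters using Python's standard csv module. The helper names write_dsv and read_dsv and the sample book records are invented for this example.

```python
import csv
import io

# Hypothetical sample data: one record (a book) per line.
rows = [
    ["title", "author"],
    ["Dune", "Frank Herbert"],
    ["Emma", "Jane Austen"],
]


def write_dsv(rows, delimiter):
    """Serialize rows to text, one line per record, fields split by delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_dsv(text, delimiter):
    """Parse delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


tsv = write_dsv(rows, "\t")   # tab-separated
psv = write_dsv(rows, "|")    # pipe-separated
print(tsv)
print(read_dsv(psv, "|") == rows)
```

Field values can be of any length; only the delimiter marks where one field ends and the next begins.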


CSV


ML

A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference. XML uses a subset of SGML DTD.

As of 2009, newer XML namespace-aware schema languages (such as W3C XML Schema and ISO RELAX NG) have largely superseded DTDs. A namespace-aware version of DTDs is being developed as Part 9 of ISO DSDL. DTDs persist in applications that need special publishing characters, such as the XML and HTML Character Entity References, which derive from larger sets defined as part of the ISO SGML standard effort.
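A small sketch of a DTD declared inline inside an XML document, embedded in a Python string. The note document and its element names are made up for illustration. Note that Python's standard xml.etree.ElementTree parser reads the declaration but does not validate the document against it; a validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DOCTYPE block lists the legal elements
# (note, to, body) and the order in which they may appear.
NOTE_XML = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Reader</to>
  <body>DTDs list the legal elements of a document.</body>
</note>"""

# ElementTree parses the document and skips over the inline DTD
# without checking the content against it.
root = ET.fromstring(NOTE_XML)
children = [child.tag for child in root]
print(root.tag, children)
```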


XML




  • w3c: XQuery - a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats (JSON, binary, etc.). The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.
  • http://www.xembly.org/ Xembly is an Assembly-like imperative programming language for data manipulation in XML documents. It is a much simpler alternative to XSLT and XQuery. Read this blog post for a more detailed explanation: Xembly, an Assembly for XML.
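XQuery itself is not available in Python's standard library, but the XPath-style path expressions it builds on can be tried with xml.etree.ElementTree, which supports a limited XPath subset. The sketch below is only that: the library document and the find_titles helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring("""
<library>
  <book year="1965"><title>Dune</title></book>
  <book year="1815"><title>Emma</title></book>
</library>
""")


def find_titles(root, year):
    """Select the titles of books with a given year using a path expression."""
    path = f"./book[@year='{year}']/title"
    return [title.text for title in root.findall(path)]


print(find_titles(doc, "1965"))  # ['Dune']
```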



Schema


XSL

JSON

See JavaScript#JSON

Other



Markup

HTML / CSS

See HTML/CSS

Markdown

See also Documents#Markdown




Variations
  • https://github.com/karlcow/markdown-testsuite - This project was initiated to provide a test suite for Markdown markup, and eventually to create a specification from these test results. Part of the community has started a new endeavor, CommonMark, which seems to be gaining traction.



Tools
  • Markdown Here is a Google Chrome, Firefox, Safari, Opera, and Thunderbird extension that lets you write email in Markdown and render it before sending. It also supports syntax highlighting (just specify the language in a fenced code block).
  • https://github.com/mwhite/resume - a simple Markdown resumé template, LaTeX header, and pre-processing script that can be used with Pandoc to generate professional-looking PDF and HTML output.
  • Markx - Markdown editor for scientific writing. Batteries included.


  • Markdown.css - CSS to make HTML markup look like plain-text markdown.
  • PageDown is the JavaScript Markdown previewer used on Stack Overflow and the rest of the Stack Exchange network. It includes a Markdown-to-HTML converter and an in-page Markdown editor with live preview.


  • Lorem Markdownum - Inspired by the many excellent lorem ipsum generators, this simple webapp generates placeholder text. However, instead of generating plain text, this generator gives you structured text in the form of markdown. In order to do so, it uses Markov Chains and many heuristics.
  • Markdown Extra is an extension to PHP Markdown implementing some features currently not available with the plain Markdown syntax. Markdown Extra is available as a separate parser class in PHP Markdown Lib.


  • mdp - A command-line based markdown presentation tool. [22]



  • Fountain is a simple markup syntax for writing, editing and sharing screenplays in plain, human-readable text. Fountain allows you to work on your screenplay anywhere, on any computer or tablet, using any software that edits text files.


Table of Contents
cat ~/projects/Dockerfile.vim/README.md | ./gh-md-toc -
 * [Dockerfile.vim](#dockerfilevim)
 * [Screenshot](#screenshot)
 * [Installation](#installation)
       * [OR using Pathogen:](#or-using-pathogen)
       * [OR using Vundle:](#or-using-vundle)
 * [License](#license)
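The anchors in the output above (for example "Dockerfile.vim" becoming #dockerfilevim and "OR using Pathogen:" becoming #or-using-pathogen) suggest a simple rule: lowercase the heading, drop punctuation, and turn spaces into hyphens. The sketch below follows that assumed rule. It is not the gh-md-toc implementation, its indentation differs from the tool's, and it ignores fenced code blocks; slugify and make_toc are hypothetical names.

```python
import re


def slugify(heading):
    """Build an anchor: lowercase, drop punctuation, spaces become hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # removes characters such as "." and ":"
    return re.sub(r"\s+", "-", slug)


def make_toc(markdown):
    """Return a nested bullet list linking to every ATX (#-style) heading."""
    entries = []
    for line in markdown.splitlines():
        match = re.match(r"(#{1,6})\s+(.*)", line)
        if match:
            level = len(match.group(1))
            title = match.group(2).strip()
            indent = "  " * (level - 1)
            entries.append(f"{indent}* [{title}](#{slugify(title)})")
    return "\n".join(entries)


readme = "# Dockerfile.vim\n\n## Installation\n\n### OR using Pathogen:\n"
print(make_toc(readme))
```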


WYSIWYM
Configuration

JSON

Systems

WikiCreole

other

  • Pandoc - a universal document converter. If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, or MediaWiki markup to: HTML formats (XHTML, HTML5, and HTML slide shows using Slidy, Slideous, S5, or DZSlides); word processor formats (Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML); ebooks (EPUB version 2 or 3, FictionBook2); documentation formats (DocBook, GNU TexInfo, Groff man pages); TeX formats (LaTeX, ConTeXt, LaTeX Beamer slides); PDF via LaTeX; and lightweight markup formats (Markdown, reStructuredText, AsciiDoc, MediaWiki markup, Emacs Org-Mode, Textile).

XMPP

Other


Mining


Scraping

See also HTTP#Scraping, Network#Saving

  • Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  • kimono - Turn websites into structured APIs from your browser in seconds [29]
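To give a feel for "extracting structured data from pages", here is a minimal sketch that uses only Python's standard html.parser module. It is not Scrapy's API and does no crawling; the LinkExtractor class, the extract_links helper, and the sample page are all invented for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect every <a href="..."> as a {"url", "text"} record."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"url": self._href, "text": text})
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content; a real scraper would fetch this over HTTP.
page = '<ul><li><a href="https://example.org/a">First</a></li><li><a href="/b">Second <b>item</b></a></li></ul>'
print(extract_links(page))
```

A framework such as Scrapy adds the parts this sketch leaves out: scheduling requests, following links, and exporting the extracted items.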




Tools




Services