Data
See also Computing#Data structures, Free/open
General
data, noun
- facts and statistics collected together for reference or analysis: there is very little data available
- the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
- Philosophy: things known or assumed as facts, making the basis of reasoning or calculation.
- https://en.wikipedia.org/wiki/Unstructured_data
- https://en.wikipedia.org/wiki/Semi-structured_data
- https://en.wikipedia.org/wiki/Data_model - structured data
- https://en.wikipedia.org/wiki/Data_structure
- Data, Information, Knowledge, and Wisdom - some abstractions
Articles
Learning
Data science
See also Visualisation
- A Taxonomy of Data Science - Both within the academy and within tech startups, we’ve been hearing some similar questions lately: Where can I find a good data scientist? What do I need to learn to become a data scientist? Or more succinctly: What is data science?
- School of Data works to empower civil society organizations, journalists and citizens with the skills they need to use data effectively in their efforts to create more equitable and effective societies.
- http://jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html
- http://cacm.acm.org/blogs/blog-cacm/155468-what-does-big-data-mean/fulltext
- http://www.evanmiller.org/statistical-formulas-for-programmers.html
- Kaggle - Service - From Big Data to Big Analytics.
Formats
all moved, to reorder
https://en.wikipedia.org/wiki/Resource_Interchange_File_Format
Encoding
- http://home.paulschou.net/tools/xlate/
- http://www.subnetonline.com/pages/converters/hex-to-bin-to-dec.php
ASCII / ANSI
- https://en.wikipedia.org/wiki/ASCII - abbreviated from American Standard Code for Information Interchange, is a character-encoding scheme. Originally based on the English alphabet, it encodes 128 specified characters into 7-bit binary integers. The characters encoded are numbers 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes that originated with Teletype machines, and a space. For example, lowercase j would become binary 1101010 and decimal 106.
- https://en.wikipedia.org/wiki/Extended_ASCII - eight-bit or larger character encodings that include the standard seven-bit ASCII characters as well as others. The use of the term is sometimes criticized, because it can be mistakenly interpreted that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, both of which are untrue.
- https://en.wikipedia.org/wiki/Code_page_437
- https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/
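The ASCII facts above, the ambiguity of "extended ASCII", and the ASCII-delimited-text idea from the last link can all be checked in a few lines of Python. This is a minimal standard-library sketch: the helper name `ascii_code` and the sample rows are illustrative, and cp437 and latin-1 are just two of the many 8-bit encodings that get called "extended ASCII".

```python
def ascii_code(ch: str) -> tuple:
    """Return the decimal code and 7-bit binary string of an ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 128-character ASCII range")
    return code, format(code, "07b")

print(ascii_code("j"))  # (106, '1101010')

# One byte above 0x7F, read through two different 8-bit "extended ASCII" code pages.
high_byte = bytes([0xB0])
in_cp437 = high_byte.decode("cp437")     # U+2591 LIGHT SHADE (IBM PC code page 437)
in_latin1 = high_byte.decode("latin-1")  # U+00B0 DEGREE SIGN (ISO 8859-1)
print(f"U+{ord(in_cp437):04X} vs U+{ord(in_latin1):04X}")  # U+2591 vs U+00B0

# ASCII-delimited text: unit separator (0x1F) between fields and record
# separator (0x1E) between records, so commas and tabs in the data need no
# escaping, provided the data never contains those two control characters.
US, RS = "\x1f", "\x1e"
rows = [["name", "note"], ["Ada", "likes, commas\tand tabs"]]
encoded = RS.join(US.join(fields) for fields in rows)
decoded = [record.split(US) for record in encoded.split(RS)]
print(decoded == rows)  # True
```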
Art
Unicode
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
- Codepoint, n. the position of a character in an encoding system.
- Charbase - A visual unicode database
- http://en.wikipedia.org/wiki/List_of_Unicode_characters
- http://en.wikipedia.org/wiki/Unicode_control_characters
- http://www.charset.org/
- http://unicode.org/charts/
- http://sheet.shiar.nl/unicode
mirroring char in brackets: (test (
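To make "codepoint" concrete, Python's standard `unicodedata` module can show a character's code point, its Unicode name and the bytes UTF-8 uses to store it; it also exposes the bidirectional "mirrored" property behind the bracket-mirroring note above. A small sketch; the helper `describe` and the sample characters are illustrative choices.

```python
import unicodedata

def describe(ch: str) -> str:
    """Format a character as code point (U+XXXX), Unicode name and UTF-8 bytes."""
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = ch.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+
    return f"U+{ord(ch):04X} {name} | UTF-8: {utf8}"

for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A | UTF-8: 41
# U+00E9 LATIN SMALL LETTER E WITH ACUTE | UTF-8: c3 a9
# U+20AC EURO SIGN | UTF-8: e2 82 ac
# U+1F600 GRINNING FACE | UTF-8: f0 9f 98 80

# Brackets carry the "mirrored" property, so right-to-left text displays them flipped.
print(unicodedata.mirrored("("), unicodedata.mirrored("a"))  # 1 0
```

Note how the code point (a number) and its encoding (bytes) differ: ASCII-range characters take one UTF-8 byte, while the euro sign takes three and the emoji four.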
Serialization
See also HTML/CSS#Markup, JavaScript#JSON
- http://en.wikipedia.org/wiki/Serialization
- http://en.wikipedia.org/wiki/Marshalling_(computer_science)
- http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
- http://en.wikipedia.org/wiki/Category:Data_serialization_formats
- http://www.drdobbs.com/web-development/after-xml-json-then-what/240151851
- http://en.wikipedia.org/wiki/Runoff_(program)
- http://en.wikipedia.org/wiki/IBM_Generalized_Markup_Language
- http://en.wikipedia.org/wiki/TeX - 1978
- http://en.wikipedia.org/wiki/Scribe_(markup_language) - 1980
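Serialization (marshalling) turns an in-memory object into a stream of text or bytes that can be stored or sent, and deserialization rebuilds an equivalent object from it. A minimal round trip, assuming JSON as the format only because Python ships a `json` module; the `Point` class is hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: int
    y: int
    label: str

def serialize(p: Point) -> str:
    # Flatten the object's fields into a dict, then into JSON text.
    return json.dumps(asdict(p), sort_keys=True)

def deserialize(text: str) -> Point:
    # Parse the JSON text and rebuild an equivalent object from its fields.
    return Point(**json.loads(text))

p = Point(1, 2, "home")
text = serialize(p)
print(text)  # {"label": "home", "x": 1, "y": 2}
print(deserialize(text) == p)  # True
```

The same pattern holds for other formats in the comparison links (XML, Protocol Buffers, and so on); what changes is how compact, typed and human-readable the serialized form is.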
ML
XML
JSON
See JavaScript#JSON
- https://github.com/letsencrypt/acme-spec - over https
Other
- Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats.
- http://kentonv.github.io/ - from a Protocol Buffers developer
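Part of what makes the Protocol Buffers wire format compact is the base-128 "varint": 7 payload bits per byte, least significant group first, with the high bit set on every byte except the last. Each field is prefixed by a tag, `(field_number << 3) | wire_type`, where wire type 0 means varint. The sketch below hand-encodes one such field purely to show the idea; real code would use the classes generated by `protoc`, and negative numbers (which protobuf handles differently) are deliberately rejected here.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)

def decode_varint(data: bytes) -> tuple:
    """Return (value, number_of_bytes_consumed) for a varint at the start of data."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("truncated varint")

def encode_varint_field(field_number: int, value: int) -> bytes:
    # Tag = (field_number << 3) | wire_type; wire type 0 is VARINT.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

print(encode_varint(150).hex())             # 9601
print(decode_varint(b"\x96\x01"))           # (150, 2)
print(encode_varint_field(1, 150).hex())    # 089601
```

Small numbers cost a single byte, which is why field numbers 1 to 15 (whose tags fit in one byte) are usually reserved for the most frequent fields.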
Markup
See also Computing#Text encoding, HTML/CSS
- https://en.wikipedia.org/wiki/Lightweight_markup_language
- Lightweight Markup: Markdown, MediaWiki, Wikidot, LaTeX
Markdown
- http://daringfireball.net/projects/markdown/syntax
- https://five.squarespace.com/display/ShowHelp?section=Markdown
- Markdown Here is a Google Chrome, Firefox, Safari, Opera, and Thunderbird extension that lets you write email in Markdown and render it before sending. It also supports syntax highlighting (just specify the language in a fenced code block).
- PageDown is the JavaScript Markdown previewer used on Stack Overflow and the rest of the Stack Exchange network. It includes a Markdown-to-HTML converter and an in-page Markdown editor with live preview.
- http://blogs.plos.org/mfenner/2012/12/13/a-call-for-scholarly-markdown/
- http://indiewebcamp.com/2013/Citations_and_Scholarly_Markdown
WYSIWYM
Configuration
JSON
- http://beautifuldocs.com/
- https://github.com/scottstanfield/markdown-to-json
- https://github.com/sheremetyev/markdown-json
- Markdown Syntax for Object Notation (MSON) - This document is a proposal of Markdown syntax for JSON & JSON Schema.
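The markdown-to-JSON projects above each define their own mapping. As a rough illustration of the general idea only, here is a deliberately tiny, hypothetical mapping: `# Heading` lines become keys and `- item` lines become list entries under the most recent heading. It is not the behaviour of any of the linked tools or of MSON, and it ignores nesting, inline markup and every other Markdown construct.

```python
import json

def markdown_to_json(md: str) -> str:
    """Hypothetical mapping: '# Heading' -> key, '- item' -> entry in that key's list."""
    doc = {}
    current = None  # heading the following list items belong to
    for raw in md.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            current = line[2:].strip()
            doc[current] = []
        elif line.startswith("- ") and current is not None:
            doc[current].append(line[2:].strip())
    return json.dumps(doc, indent=2)

sample = """# Formats
- JSON
- XML
# Tools
- Pandoc
"""
print(markdown_to_json(sample))
```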
Systems
WikiCreole
other
- Pandoc - a universal document converter. If you need to convert files from one markup format into another, pandoc is your Swiss-army knife. Pandoc can convert documents in Markdown, reStructuredText, Textile, HTML, DocBook, LaTeX, or MediaWiki markup to: HTML formats (XHTML, HTML5, and HTML slide shows using Slidy, Slideous, S5, or DZSlides); word processor formats (Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML); ebooks (EPUB version 2 or 3, FictionBook2); documentation formats (DocBook, GNU TexInfo, Groff man pages); TeX formats (LaTeX, ConTeXt, LaTeX Beamer slides); PDF via LaTeX; and lightweight markup formats (Markdown, reStructuredText, AsciiDoc, MediaWiki markup, Emacs Org-Mode, Textile).
XMPP
Other
Morse
Open Data
See also WebDev#API, Open, Semantic web
Data sources
Hubs / platforms
- CKAN is a fully-featured, mature, open source data portal and data management solution. CKAN provides a streamlined way to make your data discoverable and presentable. Each dataset is given its own page with a rich collection of metadata, making it a valuable and easily searchable resource.
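CKAN portals expose their catalogue through an Action API under `/api/3/action/<action>`, and `package_search` searches datasets. A minimal sketch that builds a search URL and reads the standard `{"success": ..., "result": ...}` envelope; the portal URL is a placeholder and the canned response is illustrative, not real data.

```python
import json
from urllib.parse import urlencode

def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    # CKAN Action API: actions live under /api/3/action/<name>.
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"

def dataset_titles(response_text: str) -> list:
    # Action API responses wrap the payload as {"success": bool, "result": ...}.
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return [pkg["title"] for pkg in body["result"]["results"]]

# Placeholder portal; substitute a real CKAN instance.
print(package_search_url("https://ckan.example.org/", "air quality", rows=5))
# https://ckan.example.org/api/3/action/package_search?q=air+quality&rows=5

canned = '{"success": true, "result": {"count": 1, "results": [{"title": "Example dataset"}]}}'
print(dataset_titles(canned))  # ['Example dataset']
```

Against a live portal, the response text would come from something like `urllib.request.urlopen(url).read().decode("utf-8")`.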
UK
- http://www.data-archive.ac.uk/find/hasset-thesaurus/skos-hasset This new resource is an outcome of the Jisc-funded SKOS-HASSET project, led by staff at the UK Data Archive at the University of Essex, which owns and manages HASSET. Like dictionaries, thesauri describe the changing world around them; this is why the UK Data Archive continues work to ensure HASSET is up to date. Simple Knowledge Organisation System (SKOS) makes the thesaurus machine-readable. It is the version of Resource Description Framework (RDF) specific to classification resources. It encodes these products in a standardised way to make their structures comparable and to facilitate interaction.
Government
- Data.gov.uk is a key part of the Government's work on Transparency, which is being led by the Transparency Board. Data.gov.uk implementation is being led by the Transparency and Open Data team in the Cabinet Office, working across government departments to ensure that data is released in a timely and accessible way. This work is being supported by Sir Tim Berners-Lee and Professor Nigel Shadbolt. There are a number of technical partners involved in the project to date. These include the Comprehensive Knowledge Archive Network (CKAN): CKAN runs the catalogue at data.gov.uk/data as well as a growing number of open data registries around the world. It is a project created by the Open Knowledge Foundation to make it easy to find, share and reuse open content and data. The CKAN software provides a web interface, programmer's API, feeds notifying of changes, and a browsable history of all changes. The API is documented here: http://data.gov.uk/data/api.
- data.gov.uk: Who is doing what? - This page lists the domains which publish and maintain linked data and short term projects developing the government use of linked data. Most sectors have one or more SPARQL endpoints, which enable you to perform searches across the data; you can access these interactively on this site.
- National Institute for Health Research - Clinical Research Network: App Centre
- https://www.odp.nihr.ac.uk/ODP_QlikView%20Reporting%20User%20Guide%20v0.4.pdf
- London Datastore has been created by the Greater London Authority (GLA) as an innovation towards freeing London’s data. We want citizens to be able to access the data that the GLA and other public sector organisations hold, and to use that data however they see fit – free of charge. The GLA is committed to influencing and cajoling other public sector organisations into releasing their data here too.
- http://www.datagm.org.uk/ - Manchester
- Open Data Communities - Open Access to Local Data. This site is the UK Department for Communities and Local Government's official Linked Open Data site. It provides a selection of statistics on a variety of themes including Local Government finance, housing and homelessness, wellbeing, deprivation, and the department's business plan as well as supporting geographical data. All of the data is available as fully browsable and queryable Linked Data, and the majority is free to re-use under the Open Government Licence.
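Several of the linked-data sites above offer SPARQL endpoints. Under the SPARQL 1.1 Protocol a query can be sent as the `query` parameter of an HTTP GET, and results requested as `application/sparql-results+json` come back with `head.vars` and `results.bindings`. A minimal sketch; the endpoint URL is a placeholder and the query and canned response are illustrative only.

```python
import json
from urllib.parse import urlencode

# Illustrative query: any resource with an rdfs:label.
QUERY = """SELECT ?s ?label WHERE {
  ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5"""

def sparql_get_url(endpoint: str, query: str) -> str:
    # SPARQL 1.1 Protocol: the query string goes in the `query` parameter of a GET.
    return f"{endpoint}?{urlencode({'query': query})}"

def rows_from_results(results_json: str) -> list:
    # SPARQL 1.1 JSON results: head.vars names the variables; results.bindings has
    # one object per solution, mapping each bound variable to {"type", "value", ...}.
    data = json.loads(results_json)
    names = data["head"]["vars"]
    return [
        {name: b[name]["value"] for name in names if name in b}
        for b in data["results"]["bindings"]
    ]

url = sparql_get_url("https://example.org/sparql", QUERY)  # placeholder endpoint

# Canned response in the standard results format, for illustration only.
canned = json.dumps({
    "head": {"vars": ["s", "label"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/id/1"},
         "label": {"type": "literal", "value": "Example"}},
    ]},
})
print(rows_from_results(canned))
# [{'s': 'http://example.org/id/1', 'label': 'Example'}]
```

For a live query, send `url` with `urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})`; endpoints differ in which result formats and query sizes they accept.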
Education
BBC
- http://www.bbc.co.uk/blogs/internet/posts/News-Linked-Data-Ontology
- http://www.infoq.com/presentations/bbc-data-platform-api
Other
Scotland
National
- A Digital Ambition for Scotland - October 22 2010
- Scotland's Digital Future: A Strategy for Scotland - Strategy setting out how the Scottish Government will ensure Scotland takes full advantage of digital technology.
"Action 2.4 We will develop proposals with partners for releasing more government information and data for use by the public. Initial proposals to be developed and implementation to begin by end of July 2011. We invite suggestions for areas where the greater availability of public data could lead to new services or innovative applications" - March 3 2011
- http://www.scotland.gov.uk/Topics/Government/sustainabilityperformance/Data
- http://cofog01.data.scotland.gov.uk/ - linked data
- Scottish Index of Multiple Deprivation - Data Sources and Suitability
Health
- ALISS stands for Access to Local Information to Support Self Management. It’s a wide-ranging project taking a number of approaches to making it easier to find local self management support.
Local
Ireland
Europe
- http://open-data.europa.eu/
- http://ec.europa.eu/information_society/policy/psi/open_data_portal/index_en.htm
- http://publicdata.eu/
- http://lod2.okfn.org/eu-data-catalogues/
- http://latc-project.eu/datasets
- http://www.oecd.org/statistics/
- http://www.eea.europa.eu/data-and-maps
USA
Global
- http://www.mpi-inf.mpg.de/yago-naga/yago/ - YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.
UN
WordNet
- http://wordnet.princeton.edu/
- http://wordnet.princeton.edu/wordnet/related-projects/
- https://en.wikipedia.org/wiki/WordNet - English language semantic relations
- http://globalwordnet.org/
Crowdsourced
DBpedia
WikiData
Geo
Other
- http://echoprint.me/data_download - music id
Internet of things
Commercial
- http://www.whoownsscotland.org.uk/ - has to cover land registry cost?
Development
JavaScript
APIs
See WebDev#API
- http://www.programmableweb.com/apis
- https://www.mashape.com/
- http://www.apihub.com/
- http://publicapis.com/
Mining
- http://www.dcc.ufmg.br/livros/miningalgorithms/DokuWiki/doku.php?id=contents
- http://www.slideshare.net/anilmlis/semantic-web-mining
- http://www.mops1.com/oracle/event/pasig/downloads/PASIG_2010-Simon.pdf
- http://www.public.asu.edu/~hdavulcu/CSE591_Semantic_Web_Mining.html
- http://en.wikipedia.org/wiki/Data_wrangling
- http://en.wikipedia.org/wiki/Data_analysis
- http://en.wikipedia.org/wiki/Data_management
- http://en.wikipedia.org/wiki/Data_governance
Scraping
- ScraperWiki - Accurately extract tables from web pages and PDFs
- Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
- Portia is a tool for visually scraping web sites using Scrapy without any programming knowledge. Just annotate web pages with a point and click editor to indicate what data you want to extract, and Portia will learn how to scrape similar pages from the site.
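The core idea behind table extraction, walking the page's markup and collecting cell text row by row, can be sketched with Python's standard `html.parser` module. This is not how Scrapy or ScraperWiki work internally, and it assumes well-formed markup with explicit closing tags (no nested tables, no omitted `</td>`); the sample HTML is illustrative.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

html = """
<table>
  <tr><th>Site</th><th>Region</th></tr>
  <tr><td>London Datastore</td><td>London</td></tr>
  <tr><td>DataGM</td><td>Manchester</td></tr>
</table>
"""
parser = TableExtractor()
parser.feed(html)
print(parser.rows)
# [['Site', 'Region'], ['London Datastore', 'London'], ['DataGM', 'Manchester']]
```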
Tools
- http://openrefine.org/ - OpenRefine (formerly Google Refine)
- http://idcubed.org/open-platform/platform/
- https://wiki.idhypercubed.org/wiki/ProjectMustardSeed - A Framework for developing and deploying secure cloud applications to collect, compute on, and share personal data
- Recline Data Explorer and Library - A simple but powerful library for building data applications in pure JavaScript and HTML.