Documents

From Things and Stuff Wiki
Revision as of 16:57, 6 November 2020 by Milk (talk | contribs) (→‎PDF)


General

See also Editors, Web systems#Document management, Wiki, Organisation#Knowledge

to reorder

Markup



COCOA

  • https://en.wikipedia.org/wiki/COCOA_(digital_humanities) - was an early text file utility and associated file format for digital humanities, then known as humanities computing. It comprised approximately 4000 punched cards of FORTRAN and was created in the late 1960s and early 1970s at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Functionality included word-counting and concordance building.

The COCOA file format bears at least a passing similarity to later markup languages such as SGML and XML. A noticeable difference from its successors is that COCOA tags are flat and not tree structured. In that format, every information type and value encoded by a tag should be considered true until the same tag changes its value. Members of the Text Encoding Initiative community maintain legacy support for COCOA, although most in-demand texts and corpora have already been migrated to more widely understood formats such as TEI XML.
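
The flat, stateful tag model described above can be sketched in Python. This is only an illustration, assuming tags of the form `<X value>` (a short category code followed by its value); the sample text and the `read_cocoa` helper are hypothetical and not taken from any COCOA tool.

```python
import re

# Assumed tag shape for illustration: <CATEGORY value>, e.g. <A Shakespeare>.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")

def read_cocoa(lines):
    """Yield (state, text) pairs. Each tag value stays in force until the
    same tag category is set again: the tags are flat, not nested."""
    state = {}
    for line in lines:
        for category, value in TAG.findall(line):
            state[category] = value.strip()
        text = TAG.sub("", line).strip()
        if text:
            yield dict(state), text

sample = [
    "<A Shakespeare>",
    "<T Hamlet>",
    "To be, or not to be",
    "<T Macbeth>",
    "Tomorrow, and tomorrow, and tomorrow",
]
records = list(read_cocoa(sample))
```

Note that the author tag is set once and still applies to the second line of text, while the title tag has been overwritten; nothing is ever "closed".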

SGML

SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s. Goldfarb, editor of the international standard, coined the “GML” term using their surname initials. Goldfarb also wrote the definitive work on SGML syntax in "The SGML Handbook". The syntax of SGML is closer to the COCOA format. As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry. Many such documents must remain readable for several decades—a long time in the information technology field. SGML also was extensively applied by the military, and the aerospace, technical reference, and industrial publishing industries. The advent of the XML profile has made SGML suitable for widespread application for small-scale, general-purpose use.



  • https://en.wikipedia.org/wiki/Document_type_definition - a set of markup declarations that define a document type for an SGML-family markup language (SGML, XML, HTML). A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.

XML uses a subset of SGML DTD.

As of 2009, newer XML namespace-aware schema languages (such as W3C XML Schema and ISO RELAX NG) have largely superseded DTDs. A namespace-aware version of DTDs is being developed as Part 9 of ISO DSDL. DTDs persist in applications that need special publishing characters, such as the XML and HTML Character Entity References, which derive from larger sets defined as part of the ISO SGML standard effort.
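
A small sketch of an inline DTD, including a character entity of the kind mentioned above. The `note` document is made up for illustration. Python's `xml.etree.ElementTree` is used here only because it ships with the standard library: it expands entities declared in the internal subset but does not validate the document against the element declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares the legal elements
# and one character entity, and the body then uses that entity.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY copy "&#169;">
]>
<note><to>Reader</to><body>&copy; 2020</body></note>"""

root = ET.fromstring(DOC)
# The parser is non-validating: it resolves &copy; but does not check
# that <note> really contains (to, body) in that order.
body_text = root.find("body").text
```

Actual DTD validation needs a validating parser, which the standard library does not provide.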




  • dtinfo user manual page - starts the desktop on-line information browser, also known as the CDE Information Manager. On-line information is typically packaged into an information library (infolib), which is a hierarchy of bookcases containing SGML books (see the dtinfogen(1) command). The browser offers an ability to view, search, and print on-line information with a high degree of control. Bookmarks and annotations may be attached at desired points for later recall.

XML

  • https://en.wikipedia.org/wiki/XML - Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications, all of them free open standards, define XML. The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services. Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data.

The essence of why extensible markup languages are necessary is explained at Markup language (for example, see Markup language § XML) and at Standard Generalized Markup Language. Hundreds of document formats using XML syntax have been developed, including RSS, Atom, SOAP, SVG, and XHTML. XML-based formats have become the default for many office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org and LibreOffice (OpenDocument), and Apple's iWork. XML has also provided the base language for communication protocols such as XMPP. Applications for the Microsoft .NET Framework use XML files for configuration, and property lists are an implementation of configuration storage built on XML.



  • Document Content Description for XML - This document proposes a structural schema facility, Document Content Description (DCD), for specifying rules covering the structure and content of XML documents. The DCD proposal incorporates a subset of the XML-Data Submission [XML-Data] and expresses it in a way which is consistent with the ongoing W3C RDF (Resource Description Framework) [RDF] effort; in particular, DCD is an RDF vocabulary. DCD is intended to define document constraints in an XML syntax; these constraints may be used in the same fashion as traditional XML DTDs. DCD also provides additional properties, such as basic datatypes.

Tree

  • https://en.wikipedia.org/wiki/XML_tree - XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree. XML documents must contain a root element (one that is the parent of all other elements). All elements in an XML document can contain sub elements, text and attributes. The tree represented by an XML document starts at the root element and branches to the lowest level of elements. Although there is no consensus on the terminology used on XML Trees, at least two standard terminologies have been released by the W3C: the terminology used in the XPath Data Model and the terminology used in the XML Information Set.


  • https://en.wikipedia.org/wiki/XML_Information_Set - a W3C specification describing an abstract data model of an XML document in terms of a set of information items. The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document. An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.
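
The tree view of a document can be sketched with Python's standard `xml.etree.ElementTree`. The `library` document and the `walk` helper are made up for illustration; the input only has to be well-formed, not valid against any schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed document with a single root element.
DOC = (
    "<library>"
    "<book id='1'><title>A</title></book>"
    "<book id='2'><title>B</title></book>"
    "</library>"
)

def walk(element, depth=0):
    """Yield (depth, tag) for the element and all descendants, parents first."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(DOC)
outline = list(walk(root))
```

The root sits at depth 0 and every other element is reached by descending from it.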



XQuery

  • w3c: XQuery - a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats (JSON, binary, etc.). The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.
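
XQuery itself needs a dedicated processor, but the XPath-style path expressions it builds on can be tried with Python's standard library. A minimal sketch, keeping in mind that `xml.etree.ElementTree` implements only a limited subset of XPath and no XQuery; the `catalog` data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical data set to query.
CATALOG = ET.fromstring(
    "<catalog>"
    "<book lang='en'><title>First</title></book>"
    "<book lang='de'><title>Zweites</title></book>"
    "<book lang='en'><title>Third</title></book>"
    "</catalog>"
)

# Path expression: every <title> under a <book> whose lang attribute is 'en'.
english_titles = [t.text for t in CATALOG.findall("./book[@lang='en']/title")]
```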


to sort

  • Xembly - an Assembly-like imperative programming language for data manipulation in XML documents. It is a much simpler alternative to XSLT and XQuery. Read this blog post for a more detailed explanation: Xembly, an Assembly for XML.




Schema

  • https://en.wikipedia.org/wiki/XML_schema - a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints. There are languages developed specifically to express XML schemas. The document type definition (DTD) language, which is native to the XML specification, is a schema language that is of relatively limited capability, but that also has other uses in XML aside from the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S) and RELAX NG. The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means.


  • https://en.wikipedia.org/wiki/XML_Schema_(W3C) - a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document. They can check if it adheres to the description of the element it is placed in. Like all XML schema languages, XSD can be used to express a set of rules to which an XML document must conform in order to be considered "valid" according to that schema. However, unlike most other schema languages, XSD was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software.



  • https://en.wikipedia.org/wiki/RELAX_NG - REgular LAnguage for XML Next Generation, is a schema language for XML: a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document but RELAX NG also offers a popular compact, non-XML syntax. Compared to other XML schema languages RELAX NG is considered relatively simple. It was defined by a committee specification of the OASIS RELAX NG technical committee in 2001 and 2002, based on Murata Makoto's RELAX and James Clark's TREX, and also by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL). ISO/IEC 19757-2 was developed by ISO/IEC JTC1/SC34 and published in its first version in 2003.

XSL


HTML / CSS

See HTML/CSS

DocBook

  • https://en.wikipedia.org/wiki/DocBook - a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software but it can be used for any other sort of documentation. As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, Web help and HTML Help, without requiring users to make any changes to the source. In other words, when a document is written in DocBook format it becomes easily portable into other formats. It solves the problem of reformatting by writing it once using XML tags.


Markdown

See Markdown

AsciiDoc

  • AsciiDoc Home Page - a text document format for writing notes, documentation, articles, books, ebooks, slideshows, web pages, man pages and blogs. AsciiDoc files can be translated to many formats including HTML, PDF, EPUB, man page. AsciiDoc is highly configurable: both the AsciiDoc source file syntax and the backend output markups (which can be almost any type of SGML/XML markup) can be customized and extended by the user. AsciiDoc is free software and is licenced under the terms of the GNU General Public License version 2 (GPLv2).



PDF


  • https://github.com/LibrePDF/OpenPDF - a free Java library for creating and editing PDF files with an LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.






  • flpsed - a WYSIWYG PostScript annotator. You can't remove or modify existing elements of a document, but flpsed lets you add arbitrary text lines to existing PostScript documents (PostScript is a registered trademark of Adobe Systems Incorporated). Added lines can later be re-edited with flpsed. Using pdftops, which is part of xpdf, one can convert PDF documents to PostScript and also add text to them. flpsed is useful for filling in forms, adding notes etc. GsWidget is now part of flpsed.

flpsed is released under the GPL.


  • https://sourceforge.net/projects/pdfshuffler - a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.


  • https://github.com/2mol/pboy - a small .pdf management utility. It is born out of the frustration of having a download folder full of PDFs with names like 'I08.pdf', '1412.4880.pdf' and so on. Since I want to save some of these files for later reading, it would be helpful to have more descriptive filenames. This tool helps with renaming those files. It will rename/move documents to a specified folder, and it even gives some filename suggestions by looking at the file content and the pdf metadata.



Formats

See Data, Images, Audio#Audio formats


  • https://en.wikipedia.org/wiki/Digital_container_format - or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file. Among the earliest cross-platform container formats were Distinguished Encoding Rules and the 1985 Interchange File Format. Containers are frequently used in multimedia applications.


  • Best File Formats for Archiving - This guide compares common file formats for the purpose of digital archiving and preservation. It also discusses how to choose a resolution for images, and how to choose a sampling rate and a bit rate for MP3 audio files. Disclaimer: All of the below is provided as my personal opinion only, without guarantee for completeness or correctness. The current version of this text is 2018-01-13.
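
To make the container idea from the Digital container format entry concrete, here is a minimal sketch of reading chunks in the style of the 1985 Interchange File Format. It assumes the usual IFF chunk layout: a 4-byte ASCII identifier, a 32-bit big-endian length, then the payload padded to an even number of bytes. Nested group chunks such as FORM are not handled, and `make_chunk` exists only to build test data.

```python
import struct

def iter_iff_chunks(data):
    """Yield (chunk_id, payload) from a flat run of IFF-style chunks."""
    offset = 0
    while offset + 8 <= len(data):
        # Header: 4-byte identifier followed by a big-endian 32-bit length.
        chunk_id, size = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        yield chunk_id.decode("ascii"), data[start:start + size]
        # Payloads are padded so the next chunk starts on an even offset.
        offset = start + size + (size & 1)

def make_chunk(chunk_id, payload):
    """Helper for this example only: serialise one chunk with padding."""
    pad = b"\x00" if len(payload) % 2 else b""
    header = struct.pack(">4sI", chunk_id.encode("ascii"), len(payload))
    return header + payload + pad

blob = make_chunk("NAME", b"demo!") + make_chunk("BODY", b"\x01\x02")
chunks = list(iter_iff_chunks(blob))
```

The reader never needs to understand a payload to skip it, which is what lets different kinds of data and metadata coexist in one file.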



Readers

Okular

  • Okular - a universal document viewer developed by KDE. Okular works on multiple platforms, including but not limited to Linux, Windows, Mac OS X, *BSD, etc. The last stable release is Okular 1.3, shipped as part of the KDE Applications 17.12 release. Okular combines excellent functionality with the versatility of supporting different kinds of documents, like PDF, Postscript, DjVu, CHM, XPS, ePub and others. The document format handlers page has a chart describing in more detail the supported formats and the features supported in each of them.

MuPDF

  • MuPDF - a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well). The command line tools allow you to annotate, edit, and convert documents to other formats such as HTML, SVG, PDF, and CBZ. You can also write scripts to manipulate documents using Javascript. The library is written modularly in portable C, so features can be added and removed by integrators if they so desire. We also have a Java library using JNI that works on both Oracle's Java and Android.

XpdfReader

  • XpdfReader - a free PDF viewer and toolkit, including a text extractor, image converter, HTML converter, and more. Most of the tools are available as open source.

Librera Reader

  • Librera Reader - All-Format eBook Reader for Android PDF - EPUB - MOBI - DJVU - FB2 - TXT - RTF - AZW - AZW3 - HTML - CBZ - CBR - XPS - MHT

Tools

Pandoc

  • Pandoc - a universal document converter. If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to:
  • HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
  • Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
  • Ebooks: EPUB version 2 or 3, FictionBook2
  • Documentation formats: DocBook, TEI Simple, GNU TexInfo, Groff man pages, Haddock markup
  • Page layout formats: InDesign ICML
  • Outline formats: OPML
  • TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
  • PDF via LaTeX
  • Lightweight markup formats: Markdown (including CommonMark), reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile
  • Custom formats: custom writers can be written in lua.
  • Pandoc understands a number of useful markdown syntax extensions, including document metadata (title, author, date); footnotes; tables; definition lists; superscript and subscript; strikeout; enhanced ordered lists (start number and numbering style are significant); running example lists; delimited code blocks with syntax highlighting; smart quotes, dashes, and ellipses; markdown inside HTML blocks; and inline LaTeX. If strict markdown compatibility is desired, all of these extensions can be turned off.
pandoc -s -r html http://www.gnu.org/software/make/ -o example12.md
  # Converting a web page to markdown

pandoc MANUAL.txt --pdf-engine=xelatex -o example13.pdf
  # From markdown to PDF. In pandoc 2.0 and later the option is --pdf-engine; older releases used --latex-engine.
  # Arch Linux package requirements: texlive-core texlive-latexextra
  # bug: does not work with .md that has too many nested headings!
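
For scripted conversions, the two invocations above can be assembled programmatically. A minimal sketch: `pandoc_command` is a hypothetical helper that only builds the argument list and runs nothing, and it uses `--pdf-engine`, the option name in pandoc 2.0 and later (older releases used `--latex-engine`).

```python
import shlex

def pandoc_command(source, output, reader=None, standalone=False, pdf_engine=None):
    """Build a pandoc argument list; nothing is executed here."""
    args = ["pandoc"]
    if standalone:
        args.append("-s")            # produce a standalone document
    if reader:
        args += ["-r", reader]       # input format
    args.append(source)
    if pdf_engine:
        args.append(f"--pdf-engine={pdf_engine}")
    args += ["-o", output]
    return args

web_to_md = pandoc_command("http://www.gnu.org/software/make/", "example12.md",
                           reader="html", standalone=True)
md_to_pdf = pandoc_command("MANUAL.txt", "example13.pdf", pdf_engine="xelatex")
print(shlex.join(web_to_md))
```

With pandoc installed, either list can be handed to `subprocess.run(args, check=True)`.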




  • Pandoc Scholar - Create beautiful, semantically enriched articles with pandoc. This package provides utilities to make publishing of scientific articles as simple and pleasant as possible. It simplifies setting authors' metadata in YAML blocks, allows adding semantic annotation to citations, and only requires the programs pandoc and make.

Docverter

  • Docverter - Convert plain text documents written in HTML, Markdown, or LaTeX to PDF, Docx, RTF or ePub with a simple HTTP API. It wraps the following open-source software in a JRuby app: Pandoc for plain text to HTML and ePub conversion, Flying Saucer for HTML to PDF, Calibre for ePub to MOBI conversion

Docutils

  • Docutils - an open-source text processing system for processing plaintext documentation into useful formats, such as HTML, LaTeX, man-pages, open-document or XML. It includes reStructuredText, the easy to read, easy to use, what-you-see-is-what-you-get plaintext markup language.

Asciidoctor

  • Asciidoctor - a fast, open source text processor and publishing toolchain for converting AsciiDoc content to HTML5, DocBook, PDF, and other formats. Asciidoctor is written in Ruby and runs on all major operating systems. To simplify installation, Asciidoctor is packaged and distributed as a gem to RubyGems.org and is packaged for popular Linux distributions and macOS. Asciidoctor can also be run in a JVM using AsciidoctorJ or in any JavaScript environment using Asciidoctor.js. The Asciidoctor project is hosted on GitHub.


Evince

  • Evince - a document viewer for multiple document formats. The goal of evince is to replace the multiple document viewers that exist on the GNOME Desktop with a single simple application. Evince is specifically designed to support the following file formats: PDF, Postscript, djvu, tiff, dvi, XPS, SyncTex support with gedit, comic books (cbr, cbz, cb7 and cbt).


Bookdown


Text formatting





LaTeX










  • https://github.com/ekiim/vim-mathpix - uses scrot, curl, and jq to submit a POST request to the Mathpix API in order to convert the selected image into LaTeX or regular text, depending on the argument.


  • LyX - a document processor that encourages an approach to writing based on the structure of your documents (WYSIWYM) and not simply their appearance (WYSIWYG).






  • Detexify - LaTeX handwritten symbol recognition. Anyone who works with LaTeX knows how time-consuming it can be to find a symbol in symbols-a4.pdf that you just can't memorize. Detexify is an attempt to simplify this search.

DTP / Office



LibreOffice / OpenOffice

Writer

Calc

OO

  • WollMux is an OpenOffice.org plugin with enhanced template, form, and autotext functionality. It can construct templates on the fly from multiple files (e.g. letterhead, footer, and body text) and will fill in personal and organizational data from various databases such as LDAP. An extra form GUI presents fields in an easily navigable manner and offers plausibility checks and computed values to ease filling in the form. Chainable printing functions allow various transformations during print and custom dialogs.

AbiWord


Xi

Monaco

WPS

  • WPS Office - The Most Compatible Free Office Suite


Other

See also WebDev#Authoring




Manuskript

Screenwriting

Fountain

  • Fountain - a simple markup syntax for writing, editing and sharing screenplays in plain, human-readable text. Fountain allows you to work on your screenplay anywhere, on any computer or tablet, using any software that edits text files. Taking its cues from John Gruber’s Markdown, Fountain files are eminently readable. When special syntax is required, it is straightforward and intuitive. Even when viewed as plain text, your screenplay feels like a screenplay. Fountain supports everything a screenwriter is likely to need in the early, creative phases of writing. Not included are production features such as MOREs, CONTINUEDs, revision marks, locked pages, or colored pages. Because it’s just text, Fountain is also a great format for archiving screenplays without worry of file-format obsolescence or incompatibility.
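
A rough sketch of how plain-text screenplay lines can be told apart. It assumes a simplified reading of the Fountain syntax: scene headings start with INT, EXT, EST or I/E after a blank line, and a character cue is an all-uppercase line with a blank line before it and dialogue directly after. The real syntax has more rules (transitions, forced elements, notes) that this `classify` helper, which is hypothetical, ignores.

```python
import re

# Simplified scene-heading prefixes; the full syntax allows more variants.
SCENE = re.compile(r"^(INT|EXT|EST|I/E)[./ ]", re.IGNORECASE)

def classify(lines):
    """Label each non-blank line as scene_heading, character,
    parenthetical, dialogue or action."""
    labelled = []
    previous_blank = True
    in_dialogue = False
    for i, line in enumerate(lines):
        text = line.strip()
        if not text:
            previous_blank, in_dialogue = True, False
            continue
        next_blank = i + 1 >= len(lines) or not lines[i + 1].strip()
        is_upper = text == text.upper() and any(c.isalpha() for c in text)
        if previous_blank and SCENE.match(text):
            kind = "scene_heading"
        elif previous_blank and not next_blank and is_upper:
            kind, in_dialogue = "character", True
        elif in_dialogue and text.startswith("("):
            kind = "parenthetical"
        elif in_dialogue:
            kind = "dialogue"
        else:
            kind = "action"
        labelled.append((kind, text))
        previous_blank = False
    return labelled

script = """INT. KITCHEN - DAY

Steam rises from a kettle.

ANNA
Tea?

BEN
(yawning)
Please.
""".splitlines()
elements = classify(script)
```

Because the structure is carried by blank lines and capitalisation alone, the same file stays readable in any text editor.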

Celtx

Trelby

  • Trelby - simple, fast and elegantly laid out to make screenwriting simple. It is infinitely configurable. Trelby is free software that you can contribute to. Features:
  • Screenplay editor: Enforces correct script format and pagination, auto-completion, and spell checking.
  • Multiplatform: Behaves identically on all platforms, generating the exact same output.
  • Choice of view: Multiple views, including draft view, WYSIWYG mode, and fullscreen to suit your writing style.
  • Name database: Character name database containing over 200,000 names from various countries.
  • Reporting: Scene/location/character/dialogue reports.
  • Compare: Ability to compare scripts, so you know what changed between versions.
  • Import: Screenplay formatted text, Final Draft XML (.fdx), Celtx (.celtx), Fountain (.fountain), Adobe Story (.astx) and Fade In Pro (.fadein).
  • Export: PDF, formatted text, HTML, RTF, Final Draft XML (.fdx) and Fountain (.fountain).
  • PDF: Built-in, highly configurable PDF generator. Supports embedding your chosen font. Also supports generating PDFs with custom watermarks, to help track shared files.
  • Free software: Licensed under the GPL, Trelby welcomes developers and screenwriters to contribute in making it more useful.


Storyboarding

Storyboard Fountain

Storyboarder

Technical documentation

Antora


Templates

Business letters

Presentations

Software





  • GitPitch - The Markdown Presentation Service on Git: Markdown → Git → Slideshow. For everyone on GitHub, GitLab, and Bitbucket, using the tools you already know and love: Markdown + Git.

Spice-up

  • https://github.com/Philip-Scott/Spice-up - Create presentations that stand out! Spice-Up has everything you need to create simple and beautiful presentations. Get your ideas across with beautiful designed templates, or start from scratch with a blank canvas. Either way, you will add some spice to your presentations with a wide variety of background patterns and a beautiful color palette.

reveal.js


  • Asciidoctor Reveal.js - a converter for Asciidoctor and Asciidoctor.js that transforms an AsciiDoc document into an HTML5 presentation designed to be executed by the reveal.js presentation framework.


reveal-md slides.md --static _site


remark


WebSlides


Marp

  • Marp - Markdown Presentation Writer

pdfpc

  • pdfpc - a GTK-based presentation viewer which uses Keynote-like multi-monitor output to provide meta information to the speaker during the presentation. It is able to show a normal presentation window on one screen while showing a more sophisticated overview on the other one, providing information like a picture of the next slide, as well as the time left in the presentation. pdfpc processes PDF documents, which can be created using nearly all modern presentation software.

Articles


Powerpoint karaoke

  • https://en.wikipedia.org/wiki/Powerpoint_Karaoke - also known as Battledecks or Battle Decks, is an improvisational activity in which a participant must deliver a presentation based on a set of slides that they have never seen before. Its name is derived from Microsoft PowerPoint, a popular presentation software, and karaoke, an activity in which a performer sings along with a pre-recorded backing track (although there is usually no music or singing involved in PowerPoint Karaoke). The effect is intended to be comical, and PowerPoint Karaoke can be considered a form of improvisational theatre, or a type of Theatresports game. The presentation can either be a real slideshow on an arcane topic, or a set of real slides from different presentations that are nonsensical when assembled together, or slides that are nonsensical on their own (in some cases created by randomly downloading images from the internet and adding unrelated text). In some cases, the presenter is given a theme beforehand that they must attempt to tie all the slides into.



Contacts


Collaborative documentation

See Wiki




Infinote

  • Infinote protocol provides real-time collaborative editing of documents with the main focus being on collaborative plain text editing. In the meantime quite a few solutions have appeared, but each of them implements a different protocol and thus cannot be used with other tools. Our goal is to provide a flexible yet powerful open framework and clients for various environments that can interoperate with each other.

Software

  • Gobby is a free collaborative editor supporting multiple documents in one session and a multi-user chat. It runs on Microsoft Windows, Mac OS X, Linux and other Unix-like platforms.

to sort




Kolab

Collabora

  • Collabora - LibreOffice online office suite that supports all major document, spreadsheet and presentation file formats, which you can integrate in your own infrastructure. Key features are collaborative editing and excellent office file format support.


Notepads

Often zen-like.




  • https://authorea.com - a collaborative platform for research. Write and manage your technical documents in one place. Web native, uses Git. Write in LaTeX, Markdown, HTML, JavaScript, and more.








to sort




CodiMD

  • https://github.com/hackmdio/codimd - lets you collaborate in real-time with markdown. Built on HackMD source code, CodiMD lets you host and control your team's content with speed and ease.

CodeSandbox Live

Spreadsheets








Gnumeric


to sort

  • https://github.com/federico-terzi/espanso - Cross-platform Text Expander written in Rust. A text expander is a program that detects when you type a specific keyword and replaces it with something else.
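
The core idea of a text expander can be sketched in a few lines of Python. This works on a finished string rather than on live keystrokes, and the triggers and the `expand` helper are hypothetical; it is not how espanso is implemented or configured.

```python
import re

def expand(text, snippets):
    """Replace each trigger keyword found in text with its expansion."""
    if not snippets:
        return text
    # Try longer triggers first so ":sig2" would not be shadowed by ":sig".
    triggers = sorted(snippets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in triggers))
    return pattern.sub(lambda match: snippets[match.group(0)], text)

# Hypothetical trigger table.
snippets = {":email": "someone@example.org", ":sig": "Best regards"}
result = expand("Write to :email. :sig", snippets)
```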