Documents

From Things and Stuff Wiki
(Redirected from Office)


General

See also Data, Typography, Media, Editors, Web systems#Document management, Wiki, Organisation#Knowledge, Net/web media, Images, Audio#Audio formats


to reorder


  • https://en.wikipedia.org/wiki/Structured_document - an electronic document where some method of markup is used to identify the whole and parts of the document as having various meanings beyond their formatting. For example, a structured document might identify a certain portion as a "chapter title" (or "code sample" or "quatrain") rather than as "Helvetica bold 24" or "indented Courier". Such portions in general are commonly called "components" or "elements" of a document.


  • https://en.wikipedia.org/wiki/Document_retrieval - defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual. User queries can range from multi-sentence full descriptions of an information need to a few words. Document retrieval is sometimes referred to as, or as a branch of, text retrieval. Text retrieval is a branch of information retrieval where the information is stored primarily in the form of text. Text databases became decentralized thanks to the personal computer. Text retrieval is a critical area of study today, since it is the fundamental basis of all internet search engines.



  • https://en.wikipedia.org/wiki/Document_file_format - a text or binary file format for storing documents on a storage medium, especially for use by computers. There currently exist a multitude of incompatible document file formats. Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008). In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed. Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000. HTML is the most widely used open international standard, and it is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000). The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.


  • https://en.wikipedia.org/wiki/Open_file_format - a file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. An open file format is licensed with an open license. For example, an open format can be implemented by both proprietary and free and open-source software, using the typical software licenses used by each. In contrast to open file formats, closed file formats are considered trade secrets. Depending on the definition, the specification of an open format may require a fee to access or, very rarely, contain other restrictions. The range of meanings is similar to that of the term open standard.


ODF

  • https://en.wikipedia.org/wiki/OpenDocument - the Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications. The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (OASIS) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for OpenOffice.org and LibreOffice. It was originally developed for StarOffice "to provide an open standard for office documents."

By design, OpenDocument reuses existing open XML standards whenever they are available, and it creates new tags only where no existing standard can provide the needed functionality. Thus OpenDocument uses a subset of DublinCore for metadata, MathML for displayed formulas, SMIL for multimedia, XLink for hyperlinks etc. Although not fully reusing SVG for vector graphics, OpenDocument does use SVG-compatible vector graphics within an ODF-format-specific namespace, but also includes non-SVG graphics.
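As a rough illustration of the ZIP-plus-XML packaging described above, the following Python sketch builds a tiny ODF-like package in memory and reads the paragraph text back out of content.xml. It is deliberately incomplete: a real OpenDocument file also carries a manifest, styles and metadata, and the paragraph text here is invented for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by OpenDocument text content.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Minimal, made-up content.xml for illustration only.
CONTENT = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:p>Hello, OpenDocument</text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)

def build_package() -> bytes:
    """Write an ODF-like ZIP archive into memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and left uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", CONTENT,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

def read_paragraphs(package: bytes) -> list:
    """Unzip content.xml and collect the text of every text:p element."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iterfind(".//text:p", NS)]

paragraphs = read_paragraphs(build_package())
print(paragraphs)
```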



Markup

See also Data, Semantic


  • https://en.wikipedia.org/wiki/Markup_language - a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document or to enrich its content to facilitate automated processing. A markup language is a set of rules governing what markup information may be included in a document and how it is combined with the content of the document in a way that facilitates use by humans and computer programs. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red pen or blue pencil on authors' manuscripts.

Older markup languages, which typically focus on typography and presentation, include Troff, TeX, and LaTeX. Scribe and most modern markup languages, such as XML, identify document components (for example headings, paragraphs, and tables), with the expectation that technology, such as stylesheets, will be used to apply formatting or other processing. Some markup languages, such as the widely used HTML, have pre-defined presentation semantics, meaning that their specifications prescribe some aspects of how to present the structured data on particular media. HTML, like DocBook, Open eBook, JATS, and many others, is based on the markup meta-languages SGML and XML. That is, SGML and XML allow designers to specify particular schemas, which determine which elements, attributes, and other features are permitted, and where.

One extremely important characteristic of most markup languages is that they allow intermingling markup with document content such as text and pictures. For example, if a few words in a sentence need to be emphasized, or identified as a proper name, defined term, or another special item, the markup may be inserted between the characters of the sentence. This is quite different structurally from traditional databases, where it is by definition impossible to have data that is within a record but not within any field. Furthermore, markup for human-readable texts must maintain order: it would not suffice to make each paragraph of a book into a "paragraph" record, where those records do not maintain order.
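The intermingling of markup and content described above can be seen in a small Python sketch using the standard library's ElementTree. The sentence and the element names (name, date) are made up for the example; the point is only that text sits both inside and between the child elements, and that order is preserved.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<p>The ship <name>Beagle</name> sailed in <date>1831</date>.</p>"
)

# Text before the first child is stored on the parent (.text); text that
# follows each child is stored on that child (.tail). Walking the children
# in order therefore reproduces the original sequence.
pieces = [("text", para.text)]
for child in para:
    pieces.append((child.tag, child.text))
    pieces.append(("text", child.tail))

# Dropping the markup gives back the plain sentence.
plain = "".join(para.itertext())
print(pieces)
print(plain)
```

A flat record with one field per element could hold "Beagle" and "1831", but it would have nowhere to put "The ship " or " sailed in ", which is the structural difference the paragraph above points at.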



  • https://en.wikipedia.org/wiki/Overlapping_markup - In markup languages and the digital humanities, overlap occurs when a document has two or more structures that interact in a non-hierarchical manner. A document with overlapping markup cannot be represented as a tree. This is also known as concurrent markup. Overlap happens, for instance, in poetry, where there may be a metrical structure of feet and lines; a linguistic structure of sentences and quotations; and a physical structure of volumes and pages and editorial annotations.
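One way to picture overlap is to record each structure as character ranges over the same text and check whether two ranges cross without one containing the other. The Python sketch below does that; the offsets are hypothetical and the helper name crosses is invented for the example.

```python
def crosses(a, b):
    """Return True if two (start, end) spans overlap without nesting."""
    (a0, a1), (b0, b1) = sorted([a, b])
    return a0 < b0 < a1 < b1

# Hypothetical character offsets into one stanza of verse.
line_1 = (0, 40)      # first metrical line
line_2 = (40, 80)     # second metrical line
sentence = (25, 60)   # a sentence running across the line break
foot = (0, 8)         # a metrical foot inside line 1

overlaps = {
    "line_1/sentence": crosses(line_1, sentence),
    "line_2/sentence": crosses(line_2, sentence),
    "line_1/foot": crosses(line_1, foot),
    "line_1/line_2": crosses(line_1, line_2),
}
print(overlaps)
```

Any pair that crosses cannot both be expressed as properly nested elements in a single tree, which is why such documents need stand-off annotation, milestones, or some other workaround.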



  • https://en.wikipedia.org/wiki/Presentation_semantics - specify how a particular piece of a formal language is represented in a distinguished manner accessible to human senses, usually human vision. For example, saying that a pair of tags such as HTML's <b> ... </b> must render the text between them using some bold typeface is a specification of presentation semantics for that syntax. Many markup languages, including HTML, DSSSL, and XSL-FO, have presentation semantics, but others, such as XML, do not. Character encoding standards, such as Unicode, also have presentation semantics. One of the main goals of style sheet languages is to separate the syntax that defines document content from the syntax endowed with presentation semantics. This is the norm on the World Wide Web, where the Cascading Style Sheets language provides a large collection of presentation semantics for HTML documents.



  • https://en.wikipedia.org/wiki/Lightweight_markup_language - LML, also termed a simple or humane markup language, is a markup language with simple, unobtrusive syntax. It is designed to be easy to write using any generic text editor and easy to read in its raw form. Lightweight markup languages are used in applications where it may be necessary to read the raw document as well as the final rendered output. For instance, a person downloading a software library might prefer to read the documentation in a text editor rather than a web browser. Another application for such languages is to provide for data entry in web-based publishing, such as blogs and wikis, where the input interface is a simple text box. The server software then converts the input into a common document markup language like HTML.
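The server-side conversion step mentioned above might look roughly like the following Python sketch. The input syntax is a made-up miniature (a "# " prefix for a heading, asterisks for emphasis, blank lines between blocks) and does not implement Markdown, AsciiDoc or any other real language.

```python
import html
import re

def to_html(source: str) -> str:
    """Convert a toy lightweight syntax into HTML blocks."""
    blocks = []
    for block in source.strip().split("\n\n"):
        # Join wrapped lines, then escape anything that looks like HTML.
        text = html.escape(" ".join(block.split()))
        # *word* becomes emphasis (hypothetical rule for this sketch).
        text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
        if text.startswith("# "):
            blocks.append(f"<h1>{text[2:]}</h1>")
        else:
            blocks.append(f"<p>{text}</p>")
    return "\n".join(blocks)

rendered = to_html("# Notes\n\nPlain text with *emphasis*\nand a <tag>.")
print(rendered)
```

Escaping before applying the markup rules is a deliberate choice here: text typed into a web form is untrusted, so raw angle brackets should not reach the page unchanged.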





COCOA

  • https://en.wikipedia.org/wiki/COCOA_(digital_humanities) - was an early text file utility and associated file format for digital humanities, then known as humanities computing. It was approximately 4000 punched cards of FORTRAN and created in the late 1960s and early 1970s at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Functionality included word-counting and concordance building.

The COCOA file format bears at least a passing similarity to later markup languages such as SGML and XML. A noticeable difference from its successors is that COCOA tags are flat and not tree structured. In that format, every information type and value encoded by a tag should be considered true until the same tag changes its value. Members of the Text Encoding Initiative community maintain legacy support for COCOA, although most in-demand texts and corpora have already been migrated to more widely understood formats such as TEI XML.
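The "true until the same tag changes its value" behaviour can be sketched in Python as a running dictionary of tag values. The tag syntax below (a category letter followed by a value inside angle brackets) is a simplification, and the sample text and the function name annotate are invented for the example.

```python
import re

# Simplified COCOA-style tag: <CATEGORY value>
TAG = re.compile(r"<(\w+)\s+([^>]*)>")

def annotate(lines):
    """Pair each text line with the tag values in force at that point."""
    state = {}
    out = []
    for line in lines:
        # A tag overwrites the previous value for its category; there is
        # no closing tag and no nesting.
        for key, value in TAG.findall(line):
            state[key] = value.strip()
        text = TAG.sub("", line).strip()
        if text:
            out.append((dict(state), text))
    return out

sample = [
    "<A Shakespeare> <T Hamlet>",
    "To be, or not to be",
    "<T Macbeth>",
    "Out, damned spot",
]
result = annotate(sample)
print(result)
```

Note that the author value set on the first line still applies to the last line, because nothing ever reset it.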


GML

  • https://en.wikipedia.org/wiki/IBM_Generalized_Markup_Language - GML, 1969, is a set of macros that implement intent-based (procedural) markup tags for the IBM text formatter, SCRIPT. SCRIPT/VS is the main component of IBM's Document Composition Facility (DCF). A starter set of tags in GML is provided with the DCF product.

Scribe

CBCL

  • The Common Business Communication Language (7-May-1999) - The Common Business Communication Language, published in 1982, proposes a language for inter-business inter-computer communication. The proposal was considerably more advanced than X12, which came along some years later.
  • https://en.wikipedia.org/wiki/Common_Business_Communication_Language - (CBCL) is a communications language proposed by John McCarthy that foreshadowed much of XML. The language consists of a basic framework of hierarchical markup derived from S-expressions, coupled with some general principles about use and extensibility. Although written in 1975, the proposal was not published until 1982, and to this day remains relatively obscure.


SGML

  • https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language - SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s. Goldfarb, editor of the international standard, coined the “GML” term using their surname initials. Goldfarb also wrote the definitive work on SGML syntax in "The SGML Handbook". The syntax of SGML is closer to the COCOA format. As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry. Many such documents must remain readable for several decades—a long time in the information technology field. SGML also was extensively applied by the military, and the aerospace, technical reference, and industrial publishing industries. The advent of the XML profile has made SGML suitable for widespread application for small-scale, general-purpose use.


  • https://en.wikipedia.org/wiki/SGML_entity - In the Standard Generalized Markup Language (SGML), an entity is a primitive data type, which associates a string with either a unique alias (such as a user-specified name) or an SGML reserved word (such as #DEFAULT). Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plain text, SGML tags, and/or references to previously defined entities. Certain entity types may also invoke external documents. Entities are called by reference.


  • https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references - In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series of characters called a character reference, of which there are two types: a numeric character reference and a character entity reference. This article lists the character entity references that are valid in HTML and XML documents.
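The two kinds of character reference can be tried out with Python's standard html module, which decodes text according to HTML rules. This is only a quick demonstration; the sample strings are chosen for the example.

```python
import html

samples = [
    "&#233;",      # numeric character reference, decimal
    "&#xE9;",      # numeric character reference, hexadecimal
    "&eacute;",    # character entity reference (named)
    "&amp;lt;",    # an escaped ampersand followed by "lt;"
]

# html.unescape performs a single decoding pass over each string.
decoded = [html.unescape(s) for s in samples]
print(decoded)
```

The first three all decode to the same character. XML itself predefines only five named entities (lt, gt, amp, apos, quot), so a name like eacute is available in an XML document only if a DTD declares it, whereas the numeric forms always work.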



  • https://en.wikipedia.org/wiki/Formal_Public_Identifier - FPI, is a short piece of text with a particular structure that may be used to uniquely identify a product, specification or document. FPIs were introduced as part of Standard Generalized Markup Language (SGML), and serve particular purposes in formats historically derived from SGML (HTML and XML). Some of their most common uses are as part of document type declarations (DOCTYPEs) and document type definitions (DTDs) in SGML, XML and historically HTML, but they are also used in the vCard and iCalendar file formats to identify the software product which generated the file. More recently, Uniform Resource Identifiers (URIs) and universally unique identifiers (UUIDs) are usually used to uniquely identify objects. FPIs have become a legacy system.
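The "particular structure" of an FPI is a set of fields separated by double slashes. The Python sketch below splits the common four-field form into its parts; the class and function names are invented for the example, and the sketch does not handle other forms (for instance ISO-owned identifiers, which omit the leading registration field).

```python
from typing import NamedTuple

class FPI(NamedTuple):
    registration: str   # "+" registered, "-" unregistered
    owner: str          # organization responsible for the object
    public_class: str   # kind of object, e.g. DTD or ENTITIES
    description: str    # free-text description of the object
    language: str       # language code of the object

def parse_fpi(fpi: str) -> FPI:
    """Split a four-field FPI; raises ValueError for other shapes."""
    registration, owner, text, language = fpi.split("//")
    public_class, _, description = text.partition(" ")
    return FPI(registration, owner, public_class, description, language)

parsed = parse_fpi("-//W3C//DTD HTML 4.01//EN")
print(parsed)
```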


  • https://en.wikipedia.org/wiki/Document_Style_Semantics_and_Specification_Language - an international standard developed to provide stylesheets for SGML documents. DSSSL consists of two parts: a tree transformation process that can be used to manipulate the tree structure of documents prior to presentation, and a formatting process that associates the elements in the source document with specific nodes in the target representation, the flow object tree. DSSSL specifications are device-independent pieces of information that can be interchanged between different platforms. DSSSL does not standardize the back-end formatters that generate the language's output. Such formatters may render the output for on-screen display, or write it to a computer file in a specific format (such as PostScript or Rich Text Format). DSSSL is based on a subset of the Scheme programming language.


  • https://en.wikipedia.org/wiki/Document_type_declaration - or DOCTYPE, is an instruction that associates a particular XML or SGML document (for example, a web page) with a document type definition (DTD) (for example, the formal definition of a particular version of HTML 2.0 - 4.0). In the serialized form of the document, it manifests as a short string of markup that conforms to a particular syntax.

The HTML layout engines in modern web browsers perform DOCTYPE "sniffing" or "switching", wherein the DOCTYPE in a document served as text/html determines a layout mode, such as "quirks mode" or "standards mode". The text/html serialization of HTML5, which is not SGML-based, uses the DOCTYPE only for mode selection. Since web browsers are implemented with special-purpose HTML parsers, rather than general-purpose DTD-based parsers, they do not use DTDs and never access them even if a URL is provided. The DOCTYPE is retained in HTML5 as a "mostly useless, but required" header only to trigger "standards mode" in common browsers.


  • https://en.wikipedia.org/wiki/Document_type_definition - a set of markup declarations that define a document type for an SGML-family markup language (SGML, XML, HTML). A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.

XML uses a subset of SGML DTD.

As of 2009, newer XML namespace-aware schema languages (such as W3C XML Schema and ISO RELAX NG) have largely superseded DTDs. A namespace-aware version of DTDs is being developed as Part 9 of ISO DSDL. DTDs persist in applications that need special publishing characters, such as the XML and HTML Character Entity References, which derive from larger sets defined as part of the ISO SGML standard effort.
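A DTD declared inline (in the internal subset of the DOCTYPE) can be seen at work with Python's standard ElementTree parser. The document below is made up for the example. ElementTree is a non-validating parser, so the ELEMENT declarations are read but not enforced; the ENTITY declaration, however, is expanded when the document is parsed.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ENTITY sig "Things and Stuff Wiki">
]>
<note><to>Reader</to><body>Regards, &sig;</body></note>"""

root = ET.fromstring(DOC)

# The parser replaces &sig; with the string declared in the DTD.
body = root.find("body").text
print(body)
```

Because entity expansion happens automatically, documents from untrusted sources should be parsed with care; checking structure against the declared content model would need a validating parser, which the standard library does not provide.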


Processing instructions are exposed in the Document Object Model as Node.PROCESSING_INSTRUCTION_NODE, and they can be selected in XPath and XQuery with the processing-instruction() node test.
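The DOM node type mentioned above can be inspected with Python's standard xml.dom.minidom. The stylesheet reference in this sketch (style.xsl) is a placeholder, not a real file.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?><root/>'
)

# Processing instructions that precede the root element are children of
# the document node itself. Each one has a target and a data string.
pis = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(pis)
```

The data part is an opaque string as far as the DOM is concerned; the attribute-like syntax inside it is a convention of the xml-stylesheet instruction and has to be parsed separately if the individual values are needed.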






  • dtinfo (user command) manual page - starts the desktop on-line information browser, also known as the CDE Information Manager. On-line information is typically packaged into an information library (infolib), which is a hierarchy of bookcases containing SGML books (see the dtinfogen(1) command). The browser offers an ability to view, search, and print on-line information with a high degree of control. Bookmarks and annotations may be attached at desired points for later recall.

XML

See also Semantic

  • https://en.wikipedia.org/wiki/XML - Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications (all of them free open standards) define XML. The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services. Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data.

The essence of why extensible markup languages are necessary is explained at Markup language (for example, see Markup language § XML) and at Standard Generalized Markup Language. Hundreds of document formats using XML syntax have been developed, including RSS, Atom, SOAP, SVG, and XHTML. XML-based formats have become the default for many office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org and LibreOffice (OpenDocument), and Apple's iWork. XML has also provided the base language for communication protocols such as XMPP. Applications for the Microsoft .NET Framework use XML files for configuration, and property lists are an implementation of configuration storage built on XML.



  • Document Content Description for XML - This document proposes a structural schema facility, Document Content Description (DCD), for specifying rules covering the structure and content of XML documents. The DCD proposal incorporates a subset of the XML-Data Submission [XML-Data] and expresses it in a way which is consistent with the ongoing W3C RDF (Resource Description Framework) [RDF] effort; in particular, DCD is an RDF vocabulary. DCD is intended to define document constraints in an XML syntax; these constraints may be used in the same fashion as traditional XML DTDs. DCD also provides additional properties, such as basic datatypes.


  • https://en.wikipedia.org/wiki/XML_Base - a World Wide Web Consortium recommended facility for defining base URIs, for resolving relative URIs, in parts of XML documents. XML Base recommendation was adopted on 2001-06-27.
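Resolution of relative URIs against xml:base can be sketched in Python with urljoin. The document, the link element and its href attribute are made up for the example, and the sketch assumes every base value ends in a slash so that nested bases accumulate as directory paths.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """<catalog xml:base="http://example.org/docs/">
  <section xml:base="guide/">
    <link href="intro.xml"/>
  </section>
  <link href="index.xml"/>
</catalog>"""

def resolve_links(elem, base=""):
    """Collect absolute URIs for every link, honouring nested xml:base."""
    # An element's own xml:base is itself resolved against the inherited one.
    base = urljoin(base, elem.get(XML_BASE, ""))
    found = []
    if elem.tag == "link":
        found.append(urljoin(base, elem.get("href")))
    for child in elem:
        found.extend(resolve_links(child, base))
    return found

links = resolve_links(ET.fromstring(DOC))
print(links)
```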


  • https://en.wikipedia.org/wiki/XML_catalog - XML documents typically refer to external entities, for example the public and/or system ID for the Document Type Definition. These external relationships are expressed using URIs, typically as URLs. However, absolute URLs only work when the network can reach them. Relying on remote resources makes XML processing susceptible to both planned and unplanned network downtime. Relative URLs are only useful in the context where they were initially created. For example, the URL "../../xml/dtd/docbookx.xml" will usually only be useful in very limited circumstances. One way to avoid these problems is to use an entity resolver (a standard part of SAX) or a URI Resolver (a standard part of JAXP). A resolver can examine the URIs of the resources being requested and determine how best to satisfy those requests. The XML catalog is a document describing a mapping between external entity references and locally cached equivalents.
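The mapping a catalog provides can be sketched in Python as two lookup tables built from a catalog file. The local file paths and the report.dtd entry are hypothetical, and the lookup order used here (system identifier first, then public identifier) is a simplification: real resolvers also honour settings such as the catalog's prefer attribute, rewrite rules and delegation.

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:oasis:names:tc:entity:xmlns:xml:catalog"}

CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.5//EN"
          uri="file:///usr/share/xml/docbook/docbookx.dtd"/>
  <system systemId="http://www.example.org/dtd/report.dtd"
          uri="file:///opt/dtd/report.dtd"/>
</catalog>"""

def load_catalog(text):
    """Build publicId and systemId lookup tables from catalog XML."""
    root = ET.fromstring(text)
    public = {e.get("publicId"): e.get("uri")
              for e in root.findall("c:public", NS)}
    system = {e.get("systemId"): e.get("uri")
              for e in root.findall("c:system", NS)}
    return public, system

def resolve(public_id, system_id, catalog):
    """Return a local URI if the catalog has one, else the original."""
    public, system = catalog
    if system_id in system:
        return system[system_id]
    if public_id in public:
        return public[public_id]
    return system_id

catalog = load_catalog(CATALOG)
docbook = resolve(
    "-//OASIS//DTD DocBook XML V4.5//EN",
    "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd",
    catalog,
)
unknown = resolve(None, "http://www.example.org/dtd/other.dtd", catalog)
print(docbook, unknown)
```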



XML Tree

  • https://en.wikipedia.org/wiki/XML_tree - XML documents have a hierarchical structure and can conceptually be interpreted as a tree structure, called an XML tree. XML documents must contain a root element (one that is the parent of all other elements). All elements in an XML document can contain sub elements, text and attributes. The tree represented by an XML document starts at the root element and branches to the lowest level of elements. Although there is no consensus on the terminology used on XML Trees, at least two standard terminologies have been released by the W3C: The terminology used in the XPath Data Model; The terminology used in the XML Information Set.


  • https://en.wikipedia.org/wiki/XML_Information_Set - a W3C specification describing an abstract data model of an XML document in terms of a set of information items. The definitions in the XML Information Set specification are meant to be used in other specifications that need to refer to the information in a well-formed XML document. An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.


  • https://en.wikipedia.org/wiki/XPath - XML Path Language, is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) in 1999, and can be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document. Support for XPath exists in applications that support XML, such as web browsers, and many programming languages.

The XPath language is based on a tree representation of the XML document, and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though not in the official specification), an XPath expression is often referred to simply as "an XPath". Originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSLT, subsets of the XPath query language are used in other W3C specifications such as XML Schema, XForms and the Internationalization Tag Set (ITS). XPath has been adopted by a number of XML processing libraries and tools, many of which also offer CSS Selectors, another W3C standard, as a simpler alternative to XPath.
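Selecting nodes from the tree by path can be tried with Python's standard ElementTree, which implements only a limited subset of XPath (no functions or general expressions). The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book year="1998"><title>XML Basics</title></book>
  <book year="2003"><title>Pointing at XML</title></book>
  <journal year="2003"><title>Markup Quarterly</title></journal>
</library>"""

root = ET.fromstring(DOC)

# ".//title" selects title elements at any depth below the root.
all_titles = [t.text for t in root.findall(".//title")]

# A predicate in square brackets filters by attribute value.
books_2003 = [t.text for t in root.findall("./book[@year='2003']/title")]

print(all_titles)
print(books_2003)
```

Full XPath 1.0 and later versions need a third-party library; the subset shown here is enough to illustrate navigation by element name, depth and attribute.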


  • https://en.wikipedia.org/wiki/XPointer - a system for addressing components of XML-based Internet media. It is divided among four specifications: a "framework" that forms the basis for identifying XML fragments, a positional element addressing scheme, a scheme for namespaces, and a scheme for XPath-based addressing. XPointer Framework is a W3C recommendation since March 2003. The XPointer language is designed to address structural aspects of XML, including text content and other information objects created as a result of parsing the document. Thus, it could be used to point to a section of a document highlighted by a user through a mouse drag action.


XML Schema

  • https://en.wikipedia.org/wiki/XML_schema - a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints. There are languages developed specifically to express XML schemas. The document type definition (DTD) language, which is native to the XML specification, is a schema language that is of relatively limited capability, but that also has other uses in XML aside from the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S) and RELAX NG. The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means.


  • https://en.wikipedia.org/wiki/XML_Schema_(W3C) - a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document. They can check if it adheres to the description of the element it is placed in. Like all XML schema languages, XSD can be used to express a set of rules to which an XML document must conform in order to be considered "valid" according to that schema. However, unlike most other schema languages, XSD was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software.



XQuery

  • w3c: XQuery - a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats (JSON, binary, etc.). The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.


Xembly

  • Xembly - an Assembly-like imperative programming language for data manipulation in XML documents. It is a much simpler alternative to XSLT and XQuery. Read this blog post for a more detailed explanation: Xembly, an Assembly for XML.


XLink


  • https://en.wikipedia.org/wiki/RELAX_NG - REgular LAnguage for XML Next Generation, is a schema language for XML: a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document, but RELAX NG also offers a popular compact, non-XML syntax. Compared to other XML schema languages, RELAX NG is considered relatively simple. It was defined by a committee specification of the OASIS RELAX NG technical committee in 2001 and 2002, based on Murata Makoto's RELAX and James Clark's TREX, and also by part two of the international standard ISO/IEC 19757: Document Schema Definition Languages (DSDL). ISO/IEC 19757-2 was developed by ISO/IEC JTC1/SC34 and published in its first version in 2003.

XSL



XSLT

  • https://en.wikipedia.org/wiki/XSLT - Extensible Stylesheet Language Transformations, is a language originally designed for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subsequently be converted to other formats, such as PDF, PostScript and PNG. Support for JSON and plain-text transformation was added in later updates to the XSLT 1.0 specification. As of August 2022, the most recent stable version of the language is XSLT 3.0, which achieved Recommendation status in June 2017. XSLT 3.0 implementations support Java, .NET, C/C++, Python, PHP and NodeJS. An XSLT 3.0 Javascript library can also be hosted within the Web Browser. Modern web browsers also include native support for XSLT 1.0. For an XSLT document transformation, the original document is not changed; rather, a new document is created based on the content of an existing one. Typically, input documents are XML files, but anything from which the processor can build an XQuery and XPath Data Model can be used, such as relational database tables or geographical information systems.


  • https://linux.die.net/man/1/xsltproc - command line tool for applying XSLT stylesheets to XML documents. It is part of libxslt(3), the XSLT C library for GNOME. While it was developed as part of the GNOME project, it can operate independently of the GNOME desktop. xsltproc is invoked from the command line with the name of the stylesheet to be used followed by the name of the file or files to which the stylesheet is to be applied. It will use the standard input if a filename provided is -. If a stylesheet is included in an XML document with a Stylesheet Processing Instruction, no stylesheet needs to be named at the command line. xsltproc will automatically detect the included stylesheet and use it. By default, output is to stdout. You can specify a file for output using the -o or --output option.


HTML / CSS

See HTML/CSS

DocBook

  • https://en.wikipedia.org/wiki/DocBook - a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation. As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, Web help and HTML Help, without requiring users to make any changes to the source. In other words, when a document is written in DocBook format it becomes easily portable into other formats. It solves the problem of reformatting by writing it once using XML tags.


Markdown

See Markdown

AsciiDoc

  • AsciiDoc Home Page - a text document format for writing notes, documentation, articles, books, ebooks, slideshows, web pages, man pages and blogs. AsciiDoc files can be translated to many formats including HTML, PDF, EPUB, man page. AsciiDoc is highly configurable: both the AsciiDoc source file syntax and the backend output markups (which can be almost any type of SGML/XML markup) can be customized and extended by the user. AsciiDoc is free software and is licenced under the terms of the GNU General Public License version 2 (GPLv2).



Setext

  • https://en.wikipedia.org/wiki/Setext - a lightweight markup language used to format plain text documents such as e-newsletters, Usenet postings, and e-mails. In contrast to some other markup languages (such as HTML), the markup is easily readable without any parsing or special software. Setext was first introduced in 1991 by Ian Feldman for use in the TidBITS electronic newsletter.

Page description languages

  • https://en.wikipedia.org/wiki/Page_description_language - a computer language that describes the appearance of a printed page in a higher level than an actual output bitmap (or generally raster graphics). An overlapping term is printer control language, which includes Hewlett-Packard's Printer Command Language (PCL). PostScript is one of the most noted page description languages. The markup language adaptation of the PDL is the page description markup language. Page description languages are text (human-readable) or binary data streams, usually intermixed with text or graphics to be printed. They are distinct from graphics application programming interfaces (APIs) such as GDI and OpenGL that can be called by software to generate graphical output.




PDF





  • https://github.com/LibrePDF/OpenPDF - a free Java library for creating and editing PDF files with an LGPL and MPL open source license. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.




  • https://github.com/glutanimate/PDFMtEd - PDFMtEd (PDF Metadata Editor) is a set of tools designed to simplify working with PDF metadata on Linux. The utilities hosted in this repository are graphical front-ends to the marvelous ExifTool by Phil Harvey.




  • flpsed - a WYSIWYG PostScript annotator. You can't remove or modify existing elements of a document, but flpsed lets you add arbitrary text lines to existing PostScript documents (PostScript is a registered trademark of Adobe Systems Incorporated). Added lines can later be re-edited with flpsed. Using pdftops, which is part of xpdf, one can convert PDF documents to PostScript and also add text to them. flpsed is useful for filling in forms, adding notes etc. GsWidget is now part of flpsed.

flpsed is released under the GPL.


  • https://sourceforge.net/projects/pdfshuffler - a small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.


  • https://github.com/2mol/pboy - a small .pdf management utility. It is borne out of the frustration of having a download folder full of PDFs with names like 'I08.pdf', '1412.4880.pdf' and so on. Since I want to save some of these files for later reading, it would be helpful to have more descriptive filenames. This tool helps with renaming those files. It will rename/move documents to a specified folder, and it even gives some filename suggestions by looking at the file content and the pdf metadata.




  • https://github.com/pdfarranger/pdfarranger - Small python-gtk application, which helps the user to merge or split pdf documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface


  • print-css.rocks - CSS Paged Media tutorial and information, shows how to generate PDF documents from XML/HTML using the "CSS Paged Media" approach, whereby the complete styling and layout information is kept in cascading stylesheets (CSS). It will also show the results produced by different tools with identical data, providing an impression of functionality and output quality. What is CSS Paged Media? In brief: CSS Paged Media (a W3C standard) is a way of generating PDF documents using XML/HTML as input and CSS for styling. It can be thought of as an extension of CSS for print purposes. As such, CSS Paged Media must deal with print-related considerations such as pagination, page formats, page regions and other print-specific details.
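
As a small sketch of the approach, the Python snippet below wraps an HTML fragment in a page that carries print rules. The @page rule, the size and margin descriptors, the @bottom-center margin box and the page counters come from the CSS Paged Media specifications; the helper name, the chosen values and the sample content are assumptions for illustration.

```python
# Print rules: page size and margins, a page counter in the bottom margin
# box, and a page break before every top-level heading.
PRINT_CSS = """
@page {
  size: A4;
  margin: 20mm;
  @bottom-center { content: counter(page) " / " counter(pages); }
}
h1 { break-before: page; }
"""

def wrap_for_print(body_html, css=PRINT_CSS):
    """Embed print-oriented CSS in a minimal HTML document."""
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<style>{css}</style></head>\n<body>{body_html}</body></html>\n"
    )

html = wrap_for_print("<h1>Chapter 1</h1><p>Hello, paged media.</p>")
print(html)
```

The resulting file is ordinary HTML; a processor that supports CSS Paged Media is still needed to turn it into a paginated PDF.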


PDFio

  • PDFio - a simple C library for reading and writing PDF files. The primary goals of PDFio are: Read and write any version of PDF file; Provide access to pages, objects, and streams within a PDF file; Support reading and writing of encrypted PDF files; Extract or embed useful metadata (author, creator, page information, etc.); “Filter” PDF files, for example to extract a range of pages or to embed fonts that are missing from a PDF; Provide access to objects used for each page. PDFio is not concerned with rendering or viewing a PDF file, although a PDF RIP or viewer could be written using it.

PostScript

  • https://en.wikipedia.org/wiki/PostScript - a page description language in the electronic publishing and desktop publishing business. It is a dynamically typed, concatenative programming language and was created at Adobe Systems by John Warnock, Charles Geschke, Doug Brotz, Ed Taft and Bill Paxton from 1982 to 1984.


  • PostScript as a Programming Language - PostScript has all the programming functionality you would expect from an HP calculator, plus some interesting features hard to find in other languages. There are variables, loops, subroutines (of a sort), and an advanced idea for the scope of variables.
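
The calculator comparison refers to PostScript's postfix, stack-based evaluation: operands are pushed first and each operator consumes them from the stack. The Python sketch below imitates that model for a handful of real PostScript operator names (add, sub, mul, exch, dup, pop). It is a toy restricted to integers, not a PostScript interpreter, and the function name is made up.

```python
def run_postscript_like(program):
    """Evaluate a tiny PostScript-like subset and return the operand stack."""
    stack = []
    # Each operator takes its operands in push order and returns the
    # values to push back onto the stack.
    operators = {
        "add": lambda a, b: [a + b],
        "sub": lambda a, b: [a - b],
        "mul": lambda a, b: [a * b],
        "exch": lambda a, b: [b, a],
        "dup": lambda a: [a, a],
        "pop": lambda a: [],
    }
    arity = {"add": 2, "sub": 2, "mul": 2, "exch": 2, "dup": 1, "pop": 1}
    for token in program.split():
        if token in operators:
            n = arity[token]
            if len(stack) < n:
                raise ValueError(f"stack underflow at {token!r}")
            args = stack[-n:]
            del stack[-n:]
            stack.extend(operators[token](*args))
        else:
            stack.append(int(token))  # anything else is an integer literal
    return stack

print(run_postscript_like("3 4 add 2 mul"))
```

Because every operator simply transforms the stack, programs compose by concatenation, which is what "concatenative" means.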




RIP

  • https://en.wikipedia.org/wiki/Remote_Imaging_Protocol - the Remote Imaging Protocol and its associated scripting language, RIPscrip, provide a system for sending vector graphics over low-bandwidth links, notably modems. It was originally created by Jeff Reeder, Jim Bergman, and Mark Hayton of TeleGrafix Communications in Huntington Beach, California to enhance bulletin board systems and other applications.


DVI

  • https://en.wikipedia.org/wiki/Device_independent_file_format - the output file format of the TeX typesetting program, designed by David R. Fuchs and implemented by Donald E. Knuth in 1982. Unlike the TeX markup files used to generate them, DVI files are not intended to be human-readable; they consist of binary data describing the visual layout of a document in a manner not reliant on any specific image format, display hardware or printer. DVI files are typically used as input to a second program (called a DVI driver) which translates DVI files to graphical data. For example, most TeX software packages include a program for previewing DVI files on a user's computer display; this program is a driver. Drivers are also used to convert from DVI to popular page description languages (e.g. PostScript, PDF) and for printing.
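
To make "binary data" concrete, here is a hedged Python sketch that decodes only the preamble at the start of a DVI file. It assumes the usual preamble layout (opcode 247, a format identifier byte, three 4-byte big-endian values for numerator, denominator and magnification, then a length-prefixed comment); the field names are my own, and the sample bytes are constructed by hand rather than read from a real file.

```python
import struct

PRE = 247                      # opcode of the DVI preamble command
PREAMBLE_LAYOUT = ">BBIIIB"    # opcode, id, num, den, mag, comment length

def parse_dvi_preamble(data):
    """Decode the preamble at the start of a DVI byte string."""
    opcode, version, num, den, mag, comment_len = struct.unpack_from(
        PREAMBLE_LAYOUT, data, 0
    )
    if opcode != PRE:
        raise ValueError("not a DVI file: missing preamble opcode")
    start = struct.calcsize(PREAMBLE_LAYOUT)
    comment = data[start:start + comment_len].decode("ascii", errors="replace")
    return {"version": version, "num": num, "den": den, "mag": mag,
            "comment": comment}

# Hand-built sample using the values TeX conventionally writes.
note = b" TeX output"
sample = struct.pack(PREAMBLE_LAYOUT, PRE, 2, 25400000, 473628672, 1000,
                     len(note)) + note
print(parse_dvi_preamble(sample))
```

A DVI driver reads the rest of the file in the same way, one opcode at a time, and turns the positioning and character commands into output for a screen or printer.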

Readers

Okular

  • Okular - a universal document viewer developed by KDE. Okular works on multiple platforms, including but not limited to Linux, Windows, Mac OS X, *BSD, etc. The last stable release is Okular 1.3, shipped as part of the KDE Applications 17.12 release. Okular combines excellent functionality with the versatility of supporting different kinds of documents, like PDF, PostScript, DjVu, CHM, XPS, ePub and others. The document format handlers page has a chart describing in more detail the supported formats and the features supported in each of them.

MuPDF

  • MuPDF - a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well). The command line tools allow you to annotate, edit, and convert documents to other formats such as HTML, SVG, PDF, and CBZ. You can also write scripts to manipulate documents using JavaScript. The library is written modularly in portable C, so features can be added and removed by integrators if they so desire. We also have a Java library using JNI that works on both Oracle's Java and Android.

XpdfReader

  • XpdfReader - a free PDF viewer and toolkit, including a text extractor, image converter, HTML converter, and more. Most of the tools are available as open source.

Librera Reader

  • Librera Reader - All-Format eBook Reader for Android PDF - EPUB - MOBI - DJVU - FB2 - TXT - RTF - AZW - AZW3 - HTML - CBZ - CBR - XPS - MHT

Tools

Pandoc

  • Pandoc - a universal document converter. If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to:
  • HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
  • Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
  • Ebooks: EPUB version 2 or 3, FictionBook2
  • Documentation formats: DocBook, TEI Simple, GNU TexInfo, Groff man pages, Haddock markup
  • Page layout formats: InDesign ICML
  • Outline formats: OPML
  • TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
  • PDF via LaTeX
  • Lightweight markup formats: Markdown (including CommonMark), reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile
  • Custom formats: custom writers can be written in lua.
  • Pandoc understands a number of useful markdown syntax extensions, including document metadata (title, author, date); footnotes; tables; definition lists; superscript and subscript; strikeout; enhanced ordered lists (start number and numbering style are significant); running example lists; delimited code blocks with syntax highlighting; smart quotes, dashes, and ellipses; markdown inside HTML blocks; and inline LaTeX. If strict markdown compatibility is desired, all of these extensions can be turned off.


pandoc -s -r html http://www.gnu.org/software/make/ -o example12.md
  # Converting a web page to markdown

pandoc MANUAL.txt --pdf-engine=xelatex -o example13.pdf
  # From markdown to PDF (pandoc before 2.0 called this option --latex-engine). Arch Linux package requirements: texlive-core texlive-latexextra
  # bug: does not work with .md that has too many nested headings! [3]
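
When conversions are scripted, it can help to assemble the command line programmatically. The Python sketch below only builds an argument list; the options it emits (-f, -t, -s, -o and --pdf-engine, the pandoc 2.0+ name for the engine option) are pandoc options, while the function name, its parameters and the file names are illustrative.

```python
import shlex

def pandoc_command(source, output, from_format=None, to_format=None,
                   standalone=False, pdf_engine=None):
    """Build an argument list for the pandoc command-line tool."""
    args = ["pandoc", source]
    if from_format:
        args += ["-f", from_format]      # input format
    if to_format:
        args += ["-t", to_format]        # output format
    if standalone:
        args.append("-s")                # full document, not a fragment
    if pdf_engine:
        args.append(f"--pdf-engine={pdf_engine}")
    args += ["-o", output]
    return args

cmd = pandoc_command("notes.md", "notes.html", from_format="markdown",
                     to_format="html5", standalone=True)
print(shlex.join(cmd))
# To run the conversion: subprocess.run(cmd, check=True), with pandoc installed.
```

Passing a list to subprocess rather than a single shell string avoids quoting problems with file names that contain spaces.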

panrun

Pandoc Scholar

  • Pandoc Scholar - Create beautiful, semantically enriched articles with pandoc. This package provides utilities to make publishing of scientific articles as simple and pleasant as possible. It simplifies setting authors' metadata in YAML blocks, allows adding semantic annotations to citations, and only requires the programs pandoc and make.

Docverter

  • Docverter - Convert plain text documents written in HTML, Markdown, or LaTeX to PDF, Docx, RTF or ePub with a simple HTTP API. It wraps the following open-source software in a JRuby app: Pandoc for plain text to HTML and ePub conversion, Flying Saucer for HTML to PDF, Calibre for ePub to MOBI conversion

Docutils

  • Docutils - an open-source text processing system for processing plaintext documentation into useful formats, such as HTML, LaTeX, man-pages, open-document or XML. It includes reStructuredText, the easy to read, easy to use, what-you-see-is-what-you-get plaintext markup language.

Asciidoctor

  • Asciidoctor - a fast, open source text processor and publishing toolchain for converting AsciiDoc content to HTML5, DocBook, PDF, and other formats. Asciidoctor is written in Ruby and runs on all major operating systems. To simplify installation, Asciidoctor is packaged and distributed as a gem to RubyGems.org and is packaged for popular Linux distributions and macOS. Asciidoctor can also be run in a JVM using AsciidoctorJ or in any JavaScript environment using Asciidoctor.js. The Asciidoctor project is hosted on GitHub.


Evince

  • Evince - a document viewer for multiple document formats. The goal of evince is to replace the multiple document viewers that exist on the GNOME Desktop with a single simple application. Evince is specifically designed to support the following file formats: PDF, PostScript, DjVu, TIFF, DVI, XPS, SyncTeX support with gedit, and comic books (cbr, cbz, cb7 and cbt).


Bookdown

Features include:

  • Generate printer-ready books and ebooks from R Markdown documents
  • A markup language easier to learn than LaTeX, and to write elements such as section headers, lists, quotes, figures, tables, and citations
  • Multiple choices of output formats: PDF, LaTeX, HTML, EPUB, and Word.
  • Possibility of including dynamic graphics and interactive applications (HTML widgets and Shiny apps)
  • Support for languages other than R, including C/C++, Python, and SQL, etc.
  • LaTeX equations, theorems, and proofs work for all output formats
  • Can be published to GitHub, bookdown.org, and any web servers
  • Integrated with the RStudio IDE
  • One-click publishing to https://bookdown.org

Text formatting


TeX


  • TeX Live - TeX Users Group - intended to be a straightforward way to get up and running with the TeX document production system. It provides a comprehensive TeX system with binaries for most flavors of Unix, including GNU/Linux and macOS, and also Windows. It includes all the major TeX-related programs, macro packages, and fonts that are free software, including support for many languages around the world. Many Unix/GNU/Linux operating systems provide TeX Live via their own distributions and package managers.
  • https://en.wikipedia.org/wiki/TeX_Live - a cross-platform, free software distribution for the TeX typesetting system that includes major TeX-related programs, macro packages, and fonts. It is the replacement of its no-longer supported counterpart teTeX. It is now the default TeX distribution for several Linux distributions such as openSUSE, Fedora, Debian, Ubuntu, Termux and Gentoo. Other Unix operating systems like OpenBSD, FreeBSD and NetBSD have also converted from teTeX to TeX Live.


LaTeX











  • LaTeX Editor - an online, interactive LaTeX editor. The visitor's LaTeX, entered or copied into the editing window below, will be quickly rendered by up to three renderers (in different ways). To learn how this works, I suggest choosing an example from the "LaTeX Examples" drop-down list at the lower left.





  • Kile - an Integrated LaTeX Editing Environment, a user-friendly TeX/LaTeX editor by KDE. Kile is available for many architectures and operating systems such as PC, Mac, and BSD, including Linux and Microsoft Windows.


  • https://github.com/ekiim/vim-mathpix - uses scrot, curl, and jq to submit a POST request to the Mathpix API, in order to convert the selected image into LaTeX or regular text, depending on the argument.


  • LyX - a document processor that encourages an approach to writing based on the structure of your documents (WYSIWYM) and not simply their appearance (WYSIWYG).






  • Simple guide to using KaTeX - shows how to use KaTeX using pure JavaScript without installing extra programs, so it will work on any machine.


  • Detexify - LaTeX handwritten symbol recognition. Anyone who works with LaTeX knows how time-consuming it can be to find a symbol in symbols-a4.pdf that you just can't memorize. Detexify is an attempt to simplify this search.





AsciiMath

  • AsciiMath - an easy-to-write markup language for mathematics. [6]

DTP / Office


The word processor was a stand-alone office machine developed in the 1960s, combining the keyboard text-entry and printing functions of an electric typewriter with a recording unit, either tape or floppy disk (as used by the Wang machine) with a simple dedicated computer processor for the editing of text. Although features and designs varied among manufacturers and models, and new features were added as technology advanced, the first word processors typically featured a monochrome display and the ability to save documents on memory cards or diskettes. Later models introduced innovations such as spell-checking programs, and improved formatting options.


  • https://en.wikipedia.org/wiki/Word_processor_program - a computer program that provides word processing functions. Originally a separate type of application from desktop publishing, the two program types now overlap, with many word processors now including what were once desktop publishing functions. The first known electronic word processor program was Electric Pencil, released in 1976, as a tool for programmers to write documentation and manuals for their code. Electric Pencil featured basic formatting and navigation, and supported external devices such as cassette recorders and printers. Electric Pencil II was released shortly after, targeting the CP/M operating system. Several other word processing programs were released shortly after, including EasyWriter and WordStar. WordStar was created in four months by Seymour Rubinstein after founding MicroPro International in 1978. WordStar is commonly attributed as the first WYSIWYG (what you see is what you get) editor, as the WordStar editor replicated the printed output. Inspired by the success of WordStar, many competitors began to release their offerings, including WordPerfect in 1979, MultiMate in 1982, and Microsoft Word in 1983.




LibreOffice / OpenOffice


  • The Document Foundation - It is an independent self-governing meritocratic entity, created by a large group of Free Software advocates, in the form of a charitable Foundation under German law (gemeinnützige rechtsfähige Stiftung des bürgerlichen Rechts). It continues to build on the foundation of the dedicated work by the OpenOffice.org Community. It was created in the belief that the culture born of an independent Foundation brings out the best in contributors and will deliver the best software for users. It is open to any individual who agrees with our core values and contributes to our activities. It welcomes corporate participation, e.g. by sponsoring individuals to work as equals alongside other contributors in the community. The Document Foundation is proud to be the home of LibreOffice, the next evolution of the world’s leading free office suite, and The Document Liberation Project, a community of developers united to free users from vendor lock-in of content by providing powerful tools for the conversion of proprietary file formats to the corresponding ODF format.





  • https://github.com/yamsu/vibreoffice - primary focus of this fork is to provide vim keybindings for Calc. As mentioned, the original extension for LibreOffice/OpenOffice didn't support Calc, but provided support for Writer/Impress/Drawing. IMO a spreadsheet is a perfect example where vim's modal editing is a great way to work. There is already an extension for Excel, ExcelLikeVim, and Google Sheets has a good extension for Chrome/Firefox, sheetkey. But there was none that I could find for Calc, and hence I added the functionality to vibreoffice.



Writer

Calc

OO

  • WollMux is an OpenOffice.org plugin with enhanced template, form, and autotext functionality. It can construct templates on the fly from multiple files (e.g. letterhead, footer, and body text) and will fill in personal and organizational data from various databases such as LDAP. An extra form GUI presents fields in an easily navigable manner and offers plausibility checks and computed values to ease filling in the form. Chainable printing functions allow various transformations during print and custom dialogs.

AbiWord


Xi

Monaco

WPS

  • WPS Office - The Most Compatible Free Office Suite


Other

See also WebDev#Authoring




  • https://en.wikipedia.org/wiki/Interleaf - the first commercial document processor that integrated text and graphics editing, producing WYSIWYG ("what you see is what you get") output at near-typeset quality

Manuskript

Screenwriting

Fountain

  • Fountain - a simple markup syntax for writing, editing and sharing screenplays in plain, human-readable text. Fountain allows you to work on your screenplay anywhere, on any computer or tablet, using any software that edits text files. Taking its cues from John Gruber’s Markdown, Fountain files are eminently readable. When special syntax is required, it is straightforward and intuitive. Even when viewed as plain text, your screenplay feels like a screenplay. Fountain supports everything a screenwriter is likely to need in the early, creative phases of writing. Not included are production features such as MOREs, CONTINUEDs, revision marks, locked pages, or colored pages. Because it’s just text, Fountain is also a great format for archiving screenplays without worry of file-format obsolescence or incompatibility.
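
To illustrate how little syntax is involved, the Python sketch below labels the lines of a short Fountain snippet. It uses deliberately simplified rules (scene headings start with a prefix such as INT. or EXT., an upper-case line directly above text is a character cue, an upper-case line ending in "TO:" is a transition) and skips much of the real syntax; the function name and sample scene are invented for the example.

```python
SCENE_PREFIXES = ("INT.", "EXT.", "EST.", "INT./EXT.", "I/E.")

def classify_fountain(text):
    """Label each non-blank line of a Fountain snippet (simplified rules)."""
    lines = text.splitlines()
    labelled = []
    in_dialogue = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            in_dialogue = False      # a blank line ends a dialogue block
            continue
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i + 1 >= len(lines) or not lines[i + 1].strip()
        if prev_blank and stripped.upper().startswith(SCENE_PREFIXES):
            kind = "scene_heading"
        elif (prev_blank and next_blank and stripped.isupper()
              and stripped.endswith("TO:")):
            kind = "transition"
        elif prev_blank and not next_blank and stripped.isupper():
            kind = "character"
            in_dialogue = True       # following lines belong to this speaker
        elif in_dialogue:
            kind = "dialogue"
        else:
            kind = "action"
        labelled.append((kind, stripped))
    return labelled

scene = "INT. KITCHEN - DAY\n\nAnna pours coffee.\n\nANNA\nYou're late.\n\nCUT TO:\n"
for kind, content in classify_fountain(scene):
    print(f"{kind:14} {content}")
```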

Celtx

Trelby

  • Trelby - simple, fast and elegantly laid out to make screenwriting simple. It is infinitely configurable. Trelby is free software that you can contribute to. Features: Screenplay editor: Enforces correct script format and pagination, auto-completion, and spell checking. Multiplatform: Behaves identically on all platforms, generating the exact same output. Choice of view: Multiple views, including draft view, WYSIWYG mode, and fullscreen to suit your writing style. Name database: Character name database containing over 200,000 names from various countries. Reporting: Scene/location/character/dialogue reports. Compare: Ability to compare scripts, so you know what changed between versions. Import: Screenplay formatted text, Final Draft XML (.fdx), Celtx (.celtx), Fountain (.fountain), Adobe Story (.astx) and Fade In Pro (.fadein). Export: PDF, formatted text, HTML, RTF, Final Draft XML (.fdx) and Fountain (.fountain). PDF: Built-in, highly configurable PDF generator. Supports embedding your chosen font. Also supports generating PDFs with custom watermarks, to help track shared files. Free software: Licensed under the GPL, Trelby welcomes developers and screenwriters to contribute in making it more useful.


Storyboarding

Storyboard Fountain

Storyboarder

Technical documentation

See also Internet


  • PDF: NISTIR 8366 Guidance for NIST Staff on Using Inclusive Language in Documentary Standards - This document provides guidance to NIST staff regarding the use of inclusive language in documentary standards and documents that support the realization and dissemination of physical standards; Standards Developing Organizations’ (SDOs) policies and procedures; and standards development participation. The document is intended to be used as guidance by NIST standards participants seeking to be impactful in addressing these issues.


Antora

Offline reader/browser

Devhelp

  • Devhelp - a developer tool for browsing and searching API documentation. It provides an easy way to navigate through libraries and to search by function, struct, or macro. The documentation must be installed locally, so an internet connection is not needed to use Devhelp. Devhelp works natively with GTK-Doc, so the GTK and GNOME libraries are well supported. But other development platforms can be supported as well, as long as the API documentation is available in HTML and a *.devhelp2 index file is generated.


Zeal

Templates

Business letters

Presentations

See also JS libs#Presentation

  • https://en.wikipedia.org/wiki/Presentation_program - a presentation program (also called presentation software) is a software package used to display information in the form of a slide show. It has three major functions: an editor that allows text to be inserted and formatted, a method for inserting and manipulating graphic images and media clips, and a slide-show system to display the content.

Presentation software can be viewed as enabling a functionally-specific category of electronic media, with its own distinct culture and practices as compared to traditional presentation media (such as blackboards, whiteboards and flip charts). Presentations in this mode of delivery have become pervasive in many aspects of business communication, especially in business planning, as well as in academic-conference and professional conference settings, and in the knowledge economy generally, where ideas are a primary work output. Presentations may also feature prominently in political settings, especially in workplace politics, where persuasion is a central determinant of group outcomes. Most modern meeting-rooms and conference halls are configured to include presentation electronics, such as projectors suitable for displaying presentation slides, often driven by the presenter's own laptop, under direct control of the presentation program used to develop the presentation. Often a presenter will present a lecture using the slides as a visual aid both for the presenter (to track the lecture's coverage) and for the audience (especially when an audience member mishears or misunderstands the verbal component).



Software





  • GitPitch - The Markdown Presentation Service on Git; - Markdown → Git → Slideshow. The Markdown Presentation Service for everyone on GitHub, GitLab, and Bitbucket. Using the tools you already know and love ~ Markdown + Git.

Spice-up

  • https://github.com/Philip-Scott/Spice-up - Create presentations that stand out! Spice-Up has everything you need to create simple and beautiful presentations. Get your ideas across with beautiful designed templates, or start from scratch with a blank canvas. Either way, you will add some spice to your presentations with a wide variety of background patterns and a beautiful color palette.

reveal.js


  • Asciidoctor Reveal.js - a converter for Asciidoctor and Asciidoctor.js that transforms an AsciiDoc document into an HTML5 presentation designed to be executed by the reveal.js presentation framework.


reveal-md slides.md --static _site


remark


WebSlides


Marp

  • Marp - Markdown Presentation Writer [11]

pdfpc

  • pdfpc - a GTK-based presentation viewer which uses Keynote-like multi-monitor output to provide meta information to the speaker during the presentation. It is able to show a normal presentation window on one screen while showing a more sophisticated overview on the other one, providing information like a picture of the next slide, as well as the time left in the presentation. pdfpc processes PDF documents, which can be created using nearly all modern presentation software.

Articles


Powerpoint karaoke

  • https://en.wikipedia.org/wiki/Powerpoint_Karaoke - also known as Battledecks or Battle Decks, is an improvisational activity in which a participant must deliver a presentation based on a set of slides that they have never seen before. Its name is derived from Microsoft PowerPoint, a popular presentation software, and karaoke, an activity in which a performer sings along with a pre-recorded backing track (although there is usually no music or singing involved in PowerPoint Karaoke). The effect is intended to be comical, and PowerPoint Karaoke can be considered a form of improvisational theatre, or a type of Theatresports game. The presentation can either be a real slideshow on an arcane topic, or a set of real slides from different presentations that are nonsensical when assembled together, or slides that are nonsensical on their own (in some cases created by randomly downloading images from the internet and adding unrelated text). In some cases, the presenter is given a theme beforehand that they must attempt to tie all the slides into.



Contacts


Spreadsheets










Gnumeric


Smart spreadsheets



  • NocoDB - Turns your SQL database into a Nocode platform. Free & open source Airtable alternative

Signing

  • https://en.wikipedia.org/wiki/Electronic_signature - or e-signature, is data that is logically associated with other data and which is used by the signatory to sign the associated data. This type of signature has the same legal standing as a handwritten signature as long as it adheres to the requirements of the specific regulation under which it was created (e.g., eIDAS in the European Union, NIST-DSS in the USA or ZertES in Switzerland).

Electronic signatures are a legal concept distinct from digital signatures, a cryptographic mechanism often used to implement electronic signatures. While an electronic signature can be as simple as a name entered in an electronic document, digital signatures are increasingly used in e-commerce and in regulatory filings to implement electronic signatures in a cryptographically protected way. Standardization agencies like NIST or ETSI provide standards for their implementation (e.g., NIST-DSS, XAdES or PAdES). The concept itself is not new, with common law jurisdictions having recognized telegraph signatures as far back as the mid-19th century and faxed signatures since the 1980s.
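
To make the cryptographic side concrete, here is a deliberately tiny Python sketch of signing and verifying with textbook RSA. The key is built from the small primes 61 and 53 purely so the arithmetic is easy to follow. It is an illustration of the sign/verify idea only: it is not secure, and it is not an implementation of XAdES, PAdES or any other standard named above.

```python
import hashlib

# Toy RSA key built from tiny primes (61 and 53); far too small to be secure.
N = 61 * 53          # public modulus, 3233
E = 17               # public exponent
D = 2753             # private exponent: (E * D) % ((61 - 1) * (53 - 1)) == 1

def digest(message):
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message):
    """Sign with the private exponent (only the key holder can do this)."""
    return pow(digest(message), D, N)

def verify(message, signature):
    """Check a signature using only the public values N and E."""
    return pow(signature, E, N) == digest(message)

contract = b"I agree to the terms."
signature = sign(contract)
print(verify(contract, signature))
```

A typed name proves nothing about who typed it; a digital signature can only be produced with the private key but can be checked by anyone holding the public one.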


OpenSign

  • OpenSign - an open-source document e-signing solution designed to provide a secure, reliable, and free alternative to commercial platforms like DocuSign, PandaDoc, SignNow, Adobe Sign, Smartwaiver, SignRequest, HelloSign & Zoho sign. Developed under the OpenSignLabs organization, our mission is to democratize the e-signing process, making it accessible and straightforward for everyone. [18]


Documenso

to sort