- 1 General
- 2 Languages
- 3 to sort
- 4 Linguistics
- 5 Literacy
- 6 Technology
- 7 Other
- 8 Fiction
- 9 Naming conventions
- http://www.economist.com/news/science-and-technology/21707183-researchers-uncover-ancient-links-between-majority-worlds 
- Glottolog - Comprehensive reference information for the world's languages, especially the lesser known languages.
- CLLD - Cross-Linguistic Linked Data - Helping collect the world's language diversity heritage.
- https://en.wikipedia.org/wiki/Proto-Indo-European_language - the linguistic reconstruction of the hypothetical common ancestor of the Indo-European languages, the most widely spoken language family in the world.
Far more work has gone into reconstructing PIE than any other proto-language, and it is by far the best understood of all proto-languages of its age. The vast majority of linguistic work during the 19th century was devoted to the reconstruction of PIE or its daughter proto-languages (such as Proto-Germanic), and most of the modern techniques of linguistic reconstruction such as the comparative method were developed as a result. These methods supply all current knowledge concerning PIE since there is no written record of the language.
PIE is estimated to have been spoken as a single language from 4500 BC to 2500 BC during the Late Neolithic to Early Bronze Age, though estimates vary by more than a thousand years. According to the prevailing Kurgan hypothesis, the origin...ht into the culture and religion of its speakers.
As Proto-Indo-Europeans became isolated from each other through the Indo-European migrations, the Proto-Indo-European language became spoken by the various groups in regional dialects which then underwent the Indo-European sound laws divergence, and along with shifts in morphology, these dialects slowly but eventually transformed into the known ancient Indo-European languages. From there, further linguistic divergence led to the evolution of their current descendants, the modern Indo-European languages. Today, the descendant languages, or daughter languages, of PIE with the most speakers are Spanish, English, Hindustani (Hindi and Urdu), Portuguese, Bengali, Russian, Punjabi, German, Persian, French, Italian and Marathi. Hundreds of other living descendants of PIE range from languages as diverse as Albanian (gjuha shqipe), Kurdish (کوردی), Nepali (खस भाषा), Tsakonian (τσακώνικα), Ukrainian (українська мова), and Welsh (Cymraeg).
- https://en.wikipedia.org/wiki/Grimm%27s_law - also known as the First Germanic Sound Shift or Rask's rule) is a set of statements named after Jacob Grimm and Rasmus Rask describing the inherited Proto-Indo-European (PIE) stop consonants as they developed in Proto-Germanic (the common ancestor of the Germanic branch of the Indo-European family) in the 1st millennium BC. It establishes a set of regular correspondences between early Germanic stops and fricatives and the stop consonants of certain other centum Indo-European languages (Grimm used mostly Latin and Greek for illustration).
- https://en.wikipedia.org/wiki/Indo-European_languages - a language family of several hundred related languages and dialects. mThere are about 445 living Indo-European languages, according to the estimate by Ethnologue, with over two thirds (313) of them belonging to the Indo-Iranian branch.
- https://github.com/soulaklabs/bitoduc.fr - A website about french words for computer concepts.
- Japanese Complete = ジャパニーズ・コンプリート - With 777 of the most frequent kanji, one has 90.0% coverage of Kanji in the wild!
- https://github.com/soimort/translate-shell - a command-line translator powered by Google Translate (default), Bing Translator, Yandex.Translate, and Apertium. It gives you easy access to one of these translation engines in your terminal:
- BabelFish.org is a fish that translates speech from one language to another.
- EUdict is a collection of online dictionaries for the languages spoken mostly in the European Community. These dictionaries are the result of the work of many authors who worked very hard and finally offered their product free of charge on the internet thus making it easier to all of us to communicate with each other.
- dict.cc is not only an online dictionary. It's an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users' suggestions.
- Linguee - Dictionary and search engine for 100 million translations.
- Dictionarist - provides translations in English, Spanish, Portuguese, German, French, Italian, Russian, Turkish, Dutch, Greek,Chinese, Japanese, Korean, Arabic, Hindi, Indonesian, Polish, Romanian, Ukrainian and Vietnamese.
- http://rut.org/cgi-bin/j-e/dict - Japanese
- Pootle - Community localization server. Get your community translating your software into their languages.
- FrathWiki - information for the conlanging and linguistics community
- https://en.wikipedia.org/wiki/Ordinal_indicator - nd, rd, th, etc.
to sort from Being
- https://en.wikipedia.org/wiki/Rhetorical_modes - also known as modes of discourse, describe the variety, conventions, and purposes of the major kinds of language-based communication, particularly writing and speaking. Four of the most common rhetorical modes and their purpose are narration, description, exposition, and argumentation.
- https://en.wikipedia.org/wiki/Description - act of description may be related to that of definition. Description is also the fiction-writing mode for transmitting a mental image of the particulars of a story. Definition: The pattern of development that presents a word picture of a thing, a person, a situation, or a series of events.
- https://en.wikipedia.org/wiki/Narrative - or story is any report of connected events, real or imaginary, presented in a sequence of written or spoken words, and/or still or moving images. Narrative can be organized in a number of thematic and/or formal categories: non-fiction (such as definitively including creative non-fiction, biography, journalism, transcript poetry, and historiography); fictionalization of historical events (such as anecdote, myth, legend, and historical fiction); and fiction proper (such as literature in prose and sometimes poetry, such as short stories, novels, and narrative poems and songs, and imaginary narratives as portrayed in other textual forms, games, or live or recorded performances). Narrative is found in all forms of human creativity, art, and entertainment, including speech, literature, theatre, music and song, comics, journalism, film, television and video, radio, gameplay, unstructured recreation, and performance in general, as well as some painting, sculpture, drawing, photography, and other visual arts (though several modern art movements refuse the narrative in favor of the abstract and conceptual), as long as a sequence of events is presented. The word derives from the Latin verb narrare, "to tell", which is derived from the adjective gnarus, "knowing" or "skilled".
Oral storytelling is perhaps the earliest method for sharing narratives. During most people's childhoods, narratives are used to guide them on proper behavior, cultural history, formation of a communal identity, and values, as especially studied in anthropology today among traditional indigenous peoples. Narratives may also be nested within other narratives, such as narratives told by an unreliable narrator (a character) typically found in noir fiction genre. An important part of narration is the narrative mode, the set of methods used to communicate the narrative through a process narration (see also "Narrative Aesthetics" below). Along with exposition, argumentation, and description, narration, broadly defined, is one of four rhetorical modes of discourse. More narrowly defined, it is the fiction-writing mode in which the narrator communicates directly to the reader.
- https://en.wikipedia.org/wiki/Exposition_(narrative) - the insertion of important background information within a story; for example, information about the setting, characters' backstories, prior plot events, historical context, etc. In a specifically literary context, exposition appears in the form of expository writing embedded within the narrative.
- https://en.wikipedia.org/wiki/Linguistics - the scientific study of language, specifically language form, language meaning, and language in context. The earliest activities in the description of language have been attributed to the 4th century BCE Indian grammarian Pāṇini, who was an early student of linguistics and wrote a formal description of the Sanskrit language in his Aṣṭādhyāyī.
Linguistics analyzes human language as a system for relating sounds (or signs in signed languages) and meaning. Phonetics studies acoustic and articulatory properties of the production and perception of speech sounds and non-speech sounds. The study of language meaning, on the other hand, deals with how languages encode relations between entities, properties, and other aspects of the world to convey, process, and assign meaning, as well as to manage and resolve ambiguity. While the study of semantics typically concerns itself with truth conditions, pragmatics deals with how context influences meanings.
Grammar is a system of rules which govern the form of the utterances in a given language. It encompasses both sound and meaning, and includes phonology (how sounds or gestures function together), morphology (the formation and composition of words), and syntax (the formation and composition of phrases and sentences from words).
In the early 20th century, Ferdinand de Saussure distinguished between the notions of langue and parole in his formulation of structural linguistics. According to him, parole is the specific utterance of speech, whereas langue refers to an abstract phenomenon that theoretically defines the principles and system of rules that govern a language. This distinction resembles the one made by Noam Chomsky between competence and performance, where competence is individual's ideal knowledge of a language, while performance is the specific way in which it is used.
The formal study of language has also led to the growth of fields like psycholinguistics, which explores the representation and function of language in the mind; neurolinguistics, which studies language processing in the brain; and language acquisition, which investigates how children and adults acquire a particular language.
Linguistics also includes non-formal approaches to the study of other aspects of human language, such as social, cultural, historical and political factors. The study of cultural discourses and dialects is the domain of sociolinguistics, which looks at the relation between linguistic variation and social structures, as well as that of discourse analysis, which examines the structure of texts and conversations. Research on language through historical and evolutionary linguistics focuses on how languages change, and on the origin and growth of languages, particularly over an extended period of time.
Corpus linguistics takes naturally occurring texts and studies the variation of grammatical and other features based on such corpora. Stylistics involves the study of patterns of style: within written, signed, or spoken discourse. Language documentation combines anthropological inquiry with linguistic inquiry to describe languages and their grammars. Lexicography covers the study and construction of dictionaries. Computational linguistics applies computer technology to address questions in theoretical linguistics, as well as to create applications for use in parsing, data retrieval, machine translation, and other areas. People can apply actual knowledge of a language in translation and interpreting, as well as in language education – the teaching of a second or foreign language. Policy makers work with governments to implement new plans in education and teaching which are based on linguistic research.
- Max Planck Neuroscience on Nautilus: Brainwaves Encode the Grammar of Human Language - The relative timing of brainwaves encodes the structure of a sentence.
- https://en.wikipedia.org/wiki/Applied_linguistics - an interdisciplinary field of linguistics that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, psychology, computer science, communication research, anthropology, and sociology.
- https://en.wikipedia.org/wiki/Linguistic_turn - a major development in Western philosophy during the 20th century, the most important characteristic of which is the focusing of philosophy and the other humanities primarily on the relationship between philosophy and language.
- https://en.wikipedia.org/wiki/Etymology - is the history of words, their origins, and how their form and meaning have changed over time. By an extension, the term "the etymology of [a word]" means the origin of the particular word.
Becker's Criterion: "Any theory (or partial theory) of the English Language that is expounded in the English Language must account for (or at least apply to) the text of its own exposition."
Becker's Razor: his final riposte to theoretical linguists: "Elegance and truth are inversely related', after which he finishes with, 'Put that in your phrasal lexicon and invoke it!"
- YouTube: Chomsky on Zizek and Lacan
- https://en.wikipedia.org/wiki/Linguistic_typology - a subfield of linguistics that studies and classifies languages according to their structural and functional features. Its aim is to describe and explain the common properties and the structural diversity of the world's languages. It includes three subdisciplines: qualitative typology, which deals with the issue of comparing languages and within-language variance; quantitative typology, which deals with the distribution of structural patterns in the world’s languages; and theoretical typology, which explains these distributions.
- Field Linguist's Toolbox - a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.Although Toolbox is very powerful, it is designed to be easy to learn. The user can start with a simple standard setup and gradually add the use of more powerful features as desired. The Toolbox downloads include a training package that is usable for self-paced individual learning as well as for classroom teaching of Toolbox. 
- FieldWorks - consists of software tools that help you manage linguistic and cultural data. FieldWorks supports tasks ranging from the initial entry of collected data through to the preparation of data for publication, including dictionary development, interlinearization of texts, morphological analysis, and other publications. Furthermore, FieldWorks BTE contains a specialized drafting and editing environment for Bible Translators, which provides interaction with the language data stored in Language Explorer.
- AGGREGATION - Implemented grammars can contribute to endangered language documentation in several ways. In the first instance, the grammars themselves provide a very rich addition to prose descriptive grammars, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Furthermore, implemented grammars can be used to create treebanks, that is, collections of utterances (from running text or elicited examples) associated with syntactic and semantic structures. The process of creating the treebank can provide important feedback to the field linguist about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to create further computational tools and are also a rich source of comparable data for qualitative and quantitative work in typology, grounding higher level linguistic abstractions in actual utterances in a computationally tractable fashion. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated cross-linguistic computational resource.
- Home - DELPH-IN - Computational linguists from research sites world-wide have joined forces in a collaborative effort aimed at ‘deep’ linguistic processing of human language. The goal is the combination of linguistic and statistical processing methods for getting at the meaning of texts and utterances. The partners have adopted Head-Driven Phrase Structure Grammar (HPSG) and Minimal Recursion Semantics (MRS), two advanced models of formal linguistic analysis. They have also committed themselves to a shared format for grammatical representation and to a rigid scheme of evaluation, as well as to the general use of open-source licensing and transparency.
- SFST - A toolbox for the implementation of morphological analysers
- Foma - A Finite State Compiler and Library - a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers. Although NLP applications are probably the main use of foma, it is sufficiently generic to use for a large number of purposes.The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, boolean operations. Also, more advanced construction methods are available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.
- Helsinki Finite-State Technology - Project Web Hosting - Open Source Software - intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.
- Tatoeba - a collection of sentences and translations.It's collaborative, open, free and even addictive.
See also Maths#Logic
- https://en.wikipedia.org/wiki/Semiotics - also called semiotic studies; not to be confused with the Saussurean tradition called semiology which is a part of semiotics) is the study of meaning-making, the study of sign processes and meaningful communication. This includes the study of signs and sign processes (semiosis), indication, designation, likeness, analogy, allegory, metonymy, metaphor, symbolism, signification, and communication.
Semiotics is closely related to the field of linguistics, which, for its part, studies the structure and meaning of language more specifically. The semiotic tradition explores the study of signs and symbols as a significant part of communications. As different from linguistics, however, semiotics also studies non-linguistic sign systems.
Semiotics is frequently seen as having important anthropological dimensions; for example, the late Italian semiotician and novelist Umberto Eco proposed that every cultural phenomenon may be studied as communication. Some semioticians focus on the logical dimensions of the science, however. They examine areas belonging also to the life sciences—such as how organisms make predictions about, and adapt to, their semiotic niche in the world (see semiosis). In general, semiotic theories take signs or sign systems as their object of study: the communication of information in living organisms is covered in biosemiotics (including zoosemiotics).
- Semiotics for Beginners - Daniel Chandler
- https://en.wikipedia.org/wiki/Syntagma_(linguistics) - an elementary constituent segment within a text. Such a segment can be a phoneme, a word, a grammatical phrase, a sentence, or an event within a larger narrative structure, depending on the level of analysis. Syntagmatic analysis involves the study of relationships (rules of combination) among syntagmas.
At the lexical level, syntagmatic structure in a language is the combination of words according to the rules of syntax for that language. For example, English uses determiner + adjective + noun, e.g. the big house. Another language might use determiner + noun + adjective (Spanish la casa grande) and therefore have a different syntagmatic structure.
At a higher level, narrative structures feature a realistic temporal flow guided by tension and relaxation; thus, for example, events or rhetorical figures may be treated as syntagmas of epic structures.
Syntagmatic structure is often contrasted with paradigmatic structure. In semiotics, "syntagmatic analysis" is analysis of syntax or surface structure (syntagmatic structure), rather than paradigms as in paradigmatic analysis. Analysis is often achieved through commutation tests.
- https://en.wikipedia.org/wiki/Syntagmatic_analysis - is analysis of syntax or surface structure (syntagmatic structure) as opposed to paradigms (paradigmatic analysis). This is often achieved using commutation tests.
- Ontology is Overrated: Categories, Links, and Tags - "The Only Group That Can Categorize Everything Is Everybody"
- https://en.wikipedia.org/wiki/Process_philosophy - ontology of becoming, Whitehead
- https://en.wikipedia.org/wiki/Pragmatics - a subfield of linguistics and semiotics that studies the ways in which context contributes to meaning. Pragmatics encompasses speech act theory, conversational implicature, talk in interaction and other approaches to language behavior in philosophy, sociology, linguistics and anthropology.
Unlike semantics, which examines meaning that is conventional or "coded" in a given language, pragmatics studies how the transmission of meaning depends not only on structural and linguistic knowledge (e.g., grammar, lexicon, etc.) of the speaker and listener, but also on the context of the utterance, any pre-existing knowledge about those involved, the inferred intent of the speaker, and other factors. In this respect, pragmatics explains how language users are able to overcome apparent ambiguity, since meaning relies on the manner, place, time etc. of an utterance.
The ability to understand another speaker's intended meaning is called pragmatic competence.
- https://en.wikipedia.org/wiki/Phonetics - a branch of linguistics that comprises the study of the sounds of human speech, or—in the case of sign languages—the equivalent aspects of sign. It is concerned with the physical properties of speech sounds or signs (phones): their physiological production, acoustic properties, auditory perception, and neurophysiological status. Phonology, on the other hand, is concerned with the abstract, grammatical characterization of systems of sounds or signs.
The field of phonetics is a multilayered subject of linguistics that focuses on speech. In the case of oral languages there are three basic areas of study inter-connected through the common mechanism of sound, such as wavelength (pitch), amplitude, and harmonics:
- https://en.wikipedia.org/wiki/Articulatory_phonetics - the study of the production of speech sounds by the articulatory and vocal tract by the speaker.
- https://en.wikipedia.org/wiki/Acoustic_phonetics - the study of the physical transmission of speech sounds from the speaker to the listener.
- https://en.wikipedia.org/wiki/Auditory_phonetics - the study of the reception and perception of speech sounds by the listener.
- https://en.wikipedia.org/wiki/ARPABET - also spelled ARPAbet, is a set of phonetic transcription codes developed by Advanced Research Projects Agency (ARPA) as a part of their Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems, one representing each segment with one character (alternating upper- and lower-case letters) and the other with two or more (case-insensitive), were devised, the latter being far more widely adopted.ARPABET has been used in several speech synthesizers, including Computalker for the S-100 system, SAM for the Commodore 64, SAY for the Amiga, TextAssist for the PC and Speakeasy from Intelligent Artefacts which used the Votrax SC-01 speech synthesiser IC. It is also used in the CMU Pronouncing Dictionary. A revised version of ARPABET is used in the TIMIT corpus.
- https://en.wikipedia.org/wiki/Phonetic_algorithm - an algorithm for indexing of words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result. They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation is complicated by historical changes in pronunciation and words borrowed from many languages.
- https://en.wikipedia.org/wiki/Phonology - a branch of linguistics concerned with the systematic organization of sounds in languages. It has traditionally focused largely on the study of the systems of phonemes in particular languages (and therefore used to be also called phonemics, or phonematics), but it may also cover any linguistic analysis either at a level beneath the word (including syllable, onset and rime, articulatory gestures, articulatory features, mora, etc.) or at all levels of language where sound is considered to be structured for conveying linguistic meaning. Phonology also includes the study of equivalent organizational systems in sign languages.
- https://en.wikipedia.org/wiki/Phoneme - one of the units of sound that distinguish one word from another in a particular language. The difference in meaning between the English words kill and kiss is a result of the exchange of the phoneme /l/ for the phoneme /s/. Two words that differ in meaning through a contrast of a single phoneme form a minimal pair.
In linguistics, phonemes (established by the use of minimal pairs, such as kill vs kiss or pat vs bat) are written between slashes like this: /p/, whereas when it is desired to show the more exact pronunciation of any sound, linguists use square brackets, for example [pʰ] (indicating an aspirated p).
Within linguistics there are differing views as to exactly what phonemes are and how a given language should be analyzed in phonemic (or phonematic) terms. However, a phoneme is generally regarded as an abstraction of a set (or equivalence class) of speech sounds (phones) which are perceived as equivalent to each other in a given language. For example, in English, the "k" sounds in the words kit and skill are not identical (as described below), but they are distributional variants of a single phoneme /k/. Different speech sounds that are realizations of the same phoneme are known as allophones. Allophonic variation may be conditioned, in which case a certain phoneme is realized as a certain allophone in particular phonological environments, or it may be free in which case it may vary randomly. In this way, phonemes are often considered to constitute an abstract underlying representation for segments of words, while speech sounds make up the corresponding phonetic realization, or surface form.
- https://en.wikipedia.org/wiki/Morpheme - is the smallest grammatical unit in a language. The field of study dedicated to morphemes is called morphology. A morpheme is not identical to a word, and the principal difference between the two is that a morpheme may or may not stand alone, whereas a word, by definition, is freestanding. When it stands by itself, it is considered a root because it has a meaning of its own (e.g. the morpheme cat) and when it depends on another morpheme to express an idea, it is an affix because it has a grammatical function (e.g. the –s in cats to specify that it is plural). Every word comprises one or more morphemes. The more combinations a morpheme is found in, the more productive it is said to be.
- https://en.wikipedia.org/wiki/Morphology_(linguistics) - the identification, analysis and description of the structure of a given language's morphemes and other linguistic units, such as root words, affixes, parts of speech, intonations and stresses, or implied context. In contrast, morphological typology is the classification of languages according to their use of morphemes, while lexicology is the study of those words forming a language's wordstock. The discipline that deals specifically with the sound changes occurring within morphemes is morphophonology.
While words, along with clitics, are generally accepted as being the smallest units of syntax, in most languages, if not all, many words can be related to other words by rules that collectively describe the grammar for that language. For example, English speakers recognize that the words dog and dogs are closely related, differentiated only by the plurality morpheme "-s", only found bound to nouns. Speakers of English, a fusional language, recognize these relations from their tacit knowledge of English's rules of word formation. They infer intuitively that dog is to dogs as cat is to cats; and, in similar fashion, dog is to dog catcher as dish is to dishwasher. By contrast, Classical Chinese has very little morphology, using almost exclusively unbound morphemes ("free" morphemes) and depending on word order to convey meaning. (Most words in modern Standard Chinese ("Mandarin"), however, are compounds and most roots are bound.) These are understood as grammars that represent the morphology of the language. The rules understood by a speaker reflect specific patterns or regularities in the way words are formed from smaller units in the language they are using and how those smaller units interact in speech. In this way, morphology is the branch of linguistics that studies patterns of word formation within and across languages and attempts to formulate rules that model the knowledge of the speakers of those languages.
Polysynthetic languages, such as Chukchi, have words composed of many morphemes. The Chukchi word "təmeyŋəlevtpəγtərkən", for example, meaning "I have a fierce headache", is composed of eight morphemes t-ə-meyŋ-ə-levt-pəγt-ə-rkən that may be glossed. The morphology of such languages allows for each consonant and vowel to be understood as morphemes, while the grammar of the language indicates the usage and understanding of each morpheme.
- https://en.wikipedia.org/wiki/Lexeme - a unit of lexical meaning that exists regardless of the number of inflectional endings it may have or the number of words it may contain. It is a basic unit of meaning, and the headwords of a dictionary are all lexemes. Put more technically, a lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as run. A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are not common conjugations of that word.
A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. For example, the lexeme run has a present third person singular form runs, a present non-third-person singular form run (which also functions as the past participle and non-finite form), a past form ran, and a present participle running. (It does not include runner, runners, runnable, etc.) The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as run, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.
- https://en.wikipedia.org/wiki/Word - smallest element that may be uttered in isolation with semantic or pragmatic content.
- https://en.wikipedia.org/wiki/Open_class_(linguistics) - a word class may be either an open class or a closed class. Open classes accept the addition of new morphemes (words), through such processes as compounding, derivation, inflection, coining, and borrowing; closed classes generally do not.
- https://en.wikipedia.org/wiki/Back-formation - the process of creating a new lexeme, usually by removing actual or supposed affixes. The resulting neologism is called a back-formation, a term coined by James Murray in 1889. (OED online first definition of 'back formation' is from the definition of to burgle, which was first published in 1889.) Back-formation is different from clipping – back-formation may change the part of speech or the word's meaning, whereas clipping creates shortened words from longer words, but does not change the part of speech or the meaning of the word.
- https://en.wikipedia.org/wiki/Lexicology - the part of linguistics which studies words. This may include their nature and function as symbols their meaning, the relationship of their meaning to epistemology in general, and the rules of their composition from smaller elements (morphemes such as the English -ed marker for past or un- for negation; and phonemes as basic sound units). Lexicology also involves relations between words, which may involve semantics (for example, love vs. affection), derivation (for example, fathom vs. unfathomably), usage and sociolinguistic distinctions (for example, flesh vs. meat), and any other issues involved in analyzing the whole lexicon of a language(s).
- https://en.wikipedia.org/wiki/Computational_lexicology - that branch of computational linguistics, which is concerned with the use of computers in the study of lexicon. It has been more narrowly described by some scholars (Amsler, 1980) as the use of computers in the study of machine-readable dictionaries. It is distinguished from computational lexicography, which more properly would be the use of computers in the construction of dictionaries, though some researchers have used computational lexicography as synonymous.
- https://en.wikipedia.org/wiki/Lexicography - is divided into two separate but equally important groups: Practical lexicography is the art or craft of compiling, writing and editing dictionaries; Theoretical lexicography is the scholarly discipline of analyzing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon (vocabulary) of a language, developing theories of dictionary components and structures linking the data in dictionaries, the needs for information by users in specific types of situation, and how users may best access the data incorporated in printed and electronic dictionaries. This is sometimes referred to as 'metalexicography'.
Part of speech
- https://en.wikipedia.org/wiki/Part_of_speech - also a word class, a lexical class, or a lexical category, a linguistic category of words (lexical items) defined by the items syntactic or morphological behaviour. Common linguistic categories include noun and verb, among others.
Three little words you often see Are ARTICLES: a, an, and the. A NOUN's the name of anything, As: school or garden, toy, or swing. ADJECTIVES tell the kind of noun, As: great, small, pretty, white, or brown. VERBS tell of something being done: To read, write, count, sing, jump, or run. How things are done the ADVERBS tell, As: slowly, quickly, badly, well. CONJUNCTIONS join the words together, As: men and women, wind or weather. The PREPOSITION stands before A noun as: in or through a door. The INTERJECTION shows surprise As: Oh, how pretty! Ah! how wise!
- https://en.wikipedia.org/wiki/Grammar - the set of structural rules governing the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics, and pragmatics.
- https://en.wikipedia.org/wiki/Syntactic_hierarchy - concerned with the way sentences are constructed from smaller parts, such as words and phrases.
- https://en.wikipedia.org/wiki/Clause - the smallest grammar unit that can express a complete proposition. typically consists of a subject and a predicate, where the predicate is typically a verb phrase – a verb together with any objects and other modifiers.
- https://en.wikipedia.org/wiki/Anaphora_(linguistics) - use of an expression the interpretation of which depends upon another expression in context (its antecedent or postcedent). In a narrower sense, anaphora is the use of an expression which depends specifically upon an antecedent expression, and thus is contrasted with cataphora, which is the use of an expression which depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in the sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally. In the sentence Before her arrival, nobody saw Sally, the pronoun her refers forward to the postcedent Sally, so her is now a cataphor (and an anaphor in the broader, but not the narrower, sense). Usually, an anaphoric expression is a proform or some other kind of deictic (contextually-dependent) expression. Both anaphora and cataphora are species of endophora, referring to something mentioned elsewhere in a dialog or text.
- https://en.wikipedia.org/wiki/Formulaic_language - previously known as automatic speech or embolalia, is a linguistic term for verbal expressions that are fixed in form, often non-literal in meaning with attitudinal nuances, and closely related to communicative-pragmatic context. Along with idioms, expletives and proverbs, formulaic language includes pause fillers (e.g., “Like,” “Er” or “Uhm”) and conversational speech formulas (e.g., “You’ve got to be kidding,” “Excuse me?” or “Hang on a minute”).
- https://en.wikipedia.org/wiki/Fixed_expression - a standard form of expression that has taken on a more specific meaning than the expression itself. It is different from a proverb in that it is used as a part of a sentence, and is the standard way of expressing a concept or idea.
- https://en.wikipedia.org/wiki/Idiom - (Latin: idioma, "special property", from Greek: ἰδίωμα – idíōma, "special feature, special phrasing, a peculiarity", f. Greek: ἴδιος – ídios, "one’s own") is a phrase or a fixed expression that has a figurative, or sometimes literal, meaning. An idiom's figurative meaning is different from the literal meaning. There are thousands of idioms, and they occur frequently in all languages. It is estimated that there are at least twenty-five thousand idiomatic expressions in the English language. Idioms fall into the category of formulaic language.
- http://www.antipope.org/charlie/blog-static/2014/06/we-need-a-pony-and-the-moon-on.html  - on detecting sarcasm
- https://en.wikipedia.org/wiki/Archi-writing - a term used by French philosopher Jacques Derrida in his attempt to re-orient the relationship between speech and writing. Derrida argued that as far back as Plato, speech had been always given priority over writing. In the West, phonetic writing was considered as a secondary imitation of speech, a poor copy of the immediate living act of speech. Derrida argued that in later centuries philosopher Jean-Jacques Rousseau and linguist Ferdinand de Saussure both gave writing a secondary or parasitic role. In Derrida's essay Plato's Pharmacy, he sought to question this prioritising by firstly complicating the two terms speech and writing.
- https://en.wikipedia.org/wiki/Chomsky_hierarchy - a containment hierarchy of classes of formal grammars. allows the possibility for the understanding and use of a computer science model which enables a programmer to accomplish meaningful linguistic goals systematically.
See also Computing#NLP
- https://en.wikipedia.org/wiki/Idiolect - an individual's distinctive and unique use of language, including speech. This unique usage encompasses vocabulary, grammar, and pronunciation. Idiolect is the variety of language unique to an individual. This differs from a dialect, a common set of linguistic characteristics shared among some group of people. The term idiolect refers to the language of an individual. It is etymologically related to the Greek prefix idio- (meaning "own, personal, private, peculiar, separate, distinct") and a back-formation of dialect.
- https://en.wikipedia.org/wiki/Writing_system - any conventional method of visually representing verbal communication. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form of information storage and transfer. The processes of encoding and decoding writing systems involve shared understanding between writers and readers of the meaning behind the sets of characters that make up a script. Writing is usually recorded onto a durable medium, such as paper or electronic storage, although non-durable methods may also be used, such as writing on a computer display, on a blackboard, in sand, or by skywriting. The general attributes of writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies. Any particular system can have attributes of more than one category. In the alphabetic category, there is a standard set of letters (basic written symbols or graphemes) of consonants and vowels that encode based on the general principle that the letters (or letter pair/groups) represent speech sounds. In a syllabary, each symbol correlates to a syllable or mora. In a logography, each character represents a word, morpheme, or other semantic units. Other categories include abjads, which differ from alphabets in that vowels are not indicated, and abugidas or alphasyllabaries, with each character representing a consonant–vowel pairing. Alphabets typically use a set of 20-to-35 symbols to fully express a language, whereas syllabaries can have 80-to-100, and logographies can have several hundreds of symbols.[
See also Maths
- https://en.wikipedia.org/wiki/Controlled_natural_language - (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. The first type of languages (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English, IBM's Easy English, are used in the industry to increase the quality of technical documentation, and possibly simplify the (semi-)automatic translation of the documentation. These languages restrict the writer by general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". The second type of languages have a formal logical basis, i.e. they have a formal syntax and semantics, and can be mapped to an existing formal language, such as first-order logic. Thus, those languages can be used as knowledge representation languages, and writing of those languages is supported by fully automatic consistency and redundancy checks, query answering, etc.
- https://en.wikipedia.org/wiki/Attempto_Controlled_English - a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced. 
- https://github.com/nemanjan00/uniread - Spritz like CLI fast reading software.
- https://github.com/octobanana/fltrdr - or flat-reader, is an interactive text reader for the terminal. It is flat in the sense that the reader is word-based. It creates a horizontal stream of words, ignoring all newline characters and reducing extra whitespace. Its purpose is to facilitate reading, scanning, and searching text. The program has a play mode that moves the reader forward one word at a time, along with a configurable words per minute (WPM), setting.
- https://github.com/jamestomasino/stutter - A Rapid Serial Visual Presentation (RSVP) extension for modern web browsers. It is based upon my initial work in a Google Chrome extension, read. This is an attempt to modernize the code and offer cross-browser support.
- spread0r should run on all platforms supporting perl and gtk2-perl.
See also Documents
- LanguageTool - Spell and Grammar Checker
- Grammar and Spell Checker - LanguageTool by LanguageTooler GmbH for Firefox
- https://github.com/btford/write-good - Naive linter for English prose
- Grammalecte - an open source grammar checker for LibreOffice, Firefox and Thunderbird.
- Morning Pages - Morning Pages are three pages of longhand, stream of consciousness writing, done first thing in the morning. *There is no wrong way to do Morning Pages*– they are not high art. They are not even “writing.” They are about anything and everything that crosses your mind– and they are for your eyes only. Morning Pages provoke, clarify, comfort, cajole, prioritize and synchronize the day at hand. Do not over-think Morning Pages: just put three pages of anything on the page...and then do three more pages tomorrow. --Julia Cameron
- https://en.wikipedia.org/wiki/Creative_writing - any writing that goes outside the bounds of normal professional, journalistic, academic, or technical forms of literature, typically identified by an emphasis on narrative craft, character development, and the use of literary tropes or with various traditions of poetry and poetics. Due to the looseness of the definition, it is possible for writing such as feature stories to be considered creative writing, even though they fall under journalism, because the content of features is specifically focused on narrative and character development. Both fictional and non-fictional works fall into this category, including such forms as novels, biographies, short stories, and poems. In the academic setting, creative writing is typically separated into fiction and poetry classes, with a focus on writing in an original style, as opposed to imitating pre-existing genres such as crime or horror. Writing for the screen and stage—screenwriting and playwrighting—are often taught separately, but fit under the creative writing category as well. Creative writing can technically be considered any writing of original composition. In this sense, creative writing is a more contemporary and process-oriented name for what has been traditionally called literature, including the variety of its genres.
- https://en.wikipedia.org/wiki/Poetics - the theory of literary forms and literary discourse. It may refer specifically to the theory of poetry, although some speakers use the term so broadly as to denote the concept of "theory" itself. Poetics is distinguished from hermeneutics by its focus not on the meaning of a text, but rather its understanding of how a text's different elements come together and produce certain effects on the reader. Most literary criticism combines poetics and hermeneutics in a single analysis, however one or the other may predominate given the text and the aims of the one doing the reading.
- https://en.wikipedia.org/wiki/Ergodic_literature - Examples given by Aarseth include a diverse group of texts: wall inscriptions of the temples in ancient Egypt that are connected two-dimensionally (on one wall) or three dimensionally (from wall to wall or room to room); the I Ching; Apollinaire’s Calligrammes in which the words of the poem “are spread out in several directions to form a picture on the page, with no clear sequence in which to be read”; Marc Saporta’s Composition No. 1, Roman, a novel with shuffleable pages; Raymond Queneau’s One Hundred Thousand Billion Poems; B. S. Johnson’s The Unfortunates; Milorad Pavic’s Landscape Painted with Tea; Joseph Weizenbaum’s ELIZA; Ayn Rand’s play Night of January 16th, in which members of the audience form a jury and choose one of two endings; William Chamberlain and Thomas Etter’s Racter; Michael Joyce’s Afternoon: a story; Roy Trubshaw and Richard Bartle’s Multi-User Dungeon (aka MUD1); and James Aspnes’s TinyMUD. Some other contemporary examples of this type of literature are Nick Bantock's The Griffin and Sabine Trilogy, S. by J. J. Abrams and Doug Dorst, Night Film by Marisha Pessl, and House of Leaves by Mark Z. Danielewski.
- https://en.wikipedia.org/wiki/Creative_nonfiction - a genre of writing that uses literary styles and techniques to create factually accurate narratives. Creative nonfiction contrasts with other nonfiction, such as technical writing or journalism, which is also rooted in accurate fact, but is not primarily written in service to its craft. Forms within this genre include biography, autobiography, memoir, diary, travel writing, food writing, literary journalism, chronicle, personal essays and other hybridized essays.
- BookBrainz – The Open Book Database - a project to create an online database of information about every single book, magazine, journal and other publication ever written. We make all the data that we collect available to the whole world to consume and use as they see fit. Anyone can contribute to BookBrainz, whether through editing our information, helping out with development, or just spreading the word about our project.
- http://www.researchgate.net/publication/12102517_Birds_of_a_feather_flock_conjointly_()_rhyme_as_reason_in_aphorisms 
- https://en.wikipedia.org/wiki/Pagination - the process of dividing a document into discrete pages, either electronic pages or printed pages. In reference to books produced without a computer, pagination can mean the consecutive page numbering to indicate the proper order of the pages, which was rarely found in documents pre-dating 1500, and only became common practice c. 1550, when it replaced foliation, which numbered only the front sides of folios.
- https://en.wikipedia.org/wiki/Mémoire - In French culture, the word mémoire, as in un mémoire ("a memory" – indefinite article), reflects the writer's own experiences and memories. The word has no direct English translation.
- https://en.wikipedia.org/wiki/Monograph - a specialist work of writing (in contrast to reference works) on a single subject or an aspect of a subject, usually by a single author. In library cataloging, monograph has a broader meaning, that of a nonserial publication complete in one volume (book) or a finite number of volumes. Thus it differs from a serial publication such as a magazine, journal, or newspaper. In this context only, books such as novels are monographs.
- https://en.wikipedia.org/wiki/Ephemera - are any transitory written or printed matters that are not meant to be retained or preserved. The word derives from the Greek ephemeros, meaning "lasting only one day, short-lived". Some collectible ephemera are advertising trade cards, airsickness bags, bookmarks, catalogues, greeting cards, letters, pamphlets, postcards, posters, prospectuses, defunct stock certificates or tickets, and zines.
"Documentation needs to include and be structured around its four different functions: tutorials, how-to guides, explanation and technical reference. Each of them requires a distinct mode of writing. People working with software need these four different kinds of documentation at different times, in different circumstances - so software usually needs them all. And documentation needs to be explicitly structured around them, and they all must be kept separate and distinct from each other."
- http://www.mediacollege.com/journalism/press-release/format.html 
- https://en.wikipedia.org/wiki/Electronic_literature - or digital literature is a genre of literature encompassing works created exclusively on and for digital devices, such as computers, tablets, and mobile phones. Some platforms of this new digitized world include blog fiction, twitterature as well as facebook stories. This means monkey hate that these writings cannot be easily printed, or cannot be printed at all, because elements crucial to the text are unable to be carried over onto a printed version. The digital literature world continues to innovate print's conventions all the while challenging the boundaries between digitized literature and electronic literature. Some novels are exclusive to tablets and smartphones for the simple fact that they require a touchscreen. Digital literature tends to require a user to traverse through the literature through the digital setting, making the use of the medium part of the literary exchange. Espen J. Aarseth wrote in his book Cybertext: Perspectives on Ergodic Literature that "it is possible to explore, get lost, and discover secret paths in these texts, not metaphorically, but through the topological structures of the textual machinery".
- Pollen: the book is a program - a publishing system that helps authors make functional and beautiful digital books.
- whocomments? - the encyclopedia of comment & opinion, the UK's only free to use biographical database of comment journalism
- SourceWatch, published by the Center for Media and Democracy (CMD), is a collaborative, specialized encyclopedia of the people, organizations, and issues shaping the public agenda. SourceWatch profiles the activities of front groups, PR spinners, industry-friendly experts, industry-funded organizations, and think tanks trying to manipulate public opinion on behalf of corporations or government. We also highlight key public policies they are trying to affect and provide ways to get involved. In addition, SourceWatch contains information about others who help document information about PR spin, such as reporters, academics, and watchdog groups.
- https://github.com/diogocabral/sherlock - A modification of sherlock plagiarism detector.
- https://github.com/TheBerkin/rant - an all-purpose procedural text engine that is most simply described as the opposite of Regex. It has been refined to include a dizzying array of features for handling everything from the most basic of string generation tasks to advanced dialogue generation, code templating, automatic formatting, and more.The goal of the project is to enable developers of all kinds to automate repetitive writing tasks with a high degree of creative freedom.
- Hunspell - the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, Google Chrome, and it is also used by proprietary software packages, like Mac OS X, InDesign, memoQ, Opera and SDL Trados.
- Encyclopedia Dramatica - "In lulz we trust."
- SCP Foundation - Secure, Contain, Protect
- GF - Grammatical Framework - A programming language for multilingual grammar applications
- http://nautil.us/issue/15/turbulence/an-astrobiologist-asks-a-sci_fi-novelist-how-to-survive-the-anthropocene 
- https://en.wikipedia.org/wiki/Ludonarrative - a compound of ludology and narrative, refers to the intersection in a video game of ludic elements – or gameplay – and narrative elements. It is commonly used in the term Ludonarrative dissonance which refers to conflicts between a video game's narrative and its gameplay. The term was coined by Clint Hocking, a former creative director at LucasArts (then at Ubisoft), on his blog in October, 2007. Hocking coined the term in response to the game BioShock, which according to him promotes the theme of self-interest through its gameplay while promoting the opposing theme of selflessness through its narrative, creating a violation of aesthetic distance that often pulls the player out of the game. Video game theorist Tom Bissell, in his book Extra Lives: Why Video Games Matter (2010), notes the example of Call of Duty 4: Modern Warfare, where a player can all but kill their digital partner during gameplay without upsetting the built-in narrative of the game.