Language

Things and Stuff Wiki - An organically evolving personal wiki knowledge base. An on-the-fly taxonomy containing a patchwork trail of topic outlines, descriptions, notes, stubs and breadcrumbs, with links to sites, systems, software, manuals, organisations, people, articles, guides, slides, papers, books, comments, videos, screencasts, webcasts, scratchpads and more. Content is orientated towards mostly free/libre/open, mostly Linux. Quality and age vary drastically. Sometimes old things are first, sometimes last. Use the Table of Contents menu to navigate long pages. Zoom in if text is too small. Dead link? Wayback Machine. I probably need to fix the theme CSS after an update. See also libreav.org. Chat to msg me (not checking tho atm).

General

to sort

https://en.wikipedia.org/wiki/Origin_of_speech

https://en.wikipedia.org/wiki/Language

https://en.wikipedia.org/wiki/Origin_of_language

https://en.wikipedia.org/wiki/Oral_literature

https://en.wikipedia.org/wiki/Oral_tradition

https://en.wikipedia.org/wiki/Oral_history

https://en.wikipedia.org/wiki/Linguistics

Online Etymology Dictionary - Origin, history and meaning of English words

http://www.fun-with-words.com

http://blog.nabeelqu.com/post/33557680375/surprisingly-undervalued-books

http://www.scientificamerican.com/article/monkey-see-monkey-speak-video/ [1]

http://www.nybooks.com/blogs/nyrblog/2015/jun/26/reading-is-forgetting [2]

http://www.sil.org/resources/software_fonts/xlingpaper

http://andrew.gibiansky.com/blog/linguistics/homophony-groups/ [3]

Blogs

http://languagelog.ldc.upenn.edu/nll/

http://elms.wordpress.com/

Types

https://en.wikipedia.org/wiki/Analytic_language

https://en.wikipedia.org/wiki/Isolating_language

https://en.wikipedia.org/wiki/Synthetic_language

https://en.wikipedia.org/wiki/Polysynthetic_language

Languages

https://news.ycombinator.com/item?id=10090480

http://www.economist.com/news/science-and-technology/21707183-researchers-uncover-ancient-links-between-majority-worlds [4]

https://www.vidarholen.net/contents/interjections

Glottolog - Comprehensive reference information for the world's languages, especially the lesser known languages.
- https://github.com/clld/glottolog3

CLLD - Cross-Linguistic Linked Data - Helping collect the world's language diversity heritage.
- https://github.com/clld/clld
- Documentation - The goal of the Cross-Linguistic Linked Data project (CLLD) is to help record the world’s language diversity heritage. This is to be facilitated by developing, providing and maintaining interoperable data publication structures.

Proto-Indo-European

https://en.wikipedia.org/wiki/Proto-Indo-European_language - the linguistic reconstruction of the hypothetical common ancestor of the Indo-European languages, the most widely spoken language family in the world.

Far more work has gone into reconstructing PIE than any other proto-language, and it is by far the best understood of all proto-languages of its age. The vast majority of linguistic work during the 19th century was devoted to the reconstruction of PIE or its daughter proto-languages (such as Proto-Germanic), and most of the modern techniques of linguistic reconstruction such as the comparative method were developed as a result. These methods supply all current knowledge concerning PIE since there is no written record of the language.

PIE is estimated to have been spoken as a single language from 4500 BC to 2500 BC during the Late Neolithic to Early Bronze Age, though estimates vary by more than a thousand years. According to the prevailing Kurgan hypothesis, the origin...ht into the culture and religion of its speakers.

As Proto-Indo-Europeans became isolated from each other through the Indo-European migrations, the Proto-Indo-European language became spoken by the various groups in regional dialects which then underwent the Indo-European sound laws divergence, and along with shifts in morphology, these dialects slowly but eventually transformed into the known ancient Indo-European languages. From there, further linguistic divergence led to the evolution of their current descendants, the modern Indo-European languages. Today, the descendant languages, or daughter languages, of PIE with the most speakers are Spanish, English, Hindustani (Hindi and Urdu), Portuguese, Bengali, Russian, Punjabi, German, Persian, French, Italian and Marathi. Hundreds of other living descendants of PIE range from languages as diverse as Albanian (gjuha shqipe), Kurdish (کوردی‎), Nepali (खस भाषा), Tsakonian (τσακώνικα), Ukrainian (українська мова), and Welsh (Cymraeg).

https://en.wikipedia.org/wiki/Grimm%27s_law - Grimm's law (also known as the First Germanic Sound Shift or Rask's rule) is a set of statements named after Jacob Grimm and Rasmus Rask describing the inherited Proto-Indo-European (PIE) stop consonants as they developed in Proto-Germanic (the common ancestor of the Germanic branch of the Indo-European family) in the 1st millennium BC. It establishes a set of regular correspondences between early Germanic stops and fricatives and the stop consonants of certain other centum Indo-European languages (Grimm used mostly Latin and Greek for illustration).
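A rough sketch of the correspondences as a lookup table, in Python. This is a simplification and not a reconstruction tool: the table lists only the textbook stop shifts, the voiced aspirates are shown with plain voiced stops as their outcome, and Verner's law, vowel changes and the known exceptions (e.g. after s) are ignored. The names GRIMM and apply_grimm are made up for illustration. Segments are passed as a pre-split list so that digraphs like "bʰ" stay whole, and each segment is mapped exactly once (so d becomes t, and that t is not shifted again).

```python
# Simplified Grimm's law table: PIE stop -> Proto-Germanic reflex.
# Assumption: textbook correspondences only, no Verner's law or exceptions.
GRIMM = {
    "p": "f", "t": "θ", "k": "x", "kʷ": "xʷ",        # voiceless stops -> fricatives
    "b": "p", "d": "t", "g": "k", "gʷ": "kʷ",        # voiced stops -> voiceless stops
    "bʰ": "b", "dʰ": "d", "gʰ": "g", "gʷʰ": "gʷ",    # voiced aspirates -> voiced
}


def apply_grimm(segments):
    """Map every segment once; anything not in the table passes through."""
    return [GRIMM.get(seg, seg) for seg in segments]


# Consonants of PIE *ped- 'foot' (vowels are not modelled here).
print(apply_grimm(["p", "e", "d"]))  # ['f', 'e', 't']
```

The one-pass lookup matters: applying the three shifts one after another on the same string would wrongly turn d into t and then into θ.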

https://en.wikipedia.org/wiki/Indo-European_languages - a language family of several hundred related languages and dialects. There are about 445 living Indo-European languages, according to the estimate by Ethnologue, with over two thirds (313) of them belonging to the Indo-Iranian branch.

https://en.wikipedia.org/wiki/List_of_Indo-European_languages

French

https://github.com/soulaklabs/bitoduc.fr - A website about French words for computer concepts.

Japanese

Japanese Complete = ジャパニーズ・コンプリート - With 777 of the most frequent kanji, one has 90.0% coverage of Kanji in the wild!

Acquisition

https://en.wikipedia.org/wiki/Language_acquisition

YouTube: Part 1- What Everyone Should Know about Second Language Acquisition

Translation

CopyTranslator
- https://github.com/CopyTranslator/CopyTranslator

https://github.com/soimort/translate-shell - a command-line translator powered by Google Translate (default), Bing Translator, Yandex.Translate, and Apertium. It gives you easy access to one of these translation engines in your terminal.

BabelFish.org is a fish that translates speech from one language to another.

EUdict is a collection of online dictionaries for the languages spoken mostly in the European Community. These dictionaries are the result of the work of many authors who worked very hard and finally offered their product free of charge on the internet thus making it easier to all of us to communicate with each other.

dict.cc is not only an online dictionary. It's an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users' suggestions.

Linguee - Dictionary and search engine for 100 million translations.

Dictionarist - provides translations in English, Spanish, Portuguese, German, French, Italian, Russian, Turkish, Dutch, Greek, Chinese, Japanese, Korean, Arabic, Hindi, Indonesian, Polish, Romanian, Ukrainian and Vietnamese.

http://www.dsl.ac.uk/dsl/index.html - Scots

http://rut.org/cgi-bin/j-e/dict - Japanese

http://www.translation-guide.com/free_online_translators.php?from=Latin&to=English

http://www.latinphrasetranslation.com/translators/latin_to_english

http://www.paulmeier.com/ipacharts/

http://virtaal.translatehouse.org/

http://translationproject.org/html/welcome.html

Pootle - Community localization server. Get your community translating your software into their languages.

http://www.forvo.com/

http://youpronounce.it/ [7]

https://www.youtube.com/user/PronunciationManual

https://github.com/OpenNMT/OpenNMT [8]

Other

http://www.meta-net.eu/

http://babadum.com/

http://www.xibalba.demon.co.uk/jbr/ranto/

http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners [9]

https://news.ycombinator.com/item?id=8192474

http://mycommunityrights.org.uk/community-right-to-bid/

https://en.wikipedia.org/wiki/Whistled_language

https://news.ycombinator.com/item?id=10092480

http://www.mrc-cbu.cam.ac.uk/people/matt.davis/sine-wave-speech/

FrathWiki - information for the conlanging and linguistics community

http://www.zompist.com/yingzi/yingzi.htm

to sort

Numbers

https://en.wikipedia.org/wiki/English_numerals
https://en.wikipedia.org/wiki/Ordinal_indicator - nd, rd, th, etc.

to sort from Being

https://en.wikipedia.org/wiki/Rhetorical_modes - also known as modes of discourse, describe the variety, conventions, and purposes of the major kinds of language-based communication, particularly writing and speaking. Four of the most common rhetorical modes and their purpose are narration, description, exposition, and argumentation.

https://en.wikipedia.org/wiki/Description - act of description may be related to that of definition. Description is also the fiction-writing mode for transmitting a mental image of the particulars of a story. Definition: The pattern of development that presents a word picture of a thing, a person, a situation, or a series of events.

https://en.wikipedia.org/wiki/Narrative - or story is any report of connected events, real or imaginary, presented in a sequence of written or spoken words, and/or still or moving images. Narrative can be organized in a number of thematic and/or formal categories: non-fiction (such as definitively including creative non-fiction, biography, journalism, transcript poetry, and historiography); fictionalization of historical events (such as anecdote, myth, legend, and historical fiction); and fiction proper (such as literature in prose and sometimes poetry, such as short stories, novels, and narrative poems and songs, and imaginary narratives as portrayed in other textual forms, games, or live or recorded performances). Narrative is found in all forms of human creativity, art, and entertainment, including speech, literature, theatre, music and song, comics, journalism, film, television and video, radio, gameplay, unstructured recreation, and performance in general, as well as some painting, sculpture, drawing, photography, and other visual arts (though several modern art movements refuse the narrative in favor of the abstract and conceptual), as long as a sequence of events is presented. The word derives from the Latin verb narrare, "to tell", which is derived from the adjective gnarus, "knowing" or "skilled".

Oral storytelling is perhaps the earliest method for sharing narratives. During most people's childhoods, narratives are used to guide them on proper behavior, cultural history, formation of a communal identity, and values, as especially studied in anthropology today among traditional indigenous peoples. Narratives may also be nested within other narratives, such as narratives told by an unreliable narrator (a character) typically found in the noir fiction genre. An important part of narration is the narrative mode, the set of methods used to communicate the narrative through a process of narration. Along with exposition, argumentation, and description, narration, broadly defined, is one of four rhetorical modes of discourse. More narrowly defined, it is the fiction-writing mode in which the narrator communicates directly to the reader.

https://en.wikipedia.org/wiki/Exposition_(narrative) - the insertion of important background information within a story; for example, information about the setting, characters' backstories, prior plot events, historical context, etc. In a specifically literary context, exposition appears in the form of expository writing embedded within the narrative.

Booker's Seven Basic Plots

Linguistics

https://en.wikipedia.org/wiki/Linguistics - the scientific study of language, specifically language form, language meaning, and language in context. The earliest activities in the description of language have been attributed to the 4th century BCE Indian grammarian Pāṇini, who was an early student of linguistics and wrote a formal description of the Sanskrit language in his Aṣṭādhyāyī.

Linguistics analyzes human language as a system for relating sounds (or signs in signed languages) and meaning. Phonetics studies acoustic and articulatory properties of the production and perception of speech sounds and non-speech sounds. The study of language meaning, on the other hand, deals with how languages encode relations between entities, properties, and other aspects of the world to convey, process, and assign meaning, as well as to manage and resolve ambiguity. While the study of semantics typically concerns itself with truth conditions, pragmatics deals with how context influences meanings.

Grammar is a system of rules which govern the form of the utterances in a given language. It encompasses both sound and meaning, and includes phonology (how sounds or gestures function together), morphology (the formation and composition of words), and syntax (the formation and composition of phrases and sentences from words).

In the early 20th century, Ferdinand de Saussure distinguished between the notions of langue and parole in his formulation of structural linguistics. According to him, parole is the specific utterance of speech, whereas langue refers to an abstract phenomenon that theoretically defines the principles and system of rules that govern a language. This distinction resembles the one made by Noam Chomsky between competence and performance, where competence is an individual's ideal knowledge of a language, while performance is the specific way in which it is used.

The formal study of language has also led to the growth of fields like psycholinguistics, which explores the representation and function of language in the mind; neurolinguistics, which studies language processing in the brain; and language acquisition, which investigates how children and adults acquire a particular language.

Linguistics also includes non-formal approaches to the study of other aspects of human language, such as social, cultural, historical and political factors. The study of cultural discourses and dialects is the domain of sociolinguistics, which looks at the relation between linguistic variation and social structures, as well as that of discourse analysis, which examines the structure of texts and conversations. Research on language through historical and evolutionary linguistics focuses on how languages change, and on the origin and growth of languages, particularly over an extended period of time.

Corpus linguistics takes naturally occurring texts and studies the variation of grammatical and other features based on such corpora. Stylistics involves the study of patterns of style within written, signed, or spoken discourse. Language documentation combines anthropological inquiry with linguistic inquiry to describe languages and their grammars. Lexicography covers the study and construction of dictionaries. Computational linguistics applies computer technology to address questions in theoretical linguistics, as well as to create applications for use in parsing, data retrieval, machine translation, and other areas. People can apply actual knowledge of a language in translation and interpreting, as well as in language education – the teaching of a second or foreign language. Policy makers work with governments to implement new plans in education and teaching which are based on linguistic research.

https://en.wikipedia.org/wiki/Linguistic_meaning

https://en.wikipedia.org/wiki/Linguistic_analysis

https://en.wikipedia.org/wiki/Language_change

https://en.wikipedia.org/wiki/Feature_(linguistics)

Max Planck Neuroscience on Nautilus: Brainwaves Encode the Grammar of Human Language - The relative timing of brainwaves encodes the structure of a sentence.

https://en.wikipedia.org/wiki/Contact_calls

http://nymag.com/scienceofus/2015/01/importance-of-chattering-away-to-babies.html [10]

https://en.wikipedia.org/wiki/Language_acquisition

The pleasure of learning new words

https://en.wikipedia.org/wiki/Linguistic_modality

https://en.wikipedia.org/wiki/Grammatical_mood

https://en.wikipedia.org/wiki/Irrealis_mood

https://en.wikipedia.org/wiki/Imperative_mood

https://en.wikipedia.org/wiki/Heteroglossia

https://en.wikipedia.org/wiki/Theoretical_linguistics

https://en.wikipedia.org/wiki/Applied_linguistics - an interdisciplinary field of linguistics that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, psychology, computer science, communication research, anthropology, and sociology.

https://en.wikipedia.org/wiki/Linguistic_turn - a major development in Western philosophy during the 20th century, the most important characteristic of which is the focusing of philosophy and the other humanities primarily on the relationship between philosophy and language.

https://en.wikipedia.org/wiki/Internet_linguistics

https://en.wikipedia.org/wiki/Psycholinguistics

https://en.wikipedia.org/wiki/Interactional_sociolinguistics

https://en.wikipedia.org/wiki/Comparative_linguistics

https://en.wikipedia.org/wiki/Contrastive_linguistics

https://en.wikipedia.org/wiki/Etymology - the history of words, their origins, and how their form and meaning have changed over time. By extension, the term "the etymology of [a word]" means the origin of the particular word.

https://en.wikipedia.org/wiki/Cognitive_linguistics

https://en.wikipedia.org/wiki/Neurolinguistics

https://en.wikipedia.org/wiki/Historical_linguistics

http://www.unc.edu/~gerfen/Ling30Sp2002/historical2.html

https://en.wikipedia.org/wiki/Language_ideology

https://en.wikipedia.org/wiki/Linguistic_prescription

https://en.wikipedia.org/wiki/Linguistic_description

Becker's Criterion: "Any theory (or partial theory) of the English Language that is expounded in the English Language must account for (or at least apply to) the text of its own exposition."

Becker's Razor: his final riposte to theoretical linguists: "Elegance and truth are inversely related", after which he finishes with, "Put that in your phrasal lexicon and invoke it!"

https://en.wikipedia.org/wiki/Structural_linguistics
- https://en.wikipedia.org/wiki/Course_in_General_Linguistics

https://en.wikipedia.org/wiki/Generative_linguistics

https://en.wikipedia.org/wiki/Noam_Chomsky

http://www.chomsky.info/

YouTube: Chomsky on Zizek and Lacan

YouTube: Slavoj Zizek responds to Noam Chomsky (July 2013)

https://en.wikipedia.org/wiki/Syntactic_Structures

https://en.wikipedia.org/wiki/Deep_structure

https://news.ycombinator.com/item?id=10731735

https://en.wikipedia.org/wiki/Minimalist_program

https://en.wikipedia.org/wiki/Transformational_grammar

https://en.wikipedia.org/wiki/Linguistics_Wars

https://en.wikipedia.org/wiki/Generative_semantics

https://en.wikipedia.org/wiki/Linguistic_typology - a subfield of linguistics that studies and classifies languages according to their structural and functional features. Its aim is to describe and explain the common properties and the structural diversity of the world's languages. It includes three subdisciplines: qualitative typology, which deals with the issue of comparing languages and within-language variance; quantitative typology, which deals with the distribution of structural patterns in the world’s languages; and theoretical typology, which explains these distributions.

http://www-personal.umich.edu/~jlawler/haj/worldorder.pdf [11]

https://en.wikipedia.org/wiki/Glittering_generality

https://en.wikipedia.org/wiki/Standard_language

https://en.wikipedia.org/wiki/Intension

https://en.wikipedia.org/wiki/Hypostasis_(linguistics)

http://www.ibiblio.org/hhalpin/homepage/notes/paper.pdf

https://en.wikipedia.org/wiki/Comparative_method

https://en.wikipedia.org/wiki/Linguistic_profiling

http://en.wikipedia.org/wiki/Conventionalism#Linguistics

http://en.wikipedia.org/wiki/Sound_symbolism

Why Ice Cream Sounds Fat and Crackers Sound Skinny

http://www.ethnologue.com/

https://en.wikipedia.org/wiki/Figure_of_speech

http://www.teachingenglish.org.uk/activities/phonemic-chart

http://rhymebrain.com/en

http://www.howmanysyllables.com/

http://nitpickertool.com/

http://norvig.com/mayzner.html

http://www3.telus.net/linguisticsissues/BritishCanadianAmerican.htm

http://lotrproject.com/statistics/books/

http://www.edge.org/conversation/how-does-our-language-shape-the-way-we-think

https://en.wikipedia.org/wiki/Philosophical_Investigations

https://en.wikipedia.org/wiki/Rhetorical_device

Field Linguist's Toolbox - a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data. Although Toolbox is very powerful, it is designed to be easy to learn. The user can start with a simple standard setup and gradually add the use of more powerful features as desired. The Toolbox downloads include a training package that is usable for self-paced individual learning as well as for classroom teaching of Toolbox. [12]

FieldWorks - consists of software tools that help you manage linguistic and cultural data. FieldWorks supports tasks ranging from the initial entry of collected data through to the preparation of data for publication, including dictionary development, interlinearization of texts, morphological analysis, and other publications. Furthermore, FieldWorks BTE contains a specialized drafting and editing environment for Bible Translators, which provides interaction with the language data stored in Language Explorer.

AGGREGATION - Implemented grammars can contribute to endangered language documentation in several ways. In the first instance, the grammars themselves provide a very rich addition to prose descriptive grammars, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Furthermore, implemented grammars can be used to create treebanks, that is, collections of utterances (from running text or elicited examples) associated with syntactic and semantic structures. The process of creating the treebank can provide important feedback to the field linguist about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to create further computational tools and are also a rich source of comparable data for qualitative and quantitative work in typology, grounding higher level linguistic abstractions in actual utterances in a computationally tractable fashion. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated cross-linguistic computational resource.

Home - DELPH-IN - Computational linguists from research sites world-wide have joined forces in a collaborative effort aimed at ‘deep’ linguistic processing of human language. The goal is the combination of linguistic and statistical processing methods for getting at the meaning of texts and utterances. The partners have adopted Head-Driven Phrase Structure Grammar (HPSG) and Minimal Recursion Semantics (MRS), two advanced models of formal linguistic analysis. They have also committed themselves to a shared format for grammatical representation and to a rigid scheme of evaluation, as well as to the general use of open-source licensing and transparency.

Emily M. Bender : Home Page

SFST - A toolbox for the implementation of morphological analysers

Foma - A Finite State Compiler and Library - a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers. Although NLP applications are probably the main use of foma, it is sufficiently generic to use for a large number of purposes. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, boolean operations. Also, more advanced construction methods are available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.

Helsinki Finite-State Technology - intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.

Tatoeba - a collection of sentences and translations. It's collaborative, open, free and even addictive.

Semiotics

Pragmatics

https://en.wikipedia.org/wiki/Pragmatics - a subfield of linguistics and semiotics that studies the ways in which context contributes to meaning. Pragmatics encompasses speech act theory, conversational implicature, talk in interaction and other approaches to language behavior in philosophy, sociology, linguistics and anthropology.

Unlike semantics, which examines meaning that is conventional or "coded" in a given language, pragmatics studies how the transmission of meaning depends not only on structural and linguistic knowledge (e.g., grammar, lexicon, etc.) of the speaker and listener, but also on the context of the utterance, any pre-existing knowledge about those involved, the inferred intent of the speaker, and other factors. In this respect, pragmatics explains how language users are able to overcome apparent ambiguity, since meaning relies on the manner, place, time etc. of an utterance.

The ability to understand another speaker's intended meaning is called pragmatic competence.

https://en.wikipedia.org/wiki/Deixis

Phonetics

https://en.wikipedia.org/wiki/Phonetics - a branch of linguistics that comprises the study of the sounds of human speech, or—in the case of sign languages—the equivalent aspects of sign. It is concerned with the physical properties of speech sounds or signs (phones): their physiological production, acoustic properties, auditory perception, and neurophysiological status. Phonology, on the other hand, is concerned with the abstract, grammatical characterization of systems of sounds or signs.

The field of phonetics is a multilayered subject of linguistics that focuses on speech. In the case of oral languages there are three basic areas of study inter-connected through the common mechanism of sound, such as wavelength (pitch), amplitude, and harmonics:

https://en.wikipedia.org/wiki/Articulatory_phonetics - the study of the production of speech sounds by the articulatory and vocal tract by the speaker.
https://en.wikipedia.org/wiki/Acoustic_phonetics - the study of the physical transmission of speech sounds from the speaker to the listener.
https://en.wikipedia.org/wiki/Auditory_phonetics - the study of the reception and perception of speech sounds by the listener.

Phonetics

https://en.wikipedia.org/wiki/Phonetic_transcription

https://en.wikipedia.org/wiki/ARPABET - also spelled ARPAbet, is a set of phonetic transcription codes developed by the Advanced Research Projects Agency (ARPA) as a part of their Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems, one representing each segment with one character (alternating upper- and lower-case letters) and the other with two or more (case-insensitive), were devised, the latter being far more widely adopted. ARPABET has been used in several speech synthesizers, including Computalker for the S-100 system, SAM for the Commodore 64, SAY for the Amiga, TextAssist for the PC and Speakeasy from Intelligent Artefacts which used the Votrax SC-01 speech synthesiser IC. It is also used in the CMU Pronouncing Dictionary. A revised version of ARPABET is used in the TIMIT corpus.
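A minimal Python sketch of reading one entry in the CMU Pronouncing Dictionary style, where a headword is followed by space-separated ARPABET symbols and vowels carry a stress digit (0, 1 or 2). The sample line is illustrative rather than copied from a particular dictionary release, and parse_entry is a made-up helper name. Real dictionary files also contain comment lines and alternate pronunciations, which this sketch does not handle.

```python
def parse_entry(line):
    """Split a 'WORD  PH PH ...' line into (word, phones, stresses)."""
    word, *symbols = line.split()
    # Strip the trailing stress digit from vowels to get bare ARPABET phones.
    phones = [s.rstrip("012") for s in symbols]
    # Collect the stress digits in order, one per vowel.
    stresses = [int(s[-1]) for s in symbols if s[-1].isdigit()]
    return word, phones, stresses


# Illustrative entry (assumed format: headword, then phones).
word, phones, stresses = parse_entry("KISS  K IH1 S")
print(word, phones, stresses)  # KISS ['K', 'IH', 'S'] [1]
```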

https://en.wikipedia.org/wiki/International_Phonetic_Alphabet

http://www.sil.org/resources/software_fonts/ipa-keyboards

https://en.wikipedia.org/wiki/Phonetic_algorithm - an algorithm for indexing of words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result. They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation is complicated by historical changes in pronunciation and words borrowed from many languages.
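As a concrete case, here is a short Python sketch of American Soundex, one of the best known phonetic algorithms. It is one example of the family and not the only variant in use. It keeps the first letter, maps the remaining consonants to digit classes, collapses adjacent letters with the same code (h and w do not break a run, vowels do), and pads the result to four characters. It assumes plain English-alphabet input; any other letter is treated like a vowel.

```python
# Digit classes of American Soundex; letters not listed have no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the four-character Soundex key of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Both names index to the same key, which is the point: spelling variants of similar-sounding English names end up together. Applied to words from other languages the key is still computed, but it may not reflect pronunciation.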

Phonology

https://en.wikipedia.org/wiki/Phonology - a branch of linguistics concerned with the systematic organization of sounds in languages. It has traditionally focused largely on the study of the systems of phonemes in particular languages (and therefore used to be also called phonemics, or phonematics), but it may also cover any linguistic analysis either at a level beneath the word (including syllable, onset and rime, articulatory gestures, articulatory features, mora, etc.) or at all levels of language where sound is considered to be structured for conveying linguistic meaning. Phonology also includes the study of equivalent organizational systems in sign languages.

https://en.wikipedia.org/wiki/Phoneme - one of the units of sound that distinguish one word from another in a particular language. The difference in meaning between the English words kill and kiss is a result of the exchange of the phoneme /l/ for the phoneme /s/. Two words that differ in meaning through a contrast of a single phoneme form a minimal pair.

In linguistics, phonemes (established by the use of minimal pairs, such as kill vs kiss or pat vs bat) are written between slashes like this: /p/, whereas when it is desired to show the more exact pronunciation of any sound, linguists use square brackets, for example [pʰ] (indicating an aspirated p).
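The minimal pair test can be sketched in a few lines of Python: two words form a minimal pair when their phoneme sequences have the same length and differ in exactly one position. The tiny lexicon and its broad transcriptions are assumptions made for illustration, as are the function names.

```python
from itertools import combinations

# Hypothetical mini-lexicon: word -> broad phonemic transcription.
LEXICON = {
    "kill": ("k", "ɪ", "l"),
    "kiss": ("k", "ɪ", "s"),
    "pat": ("p", "æ", "t"),
    "bat": ("b", "æ", "t"),
}


def is_minimal_pair(a, b):
    """True if the sequences have equal length and differ in one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(lexicon):
    """List all word pairs in the lexicon that form a minimal pair."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if is_minimal_pair(p1, p2):
            pairs.append((w1, w2))
    return pairs


print(minimal_pairs(LEXICON))  # [('bat', 'pat'), ('kill', 'kiss')]
```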

Within linguistics there are differing views as to exactly what phonemes are and how a given language should be analyzed in phonemic (or phonematic) terms. However, a phoneme is generally regarded as an abstraction of a set (or equivalence class) of speech sounds (phones) which are perceived as equivalent to each other in a given language. For example, in English, the "k" sounds in the words kit and skill are not identical, but they are distributional variants of a single phoneme /k/. Different speech sounds that are realizations of the same phoneme are known as allophones. Allophonic variation may be conditioned, in which case a certain phoneme is realized as a certain allophone in particular phonological environments, or it may be free, in which case it may vary randomly. In this way, phonemes are often considered to constitute an abstract underlying representation for segments of words, while speech sounds make up the corresponding phonetic realization, or surface form.
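Conditioned allophony can be pictured as a rule from underlying phonemes to surface phones. The Python toy below covers only the kit/skill contrast: a voiceless stop is aspirated when it is the first segment of the word and left plain elsewhere. This is a deliberate oversimplification (English aspiration also depends on stress and syllable position), and realize is a made-up name.

```python
VOICELESS_STOPS = {"p", "t", "k"}


def realize(phonemes):
    """Toy rule: aspirate a word-initial voiceless stop, keep the rest as is."""
    phones = []
    for i, seg in enumerate(phonemes):
        if seg in VOICELESS_STOPS and i == 0:
            phones.append(seg + "ʰ")  # e.g. /k/ -> [kʰ] at the start of "kit"
        else:
            phones.append(seg)        # e.g. /k/ -> [k] after /s/ in "skill"
    return phones


print(realize(["k", "ɪ", "t"]))       # ['kʰ', 'ɪ', 't']
print(realize(["s", "k", "ɪ", "l"]))  # ['s', 'k', 'ɪ', 'l']
```

The input is the underlying representation and the output is the surface form; the same phoneme /k/ comes out as two different allophones depending on its environment.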

https://en.wikipedia.org/wiki/Phonological_hierarchy

Morphology

https://en.wikipedia.org/wiki/Morpheme - is the smallest grammatical unit in a language. The field of study dedicated to morphemes is called morphology. A morpheme is not identical to a word, and the principal difference between the two is that a morpheme may or may not stand alone, whereas a word, by definition, is freestanding. When it stands by itself, it is considered a root because it has a meaning of its own (e.g. the morpheme cat) and when it depends on another morpheme to express an idea, it is an affix because it has a grammatical function (e.g. the –s in cats to specify that it is plural). Every word comprises one or more morphemes. The more combinations a morpheme is found in, the more productive it is said to be.

https://en.wikipedia.org/wiki/Morphology_(linguistics) - the identification, analysis and description of the structure of a given language's morphemes and other linguistic units, such as root words, affixes, parts of speech, intonations and stresses, or implied context. In contrast, morphological typology is the classification of languages according to their use of morphemes, while lexicology is the study of those words forming a language's wordstock. The discipline that deals specifically with the sound changes occurring within morphemes is morphophonology.

While words, along with clitics, are generally accepted as being the smallest units of syntax, in most languages, if not all, many words can be related to other words by rules that collectively describe the grammar for that language. For example, English speakers recognize that the words dog and dogs are closely related, differentiated only by the plurality morpheme "-s", only found bound to nouns. Speakers of English, a fusional language, recognize these relations from their tacit knowledge of English's rules of word formation. They infer intuitively that dog is to dogs as cat is to cats; and, in similar fashion, dog is to dog catcher as dish is to dishwasher. By contrast, Classical Chinese has very little morphology, using almost exclusively unbound morphemes ("free" morphemes) and depending on word order to convey meaning. (Most words in modern Standard Chinese ("Mandarin"), however, are compounds and most roots are bound.) These are understood as grammars that represent the morphology of the language. The rules understood by a speaker reflect specific patterns or regularities in the way words are formed from smaller units in the language they are using and how those smaller units interact in speech. In this way, morphology is the branch of linguistics that studies patterns of word formation within and across languages and attempts to formulate rules that model the knowledge of the speakers of those languages.
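The dog/dogs, cat/cats pattern can be sketched as a rule over a small lexicon. This is a toy illustration under stated assumptions: `ROOTS` is a hypothetical mini-lexicon, `segment_plural` is a made-up name, and only the plain "-s" plural is handled (forms such as dishes, which take "-es", are deliberately left unanalyzed).

```python
ROOTS = {"dog", "cat", "dish"}  # hypothetical mini-lexicon of free roots


def segment_plural(word):
    """Split a word into morphemes using one rule: root + plural "-s"."""
    if word in ROOTS:
        return [word]
    if word.endswith("s") and word[:-1] in ROOTS:
        return [word[:-1], "-s"]
    return None  # not analyzable with this single rule


print(segment_plural("dog"))
print(segment_plural("dogs"))
print(segment_plural("cats"))
print(segment_plural("dishes"))
```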

Polysynthetic languages, such as Chukchi, have words composed of many morphemes. The Chukchi word "təmeyŋəlevtpəγtərkən", for example, meaning "I have a fierce headache", is composed of eight morphemes t-ə-meyŋ-ə-levt-pəγt-ə-rkən that may be glossed. The morphology of such languages allows for each consonant and vowel to be understood as morphemes, while the grammar of the language indicates the usage and understanding of each morpheme.

Lexicology

https://en.wikipedia.org/wiki/Lexeme - a unit of lexical meaning that exists regardless of the number of inflectional endings it may have or the number of words it may contain. It is a basic unit of meaning, and the headwords of a dictionary are all lexemes. Put more technically, a lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as run. A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are not common conjugations of that word.

A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. For example, the lexeme run has a present third person singular form runs, a present non-third-person singular form run (which also functions as the past participle and non-finite form), a past form ran, and a present participle running. (It does not include runner, runners, runnable, etc.) The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as run, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.
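As a rough sketch, a lexeme can be modelled as a record holding its lemma, its syntactic category and its inflectional paradigm, with a lookup from any word form back to the lemma. The data layout and the names `RUN`, `LEXICON` and `lemma_of` are assumptions for illustration only.

```python
# One lexeme: the lemma, its category, and its inflectional paradigm.
RUN = {
    "lemma": "run",
    "category": "verb",
    "forms": {
        "present_3sg": "runs",
        "present_non3sg": "run",
        "past": "ran",
        "past_participle": "run",
        "present_participle": "running",
    },
}

LEXICON = [RUN]


def lemma_of(word_form):
    """Return the lemma of the lexeme that has this form, or None."""
    for lexeme in LEXICON:
        if word_form in lexeme["forms"].values():
            return lexeme["lemma"]
    return None


print(lemma_of("ran"))
print(lemma_of("running"))
print(lemma_of("runner"))  # a different lexeme, so not found here
```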

https://en.wikipedia.org/wiki/Word - smallest element that may be uttered in isolation with semantic or pragmatic content.
https://en.wikipedia.org/wiki/Open_class_(linguistics) - a word class may be either an open class or a closed class. Open classes accept the addition of new morphemes (words), through such processes as compounding, derivation, inflection, coining, and borrowing; closed classes generally do not.

https://en.wikipedia.org/wiki/Back-formation - the process of creating a new lexeme, usually by removing actual or supposed affixes. The resulting neologism is called a back-formation, a term coined by James Murray in 1889. (OED online first definition of 'back formation' is from the definition of to burgle, which was first published in 1889.) Back-formation is different from clipping – back-formation may change the part of speech or the word's meaning, whereas clipping creates shortened words from longer words, but does not change the part of speech or the meaning of the word.

https://en.wikipedia.org/wiki/Lexical_diffusion

https://en.wikipedia.org/wiki/Lexicology - the part of linguistics which studies words. This may include their nature and function as symbols, their meaning, the relationship of their meaning to epistemology in general, and the rules of their composition from smaller elements (morphemes such as the English -ed marker for past or un- for negation; and phonemes as basic sound units). Lexicology also involves relations between words, which may involve semantics (for example, love vs. affection), derivation (for example, fathom vs. unfathomably), usage and sociolinguistic distinctions (for example, flesh vs. meat), and any other issues involved in analyzing the whole lexicon of a language.

https://en.wikipedia.org/wiki/Computational_lexicology - that branch of computational linguistics, which is concerned with the use of computers in the study of lexicon. It has been more narrowly described by some scholars (Amsler, 1980) as the use of computers in the study of machine-readable dictionaries. It is distinguished from computational lexicography, which more properly would be the use of computers in the construction of dictionaries, though some researchers have used computational lexicography as synonymous.

https://en.wikipedia.org/wiki/Lexicon

https://en.wikipedia.org/wiki/Lexicography - is divided into two separate but equally important groups: Practical lexicography is the art or craft of compiling, writing and editing dictionaries; Theoretical lexicography is the scholarly discipline of analyzing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon (vocabulary) of a language, developing theories of dictionary components and structures linking the data in dictionaries, the needs for information by users in specific types of situation, and how users may best access the data incorporated in printed and electronic dictionaries. This is sometimes referred to as 'metalexicography'.

Part of speech

https://en.wikipedia.org/wiki/Part_of_speech - also called a word class, a lexical class, or a lexical category; a linguistic category of words (lexical items) defined by the items' syntactic or morphological behaviour. Common linguistic categories include noun and verb, among others.
- https://en.wikipedia.org/wiki/Closed_class
- https://en.wikipedia.org/wiki/Open_class_(linguistics)

Three little words you often see
Are ARTICLES: a, an, and the.

A NOUN's the name of anything,
As: school or garden, toy, or swing.

ADJECTIVES tell the kind of noun,
As: great, small, pretty, white, or brown.

VERBS tell of something being done: 
To read, write, count, sing, jump, or run.

How things are done the ADVERBS tell, 
As: slowly, quickly, badly, well.

CONJUNCTIONS join the words together,
As: men and women, wind or weather.

The PREPOSITION stands before
A noun as: in or through a door.

The INTERJECTION shows surprise
As: Oh, how pretty! Ah! how wise!

http://www.w3.org/International/questions/qa-personal-names.en.php?changelang=en

http://pinker.wjh.harvard.edu/articles/media/2000_03_landfall.html [15]

Grammar

https://en.wikipedia.org/wiki/Grammar - the set of structural rules governing the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics, and pragmatics.

- http://grammar.ccc.commnet.edu/grammar/

https://en.wikipedia.org/wiki/Grammatical_category

https://en.wikipedia.org/wiki/Ambiguous_grammar

https://en.wikipedia.org/wiki/Semantics

https://en.wikipedia.org/wiki/Syntax
- https://en.wikipedia.org/wiki/Syntactic_category

https://en.wikipedia.org/wiki/Phrase

https://en.wikipedia.org/wiki/Head_(linguistics)

https://en.wikipedia.org/wiki/Noun_phrase

https://en.wikipedia.org/wiki/Determiner_phrase

https://en.wikipedia.org/wiki/Verb_phrase

https://en.wikipedia.org/wiki/Adpositional_phrase

http://grammar.ccc.commnet.edu/grammar/definitions.htm

https://en.wikipedia.org/wiki/Null-subject_language

https://en.wikipedia.org/wiki/Function_word

https://en.wikipedia.org/wiki/Solecism

to sort

https://en.wikipedia.org/wiki/Universal_grammar

https://en.wikipedia.org/wiki/Context-sensitive_language

https://en.wikipedia.org/wiki/Context-sensitive_grammar

https://en.wikipedia.org/wiki/Syntactic_hierarchy - concerned with the way sentences are constructed from smaller parts, such as words and phrases.

https://en.wikipedia.org/wiki/Subordination_(linguistics)

https://en.wikipedia.org/wiki/Sentence_(linguistics)

https://en.wikipedia.org/wiki/Clause - the smallest grammatical unit that can express a complete proposition. A clause typically consists of a subject and a predicate, where the predicate is typically a verb phrase – a verb together with any objects and other modifiers.

https://en.wikipedia.org/wiki/Phraseology
- https://en.wikipedia.org/wiki/Phraseme

https://en.wikipedia.org/wiki/Anaphora_(linguistics) - use of an expression the interpretation of which depends upon another expression in context (its antecedent or postcedent). In a narrower sense, anaphora is the use of an expression which depends specifically upon an antecedent expression, and thus is contrasted with cataphora, which is the use of an expression which depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in the sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally. In the sentence Before her arrival, nobody saw Sally, the pronoun her refers forward to the postcedent Sally, so her is now a cataphor (and an anaphor in the broader, but not the narrower, sense). Usually, an anaphoric expression is a proform or some other kind of deictic (contextually-dependent) expression. Both anaphora and cataphora are species of endophora, referring to something mentioned elsewhere in a dialog or text.
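The narrow anaphor/cataphor distinction comes down to whether the expression being referred to comes before or after the proform. A minimal sketch, assuming the sentence is already tokenized and the referent is supplied by hand (the function `classify_endophor` is hypothetical and does no actual reference resolution):

```python
def classify_endophor(tokens, proform, referent):
    """Label a proform by the position of the expression it depends on."""
    proform_pos = tokens.index(proform)
    referent_pos = tokens.index(referent)
    # Antecedent comes first: anaphor. Postcedent comes later: cataphor.
    return "anaphor" if referent_pos < proform_pos else "cataphor"


s1 = ["Sally", "arrived", "but", "nobody", "saw", "her"]
s2 = ["Before", "her", "arrival", "nobody", "saw", "Sally"]

print(classify_endophor(s1, "her", "Sally"))
print(classify_endophor(s2, "her", "Sally"))
```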

https://en.wikipedia.org/wiki/Formulaic_language - previously known as automatic speech or embolalia, is a linguistic term for verbal expressions that are fixed in form, often non-literal in meaning with attitudinal nuances, and closely related to communicative-pragmatic context. Along with idioms, expletives and proverbs, formulaic language includes pause fillers (e.g., “Like,” “Er” or “Uhm”) and conversational speech formulas (e.g., “You’ve got to be kidding,” “Excuse me?” or “Hang on a minute”).

https://en.wikipedia.org/wiki/Fixed_expression - a standard form of expression that has taken on a more specific meaning than the expression itself. It is different from a proverb in that it is used as a part of a sentence, and is the standard way of expressing a concept or idea.

https://en.wikipedia.org/wiki/Idiom - (Latin: idioma, "special property", from Greek: ἰδίωμα – idíōma, "special feature, special phrasing, a peculiarity", f. Greek: ἴδιος – ídios, "one’s own") is a phrase or a fixed expression that has a figurative, or sometimes literal, meaning. An idiom's figurative meaning is different from the literal meaning. There are thousands of idioms, and they occur frequently in all languages. It is estimated that there are at least twenty-five thousand idiomatic expressions in the English language. Idioms fall into the category of formulaic language.

http://en.wikipedia.org/wiki/Statistically_Improbable_Phrases

https://en.wikipedia.org/wiki/Placeholder_name

http://www.antipope.org/charlie/blog-static/2013/03/why-i-dont-self-publish.html

http://www.nytimes.com/2010/08/29/magazine/29language-t.html?pagewanted=all&_r=1&

http://en.wikipedia.org/wiki/Solecism

http://huh.ideophone.org/

https://en.wikipedia.org/wiki/Reading_path

https://en.wikipedia.org/wiki/Orthography

http://www.wagsoft.com/PapersHtml/IWPT93/IWPT93.html

http://www.antipope.org/charlie/blog-static/2014/06/we-need-a-pony-and-the-moon-on.html [16] - on detecting sarcasm

https://en.wikipedia.org/wiki/Archi-writing - a term used by French philosopher Jacques Derrida in his attempt to re-orient the relationship between speech and writing. Derrida argued that as far back as Plato, speech had been always given priority over writing. In the West, phonetic writing was considered as a secondary imitation of speech, a poor copy of the immediate living act of speech. Derrida argued that in later centuries philosopher Jean-Jacques Rousseau and linguist Ferdinand de Saussure both gave writing a secondary or parasitic role. In Derrida's essay Plato's Pharmacy, he sought to question this prioritising by firstly complicating the two terms speech and writing.

Formal language

Natural language

Dialect

https://en.wikipedia.org/wiki/Dialectology

https://en.wikipedia.org/wiki/Perceptual_dialectology

https://en.wikipedia.org/wiki/Dialect

https://en.wikipedia.org/wiki/Dialect_continuum

https://en.wikipedia.org/wiki/Dialect_levelling

https://en.wikipedia.org/wiki/Dialect_levelling_in_Britain

https://en.wikipedia.org/wiki/Morphological_leveling

https://en.wikipedia.org/wiki/Idiolect - an individual's distinctive and unique use of language, including speech. This unique usage encompasses vocabulary, grammar, and pronunciation. An idiolect differs from a dialect, which is a common set of linguistic characteristics shared among some group of people. The term is formed from the Greek prefix idio- (meaning "own, personal, private, peculiar, separate, distinct") and a back-formation of dialect.

Written language

https://en.wikipedia.org/wiki/Grapheme

https://en.wikipedia.org/wiki/Writing_system - any conventional method of visually representing verbal communication. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form of information storage and transfer. The processes of encoding and decoding writing systems involve shared understanding between writers and readers of the meaning behind the sets of characters that make up a script. Writing is usually recorded onto a durable medium, such as paper or electronic storage, although non-durable methods may also be used, such as writing on a computer display, on a blackboard, in sand, or by skywriting. The general attributes of writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies. Any particular system can have attributes of more than one category. In the alphabetic category, there is a standard set of letters (basic written symbols or graphemes) of consonants and vowels that encode based on the general principle that the letters (or letter pair/groups) represent speech sounds. In a syllabary, each symbol correlates to a syllable or mora. In a logography, each character represents a word, morpheme, or other semantic unit. Other categories include abjads, which differ from alphabets in that vowels are not indicated, and abugidas or alphasyllabaries, with each character representing a consonant–vowel pairing. Alphabets typically use a set of 20 to 35 symbols to fully express a language, whereas syllabaries can have 80 to 100, and logographies can have several hundred symbols.

https://en.wikipedia.org/wiki/List_of_writing_systems

https://en.wikipedia.org/wiki/Reading_comprehension

http://www.jsoftware.com/papers/tot.htm [17]

Controlled language

https://en.wikipedia.org/wiki/Controlled_natural_language - (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. The first type of languages (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English, IBM's Easy English, are used in the industry to increase the quality of technical documentation, and possibly simplify the (semi-)automatic translation of the documentation. These languages restrict the writer by general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". The second type of languages have a formal logical basis, i.e. they have a formal syntax and semantics, and can be mapped to an existing formal language, such as first-order logic. Thus, those languages can be used as knowledge representation languages, and writing of those languages is supported by fully automatic consistency and redundancy checks, query answering, etc.
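Writing rules of the first type can be checked mechanically, at least in part. The sketch below is a toy checker, not the rule set of any real controlled language: the 20-word limit, the pronoun list and the function name `check_sentence` are all assumptions chosen for illustration, and the "active voice" rule is left out because it needs real parsing.

```python
import re

MAX_WORDS = 20  # hypothetical limit for "Keep sentences short"
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}  # partial list


def check_sentence(sentence, approved_words=None):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    problems = []
    if len(words) > MAX_WORDS:
        problems.append("sentence too long")
    if any(w in PRONOUNS for w in words):
        problems.append("pronoun used")
    if approved_words is not None:
        # "Only use dictionary-approved words"
        unknown = sorted(set(words) - set(approved_words))
        if unknown:
            problems.append("unapproved words: " + ", ".join(unknown))
    return problems


print(check_sentence("Close the valve."))
print(check_sentence("Close it."))
print(check_sentence("Close the valve.", approved_words={"close", "the"}))
```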

https://en.wikipedia.org/wiki/Attempto_Controlled_English - a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced. [18]

Technology

See AI#NLP

https://github.com/diogocabral/sherlock - A modification of sherlock plagiarism detector.

https://github.com/TheBerkin/rant - an all-purpose procedural text engine that is most simply described as the opposite of Regex. It has been refined to include a dizzying array of features for handling everything from the most basic of string generation tasks to advanced dialogue generation, code templating, automatic formatting, and more. The goal of the project is to enable developers of all kinds to automate repetitive writing tasks with a high degree of creative freedom.

Spelling

Hunspell - the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird and Google Chrome; it is also used by proprietary software packages such as Mac OS X, InDesign, memoQ, Opera and SDL Trados.
- https://github.com/hunspell/hunspell

Other

Smiley

https://openlibrary.org/

http://www.metafilter.com/132059/How-to-Write

http://demo.innovationaccelerator.com/

http://infinitemonkeys.fuzzie.sg/

http://hpmor.com/

http://www.skyhunter.com/marcs/GentleSeduction.html

https://www.kuro5hin.org/story/2002/12/21/17846/757

http://lightspeedmagazine.kinja.com/

http://acharya.iitm.ac.in/sanskrit/why_sans.php

http://www.marlborotech.com/Zalgo.html

http://greatlanguagegame.com/

http://www.wordle.net/

http://www.lrb.co.uk/blog/2013/10/16/yasmine-seale/q-v-k/ [19]

On the Pedagogical Motive for Esoteric Writing [20]

http://valhallamovement.com/

https://news.ycombinator.com/item?id=10716154

https://www.youtube.com/watch?v=gG62zay3kck

http://www.arlt.co.uk/songs.html

http://blog.longreads.com/2014/02/15/david-foster-wallace-and-the-nature-of-fact/ [21]

https://en.wikipedia.org/wiki/List_of_games_on_I%27m_Sorry_I_Haven%27t_a_Clue#Cheddar_Gorge

http://mining4meaning.com/2015/02/13/raplyzer/

https://en.wikipedia.org/wiki/Saying

http://visualgenome.org/about [22]

http://allpriorart.com/publications/ [23]

https://github.com/nlp-compromise/nlp_compromise [24]

Encyclopedia Dramatica - "In lulz we trust."

SCP Foundation - Secure, Contain, Protect

https://news.ycombinator.com/item?id=17715063

GF - Grammatical Framework - A programming language for multilingual grammar applications
https://news.ycombinator.com/item?id=17749217

https://twitter.com/robinhouston/status/1177636866671157248

Fiction

https://en.wikipedia.org/wiki/Fiction-writing_mode

https://en.wikipedia.org/wiki/Aesthetic_distance

https://en.wikipedia.org/wiki/Monomyth

https://www.theguardian.com/books/2005/sep/24/classics.sebastianfaulks - character

http://channel101.wikia.com/wiki/Story_Structure_101:_Super_Basic_Shit

https://news.ycombinator.com/item?id=11986214

https://en.wikipedia.org/wiki/Magic_realism

https://news.ycombinator.com/item?id=17746209

http://www.rudyrucker.com/pdf/transrealistmanifesto.pdf
- https://en.wikipedia.org/wiki/Transrealism_(literature)
- http://www.theguardian.com/books/booksblog/2014/oct/24/transrealism-first-major-literary-movement-21st-century

http://lithub.com/modern-china-is-so-crazy-it-needs-a-new-literary-genre/ [25]

https://news.ycombinator.com/item?id=19101209

https://news.ycombinator.com/item?id=8811581

https://en.wikipedia.org/wiki/Italo_Calvino

http://www.infiniteulysses.com/ [26]

http://www-users.cs.york.ac.uk/susan/sf/index.htm

http://hieroglyph.asu.edu/

http://futurismic.com/

http://archiveofourown.org/

http://ficly.com/stories

http://365tomorrows.com/

http://futuretimeline.net/index.htm

http://www.technovelgy.com/

http://www.aeonmagazine.com/world-views/can-the-multiverse-explain-the-course-of-history/

http://lesswrong.com/lw/ihr/september_2013_media_thread/9rr5

http://openfolklore.org/

http://fictionhub.io/ [27]

http://www.infinityplus.co.uk/stories/under.htm?2

http://www.galactanet.com/oneoff/theegg_mod.html

http://qntm.org/responsibility

http://slatestarcodex.com/2014/04/03/the-study-of-anglophysics/

http://www.theparisreview.org/blog/2014/09/12/the-future-according-to-stanislaw-lem/

http://nautil.us/issue/15/turbulence/an-astrobiologist-asks-a-sci_fi-novelist-how-to-survive-the-anthropocene [28]

https://www.reddit.com/r/Fantasy/comments/24muao/add_your_favorite_speculative_fiction_websites_to/

https://en.wikipedia.org/wiki/Walt_Whitman

https://en.wikipedia.org/wiki/Octavia_E._Butler

https://worldtracker.org/media/library/English%20Literature/V/Vogt,%20A.%20E.%20Van/A.%20E.%20Van%20Vogt%20-%20The%20World%20of%20Null-A.pdf

http://en.wikipedia.org/wiki/The_Mind_Parasites

http://www.salon.com/2003/08/26/truncat/

http://strangehorizons.com/2014/20141103/1banks-a.shtml [29]

http://sappingattention.blogspot.co.uk/2014/12/fundamental-plot-arcs-seen-through.html?m=1

http://rationalfiction.io/wiki/rational-fiction [30]

http://fairytalenewsblog.blogspot.co.uk/?m=1

http://www.infinityplus.co.uk/stories/blit.htm

Russell's Guide to Interdimensional Entities

https://killscreen.com/articles/umberto-eco-and-his-legacy-in-open-world-games/ [31]

Interactive

https://en.wikipedia.org/wiki/Ludonarrative - a compound of ludology and narrative, refers to the intersection in a video game of ludic elements – or gameplay – and narrative elements. It is commonly used in the term ludonarrative dissonance, which refers to conflicts between a video game's narrative and its gameplay. The term was coined by Clint Hocking, a former creative director at LucasArts (then at Ubisoft), on his blog in October 2007. Hocking coined the term in response to the game BioShock, which according to him promotes the theme of self-interest through its gameplay while promoting the opposing theme of selflessness through its narrative, creating a violation of aesthetic distance that often pulls the player out of the game. Video game theorist Tom Bissell, in his book Extra Lives: Why Video Games Matter (2010), notes the example of Call of Duty 4: Modern Warfare, where a player can all but kill their digital partner during gameplay without upsetting the built-in narrative of the game.

http://www.ifcomp.org/ballot
