- Online Etymology Dictionary - Origin, history and meaning of English words
- http://www.economist.com/news/science-and-technology/21707183-researchers-uncover-ancient-links-between-majority-worlds 
- Glottolog - Comprehensive reference information for the world's languages, especially the lesser known languages.
- CLLD - Cross-Linguistic Linked Data - Helping collect the world's language diversity heritage.
- https://en.wikipedia.org/wiki/Proto-Indo-European_language - the linguistic reconstruction of the hypothetical common ancestor of the Indo-European languages, the most widely spoken language family in the world.
Far more work has gone into reconstructing PIE than any other proto-language, and it is by far the best understood of all proto-languages of its age. The vast majority of linguistic work during the 19th century was devoted to the reconstruction of PIE or its daughter proto-languages (such as Proto-Germanic), and most of the modern techniques of linguistic reconstruction such as the comparative method were developed as a result. These methods supply all current knowledge concerning PIE since there is no written record of the language.
PIE is estimated to have been spoken as a single language from 4500 BC to 2500 BC during the Late Neolithic to Early Bronze Age, though estimates vary by more than a thousand years. According to the prevailing Kurgan hypothesis, the origin...ht into the culture and religion of its speakers.
As Proto-Indo-Europeans became isolated from each other through the Indo-European migrations, the Proto-Indo-European language came to be spoken by the various groups in regional dialects, which then diverged through the Indo-European sound laws and, along with shifts in morphology, slowly transformed into the known ancient Indo-European languages. From there, further linguistic divergence led to the evolution of their current descendants, the modern Indo-European languages. Today, the descendant languages, or daughter languages, of PIE with the most speakers are Spanish, English, Hindustani (Hindi and Urdu), Portuguese, Bengali, Russian, Punjabi, German, Persian, French, Italian and Marathi. Hundreds of other living descendants of PIE include languages as diverse as Albanian (gjuha shqipe), Kurdish (کوردی), Nepali (खस भाषा), Tsakonian (τσακώνικα), Ukrainian (українська мова), and Welsh (Cymraeg).
- https://en.wikipedia.org/wiki/Grimm%27s_law - Grimm's law (also known as the First Germanic Sound Shift or Rask's rule) is a set of statements named after Jacob Grimm and Rasmus Rask describing the inherited Proto-Indo-European (PIE) stop consonants as they developed in Proto-Germanic (the common ancestor of the Germanic branch of the Indo-European family) in the 1st millennium BC. It establishes a set of regular correspondences between early Germanic stops and fricatives and the stop consonants of certain other centum Indo-European languages (Grimm used mostly Latin and Greek for illustration).
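The regular correspondences that Grimm's law describes can be sketched as a lookup table. This is a minimal illustration, not a historical-linguistics tool: the segment lists are simplified, hypothetical inputs, the function name `apply_grimm` is made up for this note, and vowel changes, Verner's law and other exceptions are ignored.

```python
# Simplified PIE stop -> Proto-Germanic correspondences (Grimm's law).
GRIMM = {
    "bʰ": "b", "dʰ": "d", "gʰ": "g",  # voiced aspirated stops -> plain voiced
    "b": "p", "d": "t", "g": "k",     # voiced stops -> voiceless stops
    "p": "f", "t": "θ", "k": "x",     # voiceless stops -> fricatives
}

def apply_grimm(segments):
    """Map each segment exactly once, so the chain shift cannot feed itself."""
    return [GRIMM.get(seg, seg) for seg in segments]

# Hypothetical, simplified root for "foot" (compare Latin ped- with English foot).
print(apply_grimm(["p", "o", "d"]))  # ['f', 'o', 't']
```

Each segment is looked up once rather than rewriting a string repeatedly; otherwise a shifted d > t could wrongly shift again to θ.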
- https://en.wikipedia.org/wiki/Indo-European_languages - a language family of several hundred related languages and dialects. There are about 445 living Indo-European languages, according to the estimate by Ethnologue, with over two-thirds (313) of them belonging to the Indo-Iranian branch.
- [1907.03920 Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings]
- https://github.com/soulaklabs/bitoduc.fr - A website about french words for computer concepts.
- Japanese Complete = ジャパニーズ・コンプリート - With 777 of the most frequent kanji, one has 90.0% coverage of Kanji in the wild!
- https://github.com/soimort/translate-shell - a command-line translator powered by Google Translate (default), Bing Translator, Yandex.Translate, and Apertium. It gives you easy access to one of these translation engines in your terminal:
- BabelFish.org is a fish that translates speech from one language to another.
- EUdict is a collection of online dictionaries for the languages spoken mostly in the European Community. These dictionaries are the result of the work of many authors who worked very hard and finally offered their product free of charge on the internet thus making it easier to all of us to communicate with each other.
- dict.cc is not only an online dictionary. It's an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users' suggestions.
- Linguee - Dictionary and search engine for 100 million translations.
- Dictionarist - provides translations in English, Spanish, Portuguese, German, French, Italian, Russian, Turkish, Dutch, Greek, Chinese, Japanese, Korean, Arabic, Hindi, Indonesian, Polish, Romanian, Ukrainian and Vietnamese.
- http://rut.org/cgi-bin/j-e/dict - Japanese
- Pootle - Community localization server. Get your community translating your software into their languages.
- FrathWiki - information for the conlanging and linguistics community
- Unker - Non-Linear Writing System
- https://en.wikipedia.org/wiki/Ordinal_indicator - nd, rd, th, etc.
to sort from Being
- https://en.wikipedia.org/wiki/Rhetorical_modes - also known as modes of discourse, describe the variety, conventions, and purposes of the major kinds of language-based communication, particularly writing and speaking. Four of the most common rhetorical modes and their purpose are narration, description, exposition, and argumentation.
- https://en.wikipedia.org/wiki/Description - act of description may be related to that of definition. Description is also the fiction-writing mode for transmitting a mental image of the particulars of a story. Definition: The pattern of development that presents a word picture of a thing, a person, a situation, or a series of events.
- https://en.wikipedia.org/wiki/Narrative - or story is any report of connected events, real or imaginary, presented in a sequence of written or spoken words, and/or still or moving images. Narrative can be organized in a number of thematic and/or formal categories: non-fiction (such as definitively including creative non-fiction, biography, journalism, transcript poetry, and historiography); fictionalization of historical events (such as anecdote, myth, legend, and historical fiction); and fiction proper (such as literature in prose and sometimes poetry, such as short stories, novels, and narrative poems and songs, and imaginary narratives as portrayed in other textual forms, games, or live or recorded performances). Narrative is found in all forms of human creativity, art, and entertainment, including speech, literature, theatre, music and song, comics, journalism, film, television and video, radio, gameplay, unstructured recreation, and performance in general, as well as some painting, sculpture, drawing, photography, and other visual arts (though several modern art movements refuse the narrative in favor of the abstract and conceptual), as long as a sequence of events is presented. The word derives from the Latin verb narrare, "to tell", which is derived from the adjective gnarus, "knowing" or "skilled".
Oral storytelling is perhaps the earliest method for sharing narratives. During most people's childhoods, narratives are used to guide them on proper behavior, cultural history, formation of a communal identity, and values, as especially studied in anthropology today among traditional indigenous peoples. Narratives may also be nested within other narratives, such as narratives told by an unreliable narrator (a character) typically found in the noir fiction genre. An important part of narration is the narrative mode, the set of methods used to communicate the narrative through a process of narration. Along with exposition, argumentation, and description, narration, broadly defined, is one of four rhetorical modes of discourse. More narrowly defined, it is the fiction-writing mode in which the narrator communicates directly to the reader.
- https://en.wikipedia.org/wiki/Exposition_(narrative) - the insertion of important background information within a story; for example, information about the setting, characters' backstories, prior plot events, historical context, etc. In a specifically literary context, exposition appears in the form of expository writing embedded within the narrative.
- https://en.wikipedia.org/wiki/Linguistics - the scientific study of language, specifically language form, language meaning, and language in context. The earliest activities in the description of language have been attributed to the 4th century BCE Indian grammarian Pāṇini, who was an early student of linguistics and wrote a formal description of the Sanskrit language in his Aṣṭādhyāyī.
Linguistics analyzes human language as a system for relating sounds (or signs in signed languages) and meaning. Phonetics studies acoustic and articulatory properties of the production and perception of speech sounds and non-speech sounds. The study of language meaning, on the other hand, deals with how languages encode relations between entities, properties, and other aspects of the world to convey, process, and assign meaning, as well as to manage and resolve ambiguity. While the study of semantics typically concerns itself with truth conditions, pragmatics deals with how context influences meanings.
Grammar is a system of rules which govern the form of the utterances in a given language. It encompasses both sound and meaning, and includes phonology (how sounds or gestures function together), morphology (the formation and composition of words), and syntax (the formation and composition of phrases and sentences from words).
In the early 20th century, Ferdinand de Saussure distinguished between the notions of langue and parole in his formulation of structural linguistics. According to him, parole is the specific utterance of speech, whereas langue refers to an abstract phenomenon that theoretically defines the principles and system of rules that govern a language. This distinction resembles the one made by Noam Chomsky between competence and performance, where competence is an individual's ideal knowledge of a language, while performance is the specific way in which it is used.
The formal study of language has also led to the growth of fields like psycholinguistics, which explores the representation and function of language in the mind; neurolinguistics, which studies language processing in the brain; and language acquisition, which investigates how children and adults acquire a particular language.
Linguistics also includes non-formal approaches to the study of other aspects of human language, such as social, cultural, historical and political factors. The study of cultural discourses and dialects is the domain of sociolinguistics, which looks at the relation between linguistic variation and social structures, as well as that of discourse analysis, which examines the structure of texts and conversations. Research on language through historical and evolutionary linguistics focuses on how languages change, and on the origin and growth of languages, particularly over an extended period of time.
Corpus linguistics takes naturally occurring texts and studies the variation of grammatical and other features based on such corpora. Stylistics involves the study of patterns of style within written, signed, or spoken discourse. Language documentation combines anthropological inquiry with linguistic inquiry to describe languages and their grammars. Lexicography covers the study and construction of dictionaries. Computational linguistics applies computer technology to address questions in theoretical linguistics, as well as to create applications for use in parsing, data retrieval, machine translation, and other areas. People can apply actual knowledge of a language in translation and interpreting, as well as in language education – the teaching of a second or foreign language. Policy makers work with governments to implement new plans in education and teaching which are based on linguistic research.
- Max Planck Neuroscience on Nautilus: Brainwaves Encode the Grammar of Human Language - The relative timing of brainwaves encodes the structure of a sentence.
- https://en.wikipedia.org/wiki/Applied_linguistics - an interdisciplinary field of linguistics that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, psychology, computer science, communication research, anthropology, and sociology.
- https://en.wikipedia.org/wiki/Linguistic_turn - a major development in Western philosophy during the 20th century, the most important characteristic of which is the focusing of philosophy and the other humanities primarily on the relationship between philosophy and language.
- https://en.wikipedia.org/wiki/Etymology - is the history of words, their origins, and how their form and meaning have changed over time. By an extension, the term "the etymology of [a word]" means the origin of the particular word.
Becker's Criterion: "Any theory (or partial theory) of the English Language that is expounded in the English Language must account for (or at least apply to) the text of its own exposition."
Becker's Razor: his final riposte to theoretical linguists: "Elegance and truth are inversely related", after which he finishes with, "Put that in your phrasal lexicon and invoke it!"
- YouTube: Chomsky on Zizek and Lacan
- https://en.wikipedia.org/wiki/Linguistic_typology - a subfield of linguistics that studies and classifies languages according to their structural and functional features. Its aim is to describe and explain the common properties and the structural diversity of the world's languages. It includes three subdisciplines: qualitative typology, which deals with the issue of comparing languages and within-language variance; quantitative typology, which deals with the distribution of structural patterns in the world’s languages; and theoretical typology, which explains these distributions.
- Field Linguist's Toolbox - a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data. Although Toolbox is very powerful, it is designed to be easy to learn. The user can start with a simple standard setup and gradually add the use of more powerful features as desired. The Toolbox downloads include a training package that is usable for self-paced individual learning as well as for classroom teaching of Toolbox.
- FieldWorks - consists of software tools that help you manage linguistic and cultural data. FieldWorks supports tasks ranging from the initial entry of collected data through to the preparation of data for publication, including dictionary development, interlinearization of texts, morphological analysis, and other publications. Furthermore, FieldWorks BTE contains a specialized drafting and editing environment for Bible Translators, which provides interaction with the language data stored in Language Explorer.
- AGGREGATION - Implemented grammars can contribute to endangered language documentation in several ways. In the first instance, the grammars themselves provide a very rich addition to prose descriptive grammars, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Furthermore, implemented grammars can be used to create treebanks, that is, collections of utterances (from running text or elicited examples) associated with syntactic and semantic structures. The process of creating the treebank can provide important feedback to the field linguist about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to create further computational tools and are also a rich source of comparable data for qualitative and quantitative work in typology, grounding higher level linguistic abstractions in actual utterances in a computationally tractable fashion. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated cross-linguistic computational resource.
- Home - DELPH-IN - Computational linguists from research sites world-wide have joined forces in a collaborative effort aimed at ‘deep’ linguistic processing of human language. The goal is the combination of linguistic and statistical processing methods for getting at the meaning of texts and utterances. The partners have adopted Head-Driven Phrase Structure Grammar (HPSG) and Minimal Recursion Semantics (MRS), two advanced models of formal linguistic analysis. They have also committed themselves to a shared format for grammatical representation and to a rigid scheme of evaluation, as well as to the general use of open-source licensing and transparency.
- SFST - A toolbox for the implementation of morphological analysers
- Foma - A Finite State Compiler and Library - a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers. Although NLP applications are probably the main use of foma, it is sufficiently generic to use for a large number of purposes. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, boolean operations. Also, more advanced construction methods are available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.
- Helsinki Finite-State Technology - Project Web Hosting - Open Source Software - intended for processing natural language morphologies. The toolkit is demonstrated by wide-coverage implementations of a number of languages of varying morphological complexity.
- Tatoeba - a collection of sentences and translations. It's collaborative, open, free and even addictive.
See also Maths#Logic
- https://en.wikipedia.org/wiki/Semiotics - semiotics (also called semiotic studies; not to be confused with the Saussurean tradition called semiology, which is a part of semiotics) is the study of meaning-making, the study of sign processes and meaningful communication. This includes the study of signs and sign processes (semiosis), indication, designation, likeness, analogy, allegory, metonymy, metaphor, symbolism, signification, and communication.
Semiotics is closely related to the field of linguistics, which, for its part, studies the structure and meaning of language more specifically. The semiotic tradition explores the study of signs and symbols as a significant part of communications. As different from linguistics, however, semiotics also studies non-linguistic sign systems.
Semiotics is frequently seen as having important anthropological dimensions; for example, the late Italian semiotician and novelist Umberto Eco proposed that every cultural phenomenon may be studied as communication. Some semioticians focus on the logical dimensions of the science, however. They examine areas belonging also to the life sciences—such as how organisms make predictions about, and adapt to, their semiotic niche in the world (see semiosis). In general, semiotic theories take signs or sign systems as their object of study: the communication of information in living organisms is covered in biosemiotics (including zoosemiotics).
- Semiotics for Beginners - Daniel Chandler
- https://en.wikipedia.org/wiki/Syntagma_(linguistics) - an elementary constituent segment within a text. Such a segment can be a phoneme, a word, a grammatical phrase, a sentence, or an event within a larger narrative structure, depending on the level of analysis. Syntagmatic analysis involves the study of relationships (rules of combination) among syntagmas.
At the lexical level, syntagmatic structure in a language is the combination of words according to the rules of syntax for that language. For example, English uses determiner + adjective + noun, e.g. the big house. Another language might use determiner + noun + adjective (Spanish la casa grande) and therefore have a different syntagmatic structure.
At a higher level, narrative structures feature a realistic temporal flow guided by tension and relaxation; thus, for example, events or rhetorical figures may be treated as syntagmas of epic structures.
Syntagmatic structure is often contrasted with paradigmatic structure. In semiotics, "syntagmatic analysis" is analysis of syntax or surface structure (syntagmatic structure), rather than paradigms as in paradigmatic analysis. Analysis is often achieved through commutation tests.
- https://en.wikipedia.org/wiki/Syntagmatic_analysis - is analysis of syntax or surface structure (syntagmatic structure) as opposed to paradigms (paradigmatic analysis). This is often achieved using commutation tests.
- Ontology is Overrated: Categories, Links, and Tags - "The Only Group That Can Categorize Everything Is Everybody"
- https://en.wikipedia.org/wiki/Process_philosophy - ontology of becoming, Whitehead
- https://en.wikipedia.org/wiki/Pragmatics - a subfield of linguistics and semiotics that studies the ways in which context contributes to meaning. Pragmatics encompasses speech act theory, conversational implicature, talk in interaction and other approaches to language behavior in philosophy, sociology, linguistics and anthropology.
Unlike semantics, which examines meaning that is conventional or "coded" in a given language, pragmatics studies how the transmission of meaning depends not only on structural and linguistic knowledge (e.g., grammar, lexicon, etc.) of the speaker and listener, but also on the context of the utterance, any pre-existing knowledge about those involved, the inferred intent of the speaker, and other factors. In this respect, pragmatics explains how language users are able to overcome apparent ambiguity, since meaning relies on the manner, place, time etc. of an utterance.
The ability to understand another speaker's intended meaning is called pragmatic competence.
- https://en.wikipedia.org/wiki/Phonetics - a branch of linguistics that comprises the study of the sounds of human speech, or—in the case of sign languages—the equivalent aspects of sign. It is concerned with the physical properties of speech sounds or signs (phones): their physiological production, acoustic properties, auditory perception, and neurophysiological status. Phonology, on the other hand, is concerned with the abstract, grammatical characterization of systems of sounds or signs.
The field of phonetics is a multilayered subject of linguistics that focuses on speech. In the case of oral languages there are three basic areas of study inter-connected through the common mechanism of sound, such as wavelength (pitch), amplitude, and harmonics:
- https://en.wikipedia.org/wiki/Articulatory_phonetics - the study of the production of speech sounds by the articulatory and vocal tract by the speaker.
- https://en.wikipedia.org/wiki/Acoustic_phonetics - the study of the physical transmission of speech sounds from the speaker to the listener.
- https://en.wikipedia.org/wiki/Auditory_phonetics - the study of the reception and perception of speech sounds by the listener.
- https://en.wikipedia.org/wiki/ARPABET - also spelled ARPAbet, is a set of phonetic transcription codes developed by Advanced Research Projects Agency (ARPA) as a part of their Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems, one representing each segment with one character (alternating upper- and lower-case letters) and the other with two or more (case-insensitive), were devised, the latter being far more widely adopted. ARPABET has been used in several speech synthesizers, including Computalker for the S-100 system, SAM for the Commodore 64, SAY for the Amiga, TextAssist for the PC and Speakeasy from Intelligent Artefacts which used the Votrax SC-01 speech synthesiser IC. It is also used in the CMU Pronouncing Dictionary. A revised version of ARPABET is used in the TIMIT corpus.
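To make the ASCII encoding concrete, here is a small sketch that converts a two-letter ARPABET transcription, written in the style of the CMU Pronouncing Dictionary with trailing stress digits, into IPA. The table covers only a handful of symbols chosen for illustration, and the function name `arpabet_to_ipa` is an assumption of this note, not part of any library.

```python
# Partial ARPABET -> IPA table; only a few symbols, for illustration.
ARPABET_TO_IPA = {
    "K": "k", "T": "t", "S": "s", "L": "l", "P": "p",
    "AE": "æ", "IH": "ɪ", "IY": "i", "SH": "ʃ",
}

def arpabet_to_ipa(transcription):
    """Convert a space-separated ARPABET string to an IPA string."""
    ipa = []
    for symbol in transcription.split():
        base = symbol.rstrip("012")       # drop the stress digit on vowels
        ipa.append(ARPABET_TO_IPA[base])  # raises KeyError for unlisted symbols
    return "".join(ipa)

print(arpabet_to_ipa("K AE1 T"))  # kæt
```

Stress marks are simply discarded here; a fuller converter would carry them over as IPA stress marks.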
- https://en.wikipedia.org/wiki/Phonetic_algorithm - an algorithm for indexing of words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result. They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation is complicated by historical changes in pronunciation and words borrowed from many languages.
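As one concrete instance of a phonetic algorithm, the sketch below implements American Soundex, chosen here only as a well-known example (the entry above does not single it out). It assumes plain ASCII English input; any other letter is treated like a vowel.

```python
# Letter groups and their Soundex digits.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code across it counts again
    return (first + "".join(digits) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Names that sound alike, such as Robert and Rupert, receive the same index key, which is what makes the code useful for lookup by pronunciation.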
- https://en.wikipedia.org/wiki/Phonology - a branch of linguistics concerned with the systematic organization of sounds in languages. It has traditionally focused largely on the study of the systems of phonemes in particular languages (and therefore used to be also called phonemics, or phonematics), but it may also cover any linguistic analysis either at a level beneath the word (including syllable, onset and rime, articulatory gestures, articulatory features, mora, etc.) or at all levels of language where sound is considered to be structured for conveying linguistic meaning. Phonology also includes the study of equivalent organizational systems in sign languages.
- https://en.wikipedia.org/wiki/Phoneme - one of the units of sound that distinguish one word from another in a particular language. The difference in meaning between the English words kill and kiss is a result of the exchange of the phoneme /l/ for the phoneme /s/. Two words that differ in meaning through a contrast of a single phoneme form a minimal pair.
In linguistics, phonemes (established by the use of minimal pairs, such as kill vs kiss or pat vs bat) are written between slashes like this: /p/, whereas when it is desired to show the more exact pronunciation of any sound, linguists use square brackets, for example [pʰ] (indicating an aspirated p).
Within linguistics there are differing views as to exactly what phonemes are and how a given language should be analyzed in phonemic (or phonematic) terms. However, a phoneme is generally regarded as an abstraction of a set (or equivalence class) of speech sounds (phones) which are perceived as equivalent to each other in a given language. For example, in English, the "k" sounds in the words kit and skill are not identical, but they are distributional variants of a single phoneme /k/. Different speech sounds that are realizations of the same phoneme are known as allophones. Allophonic variation may be conditioned, in which case a certain phoneme is realized as a certain allophone in particular phonological environments, or it may be free, in which case it may vary randomly. In this way, phonemes are often considered to constitute an abstract underlying representation for segments of words, while speech sounds make up the corresponding phonetic realization, or surface form.
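The minimal-pair test used to establish phonemes can be written down directly. This sketch assumes words are already given as lists of phoneme symbols (the transcriptions below are simplified) and checks only pairs of equal length that differ in exactly one position; the helper name `is_minimal_pair` is hypothetical.

```python
def is_minimal_pair(a, b):
    """True if two equal-length phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

kill = ["k", "ɪ", "l"]
kiss = ["k", "ɪ", "s"]
pat = ["p", "æ", "t"]
bat = ["b", "æ", "t"]

print(is_minimal_pair(kill, kiss))  # True
print(is_minimal_pair(pat, bat))    # True
print(is_minimal_pair(kill, bat))   # False
```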
- https://en.wikipedia.org/wiki/Morpheme - is the smallest grammatical unit in a language. The field of study dedicated to morphemes is called morphology. A morpheme is not identical to a word, and the principal difference between the two is that a morpheme may or may not stand alone, whereas a word, by definition, is freestanding. When it stands by itself, it is considered a root because it has a meaning of its own (e.g. the morpheme cat) and when it depends on another morpheme to express an idea, it is an affix because it has a grammatical function (e.g. the –s in cats to specify that it is plural). Every word comprises one or more morphemes. The more combinations a morpheme is found in, the more productive it is said to be.
- https://en.wikipedia.org/wiki/Morphology_(linguistics) - the identification, analysis and description of the structure of a given language's morphemes and other linguistic units, such as root words, affixes, parts of speech, intonations and stresses, or implied context. In contrast, morphological typology is the classification of languages according to their use of morphemes, while lexicology is the study of those words forming a language's wordstock. The discipline that deals specifically with the sound changes occurring within morphemes is morphophonology.
While words, along with clitics, are generally accepted as being the smallest units of syntax, in most languages, if not all, many words can be related to other words by rules that collectively describe the grammar for that language. For example, English speakers recognize that the words dog and dogs are closely related, differentiated only by the plurality morpheme "-s", only found bound to nouns. Speakers of English, a fusional language, recognize these relations from their tacit knowledge of English's rules of word formation. They infer intuitively that dog is to dogs as cat is to cats; and, in similar fashion, dog is to dog catcher as dish is to dishwasher. By contrast, Classical Chinese has very little morphology, using almost exclusively unbound morphemes ("free" morphemes) and depending on word order to convey meaning. (Most words in modern Standard Chinese ("Mandarin"), however, are compounds and most roots are bound.) These are understood as grammars that represent the morphology of the language. The rules understood by a speaker reflect specific patterns or regularities in the way words are formed from smaller units in the language they are using and how those smaller units interact in speech. In this way, morphology is the branch of linguistics that studies patterns of word formation within and across languages and attempts to formulate rules that model the knowledge of the speakers of those languages.
Polysynthetic languages, such as Chukchi, have words composed of many morphemes. The Chukchi word "təmeyŋəlevtpəγtərkən", for example, meaning "I have a fierce headache", is composed of eight morphemes t-ə-meyŋ-ə-levt-pəγt-ə-rkən that may be glossed. The morphology of such languages allows for each consonant and vowel to be understood as morphemes, while the grammar of the language indicates the usage and understanding of each morpheme.
- https://en.wikipedia.org/wiki/Lexeme - a unit of lexical meaning that exists regardless of the number of inflectional endings it may have or the number of words it may contain. It is a basic unit of meaning, and the headwords of a dictionary are all lexemes. Put more technically, a lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as run. A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are not common conjugations of that word.
A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. For example, the lexeme run has a present third person singular form runs, a present non-third-person singular form run (which also functions as the past participle and non-finite form), a past form ran, and a present participle running. (It does not include runner, runners, runnable, etc.) The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as run, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.
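The relation between a lexeme, its inflected forms and its lemma can be modelled as a small lookup. The data structure and the function name `lemmatize` are illustrative assumptions; the paradigm contains only the forms of run listed above.

```python
# One lexeme, keyed by its lemma (citation form), with its inflected forms.
PARADIGMS = {
    "run": {"run", "runs", "ran", "running"},
}

# Invert the table so that any inflected form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma
    for lemma, forms in PARADIGMS.items()
    for form in forms
}

def lemmatize(token):
    """Return the lemma for a known inflected form, or None if unknown."""
    return FORM_TO_LEMMA.get(token.lower())

print(lemmatize("ran"))     # run
print(lemmatize("runner"))  # None
```

The derived word runner is deliberately absent: it belongs to a different lexeme, so the lookup does not map it to run.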
- https://en.wikipedia.org/wiki/Word - smallest element that may be uttered in isolation with semantic or pragmatic content.
- https://en.wikipedia.org/wiki/Open_class_(linguistics) - a word class may be either an open class or a closed class. Open classes accept the addition of new morphemes (words), through such processes as compounding, derivation, inflection, coining, and borrowing; closed classes generally do not.
- https://en.wikipedia.org/wiki/Back-formation - the process of creating a new lexeme, usually by removing actual or supposed affixes. The resulting neologism is called a back-formation, a term coined by James Murray in 1889. (OED online first definition of 'back formation' is from the definition of to burgle, which was first published in 1889.) Back-formation is different from clipping – back-formation may change the part of speech or the word's meaning, whereas clipping creates shortened words from longer words, but does not change the part of speech or the meaning of the word.
- https://en.wikipedia.org/wiki/Lexicology - the part of linguistics which studies words. This may include their nature and function as symbols, their meaning, the relationship of their meaning to epistemology in general, and the rules of their composition from smaller elements (morphemes such as the English -ed marker for past or un- for negation; and phonemes as basic sound units). Lexicology also involves relations between words, which may involve semantics (for example, love vs. affection), derivation (for example, fathom vs. unfathomably), usage and sociolinguistic distinctions (for example, flesh vs. meat), and any other issues involved in analyzing the whole lexicon of a language.
- https://en.wikipedia.org/wiki/Computational_lexicology - the branch of computational linguistics concerned with the use of computers in the study of the lexicon. It has been more narrowly described by some scholars (Amsler, 1980) as the use of computers in the study of machine-readable dictionaries. It is distinguished from computational lexicography, which more properly would be the use of computers in the construction of dictionaries, though some researchers have used computational lexicography as a synonym.
- https://en.wikipedia.org/wiki/Lexicography - is divided into two separate but equally important groups: Practical lexicography is the art or craft of compiling, writing and editing dictionaries; Theoretical lexicography is the scholarly discipline of analyzing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon (vocabulary) of a language, developing theories of dictionary components and structures linking the data in dictionaries, the needs for information by users in specific types of situation, and how users may best access the data incorporated in printed and electronic dictionaries. This is sometimes referred to as 'metalexicography'.
Part of speech
- https://en.wikipedia.org/wiki/Part_of_speech - also called a word class, a lexical class, or a lexical category: a linguistic category of words (lexical items) defined by the items' syntactic or morphological behaviour. Common linguistic categories include noun and verb, among others.
Three little words you often see
Are ARTICLES: a, an, and the.
A NOUN's the name of anything,
As: school or garden, toy, or swing.
ADJECTIVES tell the kind of noun,
As: great, small, pretty, white, or brown.
VERBS tell of something being done:
To read, write, count, sing, jump, or run.
How things are done the ADVERBS tell,
As: slowly, quickly, badly, well.
CONJUNCTIONS join the words together,
As: men and women, wind or weather.
The PREPOSITION stands before
A noun as: in or through a door.
The INTERJECTION shows surprise
As: Oh, how pretty! Ah! how wise!
- https://en.wikipedia.org/wiki/Grammar - the set of structural rules governing the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics, and pragmatics.
- https://en.wikipedia.org/wiki/Syntactic_hierarchy - concerned with the way sentences are constructed from smaller parts, such as words and phrases.
- https://en.wikipedia.org/wiki/Clause - the smallest grammatical unit that can express a complete proposition. A typical clause consists of a subject and a predicate, where the predicate is typically a verb phrase (a verb together with any objects and other modifiers).
- https://en.wikipedia.org/wiki/Anaphora_(linguistics) - use of an expression the interpretation of which depends upon another expression in context (its antecedent or postcedent). In a narrower sense, anaphora is the use of an expression which depends specifically upon an antecedent expression, and thus is contrasted with cataphora, which is the use of an expression which depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in the sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally. In the sentence Before her arrival, nobody saw Sally, the pronoun her refers forward to the postcedent Sally, so her is now a cataphor (and an anaphor in the broader, but not the narrower, sense). Usually, an anaphoric expression is a proform or some other kind of deictic (contextually-dependent) expression. Both anaphora and cataphora are species of endophora, referring to something mentioned elsewhere in a dialog or text.
- https://en.wikipedia.org/wiki/Formulaic_language - previously known as automatic speech or embolalia, is a linguistic term for verbal expressions that are fixed in form, often non-literal in meaning with attitudinal nuances, and closely related to communicative-pragmatic context. Along with idioms, expletives and proverbs, formulaic language includes pause fillers (e.g., “Like,” “Er” or “Uhm”) and conversational speech formulas (e.g., “You’ve got to be kidding,” “Excuse me?” or “Hang on a minute”).
- https://en.wikipedia.org/wiki/Fixed_expression - a standard form of expression that has taken on a more specific meaning than the expression itself. It is different from a proverb in that it is used as a part of a sentence, and is the standard way of expressing a concept or idea.
- https://en.wikipedia.org/wiki/Idiom - (Latin: idioma, "special property", from Greek: ἰδίωμα – idíōma, "special feature, special phrasing, a peculiarity", f. Greek: ἴδιος – ídios, "one’s own") is a phrase or a fixed expression that has a figurative, or sometimes literal, meaning. An idiom's figurative meaning is different from the literal meaning. There are thousands of idioms, and they occur frequently in all languages. It is estimated that there are at least twenty-five thousand idiomatic expressions in the English language. Idioms fall into the category of formulaic language.
- http://www.antipope.org/charlie/blog-static/2014/06/we-need-a-pony-and-the-moon-on.html  - on detecting sarcasm
- https://en.wikipedia.org/wiki/Archi-writing - a term used by French philosopher Jacques Derrida in his attempt to re-orient the relationship between speech and writing. Derrida argued that as far back as Plato, speech had been always given priority over writing. In the West, phonetic writing was considered as a secondary imitation of speech, a poor copy of the immediate living act of speech. Derrida argued that in later centuries philosopher Jean-Jacques Rousseau and linguist Ferdinand de Saussure both gave writing a secondary or parasitic role. In Derrida's essay Plato's Pharmacy, he sought to question this prioritising by firstly complicating the two terms speech and writing.
- https://en.wikipedia.org/wiki/Chomsky_hierarchy - a containment hierarchy of classes of formal grammars. It gives a computer science model that lets a programmer pursue meaningful linguistic goals systematically.
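A rough sketch of what "containment" means in practice: the two recognisers below accept one regular (Type-3) language and one context-free (Type-2) language. The function names and the choice of example languages are assumptions made for illustration. The language a^n b^n is the standard example of a language that is context-free but not regular, so it cannot be written as a regular expression in the formal sense.

```python
import re


def is_a_star_b_star(s):
    """Regular language a*b*: any number of a's followed by any number of b's."""
    return re.fullmatch(r"a*b*", s) is not None


def is_a_n_b_n(s):
    """Context-free language a^n b^n, following the grammar S -> a S b | (empty)."""
    if s == "":
        return True  # S -> (empty)
    if len(s) >= 2 and s[0] == "a" and s[-1] == "b":
        return is_a_n_b_n(s[1:-1])  # S -> a S b
    return False


for word in ["aabb", "aab", "abab"]:
    print(word, is_a_star_b_star(word), is_a_n_b_n(word))
```

Every string accepted by `is_a_n_b_n` is also accepted by `is_a_star_b_star`, but not the other way round, because matching the counts requires more than a finite automaton.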
See also Computing#NLP
- https://en.wikipedia.org/wiki/Idiolect - an individual's distinctive and unique use of language, including speech. This unique usage encompasses vocabulary, grammar, and pronunciation. An idiolect differs from a dialect, which is a common set of linguistic characteristics shared among some group of people. The term is etymologically related to the Greek prefix idio- (meaning "own, personal, private, peculiar, separate, distinct") and a back-formation of dialect.
- https://en.wikipedia.org/wiki/Writing_system - any conventional method of visually representing verbal communication. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form of information storage and transfer. The processes of encoding and decoding writing systems involve shared understanding between writers and readers of the meaning behind the sets of characters that make up a script. Writing is usually recorded onto a durable medium, such as paper or electronic storage, although non-durable methods may also be used, such as writing on a computer display, on a blackboard, in sand, or by skywriting. The general attributes of writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies. Any particular system can have attributes of more than one category. In the alphabetic category, there is a standard set of letters (basic written symbols or graphemes) of consonants and vowels that encode based on the general principle that the letters (or letter pair/groups) represent speech sounds. In a syllabary, each symbol correlates to a syllable or mora. In a logography, each character represents a word, morpheme, or other semantic units. Other categories include abjads, which differ from alphabets in that vowels are not indicated, and abugidas or alphasyllabaries, with each character representing a consonant–vowel pairing. Alphabets typically use a set of 20-to-35 symbols to fully express a language, whereas syllabaries can have 80-to-100, and logographies can have several hundreds of symbols.
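The broad categories above can be illustrated with a very rough classifier that reads the script name from a character's Unicode name. This is a sketch under loud assumptions: the `SCRIPT_CATEGORY` table is hand-picked and hypothetical, it covers only a few scripts, and real scripts often mix attributes of more than one category, as noted above.

```python
import unicodedata

# Hypothetical, hand-picked mapping from the first word of a Unicode
# character name to a broad writing-system category.
SCRIPT_CATEGORY = {
    "LATIN": "alphabet",
    "GREEK": "alphabet",
    "CYRILLIC": "alphabet",
    "HIRAGANA": "syllabary",
    "KATAKANA": "syllabary",
    "HEBREW": "abjad",
    "ARABIC": "abjad",
    "DEVANAGARI": "abugida",
    "CJK": "logography",
}


def writing_system_category(ch):
    """Guess the category of a single character from its Unicode name."""
    name = unicodedata.name(ch, "")  # empty string if the character is unnamed
    first_word = name.split(" ")[0] if name else ""
    return SCRIPT_CATEGORY.get(first_word, "unknown")


for ch in ["a", "\u3042", "\u05d0", "\u0915", "\u4e2d"]:
    print(ch, writing_system_category(ch))
```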
- https://en.wikipedia.org/wiki/Syllabogram - signs used to write the syllables (or morae) of words. This term is most often used in the context of a writing system otherwise organized on different principles—an alphabet where most symbols represent phonemes, or a logographic script where most symbols represent morphemes—but a system based mostly on syllabograms is a syllabary.
- https://en.wikipedia.org/wiki/Syllabary - a set of written symbols that represent the syllables or (more frequently) moras which make up words.
- https://en.wikipedia.org/wiki/Grapheme - the smallest functional unit of a writing system
See also Maths
- https://en.wikipedia.org/wiki/Controlled_natural_language - (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. The first type of languages (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English, IBM's Easy English, are used in the industry to increase the quality of technical documentation, and possibly simplify the (semi-)automatic translation of the documentation. These languages restrict the writer by general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". The second type of languages have a formal logical basis, i.e. they have a formal syntax and semantics, and can be mapped to an existing formal language, such as first-order logic. Thus, those languages can be used as knowledge representation languages, and writing of those languages is supported by fully automatic consistency and redundancy checks, query answering, etc.
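Two of the writer-facing rules quoted above ("Keep sentences short" and "Only use dictionary-approved words") are simple enough to check mechanically. The following is a minimal sketch: the vocabulary, the word limit and the function name are hypothetical and are not taken from any real controlled-language specification.

```python
import re

# Hypothetical approved vocabulary and length limit, for illustration only.
APPROVED_WORDS = {"the", "pump", "valve", "open", "close", "before", "you", "start"}
MAX_WORDS_PER_SENTENCE = 8


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    problems = []
    # Rule: keep sentences short.
    if len(words) > MAX_WORDS_PER_SENTENCE:
        problems.append(f"too long: {len(words)} words")
    # Rule: only use dictionary-approved words.
    unknown = sorted(set(words) - APPROVED_WORDS)
    if unknown:
        problems.append("unapproved words: " + ", ".join(unknown))
    return problems


print(check_sentence("Close the valve before you start the pump."))
print(check_sentence("Gently close the valve."))
```

A checker like this covers only the first type of controlled language; the second type needs a formal grammar and a mapping to a logic, which is beyond a word-list check.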
- https://en.wikipedia.org/wiki/Attempto_Controlled_English - a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced. 
- https://github.com/diogocabral/sherlock - A modification of sherlock plagiarism detector.
- https://github.com/TheBerkin/rant - an all-purpose procedural text engine that is most simply described as the opposite of Regex. It has been refined to include a dizzying array of features for handling everything from the most basic of string generation tasks to advanced dialogue generation, code templating, automatic formatting, and more. The goal of the project is to enable developers of all kinds to automate repetitive writing tasks with a high degree of creative freedom.
- Hunspell - the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, Google Chrome, and it is also used by proprietary software packages, like Mac OS X, InDesign, memoQ, Opera and SDL Trados.
- Encyclopedia Dramatica - "In lulz we trust."
- SCP Foundation - Secure, Contain, Protect
- GF - Grammatical Framework - A programming language for multilingual grammar applications
- http://nautil.us/issue/15/turbulence/an-astrobiologist-asks-a-sci_fi-novelist-how-to-survive-the-anthropocene 
- https://en.wikipedia.org/wiki/Ludonarrative - a compound of ludology and narrative, refers to the intersection in a video game of ludic elements – or gameplay – and narrative elements. It is commonly used in the term Ludonarrative dissonance which refers to conflicts between a video game's narrative and its gameplay. The term was coined by Clint Hocking, a former creative director at LucasArts (then at Ubisoft), on his blog in October, 2007. Hocking coined the term in response to the game BioShock, which according to him promotes the theme of self-interest through its gameplay while promoting the opposing theme of selflessness through its narrative, creating a violation of aesthetic distance that often pulls the player out of the game. Video game theorist Tom Bissell, in his book Extra Lives: Why Video Games Matter (2010), notes the example of Call of Duty 4: Modern Warfare, where a player can all but kill their digital partner during gameplay without upsetting the built-in narrative of the game.