Language

From Things and Stuff Wiki
Revision as of 10:04, 14 October 2023 by Milk (talk | contribs) (→‎to sort)


General

See also Being#Language, Mind#Semiotics, Documents, Typography

tooo soooort









Blogs

Types


Languages




  • CLLD - Cross-Linguistic Linked Data - Helping collect the world's language diversity heritage.
    • https://github.com/clld/clld
    • Documentation - The goal of the Cross-Linguistic Linked Data project (CLLD) is to help record the world’s language diversity heritage. This is to be facilitated by developing, providing and maintaining interoperable data publication structures.


Proto-Indo-European

Far more work has gone into reconstructing PIE than any other proto-language, and it is by far the best understood of all proto-languages of its age. The vast majority of linguistic work during the 19th century was devoted to the reconstruction of PIE or its daughter proto-languages (such as Proto-Germanic), and most of the modern techniques of linguistic reconstruction such as the comparative method were developed as a result. These methods supply all current knowledge concerning PIE since there is no written record of the language.

PIE is estimated to have been spoken as a single language from 4500 BC to 2500 BC during the Late Neolithic to Early Bronze Age, though estimates vary by more than a thousand years. According to the prevailing Kurgan hypothesis, the origin...ht into the culture and religion of its speakers.

As Proto-Indo-Europeans became isolated from each other through the Indo-European migrations, the regional dialects of Proto-Indo-European spoken by the various groups diverged through the Indo-European sound laws and through shifts in morphology, and they slowly transformed into the known ancient Indo-European languages. From there, further linguistic divergence led to the evolution of their current descendants, the modern Indo-European languages. Today, the descendant languages, or daughter languages, of PIE with the most speakers are Spanish, English, Hindustani (Hindi and Urdu), Portuguese, Bengali, Russian, Punjabi, German, Persian, French, Italian and Marathi. Hundreds of other living descendants of PIE include languages as diverse as Albanian (gjuha shqipe), Kurdish (کوردی), Nepali (खस भाषा), Tsakonian (τσακώνικα), Ukrainian (українська мова), and Welsh (Cymraeg).


  • https://en.wikipedia.org/wiki/Grimm%27s_law - also known as the First Germanic Sound Shift or Rask's rule, a set of statements named after Jacob Grimm and Rasmus Rask describing the inherited Proto-Indo-European (PIE) stop consonants as they developed in Proto-Germanic (the common ancestor of the Germanic branch of the Indo-European family) in the 1st millennium BC. It establishes a set of regular correspondences between early Germanic stops and fricatives and the stop consonants of certain other centum Indo-European languages (Grimm used mostly Latin and Greek for illustration).
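The regular correspondences that Grimm's law describes can be written as a lookup table. The following is a minimal sketch, assuming the traditional reconstruction of three PIE stop series and input that is already split into segments; the name `grimm_shift` is hypothetical, vowels are left untouched, and known exceptions (such as Verner's law or stops after *s) are not modelled.

```python
# Traditional statement of Grimm's law as a segment-to-segment mapping.
# This table is an illustrative simplification, not a full sound-change model.
GRIMM = {
    # voiceless stops -> voiceless fricatives
    "p": "f", "t": "þ", "k": "h", "kʷ": "hʷ",
    # voiced stops -> voiceless stops
    "b": "p", "d": "t", "g": "k", "gʷ": "kʷ",
    # voiced aspirated stops -> voiced stops
    "bʰ": "b", "dʰ": "d", "gʰ": "g", "gʷʰ": "gʷ",
}


def grimm_shift(segments):
    """Apply the shift once to each segment; unknown segments pass through."""
    return [GRIMM.get(seg, seg) for seg in segments]


# PIE *ped- 'foot': only the consonants change in this sketch.
print(grimm_shift(["p", "e", "d"]))  # ['f', 'e', 't']
```

Because each segment is looked up exactly once, the output of one rule (for example d → t) is not fed back into another (t → þ), which mirrors the chain-shift character of the law.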


  • https://en.wikipedia.org/wiki/Indo-European_languages - a language family of several hundred related languages and dialects. There are about 445 living Indo-European languages, according to the estimate by Ethnologue, with over two thirds (313) of them belonging to the Indo-Iranian branch.


Pali

English




Slang

  • [1907.03920 Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings]

Scots

French

Japanese

Acquisition

Translation


  • https://github.com/soimort/translate-shell - a command-line translator powered by Google Translate (default), Bing Translator, Yandex.Translate, and Apertium. It gives you easy access to one of these translation engines in your terminal.


  • BabelFish.org is a fish that translates speech from one language to another.


  • EUdict is a collection of online dictionaries for the languages spoken mostly in the European Community. These dictionaries are the result of the work of many authors who worked very hard and finally offered their product free of charge on the internet thus making it easier to all of us to communicate with each other.
  • dict.cc is not only an online dictionary. It's an attempt to create a platform where users from all over the world can share their knowledge in the field of translations. Every visitor can suggest new translations and correct or confirm other users' suggestions.


  • Linguee - Dictionary and search engine for 100 million translations.


  • Dictionarist - provides translations in English, Spanish, Portuguese, German, French, Italian, Russian, Turkish, Dutch, Greek, Chinese, Japanese, Korean, Arabic, Hindi, Indonesian, Polish, Romanian, Ukrainian and Vietnamese.








  • Pootle - Community localization server. Get your community translating your software into their languages.





Other



  • FrathWiki - information for the conlanging and linguistics community



  • Unker - Non-Linear Writing System




to sort

  • Material of language - Language is more than just words and meanings: it’s paper and ink, pixels and screens, fingertips on keyboards, voices speaking out loud. Language is, in a word, material. In this course, students will gain an understanding of how the material of language is represented digitally, and learn computational techniques for manipulating this material in order to create speculative technologies that challenge conventional reading and writing practices. Topics include asemic writing, concrete poetry, markup languages, keyboard layouts, interactive and generative typography, printing technologies and bots (alongside other forms of radical publishing). Students will complete a series of weekly readings and production-oriented assignments leading up to a final project. In addition to critique, sessions will feature lectures, class discussions and technical tutorials. Prerequisites: Introduction to Computational Media or equivalent programming experience.

Ethos, practice, programming

“Let us have no more of those successive, incessant, back and forth motions of our eyes, tracking from one line to the next and beginning all over again—otherwise we will miss that ecstasy in which we have become immortal for a brief hour, free of all reality, and raise our obsessions to the level of creation.” — Stéphane Mallarmé, “The Book: A Spiritual Instrument,” Selected Poetry and Prose, edited by Mary Ann Caws (New York: New Directions, 1982), p. 82.

This class concerns what happens when language becomes manifest in the world, with a particular focus on forms of language or forms of manifestation that foreground computation and/or interactive media. Our methodology for approaching these themes is free-form, drawing from critical making, speculative design, creative writing, and the humanities. In particular, the class asserts that making things is one of the most effective ways of learning how to think critically.


Numbers

Runes

  • https://en.wikipedia.org/wiki/Anglo-Saxon_runes - (Old English: rūna ᚱᚢᚾᚪ) are runes used by the early Anglo-Saxons as an alphabet in their writing system. The characters are known collectively as the futhorc (ᚠᚢᚦᚩᚱᚳ fuþorc) from the Old English sound values of the first six runes. The futhorc was a development from the 24-character Elder Futhark. Since the futhorc runes are thought to have first been used in Frisia before the Anglo-Saxon settlement of Britain, they have also been called Anglo-Frisian runes. They were likely to have been used from the 5th century onward, recording Old English and Old Frisian.

They were gradually supplanted in Anglo-Saxon England by the Old English Latin alphabet introduced by missionaries. Futhorc runes were no longer in common use by the eleventh century, but The Byrhtferth Manuscript (MS Oxford St John's College 17) indicates that fairly accurate understanding of them persisted into at least the twelfth century.
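The naming of the futhorc after the sound values of its first six runes can be shown with a small transliteration table. This is a minimal sketch covering only the runes that appear in the text above; the table and the function name `transliterate` are illustrative assumptions, and vowel length (the macron in rūna) is not represented.

```python
# Partial rune-to-Latin table: only the runes quoted in the entry above.
FUTHORC_TO_LATIN = {
    "ᚠ": "f", "ᚢ": "u", "ᚦ": "þ", "ᚩ": "o", "ᚱ": "r", "ᚳ": "c",
    "ᚾ": "n", "ᚪ": "a",
}


def transliterate(runes):
    """Replace each known rune with its Latin value; keep anything else as is."""
    return "".join(FUTHORC_TO_LATIN.get(ch, ch) for ch in runes)


print(transliterate("ᚠᚢᚦᚩᚱᚳ"))  # fuþorc
print(transliterate("ᚱᚢᚾᚪ"))    # runa
```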


Anglish



  • The Anglish Times - News written in Anglish, a kind of English that does not have borrowed words from other languages.


to sort from Being

  • https://en.wikipedia.org/wiki/Rhetorical_modes - also known as modes of discourse, describe the variety, conventions, and purposes of the major kinds of language-based communication, particularly writing and speaking. Four of the most common rhetorical modes and their purpose are narration, description, exposition, and argumentation.


  • https://en.wikipedia.org/wiki/Description - act of description may be related to that of definition. Description is also the fiction-writing mode for transmitting a mental image of the particulars of a story. Definition: The pattern of development that presents a word picture of a thing, a person, a situation, or a series of events.


  • https://en.wikipedia.org/wiki/Narrative - or story is any report of connected events, real or imaginary, presented in a sequence of written or spoken words, and/or still or moving images. Narrative can be organized in a number of thematic and/or formal categories: non-fiction (such as definitively including creative non-fiction, biography, journalism, transcript poetry, and historiography); fictionalization of historical events (such as anecdote, myth, legend, and historical fiction); and fiction proper (such as literature in prose and sometimes poetry, such as short stories, novels, and narrative poems and songs, and imaginary narratives as portrayed in other textual forms, games, or live or recorded performances). Narrative is found in all forms of human creativity, art, and entertainment, including speech, literature, theatre, music and song, comics, journalism, film, television and video, radio, gameplay, unstructured recreation, and performance in general, as well as some painting, sculpture, drawing, photography, and other visual arts (though several modern art movements refuse the narrative in favor of the abstract and conceptual), as long as a sequence of events is presented. The word derives from the Latin verb narrare, "to tell", which is derived from the adjective gnarus, "knowing" or "skilled".

Oral storytelling is perhaps the earliest method for sharing narratives. During most people's childhoods, narratives are used to guide them on proper behavior, cultural history, formation of a communal identity, and values, as especially studied in anthropology today among traditional indigenous peoples. Narratives may also be nested within other narratives, such as narratives told by an unreliable narrator (a character), typically found in the noir fiction genre. An important part of narration is the narrative mode, the set of methods used to communicate the narrative through a process of narration. Along with exposition, argumentation, and description, narration, broadly defined, is one of four rhetorical modes of discourse. More narrowly defined, it is the fiction-writing mode in which the narrator communicates directly to the reader.

  • https://en.wikipedia.org/wiki/Exposition_(narrative) - the insertion of important background information within a story; for example, information about the setting, characters' backstories, prior plot events, historical context, etc. In a specifically literary context, exposition appears in the form of expository writing embedded within the narrative.


Linguistics

  • https://en.wikipedia.org/wiki/Linguistics - the scientific study of language, specifically language form, language meaning, and language in context. The earliest activities in the description of language have been attributed to the 4th century BCE Indian grammarian Pāṇini, who was an early student of linguistics and wrote a formal description of the Sanskrit language in his Aṣṭādhyāyī.

Linguistics analyzes human language as a system for relating sounds (or signs in signed languages) and meaning. Phonetics studies acoustic and articulatory properties of the production and perception of speech sounds and non-speech sounds. The study of language meaning, on the other hand, deals with how languages encode relations between entities, properties, and other aspects of the world to convey, process, and assign meaning, as well as to manage and resolve ambiguity. While the study of semantics typically concerns itself with truth conditions, pragmatics deals with how context influences meanings.

Grammar is a system of rules which govern the form of the utterances in a given language. It encompasses both sound and meaning, and includes phonology (how sounds or gestures function together), morphology (the formation and composition of words), and syntax (the formation and composition of phrases and sentences from words).

In the early 20th century, Ferdinand de Saussure distinguished between the notions of langue and parole in his formulation of structural linguistics. According to him, parole is the specific utterance of speech, whereas langue refers to an abstract phenomenon that theoretically defines the principles and system of rules that govern a language. This distinction resembles the one made by Noam Chomsky between competence and performance, where competence is an individual's ideal knowledge of a language, while performance is the specific way in which it is used.

The formal study of language has also led to the growth of fields like psycholinguistics, which explores the representation and function of language in the mind; neurolinguistics, which studies language processing in the brain; and language acquisition, which investigates how children and adults acquire a particular language.

Linguistics also includes non-formal approaches to the study of other aspects of human language, such as social, cultural, historical and political factors. The study of cultural discourses and dialects is the domain of sociolinguistics, which looks at the relation between linguistic variation and social structures, as well as that of discourse analysis, which examines the structure of texts and conversations. Research on language through historical and evolutionary linguistics focuses on how languages change, and on the origin and growth of languages, particularly over an extended period of time.

Corpus linguistics takes naturally occurring texts and studies the variation of grammatical and other features based on such corpora. Stylistics involves the study of patterns of style: within written, signed, or spoken discourse. Language documentation combines anthropological inquiry with linguistic inquiry to describe languages and their grammars. Lexicography covers the study and construction of dictionaries. Computational linguistics applies computer technology to address questions in theoretical linguistics, as well as to create applications for use in parsing, data retrieval, machine translation, and other areas. People can apply actual knowledge of a language in translation and interpreting, as well as in language education – the teaching of a second or foreign language. Policy makers work with governments to implement new plans in education and teaching which are based on linguistic research.







  • https://en.wikipedia.org/wiki/Genetic_relationship_(linguistics) - Two languages have a genetic relationship, and belong to the same language family, if both are descended from a common ancestor through the process of language change, or one is descended from the other. The term and the process of language evolution are independent of, and not reliant on, the terminology, understanding, and theories related to genetics in the biological sense, so, to avoid confusion, some linguists prefer the term genealogical relationship.
  • https://en.wikipedia.org/wiki/Comparative_method - a technique for studying the development of languages by performing a feature-by-feature comparison of two or more languages with common descent from a shared ancestor and then extrapolating backwards to infer the properties of that ancestor.
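The feature-by-feature comparison behind the comparative method can be illustrated by tallying which sounds line up across cognate words. The sketch below is a toy example under strong assumptions: the cognate pairs are hand-picked Latin and English words, the initial sounds are supplied already analysed, and the function name `initial_correspondences` is hypothetical. Real comparative work aligns whole words across many languages and far more items.

```python
from collections import Counter

# (Latin word, English cognate, Latin initial sound, English initial sound)
# Hand-picked illustrative data; initial sounds are pre-analysed, not computed.
COGNATES = [
    ("pater", "father", "p", "f"),
    ("pēs", "foot", "p", "f"),
    ("piscis", "fish", "p", "f"),
    ("trēs", "three", "t", "θ"),
    ("tenuis", "thin", "t", "θ"),
    ("cornū", "horn", "k", "h"),
    ("centum", "hundred", "k", "h"),
]


def initial_correspondences(cognates):
    """Count how often each pair of initial sounds co-occurs across cognates."""
    return Counter((lat, eng) for _word1, _word2, lat, eng in cognates)


for (lat, eng), count in initial_correspondences(COGNATES).items():
    print(f"Latin {lat} : English {eng}  ({count} cognate pairs)")
```

A correspondence that recurs across many unrelated word meanings is the kind of regularity from which properties of the shared ancestor are inferred.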


  • https://en.wikipedia.org/wiki/Linguistic_typology - or language typology, a field of linguistics that studies and classifies languages according to their structural features to allow their comparison. Its aim is to describe and explain the structural diversity and the common properties of the world's languages. Its subdisciplines include, but are not limited to, phonological typology, which deals with sound features; syntactic typology, which deals with word order and form; lexical typology, which deals with language vocabulary; and theoretical typology, which aims to explain the universal tendencies.

Linguistic typology is contrasted with genealogical linguistics on the grounds that typology groups languages or their grammatical features based on formal similarities rather than historic descendence. The issue of genealogical relation is however relevant to typology because modern data sets aim to be representative and unbiased. Samples are collected evenly from different language families, emphasizing the importance of exotic languages in gaining insight into human language.





  • https://en.wikipedia.org/wiki/Applied_linguistics - an interdisciplinary field of linguistics that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, psychology, computer science, communication research, anthropology, and sociology.
  • https://en.wikipedia.org/wiki/Linguistic_turn - a major development in Western philosophy during the 20th century, the most important characteristic of which is the focusing of philosophy and the other humanities primarily on the relationship between philosophy and language.




  • https://en.wikipedia.org/wiki/Etymology - is the history of words, their origins, and how their form and meaning have changed over time. By extension, the term "the etymology of [a word]" means the origin of the particular word.



  • https://en.wikipedia.org/wiki/Historical_linguistics - also termed diachronic linguistics and formerly glottology, is the scientific study of language change over time. Principal concerns of historical linguistics include: to describe and account for observed changes in particular languages; to reconstruct the pre-history of languages and to determine their relatedness, grouping them into language families (comparative linguistics); to develop general theories about how and why language changes; to describe the history of speech communities; to study the history of words, i.e. etymology.




Becker's Criterion: "Any theory (or partial theory) of the English Language that is expounded in the English Language must account for (or at least apply to) the text of its own exposition."

Becker's Razor: his final riposte to theoretical linguists: "Elegance and truth are inversely related", after which he finishes with, "Put that in your phrasal lexicon and invoke it!"














  • https://en.wikipedia.org/wiki/Linguistic_typology - a subfield of linguistics that studies and classifies languages according to their structural and functional features. Its aim is to describe and explain the common properties and the structural diversity of the world's languages. It includes three subdisciplines: qualitative typology, which deals with the issue of comparing languages and within-language variance; quantitative typology, which deals with the distribution of structural patterns in the world’s languages; and theoretical typology, which explains these distributions.











  • https://en.wikipedia.org/wiki/Deixis - the use of general words and phrases to refer to a specific time, place, or person in context, e.g., the words tomorrow, there, and they. Words are deictic if their semantic meaning is fixed but their denoted meaning varies depending on time and/or place. Words or phrases that require contextual information to be fully understood—for example, English pronouns—are deictic. Deixis is closely related to anaphora. Although this article deals primarily with deixis in spoken language, the concept is sometimes applied to written language, gestures, and communication media as well. In linguistic anthropology, deixis is treated as a particular subclass of the more general semiotic phenomenon of indexicality, a sign "pointing to" some aspect of its context of occurrence.







  • Field Linguist's Toolbox - a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data. Although Toolbox is very powerful, it is designed to be easy to learn. The user can start with a simple standard setup and gradually add the use of more powerful features as desired. The Toolbox downloads include a training package that is usable for self-paced individual learning as well as for classroom teaching of Toolbox.


  • FieldWorks - consists of software tools that help you manage linguistic and cultural data. FieldWorks supports tasks ranging from the initial entry of collected data through to the preparation of data for publication, including dictionary development, interlinearization of texts, morphological analysis, and other publications. Furthermore, FieldWorks BTE contains a specialized drafting and editing environment for Bible Translators, which provides interaction with the language data stored in Language Explorer.


  • AGGREGATION - Implemented grammars can contribute to endangered language documentation in several ways. In the first instance, the grammars themselves provide a very rich addition to prose descriptive grammars, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Furthermore, implemented grammars can be used to create treebanks, that is, collections of utterances (from running text or elicited examples) associated with syntactic and semantic structures. The process of creating the treebank can provide important feedback to the field linguist about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to create further computational tools and are also a rich source of comparable data for qualitative and quantitative work in typology, grounding higher level linguistic abstractions in actual utterances in a computationally tractable fashion. Despite these advantages, grammar engineering for language documentation has gone largely unexplored. In this project, we investigate how to automate the construction of grammar fragments, building on interlinear glossed text (IGT) and the LinGO Grammar Matrix, a typologically motivated cross-linguistic computational resource.


  • Home - DELPH-IN - Computational linguists from research sites world-wide have joined forces in a collaborative effort aimed at ‘deep’ linguistic processing of human language. The goal is the combination of linguistic and statistical processing methods for getting at the meaning of texts and utterances. The partners have adopted Head-Driven Phrase Structure Grammar (HPSG) and Minimal Recursion Semantics (MRS), two advanced models of formal linguistic analysis. They have also committed themselves to a shared format for grammatical representation and to a rigid scheme of evaluation, as well as to the general use of open-source licensing and transparency.



  • SFST - A toolbox for the implementation of morphological analysers
  • Foma - A Finite State Compiler and Library - a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers. Although NLP applications are probably the main use of foma, it is sufficiently generic to use for a large number of purposes. The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, boolean operations. Also, more advanced construction methods are available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.



  • Tatoeba - a collection of sentences and translations. It's collaborative, open, free and even addictive.


Semiotics

See also Maths#Logic

  • https://en.wikipedia.org/wiki/Semiotics - also called semiotic studies (not to be confused with the Saussurean tradition called semiology, which is a part of semiotics), is the study of meaning-making, the study of sign processes and meaningful communication. This includes the study of signs and sign processes (semiosis), indication, designation, likeness, analogy, allegory, metonymy, metaphor, symbolism, signification, and communication.

Semiotics is closely related to the field of linguistics, which, for its part, studies the structure and meaning of language more specifically. The semiotic tradition explores the study of signs and symbols as a significant part of communications. As different from linguistics, however, semiotics also studies non-linguistic sign systems.

Semiotics is frequently seen as having important anthropological dimensions; for example, the late Italian semiotician and novelist Umberto Eco proposed that every cultural phenomenon may be studied as communication. Some semioticians focus on the logical dimensions of the science, however. They examine areas belonging also to the life sciences—such as how organisms make predictions about, and adapt to, their semiotic niche in the world (see semiosis). In general, semiotic theories take signs or sign systems as their object of study: the communication of information in living organisms is covered in biosemiotics (including zoosemiotics).




  • https://en.wikipedia.org/wiki/Syntagma_(linguistics) - an elementary constituent segment within a text. Such a segment can be a phoneme, a word, a grammatical phrase, a sentence, or an event within a larger narrative structure, depending on the level of analysis. Syntagmatic analysis involves the study of relationships (rules of combination) among syntagmas.

At the lexical level, syntagmatic structure in a language is the combination of words according to the rules of syntax for that language. For example, English uses determiner + adjective + noun, e.g. the big house. Another language might use determiner + noun + adjective (Spanish la casa grande) and therefore have a different syntagmatic structure.

At a higher level, narrative structures feature a realistic temporal flow guided by tension and relaxation; thus, for example, events or rhetorical figures may be treated as syntagmas of epic structures.

Syntagmatic structure is often contrasted with paradigmatic structure. In semiotics, "syntagmatic analysis" is analysis of syntax or surface structure (syntagmatic structure), rather than paradigms as in paradigmatic analysis. Analysis is often achieved through commutation tests.






Pragmatics

  • https://en.wikipedia.org/wiki/Pragmatics - a subfield of linguistics and semiotics that studies the ways in which context contributes to meaning. Pragmatics encompasses speech act theory, conversational implicature, talk in interaction and other approaches to language behavior in philosophy, sociology, linguistics and anthropology.

Unlike semantics, which examines meaning that is conventional or "coded" in a given language, pragmatics studies how the transmission of meaning depends not only on structural and linguistic knowledge (e.g., grammar, lexicon, etc.) of the speaker and listener, but also on the context of the utterance, any pre-existing knowledge about those involved, the inferred intent of the speaker, and other factors. In this respect, pragmatics explains how language users are able to overcome apparent ambiguity, since meaning relies on the manner, place, time etc. of an utterance.

The ability to understand another speaker's intended meaning is called pragmatic competence.

Phonetics

  • https://en.wikipedia.org/wiki/Phonetics - a branch of linguistics that comprises the study of the sounds of human speech, or—in the case of sign languages—the equivalent aspects of sign. It is concerned with the physical properties of speech sounds or signs (phones): their physiological production, acoustic properties, auditory perception, and neurophysiological status. Phonology, on the other hand, is concerned with the abstract, grammatical characterization of systems of sounds or signs.

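The field of phonetics is a multilayered subject of linguistics that focuses on speech. In the case of oral languages there are three basic areas of study inter-connected through the common mechanism of sound, such as wavelength (pitch), amplitude, and harmonics: articulatory phonetics (the production of speech sounds), acoustic phonetics (their physical transmission), and auditory phonetics (their perception).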



Phonemics

  • https://en.wikipedia.org/wiki/Phonics - a method for teaching people how to read and write an alphabetic language (such as English or Russian). It is done by demonstrating the relationship between the sounds of the spoken language (phonemes), and the letters or groups of letters (graphemes) or syllables of the written language. In English, this is also known as the alphabetic principle or the alphabetic code.


  • https://en.wikipedia.org/wiki/Phoneme - one of the units of sound that distinguish one word from another in a particular language. The difference in meaning between the English words kill and kiss is a result of the exchange of the phoneme /l/ for the phoneme /s/. Two words that differ in meaning through a contrast of a single phoneme form a minimal pair. In linguistics, phonemes (established by the use of minimal pairs, such as kill vs kiss or pat vs bat) are written between slashes like this: /p/, whereas when it is desired to show the more exact pronunciation of any sound, linguists use square brackets, for example [pʰ] (indicating an aspirated p).

Within linguistics there are differing views as to exactly what phonemes are and how a given language should be analyzed in phonemic (or phonematic) terms. However, a phoneme is generally regarded as an abstraction of a set (or equivalence class) of speech sounds (phones) which are perceived as equivalent to each other in a given language. For example, in English, the "k" sounds in the words kit and skill are not identical (as described below), but they are distributional variants of a single phoneme /k/. Different speech sounds that are realizations of the same phoneme are known as allophones. Allophonic variation may be conditioned, in which case a certain phoneme is realized as a certain allophone in particular phonological environments, or it may be free in which case it may vary randomly. In this way, phonemes are often considered to constitute an abstract underlying representation for segments of words, while speech sounds make up the corresponding phonetic realization, or surface form.
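The minimal-pair test described above can be sketched in a few lines. The toy lexicon below uses broad phonemic transcriptions of the words mentioned in this section; the lexicon itself and the function name `minimal_pairs` are assumptions made for illustration, and the check only covers words of equal length that differ in exactly one position.

```python
from itertools import combinations

# Toy lexicon: word -> broad phonemic transcription (one symbol per phoneme).
LEXICON = {
    "kill": ("k", "ɪ", "l"),
    "kiss": ("k", "ɪ", "s"),
    "pat": ("p", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "cat": ("k", "æ", "t"),
}


def minimal_pairs(lexicon):
    """Return (word1, word2, (phoneme1, phoneme2)) for words differing in one phoneme."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue
        diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0]))
    return pairs


for w1, w2, (a, b) in minimal_pairs(LEXICON):
    print(f"{w1} vs {w2}: /{a}/ contrasts with /{b}/")
```

Each pair found is evidence that the two differing sounds belong to separate phonemes in the language, rather than being allophones of one phoneme.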


  • https://en.wikipedia.org/wiki/Phonemic_awareness - a part of phonological awareness in which listeners are able to hear, identify and manipulate phonemes, the smallest mental units of sound that help to differentiate units of meaning (morphemes). Separating the spoken word "cat" into three distinct phonemes, /k/, /æ/, and /t/, requires phonemic awareness. The National Reading Panel has found that phonemic awareness improves children's word reading and reading comprehension and helps children learn to spell. Phonemic awareness is the basis for learning phonics. Phonemic awareness and phonological awareness are often confused since they are interdependent. Phonemic awareness is the ability to hear and manipulate individual phonemes. Phonological awareness includes this ability, but it also includes the ability to hear and manipulate larger units of sound, such as onsets and rimes and syllables.


Phonetics


  • https://en.wikipedia.org/wiki/ARPABET - also spelled ARPAbet, is a set of phonetic transcription codes developed by Advanced Research Projects Agency (ARPA) as a part of their Speech Understanding Research project in the 1970s. It represents phonemes and allophones of General American English with distinct sequences of ASCII characters. Two systems, one representing each segment with one character (alternating upper- and lower-case letters) and the other with two or more (case-insensitive), were devised, the latter being far more widely adopted. ARPABET has been used in several speech synthesizers, including Computalker for the S-100 system, SAM for the Commodore 64, SAY for the Amiga, TextAssist for the PC and Speakeasy from Intelligent Artefacts which used the Votrax SC-01 speech synthesiser IC. It is also used in the CMU Pronouncing Dictionary. A revised version of ARPABET is used in the TIMIT corpus.
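To show how the two-letter ARPABET codes stand in for phonemes, here is a minimal sketch that converts a space-separated ARPABET string into IPA. The table is deliberately partial (only the symbols needed for the examples), the function name `arpabet_to_ipa` is hypothetical, and the trailing digits are assumed to be stress markers in the style of the CMU Pronouncing Dictionary, which this sketch simply discards.

```python
import re

# Partial ARPABET -> IPA table; a full table would cover every phoneme.
ARPABET_TO_IPA = {
    "K": "k", "T": "t", "S": "s", "L": "l", "P": "p", "B": "b",
    "AE": "æ", "IH": "ɪ",
}


def arpabet_to_ipa(pronunciation):
    """Convert e.g. 'K AE1 T' to IPA, ignoring stress digits and letter case."""
    ipa = []
    for symbol in pronunciation.split():
        base = re.sub(r"\d", "", symbol).upper()  # drop stress digits such as the 1 in AE1
        ipa.append(ARPABET_TO_IPA[base])          # raises KeyError for symbols not in the table
    return "".join(ipa)


print(arpabet_to_ipa("K AE1 T"))  # kæt
print(arpabet_to_ipa("k ih1 s"))  # kɪs
```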



  • https://en.wikipedia.org/wiki/Phonetic_algorithm - an algorithm for indexing of words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result. They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation is complicated by historical changes in pronunciation and words borrowed from many languages.
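
As an illustration of indexing words by pronunciation, here is a minimal sketch of American Soundex, one well-known phonetic algorithm (chosen here as an example; the entry above does not name a specific one). The function name `soundex` and the table `CODES` are illustrative. The sketch assumes English-like input and treats any letter without a code as a vowel.

```python
# Minimal sketch of American Soundex (illustrative names, not a library API).
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex key for a word, or "" if it has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")  # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate letters with the same code
            continue
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (encoded + "000")[:4]  # pad with zeros, truncate to four characters


print(soundex("Robert"), soundex("Rupert"))  # both map to the same key
```

Names that sound alike share a key, which is what makes the key usable as an index. The rules are tuned to English spelling, so keys for words from other languages may not be meaningful.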


Phonology

  • https://en.wikipedia.org/wiki/Phonology - a branch of linguistics concerned with the systematic organization of sounds in languages. It has traditionally focused largely on the study of the systems of phonemes in particular languages (and therefore used to be also called phonemics, or phonematics), but it may also cover any linguistic analysis either at a level beneath the word (including syllable, onset and rime, articulatory gestures, articulatory features, mora, etc.) or at all levels of language where sound is considered to be structured for conveying linguistic meaning. Phonology also includes the study of equivalent organizational systems in sign languages.


  • https://en.wikipedia.org/wiki/Phonological_awareness - an individual's awareness of the phonological structure, or sound structure, of words. Phonological awareness is an important and reliable predictor of later reading ability and has, therefore, been the focus of much research.



Morphology

  • https://en.wikipedia.org/wiki/Morpheme - is the smallest grammatical unit in a language. The field of study dedicated to morphemes is called morphology. A morpheme is not identical to a word, and the principal difference between the two is that a morpheme may or may not stand alone, whereas a word, by definition, is freestanding. When it stands by itself, it is considered a root because it has a meaning of its own (e.g. the morpheme cat) and when it depends on another morpheme to express an idea, it is an affix because it has a grammatical function (e.g. the –s in cats to specify that it is plural). Every word comprises one or more morphemes. The more combinations a morpheme is found in, the more productive it is said to be.


  • https://en.wikipedia.org/wiki/Morphology_(linguistics) - the identification, analysis and description of the structure of a given language's morphemes and other linguistic units, such as root words, affixes, parts of speech, intonations and stresses, or implied context. In contrast, morphological typology is the classification of languages according to their use of morphemes, while lexicology is the study of those words forming a language's wordstock. The discipline that deals specifically with the sound changes occurring within morphemes is morphophonology.

While words, along with clitics, are generally accepted as being the smallest units of syntax, in most languages, if not all, many words can be related to other words by rules that collectively describe the grammar for that language. For example, English speakers recognize that the words dog and dogs are closely related, differentiated only by the plurality morpheme "-s", only found bound to nouns. Speakers of English, a fusional language, recognize these relations from their tacit knowledge of English's rules of word formation. They infer intuitively that dog is to dogs as cat is to cats; and, in similar fashion, dog is to dog catcher as dish is to dishwasher. By contrast, Classical Chinese has very little morphology, using almost exclusively unbound morphemes ("free" morphemes) and depending on word order to convey meaning. (Most words in modern Standard Chinese ("Mandarin"), however, are compounds and most roots are bound.) These are understood as grammars that represent the morphology of the language. The rules understood by a speaker reflect specific patterns or regularities in the way words are formed from smaller units in the language they are using and how those smaller units interact in speech. In this way, morphology is the branch of linguistics that studies patterns of word formation within and across languages and attempts to formulate rules that model the knowledge of the speakers of those languages.

Polysynthetic languages, such as Chukchi, have words composed of many morphemes. The Chukchi word "təmeyŋəlevtpəγtərkən", for example, meaning "I have a fierce headache", is composed of eight morphemes t-ə-meyŋ-ə-levt-pəγt-ə-rkən that may be glossed. The morphology of such languages allows for each consonant and vowel to be understood as morphemes, while the grammar of the language indicates the usage and understanding of each morpheme.


Lexicology

  • https://en.wikipedia.org/wiki/Lexeme - a unit of lexical meaning that exists regardless of the number of inflectional endings it may have or the number of words it may contain. It is a basic unit of meaning, and the headwords of a dictionary are all lexemes. Put more technically, a lexeme is an abstract unit of morphological analysis in linguistics, that roughly corresponds to a set of forms taken by a single word. For example, in the English language, run, runs, ran and running are forms of the same lexeme, conventionally written as run. A related concept is the lemma (or citation form), which is a particular form of a lexeme that is chosen by convention to represent a canonical form of a lexeme. Lemmas are used in dictionaries as the headwords, and other forms of a lexeme are often listed later in the entry if they are not common conjugations of that word.

A lexeme belongs to a particular syntactic category, has a certain meaning (semantic value), and in inflecting languages, has a corresponding inflectional paradigm; that is, a lexeme in many languages will have many different forms. For example, the lexeme run has a present third person singular form runs, a present non-third-person singular form run (which also functions as the past participle and non-finite form), a past form ran, and a present participle running. (It does not include runner, runners, runnable, etc.) The use of the forms of a lexeme is governed by rules of grammar; in the case of English verbs such as run, these include subject-verb agreement and compound tense rules, which determine which form of a verb can be used in a given sentence.
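
The relation between a lexeme, its inflected forms and its lemma can be sketched as a small lookup table. This is only an illustration: the table `PARADIGMS` and the function `lemmatize` are hypothetical names, and the only data used are the forms of run listed above.

```python
# Hypothetical hand-built paradigm table: lemma -> set of inflected forms.
PARADIGMS = {
    "run": {"run", "runs", "ran", "running"},
}

# Invert the table so each surface form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in PARADIGMS.items() for form in forms
}


def lemmatize(token):
    """Return the lemma for a known inflected form, or None if the form is unknown."""
    return FORM_TO_LEMMA.get(token.lower())


print(lemmatize("ran"))     # run
print(lemmatize("runner"))  # None: runner belongs to a different lexeme
```

Derived words such as runner are deliberately absent from the paradigm, matching the point that they are not forms of the lexeme run.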



  • https://en.wikipedia.org/wiki/Back-formation - the process of creating a new lexeme, usually by removing actual or supposed affixes. The resulting neologism is called a back-formation, a term coined by James Murray in 1889. (OED online first definition of 'back formation' is from the definition of to burgle, which was first published in 1889.) Back-formation is different from clipping – back-formation may change the part of speech or the word's meaning, whereas clipping creates shortened words from longer words, but does not change the part of speech or the meaning of the word.
  • https://en.wikipedia.org/wiki/Lexicology - the part of linguistics which studies words. This may include their nature and function as symbols, their meaning, the relationship of their meaning to epistemology in general, and the rules of their composition from smaller elements (morphemes such as the English -ed marker for past or un- for negation; and phonemes as basic sound units). Lexicology also involves relations between words, which may involve semantics (for example, love vs. affection), derivation (for example, fathom vs. unfathomably), usage and sociolinguistic distinctions (for example, flesh vs. meat), and any other issues involved in analyzing the whole lexicon of a language(s).
  • https://en.wikipedia.org/wiki/Computational_lexicology - the branch of computational linguistics concerned with the use of computers in the study of the lexicon. It has been more narrowly described by some scholars (Amsler, 1980) as the use of computers in the study of machine-readable dictionaries. It is distinguished from computational lexicography, which more properly would be the use of computers in the construction of dictionaries, though some researchers have used computational lexicography as a synonym.
  • https://en.wikipedia.org/wiki/Lexicography - is divided into two separate but equally important groups: Practical lexicography is the art or craft of compiling, writing and editing dictionaries; Theoretical lexicography is the scholarly discipline of analyzing and describing the semantic, syntagmatic and paradigmatic relationships within the lexicon (vocabulary) of a language, developing theories of dictionary components and structures linking the data in dictionaries, the needs for information by users in specific types of situation, and how users may best access the data incorporated in printed and electronic dictionaries. This is sometimes referred to as 'metalexicography'.


Part of speech

Three little words you often see
Are ARTICLES: a, an, and the.

A NOUN's the name of anything,
As: school or garden, toy, or swing.

ADJECTIVES tell the kind of noun,
As: great, small, pretty, white, or brown.

VERBS tell of something being done: 
To read, write, count, sing, jump, or run.

How things are done the ADVERBS tell, 
As: slowly, quickly, badly, well.

CONJUNCTIONS join the words together,
As: men and women, wind or weather.

The PREPOSITION stands before
A noun as: in or through a door.

The INTERJECTION shows surprise
As: Oh, how pretty! Ah! how wise!


Grammar

  • https://en.wikipedia.org/wiki/Grammar - the set of structural rules governing the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics, and pragmatics.







  • https://en.wikipedia.org/wiki/Cartographic_syntax - a branch of Generative syntax. The basic assumption of Cartographic syntax is that syntactic structures are built according to the same patterns in all languages of the world. It is assumed that all languages exhibit a richly articulated structure of hierarchical projections with specific meanings. Cartography belongs to the tradition of generative grammar and is regarded as a theory belonging to the Principles and Parameters theory. The founders of Cartography are the Italian linguists Luigi Rizzi and Guglielmo Cinque.






to sort






  • https://en.wikipedia.org/wiki/Clause - the smallest grammatical unit that can express a complete proposition. A clause typically consists of a subject and a predicate, where the predicate is typically a verb phrase – a verb together with any objects and other modifiers.


  • https://en.wikipedia.org/wiki/Anaphora_(linguistics) - use of an expression the interpretation of which depends upon another expression in context (its antecedent or postcedent). In a narrower sense, anaphora is the use of an expression which depends specifically upon an antecedent expression, and thus is contrasted with cataphora, which is the use of an expression which depends upon a postcedent expression. The anaphoric (referring) term is called an anaphor. For example, in the sentence Sally arrived, but nobody saw her, the pronoun her is an anaphor, referring back to the antecedent Sally. In the sentence Before her arrival, nobody saw Sally, the pronoun her refers forward to the postcedent Sally, so her is now a cataphor (and an anaphor in the broader, but not the narrower, sense). Usually, an anaphoric expression is a proform or some other kind of deictic (contextually-dependent) expression. Both anaphora and cataphora are species of endophora, referring to something mentioned elsewhere in a dialog or text.


  • https://en.wikipedia.org/wiki/Formulaic_language - previously known as automatic speech or embolalia, is a linguistic term for verbal expressions that are fixed in form, often non-literal in meaning with attitudinal nuances, and closely related to communicative-pragmatic context. Along with idioms, expletives and proverbs, formulaic language includes pause fillers (e.g., “Like,” “Er” or “Uhm”) and conversational speech formulas (e.g., “You’ve got to be kidding,” “Excuse me?” or “Hang on a minute”).
  • https://en.wikipedia.org/wiki/Fixed_expression - a standard form of expression that has taken on a more specific meaning than the expression itself. It is different from a proverb in that it is used as a part of a sentence, and is the standard way of expressing a concept or idea.
  • https://en.wikipedia.org/wiki/Idiom - (Latin: idioma, "special property", from Greek: ἰδίωμα – idíōma, "special feature, special phrasing, a peculiarity", f. Greek: ἴδιος – ídios, "one’s own") is a phrase or a fixed expression that has a figurative, or sometimes literal, meaning. An idiom's figurative meaning is different from the literal meaning. There are thousands of idioms, and they occur frequently in all languages. It is estimated that there are at least twenty-five thousand idiomatic expressions in the English language. Idioms fall into the category of formulaic language.







  • https://en.wikipedia.org/wiki/Archi-writing - a term used by French philosopher Jacques Derrida in his attempt to re-orient the relationship between speech and writing. Derrida argued that as far back as Plato, speech had been always given priority over writing. In the West, phonetic writing was considered as a secondary imitation of speech, a poor copy of the immediate living act of speech. Derrida argued that in later centuries philosopher Jean-Jacques Rousseau and linguist Ferdinand de Saussure both gave writing a secondary or parasitic role. In Derrida's essay Plato's Pharmacy, he sought to question this prioritising by firstly complicating the two terms speech and writing.



Formal language

See also Maths, Computing

  • https://en.wikipedia.org/wiki/Chomsky_hierarchy - a containment hierarchy of classes of formal grammars. It provides a computer science model that lets a programmer approach linguistic goals systematically.
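
To make the containment idea concrete, here is a minimal sketch contrasting two adjacent levels of the hierarchy. The language a*b* is regular, while aⁿbⁿ (equal numbers of a's and b's) is context-free but not regular. The function names are illustrative, and the recursive recognizer simply mirrors the grammar S → a S b | ε.

```python
import re


def is_a_star_b_star(s):
    """Regular language: any number of a's followed by any number of b's."""
    return re.fullmatch(r"a*b*", s) is not None


def is_an_bn(s):
    """Context-free language a^n b^n, following the grammar S -> a S b | empty."""
    if s == "":
        return True  # S -> empty
    # S -> a S b: peel one 'a' from the front and one 'b' from the back
    return s[0] == "a" and s[-1] == "b" and is_an_bn(s[1:-1])


for sample in ("aabb", "aab", "abab"):
    print(sample, is_a_star_b_star(sample), is_an_bn(sample))
```

Every string accepted by `is_an_bn` is also accepted by `is_a_star_b_star`, but not the other way round: "aab" is in a*b* and not in aⁿbⁿ. The recursive version is meant for short strings; a long input would hit Python's recursion limit.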


Natural language

See also Computing#NLP


Dialect

  • https://en.wikipedia.org/wiki/Dialectology - the scientific study of linguistic dialect, a sub-field of sociolinguistics. It studies variations in language based primarily on geographic distribution and their associated features. Dialectology deals with such topics as divergence of two local dialects from a common ancestor and synchronic variation.






  • https://en.wikipedia.org/wiki/Idiolect - an individual's distinctive and unique use of language, including speech. This unique usage encompasses vocabulary, grammar, and pronunciation. Idiolect is the variety of language unique to an individual. This differs from a dialect, a common set of linguistic characteristics shared among some group of people. The term idiolect refers to the language of an individual. It is etymologically related to the Greek prefix idio- (meaning "own, personal, private, peculiar, separate, distinct") and a back-formation of dialect.


  • https://en.wikipedia.org/wiki/Linguistic_map - a thematic map showing the geographic distribution of the speakers of a language, or isoglosses of a dialect continuum of the same language, or language family. A collection of such maps is a linguistic atlas.


  • Language Mapping Worldwide: Methods and Traditions | SpringerLink - The chapter provides an overview about methods and traditions of linguistic cartography in the past and present. Mapping language and mapping language-related data are of increasing interest not only in disciplines such as dialectology and language typology, which are the classical domains of linguistic cartography, but also in sociolinguistics and theoretical linguistics. The chapter is structured in three main parts. First, the purposes of language mapping are introduced, ranging from visualization of the position of linguistic features in geographic space (the basic purpose of language mapping) through issues in language classification to correlations between linguistic and nonlinguistic features. Second, a formal typology of language maps based on their symbolization is given, distinguishing point-related maps, line-related maps, area-related maps, and surface maps. Third, major language mapping traditions worldwide are sketched in as much detail as possible in a short overview. The descriptions consider examples from all areas of the world (including reprints of maps and map details). A section on the effects of computerization on language mapping concludes the chapter.


  • Mapmaking for Language Documentation and Description - CORE Reader - This paper introduces readers to mapmaking as part of language documentation. We discuss some of the benefits and ethical challenges in producing good maps, drawing on linguistic geography and GIS literature. We then describe current tools and practices that are useful when creating maps of linguistic data, particularly using locations of field sites to identify language areas/boundaries. We demonstrate a basic workflow that uses CartoDB, before demonstrating a more complex workflow involving Google Maps and TileMill. We also discuss presentation and archiving of mapping products. The majority of the tools identified and used are open source or free to use.


Written language

  • https://en.wikipedia.org/wiki/Writing_system - any conventional method of visually representing verbal communication. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form of information storage and transfer. The processes of encoding and decoding writing systems involve shared understanding between writers and readers of the meaning behind the sets of characters that make up a script. Writing is usually recorded onto a durable medium, such as paper or electronic storage, although non-durable methods may also be used, such as writing on a computer display, on a blackboard, in sand, or by skywriting. The general attributes of writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies. Any particular system can have attributes of more than one category. In the alphabetic category, there is a standard set of letters (basic written symbols or graphemes) of consonants and vowels that encode based on the general principle that the letters (or letter pair/groups) represent speech sounds. In a syllabary, each symbol correlates to a syllable or mora. In a logography, each character represents a word, morpheme, or other semantic units. Other categories include abjads, which differ from alphabets in that vowels are not indicated, and abugidas or alphasyllabaries, with each character representing a consonant–vowel pairing. Alphabets typically use a set of 20-to-35 symbols to fully express a language, whereas syllabaries can have 80-to-100, and logographies can have several hundreds of symbols.


  • https://en.wikipedia.org/wiki/Syllabogram - signs used to write the syllables (or morae) of words. This term is most often used in the context of a writing system otherwise organized on different principles—an alphabet where most symbols represent phonemes, or a logographic script where most symbols represent morphemes—but a system based mostly on syllabograms is a syllabary.







See also Maths

Controlled language

  • https://en.wikipedia.org/wiki/Controlled_natural_language - (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types: those that improve readability for human readers (e.g. non-native speakers), and those that enable reliable automatic semantic analysis of the language. The first type of languages (often called "simplified" or "technical" languages), for example ASD Simplified Technical English, Caterpillar Technical English, IBM's Easy English, are used in the industry to increase the quality of technical documentation, and possibly simplify the (semi-)automatic translation of the documentation. These languages restrict the writer by general rules such as "Keep sentences short", "Avoid the use of pronouns", "Only use dictionary-approved words", and "Use only the active voice". The second type of languages have a formal logical basis, i.e. they have a formal syntax and semantics, and can be mapped to an existing formal language, such as first-order logic. Thus, those languages can be used as knowledge representation languages, and writing of those languages is supported by fully automatic consistency and redundancy checks, query answering, etc.


  • https://en.wikipedia.org/wiki/Attempto_Controlled_English - a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced.
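
The general rules quoted above for the first type of controlled language ("Keep sentences short", "Only use dictionary-approved words") can be sketched as a simple checker. This is a toy illustration and not how ASD Simplified Technical English, ACE or any real controlled-language tool works. The function `check_controlled`, the 20-word default limit and the naive sentence splitting are all assumptions.

```python
import re


def check_controlled(text, approved_words, max_words=20):
    """Return (sentence, problem) pairs for sentences that break either rule.

    approved_words is a set of lowercase words; max_words is an assumed limit.
    """
    problems = []
    # Naive sentence split on ., ! and ? (an assumption; real text needs more care)
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        if len(words) > max_words:
            problems.append((sentence, f"more than {max_words} words"))
        unknown = sorted(set(words) - approved_words)
        if unknown:
            problems.append((sentence, "unapproved words: " + ", ".join(unknown)))
    return problems


approved = {"open", "the", "valve", "close"}
print(check_controlled("Open the valve. Close the hatch.", approved))
```

A checker of this kind only flags surface violations. The second type of controlled language, with a formal syntax and semantics, would instead require a parser that maps each sentence to a logical form.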

Technology

See AI#NLP


  • https://github.com/TheBerkin/rant - an all-purpose procedural text engine that is most simply described as the opposite of Regex. It has been refined to include a dizzying array of features for handling everything from the most basic of string generation tasks to advanced dialogue generation, code templating, automatic formatting, and more. The goal of the project is to enable developers of all kinds to automate repetitive writing tasks with a high degree of creative freedom.


WebAnno

  • WebAnno - a general purpose web-based annotation tool for a wide range of linguistic annotations including various layers of morphological, syntactical, and semantic annotations. Additionally, custom annotation layers can be defined, allowing WebAnno to be used also for non-linguistic annotation tasks. WebAnno is a multi-user tool supporting different roles such as annotator, curator, and project manager. The progress and quality of annotation projects can be monitored and measured in terms of inter-annotator agreement. Multiple annotation projects can be conducted in parallel.
    • https://github.com/webanno/webanno - The official WebAnno repository has reached the end of the line. -- To migrate, export your annotation projects from WebAnno, then import them into INCEpTION and just work on.

INCEpTION

  • INCEpTION - A semantic annotation platform offering intelligent assistance and knowledge management. The annotation of specific semantic phenomena often requires compiling task-specific corpora and creating or extending task-specific knowledge bases. Presently, researchers require a broad range of skills and tools to address such semantic annotation tasks. In the recently funded INCEpTION project, UKP Lab at TU Darmstadt aims towards building an annotation platform that incorporates all the related tasks into a joint web-based platform.


Spelling

  • Hunspell - the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, Google Chrome, and it is also used by proprietary software packages, like Mac OS X, InDesign, memoQ, Opera and SDL Trados.

Other

Smiley











Fiction












Interactive

  • https://en.wikipedia.org/wiki/Ludonarrative - a compound of ludology and narrative, refers to the intersection in a video game of ludic elements – or gameplay – and narrative elements. It is commonly used in the term Ludonarrative dissonance which refers to conflicts between a video game's narrative and its gameplay. The term was coined by Clint Hocking, a former creative director at LucasArts (then at Ubisoft), on his blog in October, 2007. Hocking coined the term in response to the game BioShock, which according to him promotes the theme of self-interest through its gameplay while promoting the opposing theme of selflessness through its narrative, creating a violation of aesthetic distance that often pulls the player out of the game. Video game theorist Tom Bissell, in his book Extra Lives: Why Video Games Matter (2010), notes the example of Call of Duty 4: Modern Warfare, where a player can all but kill their digital partner during gameplay without upsetting the built-in narrative of the game.


Naming conventions