{{menu}}
 
== Recognition ==

* http://en.wikipedia.org/wiki/Speech_recognition_in_Linux

* https://github.com/Picovoice/stt-benchmark [https://news.ycombinator.com/item?id=17703546]

* http://www.honeytechblog.com/12-useful-speech-recognition-applications-available-for-linux/

* http://linuxpoison.blogspot.co.uk/2009/04/voice-control-your-ubuntu-desktop.html

* http://cmusphinx.sourceforge.net/
** http://cmusphinx.sourceforge.net/wiki/
** https://github.com/syl22-00/pocketsphinx.js

* http://chrislord.net/index.php/2016/06/01/open-source-speech-recognition/ [https://news.ycombinator.com/item?id=11820490]

* [http://antiboredom.github.io/audiogrep/ Audiogrep] - transcribes audio files and then creates "audio supercuts" based on search phrases. It uses CMU Pocketsphinx for speech-to-text and pydub to stitch things together. [https://news.ycombinator.com/item?id=9159115]

* [http://simon.kde.org/ Simon] - an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect.

* [http://www.jezra.net/projects/blather Blather] is a speech recognizer that will run commands when a user speaks preset sentences.
** [https://www.youtube.com/watch?v=gr1FZ2F7KYA Intro to Blather: Speech Recognition for Linux]
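
Blather's approach is easy to picture: the recognizer produces a hypothesis string, and if it exactly matches a preset sentence, the mapped command runs. A minimal stand-alone sketch of that dispatch step (the phrase table and helper names here are illustrative, not Blather's actual configuration format):

```python
# Sketch of Blather's core idea: map preset spoken sentences to commands.
# The recognizer itself (PocketSphinx/GStreamer in Blather) is out of scope;
# COMMANDS and dispatch() are illustrative, not Blather's API.
import shlex
import subprocess

COMMANDS = {
    "open terminal": "xterm",
    "lock screen": "xdg-screensaver lock",
}

def dispatch(hypothesis, table=COMMANDS, dry_run=True):
    """Run the command mapped to a recognized sentence, if any."""
    key = hypothesis.strip().lower()
    cmd = table.get(key)
    if cmd is None:
        return None            # not a preset sentence: ignore it
    if dry_run:
        return cmd             # just report what would run
    return subprocess.Popen(shlex.split(cmd))

print(dispatch("Open Terminal"))   # -> xterm
print(dispatch("hello world"))     # -> None
```

Exact matching keeps false triggers rare, which is why this style of recognizer works well with a small, fixed grammar.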

* [http://sp-tk.sourceforge.net/ Speech Signal Processing Toolkit (SPTK)] - a suite of speech signal processing tools for UNIX environments: LPC analysis, PARCOR analysis, LSP analysis, PARCOR and LSP synthesis filters, vector quantization techniques, and extended versions of these. Released under the Modified BSD license. SPTK was developed and used in the research group of Prof. Satoshi Imai (retired) and Prof. Takao Kobayashi (Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology) at the P&I laboratory, Tokyo Institute of Technology. A subset of the tools was chosen and arranged for distribution by Prof. Keiichi Tokuda (Department of Computer Science and Engineering, Nagoya Institute of Technology) as coordinator, in cooperation with other collaborators (see "Acknowledgments" and "Who we are" in the README). The original source code was written by many members of the research group, chiefly Takao Kobayashi (graphing, data processing, FFT, sampling rate conversion, etc.), Keiichi Tokuda (speech analysis, speech synthesis, etc.), and Kazuhito Koishida (LSP, vector quantization, etc.).
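
One of the techniques SPTK bundles, vector quantization, is simple to illustrate: each feature vector is replaced by the index of its nearest codeword. A toy sketch (the codebook here is hand-made; SPTK's own tools train codebooks from data):

```python
# Toy vector quantization: encode each frame as the index of the
# nearest codeword by Euclidean distance.
import math

def vq_encode(vectors, codebook):
    """Map each vector to the index of its nearest codeword."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [min(range(len(codebook)), key=lambda i: dist(v, codebook[i]))
            for v in vectors]

codebook = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (2.5, 0.3)]
print(vq_encode(frames, codebook))  # -> [0, 1, 2]
```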

* [https://sites.google.com/site/speechrate/ speechrate] - software for the analysis of speech. The site provides a Praat script that automatically detects syllable nuclei in order to measure speech rate without needing a transcription: peaks in intensity (dB) that are preceded and followed by dips in intensity are treated as potential syllable nuclei, and peaks that are not voiced are then discarded.
** [http://www.fon.hum.uva.nl/praat/ Praat: doing Phonetics by Computer]
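
The peak-picking rule described above can be sketched outside Praat in a few lines. The dip threshold and the toy contour below are made up for illustration; the real script works on actual intensity contours and pitch tracks:

```python
# Sketch of the speechrate idea: count intensity peaks flanked by dips,
# keeping only voiced peaks. Thresholds and data here are illustrative.

def count_syllable_nuclei(intensity_db, voiced, dip_db=2.0):
    """Count voiced local intensity maxima flanked by dips >= dip_db."""
    nuclei = 0
    for i in range(1, len(intensity_db) - 1):
        peak = intensity_db[i]
        if peak <= intensity_db[i - 1] or peak <= intensity_db[i + 1]:
            continue  # not a local maximum
        left_dip = peak - min(intensity_db[:i]) >= dip_db
        right_dip = peak - min(intensity_db[i + 1:]) >= dip_db
        if left_dip and right_dip and voiced[i]:
            nuclei += 1
    return nuclei

# two clear voiced peaks plus one unvoiced peak
db = [50, 60, 50, 62, 50, 61, 50]
vcd = [True, True, True, False, True, True, True]
print(count_syllable_nuclei(db, vcd))  # -> 2
```

Dividing the nucleus count by the utterance duration then gives the speech rate in syllables per second.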

* [http://kaldi-asr.org/ Kaldi Speech Recognition Toolkit] [https://news.ycombinator.com/item?id=12773218]
** https://github.com/kaldi-asr/kaldi
** https://github.com/alumae/kaldi-gstreamer-server

* https://github.com/rob-mccann/Pi-Voice - The beginnings of a Star Trek-like computer. Run the program, speak into your microphone and hear the response from your speakers.

* https://github.com/benoitfragit/pocketVox
** http://cmusphinx.sourceforge.net/2014/11/pocketvox-is-listening-you/

* [https://wit.ai/ Wit.ai] - makes it easy for developers to build applications and devices that you can talk or text to. Our vision is to empower developers with an open and extensible natural language platform. Wit.ai learns human language from every interaction, and leverages the community: what’s learned is shared across developers.

* https://github.com/julius-speech/julius - Open-Source Large Vocabulary Continuous Speech Recognition Engine

* https://cloud.google.com/speech/ [https://news.ycombinator.com/item?id=11347872]

* DeepMind: [https://deepmind.com/blog/wavenet-generative-model-raw-audio/ WaveNet: A Generative Model for Raw Audio]
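
WaveNet predicts one audio sample at a time over a discrete alphabet; the paper compands 16-bit audio down to 256 classes with mu-law before modelling. That companding step, sketched with the standard mu-law formulas:

```python
# Mu-law companding as used to quantize raw audio for WaveNet-style models:
# squash amplitude logarithmically, then discretize to mu+1 classes.
import math

MU = 255  # 8-bit mu-law, as in the WaveNet paper

def mulaw_encode(x, mu=MU):
    """Map a sample x in [-1, 1] to an integer class in [0, mu]."""
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))

def mulaw_decode(c, mu=MU):
    """Approximate inverse: class index back to a sample in [-1, 1]."""
    y = 2 * c / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

c = mulaw_encode(0.5)
print(c, round(mulaw_decode(c), 3))  # class index, then its reconstruction
```

The logarithmic spacing gives fine resolution near zero, where speech spends most of its energy, which is why 256 classes suffice.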

* https://github.com/buriburisuri/speech-to-text-wavenet [https://news.ycombinator.com/item?id=13037063]

* https://github.com/alexa-pi/AlexaPi

* https://distill.pub/2017/ctc/
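
The Distill article linked above explains CTC training and decoding; the collapsing rule at its heart (merge repeated labels, then delete blanks) is small enough to sketch directly:

```python
# CTC path collapsing: merge repeated symbols, then drop blanks.
# "-" stands in for the blank token; real systems use a reserved index.

BLANK = "-"

def ctc_collapse(path, blank=BLANK):
    """Collapse a frame-level CTC path to an output string."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)

print(ctc_collapse("hh-e-ll-ll-oo"))  # -> hello
```

Note the role of the blank: "ll-ll" collapses to "ll", so doubled letters remain expressible.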

* https://nicholas.carlini.com/code/audio_adversarial_examples [https://news.ycombinator.com/item?id=16255680]

* [http://kaldi-asr.org/doc/ Kaldi] - a toolkit for speech recognition, intended for use by speech recognition researchers and professionals.
** https://github.com/kaldi-asr/kaldi

* https://github.com/espnet/espnet - an end-to-end speech processing toolkit that focuses mainly on end-to-end speech recognition and text-to-speech. ESPnet uses Chainer and PyTorch as its main deep learning engines, and also follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments.

=== Other ===

* [http://jasperproject.github.io/ Jasper] is an open source platform for developing always-on, voice-controlled applications [https://news.ycombinator.com/item?id=7546858]
** https://github.com/jasperproject/jasper-client
** http://hackaday.com/2014/04/09/create-your-own-j-a-r-v-i-s-using-jasper/

* http://www.acapela-group.com/

* https://github.com/fredvs/sak - Speecher Assistive Kit. sak uses the PortAudio and eSpeak open source libraries.

=== Analysis ===

==== ESPS ====

* http://www.speech.cs.cmu.edu/comp.speech/Section1/Labs/esps.html

* [http://www.phon.ox.ac.uk/releases ESPS] - '''Entropic Signal Processing System''', a package of UNIX-like commands and programming libraries for speech signal processing. As a commercial product of Entropic Research Laboratory, Inc., it became extremely widely used in phonetics and speech technology research laboratories in the 1990s, thanks to the wide range of functions it offered, such as get_f0 (fundamental frequency estimation), formant (formant frequency measurement), the xwaves graphical user interface, and many other commands and utilities. Following the acquisition of Entropic by Microsoft in 1999, Microsoft and AT&T licensed ESPS to the Centre for Speech Technology at KTH, Sweden, so that a final legacy version of the ESPS source code could continue to be made available to speech researchers. At KTH, code from the ESPS library (such as get_f0) was incorporated by Kåre Sjölander and Jonas Beskow into the Wavesurfer speech analysis tool, a good alternative way to use many ESPS functions if you want a graphical user interface rather than scripting.
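
ESPS's get_f0 does much more (candidate generation plus dynamic programming over voicing decisions), but the basic idea behind autocorrelation pitch estimation can be sketched in a few lines: pick the lag at which the frame best correlates with a shifted copy of itself.

```python
# Crude autocorrelation F0 estimate for a single voiced frame.
# Real trackers (like get_f0) add normalization, voicing decisions,
# and tracking across frames; this only shows the core lag search.
import math

def estimate_f0(samples, sr, fmin=80.0, fmax=400.0):
    """Estimate F0 (Hz) by maximizing autocorrelation over candidate lags."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    def acf(lag):
        return sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
    best = max(range(lo, hi + 1), key=acf)
    return sr / best

sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(round(estimate_f0(tone, sr)))  # -> 200
```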

* https://github.com/jeremysalwen/ESPS - This archive contains source files from the ESPS toolkit.

==== NICO toolkit ====

* [http://nico.nikkostrom.com NICO toolkit] - a general purpose toolkit for constructing artificial neural networks and training them with the back-propagation learning algorithm, mainly intended for, and originally developed for, speech recognition applications. The network topology is very flexible: units are organized in groups, and a group is a hierarchical structure, so groups can have sub-groups or other objects as members. This makes it easy to specify multi-layer networks with arbitrary connection structure and to build modular networks.

==== Speech Research Tools ====

* https://sourceforge.net/projects/speechresearch - Software for speech research. It includes programs and libraries for signal processing, along with general purpose scientific libraries. Most of the code is in Python, with C/C++ supporting code. Also contains code releases corresponding to published work.

==== HAT ====

* [http://www.speech.kth.se/hat/ Higgins Annotation Tool] - can be used to transcribe and annotate speech with one or more audio tracks (such as dialogue). Windows.

== Synthesis ==
 
* http://en.wikipedia.org/wiki/Speech_synthesis

=== SAM ===

* https://en.wikipedia.org/wiki/Software_Automatic_Mouth - or SAM, is a speech synthesis program developed and sold by Don’t Ask Software. The program was released for the Apple II, Lisa, Atari 8-bit family, and Commodore 64.

=== rsynth ===

* [http://rsynth.sourceforge.net/ rsynth] - Text-to-Speech.
** https://github.com/rhdunn/rsynth - fork

=== Festival ===

* [http://www.cstr.ed.ac.uk/projects/festival/ Festival] - or '''The Festival Speech Synthesis System''', offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and via an Emacs interface. Festival is multi-lingual (currently British and American English, and Spanish), though English is the most advanced. Other groups release new languages for the system.

=== Rocaloid ===

* [http://rocaloid.github.io/index_en.html Rocaloid] - a free, open-source singing voice synthesis system. Its ultimate goal is to quickly synthesize natural, flexible and multi-lingual vocal parts. Like other vocal synthesis software, after installing a vocal database and inputting lyrics and pitch, you can synthesize attractive vocal parts. Beyond that, Rocaloid focuses on providing more controllable parameters, letting you fine-tune many dimensions of the synthesized voice and export with better quality. With a fully constructed Rocaloid database, you can synthesize singing in any phonetic-based language.
** https://github.com/Rocaloid - dead?
  
=== Festvox ===

* [http://festvox.org/ Festvox] - aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Specifically offered: documentation, including scripts explaining the background and specifics of building new voices for speech synthesis in new and supported languages; example speech databases to help build new voices; and links, demos and a repository for new voices. This work is firmly grounded within Edinburgh University's Festival Speech Synthesis System and Carnegie Mellon University's small-footprint Flite synthesis engine.
  
=== MaryTTS ===

* [http://mary.dfki.de/ MaryTTS] is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.
  
=== eSpeak ===

* [http://espeak.sourceforge.net/ eSpeak] - a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

* https://github.com/divVerent/ecantorix - a singing synthesis frontend for espeak. It works by using espeak to generate raw speech samples, then adjusting their pitch and length, and finally creating an LMMS project file referencing the samples in sync with the input file.
  
=== OpenSource SpeechSynth ===

* http://web.media.mit.edu/~stefanm/osss/
  
  
=== MBROLA ===

* http://tcts.fpms.ac.be/synthesis/ - The MBROLA Project
** http://en.wikipedia.org/wiki/MBROLA
  
  
=== Assistive Context-Aware Toolkit ===

* [https://01.org/acat Assistive Context-Aware Toolkit (ACAT)] - an open source platform developed at Intel Labs to enable people with motor neuron diseases and other disabilities to have full access to the capabilities and applications of their computers through very constrained interfaces suitable for their condition. More specifically, ACAT enables users to easily communicate with others through keyboard simulation, word prediction and speech synthesis. Users can perform a range of tasks such as editing, managing documents, navigating the Web and accessing emails. ACAT was originally developed by researchers at Intel Labs for Professor Stephen Hawking, through a very iterative design process over the course of three years.
** http://blogs.msdn.com/b/cdndevs/archive/2015/08/14/intel-just-open-sourced-stephen-hawking-s-speech-system-and-it-s-a-net-4-5-winforms-app.aspx [https://news.ycombinator.com/item?id=10072188]
  
  
=== Praat ===

* [http://www.fon.hum.uva.nl/praat/ Praat] - doing phonetics by computer
  
  
=== Gnuspeech ===

* [https://www.gnu.org/software/gnuspeech gnuspeech] - makes it easy to produce high quality computer speech output, design new language databases, and create controlled speech stimuli for psychophysical experiments. gnuspeechsa is a cross-platform module of gnuspeech that allows command line, or application-based speech output. The software has been released as two tarballs that are available in the project Downloads area of http://savannah.gnu.org/projects/gnuspeech. [https://news.ycombinator.com/item?id=10421776]
  
  
=== Project Merlin ===

* [http://projectmeilin.github.io/en/ Project Merlin] - a project aiming to build a truly free virtual singer, open to all kinds of ideas. [https://linuxmusicians.com/viewtopic.php?f=48&t=15540]
** https://github.com/ProjectMeilin - not fully open yet?

=== UTAU ===

* https://en.wikipedia.org/wiki/Utau - a Japanese singing synthesizer application created by Ameya/Ayame. This program is similar to the Vocaloid software, with the difference that it is shareware instead of being released under third party licensing.

=== Sinsy ===

* [http://www.sinsy.jp/ Sinsy] - HMM-based Singing Voice Synthesis System
** http://sinsy.sourceforge.net

* https://github.com/hyperzlib/Sinsy-Remix - The HMM-Based Singing Voice Synthesis System Remix "Sinsy-r"
  
=== Mozilla TTS ===

* https://github.com/mozilla/TTS - Deep learning for Text to Speech
  
=== CMU Flite ===

* [http://www.festvox.org/flite/ CMU Flite] - a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text to speech synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
** https://github.com/festvox/flite
  
=== mesing ===

* https://github.com/usdivad/mesing
  
  
=== Adobe VoCo ===

* https://en.wikipedia.org/wiki/Adobe_Voco
* https://arstechnica.com/information-technology/2016/11/adobe-voco-photoshop-for-audio-speech-editing/ [https://news.ycombinator.com/item?id=12892063]

=== VST Speek ===

* http://blog.wavosaur.com/text-to-speech-vst-vst-speek/
** [http://blog.wavosaur.com/vst-speek-beta-for-linux/ VST Speek beta for Linux]
  
  
=== char2wav ===

* https://mila.umontreal.ca/en/publication/char2wav-end-to-end-speech-synthesis/ [https://news.ycombinator.com/item?id=13702243]
  
  
=== loop ===

* https://github.com/facebookresearch/loop
  
  
=== IPOX ===

* [http://www.phon.ox.ac.uk/ipox IPOX] - an experimental, all-prosodic speech synthesizer, developed many years ago by Arthur Dirksen and John Coleman. It is still available for download, and was designed to run on a 486 PC running Windows 3.1 or higher with a 16-bit Windows-compatible sound card, such as the Soundblaster 16. It still seems to run on e.g. XP, but has not been tried on Vista.
  
  
=== NPSS ===

* [http://www.dtic.upf.edu/~mblaauw/NPSS/ Neural Parametric Singing Synthesizer]

* https://github.com/seaniezhao/torch_npss - PyTorch implementation of the Neural Parametric Singing Synthesizer (singing voice synthesis)
  
=== Pink Trombone ===

* [https://dood.al/pinktrombone/ Pink Trombone] - Bare-handed procedural speech synthesis, version 1.1, March 2017, by Neil Thapen
** https://github.com/giuliomoro/pink-trombone
  
=== Klatter ===

* https://github.com/fundamental/klatter - a bare-bones formant synthesizer based upon the description given in the 1979 paper "Software For a Cascade/Parallel Formant Synthesizer" by Dennis Klatt. The program was not designed for interactive use, though there is code for some minimal MIDI control. In its current state it is enough of a curiosity that it will be preserved, though it may not see much, if any, use.
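
The Klatt 1979 design that klatter follows passes a glottal source through a cascade of second-order resonators, one per formant. A sketch of one such digital resonator, using the coefficient formulas from the paper (the formant and bandwidth values below are rough illustrative choices, not klatter's defaults):

```python
# One Klatt-style digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2],
# with coefficients set from a center frequency and bandwidth (Klatt 1979).
import math

def resonator_coeffs(f, bw, sr):
    """Two-pole resonator at center f (Hz) with bandwidth bw (Hz)."""
    c = -math.exp(-2 * math.pi * bw / sr)
    b = 2 * math.exp(-math.pi * bw / sr) * math.cos(2 * math.pi * f / sr)
    a = 1 - b - c  # unity gain at DC
    return a, b, c

def resonate(x, f, bw, sr):
    """Filter the sequence x through one resonator."""
    a, b, c = resonator_coeffs(f, bw, sr)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

sr = 8000
# 100 Hz impulse-train source through two formants (rough /a/-like values)
source = [1.0 if n % (sr // 100) == 0 else 0.0 for n in range(1600)]
y = resonate(source, 700.0, 130.0, sr)   # F1
y = resonate(y, 1220.0, 70.0, sr)        # F2
```

Cascading more resonators (F3, F4, ...) and shaping the source waveform is essentially what the full cascade branch of the Klatt synthesizer does.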
  
  
=== Tacotron 2 ===

* https://github.com/Rayhane-mamah/Tacotron-2 - TensorFlow implementation of Google's Tacotron 2

=== Real-Time-Voice-Cloning ===

* https://github.com/CorentinJ/Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
  
=== lessampler ===

* https://github.com/GloomyGhost-MosquitoSeal/lessampler - a singing voice synthesizer
  
=== Neural Parametric Singing Synthesizer ===

* [https://mtg.github.io/singing-synthesis-demos/ A Neural Parametric Singing Synthesizer]
  
=== VoiceOfFaust ===

* https://github.com/magnetophon/VoiceOfFaust - Turn your voice into a synthesizer!
Revision as of 00:35, 2 January 2020

Recognition ==






  • Audiogrep - transcribes audio files and then creates "audio supercuts" based on search phrases. It uses CMU Pocketsphinx for speech-to-text and pydub to stitch things together. [3]


  • Simon - an open source speech recognition program that can replace your mouse and keyboard. The system is designed to be as flexible as possible and will work with any language or dialect.



  • Speech Signal Processing Toolkit (SPTK) - a suite of speech signal processing tools for UNIX environments, e.g., LPC analysis, PARCOR analysis, LSP analysis, PARCOR synthesis filter, LSP synthesis filter, vector quantization techniques, and other extended versions of them. This software is released under the Modified BSD license.SPTK was developed and has been used in the research group of Prof. Satoshi Imai (he has retired) and Prof. Takao Kobayashi (currently he is with Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology) at P&I laboratory, Tokyo Institute of Technology. A sub-set of tools was chosen and arranged for distribution by Prof. Keiichi Tokuda (currently he is with Department of Computer Science and Engineering, Nagoya Institute of Technology) as a coordinator in cooperation and other collaborates (see "Acknowledgments" and "Who we are" in README).The original source codes have been written by many people who took part in activities of the research group. The most original source codes of this distribution were written by Takao Kobayashi (graph, data processing, FFT, sampling rate conversion, etc.), Keiichi Tokuda (speech analysis, speech synthesis, etc.), and Kazuhito Koishida (LSP, vector quantization, etc.).


  • speechrate - software for the analysis of speech. Below you will find a script that automatically detects syllable nuclei in order to measure speech rate without the need of a transcription. Peaks in intensity (dB) that are preceded and followed by dips in intensity are considered as potential syllable nuclei. The script subsequently discards peaks that are not voiced. On this page you find an example of how the script works.





  • Wit.ai - makes it easy for developers to build applications and devices that you can talk or text to. Our vision is to empower developers with an open and extensible natural language platform. Wit.ai learns human language from every interaction, and leverages the community: what’s learned is shared across developers.







  • https://github.com/espnet/espnet - an end-to-end speech processing toolkit, mainly focuses on end-to-end speech recognition and end-to-end text-to-speech. ESPnet uses chainer and pytorch as a main deep learning engine, and also follows Kaldi style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments.


Other


Analysis

ESPS

  • ESPS - Entropic Signal Processing System, is a package of UNIX-like commands and programming libraries for speech signal processing. As a commercial product of Entropic Research Laboratory, Inc, it became extremely widely used in phonetics and speech technology research laboratories in the 1990's, in view of the wide range of functions it offered, such as get_f0 (for fundamental frequency estimation), formant (for formant frequency measurement), the xwaves graphical user interface, and many other commands and utilities. Following the acquisition of Entropic by Microsoft in 1999, Microsoft and AT&T licensed ESPS to the Centre for Speech Technology at KTH, Sweden, so that a final legacy version of the ESPS source code could continue to be made available to speech researchers. At KTH, code from the ESPS library (such as get_f0) was incorporated by Kåre Sjölander and Jonas Beskow into the Wavesurfer speech analysis tool. This is a very good alternative way to use many ESPS functions if you want a graphical user interface rather than scripting.


NICO toolkit

  • NICO toolkit - mainly intended for, and originally developed for speech recognition applications, a general purpose toolkit for constructing artificial neural networks and training with the back-propagation learning algorithm. The network topology is very flexible. Units are organized in groups and the group is a hierarchical structure, so groups can have sub-groups or other objects as members. This makes it easy to specify multi-layer networks with arbitrary connection structure and to build modular networks.

Speech Research Tools

  • https://sourceforge.net/projects/speechresearch - Software for speech research. It includes programs and libraries for signal processing, along with general purpose scientific libraries. Most of the code is in Python, with C/C++ supporting code. Also, contains code releases corresponding to publishe


HAT

  • Higgins Annotation Tool - can be used to transcribe and annotate speech with one or more audio tracks (such as dialogue). Windows.


Synthesis

to sort/categorise



SAM

  • WP: Software_Automatic_Mouth - Software Automatic Mouth, or SAM, is a speech synthesis program developed and sold by Don’t Ask Software. It was released for the Apple II, Lisa, Atari 8-bit family, and Commodore 64.


rsynth


Festival

  • Festival - or The Festival Speech Synthesis System, offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from the shell level, through a Scheme command interpreter, as a C++ library, from Java, and via an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced. Other groups release new languages for the system.


Rocaloid

  • Rocaloid - a free, open-source singing voice synthesis system. Its ultimate goal is to quickly synthesize natural, flexible, and multi-lingual vocal parts. As with other vocal synthesis software, after installing a voice database you input lyrics and pitch to synthesize vocals. Beyond that, Rocaloid focuses on exposing more controllable parameters, giving fine-grained control over the synthesized voice and higher-quality export. With a fully constructed Rocaloid database, you can synthesize singing in any phonetic-based language.

Festvox

  • Festvox - aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Specifically it offers: documentation, including scripts explaining the background and specifics of building new voices for speech synthesis in new and supported languages; example speech databases to help in building new voices; and links, demos, and a repository for new voices. This work is firmly grounded in Edinburgh University's Festival Speech Synthesis System and Carnegie Mellon University's small-footprint Flite synthesis engine.


MaryTTS

  • MaryTTS - an open-source, multilingual text-to-speech synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.


eSpeak

  • eSpeak - a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.


  • https://github.com/divVerent/ecantorix - a singing synthesis frontend for espeak. It works by using espeak to generate raw speech samples, then adjusting their pitch and length, and finally creating an LMMS project file referencing the samples in sync with the input file.
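The simplest way to change a sample's pitch is to resample it, though that ties pitch and duration together, whereas a singing frontend like ecantorix must control them independently with more sophisticated processing. The `resample` helper below is a hypothetical illustration of the naive approach, not ecantorix's actual code:

```python
import math

def resample(samples, ratio):
    """Naively resample by linear interpolation; ratio > 1 raises the
    pitch but also shortens the clip by the same factor."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out

sr = 8000
# 100 ms of a 220 Hz tone as a stand-in for a speech sample.
tone = [math.sin(2 * math.pi * 220.0 * t / sr) for t in range(800)]
up = resample(tone, 2.0)  # one octave up, but half the duration
```

Shifting an octave up halves the length, which is exactly why pitch and length have to be adjusted as separate steps when fitting syllables to notes.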

OpenSource SpeechSynth


MBROLA


Assistive Context-Aware Toolkit


Praat

  • Praat - doing phonetics by computer


Gnuspeech

  • gnuspeech - makes it easy to produce high quality computer speech output, design new language databases, and create controlled speech stimuli for psychophysical experiments. gnuspeechsa is a cross-platform module of gnuspeech that allows command line, or application-based speech output. The software has been released as two tarballs that are available in the project Downloads area of http://savannah.gnu.org/projects/gnuspeech. [10]


Project Merlin


UTAU

  • WP: Utau - a Japanese singing synthesizer application created by Ameya/Ayame. The program is similar to the Vocaloid software, with the difference that it is distributed as shareware rather than under third-party licensing.




Sinsy



Mozilla TTS

CMU Flite

  • CMU Flite - a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text to speech synthesis engine to Festival for voices built using the FestVox suite of voice building tools.

mesing


Adobe VoCo


VST Speek


char2wav


loop


IPOX

  • IPOX - an experimental, all-prosodic speech synthesizer, developed many years ago by Arthur Dirksen and John Coleman. It is still available for downloading, and was designed to run on a 486 PC running Windows 3.1 or higher, with a 16-bit Windows-compatible sound card, such as the Soundblaster 16. It still seems to run on e.g. XP, but I haven't tried it on Vista.


NPSS


Pink Trombone

Klatter

  • https://github.com/fundamental/klatter - a bare-bones formant synthesizer based upon the description given in Dennis Klatt's 1979 paper "Software for a Cascade/Parallel Formant Synthesizer". The program was not designed for interactive use, though there is code for some minimal MIDI control. In its current state it is enough of a curiosity that it will be preserved, though it may not see much, if any, use.
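Klatt's paper models each formant as a second-order digital resonator and chains them in cascade. The coefficient formulas below follow that standard description, but the program structure, the `resonator` helper, and the rough /a/ formant values are illustrative assumptions rather than klatter's actual code:

```python
import math

def resonator(freq, bw, sr):
    """Second-order digital resonator, y[n] = A*x[n] + B*y[n-1] + C*y[n-2],
    with the coefficient formulas from Klatt-style formant synthesis."""
    T = 1.0 / sr
    C = -math.exp(-2 * math.pi * bw * T)
    B = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    A = 1 - B - C                       # unity gain at DC
    y1 = y2 = 0.0
    def step(x):
        nonlocal y1, y2
        y = A * x + B * y1 + C * y2
        y2, y1 = y1, y
        return y
    return step

sr = 8000
# Impulse-train glottal source at 100 Hz, 0.25 s long.
source = [1.0 if n % (sr // 100) == 0 else 0.0 for n in range(sr // 4)]
# Cascade of three resonators with rough formant values for the vowel /a/.
formants = [resonator(f, bw, sr) for f, bw in [(730, 60), (1090, 90), (2440, 120)]]
out = source
for r in formants:
    out = [r(x) for x in out]
```

The parallel branch of Klatt's design feeds the same source into the resonators side by side with per-formant amplitudes, which is what makes fricatives and nasals easier to model.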


Tacotron 2


Real-Time-Voice-Cloning

leesampler

Neural Parametric Singing Synthesizer

VoiceOfFaust