Synth vox
Synthesis
to sort/categorise
- https://archive.org/details/flexibleformants00lalw - Flexible formant synthesizer : a tool for improving speech production quality
- https://archive.org/details/flexiblehighqual00hsie - A flexible and high quality articulatory speech synthesizer
SAM
- https://en.wikipedia.org/wiki/Software_Automatic_Mouth - or SAM, is a speech synthesis program developed and sold by Don’t Ask Software. The program was released for the Apple II, Lisa, Atari 8-bit family, and Commodore 64.
rsynth
- rsynth - Text-to-Speech.
Festival
- Festival - or The Festival Speech Synthesis System, offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from the shell level, through a Scheme command interpreter, as a C++ library, from Java, and via an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced. Other groups release new languages for the system.
- Festvox - aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice.
Rocaloid
- Rocaloid - a free, open-source singing voice synthesis system. Its goal is to quickly synthesize natural, flexible, and multi-lingual vocal parts. As with other vocal synthesis software, after installing a vocal database and entering lyrics and pitch, you can synthesize vocal parts. Rocaloid additionally exposes more controllable parameters, giving fine-grained control over the synthesized voice and higher-quality export. With a fully constructed Rocaloid database, it can synthesize singing in any phonetic-based language.
- https://github.com/Rocaloid - dead?
Festvox
- Festvox - aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Specifically, it offers: documentation, including scripts explaining the background and specifics of building new voices for speech synthesis in new and supported languages; example speech databases to help build new voices; and links, demos and a repository for new voices. This work is firmly grounded within Edinburgh University's Festival Speech Synthesis System and Carnegie Mellon University's small-footprint Flite synthesis engine.
MaryTTS
- MaryTTS - an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.
- https://github.com/marytts/marytts-txt2wav - An example project demonstrating use of MaryTTS in a deliberately standalone application
- https://github.com/synesthesiam/marytts-txt2wav - Command-line utility for text to speech with MaryTTS
eSpeak
- eSpeak - a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.
- https://github.com/divVerent/ecantorix - a singing synthesis frontend for eSpeak. It works by using eSpeak to generate raw speech samples, then adjusting their pitch and length, and finally creating an LMMS project file referencing the samples in sync with the input file.
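eSpeak's command-line interface makes it easy to script. A minimal sketch of driving it from Python via subprocess; the `-v`, `-s`, and `-w` flags select voice, speed (words per minute), and an output WAV file, and the guard means nothing is invoked unless the binary is actually installed:

```python
import shutil
import subprocess

def espeak_cmd(text, wav_path, voice="en", speed=160):
    # -v: voice/language, -s: speed in words per minute,
    # -w: write a WAV file instead of playing audio
    return ["espeak", "-v", voice, "-s", str(speed), "-w", wav_path, text]

cmd = espeak_cmd("Hello world", "hello.wav")
if shutil.which("espeak"):  # only run if eSpeak is on PATH
    subprocess.run(cmd, check=True)
```

The same pattern works for `espeak -q --ipa "text"` to print phonemes instead of synthesizing audio.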
OpenSource SpeechSynth
MBROLA
- http://tcts.fpms.ac.be/synthesis/ - The MBROLA Project
Assistive Context-Aware Toolkit
- Assistive Context-Aware Toolkit (ACAT) - an open source platform developed at Intel Labs to enable people with motor neuron diseases and other disabilities to have full access to the capabilities and applications of their computers through very constrained interfaces suitable for their condition. More specifically, ACAT enables users to easily communicate with others through keyboard simulation, word prediction and speech synthesis. Users can perform a range of tasks such as editing, managing documents, navigating the Web and accessing emails. ACAT was originally developed by researchers at Intel Labs for Professor Stephen Hawking, through a very iterative design process over the course of three years.
Praat
- Praat - doing phonetics by computer
Gnuspeech
- gnuspeech - makes it easy to produce high quality computer speech output, design new language databases, and create controlled speech stimuli for psychophysical experiments. gnuspeechsa is a cross-platform module of gnuspeech that allows command line, or application-based speech output. The software has been released as two tarballs that are available in the project Downloads area of http://savannah.gnu.org/projects/gnuspeech. [2]
Project Merlin
- Project Merlin - A truly free virtual singer; ideas of all kinds are welcome. [3]
- https://github.com/ProjectMeilin - not fully open yet?
- http://www.cstr.ed.ac.uk/projects/merlin/
- http://ml.cs.yamanashi.ac.jp/world/english
- YouTube: 【UTAU】Honeyworks ママ ver.acoustic 【徵音梅林cover】
- YouTube: 【徴音梅林】Umbrella カバー
UTAU
- https://en.wikipedia.org/wiki/Utau - a Japanese singing synthesizer application created by Ameya/Ayame. This program is similar to the Vocaloid software, with the difference that it is shareware instead of being released under third party licensing
- https://github.com/stakira/OpenUtau - Open source UTAU editing environment.
Sinsy
- Sinsy - HMM-based Singing Voice Synthesis System
- Sinsy - Singing Voice Synthesizer - how to
- https://github.com/hyperzlib/Sinsy-Remix - The HMM-Based Singing Voice Synthesis System Remix "Sinsy-r"
- https://github.com/YuzukiTsuru/SingSyn - The HMM-Based Singing Voice Synthesis System "SingSyn" Base On Sinsy
- https://github.com/mathigatti/midi2voice - Python script that relies on the sinsy.jp website from the Nagoya Institute of Technology, which implements an HMM-based Singing Voice Synthesis System.
UTSU
qtau
- https://notabug.org/isengaara/qtau - Qt based UTAU clone also supports FESTIVAL and MBROLA voices
cadencii
- https://github.com/cadencii/cadencii - simple musical score editor for singing synthesis: VOCALOID, VOCALOID2, UTAU, STRAIGHT with UTAU, and AquesTone are available as synthesizer.
Mozilla TTS
- https://github.com/mozilla/TTS - Deep learning for Text to Speech
CMU Flite
- CMU Flite - a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text to speech synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
mesing
Adobe VoCo
- https://en.wikipedia.org/wiki/Adobe_Voco
- https://arstechnica.com/information-technology/2016/11/adobe-voco-photoshop-for-audio-speech-editing/ [4]
VST Speek
char2wav
loop
IPOX
- IPOX - an experimental, all-prosodic speech synthesizer, developed many years ago by Arthur Dirksen and John Coleman. It is still available for downloading, and was designed to run on a 486 PC running Windows 3.1 or higher, with a 16-bit Windows-compatible sound card, such as the Soundblaster 16. It still seems to run on e.g. XP, but I haven't tried it on Vista.
NPSS
- https://github.com/seaniezhao/torch_npss - PyTorch implementation of the Neural Parametric Singing Synthesizer (singing voice synthesis)
Pink Trombone
- Pink Trombone - Bare-handed procedural speech synthesis, version 1.1, March 2017, by Neil Thapen
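Pink Trombone's tract model is a one-dimensional digital waveguide: the vocal tract is treated as a chain of cylindrical segments, and sound scatters at each junction according to the change in cross-sectional area. A sketch of the standard Kelly-Lochbaum reflection coefficients that drive such a model (the area values below are made-up illustrative numbers, not taken from the project):

```python
def reflection_coeffs(areas):
    # Kelly-Lochbaum junction: k_i = (A_i - A_{i+1}) / (A_i + A_{i+1});
    # a widening tract (A_{i+1} > A_i) yields a negative reflection.
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]

# Toy area function from glottis to lips (cm^2), vaguely /a/-shaped.
areas = [0.6, 0.8, 1.0, 1.6, 2.6, 3.2]
ks = reflection_coeffs(areas)
```

In a full waveguide, forward- and backward-travelling pressure waves are mixed at each junction using these coefficients every sample.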
Klatter
- https://github.com/fundamental/klatter - a bare-bones formant synthesizer based upon the description given in the 1979 paper "Software for a Cascade/Parallel Formant Synthesizer" by Dennis Klatt. This program was not designed for interactive use, though there is code for some minimal MIDI control. In its current state, it is enough of a curiosity that it will be preserved, though it may not see much if any use.
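The resonator Klatt describes is a two-pole IIR filter, y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with coefficients derived from a formant's centre frequency and bandwidth. A minimal cascade sketch in that spirit (the formant frequencies/bandwidths are textbook-ish values for an /a/-like vowel, not numbers from the repo):

```python
import math

SR = 16000  # sample rate in Hz

def resonator(freq, bw):
    # Klatt (1979) two-pole resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]
    T = 1.0 / SR
    C = -math.exp(-2 * math.pi * bw * T)
    B = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    A = 1 - B - C  # normalizes gain to 1 at DC
    y1 = y2 = 0.0
    def f(x):
        nonlocal y1, y2
        y = A * x + B * y1 + C * y2
        y2, y1 = y1, y
        return y
    return f

def synth_vowel(f0=100.0, formants=((730, 90), (1090, 110), (2440, 170)), dur=0.3):
    period = int(SR / f0)
    chain = [resonator(f, bw) for f, bw in formants]
    out = []
    for n in range(int(dur * SR)):
        x = 1.0 if n % period == 0 else 0.0  # impulse-train glottal source
        for r in chain:                      # cascade the formant resonators
            x = r(x)
        out.append(x)
    return out

samples = synth_vowel()
```

A real Klatt synthesizer adds a shaped glottal pulse, parallel branches for fricatives, and time-varying parameters; this only shows the cascade idea.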
Tacotron 2
- https://github.com/Rayhane-mamah/Tacotron-2 - DeepMind's Tacotron-2 Tensorflow implementation
- https://github.com/NVIDIA/tacotron2 - Tacotron 2 - PyTorch implementation with faster-than-realtime inference
SqueezeWave
- https://github.com/tianrengao/SqueezeWave - Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs.
WaveGlow
- https://github.com/NVIDIA/waveglow - In our recent paper, we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
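Both WaveGlow and SqueezeWave consume mel-spectrograms. The mel scale itself is just a logarithmic warp of frequency; a sketch using the common HTK-style formula (the 80-band, 0-8 kHz range mirrors the usual Tacotron 2 / WaveGlow configuration, but check the repo's own hyperparameters):

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale, widely used when building mel filterbanks
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Centre frequencies for an 80-band mel filterbank over 0-8000 Hz:
# evenly spaced in mel, hence logarithmically spaced in Hz.
n_mels = 80
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
centres = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(1, n_mels + 1)]
```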
STT
- https://github.com/coqui-ai/STT - a deep learning toolkit for Speech-to-Text, battle-tested in research and production
Real-Time-Voice-Cloning
- https://github.com/CorentinJ/Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
rapping-neural-network
- https://github.com/robbiebarrat/rapping-neural-network - Rap song writing recurrent neural network trained on Kanye West's entire discography
yukarin
- https://github.com/Hiroshiba/yukarin - refactored training code for the first-stage model of Become Yukarin: Convert your voice to a favorite voice.
lessampler
- https://github.com/GloomyGhost-MosquitoSeal/lessampler - a Singing Voice Synthesizer
VoiceOfFaust
- https://github.com/magnetophon/VoiceOfFaust - Turn your voice into a synthesizer!
tomomibot
- https://github.com/adzialocha/tomomibot - Artificial intelligence bot for live voice improvisation.
Nanceloid
- https://github.com/MegaLoler/Nanceloid - a vocal synth using digital waveguides n stuff lol
Parakeet
- https://github.com/PaddlePaddle/Parakeet - PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, ClariNet, WaveNet, Deep Voice 3, Transformer TTS and FastSpeech)
PaddleSpeech
- https://github.com/PaddlePaddle/PaddleSpeech - A Speech Toolkit based on PaddlePaddle.
Flowtron
- https://github.com/NVIDIA/flowtron - an Autoregressive Flow-based Network for Text-to-Mel-spectrogram Synthesis
TransformerTTS
- https://github.com/as-ideas/TransformerTTS - Implementation of a non-autoregressive Transformer based neural network for text to speech.
TensorflowTTS
- https://github.com/dathudeptrai/TensorflowTTS - TensorflowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2
AutoSpeech
HiFi-GAN
- https://github.com/george-roussos/hifi-gan - a GAN-based model capable of generating high fidelity speech efficiently.
- https://github.com/rhasspy/hifi-gan - Version of HiFi-GAN designed to work with tacotron2-train and glow-tts-train
- https://github.com/rhasspy/tacotron2-train - An implementation of Tacotron2 designed to work with Gruut
Wave-U-net-TF2
- https://github.com/satvik-venkatesh/Wave-U-net-TF2 - implements the Wave-U-net architecture in TensorFlow 2
larynx
- https://github.com/rhasspy/larynx - End to end text to speech system using gruut and onnx
FastSpeech2
- https://github.com/ming024/FastSpeech2 - An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
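FastSpeech 2 is non-autoregressive: a duration predictor decides how many spectrogram frames each phoneme gets, and a "length regulator" expands the phoneme sequence accordingly before decoding. The expansion step itself is trivial; a conceptual sketch (phoneme symbols stand in for hidden-state vectors):

```python
def length_regulate(phonemes, durations):
    # Repeat each phoneme's representation for its predicted number of frames.
    out = []
    for ph, d in zip(phonemes, durations):
        out.extend([ph] * d)
    return out

# 4 phonemes expanded to 2 + 3 + 4 + 5 = 14 frames
frames = length_regulate(["HH", "AH", "L", "OW"], [2, 3, 4, 5])
```

Because the whole expanded sequence is known up front, the decoder can generate all frames in parallel, which is what makes these models fast.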
TensorVox
- https://github.com/ZDisket/TensorVox - an application designed to enable user-friendly and lightweight neural speech synthesis on the desktop, aimed at increasing accessibility to such technology. Powered by TensorflowTTS, it is written in pure C++/Qt, using the TensorFlow C API for interacting with the models. This way, inference can be performed without having to install gigabytes worth of pip libraries, just a 100 MB DLL.
Coqui TTS
- Coqui STT (sibling project) - an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers. Coqui STT has APIs for numerous languages (Python, C/C++, Java, JavaScript, .NET...), is supported on many platforms (Linux, macOS, Windows, ARM...), and is available on GitHub.
- https://github.com/coqui-ai/TTS - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
vits
- https://github.com/jaywalnut310/vits - VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
phonemizer
espeak-phonemizer
- https://github.com/rhasspy/espeak-phonemizer - Uses ctypes and libespeak-ng to transform text into IPA phonemes
nnsvs
- https://github.com/r9y9/nnsvs - Neural network-based singing voice synthesis library for research.
unagan
- https://github.com/ciaua/unagan - contains the code and samples for the paper "Unconditional Audio Generation with GAN and Cycle Regularization", accepted at INTERSPEECH 2020. The goal is to unconditionally generate singing voices, speech, and instrument sounds with a GAN. The model is implemented with PyTorch.
vocshape
- https://github.com/PaulBatchelor/vocshape - a very simple proof-of-concept musical instrument for Android that aims to demonstrate the sculptability of a simple articulatory synthesis physical model for vocal synthesis.
- YouTube: vocshape demo3
lexconvert
- https://github.com/ssb22/lexconvert - Convert phoneme codes and lexicon formats for English speech synths
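Converters like lexconvert are essentially lookup tables between phoneme alphabets. A toy fragment of an ARPAbet-to-IPA mapping (only a handful of symbols, and mapping AH to ə is a simplification; lexconvert's real tables are far more complete and cover many synth-specific formats):

```python
# Illustrative subset of an ARPAbet -> IPA table.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ə", "EH": "ɛ",
    "IY": "i", "UW": "u", "OW": "oʊ",
    "HH": "h", "L": "l", "S": "s",
}

def arpabet_to_ipa(phones):
    # Strip ARPAbet stress digits ("OW1" -> "OW") before lookup.
    return "".join(ARPABET_TO_IPA[p.rstrip("012")] for p in phones)

word = arpabet_to_ipa(["HH", "AH0", "L", "OW1"])  # "hello"
```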
HiFiSinger
- https://github.com/CODEJIN/HiFiSinger - an unofficial implementation of HiFiSinger. The algorithm is based on the following papers:
- Chen, J., Tan, X., Luan, J., Qin, T., & Liu, T. Y. (2020). HiFiSinger: Towards high-fidelity neural singing voice synthesis. arXiv preprint arXiv:2009.01776.
- Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., & Liu, T. Y. (2019). FastSpeech: Fast, robust and controllable text to speech. Advances in Neural Information Processing Systems, 32, 3171-3180.
- Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 (pp. 6199-6203). IEEE.
PortaSpeech
- https://github.com/keonlee9420/PortaSpeech - PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
KaraSinger
- https://github.com/jerrygood0703/KaraSinger - Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms (demo)
hifigan
- https://github.com/bshall/hifigan - A 16 kHz implementation of HiFi-GAN for soft-vc.
VOICEVOX
- VOICEVOX - free-to-use, medium-quality text-to-speech software
Comprehensive-Transformer-TTS
- https://github.com/keonlee9420/Comprehensive-Transformer-TTS - A non-autoregressive Transformer-based TTS, supporting a family of SOTA transformers with supervised and unsupervised duration modeling.
DiffGAN-TTS
- https://github.com/keonlee9420/DiffGAN-TTS - PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
DiffSinger
- https://github.com/MoonInTheRiver/DiffSinger - DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Conversion
See also Effects#Pitch shifting
crank
- https://github.com/k2kobayashi/crank - Non-parallel voice conversion based on vector-quantized variational autoencoder
MelGAN-VC
- https://github.com/marcoppasini/MelGAN-VC - Voice Conversion and Audio Style Transfer on arbitrarily long samples using spectrograms
World
- https://github.com/mmorise/World - free software for high-quality speech analysis, manipulation, and synthesis. It can estimate the fundamental frequency (F0), aperiodicity, and spectral envelope, and can resynthesize speech close to the input from the estimated parameters alone.
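F0 estimation, the first of WORLD's three analysis steps, can be illustrated with a naive autocorrelation picker. WORLD's actual estimators (DIO and Harvest) are considerably more robust; this only shows the underlying idea:

```python
import math

SR = 16000  # sample rate in Hz

def estimate_f0(x, fmin=80.0, fmax=400.0):
    # Naive autocorrelation pitch estimate: pick the lag with the highest
    # correlation within the plausible pitch-period range.
    lo, hi = int(SR / fmax), int(SR / fmin)
    best_lag, best = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if r > best:
            best, best_lag = r, lag
    return SR / best_lag

# Synthetic 220 Hz test tone; the estimate should land close to 220.
tone = [math.sin(2 * math.pi * 220 * n / SR) for n in range(1024)]
f0 = estimate_f0(tone)
```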
Scyclone
- https://github.com/Miralan/Scyclone - (Voice Conversion)
Shallow WaveNet Vocoder
- https://github.com/patrickltobing/shallow-wavenet - Shallow WaveNet Vocoder with Laplacian Distribution using Multiple Samples Output based on Linear Prediction / with Softmax Output
speech-resynthesis
- https://github.com/facebookresearch/speech-resynthesis - Implementation of the method described in the paper Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.