ML / AI
- YouTube: Neural Network Architectures
- A Course in Machine Learning - a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.
- https://github.com/iamtrask/Grokking-Deep-Learning - this repository accompanies the book "Grokking Deep Learning"
- https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research - These datasets are applied for machine learning (ML) research and have been cited in peer-reviewed academic journals.
- https://github.com/evanmiller/LLM-Reading-List - with an emphasis on inference and model compression.
- https://github.com/rasbt/python-machine-learning-book/blob/master/faq/difference-deep-and-normal-learning.md 
"In applications of "usual" machine learning, there is typically a strong focus on the feature engineering part; the model learned by an algorithm can only be so good as its input data. Of course, there must be sufficient discriminatory information in our dataset, however, the performance of machine learning algorithms can suffer substantially when the information is buried in meaningless features. The goal behind deep learning is to automatically learn the features from (somewhat) noisy data; it's about algorithms that do the feature engineering for us to provide deep neural network structures with meaningful information so that it can learn more effectively. We can think of deep learning as algorithms for automatic "feature engineering," or we could simply call them "feature detectors," which help us to overcome the vanishing gradient challenge and facilitate the learning in neural networks with many layers."
- NNdef - Java and XML based Neural Networks and Knowledge Modeling toolkit and library
- https://github.com/google/deepdream 
- https://github.com/graphific/DeepDreamVideo 
- Caffe - a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
- Torch - a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.
- https://github.com/karpathy/char-rnn - Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
- YouTube: Connections between physics and deep learning [https://news.ycombinator.com/item?id=13062139]
- Data Science Machine - an end-to-end software system that is able to automatically develop predictive models from relational data. The Machine was created by Max Kanter and Kalyan Veeramachaneni at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The system automates two of the most human-intensive components of a data science endeavor: feature engineering, and selection and tuning of the machine learning methods that build predictive models from those features. First, an algorithm called Deep Feature Synthesis automatically engineers features. Next, through an approach called Deep Mining, the Machine composes a generalized machine learning pipeline that includes dimensionality reduction methods, feature selection methods, clustering, and classifier design. Finally, it tunes the parameters through a Gaussian Copula Process.
- TensorFlow - an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.
- https://github.com/neo-ai/neo-ai-dlr - a compiler and runtime for machine learning models. The compiler optimizes machine learning models for various target hardware. The runtime executes the model on the target hardware. A stand-alone, light-weight and portable runtime for CNN and decision-tree models. Built on top of TVM and Treelite runtime, DLR provides simple and unified Python/C++ APIs for loading and running TVM/Treelite compiled models on a wide range of devices, including X86, TRT-enabled GPU and Arm devices.
- https://github.com/nihalpasham/fingerprinting_radios_w_ML - The key idea behind radio fingerprinting is to extract unique patterns (or features) and use them as signatures to identify devices (or more precisely ID a radio embedded within a device).
- NuPIC - the Numenta Platform for Intelligent Computing, comprises a set of learning algorithms that were first described in a white paper published by Numenta in 2009. The learning algorithms faithfully capture how layers of neurons in the neocortex learn.
- https://github.com/jadore801120/attention-is-all-you-need-pytorch - PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). A novel sequence-to-sequence framework that uses the self-attention mechanism instead of convolution operations or recurrent structures, and achieves state-of-the-art performance on the WMT 2014 English-to-German translation task. (2017/06/12)
- https://github.com/weihaox/awesome-neural-rendering - A collection of resources on neural rendering.
- https://github.com/apache/incubator-mxnet - a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.
- https://github.com/keras-team/keras - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
- Gradio - the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!
- https://github.com/Lightning-AI/stable-diffusion-deploy - Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.
- https://github.com/mlc-ai/web-stable-diffusion - Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
- https://github.com/AUTOMATIC1111/stable-diffusion-webui - A browser interface based on Gradio library for Stable Diffusion.
- https://github.com/vitoplantamura/OnnxStream - The challenge is to run Stable Diffusion, which includes a large transformer model with almost 1 billion parameters, on a Raspberry Pi Zero 2, which is a microcomputer with 512MB of RAM, without adding more swap space and without offloading intermediate results on disk. The recommended minimum RAM/VRAM for Stable Diffusion is typically 8GB. 
- Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | LMSYS Org - We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The cost of training Vicuna-13B is around $300. The code and weights, along with an online demo, are publicly available for non-commercial use.
- https://github.com/lm-sys/FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. 
- LangChain - a framework for developing applications powered by language models. It enables applications that are data-aware (connect a language model to other sources of data) and agentic (allow a language model to interact with its environment). The main value props of LangChain are components (abstractions for working with language models, along with a collection of implementations for each abstraction; components are modular and easy to use, whether you are using the rest of the LangChain framework or not) and off-the-shelf chains (a structured assembly of components for accomplishing specific higher-level tasks).
- https://github.com/microsoft/guidance - enables you to control modern language models more effectively and efficiently than traditional prompting or chaining. Guidance programs allow you to interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text. Simple output structures like Chain of Thought and its many variants (e.g., ART, Auto-CoT, etc.) have been shown to improve LLM performance. The advent of more powerful LLMs like GPT-4 allows for even richer structure, and guidance makes that structure easier and cheaper.
- https://github.com/normal-computing/outlines - allows you to control and diagnose interactions with LLMs more effectively. Modern language models are powerful and versatile, but the way they interface with existing systems can be very brittle, their outputs can be unreliable, and complex workflows (agents) can introduce a lot of error-prone code duplication. Outlines provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc. Outlines helps developers control text generation and produce predictable outputs that make the interaction with user code more robust. Its sampling-first approach allows one to diagnose issues with model-generated output more easily, and implement more robust generation methods such as self-consistency or DiVeRSe. Outlines is designed as a library that integrates well with the broader Python environment. Generation can be interleaved with control flow or custom function calls, and prompts can be imported from other modules or libraries.
- Distill — Latest articles about machine learning
- https://github.com/plaidml/plaidml - PlaidML is a framework for making deep learning work everywhere.
- Embeddings: What they are and why they matter - Embeddings are a really neat trick that often come wrapped in a pile of intimidating jargon. If you can make it through that jargon, they unlock powerful and exciting techniques that can be applied to all sorts of interesting problems. 
- OpenAI - a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely. The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field. 
- https://github.com/daveshap/OpenAI_Agent_Swarm - Hierarchical Autonomous Agent Swarm (HAAS). We have our first GPT Concierge. You can chat with this custom ChatGPT to figure out what's going on!
- https://github.com/deepseek-ai/DeepSeek-Coder - composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
- https://github.com/formulahendry/awesome-gpt - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLM, and more.
- https://github.com/lucidrains/PaLM-rlhf-pytorch - Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Maybe I'll add retrieval functionality too, à la RETRO
- https://github.com/peterw/Chat-with-Github-Repo - contains two Python scripts that demonstrate how to create a chatbot using Streamlit, OpenAI GPT-3.5-turbo, and Activeloop's Deep Lake.
- https://github.com/kyegomez/Sophia - a second-order clipped stochastic optimization algorithm that uses an inexpensive stochastic estimate of the diagonal of the Hessian as a pre-conditioner and a clipping mechanism to control the worst-case update size. It achieves better performance than Adam in terms of validation pre-training loss, total compute, and wall-clock time. By cutting model training cost in half, Sophia can help save millions if not billions of dollars in computational resources.
- https://github.com/jncraton/languagemodels - Python building blocks to explore large language models on any computer with 512MB of RAM
- https://github.com/huggingface/diffusers - Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.
- Minigpt-4 - Enhancing Vision-language Understanding with Advanced Large Language Models
- https://github.com/Maknee/minigpt4.cpp - Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
- https://github.com/evejourney/eve-reasoning - Welcome to the EVE Reasoning Engine repository. This engine enables EVE, the AI specialized in erotic chat, to have a working reasoning system for both general purpose reasoning and erotic discussions.
- https://github.com/philpax/ggml - Tensor library for machine learning. Note that this project is under active development. Some of the development is currently happening in the llama.cpp and whisper.cpp repos
- https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md - a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML. It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models.
- https://github.com/Yifan-Song793/RestGPT - An LLM-based autonomous agent controlling real-world applications via RESTful APIs 
- Petals - Run large language models at home, BitTorrent-style
- AI Horde - This is a crowdsourced distributed cluster of Image generation workers and text generation workers. If you like this service, consider joining the horde yourself!
- ArtBot - Create images with Stable Diffusion, utilizing the AI Horde - ArtBot is your gateway to experiment with the wonderful world of generative AI art using the power of the AI Horde, a distributed open source network of GPUs running Stable Diffusion.
- Why You (Probably) Don't Need to Fine-tune an LLM - Tidepool by Aquarium - Just to reiterate, fine-tuning (except in some rare cases) negates most of the resource-saving benefits from recent LLMs — the reasons that people are flocking to this technology in the first place. The biggest reason why NLP was hard to do before late 2022 was because you needed to collect data, label data, train models, host infra — and all that requires hiring an ML ops and eng team! Now with using LLMs out of the box, the startup cost is incredibly low. There are a whole bunch of orgs that never would have done NLP if not for LLMs making the bar so low. Is it worth investing your eng time into fine-tuning when state-of-the-art is advancing so quickly? Sure, you’ll have a slight competitive advantage if your model has better accuracy/quality — but will you still think so a few months later when other companies get the same boosted functionality with GPT-5, no effort required? This is why we recommend that you focus your attention on lighter-touch approaches like few-shot prompting and retrieval augmented generation (RAG).
- https://github.com/mindsdb/mindsdb - AI Virtual Database that empowers developers to connect any AI/ML model to any datasource. This includes relational and non-relational databases, data warehouses and SaaS applications. MindsDB offers two primary benefits to its users. Hook AI models to run automatically as new data is observed and plug the output into any of our integrations. Automate training and finetuning AI models from data contained in any of the 130+ datasources we support.
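
The Sophia entry above describes an update that preconditions with a diagonal Hessian estimate and clips the result to bound the worst-case step. A minimal sketch of one such step follows, assuming the momentum and the Hessian-diagonal estimate are already available as plain lists. The function name `sophia_style_step` and the parameters `gamma`, `eps`, and `bound` are hypothetical names chosen for illustration; this is not the code from the linked repository, and it omits how the momentum and Hessian estimates are actually computed.

```python
def clip(value, bound):
    """Clamp value to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def sophia_style_step(params, momentum, hessian_diag,
                      lr=0.1, gamma=1.0, eps=1e-12, bound=1.0):
    """Return new parameters after one clipped, preconditioned update.

    Assumption (illustrative only): each coordinate moves by
    lr * clip(m / max(gamma * h, eps), bound), so a tiny or noisy
    Hessian estimate can never produce a step larger than lr * bound.
    """
    new_params = []
    for p, m, h in zip(params, momentum, hessian_diag):
        preconditioned = m / max(gamma * h, eps)
        new_params.append(p - lr * clip(preconditioned, bound))
    return new_params


# First coordinate: well-conditioned, so the step is lr * (0.5 / 1.0).
# Second coordinate: near-zero curvature estimate, so the step is clipped to lr * 1.0.
updated = sophia_style_step([1.0, 1.0], [0.5, 10.0], [1.0, 0.001])
```

The second coordinate shows why the clipping matters: without it, dividing by a curvature estimate of 0.001 would scale the step by a factor of 10,000.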
GPT / ChatGPT
- https://github.com/AgentOps-AI/BestGPTs - Top ranked OpenAI GPTs, ranked and sorted by AgentOps
- https://github.com/lencx/ChatGPT - ChatGPT Desktop Application (Mac, Windows and Linux)
- https://github.com/asrul10/linux-command-gpt - Get Linux commands in natural language with the power of ChatGPT.
- https://github.com/TheAppleTucker/backend-GPT - We've built an entire Backend+Database powered by an LLM. It infers business logic based on the name of the API call and can persist a kilobyte of state!
- https://github.com/TheR1D/shell_gpt - A command-line interface (CLI) productivity tool powered by OpenAI's text-davinci-003 model that helps you accomplish your tasks faster and more efficiently.
- https://github.com/Nutlope/aicommits - A CLI that writes your git commit messages for you with AI
- https://github.com/fedenunez/tulp - Tulp is a command-line tool that can help you create and process piped content using the power of ChatGPT directly from the terminal.
- https://github.com/Nutlope/roomGPT - Upload a photo of your room to generate your dream room with AI.
- https://github.com/xtekky/gpt4free - decentralising the AI industry: free GPT-4/3.5 scripts through several reverse-engineered APIs (poe.com, phind.com, chat.openai.com, writesonic.com, sqlchat.ai, t3nsor.com, you.com, etc.)
- GPT-4 System Card [pdf | Hacker News]
- https://github.com/PromtEngineer/localGPT - Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
- LocalAI - the free, open source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that's compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs and generate images, audio, and more locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. No GPU required. Runs ggml, GPTQ, onnx, and TF compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many others
- https://en.wikipedia.org/wiki/PaLM - a 540 billion parameter transformer-based large language model developed by Google AI. Researchers also trained smaller versions of PaLM, 8 and 62 billion parameter models, to test the effects of model scale.
PaLM is capable of a wide range of tasks, including commonsense reasoning, arithmetic reasoning, joke explanation, code generation, and translation. When combined with chain-of-thought prompting, PaLM achieved significantly better performance on datasets requiring reasoning of multiple steps, such as word problems and logic-based questions.
- https://github.com/ggerganov/llama.cpp - Port of Facebook's LLaMA model in C/C++
- LlamaIndex - Data Framework for LLM Applications. LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. 
- https://github.com/jerryjliu/llama_index - LlamaIndex (GPT Index) is a data framework for your LLM applications
- Llama Hub - Connect custom data sources to your LLM with one or more of these plugins (via LlamaIndex or LangChain)
- https://github.com/jmorganca/ollama - Get up and running with Llama 2 and other large language models locally
- https://github.com/Mozilla-Ocho/llamafile - lets you distribute and run LLMs with a single file. Our goal is to make the "build once anywhere, run anywhere" dream come true for AI developers. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that lets you build apps for LLMs as a single-file artifact that runs locally on most PCs and servers.
- https://github.com/bentoml/OpenLLM - An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease. 
- https://github.com/h2oai/h2ogpt - a large language model (LLM) fine-tuning framework and chatbot UI with document question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is a fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.
- https://github.com/QwenLM/Qwen-7B/tree/main - The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud.
See also Generative#Neural net
- https://github.com/GuitarML/mldsp-papers - Collection of papers related to neural nets/machine learning for audio DSP.
- https://github.com/Yuan-ManX/audio-ai-agent - Audio AI Agent, including speech, music, sound effects, etc.
- https://github.com/fatchord/WaveRNN - Pytorch implementation of Deepmind's WaveRNN model from Efficient Neural Audio Synthesis
- https://github.com/haoheliu/audioldm_eval - This toolbox aims to unify audio generation model evaluation for easier future comparison.
- https://github.com/d3n7/riffusionDJ - Multichannel Looper/Feedback System for Riffusion
- https://github.com/jmoso13/jukebox-diffusion - relies heavily on work produced by OpenAI (Jukebox) and HarmonAI (Dance Diffusion); also big thanks to Flavio Schneider for his work creating the audio-diffusion repo I used for diffusion models. At its core Jukebox Diffusion is a hierarchical latent diffusion model. JBDiff uses the encoder & decoder layers of a Jukebox model to travel between audio space and multiple differently compressed latent spaces. At each of the three latent levels a Denoising U-Net Model is trained to iteratively denoise a normally distributed variable to sample vectors representing compressed audio. The final layer of JBDiff is a Dance Diffusion Denoising U-Net model, providing a bump in audio quality and transforming the mono output of Jukebox into final stereo audio.
- https://github.com/d3n7/GPT-4-To-MIDI - Text prompt to MIDI File using OpenAI's GPT-4. Now with polyphony and MIDI input!
- https://github.com/suno-ai/bark - a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.
- https://github.com/AIGC-Audio/AudioGPT - Understanding and Generating Speech, Music, Sound, and Talking Head
- AudioGPT - a Hugging Face Space by AIGC-Audio
- https://github.com/LAION-AI/CLAP - extract a latent representation of any given audio and text for your own model, or for different downstream tasks. All code officially accompanies the following paper, accepted by the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023): Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
- https://github.com/declare-lab/tango - Codes and Model of the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
- AudioLDM: Text-to-Audio Generation with Latent Diffusion Models - Speech Research - Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train latent diffusion models (LDMs) with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion.
- https://github.com/haoheliu/AudioLDM - AudioLDM: Generate speech, sound effects, music and beyond, with text.
- https://github.com/tuneflow/AudioLDM - Fork of https://github.com/haoheliu/AudioLDM as a TuneFlow Plugin
- Groove2Groove – One-shot music style transfer - Groove2Groove (Grv2Grv) is an AI system for music accompaniment style transfer. Given two MIDI files – a content input and a style input – it generates a new accompaniment for the first file in the style of the second one
- https://github.com/zhvng/open-musiclm - Implementation of MusicLM, a text to music model published by Google Research, with a few modifications.
- https://github.com/IgnacioIrigaray/AnalogAudioTapeDenoising - Dataset and model for analog audio tape denoising
- DAFx-23 - Neural tape - The sound of magnetic recording media, such as open reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices.
- https://github.com/01tot10/neural-tape-modeling - Neural Modeling of Magnetic Tape Recorders
- https://github.com/thhuang/SheetMusicAI - From sheet music to midi!
- https://github.com/bshall/urhythmic - Official repository for Rhythm Modeling for Voice Conversion. Abstract: Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic - an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody. Note: Urhythmic builds on soft speech units from our paper A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
- [2307.04686] VampNet: Music Generation via Masked Acoustic Token Modeling - We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
- [2307.04686] VampNet: Music Generation via Masked Acoustic Token Modeling - the supplemental material for the ISMIR 2023 submission “VampNet: Music Generation via Masked Acoustic Token Modeling”. Here, we showcase different use cases of VampNet as a music creation tool, and create musical loops and variations from short musical excerpts.
- Making Loops and Variations
- Mubert - Thousands of Staff-Picked Royalty-Free Music Tracks for Streaming, Videos, Podcasts, Commercial Use and Online Content. Human x AI Generative Music for your video content, podcasts and apps
- https://github.com/MubertAI/Mubert-Text-to-Music - A simple notebook demonstrating prompt-based music generation via Mubert API
- Audiocraft - a single-stop code base for all your generative audio needs: music, sound effects, and compression after training on raw audio signals.
- MusicGen - We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, while being conditioned on textual description or melodic features, allowing better controls over the generated output.
- https://github.com/YuanGongND/ssast - This repository contains the official implementation (in PyTorch) of the Self-Supervised Audio Spectrogram Transformer (SSAST) proposed in the AAAI 2022 paper SSAST: Self-Supervised Audio Spectrogram Transformer (Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass; MIT CSAIL).
SSAST is the first patch-based joint discriminative and generative self-supervised learning framework, and also the first self-supervised learning framework for AST. SSAST significantly boosts AST performance on all downstream tasks we evaluated with an average improvement of 60.9%, leading to similar or even better results than a supervised pretrained AST. SSAST can be used as a drop-in replacement of previous ImageNet (supervised) pretrained AST, and has the advantage of 1) no labeled data is used; 2) flexible patch size and shape, whereas ImageNet pretraining only supports square patches; and 3) better performance on many tasks, in particular speech tasks.
- https://github.com/Stability-AI/stable-audio-tools - Training and inference code for audio generation models
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis - Abstract: Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time-domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in redundant and computationally-intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception, and benefitting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that addresses the key challenges of modeling spectral coefficients. Vocos demonstrates improved computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. As shown by objective evaluation, Vocos not only matches state-of-the-art audio quality, but thanks to a frequency-aware generator, also effectively mitigates the periodicity issues frequently associated with time-domain GANs. The source code and model weights have been open-sourced at https://github.com/charactr-platform/vocos.
- https://github.com/charactr-platform/vocos - Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
- https://github.com/facebookresearch/encodec - State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
- https://github.com/saurabh-kataria/awesome-bandwidth-expansion - an attempt to list interesting audio bandwidth expansion/super-resolution research works.
- https://github.com/samim23/polymath - uses machine learning to convert any music library (e.g. from a hard drive or YouTube) into a music production sample library. The tool automatically separates songs into stems (beats, bass, etc.), quantizes them to the same tempo and beat-grid (e.g. 120bpm), analyzes musical structure (e.g. verse, chorus, etc.), key (e.g. C4, E3, etc.) and other information (timbre, loudness, etc.), and converts audio to MIDI. The result is a searchable sample library that streamlines the workflow for music producers, DJs, and ML audio developers.
- AIcrowd | Sound Demixing Challenge 2023 | Challenges
- https://github.com/aim-qmul/sdx23-aimless - AIMLESS (Artificial Intelligence and Music League for Effective Source Separation) is a special interest group in audio source separation at C4DM, consisting of PhD students from the AIM CDT program. This repository is adapted from Danna-Sep and contains our training code for the SDX Sound Demixing Challenge.
- https://github.com/Audio-AGI/AudioSep - This repository contains the official implementation of "Separate Anything You Describe". We introduce AudioSep, a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks such as audio event separation, musical instrument separation, and speech enhancement.
- https://github.com/xxlong0/Wonder3D - A cross-domain diffusion model for 3D reconstruction from a single image
- https://replicate.com/fofr/sdxl-emoji/examples#v3vwup3bommw264amkfoxhpvie - An SDXL fine-tune based on Apple Emojis
- https://github.com/yuanzhi-zhu/DiffPIR - contains the code and data associated with the paper "Denoising Diffusion Models for Plug-and-Play Image Restoration", which was presented at the CVPR workshop NTIRE 2023. This code is based on the OpenAI Guided Diffusion and DPIR.
- https://github.com/lllyasviel/Fooocus - an image generating software (based on Gradio).
Fooocus is a rethinking of Stable Diffusion and Midjourney’s designs: Learned from Stable Diffusion, the software is offline, open source, and free. Learned from Midjourney, the manual tweaking is not needed, and users only need to focus on the prompts and images.
- https://github.com/threestudio-project/threestudio - a unified framework for 3D content creation from text prompts, single images, and few-shot images, by lifting 2D text-to-image generation models.
- https://github.com/NVIDIA/vid2vid - PyTorch implementation for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation. It can be used for turning semantic label maps into photo-realistic videos, synthesizing people talking from edge maps, or generating human motions from poses. The core of video-to-video translation is image-to-image translation. Some of our work in that space can be found in pix2pixHD and SPADE.
- https://github.com/damo-vilab/videocomposer - Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
- https://github.com/AILab-CVC/VideoCrafter - VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
- [2007.15153] Fast, Structured Clinical Documentation via Contextual Autocomplete - We present a system that uses a learned autocompletion mechanism to facilitate rapid creation of semi-structured clinical documentation. We dynamically suggest relevant clinical concepts as a doctor drafts a note by leveraging features from both unstructured and structured medical data. By constraining our architecture to shallow neural networks, we are able to make these suggestions in real time. Furthermore, as our algorithm is used to write a note, we can automatically annotate the documentation with clean labels of clinical concepts drawn from medical vocabularies, making notes more structured and readable for physicians, patients, and future algorithms. To our knowledge, this system is the only machine learning-based documentation utility for clinical notes deployed in a live hospital setting, and it reduces keystroke burden of clinical concepts by 67% in real environments.
- Gizmodo used AI to write a Star Wars story. It was filled with errors. - The Washington Post - The article quickly prompted an outcry among staffers who complained in the company’s internal Slack messaging system that the error-riddled story was “actively hurting our reputations and credibility,” showed “zero respect” for journalists and should be deleted immediately, according to messages obtained by The Washington Post. The story was written using a combination of Google Bard and ChatGPT, according to a G/O Media staff member familiar with the matter. (G/O Media owns several digital media sites including Gizmodo, Deadspin, The Root, Jezebel and The Onion.)
- https://github.com/HumanSignal/label-studio - a multi-type data labeling and annotation tool with standardized output format
- https://github.com/gclef-cmu/gpt_paper_assistant - a very simple daily scanner for arXiv that uses GPT-4 and author matches to find papers you might find interesting. It runs daily via GitHub Actions and can post this information to Slack via a bot or just render it in a static GitHub Pages website.
- HyperWrite - framework to enable multimodal models to operate a computer. Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
See also Automation
- Khoj - an open-source AI personal assistant that links up all your knowledge bases. It is a thinking tool that is transparent, fun and easy to engage with. Khoj can help you build faster and better by using an AI assistant to search and reason across your disparate data sources. Khoj learns from your notes, documents, and emails to function as an extension of your brain, so that you can stay focused on doing what matters. Khoj currently exposes the ability to search and chat with your personal knowledge base in your file system, while keeping you in control of your data. Khoj started with the founding principle that a personal assistant be understandable, accessible and hackable. This means you can always customize and self-host your Khoj on your own machines.
- Tabby - Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot. Warning: Tabby is still in the alpha phase. Self-contained, with no need for a DBMS or cloud service. OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE). Consumer-level GPU support (FP-16 weight loading with various optimizations).
- https://github.com/joshuasundance-swca/ai_changelog - Using large language models to maintain AI_CHANGELOG.md
- Function - Run AI prediction functions anywhere, with only a predict function.
- Show HN: Generative Fill with AI and 3D | Hacker News