ML / AI



Machine learning




  • A Course in Machine Learning - a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.







  • Why Nvidia’s AI Supremacy is Only Temporary « Pete Warden's blog [2]





"In applications of "usual" machine learning, there is typically a strong focus on the feature engineering part; the model learned by an algorithm can only be so good as its input data. Of course, there must be sufficient discriminatory information in our dataset, however, the performance of machine learning algorithms can suffer substantially when the information is buried in meaningless features. The goal behind deep learning is to automatically learn the features from (somewhat) noisy data; it's about algorithms that do the feature engineering for us to provide deep neural network structures with meaningful information so that it can learn more effectively. We can think of deep learning as algorithms for automatic "feature engineering," or we could simply call them "feature detectors," which help us to overcome the vanishing gradient challenge and facilitate the learning in neural networks with many layers."



  • NNdef - Java and XML based Neural Networks and Knowledge Modeling toolkit and library











  • Caffe - a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.


  • Torch - a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.



  • Data Science Machine - an end-to-end software system that is able to automatically develop predictive models from relational data. The Machine was created by Max Kanter and Kalyan Veeramachaneni at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The system automates two of the most human-intensive components of a data science endeavor: feature engineering, and selection and tuning of the machine learning methods that build predictive models from those features. First, an algorithm called Deep Feature Synthesis automatically engineers features. Next, through an approach called Deep Mining, the Machine composes a generalized machine learning pipeline that includes dimensionality reduction methods, feature selection methods, clustering, and classifier design. Finally, it tunes the parameters through a Gaussian Copula Process.
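
To make "automatically engineers features" concrete, here is a toy pandas sketch of the Deep Feature Synthesis idea: stacking aggregation primitives across a relational link (orders belong to customers) to synthesize candidate features. Everything here is illustrative, not the Machine's actual code.

    # Hypothetical customers/orders schema; DFS-style feature synthesis
    # applies aggregation primitives over the relational link.
    import pandas as pd

    orders = pd.DataFrame({
        "customer_id": [1, 1, 2, 2, 2],
        "amount": [10.0, 25.0, 5.0, 7.5, 12.0],
    })

    # Each aggregation over the one-to-many link yields one candidate
    # feature per customer, the kind DFS enumerates automatically.
    features = orders.groupby("customer_id")["amount"].agg(
        ["count", "sum", "mean", "max"]
    ).add_prefix("orders_amount_")
    print(features)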




  • TensorFlow - an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.
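
A minimal sketch of that numerical-computation core, independent of any model: tensors plus automatic differentiation, using the stock TensorFlow 2.x API.

    # Tensors and autodiff: compute y = x^2 + 2x and its gradient.
    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x ** 2 + 2.0 * x
    grad = tape.gradient(y, x)  # dy/dx = 2x + 2 = 8.0 at x = 3
    print(grad.numpy())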











  • https://github.com/neo-ai/neo-ai-dlr - a compiler and runtime for machine learning models. The compiler optimizes machine learning models for various target hardware; the runtime executes the model on the target hardware. A stand-alone, light-weight and portable runtime for CNN and decision-tree models. Built on top of the TVM and Treelite runtimes, DLR provides simple and unified Python/C++ APIs for loading and running TVM/Treelite-compiled models on a wide range of devices, including x86, TRT-enabled GPU and Arm devices.
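
A hedged sketch of the unified Python API for loading and running a compiled model; the model directory and the input tensor name are hypothetical placeholders.

    # Load a TVM/Treelite-compiled model with DLR and run one inference.
    import numpy as np
    from dlr import DLRModel

    model = DLRModel("/path/to/compiled_model", dev_type="cpu")
    x = np.random.rand(1, 3, 224, 224).astype("float32")
    outputs = model.run({"data": x})  # list of output arrays
    print(outputs[0].shape)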



  • NuPIC - the Numenta Platform for Intelligent Computing, comprises a set of learning algorithms that were first described in a white paper published by Numenta in 2009. The learning algorithms faithfully capture how layers of neurons in the neocortex learn.


  • https://github.com/jadore801120/attention-is-all-you-need-pytorch - PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017). A novel sequence-to-sequence framework that uses the self-attention mechanism, instead of convolution operations or recurrent structures, and achieves state-of-the-art performance on the WMT 2014 English-to-German translation task. (2017/06/12)
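
Since the repo implements the Transformer's self-attention, here is a minimal sketch of scaled dot-product attention itself, in plain PyTorch (not code from the linked repo).

    # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # query-key similarity
        weights = F.softmax(scores, dim=-1)            # attention distribution
        return weights @ v                             # weighted sum of values

    q = k = v = torch.randn(2, 5, 64)  # (batch, sequence, dimension)
    print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 64])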



High-level framework

  • https://github.com/apache/incubator-mxnet - a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.
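
The symbolic/imperative mix described above can be sketched with Gluon: define a network imperatively, then hybridize() compiles it into a symbolic graph.

    # Imperative definition, symbolic execution after hybridize().
    import mxnet as mx
    from mxnet.gluon import nn

    net = nn.HybridSequential()
    net.add(nn.Dense(64, activation="relu"), nn.Dense(10))
    net.initialize()
    net.hybridize()  # compile to an optimized symbolic graph

    x = mx.nd.random.uniform(shape=(4, 128))
    print(net(x).shape)  # (4, 10)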


  • https://github.com/keras-team/keras - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
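
A minimal sketch of the fast-experimentation workflow Keras targets (define, compile, fit), using the standard tf.keras API on random placeholder data.

    import numpy as np
    from tensorflow import keras

    # Tiny binary classifier on 20 input features.
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    x = np.random.rand(128, 20)                 # placeholder data
    y = np.random.randint(0, 2, size=(128, 1))
    model.fit(x, y, epochs=2, batch_size=32)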



Stable Diffusion

  • https://github.com/Lightning-AI/stable-diffusion-deploy - Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU inference, and micro-services working together via the Lightning Apps framework.




  • https://github.com/vitoplantamura/OnnxStream - The challenge is to run Stable Diffusion, which includes a large transformer model with almost 1 billion parameters, on a Raspberry Pi Zero 2, which is a microcomputer with 512MB of RAM, without adding more swap space and without offloading intermediate results on disk. The recommended minimum RAM/VRAM for Stable Diffusion is typically 8GB. [31]



  • Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | LMSYS Org - We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The cost of training Vicuna-13B is around $300. The code and weights, along with an online demo, are publicly available for non-commercial use.


LangChain

  • Langchain - a framework for developing applications powered by language models. It enables applications that are data-aware (connecting a language model to other sources of data) and agentic (allowing a language model to interact with its environment). The main value propositions of LangChain are components (abstractions for working with language models, along with a collection of implementations for each abstraction; components are modular and easy to use, whether or not you are using the rest of the LangChain framework) and off-the-shelf chains (structured assemblies of components for accomplishing specific higher-level tasks).
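
A minimal sketch of the components-plus-chains idea, assuming the 2023-era module layout (langchain.llms, langchain.prompts, langchain.chains); the layout moves between releases, so treat this as illustrative.

    # Prompt template (component) + LLM wired into an off-the-shelf chain.
    from langchain.llms import OpenAI
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    prompt = PromptTemplate(
        input_variables=["product"],
        template="Suggest a name for a company that makes {product}.",
    )
    chain = LLMChain(llm=OpenAI(temperature=0.7), prompt=prompt)
    print(chain.run(product="solar panels"))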


Guidance

  • https://github.com/microsoft/guidance - enables you to control modern language models more effectively and efficiently than traditional prompting or chaining. Guidance programs allow you to interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text. Simple output structures like Chain of Thought and its many variants (e.g., ART, Auto-CoT, etc.) have been shown to improve LLM performance. The advent of more powerful LLMs like GPT-4 allows for even richer structure, and guidance makes that structure easier and cheaper.
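
A hedged sketch of interleaving prompting and generation in a single guidance program, assuming the 2023-era Handlebars-style API (guidance.llms.OpenAI, {{gen}} blocks); later releases changed this interface.

    import guidance

    guidance.llm = guidance.llms.OpenAI("text-davinci-003")

    # Prompt text and generation slots interleaved in one flow.
    program = guidance("""Answer step by step.
    Question: {{question}}
    Reasoning: {{gen 'reasoning' max_tokens=100}}
    Final answer: {{gen 'answer' max_tokens=10}}""")

    result = program(question="What is 17 * 24?")
    print(result["answer"])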

Outlines

  • https://github.com/normal-computing/outlines - allows you to control and diagnose interactions with LLMs more effectively. Modern language models are powerful and versatile, but the way they interface with existing systems can be very brittle, their outputs can be unreliable, and complex workflows (agents) can introduce a lot of error-prone code duplication. Outlines provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc. Outlines helps developers control text generation and produce predictable outputs that make the interaction with user code more robust. Its sampling-first approach allows one to diagnose issues with model-generated output more easily, and to implement more robust generation methods such as self-consistency or DiVeRSe. Outlines is designed as a library that integrates well with the broader Python environment. Generation can be interleaved with control flow or custom function calls, and prompts can be imported from other modules or libraries.

News

  • Distill — Latest articles about machine learning



to sort


  • Embeddings: What they are and why they matter - Embeddings are a really neat trick that often comes wrapped in a pile of intimidating jargon. If you can make it through that jargon, they unlock powerful and exciting techniques that can be applied to all sorts of interesting problems. [34]
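
The core trick is easy to demonstrate: similar items map to nearby vectors, so cosine similarity measures relatedness. A toy sketch with made-up three-dimensional vectors; a real system would obtain them from an embedding model.

    import numpy as np

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    cat = np.array([0.9, 0.1, 0.3])      # invented toy embeddings
    kitten = np.array([0.85, 0.15, 0.35])
    car = np.array([0.1, 0.9, 0.5])

    print(cosine_similarity(cat, kitten))  # high: related concepts
    print(cosine_similarity(cat, car))     # lower: unrelated concepts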







  • OpenAI - a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely. The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field. [36]











  • https://github.com/kyegomez/Sophia - a second-order clipped stochastic optimization algorithm that uses an inexpensive stochastic estimate of the diagonal of the Hessian as a pre-conditioner and a clipping mechanism to control the worst-case update size. It achieves better performance than Adam in terms of validation pre-training loss, total compute, and wall-clock time. By cutting model training cost in half, Sophia can help save millions if not billions of dollars in computational resources.
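
The update rule described there can be sketched in a few lines of numpy: pre-condition the gradient momentum by a diagonal Hessian estimate, then clip element-wise to bound the worst-case step. The constants and momentum scheme are illustrative guesses, not the repo's implementation.

    import numpy as np

    def sophia_like_step(theta, grad, hess_diag, m,
                         beta1=0.96, lr=1e-3, rho=0.04, eps=1e-12):
        m = beta1 * m + (1 - beta1) * grad        # gradient momentum
        update = m / np.maximum(hess_diag, eps)   # diagonal-Hessian pre-conditioning
        update = np.clip(update, -rho, rho)       # bound the worst-case update size
        return theta - lr * update, m

    theta, m = np.zeros(4), np.zeros(4)
    grad = np.array([0.5, -0.2, 1.0, -1.5])
    hess_diag = np.array([2.0, 0.5, 1.0, 4.0])    # stochastic estimate in practice
    theta, m = sophia_like_step(theta, grad, hess_diag, m)
    print(theta)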




  • https://github.com/huggingface/diffusers - 🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.
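
A minimal inference sketch with the library's pipeline API; the model id is one common example and a CUDA GPU is assumed for reasonable speed.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a pretrained text-to-image pipeline and sample one image.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")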


  • MiniGPT-4 - Enhancing Vision-Language Understanding with Advanced Large Language Models


  • https://github.com/evejourney/eve-reasoning - Welcome to the EVE Reasoning Engine repository. This engine enables EVE, the AI specialized in erotic chat, to have a working reasoning system for both general purpose reasoning and erotic discussions.



  • https://github.com/philpax/ggml - Tensor library for machine learning. Note that this project is under active development. Some of the development is currently happening in the llama.cpp and whisper.cpp repos
    • https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md - a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML. It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models.
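
A hedged sketch of parsing a GGUF header as the spec above lays it out: 4-byte magic, a uint32 version, then tensor and metadata key/value counts (uint64 in GGUF v2; v1 used 32-bit counts, which this sketch ignores).

    import struct

    def read_gguf_header(path):
        # Layout per the GGUF spec (v2+): magic, version, counts, little-endian.
        with open(path, "rb") as f:
            magic = f.read(4)
            if magic != b"GGUF":
                raise ValueError("not a GGUF file")
            version, = struct.unpack("<I", f.read(4))
            tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        return version, tensor_count, kv_count

    # print(read_gguf_header("model.gguf"))  # path is a placeholder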




  • AI Horde - This is a crowdsourced distributed cluster of Image generation workers and text generation workers. If you like this service, consider joining the horde yourself!


MindsDB

  • https://github.com/mindsdb/mindsdb - an AI virtual database that empowers developers to connect any AI/ML model to any data source. This includes relational and non-relational databases, data warehouses and SaaS applications. MindsDB offers two primary benefits to its users: hook AI models to run automatically as new data is observed and plug the output into any of our integrations, and automate training and fine-tuning of AI models from data contained in any of the 130+ data sources we support.

ChatGPT








  • https://github.com/TheR1D/shell_gpt - A command-line interface (CLI) productivity tool powered by OpenAI's text-davinci-003 model that will help you accomplish your tasks faster and more efficiently.
  • https://github.com/fedenunez/tulp - Tulp is a command-line tool that can help you create and process piped content using the power of ChatGPT directly from the terminal.



  • https://github.com/xtekky/gpt4free - decentralising the AI industry; free GPT-4/3.5 scripts through several reverse-engineered APIs (poe.com, phind.com, chat.openai.com, writesonic.com, sqlchat.ai, t3nsor.com, you.com, etc.)



  • A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT (arXiv:2302.11382) - [43]



  • LocalAI - the free, open-source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images and audio (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families that are compatible with the ggml format. No GPU required. Runs ggml, GPTQ, ONNX and TF-compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many others
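
Because LocalAI mirrors the OpenAI REST API, the standard client can simply be pointed at the local endpoint. A sketch assuming the openai-python 0.x interface and LocalAI's default port; the model name is a placeholder for whatever model is served locally.

    import openai

    openai.api_base = "http://localhost:8080/v1"  # LocalAI endpoint
    openai.api_key = "not-needed-for-local"

    response = openai.ChatCompletion.create(
        model="ggml-gpt4all-j",  # whichever local model LocalAI serves
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response["choices"][0]["message"]["content"])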

PaLM

  • https://en.wikipedia.org/wiki/PaLM - a 540 billion parameter transformer-based large language model developed by Google AI. Researchers also trained smaller versions of PaLM, 8 and 62 billion parameter models, to test the effects of model scale.

PaLM is capable of a wide range of tasks, including commonsense reasoning, arithmetic reasoning, joke explanation, code generation, and translation. When combined with chain-of-thought prompting, PaLM achieved significantly better performance on datasets requiring reasoning of multiple steps, such as word problems and logic-based questions.

LLaMA











  • Llama Hub - Connect custom data sources to your LLM with one or more of these plugins (via LlamaIndex or LangChain)



OpenLLM

h2oGPT

  • https://github.com/h2oai/h2ogpt - a large language model (LLM) fine-tuning framework and chatbot UI with document question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is a fully permissive Apache v2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answering.

Qwen-7B

Audio

See also Generative#Neural net









  • https://github.com/jmoso13/jukebox-diffusion - relies heavily on work produced by OpenAI (Jukebox) and HarmonAI (Dance Diffusion); big thanks also to Flavio Schneider for his work creating the audio-diffusion repo used for the diffusion models. At its core Jukebox Diffusion is a hierarchical latent diffusion model. JBDiff uses the encoder & decoder layers of a Jukebox model to travel between audio space and multiple differently compressed latent spaces. At each of the three latent levels a denoising U-Net model is trained to iteratively denoise a normally distributed variable to sample vectors representing compressed audio. The final layer of JBDiff is a Dance Diffusion denoising U-Net model, providing a bump in audio quality and transforming the mono output of Jukebox into final stereo audio.




  • https://github.com/suno-ai/bark - a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.
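
A minimal usage sketch in the style of the repo's README: preload the pretrained checkpoints, generate audio from a text prompt, and write a WAV.

    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # download/load the pretrained checkpoints

    # Bracketed tokens like [laughs] cue nonverbal sounds.
    audio_array = generate_audio("Hello, my name is Suno. [laughs]")
    write_wav("bark_sample.wav", SAMPLE_RATE, audio_array)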



  • https://github.com/LAION-AI/CLAP - extract a latent representation of any given audio and text for your own model, or for different downstream tasks. All code comes officially with the following paper, accepted by the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023: Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
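
A hedged sketch of pulling audio and text embeddings with the laion_clap package; the method names follow the repo's README, and the file paths are placeholders.

    import laion_clap

    model = laion_clap.CLAP_Module(enable_fusion=False)
    model.load_ckpt()  # downloads the default pretrained checkpoint

    text_embed = model.get_text_embedding(["a dog barking", "piano music"])
    audio_embed = model.get_audio_embedding_from_filelist(
        x=["clip1.wav", "clip2.wav"], use_tensor=False
    )
    print(text_embed.shape, audio_embed.shape)  # e.g. (2, 512) each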



  • AudioLDM: Text-to-Audio Generation with Latent Diffusion Models - Speech Research - Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train latent diffusion models (LDMs) with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion.






  • DAFx-23 - Neural tape - The sound of magnetic recording media, such as open reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices.




  • https://github.com/bshall/urhythmic - Official repository for Rhythm Modeling for Voice Conversion. Abstract: Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic - an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody. Note: Urhythmic builds on soft speech units from our paper A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.


  • [2307.04686] VampNet: Music Generation via Masked Acoustic Token Modeling - We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.


  • Mubert - Thousands of Staff-Picked Royalty-Free Music Tracks for Streaming, Videos, Podcasts, Commercial Use and Online Content. Human x AI generative music for your video content, podcasts and apps
  • https://github.com/MubertAI/Mubert-Text-to-Music - A simple notebook demonstrating prompt-based music generation via Mubert API




  • MusicGen - We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, while being conditioned on textual description or melodic features, allowing better controls over the generated output.


  • https://github.com/YuanGongND/ssast - This repository contains the official implementation (in PyTorch) of the Self-Supervised Audio Spectrogram Transformer (SSAST) proposed in the AAAI 2022 paper SSAST: Self-Supervised Audio Spectrogram Transformer (Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass; MIT CSAIL). [Slides]

SSAST is the first patch-based joint discriminative and generative self-supervised learning framework, and also the first self-supervised learning framework for AST. SSAST significantly boosts AST performance on all downstream tasks we evaluated, with an average improvement of 60.9%, leading to similar or even better results than a supervised pretrained AST. SSAST can be used as a drop-in replacement for previous ImageNet (supervised) pretrained AST, and has the advantages that 1) no labeled data is used; 2) patch size and shape are flexible (ImageNet pretraining only supports square patches); and 3) performance is better on many tasks, in particular speech tasks.



Compression


Quality

Source separation

  • https://github.com/samim23/polymath - uses machine learning to convert any music library (e.g. from hard drive or YouTube) into a music-production sample library. The tool automatically separates songs into stems (beats, bass, etc.), quantizes them to the same tempo and beat-grid (e.g. 120bpm), analyzes musical structure (e.g. verse, chorus, etc.), key (e.g. C4, E3, etc.) and other info (timbre, loudness, etc.), and converts audio to MIDI. The result is a searchable sample library that streamlines the workflow for music producers, DJs, and ML audio developers.



  • https://github.com/Audio-AGI/AudioSep - This repository contains the official implementation of "Separate Anything You Describe". We introduce AudioSep, a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks such as audio event separation, musical instrument separation, and speech enhancement.

Image





Video

  • https://github.com/NVIDIA/vid2vid - PyTorch implementation for high-resolution (e.g., 2048x1024) photorealistic video-to-video translation. It can be used for turning semantic label maps into photo-realistic videos, synthesizing people talking from edge maps, or generating human motions from poses. The core of video-to-video translation is image-to-image translation. Some of our work in that space can be found in pix2pixHD and SPADE.



Medicine

Journalism

  • Gizmodo used AI to write a Star Wars story. It was filled with errors. - The Washington Post - The article quickly prompted an outcry among staffers who complained in the company’s internal Slack messaging system that the error-riddled story was “actively hurting our reputations and credibility,” showed “zero respect” for journalists and should be deleted immediately, according to messages obtained by The Washington Post. The story was written using a combination of Google Bard and ChatGPT, according to a G/O Media staff member familiar with the matter. (G/O Media owns several digital media sites including Gizmodo, Deadspin, The Root, Jezebel and The Onion.)


Utils


Tools

See also Automation

Khoj

  • Khoj - an open-source, AI personal assistant that links up all your knowledge bases. It is a thinking tool that is transparent, fun and easy to engage with. Khoj can help you build faster and better by using an AI assistant to search and reason across your disparate data sources. Khoj learns from your notes, documents, emails to function as an extension of your brain. So that you can stay focused on doing what matters. Khoj currently exposes the ability to search and chat with your personal knowledge base in your file system, while keeping you in control of your data. Khoj started with the founding principle that a personal assistant be understandable, accessible and hackable. This means you can always customize and self-host your Khoj on your own machines. [52] [53]

Magic Loops

Tabby

  • Tabby - Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot. Warning: Tabby is still in the alpha phase. Self-contained, with no need for a DBMS or cloud service. OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE). Consumer-level GPU support (FP16 weight loading with various optimizations).


AI_CHANGELOG


AGI

Function