ML / AI

The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Things and Stuff Wiki - An organically evolving personal wiki knowledge base. An on-the-fly taxonomy containing a patchwork trail of topic outlines, descriptions, notes, stubs and breadcrumbs, with links to sites, systems, software, manuals, organisations, people, articles, guides, slides, papers, books, comments, videos, screencasts, webcasts, scratchpads and more. Content is orientated towards mostly free/libre/open, mostly Linux. Quality and age varies drastically. Sometimes old things are first, sometimes last. Use the Table of Contents menu to navigate long pages. Zoom in if text is too small. Dead link? Wayback Machine. I probably need to fix the theme CSS after an update. See also libreav.org. Chat to msg me (not checking tho atm). e

Resources

Weather

Machine learning

messss

https://en.wikipedia.org/wiki/Machine_learning

https://news.ycombinator.com/item?id=12713056

YouTube: Neural Network Architectures

A Course in Machine Learning - a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). It's focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.

https://news.ycombinator.com/item?id=12936891

Tom Mitchell: Never Ending Language Learning

Neural Networks, Manifolds, and Topology -- colah's blog - [1]

OCDevel

YouTube: From Deep Learning of Disentangled Representations to Higher-level Cognition

https://github.com/iamtrask/Grokking-Deep-Learning - this repository accompanies the book "Grokking Deep Learning"

https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research - These datasets are applied for machine learning (ML, research and have been cited in peer-reviewed academic journals.

https://github.com/evanmiller/LLM-Reading-List - with an emphasis on inference and model compression.

LLM Visualization [2]

Why Nvidia’s AI Supremacy is Only Temporary « Pete Warden's blog [3]

https://en.wikipedia.org/wiki/Artificial_neuron

https://en.wikipedia.org/wiki/Perceptron

https://news.ycombinator.com/item?id=12751585

https://github.com/rasbt/python-machine-learning-book/blob/master/faq/difference-deep-and-normal-learning.md [4]

"In applications of "usual" machine learning, there is typically a strong focus on the feature engineering part; the model learned by an algorithm can only be so good as its input data. Of course, there must be sufficient discriminatory information in our dataset, however, the performance of machine learning algorithms can suffer substantially when the information is buried in meaningless features. The goal behind deep learning is to automatically learn the features from (somewhat) noisy data; it's about algorithms that do the feature engineering for us to provide deep neural network structures with meaningful information so that it can learn more effectively. We can think of deep learning as algorithms for automatic "feature engineering," or we could simply call them "feature detectors," which help us to overcome the vanishing gradient challenge and facilitate the learning in neural networks with many layers."

Backpropagation 101 · Thinc · A refreshing functional take on deep learning - [5]

http://deeplearning4j.org/

http://www.damninteresting.com/on-the-origin-of-circuits/ [6]

NNdef - Java and XML based Neural Networks and Knowledge Modeling toolkit and library

https://probmods.org/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding
- https://news.ycombinator.com/item?id=12713388

https://news.ycombinator.com/item?id=12942169

http://www.cs.cmu.edu/~tom7/mario/
- https://www.youtube.com/watch?v=xOCurBYI_gY

http://neuralnetworksanddeeplearning.com/ [7]

https://news.ycombinator.com/item?id=8258652

http://www.technologyreview.com/view/530561/the-revolutionary-technique-that-quietly-changed-machine-vision-forever/

https://news.ycombinator.com/item?id=8553307

http://openssi.net/
- http://records.sigmm.ndlab.net/2014/10/ssi-an-open-source-platform-for-social-signal-interpretation/ [8]

YouTube: Artificial Intelligence: Neural Networks in Machine Learning

https://en.wikipedia.org/wiki/Convolutional_neural_network

https://news.ycombinator.com/item?id=9109157

https://news.ycombinator.com/item?id=11530304

https://news.ycombinator.com/item?id=19144280

http://quantombone.blogspot.co.uk/2015/04/deep-learning-vs-probabilistic.html [9]

https://www.gnu.org/software/gneuralnetwork/ [12]

https://open_nsfw.gitlab.io/ [13]

Caffe - a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
- Caffe Model Zoo

Torch - a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.
- https://github.com/torch/torch7

https://github.com/karpathy/char-rnn - Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

YouTube: Connections between physics and deep learning [https://news.ycombinator.com/item?id=13062139

DeepRhyme (D-Prime) – generating dope rhymes with deep learning [14]

Composing Music With Recurrent Neural Networks [15]

Data Science Machine - an end-to-end software system that is able to automatically develop predictive models from relational data. The Machine was created by Max Kanter and Kalyan Verramachaneni at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The system automates two of the most human-intensive components of a data science endeavor: feature engineering, and selection and tuning of the machine learning methods that build predictive models from those features. First, an algorithm called Deep Feature Synthesis automatically engineers features. Next, through an approach called Deep Mining, the Machine composes a generalized machine learning pipeline that includes dimensionality reduction methods, feature selection methods, clustering, and classifier design. Finally, it tunes the parameters through a Gaussian Copula Process.
- System that replaces human intuition with algorithms outperforms human teams

What's Next in Deep Learning [16]

Deep Residual Learning for Image Recognition https://news.ycombinator.com/item?id=10715628

http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/ [17]

TensorFlow - an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.

http://tflearn.org
- https://github.com/tflearn/tflearn [18]

http://playground.tensorflow.org/ [19]

https://www.oreilly.com/learning/hello-tensorflow [20]

https://github.com/sherjilozair/char-rnn-tensorflow

https://nucl.ai/blog/enhance-pixel-art/

http://www-personal.umich.edu/~reedscot/iclr_project.html [21]

https://deepmind.com/blog/wavenet-generative-model-raw-audio/ [22]

http://www.wildml.com/2016/10/learning-reinforcement-learning [23]

https://blog.acolyer.org/2016/10/12/towards-deep-symbolic-reinforcement-learning/ [24]

http://www.wildml.com/2016/10/learning-reinforcement-learning/ [25]

https://research.googleblog.com/2016/12/open-sourcing-embedding-projector-tool.html [26]

https://tryolabs.com/blog/2016/12/06/major-advancements-deep-learning-2016/ [27]

https://nips.cc/Conferences/2016/SpotlightVideos

https://arxiv.org/abs/1612.03770 [28]

https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/ [29]

Operational calculus on programming spaces - https://news.ycombinator.com/item?id=13280818

https://www.darpa.mil/news-events/2016-06-17 [30]

http://distill.pub/about/ [31]

https://vectordash.com

https://github.com/neo-ai/neo-ai-dlr - a compiler and runtime for machine learning models. The compiler optimizes machine learning models for various target hardware. The runtime executes the model on the target hardware. A stand-alone, light-weight and portable runtime for CNN and decicion-tree models. Built on top of TVM and Treelite runtime, DLR provides simple and unified Python/C++ APIs for loading and running TVM/Treelite compiled models on a wide range of devices, including X86, TRT-enabled GPU and Arm devices.

https://github.com/nihalpasham/fingerprinting_radios_w_ML - The key idea behind radio ﬁngerprinting is to extract unique patterns (or features) and use them as signatures to identify devices (or more precisely ID a radio embedded within a device).

NuPIC - the Numenta Platform for Intelligent Computing, comprises a set of learning algorithms that were first described in a white paper published by Numenta in 2009. The learning algorithms faithfully capture how layers of neurons in the neocortex learn.

https://github.com/jadore801120/attention-is-all-you-need-pytorch - PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).A novel sequence to sequence framework utilizes the self-attention mechanism, instead of Convolution operation or Recurrent structure, and achieve the state-of-the-art performance on WMT 2014 English-to-German translation task. (2017/06/12)

https://github.com/weihaox/awesome-neural-rendering - A collection of resources on neural rendering.

YouTube: The Surreal Dreams of AI-Generated Art

ONNX - Accelerated Java Machine Learning Production-grade AI engine to speed up training and inferencing in your existing technology stack.
- https://github.com/microsoft/onnxruntime
https://en.wikipedia.org/wiki/Open_Neural_Network_Exchange

https://github.com/microsoft/aici - you build Controllers that constrain and direct output of a Large Language Model (LLM, in real time. Controllers are flexible programs capable of implementing constrained decoding, dynamic editing of prompts and generated text, and coordinating execution across multiple, parallel generations. Controllers incorporate custom logic during the token-by-token decoding and maintain state during an LLM request. This allows diverse Controller strategies, from programmatic or query-based decoding to multi-agent conversations to execute efficiently in tight integration with the LLM itself.

High-level framework

https://github.com/apache/incubator-mxnet - a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.

https://github.com/keras-team/keras - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

- Gradio - the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!
- https://github.com/gradio-app/gradio

Stable Diffusion

https://github.com/Lightning-AI/stable-diffusion-deploy - Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load-balancing, orchestrating, pre-provisioning, dynamic batching, GPU-inference, micro-services working together via the Lightning Apps framework.

https://github.com/showlab/Awesome-Video-Diffusion - A curated list of recent diffusion models for video generation, editing, restoration, understanding, nerf, etc.

https://github.com/mlc-ai/web-stable-diffusion - Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.

https://github.com/AUTOMATIC1111/stable-diffusion-webui - A browser interface based on Gradio library for Stable Diffusion.

https://github.com/vitoplantamura/OnnxStream - The challenge is to run Stable Diffusion, which includes a large transformer model with almost 1 billion parameters, on a Raspberry Pi Zero 2, which is a microcomputer with 512MB of RAM, without adding more swap space and without offloading intermediate results on disk. The recommended minimum RAM/VRAM for Stable Diffusion is typically 8GB. [32]

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | LMSYS Org - We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achieves more than 90%* quality of OpenAI ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca in more than 90%* of cases. The cost of training Vicuna-13B is around $300. The code and weights, along with an online demo, are publicly available for non-commercial use.

https://github.com/lm-sys/FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. [33]

LangChain

Langchain - a framework for developing applications powered by language models. It enables applications that are: Data-aware: connect a language model to other sources of data, Agentic: allow a language model to interact with its environment. The main value props of LangChain are: Components: abstractions for working with language models, along with a collection of implementations for each abstraction. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not, Off-the-shelf chains: a structured assembly of components for accomplishing specific higher-level tasks
- https://github.com/hwchase17/langchain

Guidance

https://github.com/microsoft/guidance - enables you to control modern language models more effectively and efficiently than traditional prompting or chaining. Guidance programs allow you to interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text. Simple output structures like Chain of Thought and its many variants (e.g., ART, Auto-CoT, etc., have been shown to improve LLM performance. The advent of more powerful LLMs like GPT-4 allows for even richer structure, and guidance makes that structure easier and cheaper.

Outlines

https://github.com/normal-computing/outlines - allows you to control and diagnose interactions with LLMs more effectively. Modern language models are powerful and versatile, but the way they interface with existing systems can be very brittle, their outputs can be unreliable, and complex workflows (agents, can introduce a lot of error-prone code duplication. Outlines provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc. Outlines helps developers control text generation and produce predictable outputs that make the interaction with user code more robust. Its sampling-first approach allows one to diagnose issues with model-generated output more easily, and implement more robust generation methods such as self-consistency or DiVeRSe. Outlines is designed as a library that integrates well with the broader Python environment. Generation can be interleaved with control flow or custom function calls, prompts can be imported from other modules or libraries.

News

Distill — Latest articles about machine learning

GPT-3 vs Water Cooler Trivia participants: A Human vs Robot Showdown -

How I Used DALL·E 2 to Generate The Logo for OctoSQL | Jacob Martin - [34]

DALL·E 2 prompt book -pdf- | Hacker News

https://github.com/plaidml/plaidml - PlaidML is a framework for making deep learning work everywhere.

to sort

There's An AI For That - The Biggest AI Aggregator

Embeddings: What they are and why they matter - Embeddings are a really neat trick that often come wrapped in a pile of intimidating jargon. If you can make it through that jargon, they unlock powerful and exciting techniques that can be applied to all sorts of interesting problems. [35]

Chat with Open Large Language Models

Jailbreak Chat

How Elon Musk and Y Combinator Plan to Stop Computers From Taking Over

There is a blind spot in AI research : Nature News & Comment - [36]

OpenAI - a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely. The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field. [37]

https://github.com/daveshap/OpenAI_Agent_Swarm - Hierarchical Autonomous Agent Swarm (HAAS). We have our first GPT Concierge. You can chat with this custom ChatGPT to figure out what's going on!

https://github.com/deepseek-ai/DeepSeek-Coder - composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, DeepSeek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

http://www.humanbrainproject.eu/

http://www.nih.gov/science/brain/

http://wiki.opencog.org/w/The_Open_Cognition_Project

https://github.com/hades217/awesome-ai

https://github.com/formulahendry/awesome-gpt - A curated list of awesome projects and resources related to GPT, ChatGPT, OpenAI, LLM, and more.

https://github.com/lucidrains/PaLM-rlhf-pytorch -Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Maybe I'll add retrieval functionality too, à la RETRO

The Expanding Dark Forest and Generative AI - [38]

https://github.com/peterw/Chat-with-Github-Repo - contains two Python scripts that demonstrate how to create a chatbot using Streamlit, OpenAI GPT-3.5-turbo, and Activeloop's Deep Lake.

https://github.com/kyegomez/Sophia - a second order clipped stochastic optimization algorithm that uses an inexpensive stochastic estimate of the diagonal of the Hessian as an pre-conditioner and a clipping mechanism to control the worst case update size. It achieves better performance than adam in terms of validation pre-traing loss, total compute, and wall-clock time. By cutting model training cost in half, Sophia can help save millions if not billions of dollars in computational resources.

Scaling Transformer to 1M tokens and beyond with RMT | Hacker News

https://github.com/jncraton/languagemodels - Python building blocks to explore large language models on any computer with 512MB of RAM

https://github.com/huggingface/diffusers - hugs Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, hugs Diffusers is a modular toolbox that supports both. Our library is designed with a focus on usability over performance, simple over easy, and customizability over abstractions.

Minigpt-4 - Enhancing Vision-language Understanding with Advanced Large Language Models

https://github.com/Maknee/minigpt4.cpp - Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML,

https://github.com/evejourney/eve-reasoning - Welcome to the EVE Reasoning Engine repository. This engine enables EVE, the AI specialized in erotic chat, to have a working reasoning system for both general purpose reasoning and erotic discussions.

Patterns for Building LLM-based Systems & Products [39]

https://github.com/philpax/ggml - Tensor library for machine learning. Note that this project is under active development. Some of the development is currently happening in the llama.cpp and whisper.cpp repos
- https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md - a file format for storing models for inference with GGML and executors based on GGML. GGUF is a binary format that is designed for fast loading and saving of models, and for ease of reading. Models are traditionally developed using PyTorch or another framework, and then converted to GGUF for use in GGML. It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models.

https://github.com/Yifan-Song793/RestGPT - An LLM-based autonomous agent controlling real-world applications via RESTful APIs [40]

Petals – Run LLMs at home, BitTorrent-style - Run large language models at home, BitTorrent‑style [41]

AI Horde - This is a crowdsourced distributed cluster of Image generation workers and text generation workers. If you like this service, consider joining the horde yourself!

ArtBot - Create images with Stable Diffusion, utilizing the AI Horde - ArtBot is your gateway to experiment with the wonderful world of generative AI art using the power of the AI Horde, a distributed open source network of GPUs running Stable Diffusion.

KoboldAI Lite

Why You (Probably) Don't Need to Fine-tune an LLM - Tidepool by Aquarium - Just to reiterate, fine-tuning (except in some rare cases) negates most of the resource-saving benefits from recent LLMs — the reasons that people are flocking to this technology in the first place. The biggest reason why NLP was hard to do before late 2022 was because you needed to collect data, label data, train models, host infra — and all that requires hiring an ML ops and eng team! Now with using LLMs out of the box, the startup cost is incredibly low. There are a whole bunch of orgs that never would have done NLP if not for LLMs making the bar so low. Is it worth investing your eng time into fine-tuning when state-of-the-art is advancing so quickly? Sure, you’ll have a slight competitive advantage if your model has better accuracy/quality — but will you still think so a few months later when other companies get the same boosted functionality with GPT-5, no effort required? This is why we recommend that you focus your attention on lighter-touch approaches like few-shot prompting and retrieval augmented generation (RAG).

https://github.com/desik1998/NovelWithLLMs/tree/main - This project showcases the ability to write a long story/novel well grounded in details, feels human like and has coherence of events using LLM. Although the story is heavily grounded in details, this project doesn't aim to create a SOTA story but is just a demonstration of how to generate long stories well grounded in details using LLMs [42]

MindsDB

https://github.com/mindsdb/mindsdb - AI Virtual Database that empowers developers to connect any AI/ML model to any datasource. This includes relational and non-relational databases, data warehouses and SaaS applications. MindsDB offers two primary benefits to its users. Hook AI models to run automatically as new data is observed and plug the output into any of our integrations. Automate training and finetuning AI models from data contained in any of the 130+ datasources we support.

GPT / ChatGPT

https://github.com/AgentOps-AI/BestGPTs - Top ranked OpenAI GPTs, ranked and sorted by AgentOps

https://github.com/lencx/ChatGPT - crystal_ball ChatGPT Desktop Application (Mac, Windows and Linux)

https://github.com/asrul10/linux-command-gpt - Get Linux commands in natural language with the power of ChatGPT.

ChatGPT passes the 2022 APCSA free response section - [43]

GPT-4 can't reason | Hacker News

https://github.com/TheAppleTucker/backend-GPT - We've built a entire Backend+Database powered by an LLM. It infers business logic based on the name of the API call and can persist a kilobyte of state!

Tell HN: ChatGPT can reply like a specific Reddit or HN user, including you | Hacker News -

The Mechanical Professor - by Ethan Mollick

Where Does ChatGPT Fall on the Political Compass?

Alarmed by A.I. Chatbots, Universities Start Revamping How They Teach - The New York Times

GPT-3 Is the Best Journal I've Ever Used - Superorganizers - Every

Capturing the Flag with GPT-4 [44]

https://news.ycombinator.com/item?id=37054241

https://github.com/TheR1D/shell_gpt - A command-line interface (CLI) productivity tool powered by OpenAI's text-davinci-003 model, will help you accomplish your tasks faster and more efficiently.

https://github.com/Nutlope/aicommits - A CLI that writes your git commit messages for you with AI

https://github.com/fedenunez/tulp - Tulp is a command-line tool that can help you create and process piped content using the power of ChatGPT directly from the terminal.

https://github.com/Nutlope/roomGPT - Upload a photo of your room to generate your dream room with AI.

https://github.com/xtekky/gpt4free - decentralising the Ai Industry, free gpt-4/3.5 scripts through several reverse engineered api's ( poe.com, phind.com, chat.openai.com, writesonic.com, sqlchat.ai, t3nsor.com, you.com etc...)

GPT-4 System Card [pdf | Hacker News]

[2302.11382 A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT] - [45]

https://github.com/PromtEngineer/localGPT - Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.

LocalAI - the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families that are compatible with the ggml format. Does not require GPU. No GPU required. Runs ggml, GPTQ, onnx, TF compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many others
- https://github.com/b08x/LocalAI

https://github.com/zylon-ai/private-gpt - a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point.

The project provides an API offering all the primitives required to build private, context-aware AI applications. It follows and extends the OpenAI API standard, and supports both normal and streaming responses.

PaLM

https://en.wikipedia.org/wiki/PaLM - a 540 billion parameter transformer-based large language model developed by Google AI. Researchers also trained smaller versions of PaLM, 8 and 62 billion parameter models, to test the effects of model scale.

PaLM is capable of a wide range of tasks, including commonsense reasoning, arithmetic reasoning, joke explanation, code generation, and translation. When combined with chain-of-thought prompting, PaLM achieved significantly better performance on datasets requiring reasoning of multiple steps, such as word problems and logic-based questions.

LLaMA

A brief history of LLaMA models - AGI Sphere [46]

https://old.reddit.com/r/LocalLLaMA/top/?t=month

https://github.com/ggerganov/llama.cpp - Port of Facebook's LLaMA model in C/C++

SAIL-7b - Search Augmented Instruction Learning [47]
- https://github.com/luohongyin/SAIL

LlamaIndex - Data Framework for LLM Applications. LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. [48]
- https://github.com/jerryjliu/llama_index7

https://news.ycombinator.com/item?id=36815255

Llama 2 - Meta AI

Llama 2 is here - get it on Hugging Face

Fine-Tune Your Own Llama 2 Model in a Colab Notebook | Towards Data Science

Run Llama 2 Uncensored Locally [49]

https://github.com/jerryjliu/llama_index - LlamaIndex (GPT Index) is a data framework for your LLM applications

Llama Hub - Connect custom data sources to your LLM with one or more of these plugins (via LlamaIndex or LangChain)

https://github.com/jmorganca/ollama - Get up and running with Llama 2 and other large language models locally

https://github.com/ollama-ui/ollama-ui

https://github.com/Mozilla-Ocho/llamafile - lets you distribute and run LLMs with a single file. Our goal is to make the "build once anywhere, run anywhere" dream come true for AI developers. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that lets you build apps for LLMs as a single-file artifact that runs locally on most PCs and servers.

https://github.com/GoogleCloudPlatform/localllm - Run LLMs locally on Cloud Workstations. Uses: Quantized models from 🤗. llama-cpp-python's webserver

https://github.com/tangledgroup/llama-cpp-wasm - WebAssembly (Wasm, Build and Bindings for llama.cpp

OpenLLM

https://github.com/bentoml/OpenLLM - An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease. [50]

h2oGPT

https://github.com/h2oai/h2ogpt - a large language model (LLM) fine-tuning framework and chatbot UI with document(s) question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.

Qwen-7B

https://github.com/QwenLM/Qwen-7B/tree/main - The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud.

Alibaba launches open-sourced A.I. model in challenge to Meta

Grok-1

https://github.com/xai-org/grok-1 - contains JAX example code for loading and running the Grok-1 open-weights model.

Audio

See also Generative#Neural net

Machine Learning for Drummers

https://github.com/GuitarML/mldsp-papers - Collection of papers related to neural nets/machine learning for audio DSP.

https://github.com/Yuan-ManX/audio-ai-agent - Audio AI Agent, including speech, music, sound effects, etc.

https://github.com/fatchord/WaveRNN - Pytorch implementation of Deepmind's WaveRNN model from Efficient Neural Audio Synthesis
- WaveRNN + TTS Outputs | model_outputs

https://github.com/haoheliu/audioldm_eval - This toolbox aims to unify audio generation model evaluation for easier future comparison.

https://github.com/d3n7/riffusionDJ - Multichannel Looper/Feedback System for Riffusion

https://github.com/jmoso13/jukebox-diffusion - relies heavily on work produced by OpenAI (Jukebox, and HarmonAI (Dance Diffusion), also big thanks to Flavio Schneider for his work creating the audio-diffusion repo I used for diffusion models. At its core Jukebox Diffusion is a hierarchical latent diffusion model. JBDiff uses the encoder & decoder layers of a Jukebox model to travel between audio space and multiple differently compressed latent spaces. At each of the three latent levels a Denoising U-Net Model is trained to iteratively denoise a normally distributed variable to sample vectors representing compressed audio. The final layer of JBDiff is a Dance Diffusion Denoising U-Net model, providing a bump in audio quality and transforming the mono output of Jukebox into final stereo audio.

https://github.com/d3n7/GPT-4-To-MIDI - Text prompt to MIDI File using OpenAI's GPT-4. Now with polyphony and MIDI input!

https://github.com/suno-ai/bark - a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.

https://github.com/AIGC-Audio/AudioGPT - Understanding and Generating Speech, Music, Sound, and Talking Head
AudioGPT - a Hugging Face Space by AIGC-Audio

https://github.com/LAION-AI/CLAP - extract a latent representation of any given audio and text for your own model, or for different downstream tasks. All codes are comming officially with the following paper, accepted by IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023: Large-Scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

https://github.com/declare-lab/tango - Codes and Model of the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models - Speech Research - Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train latent diffusion models (LDMs) with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion.
- https://github.com/haoheliu/AudioLDM - AudioLDM: Generate speech, sound effects, music and beyond, with text.

https://github.com/tuneflow/AudioLDM - Fork of https://github.com/haoheliu/AudioLDM as a TuneFlow Plugin

Groove2Groove – One-shot music style transferGrv2Grv) is an AI system for music accompaniment style transfer. Given two MIDI files – a content input and a style input – it generates a new accompaniment for the first file in the style of the second one
- https://github.com/cifkao/groove2groove

Music AI

https://github.com/zhvng/open-musiclm - Implementation of MusicLM, a text to music model published by Google Research, with a few modifications.

https://github.com/IgnacioIrigaray/AnalogAudioTapeDenoising - Dataset and model for analog audio tape denoising

DAFx-23 - Neural tape - The sound of magnetic recording media, such as open reel and cassette tape recorders, is still sought after by today's sound practitioners due to the imperfections embedded in the physics of the magnetic recording process. This paper proposes a method for digitally emulating this character using neural networks. The signal chain of the proposed system consists of three main components: the hysteretic nonlinearity and filtering jointly produced by the magnetic recording process as well as the record and playback amplifiers, the fluctuating delay originating from the tape transport, and the combined additive noise component from various electromagnetic origins. In our approach, the hysteretic nonlinear block is modeled using a recurrent neural network, while the delay trajectories and the noise component are generated using separate diffusion models, which employ U-net deep convolutional neural networks. According to the conducted objective evaluation, the proposed architecture faithfully captures the character of the magnetic tape recorder. The results of this study can be used to construct virtual replicas of vintage sound recording devices.
- https://github.com/01tot10/neural-tape-modeling - Neural Modeling of Magnetic Tape Recorders

https://github.com/thhuang/SheetMusicAI - From sheet music to midi!

https://github.com/bshall/urhythmic - Official repository for Rhythm Modeling for Voice Conversion. Abstract: Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic - an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments.Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody. Note: Urhythmic builds on soft speech units from our paper A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.

-2307.04686- VampNet: Music Generation via Masked Acoustic Token Modeling - We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
- -2307.04686- VampNet: Music Generation via Masked Acoustic Token Modeling - the supplemental material for the ISMIR 2023 submission “VampNet: Music Generation via Masked Acoustic Token Modeling”. Here, we showcase different use cases of VampNet as a music creation tool, and create musical loops and variations from short musical excerpts.
- Making Loops and Variations

Mubert - Thousands of Staff-Picked Royalty-Free Music Tracks for Streaming, Videos, Podcasts, Commercial Use and Online Content, H*uman x AI Generative Music For your video content, podcasts and apps
https://github.com/MubertAI/Mubert-Text-to-Music - A simple notebook demonstrating prompt-based music generation via Mubert API

Audiocraft - a single-stop code base for all your generative audio needs: music, sound effects, and compression after training on raw audio signals.
- AudioCraft: A simple one-stop shop for audio modeling [51]
- https://github.com/facebookresearch/audiocraft

MusicGen - We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, while being conditioned on textual description or melodic features, allowing better controls over the generated output.

https://github.com/YuanGongND/ssast - This repository contains the official implementation (in PyTorch) of the Self-Supervised Audio Spectrogram Transformer (SSAST) proposed in the AAAI 2022 paper SSAST: Self-Supervised Audio Spectrogram Transformer (Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass; MIT CSAIL). [Slides]

SSAST is the first patch-based joint discriminative and generative self-supervised learning framework, and also the first self-supervised learning framework for AST. SSAST significantly boosts AST performance on all downstream tasks we evaluated with an average improvement of 60.9%, leading to similar or even better results than a supervised pretrained AST. SSAST can be used as a drop-in replacement of previous ImageNet (supervised) pretrained AST, and has the advantage of 1) no labeled data is used; 2) flexible patch size and shape, ImagenNet pretraining only supports square patches; and 3) better performance on many tasks, in particular speech tasks.

https://github.com/Stability-AI/stable-audio-tools - Training and inference code for audio generation models

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis - Abstract: Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time-domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in reduntant and computionally-intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception, and benefitting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that addresses the key challenges of modeling spectral coefficients. Vocos demonstrates improved computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. As shown by objective evaluation, Vocos not only matches state-of-the-art audio quality, but thanks to frequency-aware generator, also effectively mitigates the periodicity issues frequently associated with time-domain GANs. The source code and model weights have been open-sourced at https://github.com/charactr-platform/vocos.
- https://github.com/charactr-platform/vocos - Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

https://github.com/tu-studio/anira - a high-performance library designed to enable the real-time safe integration of neural network inference within audio applications. Compatible with multiple inference backends, LibTorch, ONNXRuntime, and Tensorflow Lite, anira bridges the gap between advanced neural network architectures and real-time audio processing.

Generative Multimodal Models are In-Context Learners - The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions), is what current multimodal systems have largely struggled to imitate. In this work, we demonstrate that the task-agnostic in-context learning capabilities of large multimodal models can be significantly enhanced by effective scaling-up. We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences with a unified autoregressive objective. Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning, such as visual prompting and object-grounded generation. The model sets a new record on multiple multimodal understanding tasks in few-shot settings. When instruction-tuned to follow specific instructions, Emu2 further achieves new state-of-the-art on challenging tasks such as question answering benchmarks for large multimodal models and open-ended subject-driven generation. These achievements demonstrate that Emu2 can serve as a base model and general-purpose interface for a wide range of multimodal tasks. Code and models are publicly available to facilitate future research.
- https://github.com/baaivision/Emu

Compression

https://github.com/facebookresearch/encodec - State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Quality

https://github.com/saurabh-kataria/awesome-bandwidth-expansion - an attempt to list interesting audio bandwidth expansion/super-resolution research works.

Source separation

https://github.com/samim23/polymath - uses machine learning to convert any music library (e.g from Hard-Drive or YouTube) into a music production sample-library. The tool automatically separates songs into stems (beats, bass, etc.), quantizes them to the same tempo and beat-grid (e.g. 120bpm), analyzes musical structure (e.g. verse, chorus, etc.), key (e.g C4, E3, etc.) and other infos (timbre, loudness, etc.), and converts audio to midi. The result is a searchable sample library that streamlines the workflow for music producers, DJs, and ML audio developers.

AIcrowd | Sound Demixing Challenge 2023 | Challenges
- https://github.com/aim-qmul/sdx23-aimless - Artificial Intelligence and Music League for Effective Source Separation) is a special interest group in audio source separation at CDM, consisting of PhD students from the AIM CDT program. This repository is adapted from Danna-Sep and contains our training code for the SDX Sound Demixing Challenge.

https://github.com/Audio-AGI/AudioSep - This repository contains the official implementation of "Separate Anything You Describe". We introduce AudioSep, a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks such as audio event separation, musical instrument separation, and speech enhancement.

Image

https://github.com/xxlong0/Wonder3D - A cross-domain diffusion model for 3D reconstruction from a single image

https://replicate.com/fofr/sdxl-emoji/examples#v3vwup3bommw264amkfoxhpvie - An SDXL fine-tune based on Apple Emojis

LogoScale - A Method for Vectorizing Small, Crappy Logos [52]

https://github.com/yuanzhi-zhu/DiffPIR - contains the code and data associated with the paper "Denoising Diffusion Models for Plug-and-Play Image Restoration", which was presented at the CVPR workshop NTIRE 2023. This code is based on the OpenAI Guided Diffusion and DPIR.

https://github.com/lllyasviel/Fooocus - an image generating software (based on Gradio).

Fooocus is a rethinking of Stable Diffusion and Midjourney’s designs: Learned from Stable Diffusion, the software is offline, open source, and free. Learned from Midjourney, the manual tweaking is not needed, and users only need to focus on the prompts and images.

https://github.com/threestudio-project/threestudio a unified framework for 3D content creation from text prompts, single images, and few-shot images, by lifting 2D text-to-image generation models. [53]

Video

https://github.com/NVIDIA/vid2vid - Pytorch implementation for high-resolution (e.g., 2048x1024, photorealistic video-to-video translation. It can be used for turning semantic label maps into photo-realistic videos, synthesizing people talking from edge maps, or generating human motions from poses. The core of video-to-video translation is image-to-image translation. Some of our work in that space can be found in pix2pixHD and SPADE.

https://github.com/damo-vilab/videocomposer - Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

https://github.com/AILab-CVC/VideoCrafter - VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

ConsistI2V - Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to create a video sequence. A grand challenge in I2V generation is to maintain visual consistency throughout the video: existing methods often struggle to preserve the integrity of the subject, background, and style from the first frame, as well as ensure a fluid and logical progression within the video narrative. To mitigate these issues, we propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. These two approaches enable ConsistI2V to generate highly consistent videos. We also extend the proposed approaches to show their potential to improve consistency in auto-regressive long video generation and camera motion control. To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation. Our automatic and human evaluation results demonstrate the superiority of ConsistI2V over existing methods.
- https://github.com/TIGER-AI-Lab/ConsistI2V

Medicine

Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity | Nature [54]

-2007.15153- Fast, Structured Clinical Documentation via Contextual Autocomplete - We present a system that uses a learned autocompletion mechanism to facilitate rapid creation of semi-structured clinical documentation. We dynamically suggest relevant clinical concepts as a doctor drafts a note by leveraging features from both unstructured and structured medical data. By constraining our architecture to shallow neural networks, we are able to make these suggestions in real time. Furthermore, as our algorithm is used to write a note, we can automatically annotate the documentation with clean labels of clinical concepts drawn from medical vocabularies, making notes more structured and readable for physicians, patients, and future algorithms. To our knowledge, this system is the only machine learning-based documentation utility for clinical notes deployed in a live hospital setting, and it reduces keystroke burden of clinical concepts by 67% in real environments.

Journalism

Gizmodo used AI to write a Star Wars story. It was filled with errors. - The Washington Post - The article quickly prompted an outcry among staffers who complained in the company’s internal Slack messaging system that the error-riddled story was “actively hurting our reputations and credibility,” showed “zero respect” for journalists and should be deleted immediately, according to messages obtained by The Washington Post. The story was written using a combination of Google Bard and ChatGPT, according to a G/O Media staff member familiar with the matter. (G/O Media owns several digital media sites including Gizmodo, Deadspin, The Root, Jezebel and The Onion.)

Utils

https://github.com/HumanSignal/label-studio - a multi-type data labeling and annotation tool with standardized output format

https://github.com/gclef-cmu/gpt_paper_assistant - a very simple daily scanner for Arxiv that uses GPT4 and author matches to find papers you might find interesting. It will run daily via github actions and can post this information to slack via a bot or just render it in a static github-pages website.

HyperWrite - framework to enable multimodal models to operate a computer. Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective.
- https://github.com/OthersideAI/self-operating-computer

Tools

Khoj

Khoj - an open-source, AI personal assistant that links up all your knowledge bases. It is a thinking tool that is transparent, fun and easy to engage with. Khoj can help you build faster and better by using an AI assistant to search and reason across your disparate data sources. Khoj learns from your notes, documents, emails to function as an extension of your brain. So that you can stay focused on doing what matters. Khoj currently exposes the ability to search and chat with your personal knowledge base in your file system, while keeping you in control of your data. Khoj started with the founding principle that a personal assistant be understandable, accessible and hackable. This means you can always customize and self-host your Khoj on your own machines. [55] [56]
- https://github.com/khoj-ai/khoj

Magic Loops

Magic Loops [57]

Tabby

Tabby - Self-hosted AI coding assistant. An opensource / on-prem alternative to GitHub Copilot. Warning Tabby is still in the alpha phase. Self-contained, with no need for a DBMS or cloud service. OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). Consumer level GPU supports (FP-16 weight loading with various optimization).
- https://github.com/TabbyML/tabby

AI_CHANGELOG

https://github.com/joshuasundance-swca/ai_changelog - Using large language models to maintain AI_CHANGELOG.md

AGI

AGI Introduction [58]

Function

Function - Run AI prediction functions anywhere, with only a predict function.
Show HN: Generative Fill with AI and 3D | Hacker News