ML / AI

From Things and Stuff Wiki
Jump to navigation Jump to search


Machine learning


  • A Course in Machine Learning - a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). It's focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.





"In applications of "usual" machine learning, there is typically a strong focus on the feature engineering part; the model learned by an algorithm can only be so good as its input data. Of course, there must be sufficient discriminatory information in our dataset, however, the performance of machine learning algorithms can suffer substantially when the information is buried in meaningless features. The goal behind deep learning is to automatically learn the features from (somewhat) noisy data; it's about algorithms that do the feature engineering for us to provide deep neural network structures with meaningful information so that it can learn more effectively. We can think of deep learning as algorithms for automatic "feature engineering," or we could simply call them "feature detectors," which help us to overcome the vanishing gradient challenge and facilitate the learning in neural networks with many layers."



  • NNdef - Java and XML based Neural Networks and Knowledge Modeling toolkit and library











  • Caffe - a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.


  • Torch - a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.



  • Data Science Machine - an end-to-end software system that is able to automatically develop predictive models from relational data. The Machine was created by Max Kanter and Kalyan Verramachaneni at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The system automates two of the most human-intensive components of a data science endeavor: feature engineering, and selection and tuning of the machine learning methods that build predictive models from those features. First, an algorithm called Deep Feature Synthesis automatically engineers features. Next, through an approach called Deep Mining, the Machine composes a generalized machine learning pipeline that includes dimensionality reduction methods, feature selection methods, clustering, and classifier design. Finally, it tunes the parameters through a Gaussian Copula Process.




  • TensorFlow - an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning and the flexible numerical computation core is used across many other scientific domains.











  • https://github.com/neo-ai/neo-ai-dlr - a compiler and runtime for machine learning models. The compiler optimizes machine learning models for various target hardware. The runtime executes the model on the target hardware. A stand-alone, light-weight and portable runtime for CNN and decicion-tree models. Built on top of TVM and Treelite runtime, DLR provides simple and unified Python/C++ APIs for loading and running TVM/Treelite compiled models on a wide range of devices, including X86, TRT-enabled GPU and Arm devices.



  • NuPIC - the Numenta Platform for Intelligent Computing, comprises a set of learning algorithms that were first described in a white paper published by Numenta in 2009. The learning algorithms faithfully capture how layers of neurons in the neocortex learn.


  • https://github.com/jadore801120/attention-is-all-you-need-pytorch - PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).A novel sequence to sequence framework utilizes the self-attention mechanism, instead of Convolution operation or Recurrent structure, and achieve the state-of-the-art performance on WMT 2014 English-to-German translation task. (2017/06/12)



High-level framework

  • https://github.com/apache/incubator-mxnet - a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines.


  • https://github.com/keras-team/keras - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.



News

  • Distill — Latest articles about machine learning


AI


  • OpenAI - a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact. We believe AI should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as is possible safely. The outcome of this venture is uncertain and the work is difficult, but we believe the goal and the structure are right. We hope this is what matters most to the best in the field. [31]