Machine Learning - Data Mining (Software, Library and Framework)

About

This section lists software libraries and frameworks that implement machine learning algorithms. See Data Mining

Data science can't be point and click

A list of tools and software for data miners and machine learners.

See also: Natural Language - (Processing|Processor) (NLP)

Analytics

[Figures: Data Mining Tool (three images); Data Tools O'Reilly Survey 2013]

Framework

  • H2O - Open Source Fast Scalable Machine Learning Platform For Smarter Applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), …). Written in Java, with bindings for R and Python.

Languages

  • Matlab was built for matrix calculations (linear algebra).
  • The R language is meant for statistics.
  • Python is a good general-purpose language.

But they generally don’t run as quickly as languages like C and Java.

Python

Python is an incredible open source ecosystem. Package:

Java

Javascript

R

R is a nice interactive data analysis tool, especially through environments like RStudio.

R - Machine Learning Package and Method (Views)

Go

https://github.com/chewxy/gorgonia/

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily. If this sounds like Theano or TensorFlow, it's because the idea is quite similar. Specifically, the library is pretty low-level, like Theano, but has higher goals like TensorFlow.
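To make the "write, then evaluate" idea concrete, here is a minimal Python sketch of an expression graph over nested lists. It is not Gorgonia's API: the names `Var`, `Op` and `evaluate` are hypothetical and chosen for this illustration only, and the multiplication is element-wise, not a matrix product.

```python
import operator

# Supported operations, applied element by element.
OPS = {"add": operator.add, "mul": operator.mul}


def elementwise(fn, a, b):
    """Apply fn to two equally shaped nested lists (or two scalars)."""
    if isinstance(a, list):
        return [elementwise(fn, x, y) for x, y in zip(a, b)]
    return fn(a, b)


class Node:
    """Base class: combining nodes builds a graph, it computes nothing yet."""

    def __add__(self, other):
        return Op("add", self, other)

    def __mul__(self, other):
        return Op("mul", self, other)


class Var(Node):
    """A named placeholder whose value is supplied at evaluation time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]


class Op(Node):
    """An operation node that remembers its kind and its two inputs."""

    def __init__(self, kind, left, right):
        self.kind = kind
        self.left = left
        self.right = right

    def evaluate(self, env):
        a = self.left.evaluate(env)
        b = self.right.evaluate(env)
        return elementwise(OPS[self.kind], a, b)


if __name__ == "__main__":
    x, w, b = Var("x"), Var("w"), Var("b")
    y = x * w + b  # builds the graph only
    values = {
        "x": [[1, 2], [3, 4]],
        "w": [[10, 10], [10, 10]],
        "b": [[1, 1], [1, 1]],
    }
    print(y.evaluate(values))  # [[11, 21], [31, 41]]
```

Separating graph construction from evaluation is what lets libraries of this kind inspect, optimize or differentiate an expression before running it; the sketch only shows the separation itself.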

Oracle

Microsoft

Machine Learning Center

Others

  • MATLAB/Octave
  • Julia: New language
  • KNIME: KNIME (pronounced [naim]) is an open-source workbench for the entire analysis process
  • Rapid Miner

Tools

  • jq is a lightweight and flexible command-line JSON processor
  • scrape (Python): HTML extraction using XPath or CSS selectors
  • xml2json: a command that converts an XML input to a JSON output, using the xml-mapping npm module
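As an illustration of the kind of conversion the last tool performs, the following Python sketch turns an XML string into JSON using only the standard library. The mapping rules here (attributes under `@name` keys, text under `#text`, repeated tags collected into a list) are assumptions made for this example and are not necessarily the convention used by xml2json or xml-mapping.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an XML element into plain dicts, lists and strings."""
    node = {}
    # Attributes go under "@name" keys (a convention chosen for this sketch).
    for name, value in element.attrib.items():
        node["@" + name] = value
    # Child elements: repeated tags are collected into a list.
    for child in element:
        value = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (element.text or "").strip()
    if not node:
        # Leaf element without attributes: keep just its text.
        return text
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return a JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    sample = (
        '<tools><tool kind="cli">jq</tool>'
        '<tool kind="cli">xml2json</tool><note>demo</note></tools>'
    )
    print(xml_to_json(sample))
```

The resulting JSON can then be filtered on the command line with a tool such as jq.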

Framework

  • Uber Michelangelo consists of a mix of open source systems and components built in-house. The primary open source components used are HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.

Documentation / Reference
