Data Mining / Machine Learning - (Software|Tool|Programming Language)

1 - About

A list of tools and software for data mining and machine learning.

See also: Natural Language - Processing (NLP)

Data science can't be point-and-click.

3 - Analytics

4 - Languages

  • Matlab was built for matrix calculations (linear algebra).
  • The R language is meant for statistics.
  • Python is a good general-purpose language.

However, they generally don't run as quickly as languages like C and Java.

4.1 - Python

Python has an extensive open source ecosystem. Packages:

  • NumPy (maths and arrays)
  • SciPy (scientific tools): provides many scientific routines that work on top of NumPy
  • Pandas (Python Data Analysis Library): handy for manipulating financial data
  • matplotlib: plots graphics
  • csv: standard library module for reading and writing CSV files
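
As a rough illustration of how these packages fit together, the sketch below parses a small CSV with the standard csv module and hands the numbers to NumPy. The sample data, the column names and the load_prices helper are all made up for this example; it assumes NumPy is installed.

```python
import csv
import io

import numpy as np

# Hypothetical sample data, inlined so the sketch is self-contained.
SAMPLE = "day,price\n1,10.0\n2,12.0\n3,11.0\n"


def load_prices(text):
    """Parse CSV text and return the 'price' column as a NumPy array."""
    reader = csv.DictReader(io.StringIO(text))
    return np.array([float(row["price"]) for row in reader])


prices = load_prices(SAMPLE)
mean_price = prices.mean()      # arithmetic mean of the column
daily_change = np.diff(prices)  # difference between consecutive values
```

From here, the same array could be passed to SciPy routines or plotted with matplotlib; Pandas would typically replace the manual parsing step for larger files.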

4.2 - Java

4.3 - R

R is a nice interactive data analysis tool, especially through environments like RStudio.

R - Machine Learning Package and Method (Views)

4.4 - Oracle

4.5 - Microsoft

4.6 - Others

  • MATLAB/Octave
  • Julia: a newer language
  • KNIME: KNIME [naim] is an open source workbench for the entire analysis process
  • Rapid Miner

4.7 - Tools

  • jq: a lightweight and flexible command-line JSON processor
  • json2csv (JSON to CSV): converts a stream of newline-separated JSON data to CSV format
  • csvkit: a suite of utilities for converting to and working with CSV, the king of tabular file formats
  • scrape (Python): HTML extraction using XPath or CSS selectors
  • xml2json: a command that converts an XML input to a JSON output, using the xml-mapping npm module
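
To make the newline-separated JSON to CSV conversion concrete, here is a minimal Python sketch using only the standard library. It is not how json2csv itself is implemented; the ndjson_to_csv function, its parameters and the sample records are hypothetical names chosen for illustration.

```python
import csv
import io
import json


def ndjson_to_csv(lines, fields):
    """Convert an iterable of JSON strings (one object per line) to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        record = json.loads(line)
        # Keep only the requested fields; missing keys become empty cells.
        writer.writerow({f: record.get(f, "") for f in fields})
    return out.getvalue()


# Hypothetical sample input: two JSON objects, one per line.
sample = ['{"name": "a", "score": 1}', '{"name": "b", "score": 2, "extra": true}']
csv_text = ndjson_to_csv(sample, ["name", "score"])
```

Passing the field list explicitly keeps the column order stable even when individual records carry extra or missing keys.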

5 - Framework

  • Uber Michelangelo: consists of a mix of open source systems and components built in-house. The primary open source components used are HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.

6 - Documentation / Reference

