Data Science - Big Data
1 - About
Big Data is usually defined in terms of the 3Vs:
- volume,
- velocity,
- and variety.
Doug Laney originally defined the 3Vs in 2001, in a research note written at META Group (later acquired by Gartner).
Big Data is also described as an Internet-scale data set.
"Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…" (Dan Ariely, 2013)
2 - Articles Related
3 - Word Cloud
Apache Cassandra, Machine Learning, Hadoop, NoSQL, Apache Hive, Map/Reduce and HDFS, Data Visualization, ZooKeeper, Distributed Search and Real Time Analytics, Avro, Visualizing Your Graph, Analytics Maturity Model, R
4 - Sources
4.1 - Monitoring
Much of the data behind Big Data comes from online recording:
- every click on a website,
- every ad viewed,
- every billing event,
- every fast-forward or pause while you're watching a video,
- every request that's made from a client to a server,
- every transaction,
- every network message,
- and every fault.
Anything that occurs can potentially be recorded.
A lot of it is recorded, but very little of it gets analyzed. This is why Big Data is often pictured as an iceberg: a phenomenal amount of data is collected, but only a tiny fraction of it is ever analyzed.
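As an illustration of how such monitoring events might be recorded, here is a minimal sketch in Python. The record layout (`timestamp`, `type`, `user_id`, `details`) and the helper names `make_event` and `to_log_line` are assumptions made for this example, not a standard schema. Each event is serialized as one JSON line, and a simple count per event type stands in for the small part of the data that actually gets analyzed.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def make_event(event_type, user_id, **details):
    """Build one monitoring record as a plain dict (hypothetical schema)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        "user_id": user_id,
        "details": details,
    }


def to_log_line(event):
    """Serialize an event as a single JSON line, suitable for an append-only log."""
    return json.dumps(event, sort_keys=True)


# A few of the event kinds listed above, with made-up identifiers.
events = [
    make_event("click", "u1", page="/home"),
    make_event("ad_view", "u1", ad_id="a42"),
    make_event("video_pause", "u2", position_s=73),
    make_event("click", "u2", page="/pricing"),
]

# Everything is recorded...
log_lines = [to_log_line(e) for e in events]

# ...but only a small summary is analyzed: the number of events per type.
counts_by_type = Counter(json.loads(line)["type"] for line in log_lines)
```

In a real system the lines would be appended to a file or sent to a collection service rather than kept in a list.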
4.2 - User-generated content
User-generated content includes:
- posts on Facebook,
- pictures on Instagram,
- reviews on Yelp or TripAdvisor,
- tweets on Twitter,
- and videos on YouTube.
4.3 - Health and scientific computing
- the Large Hadron Collider. It generates more data in a year than all the other data sources combined.
- genome sequencing data. The cost of sequencing is dropping exponentially, much faster than Moore's Law, so we are collecting more sequencing data than ever before.
4.4 - Graphs
Graphs include things like:
- social networks,
- telecommunication networks,
- computer networks,
- road networks,
- and collaborations or relationships.
Some of these graphs can be absolutely enormous (for example, Facebook's user graph).
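To make the idea of a graph concrete, the following is a small Python sketch of a social network stored as an adjacency list. The names (`build_graph`, `degree`, `mutual_friends`) and the sample friendships are hypothetical. A graph of Internet scale would need distributed storage and processing rather than an in-memory dictionary.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping from node to its set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree(graph, node):
    """Return the number of direct connections of a node (0 if unknown)."""
    return len(graph.get(node, ()))


def mutual_friends(graph, a, b):
    """Return the neighbours that two nodes have in common."""
    return graph.get(a, set()) & graph.get(b, set())


# Made-up friendships; each pair is one undirected edge.
friendships = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
graph = build_graph(friendships)

bob_degree = degree(graph, "bob")
shared = mutual_friends(graph, "alice", "carol")
```

The same structure applies to the other networks listed above: nodes could be routers, phone numbers, or road intersections, and edges the links between them.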
4.5 - Log files
4.6 - Internet of things
- RFID tags (for example, the California FasTrak electronic toll collection transponder, which is used to pay highway tolls and also to collect data for traffic reporting)