1. Introduction

You can read more about Big Data on Wikipedia, a site that itself generates and processes huge amounts of data.

MapReduce and Apache Hadoop are the technologies this course covers in more depth.

2. Data Sources

According to IBM: “Every day, 2.5 billion gigabytes of high-velocity data are created in a variety of forms, such as social media posts, information gathered in sensors and medical devices, videos and transaction records”.

3. Quiz: Big Data

What is BIG DATA?

4. Definition of Big Data

A reasonable definition of big data might be: it’s data that’s too big to be processed on a single machine.

Big Data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software. (International Journal of Internet Science, 2012, 7 (1), 1–5)

5. Quiz: Challenges

Challenges with big data

6. The 3 Vs - Volume

The 3 V’s were first defined in a research report by Douglas Laney in 2001 titled “3D Data Management: Controlling Data Volume, Velocity and Variety”.

In 2012 he updated the definition as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization”.

7. Quiz: Worthwhile Data

Data Worth Storing?

8. Variety

The problem is that to store data in a system like a traditional database, the data needs to fit into pre-defined tables. A lot of the data we deal with these days tends to be what we call unstructured or semi-structured data.

9. Data Formats

The nice thing about Hadoop is that it doesn’t care what format your data comes in. Unlike with a traditional database, you can store the data in its raw format, then manipulate and reformat it later.
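The idea of storing raw data first and giving it structure only when it is read can be sketched in a few lines of plain Python. This is only an illustration of the principle, not Hadoop code: the sample records, the field names (user, action, amount) and the helper functions parse_record and read_structured are all assumptions made up for this example.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no schema applied on write).
RAW_RECORDS = [
    '{"user": "alice", "action": "purchase", "amount": 12.5}',  # JSON line
    'bob,click,0',                                              # comma-separated line
    'free-text note without any structure',                     # unstructured line
]


def parse_record(line):
    """Interpret one raw line at read time.

    Returns a dict with user/action/amount, or None if no fields can be recovered.
    """
    line = line.strip()
    if line.startswith("{"):
        try:
            obj = json.loads(line)
            return {
                "user": obj.get("user"),
                "action": obj.get("action"),
                "amount": float(obj.get("amount", 0)),
            }
        except (ValueError, TypeError):
            # Malformed JSON or a non-numeric amount: treat as unusable.
            return None
    parts = line.split(",")
    if len(parts) == 3:
        try:
            return {"user": parts[0], "action": parts[1], "amount": float(parts[2])}
        except ValueError:
            return None
    return None


def read_structured(raw_lines):
    """Apply the schema while reading; the raw lines themselves are never modified."""
    return [rec for rec in map(parse_record, raw_lines) if rec is not None]


if __name__ == "__main__":
    for rec in read_structured(RAW_RECORDS):
        print(rec)
```

Because the raw lines are left untouched, a different interpretation (for example, one that also mines the free-text line) can be applied later without reloading the data.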

10. Quiz: Using Variety


11. Velocity


12. Quiz: Your Interests

What data interests you? (Survey question; there is no right answer.)

13. Doug Intro

14. Doug Cutting: The Origins of Hadoop

Doug Cutting, Creator of Hadoop

Google published papers about its distributed file system (GFS) and its processing framework, MapReduce.

15. Hadoop Logo Intro

16. Doug Cutting: The Name of Hadoop

The name came from his son’s toy.

17. Core Hadoop


Cloudera provides a free download of Chapter 2 of Tom White’s essential text, Hadoop: The Definitive Guide.

18. Hadoop Ecosystem

See more information about Pig, Hive, HBase, Impala, Mahout, Sqoop, Flume, Hue and Oozie.

  • CDH

19. Congratulations

See more in the free Chapter 2 of Tom White’s essential text, Hadoop: The Definitive Guide.