Apache Hadoop

Course: HDOOP

Duration: 5 Days

Level: II

Course Summary

Apache Hadoop is an open-source framework for building reliable, distributed compute clusters. Credited with contributing to IBM Watson's Jeopardy! win in 2011, Hadoop can be used (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources: to dissect and classify the data, learn from it, and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence.

This course goes well beyond the "Hello World" word-count example into practical, applied uses of Hadoop in large-scale real-world scenarios, including fraud detection, algorithmic trading, and data mining. Students will develop in an environment designed for a dynamically changing, business-rule-driven infrastructure with multiple disparate data sources and large-scale datasets on a real Hadoop/Drools cluster.

Topics Covered In This Course

Overview

Map/Reduce
Hadoop
NoSQL
Mahout
Alternate Frameworks

Hadoop Architecture

Hadoop Map/Reduce
HDFS
Cassandra
HBase
Hive
Pig

Retrieving and Localizing Data

Using JPA in Map/Reduce: Pros and Cons
HDFS
NoSQL
HBase
Cassandra
Neo4j
Sqoop
Flume
Caching with JBoss Infinispan
Caching with OpenTerracotta
Using Spring Data

Feeding Hadoop in the Enterprise

Apache UIMA
Spring Integration
Apache Camel
Spring Batch

Machine Learning with Mahout

Artificial Intelligence Overview
Fuzzy Logic
K-Means
Pattern Mining
Bayesian Classifiers
Analytics
Random Forests
Decision Support with Mahout and Hadoop

Applying Business Rules with Drools

Drools Overview
Integrating a rules-based approach with Hadoop
Decision Making with Drools and Hadoop
Integrating Drools, Mahout, and Hadoop

Pig and Pig Pipelines

Pig Latin
Pig Pipelines
Pig UDFs (User Defined Functions)

Working with Hive

Hive and HDFS
Metadata and indexing
Hive UDFs (User Defined Functions)
Hive and Amazon S3
HQL

Testing, Performance and Troubleshooting

TDD with MRUnit
TDD with other Unit Testing Frameworks
Bottleneck discovery
Monitoring
Join Framework Optimization
Troubleshooting
Hadoop and Virtualization
Hadoop in the Cloud
Hadoop and Amazon EC2

Recommended Prerequisites

Students are expected to have experience using Java with Eclipse, using JPA for data persistence and access, and working in a UNIX shell.

Training Style

40% lecture and 60% hands-on labs.

Related Courses

Code	Course Title	Duration	Level
HDPDEV	Introduction to Hadoop Development	5 Days	II
HDPADM	Hadoop Administration	3 Days	II
HDPENT	Real World Hadoop in the Enterprise	5 Days	II
ACCUM	Developing Data-driven Applications with Apache Accumulo	3 Days	III

Every student attending a Verhoef Training class will receive a certificate good for $100 toward their next public class taken within a year.

You can also buy "Verhoef Vouchers" to get a discounted rate for a single student in any of our public or web-based classes. Contact your account manager or our sales office for details.

Schedule For This Course
There are currently no public sessions scheduled for this course. We can schedule a private class for your organization just a couple of weeks from now. Or we can let you know the next time we do schedule a public session.

Can't find the course you want?
Call us at 800.533.3893, or
email us at [email protected]