Hadoop Administration

Course:   HDPADM
Duration:   3 Days
Level:   II
On our website at:   http://www.verhoef-training.com/courses/HDPADM.html
 
Course Summary

Apache Hadoop is an open-source framework for building reliable, distributed compute clusters. Credited with a role in IBM Watson's Jeopardy! win in 2011, Hadoop can be used (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources in order to dissect, classify, learn from, and make suggestions for business analytics, decision support, and other advanced forms of machine intelligence.

This class is geared for the administrator who is charged with maintaining a Hadoop cluster and its related components. Hadoop is a system designed for massive scalability and is extremely fault-tolerant compared to other cluster architectures. As administrators, we need to install, configure, and maintain Hadoop on Linux in various compute environments.

This course is focused on the Hadoop 2.0 (pre-)release.

Topics Covered In This Course

Overview

  • Map/Reduce
  • Hadoop
  • Mahout
  • Alternate Frameworks

Hadoop Architecture

  • Hadoop Map/Reduce
  • HDFS
  • Cassandra
  • HBase
  • Hive
  • Pig

Installing Hadoop

  • Linux considerations
  • SSH configuration
  • Hadoop installation
  • OS Security
  • NameNodes
  • JobTrackers

Test-running Hadoop Programs

  • Simple MapReduce test
  • Pig Test

Cloud Installations

  • Amazon EC2
  • Amazon Elastic MapReduce
  • Rackspace

Optimization and Tuning

  • Performance Metrics
  • Node-sizing
  • Kernel tuning

Installing HBase [optional]

  • HBase installation
  • ZooKeeper

Hive installation [optional]

Cassandra installation [optional]

JBoss Infinispan installation [optional]

Integration with Drools [optional]

Integration with Spring Integration [optional]

Who Should Take This Course

There is a small amount of overlap with the Hadoop developer course in the beginning overview sections. There is a single exercise that has the administrators create and test-run a MapReduce program as a developer or user would. All of the remaining exercises are geared towards installation and configuration issues.

Recommended Prerequisites

Attendees are expected to be experienced UNIX/Linux system administrators. Some Java programming experience will also be helpful.

Training Style

This course is approximately 40% lecture and 60% hands-on labs.

Related Courses

Code     Course Title                                               Duration   Level
HDPDEV   Introduction to Hadoop Development                         5 Days     II
HDOOP    Apache Hadoop                                              5 Days     II
ACCUM    Developing Data-driven Applications with Apache Accumulo   3 Days     III

Every student attending a Verhoef Training class will receive a certificate good for $100 toward their next public class taken within a year.

You can also buy "Verhoef Vouchers" to get a discounted rate for a single student in any of our public or web-based classes. Contact your account manager or our sales office for details.