Duration: 5 Days
Apache Hadoop is an open-source framework for building reliable, distributed compute clusters. Part of the technology behind IBM Watson's Jeopardy! win in 2011, Hadoop can be used (with other related frameworks) to process large unstructured or semi-structured data sets from multiple sources: to dissect, classify, and learn from the data, and to make suggestions for business analytics, decision support, and other advanced forms of machine intelligence.
This course goes well beyond the "Hello World" word-count example into practical, applied uses of Hadoop in large-scale, real-world scenarios, including fraud detection, algorithmic trading, and data mining. Students will develop on a real Hadoop/Drools cluster, in an environment architected for a dynamically changing, business-rule-driven infrastructure with multiple disparate data sources and large-scale datasets.
Topics Covered In This Course
Retrieving and Localizing Data
Feeding Hadoop in the Enterprise
Machine Learning with Mahout
Applying Business Rules with Drools
Pig and Pig Pipelines
Working with the Hive
Testing, Performance and Troubleshooting
Experience using Java with Eclipse, using JPA for data persistence and access, and using a UNIX shell is expected.
40% lecture and 60% hands-on labs.
Every student attending a Verhoef Training class will receive a certificate good for $100 toward their next public class taken within a year.
You can also buy "Verhoef Vouchers" to get a discounted rate for a single student in any of our public or web-based classes. Contact your account manager or our sales office for details.
Can't find the course you want?
Call us at 800.533.3893, or
email us at email@example.com