Course Catalog
Machine Learning Foundation: Working with Spark and TensorFlow
Code: TTML5508
Duration: 5 Days
$2695 USD

OVERVIEW

Apache Spark, a significant component of the Hadoop ecosystem, is a cluster computing engine used for Big Data processing. Building on top of Hadoop YARN and HDFS, Spark offers order-of-magnitude faster processing than MapReduce for many in-memory computing tasks. It can be programmed in Java, Scala, Python, and R (the favorite languages of data scientists), along with SQL-based front ends.

Machine Learning & Deep Learning Essentials with Spark and TensorFlow is a hands-on course designed for data scientists and software engineers new to Machine Learning. You'll learn how to perform Machine Learning at scale using the popular Apache Spark framework, working from the ground up through Apache Spark essentials, core machine learning concepts, regression, classification, clustering, and more.

The abundance of data and affordable cloud scale has led to an explosion of interest in Deep Learning. Google has released TensorFlow, an excellent open-source library that allows state-of-the-art machine learning to be done at scale, complete with GPU-based acceleration. In the second half of the class, you'll dive into deep learning concepts and learn how to implement them using TensorFlow.

DELIVERY FORMAT

This course is available in the following formats:

Virtual Classroom
Duration: 5 Days

Classroom
Duration: 5 Days

CLASS SCHEDULE
Call 800-798-3901 to enroll in this class!

GOALS

Throughout the program, working in a hands-on learning environment guided by our expert instructor, students will

  • Learn popular machine learning algorithms, their applicability, and limitations
  • Practice the application of these methods in the Spark machine learning environment
  • Learn practical use cases and limitations of algorithms
  • Explore not just the related APIs but also the theory behind them
  • Work with real-world datasets from Uber, Netflix, Walmart, Prosper, etc. (use cases subject to change)
OUTLINE

Part 1: Introduction to Machine Learning

  1. Machine Learning (ML) Overview
  • Machine Learning landscape
  • Machine Learning applications
  • Understanding ML algorithms & models
  1. ML in Python and Spark
  • Spark ML Overview
  • Introduction to Jupyter notebooks
  1. Machine Learning Concepts
  • Statistics Primer
  • Covariance, Correlation, Covariance Matrix
  • Errors, Residuals
  • Overfitting / Underfitting
  • Cross-validation, bootstrapping
  • Confusion Matrix
  • ROC curve, Area Under Curve (AUC)
  1. Feature Engineering (FE)
  • Preparing data for ML
  • Extracting features, enhancing data
  • Data cleanup
  • Visualizing Data
  1. Linear Regression
  • Simple Linear Regression
  • Multiple Linear Regression
  • Running LR
  • Evaluating LR model performance
  • Lab
  1. Logistic Regression
  • Understanding Logistic Regression
  • Calculating Logistic Regression
  • Evaluating model performance
  1. Classification: SVM (Support Vector Machines)
  • SVM concepts and theory
  • SVM with kernel
  1. Classification: Decision Trees & Random Forests
  • Theory behind trees
  • Classification and Regression Trees (CART)
  • Random Forest concepts
  1. Classification: Naive Bayes
  • Theory
  • Lab
  1. Clustering (K-Means)
  • Theory behind K-Means
  • Running K-Means algorithm
  • Estimating the performance
  1. Principal Component Analysis (PCA)
  • Understanding PCA concepts
  • PCA applications
  • Running a PCA algorithm
  • Evaluating results
  1. Recommendations (Collaborative filtering)
  • Recommender systems overview
  • Collaborative Filtering concepts
  1. Performance 
  • Best practices for scaling and optimizing Apache Spark
  • Memory caching
  • Testing and validation

Part 2: Introduction to Deep Learning with TensorFlow

  1. Machine Learning Quick Review
  • Understanding Machine Learning
  • Supervised versus Unsupervised Learning
  • Regression
  • Classification
  • Clustering
  1. Introducing TensorFlow
  • TensorFlow intro
  • TensorFlow Features
  • TensorFlow Versions
  • GPU and TPU scalability
  1. The Tensor: The Basic Unit of TensorFlow
  • Introducing Tensors
  • TensorFlow Execution Model
  1. Single Layer Linear Perceptron Classifier with TensorFlow
  • Introducing Perceptrons
  • Linear Separability and XOR Problem
  • Activation Functions
  • Softmax output
  • Backpropagation, loss functions, and Gradient Descent
  1. Hidden Layers: Intro to Deep Learning
  • Hidden Layers as a solution to XOR problem
  • Distributed Training with TensorFlow
  • Vanishing Gradient Problem and ReLU
  • Loss Functions
  1. High level TensorFlow: tf.learn
  • Using high level TensorFlow
  • Developing a model with tf.learn
  1. Convolutional Neural Networks in TensorFlow
  • Introducing CNNs
  • CNNs in TensorFlow
  1. Introducing Keras
  • What is Keras?
  • Using Keras with a TensorFlow Backend
  1. Recurrent Neural Networks in TensorFlow
  • Introducing RNNs
  • RNNs in TensorFlow
  1. Long Short-Term Memory (LSTM) in TensorFlow
  • Introducing LSTMs
  • LSTMs in TensorFlow

LABS

Will Be Updated Soon!
WHO SHOULD ATTEND

Developers, analysts, or others with basic Python experience who intend to start learning about and working with machine learning algorithms, fundamentals, and core concepts using Python and Spark.

PREREQUISITES

This is an intermediate-level course, geared for data scientists, data analysts, and developers new to Machine Learning, Spark, and TensorFlow. Students should have strong basic Python skills, good foundational mathematics in linear algebra and probability, and basic Linux skills, including familiarity with commands such as ls, cd, cp, and su. Attendees without a Python background may treat the labs as follow-along exercises or team with others to complete them.