Alliance Micro Solutions | Public Course Catalog

Machine Learning Foundation: Working with Spark and TensorFlow

Code: TTML5508

Duration: 5 Days

$2695 USD

OVERVIEW

Apache Spark, a significant component of the Hadoop ecosystem, is a cluster computing engine used in Big Data processing. Building on Hadoop YARN and HDFS, it offers order-of-magnitude faster processing than MapReduce for many in-memory computing tasks. It can be programmed in Java, Scala, Python, and R (the favorite languages of data scientists), along with SQL-based front ends.

Machine Learning & Deep Learning Essentials with Spark and TensorFlow is a hands-on course designed for data scientists and software engineers new to Machine Learning. You'll learn how to perform Machine Learning at scale using the popular Apache Spark framework, working from the ground up through Apache Spark essentials, core machine learning concepts, regression, classification, clustering, and more.

The abundance of data and affordable cloud-scale computing have led to an explosion of interest in Deep Learning. Google has released TensorFlow, an excellent open-source library that allows state-of-the-art machine learning to be done at scale, complete with GPU-based acceleration. In the second half of the class, you'll dive into deep learning concepts and learn how to implement them using TensorFlow.

DELIVERY FORMAT

This course is available in the following formats:

Virtual Classroom

Duration: 5 Days

Classroom

Duration: 5 Days

CLASS SCHEDULE

Call 800-798-3901 to enroll in this class!

GOALS

Throughout the program, working in a hands-on learning environment guided by our expert instructor, students will:

Learn popular machine learning algorithms, their applicability, and their limitations
Practice applying these methods in the Spark machine learning environment
Learn practical use cases for these algorithms
Explore not just the related APIs but also the theory behind them
Work with real-world datasets from Uber, Netflix, Walmart, Prosper, etc. (use cases subject to change)

OUTLINE

Part 1: Introduction to Machine Learning

Machine Learning (ML) Overview

Machine Learning landscape
Machine Learning applications
Understanding ML algorithms & models

ML in Python and Spark

Spark ML Overview
Introduction to Jupyter notebooks

Machine Learning Concepts

Statistics Primer
Covariance, Correlation, Covariance Matrix
Errors, Residuals
Overfitting / Underfitting
Cross-validation, bootstrapping
Confusion Matrix
ROC curve, Area Under Curve (AUC)

Feature Engineering (FE)

Preparing data for ML
Extracting features, enhancing data
Data cleanup
Visualizing Data

Linear Regression

Simple Linear Regression
Multiple Linear Regression
Running LR
Evaluating LR model performance
Lab

Logistic Regression

Understanding Logistic Regression
Calculating Logistic Regression
Evaluating model performance

Classification: SVM (Support Vector Machines)

SVM concepts and theory
SVM with kernel

Classification: Decision Trees & Random Forests

Theory behind trees
Classification and Regression Trees (CART)
Random Forest concepts

Classification: Naive Bayes

Theory
Lab

Clustering (K-Means)

Theory behind K-Means
Running K-Means algorithm
Estimating the performance

Principal Component Analysis (PCA)

Understanding PCA concepts
PCA applications
Running a PCA algorithm
Evaluating results

Recommendations (Collaborative filtering)

Recommender systems overview
Collaborative Filtering concepts

Performance

Best practices for scaling and optimizing Apache Spark
Memory caching
Testing and validation

Part 2: Introduction to Deep Learning with TensorFlow

Machine Learning Quick Review

Understanding Machine Learning
Supervised versus Unsupervised Learning
Regression
Classification
Clustering

Introducing TensorFlow

TensorFlow intro
TensorFlow Features
TensorFlow Versions
GPU and TPU scalability

The Tensor: The Basic Unit of TensorFlow

Introducing Tensors
TensorFlow Execution Model

Single Layer Linear Perceptron Classifier with TensorFlow

Introducing Perceptrons
Linear Separability and XOR Problem
Activation Functions
Softmax output
Backpropagation, loss functions, and Gradient Descent

Hidden Layers: Intro to Deep Learning

Hidden Layers as a solution to XOR problem
Distributed Training with TensorFlow
Vanishing Gradient Problem and ReLU
Loss Functions

High level TensorFlow: tf.learn

Using high level TensorFlow
Developing a model with tf.learn

Convolutional Neural Networks in TensorFlow

Introducing CNNs
CNNs in TensorFlow

Introducing Keras

What is Keras?
Using Keras with a TensorFlow Backend

Recurrent Neural Networks in TensorFlow

Introducing RNNs
RNNs in TensorFlow

Long Short-Term Memory (LSTM) in TensorFlow

Introducing LSTMs
LSTMs in TensorFlow

LABS

Will Be Updated Soon!

WHO SHOULD ATTEND

Developers, analysts, or others with basic Python experience who intend to start learning about and working with machine learning algorithms, fundamentals, and core concepts using Python and Spark.

PREREQUISITES

This is an intermediate-level course geared for data scientists, data analysts, and developers new to Machine Learning, Spark, and TensorFlow. Students should have strong basic Python skills, a good foundation in mathematics (linear algebra and probability), and basic Linux skills, including familiarity with commands such as ls, cd, cp, and su. Attendees without a Python background may treat the labs as follow-along exercises or team up with others to complete them.

Alliance Micro Solutions provides certified and advanced degree computer instructors and consultants