Course Catalog
Logging, Monitoring and Observability in Google Cloud
Code: Log Monitor Obser
Duration: 2 Days
$1800 USD

OVERVIEW

This two-day instructor-led course teaches participants techniques for monitoring, troubleshooting, and improving infrastructure and application performance in Google Cloud. Guided by the principles of Site Reliability Engineering (SRE), and using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.

DELIVERY FORMAT

This course is available in the following formats:

Virtual Classroom
Duration: 2 Days

Classroom
Duration: 2 Days

CLASS SCHEDULE

Delivery Format: Virtual Classroom
Date: Aug 08 2024 - Aug 09 2024 | 09:00 - 17:00 EST
Location: Online
Course Length: 2 Days
Price: $1800 USD

GOALS

This course teaches participants the following skills:

  • Plan and implement a well-architected logging and monitoring infrastructure
  • Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
  • Create effective monitoring dashboards and alerts
  • Monitor, troubleshoot, and improve Google Cloud infrastructure
  • Analyze and export Google Cloud audit logs
  • Find production code defects, identify bottlenecks, and improve performance
  • Optimize monitoring costs

OUTLINE

Module 1: Introduction to Google Cloud Monitoring Tools

  • Understand the purpose and capabilities of Google Cloud components: Logging, Monitoring, Error Reporting, and Service Monitoring
  • Understand the purpose and capabilities of Google Cloud application performance management focused components: Debugger, Trace, and Profiler

 

Module 2: Avoiding Customer Pain

  • Build a monitoring foundation on the four golden signals: latency, traffic, errors, and saturation
  • Measure customer pain with SLIs
  • Define critical performance measures
  • Create and use SLOs and SLAs
  • Achieve developer and operations harmony with error budgets
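
As a rough illustration of the error budget idea in this module, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch and not course material: the function name `error_budget`, its parameters, and the sample numbers are all assumptions made for this example, and it treats the SLI as a simple ratio of successful requests to total requests.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error budget usage for a request-based SLO.

    slo_target is the fraction of requests that must succeed (for example 0.999).
    All names here are hypothetical and chosen only for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the share of requests that may fail without breaking the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    return {
        "allowed_failures": allowed_failures,
        "failed_requests": failed_requests,
        "remaining": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
    }


# Assumed sample numbers: a 99.9% SLO over one million requests with 400 failures.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"Budget consumed: {report['budget_consumed']:.0%}")
```

With these assumed numbers about 1,000 failures are allowed, so 400 failures use roughly 40% of the budget. A team might slow releases as the consumed share approaches 100%.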

 

Module 3: Alerting Policies

  • Develop alerting strategies
  • Define alerting policies
  • Add notification channels
  • Identify types of alerts and common uses for each
  • Construct and alert on resource groups
  • Manage alerting policies programmatically

 

Module 4: Monitoring Critical Systems

  • Choose best-practice monitoring project architectures
  • Differentiate Cloud IAM roles for monitoring
  • Use the default dashboards appropriately
  • Build custom dashboards to show resource consumption and application load
  • Define uptime checks to track aliveness and latency

 

Module 5: Configuring Google Cloud Services for Observability

  • Integrate logging and monitoring agents into Compute Engine VMs and images
  • Enable and utilize Kubernetes Monitoring
  • Extend and clarify Kubernetes monitoring with Prometheus
  • Expose custom metrics through code, and with the help of OpenCensus

 

Module 6: Advanced Logging and Analysis

  • Identify and choose among resource tagging approaches
  • Define log sinks (inclusion filters) and exclusion filters
  • Create metrics based on logs
  • Define custom metrics
  • Link application errors to Logging using Error Reporting
  • Export logs to BigQuery
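
The idea behind a counter-style metric based on logs can be illustrated conceptually in plain Python. This sketch does not use the Cloud Logging API; the helper `count_by_label`, the entry fields, and the sample entries are hypothetical and exist only to show the concept of filtering entries and counting the matches per label.

```python
from collections import Counter


def count_by_label(entries, severity="ERROR", label="service"):
    """Count entries at the given severity, grouped by one label value."""
    counts = Counter()
    for entry in entries:
        # The filter plays the role of the metric's log filter.
        if entry.get("severity") == severity:
            counts[entry.get(label, "unknown")] += 1
    return counts


# Hypothetical sample entries, invented for this illustration.
sample_entries = [
    {"severity": "ERROR", "service": "checkout", "message": "payment timeout"},
    {"severity": "INFO", "service": "checkout", "message": "order placed"},
    {"severity": "ERROR", "service": "search", "message": "index unavailable"},
    {"severity": "ERROR", "service": "checkout", "message": "payment declined"},
]

error_counts = count_by_label(sample_entries)
print(dict(error_counts))
```

In a managed service the filter and the grouping label would be part of the metric definition rather than application code. The sketch only shows what such a metric counts.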

 

Module 7: Monitoring Network Security and Audit Logs

  • Collect and analyze VPC Flow logs and Firewall Rules logs
  • Enable and monitor Packet Mirroring
  • Explain the capabilities of Network Intelligence Center
  • Use Admin Activity audit logs to track changes to the configuration or metadata of resources
  • Use Data Access audit logs to track accesses or changes to user-provided resource data
  • Use System Event audit logs to track GCP administrative actions

 

Module 8: Managing Incidents

  • Define incident management roles and communication channels
  • Mitigate incident impact
  • Troubleshoot root causes
  • Resolve incidents
  • Document incidents in a post-mortem process

 

Module 9: Investigating Application Performance Issues

  • Debug production code to correct code defects
  • Trace latency through layers of service interaction to eliminate performance bottlenecks
  • Profile and identify resource-intensive functions in an application

 

Module 10: Optimizing the Costs of Monitoring

  • Analyze resource utilization cost for monitoring related components within Google Cloud
  • Implement best practices for controlling the cost of monitoring within Google Cloud

LABS

Will Be Updated Soon!

WHO SHOULD ATTEND

This class is intended for the following participants:

  • Cloud architects, administrators, and SysOps personnel
  • Cloud developers and DevOps personnel

PREREQUISITES

To get the most out of this course, participants should have:

  • Google Cloud Platform Fundamentals: Core Infrastructure or equivalent experience
  • Basic scripting or coding familiarity
  • Proficiency with command-line to