Advanced Big Data Analytics Expert (ABDE) - HRDF Claimable

Learn Online
5-Day / 40 Hours
International Certification
Coming Soon
9.30am – 5.30pm

Course Objective
  • To provide participants with advanced knowledge of Big Data Analytics.

  • To show how Big Data Analytics is applied in real life through real-time demonstrations in scenario-based, hands-on exercises.

Who Should Attend?

  • Data scientists, analysts, data engineers, CIOs, CTOs, software programmers, and anyone interested in acquiring advanced knowledge and skills in Big Data, Hadoop, and Python.

Programme Details

Advanced Big Data Analytics Expert (ABDE) makes extensive use of real-time case studies, hands-on exercises, and group discussions. The programme covers:

  • Python Refresher
  • Standard Toolkit for Hadoop and Analytics
  • Understanding Relational, NoSQL, and Graph Databases
  • Construction of Data Pipeline
  • Data Modeling in Hadoop
  • HDFS Schema Design
  • HBase Schema Design
  • Working on Metadata
  • Stream Processing
  • Integration of Apache Storm with HDFS and HBase
  • Trident Overview
  • Spark Streaming Overview
  • Flume Interceptors
  • Low-Latency Enrichment, Validation, Alerting, and Ingestion
  • NRT Counting, Rolling Averages, and Iterative Processing
  • Complex Data Pipelines
  • Reproducible Approach to Gathering Data
  • Understanding the Standards and Code Practices
  • Segmentation of Workflow
  • Missing Value Preprocessing with High Reproducibility
  • Use of Functions / Loops to Optimize Coding
  • Utilization of Libraries / Packages / Algorithms
  • Normalization of Data
  • What is Data Ingestion?
  • Different Ways to Perform Data Ingestion
  • Data Extraction
  • Data Processing in Hadoop
  • Overview of MapReduce
  • Working on Spark Components
  • Pig and How It Is Used
  • Overview of Hive
  • Impala Speed-Oriented Design
  • Hadoop and Spark Refresher
  • Spark SQL and Python Pandas DataFrame
  • Improving Analysis Performance with Parquet and Partitions
  • Working with Unstructured Data
  • Working on Spark DataFrames
  • Writing Output from Spark DataFrames
  • Data Manipulation with Spark DataFrames
  • Plotting Graphs in Spark
  • Reading Data from a CSV File with Python PySpark Object
  • Reading JSON Data with Python PySpark Object
  • Using Python PySpark Objects for SQL Operations
  • Generating Statistical Measurements
  • Visualisation Using Plotly
  • Orchestration Frameworks in Hadoop
  • Oozie Terminology and Workflow
  • Windowing Analysis using Spark
  • Parameterizing Workflows
  • Scheduling Patterns
  • Execution of Workflows
  • Handling of Missing Values using Spark DataFrame
  • Correlation Analysis with Python PySpark DataFrame
  • Improving Analysis Performance with Parquet and Partitions
  • Understanding Exploratory Data Analysis
  • Identify Target Variable and Related KPIs
  • Feature Importance of Target Variable
  • Different Phases of an Analytics Project Life Cycle
  • Gaussian Distribution of Numeric Features
  • Resilient Distributed Datasets with Spark
  • Introduction to Spark MLlib
  • Decision Tree with Spark MLlib
  • K-Means Clustering with Spark
  • Term Frequency – Inverse Document Frequency (TF-IDF)
  • DataFrame API with Spark MLlib
  • Understanding A/B Testing

International Certificate

Upon successful completion of the programme, participants will be awarded a verified digital certificate by Casugol.

HRDF

All our training programmes are HRDF-approved under the “SBL-Khas” scheme. The fee is paid by PSMB to REDtone on behalf of employers. No upfront payment is required from participants. For more information, please visit www.hrdf.com.my

Request Information
Speak with a REDtone Academy specialist about our training programmes.

Get In Touch

Sales Enquiry

Sales Hotline

1800 87 7770
+603 8084 8194
(if you are abroad)

Email Us

WhatsApp

Customer Care

Support Toll Free

1800 87 7790
+603 8084 8910
(if you are abroad)

Email Us

Let's Talk