• Phone: +91-96765 20832
• Email: contact@maatrainings.com

Spark and Scala Learning


Spark and Scala Course Content

About The Course

Maa Trainings is one of the best-quality training centres for self-paced training, serving learners worldwide. Our self-paced programs are designed for people with busy schedules. This course was prepared by Spark and Scala experts who work with these technologies on real-time projects, and it is developed so that everyone finds it easy to understand. You can go through the free demo to get a clear idea of the course. We provide high-quality recorded videos along with related course material. You get lifetime access to the course you select, and our 24x7 technical support team is available to help.

  • Course name: Spark and Scala
  • Course duration: 35-40 hrs
  • Faculty: Real-time expert
  • Category: Self-paced learning
  • Support: 24/7 technical support

Who Can Learn

  • Professionals with an analyst background
  • Software developers
  • Web developers
  • Freshers and professionals from any field
  • Graduates looking for a career in big data with Spark and Scala

With technology evolving rapidly and the constant need to update one's skills to stand out in a competitive market, Maa Trainings was founded to teach people the latest trends in technology. Our trainers give a thorough and detailed treatment of the technical courses you wish to explore. Our work does not end there: Maa Trainings gives you the opportunity to work on real-time projects guided by our real-time trainers. A technical back-end team is always available to answer your queries at any time and to help you arrange your training sessions.

Spark and Scala Course Information

At MAA Trainings, all trainers are experts who teach with a practical focus, covering topics from basic to advanced. Our real-time trainers help you reach your goals in a professionally driven environment. The Spark and Scala self-paced training includes sample live projects, course materials, explanations of real-time scenarios, and interview skills. We provide the best Spark and Scala self-paced training in Hyderabad, India.

Course content

* Functional Test Case
* What is Big Data?
* What are the challenges for processing big data?
* What technologies support big data?
* The 3 V's of Big Data (and Growing)
* What is Hadoop?
* Why Hadoop? Its Use Cases
* History of Hadoop
* Components of the Hadoop Ecosystem
* Advantages and Disadvantages of Hadoop
* Real Life Use Cases
* MapReduce limitations
* Spark History
* Spark Architecture
* Spark and Hadoop Advantages
* Benefits of Spark + Hadoop
* Introduction to Spark Eco-system

* HDFS architecture
* Features of HDFS
* Where Does It Fit and Where Doesn't It Fit?
* HDFS Daemons and Their Functionality
* NameNode and Its Functionality
* DataNode and Its Functionality
* Secondary NameNode and Its Functionality
* Data Storage in HDFS
* Introduction about Blocks
* Data replication
* Accessing HDFS
* CLI (Command Line Interface) and Admin Commands
* Java Based Approach
* Hadoop Administration
* Hadoop Configuration Files
* Configuring Hadoop Domains
* Precedence of Hadoop Configuration
* Diving into Hadoop Configuration
* Scheduler
* Rack Awareness
* Cluster Administration Utilities
* Rebalancing HDFS Data
* Copying Large Amounts of Data from HDFS
* FSImage and Edit Log Files
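The block-splitting and replication topics above come down to simple arithmetic. The following is a minimal Scala sketch, assuming the common Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); real clusters may be configured differently, and the `HdfsMath` object and its helpers are hypothetical names used only for illustration.

```scala
// Hypothetical helpers illustrating HDFS block and replication arithmetic.
// Assumes the common defaults: 128 MB blocks and a replication factor of 3.
object HdfsMath {
  val BlockSizeMb: Long = 128L
  val Replication: Int = 3

  // Number of blocks a file is split into (the last block may be partial).
  def blockCount(fileSizeMb: Long, blockSizeMb: Long = BlockSizeMb): Long =
    (fileSizeMb + blockSizeMb - 1) / blockSizeMb

  // Raw disk space consumed across the cluster. A partial last block only
  // uses as much space as the data it holds, so this is size * replication.
  def rawStorageMb(fileSizeMb: Long, replication: Int = Replication): Long =
    fileSizeMb * replication

  // Total number of block replicas stored on DataNodes for the file.
  def blockReplicas(fileSizeMb: Long): Long =
    blockCount(fileSizeMb) * Replication

  def main(args: Array[String]): Unit = {
    val fileMb = 500L
    println(s"blocks = ${blockCount(fileMb)}")           // blocks = 4 (3 full + 1 of 116 MB)
    println(s"replicas = ${blockReplicas(fileMb)}")      // replicas = 12
    println(s"raw storage MB = ${rawStorageMb(fileMb)}") // raw storage MB = 1500
  }
}
```

Note that a 500 MB file does not consume 4 x 128 x 3 MB of disk: the final 116 MB block occupies only 116 MB per replica.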

* Single-Node, Pseudo-Distributed and Multi-Node Clusters
* Hadoop Installation
* Hive Installation
* Sqoop Installation
* Spark Installation
* Cassandra Installation
* VMware Installation
* Ubuntu Installation
* Kafka Installation
* Zookeeper Installation
* MongoDB Installation
* Zeppelin Installation
* Python Installation
* Java Installation
* Scala Installation
* R Installation
* Eclipse Installation
* SBT Installation
* Maven Installation

* Scala Foundation
* Features of Scala
* Setting Up Spark and Scala on Ubuntu and Windows OS
* Installing IDEs for Scala
* Running Scala Code in the Scala Shell
* Understanding Data Types in Scala
* Implementing Lazy Values
* Control Structures
* Looping Structures
* Functions
* Procedures
* Collections
* Loop Statements
* Arrays and Array Buffers
* Maps, Tuples and Lists
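A short, self-contained Scala sketch of several foundation topics listed above: lazy values, control and looping structures, arrays, array buffers, maps, tuples and lists. The object name `ScalaBasics` and its members are illustrative only; the code uses just the Scala standard library.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative tour of Scala basics (names are hypothetical).
object ScalaBasics {
  // A lazy val is evaluated only on first access, then cached.
  var initCount = 0
  lazy val expensive: Int = { initCount += 1; 42 }

  // if/else is an expression, so it returns a value.
  def sign(n: Int): String =
    if (n > 0) "positive" else if (n < 0) "negative" else "zero"

  // A for-comprehension with a guard (a looping structure).
  def evens(upTo: Int): List[Int] =
    (for (i <- 1 to upTo if i % 2 == 0) yield i).toList

  def main(args: Array[String]): Unit = {
    val fixed = Array(3, 1, 2)                  // fixed length, mutable elements
    val buf = ArrayBuffer[Int]()                // growable buffer
    buf += 10
    buf ++= fixed
    val ages = Map("alice" -> 30, "bob" -> 25)  // immutable Map
    val pair: (String, Int) = ("spark", 3)      // a Tuple2
    val nums = List(1, 2, 3)                    // immutable List
    println(buf.mkString(","))                  // 10,3,1,2
    println(ages.getOrElse("carol", 0))         // 0
    println(pair._1 + pair._2)                  // spark3
    println(nums.map(_ * 2))                    // List(2, 4, 6)
  }
}
```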

* Implementing Classes
* Implementing Getters & Setters
* Object & Object-Private Fields
* Implementing Nested Classes
* Using Auxiliary Constructors
* Primary Constructors
* Companion Objects
* Apply Method
* Understanding Packages
* Overriding Methods
* Type Checking
* Access Modifiers
* Casting
* Abstract Classes
* Extractors
* Exception Handling
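The object-oriented topics above fit together in one small example: an abstract class, a primary and an auxiliary constructor, a custom getter/setter pair, a companion object whose `apply` acts as a factory and whose `unapply` acts as an extractor, plus exception handling. `Shape`, `Circle` and `OopDemo` are hypothetical names chosen for this sketch.

```scala
// Hypothetical classes illustrating the OOP topics above.
abstract class Shape {
  def area: Double // abstract member
}

// Primary constructor with a private backing field. The getter `radius`
// and setter `radius_=` below enable the `c.radius = r` syntax.
class Circle(private var _radius: Double) extends Shape {
  // Auxiliary constructor: must call the primary constructor first.
  def this() = this(1.0)

  def radius: Double = _radius
  def radius_=(r: Double): Unit = {
    if (r < 0) throw new IllegalArgumentException("radius must be non-negative")
    _radius = r
  }

  override def area: Double = math.Pi * _radius * _radius
}

// Companion object: `apply` is a factory, `unapply` is an extractor.
object Circle {
  def apply(r: Double): Circle = new Circle(r)
  def unapply(c: Circle): Option[Double] = Some(c.radius)
}

object OopDemo {
  // Pattern matching with the extractor (a type test is inserted for Shape).
  def describe(s: Shape): String = s match {
    case Circle(r) if r == 0.0 => "a point"
    case Circle(r)             => s"circle of radius $r"
    case _                     => "unknown shape"
  }

  // Exception handling: try/catch is an expression returning a value.
  def safeSetRadius(c: Circle, r: Double): Boolean =
    try { c.radius = r; true }
    catch { case _: IllegalArgumentException => false }
}
```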

* Understanding Functional programming in Scala
* Implementing Traits
* Layered Traits
* Rich Traits
* Call By Name Function
* Function with Named Arguments
* Functions with Variable Arguments
* Recursive Functions
* Default Parameter Values
* Nested Functions
* Anonymous Functions
* Partially Applied Function
* Higher Order Functions
* Closures and Currying
* Performing File Processing
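Several of the functional-programming topics above can be sketched in a few lines each: layered (stackable) traits, call-by-name parameters, named and default arguments, variable arguments, a nested tail-recursive function, higher-order and anonymous functions, partial application, currying and closures. All trait, class and method names here are illustrative assumptions, not part of any library.

```scala
// Illustrative names only; these sketch the topics listed above.
trait Logger { def log(msg: String): String = msg }

// Layered (stackable) traits: each override calls super, so mix-in order
// matters. The right-most trait runs first.
trait Timestamped extends Logger {
  override def log(msg: String): String = super.log("[ts] " + msg)
}
trait Upper extends Logger {
  override def log(msg: String): String = super.log(msg.toUpperCase)
}
class TimestampThenUpper extends Logger with Upper with Timestamped
class UpperThenTimestamp extends Logger with Timestamped with Upper

object FpDemo {
  // Call-by-name: `body` is re-evaluated every time it is referenced.
  def twice(body: => Int): Int = body + body

  // Default parameter value; callers may also use named arguments.
  def greet(name: String, greeting: String = "Hello"): String = s"$greeting, $name"

  // Variable arguments plus a nested, tail-recursive helper function.
  def sumAll(xs: Int*): Int = {
    @annotation.tailrec
    def loop(rest: List[Int], acc: Int): Int = rest match {
      case Nil    => acc
      case h :: t => loop(t, acc + h)
    }
    loop(xs.toList, 0)
  }

  // Higher-order function: takes another function as a parameter.
  def applyAll(xs: List[Int], f: Int => Int): List[Int] = xs.map(f)

  // Currying, then supplying only the first parameter list.
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10)

  // Partially applied function: fix `greeting`, leave `name` open.
  val sayHi: String => String = greet(_, "Hi")

  // Closure: the returned function captures `factor` from the enclosing scope.
  def multiplier(factor: Int): Int => Int = (x: Int) => x * factor
}
```

For example, `FpDemo.applyAll(List(1, 2, 3), x => x * x)` passes an anonymous function and yields `List(1, 4, 9)`.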

* What is Apache Spark
* A Unified Stack: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX, Cluster Manager
* Basic operations on Shell
* Spark Java projects
* Spark Context and Spark Properties
* Persistence in Spark
* HDFS data from Spark

* What Are Spark RDDs?
* How RDDs Make Spark a Feature-Rich Framework
* Transformations, Actions and Persistence
* Lazy operations and fault tolerance
* Load data and create RDD
* Persist RDD in memory or disk
* Pair RDD Operations and Key-Value Pairs
* Spark Hadoop Integration
* Hands-On and Core Concepts of the map() Transformation
* Hands-On and Core Concepts of the filter() Transformation
* Hands-On and Core Concepts of the flatMap() Transformation
* Comparing the map and flatMap Transformations
* Understanding RDD
* Loading data into RDD
* Scala RDD, Paired RDD, Double RDD & General RDD Functions
* Implementing HadoopRDD, Filtered RDD, Joined RDD
* Transformations, Actions and Shared Variables
* Spark Operations on YARN
* Sequence File Processing
* Partitioner and its role in Performance improvement
* Difference between Map Reduce Key-Value pair and RDD Key-Value pair
* RDD Lineage
* Garbage Collector and Memory Management
* Working with Key-Value Paired RDDs
* RDD Partitions
* Partitioning of File-based RDDs
* HDFS and Data Locality
* All Transformation and Action Methods (every RDD method is covered)
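Spark's `map`, `flatMap` and `filter` transformations behave like the Scala collection methods of the same name, and a pair-RDD word count is commonly written as `map(w => (w, 1)).reduceByKey(_ + _)`. To keep this sketch runnable without a Spark cluster, it uses plain Scala `List`s rather than RDDs. `partitionFor` is a hypothetical helper modelled on how Spark's `HashPartitioner` assigns a key to a partition: a non-negative `hashCode` modulo the number of partitions, with `null` keys going to partition 0.

```scala
// Plain-Scala analogue of common RDD operations (no Spark dependency).
object RddStyleOps {
  val lines: List[String] = List("spark and scala", "scala is fun")

  // map: exactly one output per input (here, one Array of words per line).
  val mapped: List[Array[String]] = lines.map(_.split(" "))

  // flatMap: each input may yield zero or more outputs, flattened together.
  val words: List[String] = lines.flatMap(_.split(" "))

  // filter: keep only elements matching a predicate.
  val longWords: List[String] = words.filter(_.length > 3)

  // Word count in the style of rdd.map(w => (w, 1)).reduceByKey(_ + _).
  val counts: Map[String, Int] =
    words.map(w => (w, 1)).groupBy(_._1).map { case (w, ones) => w -> ones.map(_._2).sum }

  // Hypothetical helper modelled on HashPartitioner: non-negative hash mod n.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }
}
```

Swapping the `List` for an RDD keeps the same method names for `map`, `flatMap` and `filter`; the `groupBy`-then-sum step is what `reduceByKey` does in a distributed way, combining values within each partition before shuffling.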

* File Formats
* Text Files
* JSON
* Comma-Separated Values and Tab-Separated Values
* Sequence Files
* Object Files
* Parquet Files
* Hadoop Input and Output Formats
* File Compression
* Filesystems
* Local Regular FS
* HDFS
* Structured Data with Spark SQL
* Apache Hive
* JSON
* Connectivity with Databases
* Java Database Connectivity
* Connectivity with Cassandra
* Connectivity with MongoDB

* Introduction of a Cluster Manager
* Spark Runtime Architecture
* YARN
* Mesos
* Amazon
* The Driver
* Executors
* Cluster Manager
* Launching a Program
* Deploying Applications with spark-submit
* Cluster Managers
* Standalone Cluster Manager
* Hadoop YARN
* Apache Mesos
* Amazon EC2
* Which Cluster Manager to Use?

* What is Spark SQL
* Features and Data flow
* Spark SQL architecture and components
* Hive and Spark together
* Data frames and loading data
* Hive Queries through Spark
* Various DDL and DML operations
* Caching
* Loading and Saving Data
* Apache Hive
* Parquet
* JSON
* From RDDs
* JDBC/ODBC Server
* Working with Beeline
* Long-Lived Tables and Queries
* User-Defined Functions
* Spark SQL UDFs
* Hive UDFs
* Catalyst Optimizer
* Various Execution Plans
* Joins (SQL & Core)
* DataFrames
* Datasets

* Introduction to Spark Streaming
* Need for stream analytics
* Comparison with Storm and S4
* Real time data processing using streaming
* Fault tolerance and check pointing
* Stateful Stream Processing
* DStream and window operations
* Spark Stream execution flow
* Connection to various source systems
* Performance optimizations in Spark
* Spark Streaming Overview; Example: Streaming Word Count
* Other Streaming Operations
* Sliding Window Operation
* Developing Spark Streaming Applications
* Architecture and Abstraction
* Transformations
* Stateless Transformations
* Stateful Transformations
* Output Operations
* Input Sources
* Core Sources
* Additional Sources
* Multiple Sources and Cluster Sizing
* 24/7 Operation
* Checkpointing
* Driver Fault Tolerance
* Worker Fault Tolerance
* Receiver Fault Tolerance
* Processing Guarantees
* Streaming UI
* Performance Considerations
* Batch and Window Sizes
* Word Count Socket Streaming and Twitter Example
* Level of Parallelism
* Garbage Collection and Memory Usage
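In Spark Streaming, window operations such as `reduceByKeyAndWindow` combine the data from several consecutive batches, and both the window length and the slide interval must be multiples of the batch interval. The sketch below only simulates that idea with plain Scala's `sliding` over in-memory micro-batches; it does not use the DStream API, and `WindowSim` and the sample batches are illustrative assumptions. Edge handling also differs from a live stream: `sliding` may emit a shorter final group and never emits the partially filled windows that occur at the start of a stream.

```scala
// Simulates windowed word counts over micro-batches (not the DStream API).
object WindowSim {
  type Batch = List[String]

  // windowBatches = window length / batch interval
  // slideBatches  = slide interval / batch interval
  def windowedCounts(batches: List[Batch], windowBatches: Int, slideBatches: Int): List[Map[String, Int]] =
    batches
      .sliding(windowBatches, slideBatches)
      .map { window =>
        // Merge all words in the window, then count occurrences per word.
        window.flatten.groupBy(identity).map { case (w, occ) => w -> occ.size }
      }
      .toList
}
```

With four batches, a window of 3 batches and a slide of 1 batch produce two overlapping windows, while a window and slide of 2 batches each produce two non-overlapping (tumbling) windows.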

* Kafka with Spark
* Cassandra with Spark
* Zeppelin with Spark
* Spark with Python
* Spark with Java
* Accumulators
* Accumulators and Fault Tolerance
* Custom Accumulators
* Broadcast Variables
* Optimizing Broadcasts
* Working on a Per-Partition Basis
* Piping to External Programs
* Numeric RDD Operations
* Optimizing and Performance Tuning
* Optimizing Garbage Collection
* Optimizing Level of Parallelism
* Understanding the Future of Optimization: Project Tungsten
* Putting Spark into Production
Apache Spark Developer Cheat Sheet

World-class courses from our institute. Don’t hesitate to contact us for details.