Data Science and Machine Learning Using Python
Learn Data Science, Machine Learning, and Deep Learning with Python

Trainer: Experienced Data Science Consultant

Duration: 3 Months

Become a Data Scientist

One-time classroom registration fee: 1000/-


Data Science, Deep Learning, & Machine Learning with Python & R Language With Live Machine Learning & Deep Learning Projects

Project 1: Build your own image recognition model with TensorFlow
Project 2: Predict fraud with data visualization & predictive modeling
Project 3: Spam Detection
Project 4: Build your own Recommendation System
Project 5: Build your own Python predictive modeling, regression analysis & machine learning model
Getting Started
- Course Introduction
- Course Material & Lab Setup
- Installation
- Python Basics – Part 1
- Python Basics – Part 2
- Advanced Python – Part 1
- Advanced Python – Part 2

● Statistics and Probability Refresher, and Python Practice

Types of Data
Mean, Median, Mode
Using mean, median, and mode in Python
Variation and Standard Deviation
Probability Density Function; Probability Mass Function
Common Data Distributions
Percentiles and Moments
A Crash Course in matplotlib
Covariance and Correlation
Conditional Probability
Exercise Solution: Conditional Probability of Purchase by Age
Bayes’ Theorem
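As a rough illustration of the topics above (mean, median, mode, standard deviation and Bayes' theorem), here is a minimal sketch using only Python's standard `statistics` module. The sample data, the function name `bayes_posterior` and its parameters are assumptions made up for this example, not course material.

```python
import statistics

# Hypothetical sample data, invented for illustration only.
ages = [23, 25, 25, 31, 38, 44, 52]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value
spread = statistics.pstdev(ages)      # population standard deviation


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) given P(A), P(B | A) and P(B | not A)."""
    # Total probability of observing B, over both A and not A.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Example: a 1% prior, a 90% true-positive rate and a 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(mean_age, median_age, mode_age, round(spread, 2), round(posterior, 3))
```

Even with a 90% true-positive rate, the posterior in this made-up example stays low because the prior is small, which is the usual point of a Bayes' theorem exercise.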

● Predictive Models

Linear Regression
Polynomial Regression
Multivariate Regression, and Predicting Car Prices
Multi-Level Models

● Machine Learning with Python

Supervised vs. Unsupervised Learning, and Train/Test
Using Train/Test to Prevent Overfitting a Polynomial Regression
Bayesian Methods: Concepts
Implementing a Spam Classifier with Naive Bayes
K-Means Clustering
Clustering people based on income and age
Measuring Entropy
Install GraphViz
Decision Trees: Concepts
Decision Trees: Predicting Hiring Decisions
Ensemble Learning
Support Vector Machines (SVM) Overview
Using SVM to cluster people using scikit-learn

● Recommender Systems

User-Based Collaborative Filtering
Item-Based Collaborative Filtering
Finding Movie Similarities
Improving the Results of Movie Similarities
Making Movie Recommendations to People
Improve the recommender’s results

● More Data Mining and Machine Learning Techniques

K-Nearest-Neighbors: Concepts
Using KNN to predict a rating for a movie
Dimensionality Reduction; Principal Component Analysis
PCA Example with the Iris data set
Data Warehousing Overview: ETL and ELT
Reinforcement Learning

● Dealing with Real-World Data

Bias/Variance Tradeoff
K-Fold Cross-Validation to avoid overfitting
Data Cleaning and Normalization
Cleaning web log data
Normalizing numerical data
Detecting outliers

● Experimental Design

A/B Testing Concepts
T-Tests and P-Values
Hands-on With T-Tests
Determining How Long to Run an Experiment
A/B Test Gotchas

● Deep Learning and Neural Network

● Statistics and Data Science in R

● Introduction

Introduction to R
R and RStudio Installation & Lab Setup
Descriptive Statistics

● Descriptive Statistics

Mean, Median, Mode
Our first foray into R: Frequency Distributions
Draw your first plot: A Histogram
Computing Mean, Median, Mode in R
What is IQR (Inter-quartile Range)?
Box and Whisker Plots
The Standard Deviation
Computing IQR and Standard Deviation in R

● Inferential Statistics

Drawing inferences from data
Random Variables are ubiquitous
The Normal Probability Distribution
Sampling is like fishing
Sample Statistics and Sampling Distributions

● Case studies in Inferential Statistics

● Diving into R

Harnessing the power of R
Assigning Variables
Printing an output
Numbers are of type numeric
Characters and Dates
Logicals

● Vectors

Data Structures are the building blocks of R
Creating a Vector
The Mode of a Vector
Vectors are Atomic
Doing something with each element of a Vector
Aggregating Vectors
Operations between vectors of the same length
Operations between vectors of different length
Generating Sequences
Using conditions with Vectors
Find the lengths of multiple strings using Vectors
Generate a complex sequence (using recycling)
Vector Indexing (using numbers)
Vector Indexing (using conditions)
Vector Indexing (using names)

● Arrays

Creating an Array
Indexing an Array
Operations between 2 Arrays
Operations between an Array and a Vector
Outer Products

● Matrices

A Matrix is a 2-Dimensional Array
Creating a Matrix
Matrix Multiplication
Merging Matrices
Solving a set of linear equations

● Factors

What is a factor?
Find the distinct values in a dataset (using factors)
Replace the levels of a factor
Aggregate factors with table()
Aggregate factors with tapply()

● Lists and Data Frames

Introducing Lists
Introducing Data Frames
Reading Data from files
Indexing a Data Frame
Aggregating and Sorting a Data Frame
Merging Data Frames

● Regression quantifies relationships between variables

Linear Regression in Excel: Preparing the data
Linear Regression in Excel: Using LINEST()

● Linear Regression in R

Linear Regression in R: Preparing the data
Linear Regression in R: lm() and summary()
Multiple Linear Regression
Adding Categorical Variables to a linear model
Robust Regression in R: rlm()
Parsing Regression Diagnostic Plots

○ Predictive Models

Linear Regression
Polynomial Regression
Multivariate Regression, and Predicting Car Prices
Multi-Level Models

○ Machine Learning with R

Supervised vs. Unsupervised Learning, and Train/Test
Using Train/Test to Prevent Overfitting a Polynomial Regression
Bayesian Methods: Concepts
Implementing a Spam Classifier with Naive Bayes
K-Means Clustering
Clustering people based on income and age
Measuring Entropy
Install GraphViz
Decision Trees: Concepts
Decision Trees: Predicting Hiring Decisions
Ensemble Learning
Support Vector Machines (SVM) Overview
Using SVM to cluster people using scikit-learn

○ Recommender Systems

User-Based Collaborative Filtering
Item-Based Collaborative Filtering
Finding Movie Similarities
Improving the Results of Movie Similarities
Making Movie Recommendations to People
Improve the recommender’s results

○ More Data Mining and Machine Learning Techniques

K-Nearest-Neighbors: Concepts
Using KNN to predict a rating for a movie
Dimensionality Reduction; Principal Component Analysis
PCA Example with the Iris data set
Data Warehousing Overview: ETL and ELT
Reinforcement Learning

○ Dealing with Real-World Data

Bias/Variance Tradeoff
K-Fold Cross-Validation to avoid overfitting
Data Cleaning and Normalization
Cleaning web log data
Normalizing numerical data
Detecting outliers

○ Experimental Design

A/B Testing Concepts
T-Tests and P-Values
Hands-on With T-Tests
Determining How Long to Run an Experiment
A/B Test Gotchas

● Data Visualization in R

Data Visualization
The plot() function in R
Control color palettes with RColorBrewer
Drawing bar plots
Drawing a heatmap
Drawing a Scatterplot Matrix
Plot a line chart with ggplot


Machine Learning Using Python

Basic Python

1 Introduction

1.1 What is Python?

1.2 A Brief History of Python

1.3 Installing Python

1.4 How to execute a Python program

Using the Python Interpreter
- 1. Invoking the Interpreter
  - 1.1. Argument Passing
  - 1.2. Interactive Mode
- 2. The Interpreter and Its Environment
  - 2.1. Source Code Encoding

An Informal Introduction to Python
- 1. Using Python as a Calculator
  - 1.1. Numbers
  - 1.2. Strings
  - 1.3. Lists
- 2. First Steps Towards Programming

More Control Flow Tools
- 1. if Statements
- 2. for Statements
- 3. The range() Function
- 4. break and continue Statements, and else Clauses on Loops
- 5. pass Statements
- 6. Defining Functions
- 7. More on Defining Functions
  - 7.1. Default Argument Values
  - 7.2. Keyword Arguments
  - 7.3. Arbitrary Argument Lists
  - 7.4. Unpacking Argument Lists
  - 7.5. Lambda Expressions
  - 7.6. Documentation Strings
  - 7.7. Function Annotations
- 8. Intermezzo: Coding Style

Data Structures
- 1. More on Lists
  - 1.1. Using Lists as Stacks
  - 1.2. Using Lists as Queues
  - 1.3. List Comprehensions
  - 1.4. Nested List Comprehensions
- 2. The del statement
- 3. Tuples and Sequences
- 4. Sets
- 5. Dictionaries
- 6. Looping Techniques
- 7. More on Conditions
- 8. Comparing Sequences and Other Types

Modules
- 1. More on Modules
  - 1.1. Executing modules as scripts
  - 1.2. The Module Search Path
  - 1.3. “Compiled” Python files
- 2. Standard Modules
- 3. The dir() Function
- 4. Packages
  - 4.1. Importing * From a Package
  - 4.2. Intra-package References
  - 4.3. Packages in Multiple Directories

Input and Output
- 1. Fancier Output Formatting
  - 1.1. Formatted String Literals
  - 1.2. The String format() Method
  - 1.3. Manual String Formatting
  - 1.4. Old string formatting
- 2. Reading and Writing Files
  - 2.1. Methods of File Objects
  - 2.2. Saving structured data with json

Data Science with Python

Install the Anaconda Distribution for your OS from https://www.anaconda.com/distribution/ (Python 3.7 version)
Sign up for an account on https://www.hackerrank.com/
Sign up for an account on https://www.kaggle.com/
Sign up for an account on https://github.com/
Git Bash utility – https://git-scm.com/downloads

Module 1: Statistics and Probability

Descriptive Statistics:

Central tendency: Mean, Median, Mode
Sample variance
Standard deviation
Random Variables: Discrete, Continuous
Probability density functions
Binomial distribution
Expected Value, E(X)
Poisson Process
Law of large numbers
Standard normal distribution and empirical rule
Z-score

Inferential Statistics:

Central limit theorem
Sampling distribution of the sample mean
Standard error of the mean
Mean and variance of Bernoulli distribution
Margin of error 1
Margin of error 2
Confidence interval
Hypothesis testing and p-value
One-tailed and two-tailed tests
Z-statistics and T-statistics
Type 1 error
Squared error of regression line
Coefficient of determination
Chi-square distribution
Pearson’s chi-square test (goodness of fit)
Correlation and causality

Module 2: Data Analysis using Python

NumPy

NumPy Vector and Matrix
Functions – arange(), zeros(), ones(), linspace(), eye(), reshape(), random(), max(), min(), argmax(), argmin(); shape and dtype attributes
Indexing and Selection
NumPy Operations – Array with Array, Array with Scalars, Universal Array Functions
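The NumPy functions and attributes listed above can be tried out with a short sketch like the following. It assumes NumPy is installed (it ships with the Anaconda Distribution); the variable names and the seed are arbitrary choices for illustration, and the random values come from `numpy.random.default_rng`, one of several ways to draw random numbers in NumPy.

```python
import numpy as np

# Array-creation functions.
evens = np.arange(0, 10, 2)          # start, stop (exclusive), step
zeros = np.zeros(3)                  # 1-D array of three 0.0 values
ones = np.ones((2, 2))               # 2x2 matrix of 1.0 values
grid = np.linspace(0.0, 1.0, 5)      # 5 evenly spaced points, endpoints included
identity = np.eye(3)                 # 3x3 identity matrix
matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Random values in [0, 1); the seed only makes runs repeatable.
samples = np.random.default_rng(seed=0).random(4)

# Aggregates and attributes.
largest, smallest = matrix.max(), matrix.min()
largest_at, smallest_at = matrix.argmax(), matrix.argmin()  # flattened indices
shape, dtype = matrix.shape, matrix.dtype

# Indexing and selection.
last_of_second_row = matrix[1, 2]      # row 1, column 2
greater_than_two = matrix[matrix > 2]  # boolean-mask selection

# Operations: array with array, array with scalar, universal function.
doubled = matrix + matrix
scaled = matrix * 10
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))
```

Arithmetic between arrays is element-wise, so `matrix + matrix` doubles every entry rather than concatenating the arrays as it would with Python lists.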

Pandas

Pandas Series
Pandas DataFrame
Missing Data (Imputation)
Group by Operations
Merging, Joining and Concatenating DataFrames
Pandas Operations
Data input and output in a wide variety of formats such as CSV, Excel, databases and HTML

Module 3: Data Visualization using Python: Matplotlib, Seaborn, Pandas built-in, Plotly and Cufflinks

Matplotlib

plot() using Functional approach
multi-plot using subplot()
figure() using OO API Methods
add_axes(), set_xlabel(), set_ylabel(), set_title() Methods
Customization – figure size, improving dpi, plot appearance, markers, control over axis appearance and special plot types

Seaborn

Distribution Plots using distplot(), jointplot(), pairplot(), rugplot(), kdeplot()
Categorical Plots using barplot(), countplot(), boxplot(), violinplot(), stripplot(), swarmplot(), factorplot()
Matrix Plots using heatmap(), clustermap()
Grid Plots using PairGrid(), FacetGrid()
Regression Plots using lmplot()
Styles and Colors customization.

Plotly and Cufflinks

Interactive Plotting using Plotly and Cufflinks

Pandas Built-in

Histogram, Area Plot, Bar Plot, Scatter Plot, Box Plot, Hex Plot, KDE Plot, Density Plot
Choropleth Maps
Interactive World Map and US Map using Plotly and Cufflinks Module

Module 4: Git

Distributed Version Control System
How Git internally manages version control on changesets
Creating a Repository
Basic commands such as git status, git add, git rm, git branch, git checkout, git log, git cat-file, git pull, git push, git commit
Managing Configuration – System Level, User Level, Repository Level

Module 5: Jupyter Notebook

Introduction, Basic Commands, Keyboard Shortcuts and Magic Functions

Module 6: Linear Algebra and Calculus

Vector and Matrix, basic operations
Trigonometry
Derivatives

Module 7: SQL

MySQL Server and Client Installation
SQL Queries
CRUD Operations
Types of tables (fact and dimension)

Module 8: Big Data

What is big data?
What is distributed computing?
What is parallel processing?
Why do data scientists require big data?

Module 9: Machine Learning Introduction

What is Machine Learning?
Machine Learning Process Flow-Diagram
Different Categories of Machine Learning – Supervised, Unsupervised and Reinforcement
Scikit-Learn Overview
Scikit-Learn cheat-sheet

Module 10: Regression

Linear Regression
Robust Regression (RANSAC Algorithm)
Exploratory Data Analysis (EDA)
Correlation Analysis and Feature Selection
Performance Evaluation – Residual Analysis, Mean Square Error (MSE), Coefficient of Determination (R^2), Mean Absolute Error (MAE), Root Mean Square Error (RMSE)
Polynomial Regression
Regularized Regression – Ridge, Lasso and Elastic Net Regression
Bias-Variance Trade-Off
Cross Validation – Hold Out and K-Fold Cross Validation
Data Pre-Processing – Standardization, Min-Max, Normalization and Binarization
Gradient Descent

Module 11: Classification – Logistic Regression

Sigmoid function
Logistic Regression learning using Stochastic Gradient Descent (SGD)
SGDClassifier
Measuring accuracy using Cross-Validation, Stratified k-fold
Confusion Matrix – True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
Precision, Recall, F1 Score, Precision/Recall Trade-Off
Receiver Operating Characteristic (ROC) Curve
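The confusion-matrix counts and the metrics derived from them can be sketched in plain Python as follows. The labels below are invented for illustration, and the helper names `confusion_counts` and `precision_recall_f1` are hypothetical; in practice a library such as scikit-learn would usually compute these.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from the counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall, f1)
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found; raising a classifier's decision threshold typically trades one for the other.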

Module 12: Classification – k-Nearest Neighbors (KNN)

Classification and Regression
Application, Advantages and Disadvantages
Distance Metric – Euclidean, Manhattan, Chebyshev, Minkowski
Measuring accuracy using Cross-Validation, Stratified k-fold, Confusion Matrix, Precision, Recall, F1-score.
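The four distance metrics named above are closely related: Manhattan and Euclidean distance are the Minkowski distance with p = 1 and p = 2, and Chebyshev distance is its limit as p grows without bound. A minimal plain-Python sketch, with hypothetical function names and made-up points:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)


def chebyshev(a, b):
    """Largest absolute coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical points chosen so the results are easy to check by hand.
origin, point = (0, 0), (3, 4)
print(manhattan(origin, point), euclidean(origin, point), chebyshev(origin, point))
```

The choice of metric changes which neighbors count as "nearest", so it is worth treating as a hyper-parameter when tuning a KNN model.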

Module 13: Classification – SVM (Support Vector Machine)

Classification and Regression
Separating line, Margin and Support Vectors
Linear SVC Classification
Polynomial Kernel – Kernel Trick
Gaussian Radial Basis Function (rbf)
Grid Search to tune hyper-parameters.
Support Vector Regression.

Module 14: Classification – Decision Trees

CART (Classification and Regression Tree)
Advantages and Disadvantages and its applications.
Decision Tree Learning algorithms – ID3, C4.5, C5.0 and CART.
Gini Impurity, Entropy and Information Gain
Decision Tree Regression
Visualizing a Decision Tree using graphviz module.
Regularization by tuning hyper-parameters using GridSearchCV
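Gini impurity, entropy and information gain, listed above, are the quantities a decision tree uses to score candidate splits. The following is a minimal sketch in plain Python; the function names and the "hire"/"no" labels are assumptions for illustration, not the course's own code.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Fraction of samples belonging to each class."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels):
    """1 minus the sum of squared class probabilities; 0 for a pure node."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits; 0 for a pure node."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical hiring labels: a perfectly mixed node split into two pure children.
parent = ["hire"] * 4 + ["no"] * 4
left, right = ["hire"] * 4, ["no"] * 4
print(gini_impurity(parent), entropy(parent), information_gain(parent, left, right))
```

A split that separates the classes completely recovers all of the parent's entropy as information gain, which is why a tree-building algorithm would prefer it over any split that leaves the children mixed.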

Module 15: Classification – Ensemble Methods

Bootstrap Aggregating or Bagging
Random Forest algorithm
Extremely Randomized (Extra-Trees) Ensemble
Boosting – AdaBoost (Adaptive Boosting), Gradient Boosting Machine (GBM), XGBoost (Extreme Gradient Boosting)

Module 16: Unsupervised Learning – Clustering

Connectivity-based Clustering using Hierarchical Clustering
Ward’s Agglomerative Hierarchical Clustering
K-Means Clustering
Elbow Method and Silhouette Analysis

Module 17: Unsupervised Learning – Dimensionality Reduction

Linear Principal Component Analysis (PCA) reduction.
Kernel PCA
Linear Discriminant Analysis (LDA) on Supervised Data.

Module 18: Model Deployment On AWS Cloud

What is cloud computing?
What is AWS?
How to store data in AWS S3?
Create deep learning instance on EC2.
Amazon SageMaker to build, train, tune and deploy models to production

Module 19: Tableau

What is Tableau? Its applications
Installing Tableau Public
Tableau application and use
Tableau tool introduction
Tableau UI: dimensions and measures
Connecting to data
Filter and its types
Groups
Set
Hierarchy
Graphs
Table calculation
LOD Expression
Data Blending

How we are different from others: Our teachers cover each topic with real-time examples. They assign 8 real-time projects and more than 72 assignments covering almost every topic. Our trainers come from industry, with 15 years of experience in DS. They work as Data Science, Machine Learning and AI consultants, with 10+ years in real-time ML and AI implementations and migrations.

This training is completely practical: everything you learn, you will be able to code. We have students who become confident in coding within 1 week of joining the training; that is the success of our teaching method. At Yess InfoTech, we always hold prerequisite sessions, and we start from the basic installation of the IDEs and other required software. Our way of teaching gives students the confidence that they have up-skilled to a different level. Our students have also obtained great positions and salary ranges in many great organizations.

- 5 DS domain-based projects with real-time data (two projects with one trainer)
- 9 mock interviews (3 per month)
- Unlimited Assignments
- 28 Real Time Scenarios and Major topics
- Basic Python
- Machine Learning with Python
- Installation
- Data Visualization in R
- 19 Modules on Basics

60 Hours Online Sessions

12 Hours of assignments

10 hours for one project and 50 hours for 2 projects (candidates should prepare with mentor support; the 50 hours mentioned is the total time spent on the projects by each trainer)

Unlimited Interview Questions

Administration and manual installation of Python, along with other domain-based projects, will be done on a regular basis apart from our normal batch schedule.

We do take projects

- Training by a real-time trainer with 15+ years of experience
- A pool of 60+ real-time practical sessions on Data Science
- Scenarios and assignments to make sure you can compete with current industry standards
- World-class training methods
- Training until the candidate is satisfied
- Certification and placement support for 4 years, until you are certified and placed
- All training at a reasonable cost
- 10000+ satisfied candidates
- 5000+ placement records
- Corporate and online training at a reasonable cost
- Complete end-to-end project with each course
- World-class lab facility with i3/i5/i7 computers
- Wi-Fi available in the lab

- Resume and interview preparation with 100% hands-on practical sessions
- Doubt-clearing sessions any time up to 1 year after the course
- Happy to help you any time after the course as well

The trainer has 15 years of experience in Data Science, including 10 years in Machine Learning and AI. He has worked extensively for 15 years at a top-level software company. He holds several certifications in DS. He has also conducted corporate sessions and seminars both in India and abroad. Recently he was engaged by Yess InfoTech to run sessions and act as a professional motivator, helping working professionals achieve their day-to-day targets.

All trainers at our organization are currently working with these technologies at reputed organizations. The curriculum is not just theory or PPTs. All our sessions are practical, and we ask students to implement what they learn during the session itself. We provide notes for the same. We use simple, easy language so that the content is well absorbed by the candidates. The trainers always give assignments. Because the faculty are industry-experienced, we give real-time projects and practice. We also provide recorded sessions, but these are priced separately. Our training is result-oriented.
