Data Science

This course teaches the mathematical foundations of data science and develops the programming skills required to build data science applications.

Program Overview

Learning Outcomes

  • Demonstrate understanding of the mathematical foundations needed for data science.
  • Collect, explore, clean, munge and manipulate data.
  • Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks and clustering.
  • Build data science applications using Python-based toolkits.

Duration: 45 hours


Course Contents

Module 1
Introduction to Data Science (4 Hours)

Concept of Data Science, Traits of Big Data, Web Scraping, Analysis vs. Reporting

Module 2
Introduction to Programming Tools for Data Science (6 Hours)
  • Toolkits using Python: Matplotlib, NumPy, Scikit-learn, NLTK
  • Visualizing Data: Bar Charts, Line Charts, Scatterplots
  • Working with data: Reading Files, Scraping the Web, Using APIs (Example: Using the Twitter APIs), Cleaning and Munging, Manipulating Data, Rescaling, Dimensionality Reduction
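
As an illustration of the rescaling topic listed in this module, the following minimal sketch standardizes a column of numbers to mean 0 and standard deviation 1. The function name `standardize` and the sample values are hypothetical and chosen only for this example; in practice a toolkit such as NumPy or Scikit-learn would typically be used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample data, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardize(heights_cm)
print(z_scores)
```

After rescaling, features measured in different units (for example centimetres and kilograms) contribute on a comparable scale to distance-based models.
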
Module 3
Mathematical Foundations (12 Hours)
  • Linear Algebra: Vectors, Matrices
  • Statistics: Describing a Single Set of Data, Correlation, Simpson’s Paradox, Correlation and Causation
  • Probability: Dependence and Independence, Conditional Probability, Bayes’s Theorem, Random Variables, Continuous Distributions, The Normal Distribution, The Central Limit Theorem
  • Hypothesis and Inference: Statistical Hypothesis Testing, Confidence Intervals, P-hacking, Bayesian Inference
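
To give a flavour of the probability topics in this module, here is a small sketch of Bayes's theorem applied to a diagnostic test. The function name `posterior` and all of the numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are assumptions made up for this example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(condition | positive test) using Bayes's theorem.

    prior               -- P(condition)
    likelihood          -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive test, over both cases.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only.
p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(p, 3))
```

With these assumed numbers the posterior is about 0.167: even after a positive result from a fairly accurate test, the condition remains unlikely because it is rare to begin with.
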
Module 4
Machine Learning (16 Hours)

  • Overview of Machine Learning Concepts: Overfitting and Train/Test Splits
  • Types of Machine Learning: Supervised, Unsupervised, Reinforcement Learning
  • Introduction to Bayes’s Theorem
  • Linear Regression: Model Assumptions, Regularization (Lasso, Ridge, Elastic Net)
  • Classification and Regression Algorithms: Naive Bayes, k-Nearest Neighbors, Logistic Regression, Support Vector Machines (SVM), Decision Trees, Random Forests
  • Classification Errors
  • Analysis of Time Series: Linear Systems Analysis, Nonlinear Dynamics
  • Rule Induction
  • Neural Networks: Learning and Generalization
  • Overview of Deep Learning
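
Two of the ideas in this module, train/test splits and k-nearest neighbors, can be sketched in a few lines of plain Python. The helper names `train_test_split` and `knn_predict`, the toy dataset and the parameter choices are all hypothetical; course work would more likely use Scikit-learn for both steps.

```python
import math
import random
from collections import Counter


def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into train and test lists."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k closest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
data = [((x, y), "a") for x in range(0, 4) for y in range(0, 4)]
data += [((x, y), "b") for x in range(10, 14) for y in range(10, 14)]

train, test = train_test_split(data)

# Accuracy is measured only on points the model never saw during training.
correct = sum(knn_predict(train, point) == label for point, label in test)
accuracy = correct / len(test)
print(accuracy)
```

Evaluating on held-out data rather than on the training set is what exposes overfitting: a model that merely memorizes its training points would score perfectly on them without necessarily generalizing.
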

Module 5
Case Studies of Data Science Applications (6 Hours)

Weather forecasting, Stock market prediction, Object recognition, Real Time Sentiment Analysis.

List of Practicals

  1. Write a program in Python to predict the class of a flower based on its available attributes.
  2. Write a program in Python to predict whether a loan will be approved.
  3. Write a program in Python to predict the traffic on a new mode of transport.
  4. Write a program in Python to predict the class of a user.
  5. Write a program in Python to identify which tweets are hate tweets and which are not.
  6. Write a program in Python to predict the age of actors.
  7. Mini project: predict the time taken to solve a problem, given the current status of the user.