# Data Science

This course teaches the mathematical foundations of data science and develops the programming skills required to build data science applications.

## Program Overview

#### Learning Outcomes

- Demonstrate understanding of the mathematical foundations needed for data science.
- Collect, explore, clean, munge and manipulate data.
- Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks and clustering.
- Build data science applications using Python-based toolkits.

#### Course Contents

###### Module 1

Introduction to Data Science (4 Hours)

Concept of Data Science, Traits of Big Data, Web Scraping, Analysis vs Reporting

###### Module 2

Introduction to Programming Tools for Data Science (6 Hours)

- Toolkits using Python: Matplotlib, NumPy, scikit-learn, NLTK
- Visualizing Data: Bar Charts, Line Charts, Scatterplots
- Working with data: Reading Files, Scraping the Web, Using APIs (Example: Using the Twitter APIs), Cleaning and Munging, Manipulating Data, Rescaling, Dimensionality Reduction
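
As an illustration of the rescaling topic in this module, the following is a minimal sketch using only the Python standard library. The function name `rescale` and the sample heights are made up for this example; in the course itself the same step would more likely be done with NumPy or scikit-learn.

```python
from statistics import mean, stdev


def rescale(column):
    """Standardize a list of numbers to mean 0 and sample standard deviation 1."""
    mu = mean(column)
    sigma = stdev(column)
    if sigma == 0:
        # A constant column carries no spread, so map every value to 0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical heights in centimetres, invented for illustration.
heights_cm = [160.0, 170.0, 180.0]
scaled = rescale(heights_cm)
```

Rescaling of this kind keeps attributes measured in large units from dominating distance-based methods such as k-nearest neighbors.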

###### Module 3

Mathematical Foundations (12 Hours)

- Linear Algebra: Vectors, Matrices
- Statistics: Describing a Single Set of Data, Correlation, Simpson’s Paradox, Correlation and Causation
- Probability: Dependence and Independence, Conditional Probability, Bayes’s Theorem, Random Variables, Continuous Distributions, The Normal Distribution, The Central Limit Theorem
- Hypothesis and Inference: Statistical Hypothesis Testing, Confidence Intervals, p-Hacking, Bayesian Inference
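
The conditional probability and Bayes's Theorem topics above can be made concrete with a small worked calculation. This is a sketch with invented numbers (a 1% prior, a 99% true positive rate and a 1% false positive rate); the function name `posterior` is an assumption of this example.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | positive test) using Bayes's Theorem."""
    # Total probability of a positive test, over both cases.
    p_positive = (true_positive_rate * prior
                  + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / p_positive


# Hypothetical values chosen for illustration only.
p = posterior(prior=0.01, true_positive_rate=0.99, false_positive_rate=0.01)
```

With these numbers a positive test raises the probability of the condition from 1% to only about 50%, because true and false positives are equally frequent in the population.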

###### Module 4

Machine Learning (16 Hours)

- Overview of Machine Learning Concepts: Overfitting and Train/Test Splits
- Types of Machine Learning: Supervised, Unsupervised, Reinforcement Learning
- Introduction to Bayes’s Theorem
- Linear Regression: Model Assumptions, Regularization (Lasso, Ridge, Elastic Net)
- Classification and Regression Algorithms: Naïve Bayes, K-Nearest Neighbors, Logistic Regression, Support Vector Machines (SVM), Decision Trees, Random Forests
- Classification Errors
- Analysis of Time Series: Linear Systems Analysis, Nonlinear Dynamics
- Rule Induction
- Neural Networks: Learning and Generalization
- Overview of Deep Learning
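
To illustrate the train/test split mentioned in this module, here is a minimal standard-library sketch. The function name `split_rows`, the 25% test fraction and the fixed seed are assumptions of this example; scikit-learn provides an equivalent utility that the course may use instead.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Twenty placeholder rows standing in for a real dataset.
rows = list(range(20))
train_rows, test_rows = split_rows(rows)
```

A model is fitted on `train_rows` only and evaluated on `test_rows`; a large gap between the two scores is the usual sign of overfitting.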

###### Module 5

Case Studies of Data Science Applications (6 Hours)

Weather Forecasting, Stock Market Prediction, Object Recognition, Real-Time Sentiment Analysis

#### List of Practicals

- Write a program in Python to predict the class of a flower based on its available attributes.
- Write a program in Python to predict whether a loan will be approved.
- Write a program in Python to predict the traffic on a new mode of transport.
- Write a program in Python to predict the class of a user.
- Write a program in Python to identify which tweets are hate tweets and which are not.
- Write a program in Python to predict the age of actors.
- Mini project to predict the time taken to solve a problem given the current status of the user.
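
As a starting point for the first practical (predicting the class of a flower), the following is a minimal k-nearest neighbors sketch in plain Python. The measurements and labels are invented for illustration and are not a real dataset, and the helper name `knn_predict` is an assumption of this example; a full solution would load real data and would typically use scikit-learn.

```python
from collections import Counter
from math import dist


def knn_predict(k, labeled_points, query):
    """Predict a label by majority vote among the k nearest labeled points."""
    nearest = sorted(labeled_points, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (petal length, petal width) pairs in cm with class labels.
flowers = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]

prediction = knn_predict(3, flowers, (1.6, 0.25))
```

The choice of `k` and of the distance measure are both tunable; Euclidean distance with `k = 3` is used here only to keep the sketch short.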