MATH 255 - Introduction to Data Analytics
Department Syllabus
Description
Introduction to data analysis techniques and programming that enable real-time decision making in IT organizations. Includes skills and applications in pre-processing, preparing, and reporting data for further analysis. (3.0 credits)
Cross-listed with INTE 255; credit may not be received for both courses.
Prerequisites
MATH 130 OR MATH 234 OR MATH 235
Prerequisite that may be taken as a corequisite: MATH 130 OR MATH 234 OR MATH 235
Course Objectives
Students will be able to
- Demonstrate proficiency in the R statistical programming language and extend the knowledge gained in the class to another preferred programming language, if needed
- Describe sources of data, their formats, and how they can be prepared for analysis
- Evaluate the quality and validity of data used to support a claim or argument
- Implement quantitative strategies to use data for forecasting and prediction
- Develop and deliver effective presentations of data-informed conclusions to a specific audience
Assessment
Assessment of student achievement of the course objectives will vary from one instructor to another. Typical assessment will be made through work in class, homework, projects, presentations, and examinations administered in a traditional face-to-face classroom environment.
Use of Technology
Students will be well served by having a statistical computing package, such as R with RStudio, installed on their own devices.
Topics
- Introduction
  - Why use R?
  - Installing and getting started with R and RStudio
  - Installing packages
- Dataset Creation
  - Understanding datasets
  - Understanding the different data structures, such as vectors, matrices, arrays, and data frames
  - Inputting and importing data from various sources
  - Annotating datasets
- Data Management
  - How to create variables
  - How to recode and rename variables
  - How to deal with missing values
  - How to convert data values
  - How to sort data
  - How to merge and subset datasets
- Advanced Data Management
  - Numerical and character manipulation functions (statistical, probability, mathematical, regular expressions, etc.)
  - Applying the functions to matrices and data frames
  - Logical expressions
  - Loops
  - User-written functions
  - Aggregation, reshaping, melting, and recasting of data
- Graphical Techniques
  - Creating graphs with base R
  - Creating graphs with ggplot2 (if time permits)
  - Manipulating parameters of graphs (text, legends, etc.)
  - Creating different types of graphs (bar graphs, histograms, box plots, etc.)
  - Nonparametric tests for group differences (if time permits)
- Statistical Methods
  - Descriptive statistics
  - Frequency and contingency tables
  - Sampling distributions
  - Confidence intervals
  - Hypothesis testing (t-tests, z-tests, and tests for proportions for one-sample and two-sample data)
- Regression
  - Ordinary least squares (simple linear regression, quadratic regression, multiple linear regression with and without interactions (if time permits))
  - Regression diagnostics and how to correct a model
  - How to identify and deal with unusual observations
  - Selecting the ‘best’ model
Recently Used Textbooks
- R in Action: Data Analysis and Graphics with R, 2nd or 3rd edition, Robert Kabacoff, Manning (2015/2022).