CSCI 452

Data Mining

Coordinator: Stephanie Schwartz

Credits: 4.0

Description

An introduction to data mining, including data cleaning, the application of statistical and machine learning techniques to discover patterns in data, and the analysis of the quality and meaning of results. Machine learning topics may include algorithms for discovering association rules, classification, prediction, and clustering. Lab assignments provide practice applying specific techniques and analyzing results. An independent project provides students with the opportunity to guide a project from data selection and cleaning through to presentation of results.

Prerequisites

CSCI 366 AND (MATH 235 OR MATH 333 OR MATH 335).

Course Outcomes

At the end of this course, a student will:

  1. Understand the fundamentals of data mining, including what kinds of data can be mined, what kinds of patterns can be mined, and what kinds of applications are targeted

  2. Understand and apply the underlying mathematical and statistical methods used in data mining

  3. Apply machine learning techniques and statistical techniques in data mining applications

  4. Analyze data in both an exploratory and targeted manner

  5. Evaluate the appropriateness of various algorithms and techniques for different domains and problems

  6. Evaluate results in terms of significance, reliability, and meaning

These outcomes will be achieved through lectures, the textbook, and hands-on experience, which includes writing programs in both lab and project assignments. There will also be a significant course project in which you identify an analysis topic, locate data, model the data using data mining techniques, analyze the results, and report outcomes. Achievement of these outcomes will be measured through your performance on approximately 7 lab assignments, the project, and two exams (midterm and final).

Tentative Semester Schedule

Week 1: Introductory materials on experimental design and data

Week 2: Data and Linear/Logistic Regression

Week 3: Decision Trees

Week 4: Evaluation

Week 5: Naïve Bayes

Week 6: k-Nearest Neighbors (KNN)

Week 7: Support Vector Machines (SVMs)

Week 8: Ensemble Classification Methods

Week 9: Class Imbalance

Week 10: Association Analysis

Week 11: Cluster Analysis

Week 12: Outlier Analysis

Week 13: Time Series, Graph, and Spatial Data

Week 14: Presentations