CSCI 161

Large-Scale Data Analytics and Visualization

Coordinator: Jingnan Xie

Credits: 4.0

Description

A practical introduction to data analytics, visualization, and blending theory. Students will learn about and apply various clustering algorithms and techniques for dealing with noisy data, use a distributed data analytics framework, complete laboratory assignments using version control, and ensure reproducibility by making all of their work easily shareable. Students will become familiar with modern data analytics methods and explore real-world data sets. Visualization of results, using both interactive and static frameworks, will be a large component of the course. Offered periodically.

Prerequisites

CSCI 366 AND (MATH 235 OR MATH 333 OR MATH 335).

Course Outcomes

At the end of this course, a student will:

Create reproducible, explainable data science workflows
Use a modern distributed MapReduce framework, such as Apache Spark, to analyze data
Implement parallel clustering methods
Develop strategies for overcoming common imperfections in real-world datasets
Apply visualization techniques to multi-dimensional data
Apply the skills gained to extract insights from multi-dimensional, real-world datasets

These goals will be accomplished through the content of the lectures and textbook, as well as hands-on experience. This hands-on experience includes writing programs (both in the lab and in project assignments). There will also be a significant course project in which you identify an analysis topic, discover data, model the data using data mining techniques, analyze the results, and report outcomes. The achievement of the goals will be measured through your performance on approximately 7 lab assignments, the project, and two exams (midterm and final).

Tentative Semester Schedule

Week 1: Introductory materials on experimental design and data

Week 2: Data operations: filtering, transforming, reducing

Week 3: Distributed computing

Week 4: Distributed regression

Week 5: Visualization of one-dimensional data

Week 6: Visualization of two-dimensional data

Week 7: Exam

Week 8: Case Study: K-Means Clustering

Week 9: Distributed Graph Algorithms

Week 10: Case Study: PageRank

Week 11: Distributed Regression

Week 12: Distributed Machine Learning + Cross Validation

Week 13: Distributed SQL

Week 14: Presentations
