This course covers the principles of data mining, exploratory analysis and visualization of complex data sets, and predictive modeling. Topics include: a) techniques and algorithms for exploratory data analysis and for discovering associations, patterns, changes, and anomalies in large data sets; and b) modern methods for multivariate analysis and statistical learning in regression, classification, and clustering. The presentation balances statistical concepts (such as model bias, overfitting, and the interpretation of results) with computational issues (including algorithmic complexity and strategies for efficient implementation). Students are exposed to algorithms, computations, and hands-on data analysis in weekly discussion sessions.
Course Requirements:
Evaluation will be based on weekly problem sets, one midterm exam, and a final project. The final project will be an individual project involving either a data analysis using the methods covered in the course, or a simulation-based or analytical investigation of the properties of one of those methods. Students will be expected to write a statement of their findings approximately 3 pages in length and to provide clean, documented versions of their computer code.
Intended Audience:
This course can be used as an elective to satisfy the requirements of the statistics concentration, the applied statistics minor, and the statistics minor.
Class Format:
3 hours of lecture and 1 hour of GSI-led discussion.