Data Mining & Information Analysis

The collection, analysis, and visualization of complex data play critical roles in research, business, and government. Powerful tools from applied statistics, mathematics, and computational science can be used to uncover the meaning behind complex data sets. The Data Mining and Information Analysis track integrates these disciplines to provide students with practical skills and a theoretical basis for approaching challenging data analysis problems. Students in this track learn how to develop and test models for making predictions, to search through large collections of data for rare and unexpected patterns, and to characterize the degree of certainty associated with discoveries made in the course of data analysis. Skills and knowledge acquired in this track are increasingly important in the job market and are highly relevant for a number of graduate school programs.

All Data Mining & Information Analysis students who declared the major in Informatics between September 2008 and December 2009 may follow the original curriculum or the curriculum outlined for students who declared after January 1, 2010. Students who declare in January 2013 or later will follow the curriculum below. Please contact the Program Coordinator with questions.

Track Courses (15-16 credits)
The following courses:

MATH 217 Linear Algebra

Topics covered include systems of linear equations; matrix algebra; vectors, vector spaces, and subspaces; geometry of R^n; linear dependence, bases, and dimension; linear transformations; eigenvalues and eigenvectors; diagonalization; and inner products. Throughout, there is an emphasis on the concepts, logic, and methods of theoretical mathematics.
Enforced prerequisite: MATH 215, 255, or 285.
4 credits. Offered F, W

STATS 406 Introduction to Statistical Computing

Selected topics in statistical computing, including basic numerical aspects, iterative statistical methods, principles of graphical analyses, simulation and Monte Carlo methods, generation of random variables, stochastic modeling, importance sampling, numerical and Monte Carlo integration.
Enforced prerequisite: STATS 401, STATS 412, STATS 425, or MATH 425.
4 credits. Offered F

STATS 415 Data Mining and Statistical Learning

This course covers the principles of data mining, exploratory analysis and visualization of complex data sets, and predictive modeling. The presentation balances statistical concepts (such as overfitting and interpreting results) and computational issues. Students are exposed to algorithms, computations, and hands-on data analysis in the weekly discussion sessions.
Advisory prerequisites: MATH 215 and MATH 217; and STATS 401, STATS 406, STATS 412 or STATS 426.
4 credits. Offered W

One of the following courses:

MATH 471 Introduction to Numerical Methods

This course is a survey of the basic numerical methods that are used to solve scientific problems. The emphasis is evenly divided between theory and applications. Some convergence theorems and error bounds are proved while applications in science and engineering are discussed. The course encourages students to use MATLAB for numerical computations but does not emphasize it. The course covers floating point arithmetic, nonlinear equations and root-finding, solving systems of linear equations, eigenvalue problems, polynomial and spline interpolation, linear regression, and optimization.
Advisory prerequisites: MATH 216, 256, 286, or 316; and 214, 217, 417, or 419; and a working knowledge of one high-level computer language. No credit granted to those who have completed or are enrolled in MATH 371 or 472.
3 credits. Offered F, W, Su

MATH 571 Numerical Methods for Scientific Computing I

Background and Goals: This course is a rigorous introduction to numerical linear algebra with applications to 2-point boundary value problems and the Laplace equation in two dimensions. Both theoretical and computational aspects of the subject are discussed. Some of the homework problems require computer programming. Students should have a strong background in linear algebra and calculus, and some programming experience. This course is a core course for the Applied and Interdisciplinary Mathematics (AIM) graduate program.
Content: The topics covered usually include direct and iterative methods for solving systems of linear equations: Gaussian elimination, Cholesky decomposition, Jacobi iteration, Gauss-Seidel iteration, the SOR method, an introduction to the multigrid method, conjugate gradient method; finite element and difference discretizations of boundary value problems for the Poisson equation in one and two dimensions; numerical methods for computing eigenvalues and eigenvectors.
Alternatives: MATH 471 (Introduction to Numerical Methods) is a survey course in numerical methods at a more elementary level.
Subsequent Courses: MATH 572 (Numerical Methods for Scientific Computing II) covers initial value problems for ordinary and partial differential equations. MATH 571 and 572 may be taken in either order. MATH 671 (Analysis of Numerical Methods I) is an advanced course in numerical analysis with varying topics chosen by the instructor.
Advisory prerequisites: MATH 214, 217, 417, 419, or 513; and one of MATH 450, 451, or 454.
3 credits. Offered F, W

MATH / STATS 425 Introduction to Probability

Basic concepts of probability; expectation, variance, covariance; distribution functions; and bivariate, marginal, and conditional distributions.
Advisory prerequisite: MATH 215
3 credits. Offered F, W, Sp, Su

STATS 500 Applied Statistics I

Linear models: definitions, fitting, identifiability, collinearity, Gauss-Markov theorem, variable selection, transformation, diagnostics, outliers and influential observations. ANOVA and ANCOVA. Common designs. Applications and real data analysis are stressed, with students using the computer to perform statistical analyses.
Advisory prerequisites: MATH 417; and STATS 250 or STATS 426.
3 credits. Offered F, W

IOE 310 Introduction to Optimization Methods

Introduction to deterministic models with emphasis on linear programming; simplex and transportation algorithms, engineering applications, relevant software, introduction to integer, network and dynamic programming, critical path methods.
Enforced prerequisites: MATH 214 or 216 or 256 or 286 or 316; and IOE 202; and ENGR 101 or 101X or 104 or 151 or EECS 100 or 183 or CMPTRSC 100 or 183 (C- or better).
4 credits. Offered F, W

IOE 510 / MATH 561 Linear Programming I

Formulation of problems from the private and public sectors using the mathematical model of linear programming. Development of the simplex algorithm; duality theory and economic interpretations. Postoptimality (sensitivity) analysis application and interpretations. Introduction to transportation and assignment problems; special purpose algorithms and advanced computational techniques. Students have opportunities to formulate and solve models developed from more complex case studies and to use various computer programs.
Advisory prerequisites: MATH 217, 417, or 419.
3 credits. Offered F, W

IOE 511 / MATH 562 Continuous Optimization Methods

Survey of continuous optimization problems. Unconstrained optimization problems: unidirectional search techniques; gradient, conjugate direction, quasi-Newton methods. Introduction to constrained optimization using techniques of unconstrained optimization through penalty transformations, augmented Lagrangians, and others. Discussion of computer programs for various algorithms.
Advisory prerequisites: MATH 217, 417, or 419
3 credits. Offered F

IOE 512 Dynamic Programming

Techniques of recursive optimization and their use in solving multistage decision problems, applications to various types of problems, including an introduction to Markov decision processes.
Advisory prerequisites: IOE 510 and 316.
3 credits. Offered F

* Courses have been historically offered as indicated (F = Fall, W = Winter, Sp = Spring, Su = Summer). Terms in which courses are offered are, however, subject to change.

Note: Students may enroll in track courses prior to completing all prerequisite and core courses.

Use this spreadsheet to calculate a concentration GPA in Informatics with a Data Mining and Information Analysis track. Use all attempts at a course in the GPA calculation.

Elective Courses (12-13 credits)

Eight [8] elective credits must be at the 300 level or higher. See the list of approved concentration electives.

In consultation with a faculty advisor, a course not on the approved list of electives may be selected to fulfill elective credit. Approval of the course must be obtained prior to enrollment. The Informatics Elective Approval Form must also be submitted to the Program Coordinator in 439 West Hall.

Informatics Elective Approval Form