A central problem in statistical learning is that of learning a functional dependence from an input (instance) space to an output space. There has been extensive research on this topic for the past few decades. Researchers have recently focused on more complex problems involving structured output spaces such as strings, trees and graphs. Ranking is a special learning problem where the output space, i.e. the space where ranking functions map to, is a space of permutations. An interesting aspect of ranking problems is that the supervision space may be different from the output space.
In the first part of the presentation, we will describe a mathematical framework for establishing a unified theory of ranking, when there exists different types of supervision and multiple loss functions. We will discuss two problems: dependence of learnability of a class of ranking functions on
the ranking loss, and the theory of convex surrogates for non-smooth ranking losses. We will propose a new large-margin ranking algorithm that is scalable to large datasets, adapts to listwise loss func- tions, has desirable theoretical properties, and has empirical performance comparable to state-of-the- art ranking algorithms.
In the second part of the presentation, we will discuss theoretical problems related to the optimiza- tion and generalization rates of stochastic gradient descent algorithms, when applied in parallel in a distributed setting (e.g., a cluster of computers). In particular, we will show empirical results that suggest that the generalization performance of a popular stochastic gradient descent algorithm, when applied in parallel, is possibly similar to its serial version. However, the parallel implementation scales to large datasets.