Department Seminar Series: Lihong Li, "Multi-World Testing: Unbiased Offline Evaluation in Contextual Bandits"
Monday, 3 February 2014, 4:00 PM, 2336 Mason Hall

Abstract: Optimizing an interactive learning system against a predefined metric is hard, especially when the metric is computed from user actions (such as clicks and purchases). The key challenge is its counterfactual nature: in the example of Bing, any change to the search engine may produce different result pages for the same query, yet we normally cannot infer reliably from historical search logs how users would react to the new results. To compare two systems on a target metric, one typically runs an A/B test on live users, much like a randomized clinical trial. While A/B tests have been very successful, they are unfortunately expensive and time-inefficient.

Recently, offline evaluation (a.k.a. counterfactual analysis) of interactive learning systems, without the need for online A/B testing, has gained growing interest in both industry and the research community, with successes in several important applications. This approach effectively allows one to run (potentially infinitely) many A/B tests *offline* from historical logs, making it possible to estimate and optimize online metrics easily and inexpensively. In this talk, I will formulate the problem in the framework of contextual bandits, explain the basic techniques for unbiased offline evaluation as well as several improvements, and describe success stories in two important applications: personalized news recommendation and Web search. It is anticipated that this approach will find substantial use in many other learning problems, yielding greater offline experimental agility and improved online performance.

Speaker: Lihong Li, Ph.D., Researcher, Machine Learning and Intelligence Group, Microsoft Research
http://research.microsoft.com/en-us/people/lihongli/
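The abstract does not spell out the estimator, but one standard technique for unbiased offline evaluation from randomized logs is inverse propensity scoring (IPS). The sketch below is illustrative only (the function name, log format, and toy data are assumptions, not from the talk): logged records where the target policy agrees with the logged action are reweighted by the inverse of the logging policy's action probability, which makes the average an unbiased estimate of the target policy's online reward.

```python
import random

def ips_estimate(logs, target_policy):
    """IPS estimate of a target policy's expected reward from logs
    collected by a *randomized* logging policy.

    Each log entry is (context, logged_action, reward, propensity),
    where propensity is the probability that the logging policy
    chose logged_action in that context.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        if target_policy(context) == action:
            # Matched records are upweighted by 1/propensity;
            # unmatched records contribute zero.
            total += reward / propensity
    return total / len(logs)

# Toy log: the logging policy picks uniformly between 2 actions,
# so every record has propensity 0.5.
random.seed(0)
logs = []
for _ in range(100_000):
    context = random.random()
    action = random.choice([0, 1])
    # True reward: action 1 pays off when context > 0.5, else action 0.
    reward = 1.0 if action == (1 if context > 0.5 else 0) else 0.0
    logs.append((context, action, reward, 0.5))

# Evaluate the optimal policy offline; its true expected reward is 1.0.
policy = lambda ctx: 1 if ctx > 0.5 else 0
print(ips_estimate(logs, policy))  # close to 1.0
```

This is the sense in which historical logs let one run many A/B tests offline: the same logged dataset can be fed to `ips_estimate` with any number of candidate policies, at no additional cost to live users.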