moved to presnick.wordpress.com

This blog has now moved to presnick.wordpress.com, where spam comments seem to be handled a little better.

I apologize to all of my past real commenters, whose comments are now hidden. The spam comments had overwhelmed your genuine contributions.

Personalized Filters Yes; Bubbles No

On Thursday, I gave the closing keynote for the UMAP conference (User Modeling, Adaptation, and Personalization) in Girona, Spain (Catalunya). I had planned to talk about my work directed towards creating balanced news aggregators that people will prefer to use over unbalanced ones (see project site). But then Eli Pariser’s TED Talk and book on “Filter Bubbles” started getting a lot of attention. He’s started a trend of a little parlor game where a group of friends all try the same search on Google and look in horror when they see that they get different results. So I decided to broaden my focus a little beyond news aggregators. I titled the talk, “Personalized Filters Yes; Bubbles No.”

As you can perhaps guess from my title, I agree with some of Pariser’s concerns about bubbles. But I think he’s on the wrong track in attributing those concerns to personalization. Most of his concerns, I argue, come from badly implemented personalization, not from personalization itself. I’ve posted a copy of my slides and notes. For anyone who wants a summary of my arguments, here goes.

His first concern I summarize as “Trapped in the Old You”. I argued that personalization systems that try to maximize your long-term clickthrough rates will naturally try to explore to see if you like things, not just give you more of the same. This is the whole point of the algorithmic work on multi-armed bandit models, for example. Moreover, good personalization systems will take into account that our interests and tastes may change over time, and that there is, eventually, declining marginal utility for more of the same (consider the 7,000th episode of Star Trek; even I stopped watching). Personalization systems that are designed to optimize some measure of user satisfaction (such as click-throughs or purchases or dwell time or ratings) are going to be designed to give you serendipitous experiences, introducing you to things you like that you didn’t know you would like. Even today’s systems often do that pretty well, in part because when they optimize on matching on one dimension (say, topic) they end up giving us some diversity in another dimension that matters to us (say, political ideology). From introspection, I think most people can recall times when automated personalization systems did introduce them to something new that became a favorite.
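
To make the explore/exploit point concrete, here is a minimal epsilon-greedy sketch (my own illustration, not any particular production system; the function names and the 10% exploration rate are just for exposition):

import random

def epsilon_greedy_recommend(estimated_liking, epsilon=0.1):
    # estimated_liking: item -> current estimate of how much this user
    # likes it (e.g., an estimated clickthrough rate).
    # With probability epsilon, explore: show a random item, possibly one
    # the model currently thinks the user won't like. Otherwise exploit:
    # show the current best guess. The exploration is what keeps a user
    # from being trapped in the "old you".
    if random.random() < epsilon:
        return random.choice(list(estimated_liking))
    return max(estimated_liking, key=estimated_liking.get)

def update_estimate(estimated_liking, counts, item, clicked):
    # Incrementally update the running estimate after observing whether
    # the user clicked the item that was shown.
    counts[item] = counts.get(item, 0) + 1
    estimated_liking[item] += (float(clicked) - estimated_liking[item]) / counts[item]

Fancier bandit algorithms explore more cleverly, but the structure is the same: some impressions are deliberately spent on uncertainty rather than on the current best match.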

His second concern I summarize as, “Reinforcing Your Baser Instincts”. Here, too, good personalization systems should take into account the difference between short-term and long-term preferences (entertainment vs. education, for example). We will need delayed indicators of long-term value, such as measuring which words from articles we end up using in our own messages and blog posts, or explicit user feedback after some delay (the next day or week). It may also be helpful to offer features that people can opt in to that nudge them toward their better selves (their long-term preferences). Here, I gave examples of nudges toward balanced news reading that you can see if you look at the slides.

His third concern I summarize as, “Fragmenting Society”, but there are two somewhat separable sub-elements of this. One is the need for common reference points, so that we can have something to discuss with colleagues at the water cooler or strangers on the subway. Here, I think if individuals value having these common reference points, then it will get baked into the personalized information streams they consume in a natural way. That is, they’ll click on some popular things that aren’t inherently interesting to them, and the automated personalization algorithms will infer that they have some interest in those things. Perhaps better would be for personalization algorithms to try to learn a model that assumes individual utility is a combination of personal match and wanting what everyone else is getting, with the systems learning the right mix of the two for the individual, or the individual actually getting a slider bar to control the mix.
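
Here is a minimal sketch of that mixed objective (the names and the particular linear form are my own, just for illustration); the weight w plays the role of the slider bar:

def blended_score(personal_match, popularity, w):
    # personal_match: how well the item matches this individual (0 to 1).
    # popularity: how much everyone else is seeing or liking it (0 to 1).
    # w: the slider setting; 1.0 is pure personalization, 0.0 is pure
    # "show me what everyone else is getting". A system could also learn
    # w per user from behavior instead of exposing it directly.
    return w * personal_match + (1 - w) * popularity

def rank_items(items, w):
    # items: list of (item_id, personal_match, popularity) tuples.
    return sorted(items, key=lambda t: blended_score(t[1], t[2], w), reverse=True)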

The second sub-concern is fragmenting of the global village into polarized tribes. Here it’s an open question whether personalization will lead to such polarization. It hinges on whether the network fractures into cliques with very little overlap or permeability. But the small-world properties of random graphs suggest that even a few brokers, or a little brokering by a lot of people, may be enough to keep the average shortest path short. Individual preferences would have to be strongly in favor of insularity within groups in order to get an outcome of real fragmentation. It turns out, as my former doctoral student Kelly Garrett concluded, that people’s preferences with respect to political information are for confirmatory information, with at best a mild aversion to challenging information. Moreover, some people prefer a mix of challenging and confirmatory information, and everyone wants challenging information sometimes (like when they know they’re going to have to defend their position at an upcoming family gathering). Thus, it’s not clear that personalization is going to lead us to political fragmentation, or any other kind. Other forces in society may or may not be doing that, but probably not personalization. Despite that, I do think that it’s a good idea to include perspective-taking features in our personalization interfaces, features that make it easy to see what other people are seeing. My slides include a nice example of this from the ConsiderIt work of Travis Kriplean, a PhD student at the University of Washington.
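
Back to the small-world point: here is a toy calculation (assuming the networkx library; the group sizes and the number of broker ties are arbitrary) showing how a handful of cross-group ties keeps the average shortest path short even when almost everyone stays inside a tight clique:

import random
import networkx as nx

def cliques_with_brokers(num_groups=20, group_size=25, broker_ties=30, seed=0):
    # Start from tightly knit cliques arranged in a ring (a nearly
    # "fragmented" network), then add a few random cross-group ties
    # standing in for brokers.
    rng = random.Random(seed)
    G = nx.connected_caveman_graph(num_groups, group_size)
    n = num_groups * group_size
    added = 0
    while added < broker_ties:
        u, v = rng.randrange(n), rng.randrange(n)
        if u // group_size != v // group_size:  # only count cross-group ties
            G.add_edge(u, v)
            added += 1
    return G

# Average shortest path with no brokers vs. a few dozen broker ties:
print(nx.average_shortest_path_length(cliques_with_brokers(broker_ties=0)))
print(nx.average_shortest_path_length(cliques_with_brokers(broker_ties=30)))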

The final point I’d like to bring up is that personalization broadens the set of things that are seen by *someone*. That means that more things have a chance to get spread virally, and eventually reach a broader audience than would be possible if everyone saw the same set of things. Instead of being horrified by the parlor game showing that we get different search results than our friends do, we should delight in the possibility that our friends will be able to tell us something different.

Overall, we should be pushing for better personalization, and transparent personalization, not concluding that personalization per se is a bad thing.

At the conference banquet the night before my talk, attendees from different countries were invited to find their compatriots and choose a song to sing for everyone else. (The five Americans sang, “This Land is Your Land”). Inspired by that, I decided to compose a song we could all sing together to close the talk and the conference, and which would reinforce some themes of my talk.  The conference venue was a converted church, and the acoustics were great. Many people sang along. The melody is “Twinkle, Twinkle, Little Star”.

(Update: someone captured it on video: )

The Better Personalization Anthem

User models set me free
                as you build the Daily Me

Yes exploit, but please explore
                could just be that I’ll want more

Broaden what my models know
                UMAP scholars make it so

 Words: Paul Resnick and Joe Konstan
 Melody: Wolfgang Amadeus Mozart

Yelp gets more reviews per reviewer than CitySearch or Yahoo Local

The author attributes it to the fact that reviewers are anonymous at CitySearch and Yahoo Local but build up reputations on Yelp. Of course, there are also other differences between the sites.

Zhongmin Wang (2010), "Anonymity, Social Image, and the Competition for Volunteers: A Case Study of the Online Market for Reviews," The B.E. Journal of Economic Analysis & Policy, Vol. 10, Iss. 1 (Contributions), Article 44.
Available at: http://www.bepress.com/bejeap/vol10/iss1/art44

Abstract:
This paper takes a first step toward understanding the working of the online market for reviews. Most online review firms rely on unpaid volunteers to write reviews. Can a for-profit online review firm attract productive volunteer reviewers, limit the number of ranting or raving reviewers, and marginalize fake reviewers? This paper sheds light on this issue by studying reviewer productivity and restaurant ratings at Yelp, where reviewers are encouraged to establish a social image, and two competing websites, where reviewers are completely anonymous. Using a dataset of nearly half a million reviewer accounts, we find that the number (proportion) of prolific reviewers on Yelp is an order of magnitude larger than that on either competing site, more productive reviewers on all three websites are less likely to give an extreme rating, and restaurant ratings on Yelp tend to be much less extreme than those on either competing site.

Need recommender systems contest ideas

Do you have an idea or plan for a future challenge/contest that you think could move the field of Recommender Systems forward? I’d love to hear about your idea or plan, even if only in sketch form, and even if you’re not in a position to carry it out yourself. At this year’s RecSys conference in Barcelona, I’ll be moderating a panel titled, “Contests: Way Forward or Detour?” As part of that panel, I’d like to present brief sketches of several contest ideas for the panelists to respond to.

Please send me your ideas!

----------------------
Abstract of the Session
Panelists:
Joseph A. Konstan, University of Minnesota, USA
Andreas Hotho, University of Würzburg, Germany
Jesus Pindado, Strands, Inc., USA

Contests and challenges have energized researchers and focused attention in many fields recently, including recommender systems. At the 2008 RecSys conference, winners were announced for a contest proposing new startup companies. The 2009 conference featured a panel reflecting on the then recently completed Netflix challenge.

Would additional contests help move the field of recommender systems forward? Or would they just draw attention from the most important problems to problems that are most easily formulated as contests? If contests would be useful, what should the tasks be and how should performance be evaluated? The panel will begin with short presentations by the panelists. Following that, the panelists will respond to brief sketches of possible new contests. In addition to prediction and ranking tasks, tasks might include making creative use of the outputs of a fixed recommender engine, or eliciting inputs for a recommender engine.

Gerhard Fischer paper at C&T

"Towards an Analytic Framework for Understanding and Supporting Peer-Support Communities in Using and Evolving Software Products" at C&T

Participation in SAP's online community.

Before vs. after a point system was introduced at SAP:
Mean response time decreased (51 min. vs. 34 min.)
Mean helper count increased (1.89 vs. 2.02)
Percentage of questions answered increased (12% vs. 30%)

Some evidence of gaming of the system—people just ask questions to gain points.

Karim Lakhani at C&T

Karim Lakhani is giving a great keynote at C&T about tracking innovation. He has worked with MATLAB programming contests that have a fascinating format. There's a clear performance outcome; the source code of all entries is available to other people; leaders are tracked. Researchers can track which lines of code get reused.

What leads to displacing the currently leading entry?
--novel code
--novel combos of others' code
--NOT borrowed code
--complexity
--NOT conformance

What leads code to get reused in future (leading) entries?
--novel code
--novel combos of others' code
--borrowed code
--complexity
--conformance

Also did experiments with TopCoder.
One experiment with computational biology contest problem.
Three conditions (random assignment?):
Fully collaborative vs. fully competitive vs. mixed (competitive first week, then all code shared)
Fully collaborative got the best performance
Best performing entries did better than state-of-the-art in computational biology

Universally Utility-Maximizing Privacy Mechanisms

Interesting paper presentation by Tim Roughgarden.

He gave a nice introduction to the recent literature on provably privacy-preserving mechanisms for publishing statistical summaries such as counts of rows from databases satisfying some property (e.g., income > 100,000). Suppose a mechanism computes the actual count, and then reports something possibly different (e.g., by adding noise). A mechanism is defined to be p-private if, for every possible output (count) and for any person (row), the ratio of the probability of that output with the row present to the probability of that output with the row omitted is always in the range [p, 1/p]. Intuitively, whatever the actual count, there's not much revealed about whether any particular person has high income.
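
In symbols (my notation, not necessarily the paper's): a mechanism M is p-private if for every possible output o and every pair of databases D and D' that differ in a single row,

\[ p \;\le\; \frac{\Pr[M(D) = o]}{\Pr[M(D') = o]} \;\le\; \frac{1}{p} \]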

One technique that works for counts, Laplace-p, is to add noise z to the correct count, where the probability density of z is (1/2)(-ln p) e^(|z| ln p) = (1/2)(-ln p) p^|z|, i.e., a Laplace distribution with scale -1/ln p. For any reported count, there's some confidence interval around it, and the size of that confidence interval is independent of the count. Thus, for reported count 1, you can't really tell whether the correct count is 1 or 0, and thus you can't really tell whether a particular person has high income, *even if you have great side information about everyone else in the database*. On the other hand, if the reported count is 27,000, you still can't tell much about any one person, but you can be pretty sure that the correct count is somewhere around 27,000.
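
A minimal sketch of that kind of noise addition (my own illustration of the standard Laplace mechanism, not code from the paper):

import math
import random

def laplace_p_count(true_count, p):
    # Report the true count plus Laplace noise with scale -1/ln(p).
    # Adding or removing any one row changes the count by at most 1, so
    # the probability of any particular report changes by at most a
    # factor of 1/p, which is exactly the [p, 1/p] ratio condition.
    scale = -1.0 / math.log(p)
    # The difference of two independent exponentials with mean `scale`
    # has a Laplace(0, scale) distribution.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

With p = 0.5, the noise scale is about 1.44, so a reported count of 1 really could have come from a true count of 0, while a reported 27,000 almost certainly came from a true count near 27,000.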

Roughgarden's paper is about how much value you can get from the best count function (in terms of some loss function comparing true result to reported result) while still preserving the p-privacy requirement. It turns out that a mechanism very much like Laplace-p, but discretized, works to minimize the expected loss no matter what priors the user of the count has about the distribution of correct counts. It is in this sense that the mechanism is universal. This requires a little user-specific post-processing of the algorithm's output, based on the user's priors about the correct counts. For example, if the reported count is -1, we know that's not the correct count; it must really be 0 or something positive, and you can back out from the report and the user's prior beliefs to infer a belief distribution over correct counts.
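
A sketch of what I understand that user-side post-processing to look like (my simplification, using a discretized p^|z| noise model and made-up names): combine the reported count with the user's prior by Bayes' rule.

def posterior_over_counts(reported, prior, p):
    # prior: dict mapping candidate true counts (non-negative integers)
    # to the user's prior probabilities. Assumes the mechanism reports r
    # from true count c with probability proportional to p**|r - c|.
    # Returns the posterior distribution over true counts. For example,
    # a report of -1 concentrates the posterior on true counts 0 and 1.
    unnormalized = {c: pr * (p ** abs(reported - c)) for c, pr in prior.items()}
    total = sum(unnormalized.values())
    return {c: weight / total for c, weight in unnormalized.items()}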

Babaioff; Characterizing Truthful Multi-Armed Bandit Mechanisms

At the Economics of Search conference.

Moshe Babaioff presented an interesting paper.

Suppose that you're conducting an auction for adwords, where you want to rank the bidders based on expected revenue in order to allocate slots and determine prices for slots based on bids. But suppose you don't know what the clickthrough rate will be for the items.

In a multi-armed bandit model, there are multiple bandit slot machines and you have to decide which arms to pull. There is an explore/exploit tradeoff-- you need to explore (experiment) to estimate the clickthrough rates, including some experimentation with those you have low estimates for, in case that estimate is wrong. But over time you switch to more exploitation, where you pull the arm with the highest expected value.

The new twist in this paper is that you want advertisers to truthfully reveal their valuation for a click. If clickthrough rates are known, you can set prices essentially using a second-price mechanism based on bid*clickthrough. But if you're using a multi-armed bandit algorithm to determine clickthrough rates, the correct prices would depend on estimated clickthrough rates that you don't necessarily see because you don't test them.
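
For the known-clickthrough-rate case, here is a sketch of the standard rank-by-bid*clickthrough, generalized-second-price style rule (a simplification that ignores slot-specific position effects; the names are my own):

def rank_and_price(bids, ctrs, num_slots):
    # bids: advertiser -> bid per click; ctrs: advertiser -> (known)
    # clickthrough rate. Rank by expected revenue bid*ctr and charge each
    # winner, per click, the minimum bid that would have kept its
    # position: the next-ranked advertiser's bid*ctr divided by the
    # winner's own ctr.
    ranked = sorted(bids, key=lambda a: bids[a] * ctrs[a], reverse=True)
    allocation = []
    for i, winner in enumerate(ranked[:num_slots]):
        if i + 1 < len(ranked):
            runner_up = ranked[i + 1]
            price = bids[runner_up] * ctrs[runner_up] / ctrs[winner]
        else:
            price = 0.0  # no competitor below; a real auction would use a reserve price
        allocation.append((winner, price))
    return allocation

The paper's problem is precisely that the ctrs here are not known up front and have to be learned while the auction is running.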

It's a theory paper. They prove that, with the requirement that the mechanism induce truthful bidding, there's always a pure exploration phase, where the selection of the winners can depend on previous clickthroughs but *not* on the bids; and then a pure exploitation phase, where the clickthroughs no longer affect the allocation of slots in the next round. The best multi-armed bandit algorithms without the truth-telling requirement don't have that separation of phases. And, it turns out that the best algorithms without the truth-telling requirement have less "regret" relative to the best you could do if you magically knew the clickthrough rates at the beginning.

So now I'm curious what the best algorithms are without the truth-telling requirement. My guess is that they put more exploration into things that the best estimate so far has higher value for. We actually need to use an algorithm like this for the next version of our "conversation double pivots" work on drupal.org, where we're going to dynamically change the set of recommended items based on a combination of a prior generated from recommender algorithms and actual observed clicks. But we don't have any truthful revelation requirement, so we should be able to use the standard algorithms.
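
Without the truthfulness constraint, something like standard Thompson sampling seems to fit our drupal.org use case. Here is a sketch (hypothetical names, not our actual code) that seeds Beta priors from the recommender's predicted click rates and then updates them with observed clicks:

import random

def init_from_recommender(predicted_ctr, prior_strength=10.0):
    # Turn each item's predicted clickthrough rate into Beta pseudo-counts,
    # so the recommender's prediction acts as a prior that real clicks can
    # gradually override. prior_strength controls how many observed clicks
    # it takes to overturn the prior.
    return {item: [max(p, 0.01) * prior_strength, max(1 - p, 0.01) * prior_strength]
            for item, p in predicted_ctr.items()}

def thompson_pick(stats):
    # stats: item -> [alpha, beta] pseudo-counts. Sample a plausible
    # clickthrough rate for each item and show the item with the highest
    # sample; uncertain items get explored automatically.
    samples = {item: random.betavariate(a, b) for item, (a, b) in stats.items()}
    return max(samples, key=samples.get)

def record_outcome(stats, item, clicked):
    # After showing `item`, add one pseudo-count of success or failure.
    stats[item][0 if clicked else 1] += 1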

Rewards program for social networking activity

As part of the CommunityLab project, for the past five years I've been doing research related to incentives for participation in online communities. Now one of my colleagues, Yan Chen, is working with a startup company, urTurn, that has created a cross-platform rewards program. That is, you accumulate points for posting photos or making friend links in social network sites like Facebook and MySpace. Then you turn in the points for prizes.

I'm not quite sure what their business model will be (what do they get from having people accumulate points on their site?). But it will be interesting to see how motivating the points are for people, and how the company will prevent various attempts to game the system.

So, sign up, help Yan with her research (she has no financial stake in the company), and win valuable prizes!

How newsgroups refer to NetScan data

Reflections and Reactions to Social Accounting Meta-Data. Eric Gleave (U of Washington) and Marc Smith (Microsoft Research). At C&T.

In 18 months, there were about 5,000 messages that explicitly referred to "netscan.research". The authors analyzed and coded 952 of those messages.

Basic findings:

  • Half discuss groups. Of those, 80% link to the Netscan report card for the group, and 17% explicitly discuss the group's "health".

  • 22% discuss the message's author, such as saying that the author is #1 in the group.

  • 31% discuss others, including their stats; 5% of these are "troll checks"

  • 48% discuss the Netscan system itself



Some discussion points:

  • Helpful for comparisons between competing groups on similar topics

  • Reduces costs of monitoring and sanctioning

  • Facilitates construction and maintenance of status

  • Identifies people who are trolls