Simple popularity algorithm with decay
June 2, 2012 1 Comment
Most social web sites have user created objects such as pictures, posts, links, etc.. that need to be shown in some dynamic popularity or ranking order on website. The measurement of popularity is typically some user activity such as votes, downvotes, number of clicks or shares per object.
It would be trivial just to calculate total sum of the actions, and sort objects by the sum but this doesn’t take into account the bandwagon effect where people start voting the already top items so the top list freezes quickly and no new items have change to bubble on top. Another problem of this approach is that once object reaches high score it stays high even when no actions would anymore happen. Good popularity measurement needs to decay somehow.
First page of top list (typically 10-20 items) should always show fresh “hot” items together with long time favorites. We want that older objects fade and need more and more actions to stay on the top list as time passes by.
Simple way to do this is with relative scoring, where base score grows constantly so that score for younger object is always naturally higher than for old objects with same number of actions. Object creation timestamp can be used trivially as base score. For example:
object_score = created_timestamp + number_of_actions
created_timestamp is object creation time as number of hours or days from some arbitrary fixed reference time . The popularity list is simply all objects sorted by the score. New objects have naturally higher score and old ones need more and more actions to stay on top.
Example in Javascript
Fixed reference time 1st Jan 2000. Timestamp as hours
Object A created on midnight 1st May 2012, 30 votes
(new Date('2012-5-1') - new Date('2000-1-1'))/1000/60/60 + 30 = 108126
Object B created on midnight 2nd May 2012 2 votes
(new Date('2012-5-2') - new Date('2000-1-1'))/1000/60/60 + 2 = 108122
So, on 2nd May the object A score would still be ahead B but the B needs only 5 more votes to pass A. The A will fall behind quickly unless it gets constantly more votes. In fact with this trivial formula object has to get 24 actions per day just to keep up with the increasing base score.
Major benefit of growing base score is that its one pass, i.e. app updates score for every new action and thats it. There is no need to do computation on query and no periodic maintenance is required. It’s also compatible with Map-Reduce based views (e.g. CouchDB) as its not dependent of query time.
I assume many high traffic web sites (Digg for sure) use some derivative of this basic method, but each site needs to tune it based on their actions, traffic and user behavior.
See also Simple Cooldown and Rate Control Algorithm for another perspective to time based scoring.
Pingback: Simple Cooldown and Rate Control Algorithm | Brave New Method