HackerNews Ranking Algorithm
Here's a small thought experiment that explores an alternative approach to ranking posts on HackerNews.
One of my favorite sites to visit daily is HackerNews (HN). It has never failed to deliver good-quality links and very interesting discussions.
To be clear, there's nothing wrong with the current ranking algorithm. This is only a thought experiment about what a different one could look like.
Just to refresh, according to Paul Graham's comment and the FAQ, the HN algorithm is:

$$\text{Score} = \frac{P - 1}{(T + 2)^{G}}$$
P = Points
T = Age in hours
G = 1.8
The minus one removes the submitter's own upvote: by default, HN upvotes a submission on behalf of the user who posted it.
I’ve seen a version with penalties, but let’s stick to what’s official. Also, a quick Grep search turned up this article and a wonderful repo about “Reverse Engineering the Hacker News Ranking Algorithm”.
Intuitively, the ranking algorithm is simple: the more upvotes a link receives in a short amount of time, the higher it ranks. As time passes, the post gradually moves down the rankings.
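To make that intuition concrete, here is a minimal Python sketch. It assumes the commonly cited form of the formula, (P - 1) / (T + 2)^G, where the +2 offset is an assumption taken from that commonly cited version. The function name `hn_score` and the sample posts are made up for illustration.

```python
def hn_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Score = (P - 1) / (T + 2) ** G (assumed form; the +2 offset is an assumption)."""
    return (points - 1) / (age_hours + 2) ** gravity


# Hypothetical posts: (title, points, age in hours)
posts = [
    ("old but popular", 300, 24.0),
    ("fresh and rising", 40, 1.0),
    ("brand new", 2, 0.0),
]

# Highest score first
ranked = sorted(posts, key=lambda p: hn_score(p[1], p[2]), reverse=True)
for title, points, age in ranked:
    print(f"{title}: {hn_score(points, age):.3f}")
```

With these made-up numbers, the one-hour-old post with 40 points outranks the day-old post with 300 points, which is the time decay at work.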
Here is my approach
I can't find the official page on why Paul Graham created HN, but to me, HN is the place to find the most interesting links and have a healthy discussion about various topics. In fact, discussions are my favorite part, except for the useless comments.
Given that discussions are my favorite part, why don't we apply PageRank for every user based on the upvotes they receive for the comments they leave on any post, and replace the P (points) value in the current version of the HN algorithm?
PageRank is an algorithm created by the founders of Google. It estimates how important a web page is from the pages that link to it.
By the way, the name PageRank does not suit this use case, so I am going to call it HackerRank (HR). Here is a visualization if you are trying to picture it.
Since one user may well upvote several comments by the same author, we check whether the voter has already upvoted that author before counting the upvote, so repeated upvotes between the same pair of users count only once. In other words, we treat user profiles as nodes and comment upvotes as edges.
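Considering this, the HackerRank of a user profile u takes the standard PageRank form:

$$HR(u) = \frac{1 - D}{N} + D \sum_{v \in V(u)} \frac{HR(v)}{TU(v)}$$

Here V(u) is the set of users who have upvoted u's comments, and: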
N = Total number of user profiles on HN
D = Damping factor
HR = HackerRank score for the user who upvoted
TU = Total upvotes given
By default, the HR for every profile will be 1/N.
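The definitions above can be turned into a small power-iteration sketch. It is only an illustration and rests on several assumptions: the standard PageRank update, a damping factor of 0.85 (the post does not fix a value for D), and spreading the score of users who have upvoted nobody evenly across all profiles. The function `hacker_rank`, the user names, and the upvote list are all hypothetical.

```python
def hacker_rank(users, upvotes, damping=0.85, iterations=50):
    """Sketch of HackerRank: PageRank over a user-to-user comment-upvote graph.

    users:   list of user names (the nodes)
    upvotes: iterable of (voter, comment_author) pairs, one per comment upvote
    """
    # voter -> set of authors they upvoted; a set collapses repeated
    # upvotes between the same pair of users into a single edge.
    out = {u: set() for u in users}
    for voter, author in upvotes:
        if voter != author:
            out[voter].add(author)

    n = len(users)
    hr = {u: 1.0 / n for u in users}  # default HR is 1/N

    for _ in range(iterations):
        # Assumption: users who upvoted nobody share their score with everyone.
        dangling = sum(hr[u] for u in users if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for voter, authors in out.items():
            for author in authors:
                # Each voter splits their HR evenly over the authors they upvoted (TU).
                new[author] += damping * hr[voter] / len(authors)
        hr = new
    return hr


users = ["alice", "bob", "carol"]
upvotes = [("alice", "bob"), ("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
scores = hacker_rank(users, upvotes)
```

In this toy graph bob ends up with the highest score, since both other users upvoted him, and alice's second upvote for bob changes nothing.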
But I won't consider HackerRank as it is. Sometimes, HN comments are inappropriate, and they will be flagged. We should consider the flagging because, remember, HN should be a place to have healthy discussions.
Let's consider that 1 flag equals a deduction of 20% from the HR score. However, we will only take into account the flags received in the current month, as people can change from being unpleasant to becoming better human beings.
So, HR with flags taken into account will look like this:

$$HR_{\text{flag}} = HR - (HR \times FP \times TF)$$
TF = Total flags received in the current month
FP = Flag penalty which is 20%
If the resulting HR is negative, it is set to 0.
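As a rough sketch of the flag rule, the snippet below assumes the deduction is linear: each flag received this month removes 20% of the unpenalized HR, and the result is clamped at 0. A compounding penalty would be another valid reading. The function name `apply_flag_penalty` and the sample values are hypothetical.

```python
def apply_flag_penalty(hr: float, flags_this_month: int, flag_penalty: float = 0.20) -> float:
    """Deduct flag_penalty * HR per flag (assumed linear), never going below 0."""
    penalized = hr * (1 - flag_penalty * flags_this_month)
    return max(0.0, penalized)


for flags in (0, 2, 6):
    print(flags, apply_flag_penalty(0.5, flags))
```

Under this reading, five flags in a month already bring a profile's HR down to 0, and any further flags keep it there.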
We now have an HR for every user profile. Hooking it into the current version of the HN algorithm in place of P, the final ranking algorithm looks like this:

$$\text{Score} = \frac{\sum_{u \in \text{upvoters}} HR_u}{(T + 2)^{G}}$$
HR = HackerRank score of a profile that upvoted this post
T = Age in hours
G = 1.8
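Here is a minimal sketch of the combined ranking. It assumes the post's score is the sum of the HR of every profile that upvoted it, divided by the same (T + 2)^G decay as the commonly cited HN formula. Both the summation and the +2 offset are assumptions, and `hackerrank_score` and the sample HR values are hypothetical.

```python
def hackerrank_score(upvoter_hrs, age_hours: float, gravity: float = 1.8) -> float:
    """Sum of the upvoters' HR, decayed by (T + 2) ** G (assumed form)."""
    return sum(upvoter_hrs) / (age_hours + 2) ** gravity


# Two hypothetical posts of the same age (2 hours):
# three upvotes from well-regarded commenters vs. ten from low-HR accounts.
few_strong = hackerrank_score([0.05, 0.04, 0.03], age_hours=2.0)
many_weak = hackerrank_score([0.001] * 10, age_hours=2.0)
print(few_strong > many_weak)
```

The design choice this illustrates is that a vote's weight now depends on who cast it: a handful of upvotes from high-HR profiles can outweigh many upvotes from low-HR ones.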
This is a very simple approach, but here are some other ideas that are worth exploring.
A HackerRank score that also takes into account the upvotes a user receives for their submissions.
Reading time for the article.
Track how well a website is performing on HN and put it on the front page if the website has a high reputation for performing well.
Would I ever publicly write about how HN ranks posts if I were Dang (the HN moderator)? No, because PageRank can be manipulated despite its reputation. In fact, PageRank has been exploited for years. Moreover, companies have a financial incentive to get onto the front page of HN.
I would use HackerRank for ranking posts but publicly say that we are using Paul Graham’s original algorithm. I would also hide the upvote counts on comments, since they power HackerRank, and take some additional steps to prevent reverse engineering and rank manipulation.
But I am curious: how would you have done it? If you were designing the HN algorithm, what would it look like? Please leave your thoughts in the comments.
Discussion on HN → https://news.ycombinator.com/item?id=35510413
Plug: Hey, we are building a new kind of search engine. Our goal is to deliver authoritative and non-SEO-spammed results. Please check it out and let me know your feedback.
Great write-up, but Substack is messing up the formulas displayed after clicks. It really messed up the equations, as it appears that anything after "\" is escaped from LaTeX rendering.
The beauty of the current algorithm is that it is simple and easy to compute, but most importantly, anyone can submit news and get a fair chance at it. Whether you are a new user or an old user, news is news, and news trends based on votes. But your suggested algorithm (which requires 100 CPUs and 128 GB of RAM, btw, but I digress) allows power to concentrate in very few hands, irrespective of the news value of what they submit. It very much incentivizes creating bots and voting rings and spamming the site to harvest as many votes as possible, like Stack Overflow.