SEO Basics: The Hilltop Algorithm

[Image: Hilltop (photo: flickr/Storm Crypt)]

When Larry Page and Sergey Brin founded the Google search engine, their core insight was implemented as the PageRank algorithm. Put simply, PageRank helps determine where a page should rank in the SERPs based on the number and PageRank of the pages linking to it. This makes it a recursive algorithm: a page's score depends on the scores of the pages that point to it. The higher the PageRank, the greater the assumed authority of a particular page. That authority can then be combined with a keyword analysis of the page, and the combination is used to rank pages in response to a search query.
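
For readers who want to see the recursion at work, here is a minimal Python sketch of the textbook power-iteration form of PageRank: PR(p) = (1 - d)/N + d * (sum of PR(q)/L(q) over every page q linking to p), where L(q) is the number of outgoing links on q, N is the number of pages and d is a damping factor. The damping value of 0.85, the fixed iteration count, the function name and the four-page toy graph are illustrative assumptions. Google's production ranking is far more elaborate and is not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share, (1 - d) / N.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: page -> pages it links to.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page c comes out on top because three pages link to it, while page d, which nothing links to, keeps only the baseline (1 - d)/N = 0.0375 share.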

Thus, PageRank is a proxy measure of authority. Google's algorithms weren't, and for the most part still aren't, capable of determining whether a page is authoritative for a particular search query by looking at its content alone. As anyone who used early search engines knows, PageRank was an enormous improvement on previous efforts, but it is far from perfect, and it opened up opportunities for gaming the system with various link-building tactics.

PageRank also has another flaw. The determination of authority and the determination of query relevance are separate processes. PageRank determines an overall measure of authority, but not a measure of authority for a particular topic. First, relevant results are found based on a page’s content, and then those are ranked according to their perceived authority. The chink in the armor here is that a page can be topically relevant for a search query — i.e. contain relevant keywords and semantically related terms — and have a high PageRank, but may not be authoritative for the topic in question. This problem is especially noticeable for broad search terms.

For example, suppose I am a connoisseur of nonsense poetry, like Lewis Carroll's Jabberwocky, and I have a site where I publish such poetry about my everyday experiences, including my work as an SEO. I'm a wonderful nonsense poet: my work is very popular and has many incoming links from poetry sites with a high PageRank. Because my poems contain lots of SEO-related keywords and have a high PageRank, they could end up ranking well when people search for SEO tips, even though they are of no use to those people.
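
The weakness of keeping relevance and authority separate can be shown with a toy two-step search: filter pages by keyword, then sort the survivors by a global PageRank. The page names, texts and scores below are hypothetical, and real engines match text far more cleverly than this simple "does the page contain every query word" test.

```python
import re

# Hypothetical toy index: url -> (page text, precomputed global PageRank).
INDEX = {
    "mypoems.example/seo-jabberwock": (
        "'Twas brillig, and the SEO tips did gyre and gimble in the backlinks", 0.09),
    "seoblog.example/link-building": (
        "Practical SEO tips: keywords, backlinks and site structure", 0.02),
    "seoforum.example/beginners": (
        "SEO tips for beginners: title tags and meta descriptions", 0.01),
}


def tokens(text):
    """Lower-case word set; a crude stand-in for real text analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))


def naive_search(query, index):
    terms = tokens(query)
    # Step 1: relevance is judged from the page text alone.
    relevant = [url for url, (text, _) in index.items() if terms <= tokens(text)]
    # Step 2: relevant pages are ordered by a global, topic-independent authority score.
    return sorted(relevant, key=lambda url: index[url][1], reverse=True)


print(naive_search("SEO tips", INDEX))
# ['mypoems.example/seo-jabberwock', 'seoblog.example/link-building', 'seoforum.example/beginners']
```

Because the authority score knows nothing about topics, the popular poem outranks two genuine SEO pages.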

The Hilltop algorithm, developed by Krishna Bharat and George A. Mihaila and acquired by Google in 2004, implements a solution to this difficulty. Hilltop makes two assumptions. First, that there are 'Expert' pages that contain many links to the best resources on a topic without having a direct affiliation with those resources. Second, that there are 'Authority' pages that many of those Expert pages link to. Once a database of Expert pages has been built, the Expert pages that match a query can be used to discover Authority pages, and it can be assumed that those Authority pages are both authoritative and relevant to the query in question.

In the case of our example, my nonsense poems might be considered Authority pages for nonsense poetry, because they are linked to by lots of Expert pages on poetry. But they will no longer rank well for SEO terms, because the Expert pages on SEO, the ones that link to lots of SEO resources, do not link to my poetry.
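
Hilltop's two-stage idea can be sketched in a few dozen lines of Python. This is a deliberately simplified illustration, not Bharat and Mihaila's actual algorithm: it treats pages on the same host as affiliated, calls a page an Expert if it links to at least three other hosts, describes each page with a hand-made set of key phrases, and ranks a target by the number of distinct Expert hosts pointing to it. All thresholds, URLs and function names are hypothetical; the original paper uses richer affiliation tests, extracts key phrases from the Expert pages themselves, and scores targets with a more detailed formula.

```python
from collections import defaultdict

MIN_OUTLINKS = 3  # hypothetical: an Expert must link to at least this many other hosts
MIN_EXPERTS = 2   # hypothetical: a target needs this many independent Experts

# Hypothetical toy web: url -> (key phrases describing the page, outgoing links).
POEMS = ["mypoems.example/seo-jabberwock", "carroll.example/jabberwocky",
         "lear.example/owl-and-pussycat"]
SEO = ["searchblog.example/link-building", "webmaster.example/faq",
       "crawlers.example/basics"]
WEB = {
    "poetry-a.example/best-nonsense": ({"nonsense", "poetry"}, POEMS),
    "poetry-b.example/favourites": ({"nonsense", "poetry"}, POEMS),
    "seo-a.example/resources": ({"seo", "tips"}, SEO),
    "seo-b.example/reading-list": ({"seo", "tips"}, SEO),
    "mypoems.example/seo-jabberwock": ({"nonsense", "poetry", "seo", "tips"}, []),
}


def host(url):
    """Treat everything before the first '/' as the host (a simplification)."""
    return url.split("/", 1)[0]


def is_expert(url):
    """An Expert links to at least MIN_OUTLINKS distinct hosts other than its own."""
    _, links = WEB[url]
    return len({host(t) for t in links if host(t) != host(url)}) >= MIN_OUTLINKS


def hilltop(query):
    terms = set(query.lower().split())
    # Step 1: select Expert pages whose key phrases cover every query term.
    experts = [url for url in WEB if is_expert(url) and terms <= WEB[url][0]]
    # Step 2: for each target, collect the unaffiliated Expert hosts that link to it.
    votes = defaultdict(set)
    for expert in experts:
        for target in WEB[expert][1]:
            if host(target) != host(expert):
                votes[target].add(host(expert))
    # Step 3: keep well-supported targets, most Expert hosts first, ties by URL.
    ranked = [t for t, hosts in votes.items() if len(hosts) >= MIN_EXPERTS]
    return sorted(ranked, key=lambda t: (-len(votes[t]), t))


print(hilltop("SEO tips"))
# ['crawlers.example/basics', 'searchblog.example/link-building', 'webmaster.example/faq']
print(hilltop("nonsense poetry"))
# ['carroll.example/jabberwocky', 'lear.example/owl-and-pussycat', 'mypoems.example/seo-jabberwock']
```

The poem page carries the words "seo" and "tips", but it is neither an Expert itself nor linked from any SEO Expert, so it only appears for the poetry query.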

Now, the Hilltop algorithm has been in use for a decade, and I can imagine some of you out there shouting 'old news!', and you'd be right. Google's algorithms have moved on in leaps and bounds in the intervening years, especially with Panda and Penguin, but understanding Hilltop is instructive because it helps explain why some link-building strategies practiced by less-than-scrupulous SEOs are worthless.

Some SEOs like to pretend that 'a link is a link is a link', and will sell packages to unsuspecting businesses based on the claim that placing 1,000 links wherever they can be put is a good SEO tactic. Some business people fall for this, because a rapidly growing number of incoming links is an impressive metric. However, as the explanation above shows, incoming links are in no sense equal, and even links from high-PageRank sites won't help a page in the SERPs if those sites are not topically relevant.
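
The same reasoning explains why a bulk link package adds little. In the sketch below, a page's support for a query is measured, Hilltop-style, as the number of distinct on-topic Expert hosts linking to it rather than as a raw link count. Whether a linking page is an Expert, and which key phrases describe it, are supplied by hand here as hypothetical inputs; a real system would have to compute them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundLink:
    source_host: str          # host of the page carrying the link
    source_topics: frozenset  # key phrases describing the linking page (hand-assigned here)
    source_is_expert: bool    # whether the linking page qualifies as an Expert (assumed input)


def raw_link_count(links):
    """The 'a link is a link' metric: every link counts once."""
    return len(links)


def expert_support(links, query):
    """Count distinct Expert hosts whose key phrases cover the query."""
    terms = frozenset(query.lower().split())
    return len({link.source_host for link in links
                if link.source_is_expert and terms <= link.source_topics})


# 1,000 links bought on unrelated, non-Expert pages (hypothetical data).
bought = [InboundLink(f"spam{i}.example", frozenset({"casino", "pills"}), False)
          for i in range(1000)]
# Two links from independent SEO resource lists (hypothetical data).
earned = [InboundLink("seo-a.example", frozenset({"seo", "tips"}), True),
          InboundLink("seo-b.example", frozenset({"seo", "tips"}), True)]

for name, links in (("bought", bought), ("earned", earned)):
    print(name, raw_link_count(links), expert_support(links, "seo tips"))
# bought 1000 0
# earned 2 2
```

On the raw count the bought package looks 500 times better; on the topical measure it contributes nothing.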