Google On Protecting Anchor Text Signal From Spam Site Influence

In a Google SEO office hours session, Google's Duy Nguyen of the search quality team answered a question about links on spam sites and the role trust plays in how those links are handled.

It was interesting that the Googler said they were protecting the anchor text signal. It's not something that's commonly discussed.

Building trust with Google is an important consideration for many publishers and SEOs.

There's an idea that trust will help get a site indexed and properly ranked.

It's also known that there is no trust metric, which sometimes confuses people in the search community.

How can an algorithm trust something if it's not measuring it?

Googlers don't really answer that question, but there are patents and research papers that give an idea.

The person who submitted a question to the SEO office hours asked:

If a domain gets penalized, does it affect the links that are outbound from it?

The Googler, Duy Nguyen, answered:

I assume by penalize you mean that the domain was demoted by our spam algorithms or manual actions.

In general, yes, we don't trust links from sites we know are spam.

This helps us maintain the quality of our anchor signals.

Googlers talk about trust, and it's clear they're talking about their algorithms trusting something or not trusting something.

In this case, it's not simply about discounting links that appear on spam sites; in particular, it's about not counting the anchor text signal from those links.

The SEO community talks about building trust, but in this case it's really about not building spam.

Not every site is penalized or receives a manual action. Some sites aren't even indexed, and that's the job of Google's Spam Brain, an AI platform that analyzes webpages at different points, beginning at crawl time.

The way the Spam Brain platform works is that it trains an AI on the knowledge Google has about spam.

Google commented on how Spam Brain works:

By combining our deep knowledge of spam with AI, last year we were able to build our very own spam-fighting AI that is incredibly effective at catching both known and new spam trends.

We don't know what knowledge of spam Google is talking about, but there are various patents and research papers about it.
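Google hasn't published what that training actually looks like. Purely as a hedged illustration of the general pattern of training a classifier on labeled spam examples, and not a description of Spam Brain's architecture, a minimal sketch in Python might look like this; the pages, labels, and model choice are all hypothetical:

```python
# A hypothetical, minimal sketch of training a spam classifier on labeled
# page text. This is NOT Spam Brain's architecture -- Google hasn't published it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data: page text labeled 1 (spam) or 0 (not spam).
pages = [
    "buy cheap pills discount casino bonus click here",
    "our research team published a new study on soil health",
    "free followers instant money guaranteed winner claim prize",
    "schedule of city council meetings for the next quarter",
]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(pages, labels)

# Score a new page: the probability it resembles the known spam examples.
print(model.predict_proba(["claim your free casino bonus now"])[0][1])
```

The real system presumably works from far richer signals than page text, but the train-on-known-spam pattern is the same one Google's statement describes.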

Those who want to take a deep dive into this topic may consider reading an article I wrote about the concept of link distance ranking algorithms, a method for ranking links.

I also published a comprehensive article about multiple research papers that describe link-related algorithms which may explain what the Penguin algorithm is.

Although many of the patents and research papers were published within the last ten or so years, there hasn't really been anything else published by search engines or university researchers since.

The importance of those patents and research papers is that it's possible they made it into Google's algorithms in a different form, such as for training an AI like Spam Brain.

The patent discussed in the link distance ranking article describes how the method assigns ranking scores to pages based on the distances between a set of trusted seed sites and the pages they link to. The seed sites are like starting points for calculating which sites are normal and which are not (i.e., spam).

The intuition is that the further a site is from a seed site, the likelier it is to be considered spammy. This part, about determining spamminess through link distance, is discussed in research papers cited in the Penguin article I referenced earlier.

The patent (Producing a Ranking for Pages Using Distances in a Web-link Graph) explains:

The system then assigns lengths to the links based on properties of the links and properties of the pages attached to the links.

The system next computes shortest distances from the set of seed pages to each page in the set of pages based on the lengths of the links between the pages.

Next, the system determines a ranking score for each page in the set of pages based on the computed shortest distances.
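To make those three steps concrete, here is a minimal sketch of the general idea in Python: compute Dijkstra-style shortest distances from a trusted seed set, then turn each distance into a score that decays as a page gets further from the seeds. The graph, link lengths, and scoring formula below are invented for illustration; the patent derives link lengths from properties of the links and the pages they connect:

```python
# A minimal sketch of seed-distance ranking: shortest distances from a
# trusted seed set, turned into scores that decay with distance.
# The graph, link lengths, and scoring formula are all hypothetical.
import heapq

def seed_distance_scores(graph, seeds):
    """graph: {page: [(neighbor, link_length), ...]} -- a weighted link graph.
    seeds: trusted starting pages, assigned distance 0."""
    dist = {page: float("inf") for page in graph}
    for seed in seeds:
        dist[seed] = 0.0
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale queue entry
        for neighbor, length in graph.get(page, []):
            new_dist = d + length
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    # Nearer to a seed => higher score; unreachable pages score 0.
    return {page: 1.0 / (1.0 + d) if d != float("inf") else 0.0
            for page, d in dist.items()}

# Hypothetical link graph: shorter lengths model more trustworthy links.
graph = {
    "seed.example": [("news.example", 1.0), ("blog.example", 2.0)],
    "news.example": [("blog.example", 1.0), ("spammy.example", 5.0)],
    "blog.example": [("spammy.example", 6.0)],
    "spammy.example": [],
}
print(seed_distance_scores(graph, {"seed.example"}))
# seed.example scores highest; spammy.example, furthest away, scores lowest.
```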

The same patent also mentions what's known as a reduced link graph.

But it's not just one patent that discusses reduced link graphs. Reduced link graphs were researched outside of Google, too.

A link graph is like a map of the Internet, created by charting the links between pages.

When the low-quality links and their associated sites are removed from a link graph, what's left is called a reduced link graph.

Here's a quote from the above-cited Google patent:

A Reduced Link-Graph

Note that the links participating in the k shortest paths from the seeds to the pages constitute a sub-graph that includes all the links that are flow ranked from the seeds.

Although this sub-graph includes much less links than the original link-graph, the k shortest paths from the seeds to each page in this sub-graph have the same lengths as the paths in the original graph.

Furthermore, the rank flow to each page can be backtracked to the nearest k seeds through the paths in this sub-graph.
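Continuing the same hypothetical example as the earlier sketch, a simplified version of that reduction for k = 1 keeps only the links that lie on a shortest path from the seeds; a link that only reaches a page by a longer route drops out of the sub-graph:

```python
# A simplified sketch of a reduced link graph for k = 1: keep only the
# links that lie on a shortest path from the seed set. `graph` and `dist`
# continue the hypothetical example from the previous sketch; `dist` holds
# the shortest distances that computation produces for this graph.
graph = {
    "seed.example": [("news.example", 1.0), ("blog.example", 2.0)],
    "news.example": [("blog.example", 1.0), ("spammy.example", 5.0)],
    "blog.example": [("spammy.example", 6.0)],
    "spammy.example": [],
}
dist = {"seed.example": 0.0, "news.example": 1.0,
        "blog.example": 2.0, "spammy.example": 6.0}

reduced = {
    page: [
        (neighbor, length)
        for neighbor, length in links
        # A link survives when the path through it achieves the
        # neighbor's shortest distance from the seeds.
        if dist[page] + length == dist[neighbor]
    ]
    for page, links in graph.items()
}
# blog.example -> spammy.example is dropped: 2.0 + 6.0 != 6.0, so that
# link is not on any shortest path, yet every shortest-path length and
# its backtrack route to the seed are preserved -- as the patent notes.
print(reduced)
```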

It's kind of an obvious thing that Google doesn't trust links from penalized websites.

But sometimes one doesn't know if a site is penalized or flagged as spam by Spam Brain.

Researching whether a site might not be trusted is a good idea before going through the effort of trying to get a link from it.

In my opinion, third-party metrics should not be used for making business decisions like this, because the calculations used to produce a score are hidden.

If a site is already linking to possibly spammy sites that themselves have inbound links from likely paid sources such as PBNs (private blog networks), then it's probably a spam site.
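As a rough, hypothetical illustration of that kind of check (not a Google method), the sketch below flags a site when too many of its outbound links resolve to domains already on a suspect list; the domain list, threshold, and function are invented for this example:

```python
# A hypothetical heuristic for vetting a prospective linking site: flag it
# when too many of its outbound links point to domains already on a suspect
# list (e.g., known PBN domains). The list, threshold, and function are
# invented for illustration -- this is not a Google method.
from urllib.parse import urlparse

SUSPECT_DOMAINS = {"pbn-one.example", "pbn-two.example"}  # hypothetical list

def looks_spammy(outbound_urls, threshold=0.3):
    """True if the share of outbound links hitting suspect domains
    meets or exceeds the threshold."""
    if not outbound_urls:
        return False
    hits = sum(
        1 for url in outbound_urls
        if urlparse(url).hostname in SUSPECT_DOMAINS
    )
    return hits / len(outbound_urls) >= threshold

print(looks_spammy([
    "https://pbn-one.example/post",
    "https://news.example/story",
    "https://pbn-two.example/deal",
]))  # True: 2 of 3 outbound links hit the suspect list
```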
