Search engines like Google use entities to identify subjects a writer discusses in an article. Named entities refer to specific people, places, and businesses. For example, Revolut and Monzo are named entities because they are the names of specific challenger banks. But the terms challenger bank and neobank are not named entities because these terms can refer to a wide variety of mobile banking apps.
The use of named entities also shows that a writer understands the subject of the article. For example, a low-end SEO agency might hire a generalist writer to write about neobanks and challenger banks. But if the writer doesn’t know what these financial institutions are, or what they do, the writer might just repeat the keyphrases neobank and challenger bank over and over again, along with similar keyphrases. If the writer mentions specific neobanks like Varo and Chime instead, that provides evidence that the writer is actively following the industry.
The concept of thin content might help here. When I was writing for Demand Studios 10 years ago, an editor would sometimes send an article back for a rewrite because it didn’t contain enough useful information. The editor would say that the article was thin content. It’s hard to define the concept of thin content, but I have noticed that lower-quality articles often don’t mention as many named entities. Another way to think about it is that the article contains a lot of fluff instead of usable information, so it has less information density. Google doesn’t want to show readers thin content or fluff in the search results either, so it’s been developing ways to rank articles using metrics other than keywords and backlinks.
Keywords and backlinks are still important, of course, but Google has also been using entities to analyze articles. To do this, Google compares the entities mentioned in an article to the entities stored in a database. I examined the Python code for entity analysis in Google’s Cloud Natural Language API and learned that the search engine uses Wikipedia and the Google Knowledge Graph as databases for relevant entities. Entities that appear in those databases are more likely to be notable. That factor may not be fair to small startups, but Wikipedia has similar rules about notability for company pages.
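For readers who want to see what an entity analysis request looks like, here is a minimal sketch. It assumes the google-cloud-language package is installed and Google Cloud credentials are configured. The function names analyze_entities and has_database_link are my own, and the "wikipedia_url" and "mid" keys are the metadata fields the API returns for entities it can link to Wikipedia or the Knowledge Graph. This shows the public API only, not how Google Search ranks pages.

```python
def analyze_entities(text):
    """Send plain text to the Cloud Natural Language API and return its entities.

    Needs the third-party google-cloud-language package and valid credentials,
    so the import happens only when this function is called.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_entities(request={"document": document})
    # Copy the fields of interest into plain dictionaries.
    return [
        {
            "name": entity.name,
            "salience": entity.salience,
            "metadata": dict(entity.metadata),
        }
        for entity in response.entities
    ]


def has_database_link(entity):
    """True if the entity carries a Wikipedia URL or a Knowledge Graph ID."""
    metadata = entity["metadata"]
    return "wikipedia_url" in metadata or "mid" in metadata
```

Filtering the results of analyze_entities with has_database_link would leave only the entities that the API could match to one of those two databases.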
So I decided to write my own Python script to identify named entities. Instead of using the API, I used the Python library TextBlob to detect proper nouns. Proper nouns are one type of named entity, although phone numbers and addresses are also considered named entities. The script counts the number of proper nouns in an article and then divides this count by the word count of the article, creating a metric called entity density.
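The calculation can be sketched in a few lines. This is an illustration and not the script from my GitLab profile. It assumes TextBlob and its NLTK corpora are installed, and it treats the number of tagged words as the word count. The names tag_words and entity_density are made up for this example.

```python
# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def tag_words(text):
    """Return (word, part-of-speech tag) pairs from TextBlob.

    TextBlob is a third-party library, so it is imported only when needed.
    """
    from textblob import TextBlob

    return TextBlob(text).tags


def entity_density(tagged_words):
    """Divide the number of proper nouns by the number of tagged words."""
    if not tagged_words:
        return 0.0  # avoid dividing by zero on empty input
    proper_nouns = [word for word, tag in tagged_words if tag in PROPER_NOUN_TAGS]
    return len(proper_nouns) / len(tagged_words)
```

With TextBlob installed, entity_density(tag_words(article_text)) gives the metric for one article. The density function also accepts hand-tagged pairs, which makes it easy to check the arithmetic on a short sentence.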
After I created the Python script and tested it on a few websites, it did appear that articles with higher entity density were higher-quality articles. I also examined those articles with the SEO tool Ahrefs, and it appeared that articles with higher entity density were more likely to rank for keyphrases, so they were more likely to appear in the Google search results. I then uploaded the entity analysis script to my GitLab profile.
I updated the script again after that because of an issue I observed. The original script just counted proper nouns as a stand-in for entities. As a result, an article would have a high entity density if it repeated the name of the same company many times. And I felt that the script could be improved further. An article that mentions the challenger banks Monzo, Revolut, Varo, and Chime could be higher quality than one that just discusses Revolut, for example.
The second version of the script counts unique proper nouns as a stand-in for unique named entities. It also reports a new metric, unique entity density. Like the original metric, this is calculated by dividing a count by the word count of the article, but the count is now the number of unique named entities. It’s also designed to measure the information density of the article. My hypothesis is that if a fintech article has a higher unique entity density, it will be a better and more informative article and it will rank for more keywords.
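Here is a minimal sketch of the unique count, written to take (word, tag) pairs like the ones TextBlob’s tags property returns. It isn’t the published script. The function name is hypothetical, and I’m assuming that two mentions count as the same entity only when they are spelled identically.

```python
# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def unique_entity_density(tagged_words):
    """Divide the number of distinct proper nouns by the number of tagged words."""
    if not tagged_words:
        return 0.0  # avoid dividing by zero on empty input
    # A set keeps one copy of each proper noun, so repeats are not counted.
    unique_proper_nouns = {
        word for word, tag in tagged_words if tag in PROPER_NOUN_TAGS
    }
    return len(unique_proper_nouns) / len(tagged_words)


# Hand-tagged sample: Revolut appears twice and Monzo once in ten words.
sample = [
    ("Revolut", "NNP"), ("offers", "VBZ"), ("cards", "NNS"), ("and", "CC"),
    ("Revolut", "NNP"), ("offers", "VBZ"), ("accounts", "NNS"), ("but", "CC"),
    ("Monzo", "NNP"), ("offers", "VBZ"),
]
print(unique_entity_density(sample))  # 2 unique proper nouns / 10 words
```

Counting every mention would give three proper nouns for this sample, while the set keeps only two, so repeating a company name no longer raises the score.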
Although Google does analyze entities, it also monitors other metrics such as user behavior. My hypothesis is that readers who see articles with higher entity density will be more likely to read them all the way through, more likely to share them with friends, and more likely to return to the websites that posted them. So this concept isn’t just about gaming SEO metrics; it’s also a signal of content quality. And that signal could be hard to fake if your SEO tool just shows you keyphrases and backlinks and doesn’t list entities. That said, more advanced SEO tools like Frase can create content briefs by scraping the information in the top-ranking articles and telling writers what topics to discuss.
I ran a few tests and it did appear that articles that mentioned more proper nouns were more likely to rank for keywords in Google. I also thought that those articles were higher quality than similar articles by the same authors on the same websites, and I came to the same conclusion after testing the entity analysis script on my own website. So I decided to run a larger-scale experiment on fintech blogs to see if this concept made the fintechs’ articles more likely to appear in Google search results. That analysis will appear in a second article, so readers who are already familiar with the concepts discussed in this one won’t have to read lots of additional text to view the results.