Google does not index every site on the internet.
The share of the web that Google indexes continues to decline year over year.
With nearly 2 billion websites on the internet, indexing every possible page would be excessively costly and make little business sense, especially when nearly 10% of internet sites are scams at worst and spammy at best.
For the web’s most competitive search terms, most users still don’t scroll past the first page on Google.
Would it even make sense to have more than 100 indexed little blue links for a single query?
Not really, especially when 90% of clicks will go to the first few results above the fold.
Hence the latest weapon in Google’s arsenal: deindexing low-quality, low-discoverability content.
Here’s what that looks like in Google Search Console (GSC) under Indexing >> Pages:
Why Are More Pages Getting Deindexed by Google?
In an effort to fight spam, search engines are cracking down on:
- Low-quality content
- Content created by artificial intelligence (AI) tools like Jasper.ai and Copy.ai
- Content that is difficult to discover
- Repurposed domains used solely for link farming
- Massive amounts of low-quality content on sites with low domain authority
The latest trend to scale up content using AI (without much human intervention), reminds me of a brilliant statement made by Syndrome from The Incredibles:
The latest Helpful Content Update (or the Unhelpful Content Demotion, as we like to call it) and the SpamBrain update are a perfect combo for fighting what Google sees as a mass of internet content devoid of the quality inputs that encourage genuine human consumption.
In other words, they’re fighting content created for search engines and not for humans.
The ranking algorithm’s new regime is especially deleterious to marketers who engage in heavy link building campaigns.
As Google and other search engines continue to deindex pages they deem irrelevant or spammy, the link building efforts of even some high-profile sites are likely to translate into downward pressure on rankings.
Think about how it would look in practice:
A site with hundreds of low-quality backlinks suddenly sees a large percentage of its link graph deindexed, pushing its rankings down.
Adding insult to injury, pages in similar link neighborhoods to deindexed pages are more likely to suffer the same fate, as their content is marked irrelevant and bumped out of the index.
The good news is that Google is also kind enough to show you where your deficiencies lie, so you can adapt accordingly.
Examples from GSC include:
- Page with redirect
- Excluded by ‘noindex’ tag
- Not found (404)
- Alternative page with proper canonical tag
- Duplicate without user-selected canonical
- Blocked due to other 4xx issue
- Blocked due to access forbidden (403)
- **Crawled – currently not indexed**
- **Discovered – currently not indexed**
- Server error (5xx)
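For the curious, several of these statuses can be approximated with a quick scripted check. The classifier below is purely illustrative (the function name and mapping are our own, not Google's actual logic): it buckets a fetch result by HTTP status code and looks for a robots `noindex` meta tag in the HTML.

```python
import re

def classify_page(status_code: int, html: str = "", redirected: bool = False) -> str:
    """Roughly map a fetch result onto GSC-style index-status reasons.
    Illustrative only -- Google's real classification is far richer."""
    if redirected:
        return "Page with redirect"
    if status_code == 404:
        return "Not found (404)"
    if status_code == 403:
        return "Blocked due to access forbidden (403)"
    if 400 <= status_code < 500:
        return "Blocked due to other 4xx issue"
    if 500 <= status_code < 600:
        return "Server error (5xx)"
    # Look for a robots meta tag that contains "noindex"
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
                 html, re.I):
        return "Excluded by 'noindex' tag"
    return "Indexable (no blocking issue detected)"

print(classify_page(404))  # Not found (404)
print(classify_page(200, '<meta name="robots" content="noindex,follow">'))
```

Running a check like this across your sitemap URLs is a cheap way to catch the mechanical issues before GSC reports them.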
Most of the issues listed above are simple fixes for a knowledgeable webmaster.
The two bolded are a bit more nuanced and worth further discussion.
How to Solve Page Deindexation Issues
While the chart above might get your cortisol levels rising, there are solutions to both avoiding deindexation and reversing the tide of existing deindexation across your website.
Crawled – not indexed vs. Discovered – not indexed
When you submit a brand-new sitemap to Google, the web crawlers may take some time to get to it.
When your sitemap is crawled, the search spiders will note the pages to be crawled, but they will likely only crawl a small portion of your overall sitemap with each subsequent pass.
A site can have pages that are in a sitemap (which is a clear indication a webmaster would prefer to have them crawled and indexed in search), but that doesn’t necessarily mean that:
- Search engine crawlers will crawl those pages at all, OR
- Search engines will index the pages once they have been crawled
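As a refresher, a sitemap is just an XML list of the URLs you would prefer crawled. A minimal one can be generated with Python's standard library; the URL and date below are placeholders:

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a minimal sitemap.xml string from (loc, lastmod) pairs."""
    ET.register_namespace("", NS)  # emit the sitemaps.org default namespace
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
        ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

print(build_sitemap([("https://example.com/", "2023-01-19")]))
```

Submitting a valid sitemap only signals your preference; as noted above, it guarantees neither crawling nor indexing.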
The following two graphs are from the same site listed above, just for continuity.
Even when the search engines DO crawl, they may choose not to index your pages:
GSC will show you exactly which pages have been crawled, but not indexed.
Upgrading the Quality of Your Content
Any solution always starts with the quality of your content assets.
You may need a content quality audit which could use one or more of the following tools:
- SEOToolLab (Cora Software)
- Surfer SEO
While some tools are meant to be AI plagiarism checkers, others are better at finding which keywords are missing from the body of your text, keywords that would make your content comparable to top-ranking pages while remaining unique.
Cora and Surfer do a great job of providing prompts for inputting the right LSI (latent semantic indexing) and entity keywords that will be critical for a particular page to be statistically on par with other pages in top positions.
Furthermore, these and other software tools can show you other ways in which your page may be deficient compared to higher-ranking competitors.
We outline the most statistically significant factors here.
But the most important on-site factors to consider are:
- Comparable word count
- LSI & entity terms in body
- Total H1 to H6 tags
- Keywords in H1 to H6 tags
- Overall keyword density
- Partial keyword density
- Page structure (e.g. paragraph elements, image elements, etc.)
- Page load speed
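Several of the factors above can be measured with a short script. Here is a minimal sketch of a word-count and keyword-density audit, assuming plain-text input; the sample text and target terms are made up for illustration:

```python
import re
from collections import Counter

def audit_text(text: str, target_terms: list[str]) -> dict:
    """Compute basic on-page metrics: word count, per-term density, missing terms."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    total = len(words)
    density = {t: counts[t] / total for t in target_terms} if total else {}
    missing = [t for t in target_terms if counts[t] == 0]
    return {"word_count": total, "density": density, "missing_terms": missing}

report = audit_text(
    "Google may deindex low quality pages. Improve page quality to stay indexed.",
    ["quality", "deindex", "crawl"],
)
print(report["word_count"], report["missing_terms"])  # 12 ['crawl']
```

A homegrown check like this won't replace Cora or Surfer, but it makes the underlying metrics (density, coverage of target terms) concrete.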
If you have been using an AI tool to help generate pages, you will need to perform some human upgrades, but not necessarily a full rewrite of the content itself.
Robots (and even people) can tell when content has been created using GPT-3 and other natural language processing or machine learning tools.
Machine learning algorithms may not be able to detect the internet’s human ghostwriting, but they can detect AI-generated content.
A good content editor or editing team can also be incredibly helpful in upgrading your content, making edits, and otherwise improving its quality.
Rescan in Google Search Console
There is a handy manual scan tool in Google Search Console: URL Inspection.
When you are done upgrading and updating your content, be sure to submit the precise URL in GSC so Google adds it to its priority crawl queue.
While your sitemap will surface newly updated items when bots come to recrawl the site, a manual request is more likely to speed up the recrawl and reassessment of your content’s quality.
Your content needs to be discoverable to search engine crawlers.
If you’re not giving them the right signals, you will continue to have pages that are not indexed or fall out of the index.
Build more links using relevant anchor text to the most important pages of your site, particularly those you want to have discovered, crawled, indexed and ranked. It’s that simple.
Just because a page or post is in your sitemap doesn’t mean search engines feel it’s worth ranking. You need to show them what is important.
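To see which pages your internal links actually point at, and with what anchor text, you can parse your own HTML. A small sketch using only Python's standard library:

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags in an HTML page."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

parser = AnchorCollector()
parser.feed('<p>See our <a href="/seo-guide">SEO guide</a> for more.</p>')
print(parser.links)  # [('/seo-guide', 'SEO guide')]
```

Auditing this across your site quickly shows which important pages are starved of descriptive internal anchors.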
Promote Your Content
Get the word out to industry-specific influencers, writers, and promoters about the value of your content and your site.
Notice this heading did not say, “go out and build links.”
While link building can be helpful, it’s also a double-edged sword when it comes to promotion, and the power of links continues to wane.
Yes, link building is a subset of your promotion efforts, but you should remain very picky about where and how your inbound links are acquired.
Improve the Value of Content Assets
The question you ultimately need to ask is, “is my content truly an asset, or is it a liability?”
In some cases, and depending on search intent, long-form skyscraper content is completely dead.
Focus on answering a specific query, solve a real-life problem for someone’s online search, and do it better than your competitors and you will win traffic.
If you want to rank for competitive terms, you will need to work page by painstaking page.