Blog SEO Diagnostic Records: Causes and Optimization List of Nearly 15000 Pages Not Indexed

by 永夜 · 2026/06/10

While recently reviewing the blog's search engine performance, I found a serious problem: a large number of pages are not indexed by Google or Bing, and the count is approaching 15,000. I dug into Google Search Console and Bing Webmaster Tools, pinpointed the core causes and worked out an optimization plan. I'm recording it here so I can work through it in order of priority.

1. Current status: diagnosing the data

1. Google Search Console Index Overview

The page indexing report shows striking numbers. The unindexed pages break down by reason as follows:

Crawled - currently not indexed: 7,975 pages
Discovered - currently not indexed: 6,947 pages
Not found (404): 1,019 pages
Server error (5xx): 500 pages
Excluded by 'noindex' tag: 491 pages
Duplicate without user-selected canonical: 245 pages
Page with redirect: 60 pages
Others (Alternate page with proper canonical tag, Blocked due to other 4xx issue, Soft 404, etc.): a small number

Screenshot: the number of unindexed pages is close to 15,000
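A quick tally of the GSC numbers above shows where "nearly 15,000" comes from. It is the two "not indexed" buckets combined, and 404 plus 5xx gives the error backlog that P0 targets. The labels are my rendering of GSC's report names. The "others" bucket is left out because the report only gave it as a small number.

```python
# Page counts copied from the GSC page indexing report above.
gsc_not_indexed = {
    "Crawled - currently not indexed": 7975,
    "Discovered - currently not indexed": 6947,
    "Not found (404)": 1019,
    "Server error (5xx)": 500,
    "Excluded by 'noindex' tag": 491,
    "Duplicate without user-selected canonical": 245,
    "Page with redirect": 60,
}

# The two "not indexed" buckets are what add up to "nearly 15,000".
not_indexed_core = (
    gsc_not_indexed["Crawled - currently not indexed"]
    + gsc_not_indexed["Discovered - currently not indexed"]
)
# 404 + 5xx is the error backlog targeted in P0.
error_backlog = gsc_not_indexed["Not found (404)"] + gsc_not_indexed["Server error (5xx)"]
# Every reason with an exact count, excluding the small "others" bucket.
listed_total = sum(gsc_not_indexed.values())

print(f"crawled + discovered: {not_indexed_core}")  # → 14922
print(f"404 + 5xx: {error_backlog}")                # → 1519
print(f"all listed reasons: {listed_total}")        # → 17237
```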

Key finding: the details page for "Discovered - currently not indexed" is flooded with URLs of the form /en/tag/<Chinese tag>/, such as the tags for "download failed", "main version number" and "Jingdong" (JD.com). Chinese tags mixed into the English path are clearly the hardest-hit area, and Google is reluctant to index them.

Screenshot: "Discovered - currently not indexed" details, full of /en/tag/<Chinese tag>/ URLs
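To gauge how much of an exported URL list falls into this pattern, I can filter for /en/tag/ pages whose slug contains Chinese characters. This is a minimal sketch. It assumes the URLs have already been exported to a list, and example.com stands in for the blog's domain. It only checks the CJK Unified Ideographs block (U+4E00 to U+9FFF), which covers common Chinese characters but not every CJK range.

```python
from urllib.parse import quote, unquote, urlsplit


def has_cjk(text: str) -> bool:
    """True if text contains a character in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def is_chinese_tag_url(url: str, prefix: str = "/en/tag/") -> bool:
    """True if url is an English-path tag page whose slug contains Chinese."""
    # unquote() turns %E4%B8%8B... back into characters and leaves
    # already-decoded URLs unchanged, so both export forms work.
    path = unquote(urlsplit(url).path)
    if not path.startswith(prefix):
        return False
    return has_cjk(path[len(prefix):])


# Hypothetical sample URLs; example.com stands in for the blog's domain.
urls = [
    "https://example.com" + quote("/en/tag/下载失败/"),  # percent-encoded form
    "https://example.com/en/tag/京东/",                  # decoded form
    "https://example.com/en/tag/download-failed/",       # already English
    "https://example.com" + quote("/tag/京东/"),         # Chinese, but not under /en/
]
flagged = [u for u in urls if is_chinese_tag_url(u)]
print(len(flagged), "of", len(urls), "URLs are Chinese tags under /en/")
```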

2. Bing Webmaster Tools Index Overview

Bing's numbers are not encouraging either. Site-wide index status:

Indexed: 4.4K
Not indexed, may need attention: 7.7K
- Content quality issues: 4.2K
- Discovered but not indexed: 2.2K
- Redirected URLs: 87
- Unable to crawl content (403, 5xx): 9
- Dead links (404, 410): 2

Key finding: Bing explicitly flags 4.2K pages for "content quality" issues. This matches the "Crawled - currently not indexed" pattern in Google. In both cases, the search engine is pushing back against low-quality aggregate pages.

Screenshot: Bing flags "content quality" issues on 4.2K pages

2. Root causes

Putting the data from both search engines together, the large number of unindexed pages comes down to three root causes:

Language path conflict and messy URLs: the /en/ path is meant to serve English-speaking users, but the tags were copied straight from the Chinese tag library. The URLs therefore end up full of percent-encoded Chinese characters (such as /en/tag/<Chinese for "download failed">/). This muddles the page's language and goes against clean, normalized URLs.
Synonymous duplicate tags: the Chinese tags contain synonym pairs that coexist, such as "pipe" and "channel", and "classification" and "category".
Content flooding: of the site's 8,636 English tags, about 6,000 contain only one article. Such a tag page almost entirely duplicates that single article. These are typical low-quality, auto-generated pages, and they seriously lower search engines' trust in the whole site.
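Before any cleanup, it helps to sort the tag list into the problem groups above. This is a minimal sketch. The Tag shape is hypothetical: however the tags are exported, I only need each slug and how many articles use it. The buckets mirror the plan below: empty tags to delete, single-article tags to noindex, and Chinese slugs to translate.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    # Hypothetical export shape: tag slug plus the number of articles using it.
    slug: str
    article_count: int


def has_cjk(text: str) -> bool:
    """True if text contains a character in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def audit_tags(tags: list[Tag]) -> dict[str, list[str]]:
    """Bucket tags into the problem groups described above."""
    report: dict[str, list[str]] = {"empty": [], "single_article": [], "chinese_slug": []}
    for tag in tags:
        if tag.article_count == 0:
            report["empty"].append(tag.slug)           # dead-link risk, delete
        elif tag.article_count == 1:
            report["single_article"].append(tag.slug)  # thin page mirroring one post
        if has_cjk(tag.slug):
            report["chinese_slug"].append(tag.slug)    # Chinese slug, needs translation
    return report


# Hypothetical sample tags.
sample = [
    Tag("download-failed", 1),
    Tag("京东", 1),
    Tag("wordpress", 12),
    Tag("old-tag", 0),
]
for bucket, slugs in audit_tags(sample).items():
    print(f"{bucket}: {len(slugs)} {slugs}")
```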

3. Optimization plan (ordered by priority)

To balance time and effort, I will work through it step by step:

🔴 P0: Emergency stop-loss and cleanup (do now)

Clear empty tags: delete the 85 tags that have zero articles, removing a potential source of dead links.
Fix 404 and 5xx errors: check and fix the more than 1,500 404/5xx errors reported by GSC (1,019 + 500). Otherwise they keep wasting crawl budget and degrading the crawler's experience of the site.
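To re-check the 404/5xx URLs after fixing them, a small status checker can sort them into what is still broken and what is fixed. This is a minimal sketch. It assumes the URL list has been exported from the GSC reports and that the server answers HEAD requests the same way it answers GET, which not every server does. The fetch function is injectable, so the grouping logic can also be tried offline.

```python
import urllib.error
import urllib.request
from collections import defaultdict
from typing import Callable, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of url via a HEAD request (0 if unreachable)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the status of the final URL.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx are raised, not returned
        return err.code
    except OSError:  # DNS failure, refused connection, timeout, etc.
        return 0


def group_by_status(urls: Iterable[str],
                    fetch: Callable[[str], int] = fetch_status) -> dict[str, list[str]]:
    """Sort URLs into still-broken 4xx/5xx buckets vs. already fixed."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        code = fetch(url)
        if code == 0:
            groups["unreachable"].append(url)
        elif code >= 500:
            groups["5xx"].append(url)
        elif code >= 400:
            groups["4xx"].append(url)
        else:
            groups["ok"].append(url)
    return dict(groups)


# Offline usage with a stand-in fetcher (hypothetical URLs and codes).
fake_codes = {
    "https://example.com/a/": 404,
    "https://example.com/b/": 200,
    "https://example.com/c/": 503,
}
groups = group_by_status(fake_codes, fetch=fake_codes.get)
print(groups)
```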

🟡 P1: Quality control (low cost, high return)

Noindex thin tags instead of deleting them for now: a tag with a single article may gain more posts later, so I won't delete the 6,000 thin tags yet. Instead, the code automatically adds a <meta name='GoogleBot' content='noindex, follow'> tag to every tag page with fewer than 2 articles. This keeps the page structure intact while clearly telling the search engine not to index these pages for now, so they stop dragging down the quality score of the whole site.
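The blog runs on WordPress, so the real version would live in a PHP theme hook or an SEO plugin setting. Below is only the decision rule, sketched in Python. One assumption on my side: it defaults to name="robots" rather than the GoogleBot name in the plan. A googlebot-only tag addresses Google alone, while Bing flagged these same pages too.

```python
from typing import Optional

# Threshold from the plan above: tags with fewer than 2 articles get noindex.
NOINDEX_BELOW = 2


def robots_meta_for_tag(article_count: int, threshold: int = NOINDEX_BELOW,
                        crawler: str = "robots") -> Optional[str]:
    """Return the meta tag to emit on a tag archive page, or None to emit nothing.

    crawler="robots" addresses all search engines; "googlebot" targets Google only.
    """
    if article_count < threshold:
        # noindex keeps the page out of the index; follow still lets
        # crawlers pass through to the linked article.
        return f'<meta name="{crawler}" content="noindex, follow">'
    return None


for count in (0, 1, 2, 5):
    print(count, robots_meta_for_tag(count))
```

The tag pages must stay crawlable (not blocked in robots.txt). Otherwise crawlers never fetch the page and never see the noindex.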

🟢 P2: Underlying architecture refactoring (core work, time-consuming)

Merge synonymous Chinese tags: in the database, merge "pipe" into "channel" and "classification" into "category" to eliminate redundancy.
Translate tags into English and add 301 redirects: write a script to translate the remaining 3,200 Chinese-only tags into English. Once translation is done, 301-redirect each original /en/tag/<Chinese>/ path to its new English path (such as /en/tag/download-failed/). This passes on link equity and prevents new 404 errors.
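The two P2 steps can be sketched together as a small script. It turns a synonym map and a translation table into an old-path to new-path 301 map. The Chinese tags and their translations below are hypothetical, and how the map gets applied (server config, a redirect plugin) is left open. Merges are resolved first, so a merged tag redirects straight to its surviving tag's English path instead of through an extra hop.

```python
import re
from urllib.parse import quote

PREFIX = "/en/tag/"


def slugify(english: str) -> str:
    """Turn an English tag name into a URL slug: 'Download failed' -> 'download-failed'."""
    return re.sub(r"[^a-z0-9]+", "-", english.lower()).strip("-")


def build_redirects(synonyms: dict[str, str], translations: dict[str, str]) -> dict[str, str]:
    """Map each old Chinese tag path to its new English tag path (for 301s).

    synonyms:     merged tag -> surviving tag (both Chinese)
    translations: surviving Chinese tag -> English name
    """
    redirects = {}
    for old in set(translations) | set(synonyms):
        survivor = synonyms.get(old, old)  # resolve a merge first
        if survivor not in translations:
            raise KeyError(f"no English translation for surviving tag {survivor!r}")
        # Key is the percent-encoded path, as the server receives it.
        redirects[PREFIX + quote(old) + "/"] = PREFIX + slugify(translations[survivor]) + "/"
    return redirects


# Hypothetical data; the real tags and translations come from the database.
synonyms = {"管道": "通道", "分类": "类别"}  # "pipe" -> "channel", "classification" -> "category"
translations = {"通道": "Channel", "类别": "Category", "下载失败": "Download failed"}
redirects = build_redirects(synonyms, translations)
for old, new in sorted(redirects.items()):
    print(f"301 {old} -> {new}")
```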

🔵 P3: Detail fixes (later iterations)

Handle duplicate pages: review and fix the canonical tag settings for the 245 pages reported as "Duplicate without user-selected canonical".
Check redirects: review the 60 "Page with redirect" pages in GSC and the 87 redirected URLs in Bing, and make sure there are no redundant redirect chains.
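One quick way to spot chains is to check whether any redirect target is itself redirected. Every source can then point directly at its final destination. This is a minimal sketch. It assumes the full redirect map is available as a dict, for example the 301 map from the P2 tag translation plus any existing rules. The URLs are hypothetical.

```python
def collapse_chains(redirects: dict[str, str], max_hops: int = 10) -> dict[str, str]:
    """Rewrite every redirect to point straight at its final destination."""
    flat = {}
    for source, target in redirects.items():
        seen = {source}
        hops = 1
        while target in redirects:  # the target itself redirects again
            if target in seen or hops >= max_hops:
                raise ValueError(f"redirect loop or chain longer than {max_hops} hops at {source}")
            seen.add(target)
            target = redirects[target]
            hops += 1
        flat[source] = target
    return flat


def chained_sources(redirects: dict[str, str]) -> list[str]:
    """Sources whose target is itself redirected, i.e. at least two hops."""
    return [s for s, t in redirects.items() if t in redirects]


# Hypothetical chain: an old Chinese tag URL (percent-encoded 管道) redirected
# to an English tag that was later renamed again.
redirects = {
    "/en/tag/%E7%AE%A1%E9%81%93/": "/en/tag/pipe/",
    "/en/tag/pipe/": "/en/tag/channel/",
}
print(chained_sources(redirects))
print(collapse_chains(redirects))
```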

Summary: these 15,000 unindexed pages didn't pile up overnight, and they will have to be fixed step by step too. First stop the bleeding with cleanup and noindex, then treat the root cause with translation and 301 redirects. I'm hoping the index count rebounds once the follow-up optimizations are done!
