SEO for data and thousands of pages

"Hello! I am in the process of creating a website with the following characteristics: the main content consists of data such as tables, lists, images, some text, and a few charts. There are about 100k entities in the database, with one page dedicated to each entity. Each entity has varying amounts of text to display, ranging from none to thousands of characters. Additionally, each page includes a short text at the beginning and end, with the only variation being the entity name for setting keywords. The content on each page may change by up to 2% daily, localized in two languages, and mostly server-side rendered. Despite good performance and SEO scores when pages are assessed individually, 90% of my content is not indexed in Google search console. I am struggling to address this issue and any help would be greatly appreciated."

There are several potential reasons why only 10% of your website’s pages are being indexed by Google, despite good individual SEO scores:

  • Crawl Budget: With about 100,000 pages, Google allocates a limited crawl budget to your site and may simply not get around to every URL. Frequent content changes make this worse, because Googlebot also has to re-crawl pages it already knows about.
  • Thin or Duplicate Content: Each page is dedicated to a unique entity, but if the boilerplate intro and outro make up most of a page (especially for entities with little or no text of their own), Google may treat those pages as thin or near-duplicate content and decline to index them.
  • Frequent Content Changes: With up to 2% of pages changing daily, Google has to keep re-crawling just to stay current. Server-side rendering itself helps crawlers, but the constant churn adds to crawl demand and can delay indexing of the latest versions of your pages.
  • Structured Data: While you mention images, charts, and tables, are you using structured data markup to help Google understand the context and content of this data?
  • Internal Linking: Having a well-structured internal linking strategy is crucial for Google to discover and navigate your website. Without proper linking, many pages may remain undiscovered.

Recommendations to address these issues:

  • Prioritize Content: Focus on creating high-quality, unique, and relevant content for your most important pages. This will help Google identify your website as valuable and prioritize crawling it.
  • Optimize for Crawling: Provide an XML sitemap (and optionally an HTML one) to guide Google’s crawlers to the most important pages. A single sitemap file is limited to 50,000 URLs, so with 100k entities you need multiple sitemap files plus a sitemap index; a sketch of generating this is shown after this list. Use robots.txt to exclude irrelevant URLs (filters, parameters, admin routes) and make sure the entity pages themselves remain crawlable.
  • Structured Data: Use schema.org vocabulary to mark up your entities, images, charts, and tables (JSON-LD is the format Google recommends). This helps Google understand the context of your content and makes pages eligible for rich results; a minimal JSON-LD sketch also follows this list.
  • Reduce Duplicate Content: Since the entity name is the only variable in your boilerplate text, write a unique description for each page, or at least for the most important ones. This helps Google treat each page as distinct instead of folding near-duplicates together and leaving most of them out of the index.
  • Internal Linking: Develop a strategic internal linking structure that connects relevant pages and guides Google to your most important content.
  • Use Google Search Console: The URL Inspection tool (the successor to the retired “Fetch as Google” feature) shows how Google renders a page, whether it is indexed, and if not, why; use it to troubleshoot a sample of your non-indexed URLs.
  • Content Updates: Since your content changes daily, keep the lastmod values in your sitemaps accurate and resubmit the sitemap index in Search Console. For a handful of high-priority pages you can also use URL Inspection’s “Request Indexing”, but that does not scale to 100k URLs.
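
As a starting point for the sitemap recommendation above, here is a minimal sketch of generating a sitemap index for roughly 100k URLs. The 50,000-URL-per-file limit and the index format come from the sitemaps.org protocol; everything else (the base URL, output directory, and the shape of the entity records) is a hypothetical assumption you would adapt to your own stack.

```python
from pathlib import Path
from xml.sax.saxutils import escape

BASE_URL = "https://example.com"       # assumption: your site's origin
SITEMAP_DIR = Path("public/sitemaps")  # assumption: directory served as static files
URLS_PER_FILE = 50_000                 # hard per-file limit from the sitemaps.org protocol


def write_sitemaps(entities):
    """entities: iterable of (slug, last_modified: datetime.date) pairs -- hypothetical shape."""
    SITEMAP_DIR.mkdir(parents=True, exist_ok=True)
    sitemap_files = []
    chunk, index = [], 0

    def flush(chunk, index):
        # Write one <urlset> file containing up to URLS_PER_FILE entries.
        name = f"sitemap-{index}.xml"
        with open(SITEMAP_DIR / name, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for slug, last_modified in chunk:
                f.write("  <url>\n")
                f.write(f"    <loc>{escape(f'{BASE_URL}/entities/{slug}')}</loc>\n")
                f.write(f"    <lastmod>{last_modified.isoformat()}</lastmod>\n")
                f.write("  </url>\n")
            f.write("</urlset>\n")
        return name

    for entity in entities:
        chunk.append(entity)
        if len(chunk) == URLS_PER_FILE:
            sitemap_files.append(flush(chunk, index))
            chunk, index = [], index + 1
    if chunk:
        sitemap_files.append(flush(chunk, index))

    # Sitemap index pointing at the individual files; this is the one URL you submit in Search Console.
    with open(SITEMAP_DIR / "sitemap-index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in sitemap_files:
            f.write(f"  <sitemap><loc>{BASE_URL}/sitemaps/{name}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")


# Example usage (hypothetical data source):
# write_sitemaps((row.slug, row.updated_at.date()) for row in db.all_entities())
```

Keeping lastmod honest matters here: Google uses it as a hint for which of the 100k URLs to re-crawl, so only bump it when an entity’s content actually changed.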

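For the structured data recommendation, here is a rough sketch of emitting a schema.org JSON-LD block from your server-side renderer. It assumes the entities can be described as a schema.org Dataset, which is purely an assumption; pick whichever type actually matches your data, and treat the entity field names as hypothetical.

```python
import json


def entity_jsonld(entity: dict) -> str:
    """Build a schema.org JSON-LD <script> block for one entity page.

    `entity` is a hypothetical dict with name/description/url/last_modified keys;
    adapt the @type and properties to whatever your entities really are
    (Dataset, Product, Place, ...)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Dataset",                       # assumption: entities are data-like
        "name": entity["name"],
        "description": entity.get("description", ""),
        "url": entity["url"],
        "dateModified": entity["last_modified"],  # ISO 8601 string
    }
    return (
        '<script type="application/ld+json">'
        + json.dumps(data, ensure_ascii=False)
        + "</script>"
    )


# Example usage in the server-side template (hypothetical values):
# head_html += entity_jsonld({
#     "name": "Entity 42",
#     "description": "Short unique description of entity 42.",
#     "url": "https://example.com/entities/entity-42",
#     "last_modified": "2024-05-01",
# })
```

Validate the output with Google’s Rich Results Test; structured data helps Google understand the pages, though on its own it does not force indexing.
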
By implementing these strategies, you can significantly increase the share of your pages that Google actually indexes and improve your overall SEO performance.