How to increase your crawl budget and speed up indexing

This article was written by Inna Sidorenko, Senior SEO Specialist at Why SEO Serious.
She explains how the crawl budget works, how to identify if something’s wrong with your site, and what to do so search engines don’t skip important pages.

If you already feel confident with diagnostics and want to move straight to practical steps, jump to the section “Ways to improve crawl budget and indexing.”
There, Inna outlines specific errors and techniques that help bots navigate your site efficiently.

Large websites — marketplaces, aggregators, news platforms — often face a problem: search engines can’t crawl every page on time. This is especially true when a site has hundreds of thousands of URLs, dynamically generated filters, regional sections, or constantly updated content. All of this puts pressure on the crawl budget, a limited resource that search engines allocate for crawling your website. If it runs out, some pages simply never get indexed.

What a crawl budget is and how it works

A crawl budget is the number of pages a search engine bot can and wants to crawl on your site within a specific period of time.

It’s important to understand that a bot may be willing to crawl your website — meaning Google allocates certain resources for it — but not always able to. That part depends on your website’s technical performance.

Imagine this: in theory, Googlebot is ready to crawl 1,000 pages per day. But if your server responds slowly and each page takes three or four seconds to load, the bot will only manage to crawl around 200–300 pages instead of 1,000.

How to identify crawl budget issues

Before fixing anything, you need to understand what’s actually wrong. Below is a list of common situations. If you find at least two of them on your website, it’s a clear sign that your crawl budget is being wasted somewhere.

  • “Discovered – currently not indexed” status in Google Search Console. This is the first sign of crawl budget issues: the bot finds your content but doesn’t crawl or index it.
  • Pages appear in search results with a long delay, after weeks or months. The crawl budget is too low, so indexing is slow.
  • PageSpeed Insights shows loading times above 2–3 seconds. The bot spends too much time waiting for responses instead of crawling more pages.
  • Many 4xx and 5xx errors. Each error consumes part of your crawl budget that could have been spent on valuable URLs.
  • Less than 50–60% of submitted pages are indexed. A sign of inefficient crawling and major losses in indexing coverage.

How to measure your current crawl rate

Once you’ve spotted the symptoms, it’s time to dig deeper. Google Search Console gives you only a general view of indexing, but it doesn’t show how exactly bots move through your site. To understand that, you need to look at your server logs — the only reliable source of information about how search bots actually behave.

Log files record details such as:

  • which bots visit your site — Googlebot, Bingbot, and others;
  • which URLs are crawled;
  • which HTTP response codes they receive (200, 301, 404, 500);
  • how long each request takes;
  • how often different sections are accessed.

Log file locations depend on your server setup:

  • Apache: /var/log/apache2/access.log
  • Nginx: /var/log/nginx/access.log
  • Cloudflare or CDN: export data from the control panel.

For analysis, you can use specialised tools:

  • Screaming Frog Log File Analyzer — a desktop tool designed for SEO tasks, suitable for most websites.
  • GoAccess — lightweight and visual, great for quick reports.
  • ELK Stack (Elasticsearch + Logstash + Kibana) — a powerful solution for large-scale and data-heavy projects.
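
Before reaching for a dedicated tool, you can get a rough first picture with a short script. The sketch below counts how often Googlebot received each status code in a standard access log; the log path and the combined log format are assumptions, so adjust them to your setup.

```python
import re
from collections import Counter

# Minimal sketch: count how often Googlebot received each HTTP status code.
# The path and the combined log format are assumptions; adjust to your server.
LOG_PATH = "/var/log/nginx/access.log"
STATUS = re.compile(r'" (\d{3}) ')  # status code right after the quoted request line

statuses = Counter()
with open(LOG_PATH, encoding="utf-8", errors="ignore") as log:
    for line in log:
        # Rough filter by user agent; for strict checks, verify Googlebot IPs via reverse DNS.
        if "Googlebot" not in line:
            continue
        match = STATUS.search(line)
        if match:
            statuses[match.group(1)] += 1

for code, hits in statuses.most_common():
    print(code, hits)
```

If a noticeable share of those hits are 3xx, 4xx, or 5xx responses, that alone shows where crawl budget is leaking.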

Ways to improve crawl budget and indexing

It’s always easier to prevent indexing issues than to fix them later. But even if things have already gone off track, they can still be fixed. The first step in optimising your crawl budget is to reduce the number of useless pages. Let’s look at how to avoid creating unnecessary ones.

👉 Don’t generate uncontrolled filter pages

On large catalogue websites, it’s easy to end up with tens of thousands of filter-based pages — by brand, colour, size, or their combinations. Most of these pages have no search demand, meaning no one is looking for them. Still, they consume crawl budget.

Create filter-based pages only when there’s real search interest. Before turning a filter into a standalone page:

  • Check the search volume for combinations of filters (for example, “buy Nike sneakers size 44”).
  • Identify commercial keywords with intent modifiers such as buy, price, and store.
  • Keep only those filter pages with real search demand and visible competition in SERPs.

All others should be blocked from indexing or crawling. Keep them as parameter-based URLs and block them in robots.txt, while keeping clean, SEO-friendly URLs for high-demand filters in the index.

👉 Find and remove duplicate category pages

Sometimes identical sections exist under different names but with the same content — for example, “smartphones” and “mobile phones.” Such pages compete with each other in search results and cannibalise traffic.

How to detect and fix duplicates:

  • Collect all category names and URLs based on H1 tags.
  • Cluster the categories. If pages fall into the same cluster, that’s a duplication signal (see the sketch after this list).
  • Keep one main page and remove or redirect the rest.
  • Save the keywords from the removed page — include them in the remaining page’s text and metadata. For instance, if you deleted the “HDD” section and kept “hard drives,” use both terms in your optimisation.
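
As an illustration of the clustering step, here is a hypothetical sketch that flags category pairs whose keyword lists overlap heavily. The URLs, keyword sets, and threshold are all made up; in practice you would feed in an export from your keyword tool or Search Console.

```python
# Hypothetical data: category URL -> the queries it targets or ranks for.
category_keywords = {
    "/hdd/": {"buy hdd", "internal hard drive", "hard drive 2tb"},
    "/hard-drives/": {"buy hard drive", "internal hard drive", "hard drive 2tb"},
    "/ssd/": {"buy ssd", "ssd price", "fast ssd for laptop"},
}

def jaccard(a: set, b: set) -> float:
    """Share of keywords two categories have in common."""
    return len(a & b) / len(a | b)

urls = list(category_keywords)
for i, first in enumerate(urls):
    for second in urls[i + 1:]:
        overlap = jaccard(category_keywords[first], category_keywords[second])
        if overlap >= 0.3:  # arbitrary threshold; tune it on real data
            print(f"Possible duplicates: {first} and {second} ({overlap:.0%} keyword overlap)")
```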

👉 Delete “zero-demand” categories

A large website might contain categories that no one searches for or clicks on. These pages don’t bring traffic, so they’re easy to identify and remove.

How to do it:

  • Collect all category names via H1 headings.
  • Add commercial modifiers such as buy, price, and order.
  • Check keyword frequency in Google Keyword Planner.
  • Group them into clusters. If a cluster has little or no search demand, the category can safely be deleted.

Technical optimisation to increase crawl budget

Once you’ve cleaned up unnecessary pages, it’s time to move on to technical improvements. They directly affect how quickly and deeply a search bot crawls your site. Below are the key areas that can help speed up the process.

Indexing new pages

When you publish a new page or update an existing one, you want it to appear in search as soon as possible. In reality, Google bots might take several days — sometimes even weeks — to get to it. That’s especially painful when the page contains time-sensitive content such as news, promotions, product updates, or listings.

To speed things up, use tools for on-demand indexing.

For Google, that’s the Indexing API, which allows you to send URLs for indexing right after publication.

How it works:

  • Register in Google Cloud Console.
  • Create a service account and generate a JSON key for it.
  • Add the service account as an owner of your property in Google Search Console.
  • Connect the API to your CMS or set up an automated script to send new URLs.

The default quota is 200 requests per day. Google doesn’t guarantee instant indexing, but it significantly accelerates it.
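
As a minimal sketch, here is how a URL could be submitted from Python using the google-api-python-client and google-auth libraries. The file name service-account.json and the example URL are placeholders.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Minimal sketch: notify Google about a new or updated URL via the Indexing API.
# Assumes "service-account.json" is the key of a service account that has been
# added as an owner of the property in Search Console.
SCOPES = ["https://www.googleapis.com/auth/indexing"]

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
service = build("indexing", "v3", credentials=credentials)

body = {"url": "https://example.com/new-page/", "type": "URL_UPDATED"}
response = service.urlNotifications().publish(body=body).execute()
print(response)
```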

Page loading speed

Search bots operate under strict time limits. If a page takes longer than 2–3 seconds to load, this happens:

  • Googlebot reduces crawl depth and may not reach inner pages.
  • Fewer URLs are crawled per visit.
  • New sections get queued for indexing with delays.

To fix this, regularly check loading speed using PageSpeed Insights or WebPageTest. Optimise scripts and images, and don’t forget about your mobile version, as it often performs worse than desktop, even though most users come from mobile devices.
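
If you want to monitor speed continuously rather than checking by hand, PageSpeed Insights also has an API. A minimal sketch, assuming the current v5 response format; the URL and API key are placeholders:

```python
import requests

# Minimal sketch: pull the Lighthouse performance score from the PageSpeed Insights API (v5).
API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {"url": "https://example.com/", "strategy": "mobile", "key": "YOUR_API_KEY"}

data = requests.get(API, params=params, timeout=60).json()
score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"Mobile performance score: {score * 100:.0f}/100")
```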

Broken links

404 and similar errors act as dead ends for search bots. Run regular crawls using tools like Screaming Frog to find and fix them. Restore missing pages where possible. If a page has been removed permanently, return a 410 status code instead of 404 — it signals to Google that the page is gone for good.

Redirect chains

Redirect chains occur when one page leads to another, then another, and so on. Ideally, both users and bots should reach the final page through a single redirect.

Each redirect triggers another HTTP request. If there are too many, the bot spends time on server responses and transitions instead of accessing the actual content. On large websites, a single chain can easily include five to seven redirects — and that wastes crawl budget.

The worst-case scenario is a redirect loop, where the bot gets stuck (A → B → C → A) until it hits its internal limit — usually five to ten redirects — wasting crawl resources entirely.

How to detect problem chains:

  • Check suspicious URLs manually in your browser’s developer tools (Network tab) or with command-line tools like curl.
  • Use SEO crawlers such as Screaming Frog or Netpeak Spider — they automatically detect long chains.
  • Review your log files: if bots keep following redirects, you’ll notice the pattern quickly.
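
For a quick scripted check, the Python sketch below prints every hop for a given URL (the URL is a placeholder). The requests library keeps intermediate 3xx responses in resp.history, so long chains and loops show up immediately.

```python
import requests

# Minimal sketch: list every hop a URL goes through before the final response.
def print_redirect_chain(url: str) -> None:
    try:
        resp = requests.get(url, allow_redirects=True, timeout=10)
    except requests.TooManyRedirects:
        print(f"{url} appears to be stuck in a redirect loop")
        return
    for hop in resp.history:          # each intermediate 3xx response
        print(f"{hop.status_code}  {hop.url}")
    print(f"{resp.status_code}  {resp.url}  (final)")

print_redirect_chain("https://example.com/old-page/")  # placeholder URL
```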

Dynamic content

Websites built with JavaScript frameworks like React, Vue, or Angular often work as single-page applications (SPA). The problem is that when a search bot visits such a site, it only sees the basic HTML skeleton — the actual content is loaded later with JavaScript. Unlike users, bots don’t wait, so they end up seeing an empty page.

The solution is server-side rendering (SSR). With SSR, the HTML page is generated on the server before it’s sent to the browser — and to the bot — in a fully rendered form. This approach:

  • removes the need for bots to execute JavaScript;
  • speeds up indexing;
  • makes it easier for search engines to analyse on-page content and structure.
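
A quick way to test whether your content survives without JavaScript is to fetch the raw HTML and look for a phrase that should be on the page. In this sketch, the URL and the phrase are placeholders:

```python
import requests

# Quick diagnostic sketch: fetch the HTML as a crawler first sees it (before any
# JavaScript runs) and check whether key content is already present.
headers = {"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
html = requests.get("https://example.com/category/sneakers/", headers=headers, timeout=10).text

if "Nike Air Max" in html:
    print("Content is present in the initial HTML: SSR or static HTML is working.")
else:
    print("Content is missing from the initial HTML: it is probably injected by JavaScript.")
```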

Robots.txt configuration

The robots.txt file controls which parts of your site search engines can access. If you leave everything open, bots will crawl unnecessary areas — parameterised URLs, filter pages, sorting options, or technical sections.

You should block the following from crawling:

  • URLs with parameters like ?sort=, ?filter=, ?utm=;
  • duplicate pages;
  • system directories such as /admin/, /cart/, /auth/.
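
As a starting point, a robots.txt fragment covering these cases might look like the following. The exact patterns are assumptions and depend on how parameters actually appear in your URLs, so treat it as a sketch rather than a drop-in rule set.

```
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?utm=
Disallow: /admin/
Disallow: /cart/
Disallow: /auth/
```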

👉 And don’t rely only on <meta name="robots" content="noindex">.

While this tag prevents indexing, it doesn’t prevent crawling. Bots will still visit those pages and spend crawl budget, even though nothing gets added to the index. If you want to restrict access entirely, block those URLs in robots.txt.

Sitemap relevance

A sitemap acts as a map for search engines. If it’s outdated, full of broken links, or packed with redirects, bots can easily waste crawl budget on pages that should have been removed or excluded.

Make sure your sitemap includes only live pages returning a 200 status code. Remove URLs that redirect (3xx), return errors (4xx or 5xx), or have been temporarily deleted.

Set up automatic sitemap updates so that:

  • pages returning 404 or 410 are removed within 2–4 weeks;
  • URLs that change are replaced promptly with their new addresses.

👉 Use the lastmod and changefreq tags correctly.

These fields tell bots that a page has been updated, but they shouldn’t be abused.

  • Add lastmod only when real changes occur — not automatically every day.
  • Include only the date, not the time (format: YYYY-MM-DD).
  • Don’t refresh all lastmod values automatically — it lowers trust.
  • Avoid using changefreq unless you’re sure it’s relevant; search engines rarely rely on it.

Doing this helps bots focus on new and updated pages instead of re-crawling unchanged ones, saving crawl budget for what matters most.
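
For reference, a single sitemap entry that follows these rules could look like this (the URL and date are placeholders):

```xml
<url>
  <loc>https://example.com/category/sneakers/</loc>
  <lastmod>2024-02-21</lastmod>
</url>
```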

HTTP headers Last-Modified and If-Modified-Since

Search engines regularly revisit known pages to check whether anything has changed. If your site doesn’t clearly indicate that a page remains the same, the bot will reload it each time, wasting crawl budget. Properly configured HTTP headers help prevent that.

Here’s how it works:

  • Last-Modified (server → bot)
    • The server tells the bot the date when the page was last updated, for example:
    • Last-Modified: Wed, 21 Feb 2024 14:28:00 GMT
  • If-Modified-Since (bot → server)
    • When the bot returns, it sends this date back, essentially asking: “Has anything changed since then?”
    • If-Modified-Since: Wed, 21 Feb 2024 14:28:00 GMT
  • If the content hasn’t changed, the server responds with 304 Not Modified — no need to send the full page again. The bot skips it and moves on.
  • If there are changes, the server returns 200 OK with the updated content.
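
Here is a minimal sketch of this exchange on the server side, using Flask purely as an example framework; the route, page store, and update dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from flask import Flask, Response, request

app = Flask(__name__)

# Hypothetical store of real content-update times, keyed by URL path.
LAST_UPDATED = {"blog/crawl-budget": datetime(2024, 2, 21, 14, 28, tzinfo=timezone.utc)}

@app.route("/<path:page>")
def serve(page):
    updated = LAST_UPDATED.get(page)
    if updated is None:
        return Response(status=404)

    # If the bot's copy is at least as fresh as ours, answer 304 with no body.
    ims = request.headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(ims) >= updated:
        return Response(status=304)

    resp = Response(f"<html><body>Content of {page}</body></html>")
    resp.headers["Last-Modified"] = format_datetime(updated, usegmt=True)
    return resp
```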

Common mistakes:

  • Sending Last-Modified for pages with dynamic AJAX blocks without checking whether the actual content has changed.
  • Using the page generation time instead of the real content update time.
  • Having inconsistent dates in the lastmod field of the sitemap and the HTTP header — this confuses the bot and reduces trust.

Conclusion

A crawl budget isn’t unlimited, and improving it takes consistent, systematic work. To speed up indexing, start with the essentials:

  • Analyse your log files;
  • Clean up unnecessary pages and optimise the site structure;
  • Improve loading speed;
  • Configure your sitemap, robots.txt, and HTTP headers properly.
