

Key features of SEO crawlers

Before I delve into the specifics of each crawler, let me explain the features and characteristics of these tools that I considered when testing them and preparing this article. Let me clarify why each feature is important for you, why it should be part of a crawling tool, and in what situations it may be particularly useful. Refer back to this list if anything is unclear down the road.

Scheduling crawls
It’s handy to be able to schedule a crawl and set up monthly or weekly crawls.

Monitoring robots.txt changes
Accidental changes in robots.txt can lead to an SEO disaster, so it’s beneficial if a crawler detects changes in robots.txt and informs you (see the first sketch after this list).

Crawl status
If you deal with large websites, you should be able to see the current status of the crawl.

Crawl notifications
A crawler should inform you when the crawl is done (desktop notification or email).

Storing crawl data
It’s helpful if a crawler can store crawl data for a long period of time.

List of pages with fewer than x incoming links
If there are no internal links pointing to a page, Google may think it’s irrelevant.

Comparison of URLs found in sitemaps and in a crawl
Sitemaps should contain all your valuable URLs, so a crawler should compare what the sitemap lists with what the crawl actually found (see the second sketch after this list).
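To make the robots.txt monitoring idea concrete, here is a minimal Python sketch that fetches robots.txt and compares it with the copy saved on the previous run. The site URL and state-file name are placeholders, not part of any particular crawler:

```python
# Minimal sketch: detect robots.txt changes between runs by comparing the
# live file with the copy saved last time. URL and state file are placeholders.
import pathlib
import urllib.request

ROBOTS_URL = "https://example.com/robots.txt"   # hypothetical site
STATE_FILE = pathlib.Path("robots_txt.last")    # previous copy lives here

def robots_changed() -> bool:
    """Fetch robots.txt, store it, and report whether it differs from last time."""
    with urllib.request.urlopen(ROBOTS_URL) as resp:
        current = resp.read()
    previous = STATE_FILE.read_bytes() if STATE_FILE.exists() else None
    STATE_FILE.write_bytes(current)             # remember for the next run
    return previous is not None and previous != current

if __name__ == "__main__":
    if robots_changed():
        print("robots.txt changed - review it before the next crawl!")
```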

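And here is a minimal sketch of the sitemap-versus-crawl comparison: it parses an XML sitemap and diffs the URL sets. The sitemap URL is hypothetical, and the crawled_urls set stands in for your crawler’s real output:

```python
# Minimal sketch: diff the URLs in an XML sitemap against the URLs a crawl
# actually found. The sitemap URL and crawled_urls set are placeholders.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"   # hypothetical
LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"

def sitemap_urls(url: str) -> set[str]:
    """Return the set of <loc> URLs listed in an XML sitemap."""
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    return {loc.text.strip() for loc in tree.iter(LOC) if loc.text}

crawled_urls = {"https://example.com/", "https://example.com/about"}  # your crawler's output

listed = sitemap_urls(SITEMAP_URL)
print("In the sitemap but never crawled (possible orphan pages):", listed - crawled_urls)
print("Crawled but missing from the sitemap:", crawled_urls - listed)
```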
Excluding directories and subdomains
It’s helpful if you can disallow the crawler from crawling a particular directory or a subdomain.

Setting crawl limits
When crawling a very large website, you may want to set a limit on the number of crawled URLs or the crawl depth.

Analyzing a domain protected by an htaccess login
This is a helpful feature if you want to crawl the staging environment of a website (see the first sketch after this list).

Adjusting the crawl speed
Much like Googlebot, you should be able to adjust your crawl speed according to the server’s responses.

Changing the user agent
Some websites block crawlers, so it’s necessary to change the user agent to be able to crawl them.

Crawling a list of URLs
This feature can help you if you want to perform a quick crawl of a small set of URLs.

Comparing crawls
It’s important to compare the crawls that were done before and after any changes implemented on the website.

Issues dashboard
Having all the detected issues on a single dashboard will not do the job for you, but it can make SEO audits more streamlined.

Custom report columns
When I view a single report, I may want to add additional columns to get the most out of the data.

Filtering URLs by type (HTML, CSS, JS, PDF, etc.)
Crawlers visit resources of various types (HTML, PDF, JPG), so a crawler should support filtering by type.

Categorizing crawled pages
Some crawlers offer the possibility to categorize crawled pages (blog, product pages, etc.) and generate reports dedicated to specific categories of pages.

Filtering URLs by pattern
It’s common that I want to see the URLs that end with “.html” or contain a product ID (see the second sketch after this list).
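As a rough illustration of crawling behind an .htaccess (HTTP Basic) login with a custom user agent, here is a sketch using the Python requests library; the staging host, credentials, and user-agent string are all hypothetical:

```python
# Minimal sketch: fetch a staging page that sits behind an .htaccess
# (HTTP Basic) login, sending a custom user agent. All values are placeholders.
import requests

resp = requests.get(
    "https://staging.example.com/",              # hypothetical staging host
    auth=("staging_user", "staging_password"),   # HTTP Basic credentials
    headers={"User-Agent": "MySEOCrawler/1.0"},  # identify (or disguise) the crawler
    timeout=10,
)
print(resp.status_code, len(resp.text), "bytes received")
```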

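The second sketch shows type and pattern filtering over a crawled URL list in plain Python. The URLs and the product-ID pattern are made up purely for illustration:

```python
# Minimal sketch: filter a crawler's URL list by resource type or URL pattern.
# The URL list and the "/p-<digits>" ID scheme are placeholders.
import re

urls = [
    "https://example.com/blog/post-1.html",
    "https://example.com/assets/app.js",
    "https://example.com/catalog/p-10432",   # product ID in the path
    "https://example.com/brochure.pdf",
]

html_pages = [u for u in urls if u.endswith(".html")]
static_assets = [u for u in urls if re.search(r"\.(js|css|pdf|jpe?g|png)$", u)]
product_pages = [u for u in urls if re.search(r"/p-\d+$", u)]  # hypothetical ID scheme

print("HTML pages:", html_pages)
print("Static assets:", static_assets)
print("Product pages:", product_pages)
```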
Detailed information about a single URL
You may want to see the internal links pointing to a particular URL, or its headers, canonical tags, etc.

Duplicate content detection
A crawler should give you at least basic information on duplicates across your website.

Thin pages
A large number of thin pages can negatively affect your SEO efforts, so a crawler should report them.

Crawl depth – number of clicks from a homepage
Additional information about the crawl depth can give you an overview of the structure of your website. If an important page isn’t accessible within a few clicks from the homepage, it may indicate poor website structure (see the first sketch after this list).

Hreflang tags
Hreflang tags are the foundation of international SEO, so a crawler should recognize them.

Canonical tags
Every SEO crawler should inform you about canonical tags to let you spot potential indexing issues (the second sketch after this list extracts both hreflang and canonical tags).
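Here is a minimal sketch of how crawl depth can be computed: a breadth-first search over a link graph the crawler has already collected. The graph below is a placeholder; a real crawler builds it while crawling:

```python
# Minimal sketch: compute crawl depth (clicks from the homepage) over a
# link graph. The graph below is a placeholder for real crawl output.
from collections import deque

links = {  # page -> pages it links to
    "/": ["/blog", "/products"],
    "/blog": ["/blog/post-1"],
    "/products": ["/products/widget"],
    "/blog/post-1": [],
    "/products/widget": ["/products/widget/specs"],
    "/products/widget/specs": [],
}

def crawl_depths(start: str = "/") -> dict[str, int]:
    """Breadth-first search: depth = minimum number of clicks from the homepage."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:        # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

for page, depth in sorted(crawl_depths().items(), key=lambda kv: kv[1]):
    print(depth, page)
```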
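Since canonical and hreflang annotations both live in <link> elements, one small extractor can surface them the way a crawler would. A sketch using BeautifulSoup, with placeholder HTML:

```python
# Minimal sketch: pull canonical and hreflang annotations out of a page's
# HTML. Uses BeautifulSoup; the HTML snippet is a placeholder.
from bs4 import BeautifulSoup

html = """
<head>
  <link rel="canonical" href="https://example.com/page">
  <link rel="alternate" hreflang="en" href="https://example.com/page">
  <link rel="alternate" hreflang="de" href="https://example.com/de/page">
</head>
"""

soup = BeautifulSoup(html, "html.parser")

canonical = soup.find("link", rel="canonical")
print("canonical:", canonical["href"] if canonical else "MISSING")

for link in soup.find_all("link", rel="alternate", hreflang=True):
    print("hreflang", link["hreflang"], "->", link["href"])
```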
Link rel=”next” (to indicate a pagination series)
When you perform an SEO audit, you should analyze whether pagination series are implemented properly.

Internal nofollow links
Seeing a list of internal nofollow links allows you to make sure there aren’t any mistakes in your internal linking.

Outbound links
A crawler should allow you to analyze both internal and external outbound links.

Hx headers
“Google looks at the Hx headers to understand the structure of the text on a page better.” – John Mueller (Google)

HTTP status codes
How many URLs are not found (404)? How many URLs are redirected (301)? A crawler should answer these questions (see the first sketch below).

Indexability
A crawler should flag pages that are excluded from indexing. This helps you make sure that your indexing strategy is properly implemented.

Missing title tags
A crawler should show you a list of pages that have missing title tags (see the second sketch below).
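A minimal sketch of the status-code report: it issues HEAD requests with redirects disabled, so 301s are counted rather than silently followed. The URL list is a placeholder for real crawl output:

```python
# Minimal sketch: count HTTP status codes for a list of crawled URLs.
# allow_redirects=False surfaces the 301s instead of following them.
from collections import Counter
import requests

urls = ["https://example.com/", "https://example.com/old-page"]  # hypothetical

counts = Counter()
for url in urls:
    try:
        resp = requests.head(url, allow_redirects=False, timeout=10)
        counts[resp.status_code] += 1
    except requests.RequestException:
        counts["error"] += 1

print(dict(counts))  # e.g. {200: 1, 301: 1}
```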

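Finally, a sketch of the missing-title check using BeautifulSoup; the pages dict stands in for HTML your crawler has already fetched:

```python
# Minimal sketch: flag pages with a missing or empty <title>, the way a
# crawler's report would. The HTML documents are placeholders.
from bs4 import BeautifulSoup

pages = {
    "/": "<html><head><title>Home</title></head><body></body></html>",
    "/no-title": "<html><head></head><body>No title here</body></html>",
}

for url, html in pages.items():
    title = BeautifulSoup(html, "html.parser").title
    if title is None or not title.get_text(strip=True):
        print(f"{url}: missing title tag")
    else:
        print(f"{url}: title is '{title.get_text(strip=True)}'")
```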