
Robots.txt Bulk Check API

Welcome to the Robots.txt Bulk Check API documentation! Before we delve into the intricacies of how this API works, let's lay the groundwork by understanding what a robots.txt file is and why it plays a critical role in web development and SEO.


A robots.txt file is a standard used by websites to communicate with web crawlers and other automated agents. It indicates which areas of a website these agents may and may not access. These rules help website administrators control crawler behaviour and ensure that the site is indexed in a way that supports its SEO strategy.
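

To make the rules concrete, here is a minimal, hypothetical illustration: the robots.txt rules below block every crawler from /private/ but let Googlebot fetch anything, and Python's standard urllib.robotparser module evaluates them locally (the example.com URLs and the rules themselves are made up for demonstration; this is not how the API itself is implemented):


from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block all crawlers from /private/,
# but allow Googlebot everywhere.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "",
    "User-agent: Googlebot",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/private/page.html"))         # False
print(parser.can_fetch("*", "https://example.com/index.html"))                # True
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html")) # True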


Now, onto what our API does. The Robots.txt Bulk Check API is designed to facilitate the process of validating multiple URLs against the robots.txt file of a given host. It helps you ascertain whether certain URLs are allowed or disallowed by the robots.txt file of a host, thus automating and simplifying a task that can be quite cumbersome when done manually, especially when dealing with a large number of URLs.


Here are the unique features and benefits of utilizing this API:


  1. Bulk Checking: Validate up to 1,000 URLs in a single API call, saving time and resources.
  2. Quick Response Times: Receive responses often in less than 500 milliseconds, allowing for seamless integration into your workflows.
  3. User-Agent Optimization: The API uses a specific user-agent to maximize successful crawls, adhering to the rules set for the Googlebot or wildcard (*) user-agent in the robots.txt file.
  4. Ease of Use: With a straightforward request payload structure, the API is user-friendly, even for those less acquainted with POST endpoints.

This documentation will guide you through the features, limitations, and how to effectively use the API to its fullest potential. Let's get started!


The API is accessed via a POST request to the endpoint /api/bulk-robots-txt/v1 at the following URL: https://tools.estevecastells.com/api/bulk-robots-txt/v1


Features:



Here is an example of a request payload you could send:


{
  "robots_txt_url": "https://tools.estevecastells.com/robots.txt",
  "links": [
    "https://tools.estevecastells.com/ping-sitemaps",
    "https://tools.estevecastells.com/xml-sitemap-analyzer",
    "https://tools.estevecastells.com/google-kg-api-exporter",
    "https://tools.estevecastells.com/merge-csv",
    "https://tools.estevecastells.com/combination-tool",
    "https://tools.estevecastells.com/cat-name-api",
    "https://tools.estevecastells.com/remove-image-metadata"
  ]
}


As you can see, the payload is straightforward: "robots_txt_url" takes the robots.txt URL of the host in question, and "links" takes the list of URLs to be analysed, up to 1,000 per request.


Limitations:


  1. A maximum of 1,000 URLs can be checked per request.
  2. Each request validates URLs against a single robots.txt file; checking URLs across multiple hosts requires separate requests.
  3. URLs are evaluated against the rules for the Googlebot or wildcard (*) user-agent; custom user-agents are not currently supported.


Results:


At the moment, a successful response contains one of two results for each URL: true if the URL is allowed, or false if it is disallowed. Here is what a typical response looks like:


{ "https://tools.estevecastells.com/": true, "https://tools.estevecastells.com/disallowed/url.html": false }

Examples:


Here are some examples of how you can call the API to get started:


curl:


curl -X POST "https://tools.estevecastells.com/api/bulk-robots-txt/v1" \
-H "Content-Type: application/json" \
-d "{\"robots_txt_url\": \"https://tools.estevecastells.com/robots.txt\", \"links\": [\"https://tools.estevecastells.com/ping-sitemaps\", \"https://tools.estevecastells.com/xml-sitemap-analyzer\"]}"

Python:


import requests

url = "https://tools.estevecastells.com/api/bulk-robots-txt/v1"
payload = {
    "robots_txt_url": "https://tools.estevecastells.com/robots.txt",
    "links": [
        "https://tools.estevecastells.com/ping-sitemaps",
        "https://tools.estevecastells.com/xml-sitemap-analyzer",
    ],
}

response = requests.post(url, json=payload)
print(response.json())
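
If you need to check more than 1,000 URLs, you will have to split them across several requests. Below is a minimal sketch of one way to do that, building on the example above; the all_links list is a placeholder, and the batch size of 1,000 simply mirrors the documented per-request limit:


import requests

url = "https://tools.estevecastells.com/api/bulk-robots-txt/v1"
robots_txt_url = "https://tools.estevecastells.com/robots.txt"

# Placeholder: replace with your full list of URLs to check.
all_links = ["https://tools.estevecastells.com/ping-sitemaps"]

results = {}
for i in range(0, len(all_links), 1000):  # 1,000 URLs per call, per the documented limit
    batch = all_links[i:i + 1000]
    response = requests.post(url, json={"robots_txt_url": robots_txt_url, "links": batch})
    results.update(response.json())

print(len(results), "URLs checked")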

That's all for now. Future improvements might allow the API to parse multiple robots.txt files in one request, support custom user-agent parsing, and more, but for now it will remain as is. In any case, any improvements will be made backwards-compatible, so you never need to worry about the API breaking. You can reach out by clicking my name if you have any questions or suggestions.