Error Classes

DomainResolutionError

Bases: EthicrawlError

Raised when a domain cannot be resolved through DNS.

This error occurs when a hostname in a URL cannot be resolved to an IP address, typically indicating network connectivity issues or a non-existent domain.

Attributes:

url: The URL the request attempted to access

hostname: The specific hostname that could not be resolved
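As a minimal sketch of how client code might read these attributes, the block below defines stand-in classes instead of importing the library. The constructor signature, the `fetch` helper, and the `known_hosts` set are all assumptions made for illustration; only the class names and the `url` and `hostname` attributes come from this page.

```python
from urllib.parse import urlsplit


class EthicrawlError(Exception):
    """Stand-in for the library's base exception."""


class DomainResolutionError(EthicrawlError):
    """Stand-in; the real constructor signature may differ."""

    def __init__(self, url, hostname):
        super().__init__(f"Cannot resolve hostname '{hostname}' for URL '{url}'")
        self.url = url
        self.hostname = hostname


def fetch(url, known_hosts):
    """Hypothetical fetch that only 'resolves' hosts listed in known_hosts."""
    hostname = urlsplit(url).hostname
    if hostname not in known_hosts:
        raise DomainResolutionError(url, hostname)
    return f"fetched {url}"


def describe_failure(url, known_hosts):
    """Report the unresolved hostname instead of letting the error propagate."""
    try:
        return fetch(url, known_hosts)
    except DomainResolutionError as e:
        return f"unresolved: {e.hostname}"
```

Because the hostname is carried on the exception, the caller does not need to parse the URL again to log or retry.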

DomainWhitelistError

Bases: EthicrawlError

Raised when attempting to access a non-whitelisted domain.

This error occurs when a request is made to a domain that differs from the primary bound domain and hasn't been explicitly whitelisted.

Attributes:

url: The URL the request attempted to access

bound_domain: The domain the Ethicrawl instance is bound to
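The rule described above (a request is rejected when its domain is neither the bound domain nor whitelisted) could be sketched roughly as follows. The `check_domain` function, its `whitelist` parameter, and the constructor signature are hypothetical; the library's actual check may compare domains differently.

```python
from urllib.parse import urlsplit


class EthicrawlError(Exception):
    """Stand-in for the library's base exception."""


class DomainWhitelistError(EthicrawlError):
    """Stand-in; the real constructor signature may differ."""

    def __init__(self, url, bound_domain):
        super().__init__(f"'{url}' is outside '{bound_domain}' and not whitelisted")
        self.url = url
        self.bound_domain = bound_domain


def check_domain(url, bound_domain, whitelist=frozenset()):
    """Return the hostname if it is the bound domain or whitelisted, else raise."""
    host = urlsplit(url).hostname
    if host != bound_domain and host not in whitelist:
        raise DomainWhitelistError(url, bound_domain)
    return host
```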

EthicrawlError

Bases: Exception

Base exception class for all Ethicrawl-specific errors.

This class serves as the parent for all custom exceptions raised by the Ethicrawl library. By catching this exception type, client code can handle all Ethicrawl-specific errors while allowing other exceptions to propagate normally.

Example

>>> try:
...     crawler.get("https://example.com/disallowed")
... except EthicrawlError as e:
...     print(f"Ethicrawl error: {e}")
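Since every error on this page derives from EthicrawlError, the order of `except` clauses matters: a specific subclass has to be listed before the base class to get its own handling. The sketch below uses stand-in classes and two hypothetical helpers (`handle` and `raiser`) to illustrate this; it does not call the real library.

```python
class EthicrawlError(Exception):
    """Stand-in for the library's base exception."""


class RobotDisallowedError(EthicrawlError):
    """Stand-in subclass."""


class SitemapError(EthicrawlError):
    """Stand-in subclass."""


def raiser(exc):
    """Return a zero-argument callable that raises exc (helper for this sketch)."""
    def _raise():
        raise exc
    return _raise


def handle(action):
    """Run action, treating robots.txt refusals differently from other library errors."""
    try:
        action()
    except RobotDisallowedError:
        # Most specific handler first: a disallowed URL is simply skipped.
        return "skipped"
    except EthicrawlError as e:
        # Any other library error lands here.
        return f"ethicrawl failure: {type(e).__name__}"
    return "ok"
```

Exceptions that are not part of the hierarchy, such as a `ValueError`, are not caught by either clause and propagate to the caller.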

RobotDisallowedError

Bases: EthicrawlError

Raised when a URL is disallowed by robots.txt rules.

This exception indicates that the requested URL cannot be accessed because it is explicitly disallowed by the site's robots.txt file.

Attributes:

url: The URL that was disallowed

robot_url: The URL of the robots.txt file that disallowed access
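To illustrate the kind of check that leads to this error, the sketch below evaluates robots.txt rules with the standard library's `urllib.robotparser` and raises a stand-in exception. The `ensure_allowed` function, the `ExampleBot` user agent, and the constructor signature are assumptions; the library may evaluate robots.txt by other means.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class EthicrawlError(Exception):
    """Stand-in for the library's base exception."""


class RobotDisallowedError(EthicrawlError):
    """Stand-in; the real constructor signature may differ."""

    def __init__(self, url, robot_url):
        super().__init__(f"'{url}' is disallowed by '{robot_url}'")
        self.url = url
        self.robot_url = robot_url


def ensure_allowed(url, robots_txt, user_agent="ExampleBot"):
    """Return url if the given robots.txt text permits it, else raise."""
    parts = urlsplit(url)
    robot_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        raise RobotDisallowedError(url, robot_url)
    return url
```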

SitemapError

Bases: EthicrawlError

Raised when a sitemap cannot be parsed or processed.

This exception indicates problems with sitemap fetching, parsing, or validation, such as invalid XML or missing required elements.
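The two failure modes named above, invalid XML and missing required elements, might be surfaced along these lines. This is a sketch using `xml.etree.ElementTree` and a stand-in SitemapError; the `parse_sitemap` function is hypothetical, and which elements the library actually treats as required is an assumption here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class EthicrawlError(Exception):
    """Stand-in for the library's base exception."""


class SitemapError(EthicrawlError):
    """Stand-in for the library's sitemap exception."""


def parse_sitemap(xml_text):
    """Return the <loc> values of a sitemap, raising SitemapError on bad input."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Chain the original parse error so the cause stays visible.
        raise SitemapError(f"Invalid sitemap XML: {exc}") from exc
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if not locs:
        raise SitemapError("Sitemap contains no <loc> elements")
    return locs
```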