Error Classes
DomainResolutionError
Bases: EthicrawlError
Raised when a domain cannot be resolved through DNS.
This error occurs when a hostname in a URL cannot be resolved to an IP address, typically indicating network connectivity issues or a non-existent domain.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | | The URL that was attempted to be accessed |
| `hostname` | | The specific hostname that could not be resolved |
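The shape of this error can be sketched from the attributes documented above. The class body below is an illustrative reconstruction, not Ethicrawl's actual source; only the class names and attribute names come from this page.

```python
# Minimal sketch of the documented hierarchy; not Ethicrawl's actual implementation.
class EthicrawlError(Exception):
    """Base class for all Ethicrawl-specific errors (re-declared for self-containment)."""


class DomainResolutionError(EthicrawlError):
    """Raised when a hostname cannot be resolved to an IP address."""

    def __init__(self, url: str, hostname: str) -> None:
        self.url = url            # the URL that was attempted
        self.hostname = hostname  # the hostname that failed to resolve
        super().__init__(f"could not resolve {hostname!r} while fetching {url!r}")


# Catching the specific subclass gives access to both documented attributes:
try:
    raise DomainResolutionError("https://nonexistent.example/page", "nonexistent.example")
except DomainResolutionError as e:
    report = f"DNS failure for {e.hostname} (url: {e.url})"
```

Because `DomainResolutionError` subclasses `EthicrawlError`, the same instance is also caught by the catch-all handler shown in the `EthicrawlError` section.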
DomainWhitelistError
Bases: EthicrawlError
Raised when attempting to access a non-whitelisted domain.
This error occurs when a request is made to a domain that differs from the primary bound domain and hasn't been explicitly whitelisted.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | | The URL that was attempted to be accessed |
| `bound_domain` | | The domain the Ethicrawl instance is bound to |
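The check described above can be sketched as follows. The `check_domain` helper is hypothetical and the class body is reconstructed from the documented attributes; neither is Ethicrawl's actual code.

```python
from urllib.parse import urlparse


# Sketch only; EthicrawlError re-declared here for self-containment.
class EthicrawlError(Exception):
    pass


class DomainWhitelistError(EthicrawlError):
    def __init__(self, url: str, bound_domain: str) -> None:
        self.url = url                    # the URL that was attempted
        self.bound_domain = bound_domain  # the domain the crawler is bound to
        super().__init__(f"{url!r} is outside bound domain {bound_domain!r}")


def check_domain(url: str, bound_domain: str, whitelist: frozenset = frozenset()) -> None:
    """Hypothetical guard: reject URLs whose host is neither the bound domain
    nor explicitly whitelisted."""
    host = urlparse(url).hostname
    if host != bound_domain and host not in whitelist:
        raise DomainWhitelistError(url, bound_domain)
```

Whitelisting a secondary domain (e.g. a CDN host) lets requests to it pass while everything else outside the bound domain still raises.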
EthicrawlError
Bases: Exception
Base exception class for all Ethicrawl-specific errors.
This class serves as the parent for all custom exceptions raised by the Ethicrawl library. By catching this exception type, client code can handle all Ethicrawl-specific errors while allowing other exceptions to propagate normally.
Example
```python
>>> try:
...     crawler.get("https://example.com/disallowed")
... except EthicrawlError as e:
...     print(f"Ethicrawl error: {e}")
```
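The catch-all behavior claimed above can be demonstrated without a live crawler. The classes and the `handle` helper below are a self-contained sketch, not the library's code:

```python
# Sketch of the documented relationship between the base class and a subclass.
class EthicrawlError(Exception):
    pass


class RobotDisallowedError(EthicrawlError):
    pass


def handle(exc: Exception) -> str:
    """Classify an exception the way the try/except above would:
    Ethicrawl errors are handled, anything else propagates to the caller."""
    try:
        raise exc
    except EthicrawlError as e:
        return f"Ethicrawl error: {e}"
```

Catching `EthicrawlError` handles every subclass, while unrelated exceptions such as `ValueError` pass through untouched.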
RobotDisallowedError
Bases: EthicrawlError
Raised when a URL is disallowed by robots.txt rules.
This exception indicates that the requested URL cannot be accessed because it is explicitly disallowed by the site's robots.txt file.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | | The URL that was disallowed |
| `robot_url` | | The URL of the robots.txt file that disallowed access |
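A robots.txt check of this kind can be sketched with the standard library's parser. The `check_robots` helper is hypothetical and the class body is reconstructed from the documented attributes; this is not Ethicrawl's implementation.

```python
from urllib import robotparser


# Sketch only; EthicrawlError re-declared here for self-containment.
class EthicrawlError(Exception):
    pass


class RobotDisallowedError(EthicrawlError):
    def __init__(self, url: str, robot_url: str) -> None:
        self.url = url              # the URL that was disallowed
        self.robot_url = robot_url  # the robots.txt file that disallowed it
        super().__init__(f"{url!r} disallowed by {robot_url!r}")


def check_robots(url: str, robots_txt: str, robot_url: str, agent: str = "*") -> None:
    """Hypothetical check: parse robots.txt rules with the stdlib parser and
    raise RobotDisallowedError if the URL may not be fetched."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        raise RobotDisallowedError(url, robot_url)
```

Recording `robot_url` alongside the disallowed `url` lets client code report exactly which rules file blocked the request.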
SitemapError
Bases: EthicrawlError
Raised when a sitemap cannot be parsed or processed.
This exception indicates problems with sitemap fetching, parsing, or validation, such as invalid XML or missing required elements.
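The parsing failures described above can be sketched by wrapping the standard library's XML parser. The `parse_sitemap` helper is hypothetical; only the class name and its place in the hierarchy come from this page.

```python
import xml.etree.ElementTree as ET


# Sketch only; EthicrawlError re-declared here for self-containment.
class EthicrawlError(Exception):
    pass


class SitemapError(EthicrawlError):
    """Raised when a sitemap cannot be parsed or processed."""


def parse_sitemap(xml_text: str) -> list:
    """Hypothetical parser: extract <loc> URLs, wrapping XML errors in SitemapError."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        raise SitemapError(f"invalid sitemap XML: {e}") from e
    # Sitemaps use the http://www.sitemaps.org/schemas/sitemap/0.9 namespace.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]
```

Wrapping `ET.ParseError` this way keeps the promise made by `EthicrawlError`: client code can catch all sitemap problems with the library's base exception instead of XML-parser internals.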