Error Classes
DomainResolutionError
Bases: EthicrawlError
Raised when a domain cannot be resolved through DNS.
This error occurs when a hostname in a URL cannot be resolved to an IP address, typically indicating network connectivity issues or a non-existent domain.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | | The URL that was attempted to be accessed |
| `hostname` | | The specific hostname that could not be resolved |
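The shape of this error can be sketched from the attributes documented above. The class body below is an illustrative reconstruction, not Ethicrawl's actual source; only the class names and attribute names come from this page.

```python
# Minimal sketch of the documented hierarchy; not Ethicrawl's actual implementation.
class EthicrawlError(Exception):
    """Base class for all Ethicrawl-specific errors (re-declared for self-containment)."""


class DomainResolutionError(EthicrawlError):
    """Raised when a hostname cannot be resolved to an IP address."""

    def __init__(self, url: str, hostname: str) -> None:
        self.url = url            # the URL that was attempted
        self.hostname = hostname  # the hostname that failed to resolve
        super().__init__(f"could not resolve {hostname!r} while fetching {url!r}")


# Catching the specific subclass gives access to both documented attributes:
try:
    raise DomainResolutionError("https://nonexistent.example/page", "nonexistent.example")
except DomainResolutionError as e:
    report = f"DNS failure for {e.hostname} (url: {e.url})"
```

Because `DomainResolutionError` subclasses `EthicrawlError`, the same instance is also caught by the catch-all handler shown in the `EthicrawlError` section.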
DomainWhitelistError
Bases: EthicrawlError
Raised when attempting to access a non-whitelisted domain.
This error occurs when a request is made to a domain that differs from the primary bound domain and hasn't been explicitly whitelisted.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | | The URL that was attempted to be accessed |
| `bound_domain` | | The domain the Ethicrawl instance is bound to |
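The check described above can be sketched as follows. The `check_domain` helper is hypothetical and the class body is reconstructed from the documented attributes; neither is Ethicrawl's actual code.

```python
from urllib.parse import urlparse


# Sketch only; EthicrawlError re-declared here for self-containment.
class EthicrawlError(Exception):
    pass


class DomainWhitelistError(EthicrawlError):
    def __init__(self, url: str, bound_domain: str) -> None:
        self.url = url                    # the URL that was attempted
        self.bound_domain = bound_domain  # the domain the crawler is bound to
        super().__init__(f"{url!r} is outside bound domain {bound_domain!r}")


def check_domain(url: str, bound_domain: str, whitelist: frozenset = frozenset()) -> None:
    """Hypothetical guard: reject URLs whose host is neither the bound domain
    nor explicitly whitelisted."""
    host = urlparse(url).hostname
    if host != bound_domain and host not in whitelist:
        raise DomainWhitelistError(url, bound_domain)
```

Whitelisting a secondary domain (e.g. a CDN host) lets requests to it pass while everything else outside the bound domain still raises.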
EthicrawlError
Bases: Exception
Base exception class for all Ethicrawl-specific errors.
This class serves as the parent for all custom exceptions raised by the Ethicrawl library. By catching this exception type, client code can handle all Ethicrawl-specific errors while allowing other exceptions to propagate normally.
Example
```python
>>> try:
...     crawler.get("https://example.com/disallowed")
... except EthicrawlError as e:
...     print(f"Ethicrawl error: {e}")
```
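The catch-all behavior claimed above can be demonstrated without a live crawler. The classes and the `handle` helper below are a self-contained sketch, not the library's code:

```python
# Sketch of the documented relationship between the base class and a subclass.
class EthicrawlError(Exception):
    pass


class RobotDisallowedError(EthicrawlError):
    pass


def handle(exc: Exception) -> str:
    """Classify an exception the way the try/except above would:
    Ethicrawl errors are handled, anything else propagates to the caller."""
    try:
        raise exc
    except EthicrawlError as e:
        return f"Ethicrawl error: {e}"
```

Catching `EthicrawlError` handles every subclass, while unrelated exceptions such as `ValueError` pass through untouched.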
RobotDisallowedError
Bases: EthicrawlError
Raised when a URL is disallowed by robots.txt rules.
This exception indicates that the requested URL cannot be accessed because it is explicitly disallowed by the site's robots.txt file.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | | The URL that was disallowed |
| `robot_url` | | The URL of the robots.txt file that disallowed access |
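A robots.txt check of this kind can be sketched with the standard library's parser. The `check_robots` helper is hypothetical and the class body is reconstructed from the documented attributes; this is not Ethicrawl's implementation.

```python
from urllib import robotparser


# Sketch only; EthicrawlError re-declared here for self-containment.
class EthicrawlError(Exception):
    pass


class RobotDisallowedError(EthicrawlError):
    def __init__(self, url: str, robot_url: str) -> None:
        self.url = url              # the URL that was disallowed
        self.robot_url = robot_url  # the robots.txt file that disallowed it
        super().__init__(f"{url!r} disallowed by {robot_url!r}")


def check_robots(url: str, robots_txt: str, robot_url: str, agent: str = "*") -> None:
    """Hypothetical check: parse robots.txt rules with the stdlib parser and
    raise RobotDisallowedError if the URL may not be fetched."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        raise RobotDisallowedError(url, robot_url)
```

Recording `robot_url` alongside the disallowed `url` lets client code report exactly which rules file blocked the request.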
SitemapError
Bases: EthicrawlError
Raised when a sitemap cannot be parsed or processed.
This exception indicates problems with sitemap fetching, parsing, or validation, such as invalid XML or missing required elements.
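The parsing failures described above can be sketched by wrapping the standard library's XML parser. The `parse_sitemap` helper is hypothetical; only the class name and its place in the hierarchy come from this page.

```python
import xml.etree.ElementTree as ET


# Sketch only; EthicrawlError re-declared here for self-containment.
class EthicrawlError(Exception):
    pass


class SitemapError(EthicrawlError):
    """Raised when a sitemap cannot be parsed or processed."""


def parse_sitemap(xml_text: str) -> list:
    """Hypothetical parser: extract <loc> URLs, wrapping XML errors in SitemapError."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        raise SitemapError(f"invalid sitemap XML: {e}") from e
    # Sitemaps use the http://www.sitemaps.org/schemas/sitemap/0.9 namespace.
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", ns)]
```

Wrapping `ET.ParseError` this way keeps the promise made by `EthicrawlError`: client code can catch all sitemap problems with the library's base exception instead of XML-parser internals.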