Context Module
Context
Dependency container for crawler operations providing resource and client access.
Context serves as a dependency injection mechanism that bundles a resource (URL) with a client for making requests. It provides type validation for these dependencies and simplifies passing related objects throughout the system.
The class enables components to operate with a consistent set of dependencies without having to pass multiple parameters or manage connections independently.
Attributes:
| Name | Type | Description |
|---|---|---|
| resource | Resource | The Resource object representing the current target |
| client | Client | The Client used to make requests |
Example
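    # Import path for Context is assumed from the package layout.
    from ethicrawl.context import Context
    from ethicrawl.core import Resource
    from ethicrawl.client.http import HttpClient

    # Bundle the target resource with the client that will fetch it.
    context = Context(Resource("https://example.com"), HttpClient())
    response = context.client.get(context.resource)

    # Obtain a logger scoped to the "robots" component.
    logger = context.logger("robots")
    logger.info("Processing robots.txt")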
client (property, writable)
resource (property, writable)
__init__(resource, client=None)
Initialize a Context with a resource and optional client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Resource | The Resource object representing the current URL | required |
| client | Client \| None | Optional Client to use for requests. If None, a NoneClient will be used as a placeholder. | None |
Raises:
| Type | Description |
|---|---|
| TypeError | If resource is not a Resource instance |
| TypeError | If client is not a Client instance or None |
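The validation rules above can be illustrated with a minimal sketch. It uses stand-in classes (`Resource`, `Client`, `NoneClient`, and a hypothetical `SketchContext`) defined locally, so it shows the pattern rather than the library's actual implementation.

```python
class Resource:
    """Stand-in for the library's Resource: wraps a URL string."""

    def __init__(self, url):
        self.url = url


class Client:
    """Stand-in base class for clients."""


class NoneClient(Client):
    """Placeholder client used when no real client is supplied."""


class SketchContext:
    """Hypothetical container illustrating the checks described above."""

    def __init__(self, resource, client=None):
        # Reject anything that is not a Resource instance.
        if not isinstance(resource, Resource):
            raise TypeError(f"Expected Resource, got {type(resource).__name__}")
        # Accept a Client instance or None; reject everything else.
        if client is not None and not isinstance(client, Client):
            raise TypeError(f"Expected Client or None, got {type(client).__name__}")
        self.resource = resource
        # Fall back to the placeholder when no client was given.
        self.client = client if client is not None else NoneClient()


ctx = SketchContext(Resource("https://example.com"))
print(type(ctx.client).__name__)

try:
    SketchContext("https://example.com")  # a plain string is not a Resource
except TypeError as exc:
    print(f"Rejected: {exc}")
```

Validating at construction time means a misconfigured dependency fails where the context is created, not later at the first request.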
__repr__()
Return an unambiguous string representation of the context.
__str__()
Return a human-readable string representation of the context.
logger(component)
Get a component-specific logger within this context.
Creates a logger associated with the current resource and the specified component name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| component | str | Component name for the logger | required |
Returns:
| Type | Description |
|---|---|
| Logger | A Logger instance for the specified component |
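One way to picture a component-specific logger is with the standard `logging` module. This is a sketch only: the helper `component_logger` and the `crawler.<host>.<component>` naming scheme are assumptions for illustration, not the library's documented behaviour.

```python
import logging
from urllib.parse import urlparse


def component_logger(resource_url, component):
    """Return a logger named after the resource's host and the component."""
    # Dots in the host would create extra logger hierarchy levels,
    # so they are replaced with underscores (an illustrative choice).
    host = urlparse(resource_url).netloc.replace(".", "_")
    return logging.getLogger(f"crawler.{host}.{component}")


logger = component_logger("https://example.com/robots.txt", "robots")
logger.info("Processing robots.txt")
```

Because `logging.getLogger` returns the same object for the same name, repeated calls with the same resource and component share one logger and its configuration.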
ContextManager
Manages target contexts for different domain resources.
This class handles the lifecycle of domain contexts, including binding resources to clients, managing robots.txt permissions, and providing access to domain-specific functionality like robots.txt handlers and sitemaps.
bind(resource, client=None)
Bind a resource to a client in this context manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Resource | The resource to bind | required |
| client | Client \| None | The client to use for requests to this resource. If None, uses the default client. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if binding was successful |
Raises:
| Type | Description |
|---|---|
| TypeError | If client is not a Client instance or None |
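The binding behaviour can be sketched as a registry keyed by domain. The `SketchManager` class, its `client_for` helper, and the use of plain URL strings in place of Resource objects are all assumptions made to keep the example self-contained.

```python
from typing import Optional
from urllib.parse import urlparse


class Client:
    """Stand-in client base class."""


class SketchManager:
    """Hypothetical registry mapping a domain to the client bound to it."""

    def __init__(self, default_client):
        self._default_client = default_client
        self._clients = {}

    def bind(self, url, client: Optional[Client] = None):
        # Mirror the documented check: only Client instances or None.
        if client is not None and not isinstance(client, Client):
            raise TypeError("client must be a Client instance or None")
        domain = urlparse(url).netloc
        # Use the default client when none is supplied.
        self._clients[domain] = client if client is not None else self._default_client
        return True

    def client_for(self, url):
        """Look up the client bound to the URL's domain."""
        return self._clients[urlparse(url).netloc]


default, custom = Client(), Client()
manager = SketchManager(default)
manager.bind("https://example.com/start")          # falls back to the default
manager.bind("https://example.org/start", custom)  # uses the explicit client
```

Keying on the domain rather than the full URL lets every resource on a bound site share one client.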
client(resource)
get(resource, headers=None)
Fetch a resource respecting robots.txt rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Resource | The resource to fetch | required |
| headers | Headers \| None | Optional headers for the request | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Response | Response | The HTTP response from the resource |
Raises:
| Type | Description |
|---|---|
| RobotDisallowedError | If the request is disallowed by robots.txt |
| DomainWhitelistError | If the domain is not bound to this context manager |
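The two failure modes above can be illustrated with the standard library's `urllib.robotparser`. This is a sketch under stated assumptions: the exception classes are local stand-ins sharing the documented names, `check_fetch_allowed` is a hypothetical helper, and the robots.txt rules are parsed from memory so that no network access is needed.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class RobotDisallowedError(Exception):
    """Local stand-in for the documented exception of the same name."""


class DomainWhitelistError(Exception):
    """Local stand-in for the documented exception of the same name."""


def check_fetch_allowed(url, robots_by_domain, user_agent="*"):
    """Raise if the domain is unbound or its robots.txt disallows the URL."""
    domain = urlparse(url).netloc
    if domain not in robots_by_domain:
        raise DomainWhitelistError(f"{domain} is not bound")
    if not robots_by_domain[domain].can_fetch(user_agent, url):
        raise RobotDisallowedError(f"{url} is disallowed by robots.txt")


# Build a parser from in-memory rules instead of fetching robots.txt.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
bound = {"example.com": robots}

check_fetch_allowed("https://example.com/index.html", bound)  # passes silently

try:
    check_fetch_allowed("https://example.com/private/data", bound)
except RobotDisallowedError as exc:
    print(f"Blocked: {exc}")
```

Checking the domain before the robots.txt rules keeps the two errors distinct: an unbound domain never reaches the permission check.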
robot(resource)
Get the robot instance for a resource's domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Resource | The resource to get the robot for | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Robot | Robot | The robot instance for this domain |
Raises:
| Type | Description |
|---|---|
| DomainWhitelistError | If the domain is not registered |
sitemap(resource)
Get the sitemap parser for a resource's domain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Resource | The resource to get the sitemap parser for | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SitemapParser | SitemapParser | The sitemap parser for this domain |
Raises:
| Type | Description |
|---|---|
| DomainWhitelistError | If the domain is not registered |
unbind(resource)
Unbind a resource from this context manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| resource | Resource | The resource to unbind | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if unbinding was successful |
Raises:
| Type | Description |
|---|---|
| ValueError | If the resource's domain is not bound |
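The unbinding contract can be sketched with a small domain set. The `SketchRegistry` class and its use of URL strings in place of Resource objects are hypothetical; the sketch only illustrates returning True on success and raising ValueError for a domain that was never bound.

```python
from urllib.parse import urlparse


class SketchRegistry:
    """Hypothetical registry tracking which domains are currently bound."""

    def __init__(self):
        self._bound = set()

    def bind(self, url):
        self._bound.add(urlparse(url).netloc)
        return True

    def unbind(self, url):
        domain = urlparse(url).netloc
        # Unbinding an unknown domain is treated as a caller error.
        if domain not in self._bound:
            raise ValueError(f"{domain} is not bound")
        self._bound.remove(domain)
        return True


registry = SketchRegistry()
registry.bind("https://example.com/")
registry.unbind("https://example.com/page")  # same domain, so this succeeds

try:
    registry.unbind("https://example.com/")  # already unbound
except ValueError as exc:
    print(f"Cannot unbind: {exc}")
```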