Context Module

Context

Dependency container that gives crawler operations access to a resource and a client.

Context serves as a dependency injection mechanism that bundles a resource (URL) with a client for making requests. It provides type validation for these dependencies and simplifies passing related objects throughout the system.

The class enables components to operate with a consistent set of dependencies without having to pass multiple parameters or manage connections independently.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| resource | Resource | The Resource object representing the current target |
| client | Client | The Client used to make requests |

Example

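    from ethicrawl.context import Context  # assumed import path; the original example omits it
    from ethicrawl.core import Resource
    from ethicrawl.client.http import HttpClient

    # Bundle the target URL with the client that will fetch it
    context = Context(Resource("https://example.com"), HttpClient())
    response = context.client.get(context.resource)

    # Obtain a logger scoped to the "robots" component
    logger = context.logger("robots")
    logger.info("Processing robots.txt")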

client property writable

Get the current client.

Returns:

| Type | Description |
| --- | --- |
| Client | The Client object for this context |

resource property writable

Get the current resource.

Returns:

| Type | Description |
| --- | --- |
| Resource | The Resource object for this context |
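
The writable `client` and `resource` properties can be pictured as ordinary Python properties whose setters re-check the type. The sketch below uses stand-in `Resource` and `Client` classes and a `SketchContext` container, all hypothetical. It also assumes the setters apply the same `TypeError` checks that are documented for `__init__`, which the reference text does not state explicitly.

```python
class Resource:
    """Stand-in for ethicrawl's Resource (hypothetical; just wraps a URL string)."""

    def __init__(self, url: str) -> None:
        self.url = url


class Client:
    """Stand-in for ethicrawl's Client base class (hypothetical)."""


class SketchContext:
    """Illustrative container; not the library's Context."""

    def __init__(self, resource: Resource, client: Client) -> None:
        # Assign through the setters so validation lives in one place.
        self.resource = resource
        self.client = client

    @property
    def resource(self) -> Resource:
        return self._resource

    @resource.setter
    def resource(self, value: Resource) -> None:
        # Assumption: the setter rejects wrong types, mirroring __init__.
        if not isinstance(value, Resource):
            raise TypeError(f"resource must be a Resource, got {type(value).__name__}")
        self._resource = value

    @property
    def client(self) -> Client:
        return self._client

    @client.setter
    def client(self, value: Client) -> None:
        if not isinstance(value, Client):
            raise TypeError(f"client must be a Client, got {type(value).__name__}")
        self._client = value


context = SketchContext(Resource("https://example.com"), Client())
context.resource = Resource("https://example.org")  # reassignment is allowed
```

Routing the constructor through the setters keeps a single validation path, so a later reassignment cannot bypass the checks performed at construction time.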

__init__(resource, client=None)

Initialize a Context with a resource and optional client.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The Resource object representing the current URL | required |
| client | Client \| None | Optional Client to use for requests. If None, a NoneClient will be used as a placeholder. | None |

Raises:

| Type | Description |
| --- | --- |
| TypeError | If resource is not a Resource instance |
| TypeError | If client is not a Client instance or None |
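
The constructor rules above (a required resource, an optional client, and a placeholder when the client is omitted) can be sketched as a small helper. `Resource`, `Client`, `NoneClient`, and `resolve_dependencies` are hypothetical stand-ins written for this example; the real `NoneClient` may behave differently.

```python
from typing import Optional, Tuple


class Resource:
    """Stand-in for ethicrawl's Resource (hypothetical)."""

    def __init__(self, url: str) -> None:
        self.url = url


class Client:
    """Stand-in for ethicrawl's Client base class (hypothetical)."""


class NoneClient(Client):
    """Stand-in placeholder used when no client is supplied (hypothetical)."""


def resolve_dependencies(
    resource: Resource, client: Optional[Client] = None
) -> Tuple[Resource, Client]:
    """Validate both dependencies and substitute a placeholder client for None."""
    if not isinstance(resource, Resource):
        raise TypeError(f"resource must be a Resource, got {type(resource).__name__}")
    if client is None:
        # No client given: fall back to the placeholder instead of storing None.
        return resource, NoneClient()
    if not isinstance(client, Client):
        raise TypeError(f"client must be a Client or None, got {type(client).__name__}")
    return resource, client


target = Resource("https://example.com")
_, placeholder = resolve_dependencies(target)  # placeholder is a NoneClient
```

Substituting a placeholder object rather than storing `None` means callers can always treat `context.client` as a `Client` without a null check.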

__repr__()

Return an unambiguous string representation of the context.

__str__()

Return a human-readable string representation of the context.
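
The difference between the two representations follows the usual Python convention: `__repr__` aims to be unambiguous, while `__str__` aims to be readable. The sketch below illustrates that convention on a hypothetical `SketchContext`; the exact strings that ethicrawl produces are not given in this reference, so both formats here are assumptions.

```python
class SketchContext:
    """Illustrative class; the output formats are assumptions, not ethicrawl's."""

    def __init__(self, url: str, client_name: str) -> None:
        self.url = url
        self.client_name = client_name

    def __repr__(self) -> str:
        # Unambiguous: names the class and shows each field with its repr().
        return f"SketchContext(url={self.url!r}, client_name={self.client_name!r})"

    def __str__(self) -> str:
        # Human-readable: a short phrase suitable for log messages.
        return f"context for {self.url} using {self.client_name}"


context = SketchContext("https://example.com", "HttpClient")
debug_text = repr(context)
display_text = str(context)
```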

logger(component)

Get a component-specific logger within this context.

Creates a logger associated with the current resource and the specified component name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| component | str | Component name for the logger | required |

Returns:

| Type | Description |
| --- | --- |
| | A Logger instance for the specified component |
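
One way to associate a logger with both a resource and a component is to encode them in the logger name using the standard `logging` module. The helper `component_logger` and the `crawler.<host>.<component>` naming scheme below are assumptions made for illustration; the reference does not say how ethicrawl names its loggers.

```python
import logging
from urllib.parse import urlsplit


def component_logger(url: str, component: str) -> logging.Logger:
    """Return a logger named after the URL's host and the component."""
    host = urlsplit(url).hostname or "unknown"
    # Dots separate hierarchy levels in logger names, so replace them in the
    # host to keep each domain at a single level.
    safe_host = host.replace(".", "_")
    return logging.getLogger(f"crawler.{safe_host}.{component}")


robots_logger = component_logger("https://example.com/robots.txt", "robots")
robots_logger.info("Processing robots.txt")
```

Because `logging.getLogger` returns the same object for the same name, repeated calls for one domain and component share handlers and level settings.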

ContextManager

Manages target contexts for different domain resources.

This class handles the lifecycle of domain contexts, including binding resources to clients, managing robots.txt permissions, and providing access to domain-specific functionality like robots.txt handlers and sitemaps.

bind(resource, client=None)

Bind a resource to a client in this context manager.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The resource to bind | required |
| client | Client \| None | The client to use for requests to this resource. If None, uses the default client. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| bool | bool | True if binding was successful |

Raises:

| Type | Description |
| --- | --- |
| TypeError | If client is not a Client instance or None |
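
The binding behaviour described above can be sketched as a registry keyed by domain. In this example `SketchContextManager`, `Client`, and the `scheme://host` key are hypothetical, and plain URL strings stand in for `Resource` objects; how ethicrawl actually derives its domain key is not stated in this reference.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


class Client:
    """Stand-in for ethicrawl's Client base class (hypothetical)."""


class SketchContextManager:
    """Illustrative registry mapping a domain key to a client."""

    def __init__(self, default_client: Client) -> None:
        self._default_client = default_client
        self._clients: Dict[str, Client] = {}

    @staticmethod
    def _domain_key(url: str) -> str:
        # Assumption: a domain is identified by scheme plus network location.
        parts = urlsplit(url)
        return f"{parts.scheme}://{parts.netloc}"

    def bind(self, url: str, client: Optional[Client] = None) -> bool:
        if client is not None and not isinstance(client, Client):
            raise TypeError(f"client must be a Client or None, got {type(client).__name__}")
        # Fall back to the manager-wide default when no client is given.
        chosen = client if client is not None else self._default_client
        self._clients[self._domain_key(url)] = chosen
        return True


manager = SketchContextManager(default_client=Client())
bound = manager.bind("https://example.com/some/page")
```

Keying by domain rather than by full URL means every path under a bound site shares one client.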

client(resource)

Get the client for a resource's domain.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The resource to get the client for | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Client | Client \| None | The client instance for this domain, or None if not found |

get(resource, headers=None)

Fetch a resource respecting robots.txt rules.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The resource to fetch | required |
| headers | Headers \| None | Optional headers for the request | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Response | Response | The HTTP response from the resource |

Raises:

| Type | Description |
| --- | --- |
| RobotDisallowedError | If the request is disallowed by robots.txt |
| DomainWhitelistError | If the domain is not bound to this context manager |
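
The two failure modes of `get` (an unbound domain and a robots.txt denial) can be illustrated with the standard library's `urllib.robotparser`. This is a minimal sketch rather than ethicrawl's implementation: the exception classes are local stand-ins with the same names, `checked_fetch` and its `fetch` callback are hypothetical, and the rules are parsed from an in-memory list so nothing touches the network.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotDisallowedError(Exception):
    """Local stand-in for ethicrawl's error of the same name."""


class DomainWhitelistError(Exception):
    """Local stand-in for ethicrawl's error of the same name."""


def checked_fetch(
    url: str,
    robots_by_domain: Dict[str, RobotFileParser],
    fetch: Callable[[str], str],
    user_agent: str = "*",
) -> str:
    """Fetch url only if its domain is bound and robots.txt permits it."""
    parts = urlsplit(url)
    key = f"{parts.scheme}://{parts.netloc}"  # assumed domain key
    robots = robots_by_domain.get(key)
    if robots is None:
        raise DomainWhitelistError(f"{key} is not bound")
    if not robots.can_fetch(user_agent, url):
        raise RobotDisallowedError(f"robots.txt disallows {url}")
    return fetch(url)


# Build the rules from literal lines instead of downloading robots.txt.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /private/"])
bound_domains = {"https://example.com": rules}

page = checked_fetch("https://example.com/index.html", bound_domains, lambda u: f"fetched {u}")
```

Checking the binding before the robots rules is a design choice made for this sketch; it ensures a request to an unknown domain fails without consulting any robots data.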

robot(resource)

Get the robot instance for a resource's domain.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The resource to get the robot for | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Robot | Robot | The robot instance for this domain |

Raises:

| Type | Description |
| --- | --- |
| DomainWhitelistError | If the domain is not registered |

sitemap(resource)

Get the sitemap parser for a resource's domain.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The resource to get the sitemap parser for | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| SitemapParser | SitemapParser | The sitemap parser for this domain |

Raises:

| Type | Description |
| --- | --- |
| DomainWhitelistError | If the domain is not registered |

unbind(resource)

Unbind a resource from this context manager.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource | Resource | The resource to unbind | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| bool | bool | True if unbinding was successful |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the resource's domain is not bound |
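
The documented methods treat an unknown domain differently: `client()` returns None, whereas `unbind()` raises `ValueError`. The sketch below shows that contrast using a plain dictionary as the registry. The helpers `domain_key`, `lookup_client`, and `unbind_domain` are hypothetical, and URL strings stand in for `Resource` objects.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def domain_key(url: str) -> str:
    """Reduce a URL to an assumed scheme://host key."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def lookup_client(registry: Dict[str, object], url: str) -> Optional[object]:
    # A missing domain is not an error for lookups: the caller gets None.
    return registry.get(domain_key(url))


def unbind_domain(registry: Dict[str, object], url: str) -> bool:
    key = domain_key(url)
    if key not in registry:
        # Removing a binding that does not exist is reported loudly.
        raise ValueError(f"{key} is not bound")
    del registry[key]
    return True


registry: Dict[str, object] = {"https://example.com": "client-a"}
found = lookup_client(registry, "https://example.com/page")
missing = lookup_client(registry, "https://example.org/page")
removed = unbind_domain(registry, "https://example.com/page")
```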