DesyncClient
The DesyncClient class provides a high-level interface to the Desync Search API, managing individual searches, bulk operations, domain crawling, and credit balance checks. Below are its primary methods, each listed with a short heading. The full method signature appears under each heading in a code block.
__init__()
Signature:
Description:
Initializes the client with the provided API key or reads it from the DESYNC_API_KEY environment variable. If developer_mode is True, the client uses a test endpoint; otherwise, it uses the production endpoint.
Parameters:
- user_api_key (str, optional): Your Desync API key.
- developer_mode (bool, optional): Toggles between test and production endpoints.
Example:
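A minimal sketch of the key-resolution and endpoint-selection behavior described above, written without the library so that it runs on its own. The helper name `resolve_client_config`, both endpoint URLs, and the choice to raise `ValueError` when no key is found are assumptions for illustration, not the client's actual code.

```python
import os

# Placeholder URLs: the real Desync endpoints are not given in this document.
PRODUCTION_ENDPOINT = "https://api.example.com/prod"
TEST_ENDPOINT = "https://api.example.com/test"


def resolve_client_config(user_api_key=None, developer_mode=False):
    """Hypothetical helper mirroring the constructor logic described above."""
    # Prefer the explicit key; otherwise fall back to the environment variable.
    api_key = user_api_key or os.environ.get("DESYNC_API_KEY")
    if not api_key:
        # Assumption: the real client may handle a missing key differently.
        raise ValueError("No API key given and DESYNC_API_KEY is not set.")
    # developer_mode selects the test endpoint, otherwise production is used.
    endpoint = TEST_ENDPOINT if developer_mode else PRODUCTION_ENDPOINT
    return {"api_key": api_key, "endpoint": endpoint}
```

Passing the key explicitly takes precedence over the environment variable in this sketch, which matches the order in which the description lists the two sources.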
search()
Signature:
Description:
Performs a single search on a specified URL. Returns a PageData object containing the page’s text, links, timestamps, and other metadata.
Parameters:
- url (str): The URL to scrape.
- search_type (str): "stealth_search" (default, 10 credits) or "test_search" (1 credit).
- scrape_full_html (bool): If True, returns the full HTML content.
- remove_link_duplicates (bool): If True, removes duplicate links from the results.
Example:
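The credit costs and the link de-duplication option listed above can be illustrated with a small standalone sketch. Both helper names are hypothetical, the cost table covers only individual `search()` calls (bulk pricing is not stated here), and keeping the first occurrence of each link is an assumption about how duplicates are removed.

```python
# Credit costs per individual search, as stated in the parameter list above.
CREDIT_COST = {"stealth_search": 10, "test_search": 1}


def estimate_credits(urls, search_type="stealth_search"):
    """Estimate the credits needed to call search() once per URL."""
    if search_type not in CREDIT_COST:
        raise ValueError(f"Unknown search_type: {search_type!r}")
    return len(urls) * CREDIT_COST[search_type]


def dedupe_links(links):
    """Drop repeated links while keeping the first occurrence of each."""
    # dict preserves insertion order, so the original ordering survives.
    return list(dict.fromkeys(links))
```

A helper like `estimate_credits` can be compared against the balance before launching a batch of individual searches.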
bulk_search()
Signature:
Description:
Initiates an asynchronous bulk search on up to 1000 URLs at once. Returns a dictionary containing a bulk_search_id and other metadata.
Parameters:
- target_list (list[str]): List of URLs to process.
- extract_html (bool): If True, includes the full HTML content in results.
Example:
list_available()
Signature:
Description:
Retrieves minimal data about previously collected search results (IDs, domains, timestamps, etc.). Returns a list of PageData objects with limited fields.
Parameters:
- url_list (list[str], optional): Filters results by specific URLs.
- bulk_search_id (str, optional): Filters results by a particular bulk search ID.
Example:
pull_data()
Signature:
Description:
Retrieves full data (including text and optional HTML content) for one or more records matching the provided filters. Returns a list of PageData objects.
Parameters:
Any combination of filters like record_id, url, domain, timestamp, or bulk_search_id.
Example:
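One way to think about "any combination of filters" is that only the filters actually supplied are forwarded. The sketch below shows that idea in isolation; `build_filters` is a hypothetical helper, and the request format the API expects is not shown in this document.

```python
def build_filters(record_id=None, url=None, domain=None,
                  timestamp=None, bulk_search_id=None):
    """Collect only the filters that were explicitly provided."""
    candidates = {
        "record_id": record_id,
        "url": url,
        "domain": domain,
        "timestamp": timestamp,
        "bulk_search_id": bulk_search_id,
    }
    # Omit unset filters so they do not constrain the query.
    return {name: value for name, value in candidates.items() if value is not None}
```

Testing against `None` rather than truthiness keeps legitimate falsy values, such as a `record_id` of `0`, in the result.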
pull_credits_balance()
Signature:
Description:
Checks the user’s current credit balance and returns it as a dictionary.
Example:
collect_results()
Signature:
Description:
Polls periodically for bulk search completion until a specified fraction of pages are done or a maximum wait time elapses, then retrieves full data. Returns a list of PageData objects.
Parameters:
- bulk_search_id (str): The unique identifier for the bulk search.
- target_links (list[str]): The list of URLs in the bulk job.
- wait_time (float): Maximum polling duration in seconds.
- poll_interval (float): Interval between status checks, in seconds.
- completion_fraction (float): Fraction of completed results needed to stop polling.
Example:
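The polling behavior described above (stop once enough pages are done, or once the time budget runs out) can be sketched independently of the API. Here `poll_until_done` and its `count_completed` callback are hypothetical stand-ins for the client's status check; the real method also retrieves the full data afterwards, which this sketch leaves out.

```python
import time


def poll_until_done(count_completed, total, wait_time, poll_interval,
                    completion_fraction):
    """Return True if the completion target is reached before wait_time elapses."""
    deadline = time.monotonic() + wait_time
    while True:
        done = count_completed()  # stand-in for a status request
        # An empty job has nothing to wait for.
        if total == 0 or done / total >= completion_fraction:
            return True
        if time.monotonic() >= deadline:
            return False  # time budget exhausted; caller decides what to do
        time.sleep(poll_interval)
```

Checking the fraction before the deadline means a job that is already complete returns immediately, even with a `wait_time` of zero.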
simple_bulk_search()
Signature:
Description:
Splits a large list of URLs into chunks (up to 1000 URLs each), initiates a bulk search for each chunk, then collects and aggregates the results.
Parameters:
- target_list (list[str]): URLs to be processed, possibly more than 1000.
- extract_html (bool): If True, includes the full HTML content.
- poll_interval (float): Polling interval in seconds.
- wait_time (float): Maximum wait time in seconds per chunk.
- completion_fraction (float): Fraction of completed links needed to stop polling each chunk.
Example:
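The chunk-then-aggregate pattern described above can be sketched as follows. `chunk_urls` and `run_in_chunks` are hypothetical names, and `process_chunk` stands in for the bulk search and result collection that the client performs for each chunk.

```python
def chunk_urls(target_list, chunk_size=1000):
    """Split target_list into consecutive chunks of at most chunk_size URLs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [target_list[i:i + chunk_size]
            for i in range(0, len(target_list), chunk_size)]


def run_in_chunks(target_list, process_chunk, chunk_size=1000):
    """Call process_chunk on each chunk and concatenate the returned lists."""
    results = []
    for chunk in chunk_urls(target_list, chunk_size):
        results.extend(process_chunk(chunk))
    return results
```

With 2,500 URLs and the 1000-URL limit, this yields three chunks, the last one holding the remaining 500.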
crawl()
Signature:
Description:
Recursively crawls the specified start_url up to max_depth levels. Performs a stealth search on the start URL, collects same-domain links, and uses bulk searches to fetch pages at each depth.
Parameters:
- start_url (str): Initial URL to crawl.
- max_depth (int): Maximum crawl depth.
- scrape_full_html (bool): If True, includes the full HTML.
- remove_link_duplicates (bool): If True, removes duplicate links.
- poll_interval (float): Polling interval in seconds.
- wait_time_per_depth (float): Maximum wait time in seconds per depth.
- completion_fraction (float): Fraction of completed links required to move to the next depth.
Example:
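The depth-limited, same-domain traversal described above can be sketched without any network access. In this sketch `crawl_sketch` and `same_domain_links` are hypothetical names, `fetch_links` stands in for the search calls that return a page's links, and treating the start URL as depth 0 is an assumption about how depth is counted.

```python
from urllib.parse import urljoin, urlparse


def same_domain_links(base_url, links):
    """Resolve links against base_url and keep those on the same domain."""
    domain = urlparse(base_url).netloc
    resolved = (urljoin(base_url, link) for link in links)
    # dict.fromkeys removes duplicates while preserving order.
    return [u for u in dict.fromkeys(resolved) if urlparse(u).netloc == domain]


def crawl_sketch(start_url, fetch_links, max_depth):
    """Map each discovered URL to the depth at which it was first seen."""
    visited = {start_url: 0}
    frontier = [start_url]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for url in frontier:
            for link in same_domain_links(url, fetch_links(url)):
                if link not in visited:
                    visited[link] = depth
                    next_frontier.append(link)
        if not next_frontier:
            break  # nothing new was found at this depth
        frontier = next_frontier
    return visited
```

The sketch only records URLs found at the final depth and does not follow their links; in the client, each level's pages are fetched in bulk rather than one at a time.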
_post_and_parse()
Signature:
Description:
An internal helper method that sends the given payload to the API, parses the JSON response, and raises an error if the request fails.
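A rough sketch of the send, parse, and raise pattern using only the standard library. The function name `post_and_parse`, the `"success"` field in the response, and the use of `urllib` are all assumptions; the HTTP library and response schema the client actually uses are not shown in this document.

```python
import json
import urllib.request


def post_and_parse(url, payload, opener=urllib.request.urlopen):
    """POST payload as JSON, decode the JSON reply, and raise on failure."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with opener(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the API signals application-level failure with a flag.
    if not body.get("success", False):
        raise RuntimeError(f"API request failed: {body}")
    return body
```

The `opener` parameter exists so the function can be exercised with a fake response object instead of a live endpoint.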