|
| 1 | +--- |
| 2 | +title: Bright Data |
| 3 | +description: Scrape websites, search engines, and extract structured data |
| 4 | +--- |
| 5 | + |
| 6 | +import { BlockInfoCard } from "@/components/ui/block-info-card" |
| 7 | + |
| 8 | +<BlockInfoCard |
| 9 | + type="brightdata" |
| 10 | + color="#FFFFFF" |
| 11 | +/> |
| 12 | + |
| 13 | +## Usage Instructions |
| 14 | + |
| 15 | +Integrate Bright Data into the workflow. Scrape any URL with Web Unlocker, search Google and other engines with SERP API, discover web content ranked by intent, or trigger pre-built scrapers for structured data extraction. |
| 16 | + |
| 17 | + |
| 18 | + |
| 19 | +## Tools |
| 20 | + |
| 21 | +### `brightdata_scrape_url` |
| 22 | + |
| 23 | +Fetch content from any URL using Bright Data Web Unlocker. Bypasses anti-bot protections, CAPTCHAs, and IP blocks automatically. |
| 24 | + |
| 25 | +#### Input |
| 26 | + |
| 27 | +| Parameter | Type | Required | Description | |
| 28 | +| --------- | ---- | -------- | ----------- | |
| 29 | +| `apiKey` | string | Yes | Bright Data API token | |
| 30 | +| `zone` | string | Yes | Web Unlocker zone name from your Bright Data dashboard \(e.g., "web_unlocker1"\) | |
| 31 | +| `url` | string | Yes | The URL to scrape \(e.g., "https://example.com/page"\) | |
| 32 | +| `format` | string | No | Response format: "raw" for HTML or "json" for parsed content. Defaults to "raw" | |
| 33 | +| `country` | string | No | Two-letter country code for geo-targeting \(e.g., "us", "gb"\) | |
| 34 | + |
| 35 | +#### Output |
| 36 | + |
| 37 | +| Parameter | Type | Description | |
| 38 | +| --------- | ---- | ----------- | |
| 39 | +| `content` | string | The scraped page content \(HTML or JSON depending on format\) | |
| 40 | +| `url` | string | The URL that was scraped | |
| 41 | +| `statusCode` | number | HTTP status code of the response | |
| 42 | + |
| 43 | +### `brightdata_serp_search` |
| 44 | + |
| 45 | +Search Google, Bing, DuckDuckGo, or Yandex and get structured search results using Bright Data SERP API. |
| 46 | + |
| 47 | +#### Input |
| 48 | + |
| 49 | +| Parameter | Type | Required | Description | |
| 50 | +| --------- | ---- | -------- | ----------- | |
| 51 | +| `apiKey` | string | Yes | Bright Data API token | |
| 52 | +| `zone` | string | Yes | SERP API zone name from your Bright Data dashboard \(e.g., "serp_api1"\) | |
| 53 | +| `query` | string | Yes | The search query \(e.g., "best project management tools"\) | |
| 54 | +| `searchEngine` | string | No | Search engine to use: "google", "bing", "duckduckgo", or "yandex". Defaults to "google" | |
| 55 | +| `country` | string | No | Two-letter country code for localized results \(e.g., "us", "gb"\) | |
| 56 | +| `language` | string | No | Two-letter language code \(e.g., "en", "es"\) | |
| 57 | +| `numResults` | number | No | Number of results to return \(e.g., 10, 20\). Defaults to 10 | |
| 58 | + |
| 59 | +#### Output |
| 60 | + |
| 61 | +| Parameter | Type | Description | |
| 62 | +| --------- | ---- | ----------- | |
| 63 | +| `results` | array | Array of search results | |
| 64 | +| ↳ `title` | string | Title of the search result | |
| 65 | +| ↳ `url` | string | URL of the search result | |
| 66 | +| ↳ `description` | string | Snippet or description of the result | |
| 67 | +| ↳ `rank` | number | Position in search results | |
| 68 | +| `query` | string | The search query that was executed | |
| 69 | +| `searchEngine` | string | The search engine that was used | |
| 70 | + |
| 71 | +### `brightdata_discover` |
| 72 | + |
| 73 | +AI-powered web discovery that finds and ranks results by intent. Returns up to 1,000 results with optional cleaned page content for RAG and verification. |
| 74 | + |
| 75 | +#### Input |
| 76 | + |
| 77 | +| Parameter | Type | Required | Description | |
| 78 | +| --------- | ---- | -------- | ----------- | |
| 79 | +| `apiKey` | string | Yes | Bright Data API token | |
| 80 | +| `query` | string | Yes | The search query \(e.g., "competitor pricing changes enterprise plan"\) | |
| 81 | +| `numResults` | number | No | Number of results to return, up to 1000. Defaults to 10 | |
| 82 | +| `intent` | string | No | Describes what the agent is trying to accomplish, used to rank results by relevance \(e.g., "find official pricing pages and change notes"\) | |
| 83 | +| `includeContent` | boolean | No | Whether to include cleaned page content in results | |
| 84 | +| `format` | string | No | Response format: "json" or "markdown". Defaults to "json" | |
| 85 | +| `language` | string | No | Search language code \(e.g., "en", "es", "fr"\). Defaults to "en" | |
| 86 | +| `country` | string | No | Two-letter ISO country code for localized results \(e.g., "us", "gb"\) | |
| 87 | + |
| 88 | +#### Output |
| 89 | + |
| 90 | +| Parameter | Type | Description | |
| 91 | +| --------- | ---- | ----------- | |
| 92 | +| `results` | array | Array of discovered web results ranked by intent relevance | |
| 93 | +| ↳ `url` | string | URL of the discovered page | |
| 94 | +| ↳ `title` | string | Page title | |
| 95 | +| ↳ `description` | string | Page description or snippet | |
| 96 | +| ↳ `relevanceScore` | number | AI-calculated relevance score for intent-based ranking | |
| 97 | +| ↳ `content` | string | Cleaned page content in the requested format \(when includeContent is true\) | |
| 98 | +| `query` | string | The search query that was executed | |
| 99 | +| `totalResults` | number | Total number of results returned | |
| 100 | + |
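
The `results` array above can feed a retrieval step directly. The following is a minimal sketch of picking the best candidates for RAG, not part of the integration itself: the `DiscoverResult` interface mirrors the output table, while `topResultsWithContent` is a hypothetical helper, and treating a higher `relevanceScore` as more relevant is an assumption the table does not state.

```typescript
// Shape taken from the brightdata_discover output table.
// `content` is optional because it is only present when includeContent is true.
interface DiscoverResult {
  url: string;
  title: string;
  description: string;
  relevanceScore: number;
  content?: string;
}

// Hypothetical helper: keep results that carry page content and return the
// `limit` highest-scoring ones. Assumes a higher relevanceScore is better.
function topResultsWithContent(
  results: DiscoverResult[],
  limit: number
): DiscoverResult[] {
  return results
    .filter((r) => typeof r.content === "string" && r.content.length > 0)
    .sort((a, b) => b.relevanceScore - a.relevanceScore) // filter() returned a copy
    .slice(0, limit);
}
```

Filtering before sorting leaves the original array untouched, so the full result list stays available for logging or verification.
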
| 101 | +### `brightdata_sync_scrape` |
| 102 | + |
| 103 | +Scrape URLs synchronously using a Bright Data pre-built scraper and get structured results directly. Supports up to 20 URLs with a 1-minute timeout. |
| 104 | + |
| 105 | +#### Input |
| 106 | + |
| 107 | +| Parameter | Type | Required | Description | |
| 108 | +| --------- | ---- | -------- | ----------- | |
| 109 | +| `apiKey` | string | Yes | Bright Data API token | |
| 110 | +| `datasetId` | string | Yes | Dataset scraper ID from your Bright Data dashboard \(e.g., "gd_l1viktl72bvl7bjuj0"\) | |
| 111 | +| `urls` | string | Yes | JSON array of URL objects to scrape, up to 20 \(e.g., \[\{"url": "https://example.com/product"\}\]\) | |
| 112 | +| `format` | string | No | Output format: "json", "ndjson", or "csv". Defaults to "json" | |
| 113 | +| `includeErrors` | boolean | No | Whether to include error reports in results | |
| 114 | + |
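
Because `urls` is a string containing a JSON array rather than an array itself, it is easy to pass the wrong shape. A minimal sketch of building that string from plain URLs follows; `buildUrlsParam` and `MAX_SYNC_URLS` are hypothetical names introduced for illustration, and the limit of 20 comes from the tool description above.

```typescript
// Limit stated in the brightdata_sync_scrape description.
const MAX_SYNC_URLS = 20;

// Hypothetical helper: turn plain URL strings into the JSON string that the
// `urls` parameter expects, e.g. '[{"url":"https://example.com/product"}]'.
function buildUrlsParam(
  urls: string[],
  maxUrls: number = MAX_SYNC_URLS
): string {
  if (urls.length === 0) {
    throw new Error("At least one URL is required");
  }
  if (urls.length > maxUrls) {
    throw new Error(`Too many URLs: ${urls.length} (limit is ${maxUrls})`);
  }
  // Wrap each URL in an object, matching the documented example shape.
  return JSON.stringify(urls.map((url) => ({ url })));
}

const urlsParam = buildUrlsParam(["https://example.com/product"]);
```

Checking the count before sending avoids a round trip for a request that the documented limit would reject.
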
| 115 | +#### Output |
| 116 | + |
| 117 | +| Parameter | Type | Description | |
| 118 | +| --------- | ---- | ----------- | |
| 119 | +| `data` | array | Array of scraped result objects with fields specific to the dataset scraper used | |
| 120 | +| `snapshotId` | string | Snapshot ID returned if the request exceeded the 1-minute timeout and switched to async processing | |
| 121 | +| `isAsync` | boolean | Whether the request fell back to async mode. When true, use `snapshotId` to retrieve the results | |
| 122 | + |
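
The output of `brightdata_sync_scrape` has two possible meanings: results are in `data`, or the request fell back to async mode and only `snapshotId` is useful. A minimal sketch of branching on that follows. The `SyncScrapeOutput` interface mirrors the output table, but which fields may be absent is an assumption, and `interpretSyncScrape` is a hypothetical helper.

```typescript
// Shape taken from the brightdata_sync_scrape output table.
// Marking every field optional is an assumption, not documented behavior.
interface SyncScrapeOutput {
  data?: unknown[];
  snapshotId?: string;
  isAsync?: boolean;
}

type SyncScrapeOutcome =
  | { kind: "ready"; records: unknown[] }
  | { kind: "pending"; snapshotId: string };

// Hypothetical helper: decide whether results are available now or must be
// fetched later through the snapshot tools.
function interpretSyncScrape(output: SyncScrapeOutput): SyncScrapeOutcome {
  if (output.isAsync) {
    if (!output.snapshotId) {
      throw new Error("isAsync is true but no snapshotId was returned");
    }
    return { kind: "pending", snapshotId: output.snapshotId };
  }
  return { kind: "ready", records: output.data ?? [] };
}
```

A `pending` outcome would typically be handed to `brightdata_snapshot_status` and then `brightdata_download_snapshot`.
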
| 123 | +### `brightdata_scrape_dataset` |
| 124 | + |
| 125 | +Trigger a Bright Data pre-built scraper to extract structured data from URLs. Supports 660+ scrapers for platforms like Amazon, LinkedIn, Instagram, and more. |
| 126 | + |
| 127 | +#### Input |
| 128 | + |
| 129 | +| Parameter | Type | Required | Description | |
| 130 | +| --------- | ---- | -------- | ----------- | |
| 131 | +| `apiKey` | string | Yes | Bright Data API token | |
| 132 | +| `datasetId` | string | Yes | Dataset scraper ID from your Bright Data dashboard \(e.g., "gd_l1viktl72bvl7bjuj0"\) | |
| 133 | +| `urls` | string | Yes | JSON array of URL objects to scrape \(e.g., \[\{"url": "https://example.com/product"\}\]\) | |
| 134 | +| `format` | string | No | Output format: "json" or "csv". Defaults to "json" | |
| 135 | + |
| 136 | +#### Output |
| 137 | + |
| 138 | +| Parameter | Type | Description | |
| 139 | +| --------- | ---- | ----------- | |
| 140 | +| `snapshotId` | string | The snapshot ID to retrieve results later | |
| 141 | +| `status` | string | Status of the scraping job \(e.g., "triggered", "running"\) | |
| 142 | + |
| 143 | +### `brightdata_snapshot_status` |
| 144 | + |
| 145 | +Check the progress of an async Bright Data scraping job. Returns one of four statuses: "starting", "running", "ready", or "failed". |
| 146 | + |
| 147 | +#### Input |
| 148 | + |
| 149 | +| Parameter | Type | Required | Description | |
| 150 | +| --------- | ---- | -------- | ----------- | |
| 151 | +| `apiKey` | string | Yes | Bright Data API token | |
| 152 | +| `snapshotId` | string | Yes | The snapshot ID returned when the collection was triggered \(e.g., "s_m4x7enmven8djfqak"\) | |
| 153 | + |
| 154 | +#### Output |
| 155 | + |
| 156 | +| Parameter | Type | Description | |
| 157 | +| --------- | ---- | ----------- | |
| 158 | +| `snapshotId` | string | The snapshot ID that was queried | |
| 159 | +| `datasetId` | string | The dataset ID associated with this snapshot | |
| 160 | +| `status` | string | Current status of the snapshot: "starting", "running", "ready", or "failed" | |
| 161 | + |
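
The four statuses above suggest a simple polling loop between triggering a job and downloading its snapshot. The sketch below is one way to structure it under stated assumptions: `checkStatus` is a hypothetical caller-supplied function that wraps `brightdata_snapshot_status`, and the interval and attempt limit are arbitrary illustrative defaults, not values recommended by Bright Data.

```typescript
// Status values listed in the brightdata_snapshot_status output table.
type SnapshotStatus = "starting" | "running" | "ready" | "failed";
type NextStep = "wait" | "download" | "abort";

// Map a snapshot status to what the workflow should do next.
function nextStep(status: SnapshotStatus): NextStep {
  switch (status) {
    case "ready":
      return "download";
    case "failed":
      return "abort";
    default:
      return "wait"; // "starting" and "running" are still in progress
  }
}

// Hypothetical polling loop. Resolves to true when the snapshot is ready,
// false when it failed, and throws if it is still in progress after
// maxAttempts checks.
async function waitUntilReady(
  snapshotId: string,
  checkStatus: (snapshotId: string) => Promise<SnapshotStatus>,
  intervalMs: number = 5000,
  maxAttempts: number = 60
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const step = nextStep(await checkStatus(snapshotId));
    if (step === "download") return true;
    if (step === "abort") return false;
    // Pause before the next status check.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Snapshot ${snapshotId} not ready after ${maxAttempts} checks`);
}
```

Keeping the decision in a pure `nextStep` function makes the status handling testable without any network calls.
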
| 162 | +### `brightdata_download_snapshot` |
| 163 | + |
| 164 | +Download the results of a completed Bright Data scraping job using its snapshot ID. The snapshot status must be "ready" before it can be downloaded. |
| 165 | + |
| 166 | +#### Input |
| 167 | + |
| 168 | +| Parameter | Type | Required | Description | |
| 169 | +| --------- | ---- | -------- | ----------- | |
| 170 | +| `apiKey` | string | Yes | Bright Data API token | |
| 171 | +| `snapshotId` | string | Yes | The snapshot ID returned when the collection was triggered \(e.g., "s_m4x7enmven8djfqak"\) | |
| 172 | +| `format` | string | No | Output format: "json", "ndjson", "jsonl", or "csv". Defaults to "json" | |
| 173 | +| `compress` | boolean | No | Whether to compress the results | |
| 174 | + |
| 175 | +#### Output |
| 176 | + |
| 177 | +| Parameter | Type | Description | |
| 178 | +| --------- | ---- | ----------- | |
| 179 | +| `data` | array | Array of scraped result records | |
| 180 | +| `format` | string | The content type of the downloaded data | |
| 181 | +| `snapshotId` | string | The snapshot ID that was downloaded | |
| 182 | + |
| 183 | +### `brightdata_cancel_snapshot` |
| 184 | + |
| 185 | +Cancel an active Bright Data scraping job using its snapshot ID. Terminates data collection in progress. |
| 186 | + |
| 187 | +#### Input |
| 188 | + |
| 189 | +| Parameter | Type | Required | Description | |
| 190 | +| --------- | ---- | -------- | ----------- | |
| 191 | +| `apiKey` | string | Yes | Bright Data API token | |
| 192 | +| `snapshotId` | string | Yes | The snapshot ID of the collection to cancel \(e.g., "s_m4x7enmven8djfqak"\) | |
| 193 | + |
| 194 | +#### Output |
| 195 | + |
| 196 | +| Parameter | Type | Description | |
| 197 | +| --------- | ---- | ----------- | |
| 198 | +| `snapshotId` | string | The snapshot ID that was cancelled | |
| 199 | +| `cancelled` | boolean | Whether the cancellation was successful | |
| 200 | + |
| 201 | + |