Base URL: https://getobj.com
GetObj is a web crawling service that extracts content from websites using a headless browser. It supports tasks such as extracting links, downloading files, counting words, and converting pages to markdown.
Currently, no authentication is required. All endpoints are publicly accessible.
No rate limits are currently enforced. Please use the service responsibly.
Check if the service is running.
Endpoint: GET /api/health
Response:
{
"status": "ok",
"timestamp": "2025-11-25T10:41:19.669Z"
}
Example:
curl https://getobj.com/api/health
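For programmatic checks, the same call in Python (a minimal sketch using the requests library):

import requests

# Query the health endpoint and confirm the service reports "ok".
resp = requests.get("https://getobj.com/api/health", timeout=10)
resp.raise_for_status()
print(resp.json()["status"])  # expected: "ok"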
Crawl a webpage and extract content or perform specific tasks.
Endpoint: POST /api/crawl
Headers:
Content-Type: application/json
Request Body:
{
"url": "https://example.com",
"instruction": "extract links", // optional
"format": "markdown" // optional: "html", "markdown", "both"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to crawl (must be a valid URL) |
| instruction | string | No | Task to perform (see available tasks below) |
| format | string | No | Output format: html, markdown, or both. Default: markdown |
Response (Default - No Instruction):
{
"url": "https://example.com",
"content": {
"markdown": "# Page Title\n\nContent here...",
"text": "Plain text content..."
},
"metadata": {
"title": "Example Domain",
"statusCode": 200,
"contentType": "text/html",
"timestamp": "2025-11-25T10:39:10.160Z",
"executionTime": 6935
}
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"format": "markdown"
}'
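The equivalent call in Python, reading the documented response fields (a minimal sketch; the field paths follow the default response shown above):

import requests

resp = requests.post(
    "https://getobj.com/api/crawl",
    json={"url": "https://example.com", "format": "markdown"},
    timeout=60,
)
data = resp.json()

# content.markdown and metadata follow the default response documented above.
print(data["content"]["markdown"][:200])
print("status:", data["metadata"]["statusCode"],
      "took:", data["metadata"]["executionTime"], "ms")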
Same as the POST endpoint, but with parameters passed in the query string instead of a JSON body.
Endpoint: GET /api/crawl
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to crawl |
| instruction | string | No | Task to perform |
| format | string | No | Output format |
Example:
curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"
Use these instructions in the instruction parameter:
Extract all hyperlinks from the page.
Instruction: extract links
Response:
{
"url": "https://example.com",
"instruction": "extract links",
"result": {
"task": "extract_links",
"count": 1,
"links": [
{
"href": "https://iana.org/domains/example",
"text": "Learn more",
"title": ""
}
]
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "extract links"}'
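A short Python sketch that walks the result.links array from the response above and prints each link:

import requests

resp = requests.post(
    "https://getobj.com/api/crawl",
    json={"url": "https://example.com", "instruction": "extract links"},
    timeout=60,
)
result = resp.json()["result"]

# Each entry carries href, text, and title, per the response shape above.
for link in result["links"]:
    print(f'{link["text"] or "(no text)"} -> {link["href"]}')
print(f'{result["count"]} link(s) found')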
Count occurrences of a specific word (case-insensitive by default).
Instruction: count word "WORD"
Response:
{
"url": "https://example.com",
"instruction": "count word \"example\"",
"result": {
"task": "count_word",
"word": "example",
"count": 2,
"caseSensitive": false
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "count word \"example\""}'
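The word must appear in literal double quotes inside the instruction string; in Python an f-string keeps that readable (a sketch; word is a placeholder value):

import requests

word = "example"  # placeholder: the word to count
resp = requests.post(
    "https://getobj.com/api/crawl",
    json={
        "url": "https://example.com",
        # The instruction embeds the word in literal double quotes.
        "instruction": f'count word "{word}"',
    },
    timeout=60,
)
result = resp.json()["result"]
print(f'"{result["word"]}" appears {result["count"]} time(s)')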
Download all file attachments from the page.
Instruction: download files
Response:
{
"url": "https://example.com",
"instruction": "download files",
"result": {
"task": "download_files",
"filesFound": 9,
"downloads": [
{
"filename": "document1.pdf",
"path": "/app/downloads/document1.pdf",
"size": "2.5 MB"
}
],
"totalSize": "19 MB"
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "download files"}'
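Note that the returned path values appear to reference the crawler's own filesystem (e.g. /app/downloads/...), not your machine. A Python sketch that lists what was found:

import requests

resp = requests.post(
    "https://getobj.com/api/crawl",
    json={"url": "https://example.com", "instruction": "download files"},
    timeout=120,
)
result = resp.json()["result"]

print(f'{result["filesFound"]} file(s), {result["totalSize"]} total')
for item in result["downloads"]:
    print(f'  {item["filename"]} ({item["size"]})')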
Extract all image URLs from the page.
Instruction: download images
Response:
{
"url": "https://example.com",
"instruction": "download images",
"result": {
"task": "download_images",
"count": 5,
"images": [
{
"src": "https://example.com/image1.jpg",
"alt": "Image description"
}
]
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "download images"}'
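Since this task returns image URLs rather than file contents, you can fetch the files yourself. A Python sketch that saves each listed src locally (the images/ output directory and naive file naming are placeholders):

import os
import requests

resp = requests.post(
    "https://getobj.com/api/crawl",
    json={"url": "https://example.com", "instruction": "download images"},
    timeout=60,
)
images = resp.json()["result"]["images"]

os.makedirs("images", exist_ok=True)  # placeholder output directory
for i, image in enumerate(images):
    data = requests.get(image["src"], timeout=30).content
    path = os.path.join("images", f"image_{i}.jpg")  # naive naming; infer the real extension in practice
    with open(path, "wb") as f:
        f.write(data)
    print("saved", path)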
Capture a screenshot of the page.
Instruction: take screenshot
Response:
{
"url": "https://example.com",
"instruction": "take screenshot",
"result": {
"task": "take_screenshot",
"screenshot": "data:image/png;base64,iVBORw0KG...",
"dimensions": {
"width": 1920,
"height": 1080
}
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "take screenshot"}'
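The screenshot arrives as a base64 data URI, so writing it to disk only requires stripping the prefix and decoding. A Python sketch:

import base64
import requests

resp = requests.post(
    "https://getobj.com/api/crawl",
    json={"url": "https://example.com", "instruction": "take screenshot"},
    timeout=60,
)
result = resp.json()["result"]

# Strip the data:image/png;base64, prefix, then decode the payload.
_, _, payload = result["screenshot"].partition("base64,")
with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(payload))
print("saved screenshot.png", result["dimensions"])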
Extract and analyze plain text content from the page.
Instruction: extract text
Response:
{
"url": "https://github.com",
"instruction": "extract text",
"result": {
"task": "extract_text",
"wordCount": 1174,
"characterCount": 19184,
"headingCount": 32,
"paragraphCount": 28,
"fullText": "The complete text content...",
"headings": [
{
"level": "h1",
"text": "Main Heading"
}
],
"paragraphs": ["First paragraph...", "Second paragraph..."]
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://github.com", "instruction": "extract text"}'
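The headings array lends itself to a quick document outline. A Python sketch using the fields shown above:

import requests

resp = requests.post(
    "https://getobj.com/api/crawl",
    json={"url": "https://github.com", "instruction": "extract text"},
    timeout=60,
)
result = resp.json()["result"]

print(f'{result["wordCount"]} words, {result["headingCount"]} headings')
# Indent each heading by its level (h1 -> no indent, h2 -> one level, ...).
for h in result["headings"]:
    depth = int(h["level"][1]) - 1
    print("  " * depth + h["text"])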
Find elements using CSS selectors.
Instruction: find element "CSS_SELECTOR"
Response:
{
"url": "https://example.com",
"instruction": "find element \"h1\"",
"result": {
"task": "find_element",
"selector": "h1",
"found": true,
"count": 3,
"elements": [
{
"text": "Example Domain",
"html": "<h1>Example Domain</h1>"
}
]
},
"metadata": { ... }
}
Example:
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "find element \"h1\""}'
Failed requests return a JSON error object in the following format:
{
"url": "https://example.com",
"error": "Error message here",
"metadata": {
"timestamp": "2025-11-25T10:39:10.160Z",
"executionTime": 1500
}
}
| Status Code | Error | Description |
|---|---|---|
| 400 | Validation failed | Invalid URL or missing required parameters |
| 500 | Internal server error | Browser failed to launch or page load timeout |
Example Error:
{
"error": "Validation failed",
"details": [
{
"code": "invalid_string",
"message": "Invalid URL format",
"path": ["url"]
}
]
}
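A defensive calling pattern, sketched in Python: treat non-200 responses or bodies containing error as failures and surface the documented error and details fields. The crawl_or_raise helper name is hypothetical:

import requests

def crawl_or_raise(url, instruction=None):
    # Hypothetical helper: wraps POST /api/crawl with basic error handling.
    resp = requests.post(
        "https://getobj.com/api/crawl",
        json={"url": url, "instruction": instruction},
        timeout=60,
    )
    body = resp.json()
    if resp.status_code != 200 or "error" in body:
        # "error" and the optional "details" list follow the format above.
        raise RuntimeError(f'{body.get("error", "unknown error")}: {body.get("details", [])}')
    return body

try:
    crawl_or_raise("not-a-url")
except RuntimeError as exc:
    print("crawl failed:", exc)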
JavaScript (Fetch):
async function crawlPage(url, instruction) {
  const response = await fetch('https://getobj.com/api/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url: url,
      instruction: instruction,
      format: 'markdown'
    })
  });
  return await response.json();
}

// Usage
const result = await crawlPage('https://example.com', 'extract links');
console.log(result);
Python (Requests):
import requests

def crawl_page(url, instruction=None):
    response = requests.post(
        'https://getobj.com/api/crawl',
        json={
            'url': url,
            'instruction': instruction,
            'format': 'markdown'
        }
    )
    return response.json()

# Usage
result = crawl_page('https://example.com', 'extract links')
print(result)
Node.js (Axios):
const axios = require('axios');

async function crawlPage(url, instruction) {
  const response = await axios.post('https://getobj.com/api/crawl', {
    url: url,
    instruction: instruction,
    format: 'markdown'
  });
  return response.data;
}

// Usage
crawlPage('https://example.com', 'extract links')
  .then(result => console.log(result));
cURL:
# Basic page crawl
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "format": "markdown"}'
# Extract links
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "extract links"}'
# Count words
curl -X POST https://getobj.com/api/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "instruction": "count word \"example\""}'
# Using GET endpoint
curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"
Use cases:
- Extract structured data from websites for analysis or monitoring.
- Convert web pages to markdown for documentation or CMS migration.
- Extract links, headings, and text content for SEO auditing.
- Monitor competitor websites for changes in content.
- Gather specific information from multiple pages programmatically.
- Verify website content and structure in automated tests.
For issues or questions: