GetObj API Documentation

Base URL: https://getobj.com

Overview

GetObj is a web crawling service that extracts content from websites using a headless browser. It supports tasks such as extracting links, downloading files, counting words, and converting pages to Markdown.

Authentication

Currently, no authentication is required. All endpoints are publicly accessible.

Rate Limits

No rate limits are currently enforced. Please use the service responsibly.

Endpoints

1. Health Check

Check if the service is running.

Endpoint: GET /api/health

Response:

{
  "status": "ok",
  "timestamp": "2025-11-25T10:41:19.669Z"
}

Example:

curl https://getobj.com/api/health
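
The same check can be made from Python. The following is a minimal sketch using only the standard library, assuming the response shape shown above; `is_healthy` and `fetch_health` are illustrative names and not part of the API.

```python
import json
import urllib.request

HEALTH_URL = "https://getobj.com/api/health"


def is_healthy(payload):
    """Return True when a decoded health payload reports status "ok"."""
    return isinstance(payload, dict) and payload.get("status") == "ok"


def fetch_health(timeout=10):
    """Call the health endpoint and return the decoded JSON body."""
    with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `is_healthy(fetch_health())` before a batch of crawls is one way to fail fast when the service is down.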

2. Crawl Page (POST)

Crawl a webpage and extract content or perform specific tasks.

Endpoint: POST /api/crawl

Headers:

Content-Type: application/json

Request Body (`instruction` and `format` are optional):

{
  "url": "https://example.com",
  "instruction": "extract links",
  "format": "markdown"
}

Parameters:

Parameter	Type	Required	Description
`url`	string	Yes	The URL to crawl (must be a valid URL)
`instruction`	string	No	Task to perform (see available tasks below)
`format`	string	No	Output format: `html`, `markdown`, or `both`. Default: `markdown`
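
The parameter rules above can be checked on the client before a request is sent. This is a sketch only: `build_crawl_payload` is a hypothetical helper, and restricting URLs to `http`/`https` is an assumption, since the server's exact validation rules are not documented here.

```python
from urllib.parse import urlparse

ALLOWED_FORMATS = {"html", "markdown", "both"}


def build_crawl_payload(url, instruction=None, fmt=None):
    """Build the JSON body for POST /api/crawl, omitting unset optional fields."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted by the service
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    payload = {"url": url}
    if instruction is not None:
        payload["instruction"] = instruction
    if fmt is not None:
        payload["format"] = fmt
    return payload
```

Leaving out unset fields, rather than sending them as `null`, lets the server apply its own defaults.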

Response (Default - No Instruction):

{
  "url": "https://example.com",
  "content": {
    "markdown": "# Page Title\n\nContent here...",
    "text": "Plain text content..."
  },
  "metadata": {
    "title": "Example Domain",
    "statusCode": 200,
    "contentType": "text/html",
    "timestamp": "2025-11-25T10:39:10.160Z",
    "executionTime": 6935
  }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'
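
A caller usually needs only a few fields from the default response. The sketch below pulls them out of an already decoded response; `summarize_crawl` is an illustrative name, and treating `executionTime` as milliseconds is an assumption based on the sample values.

```python
def summarize_crawl(response):
    """Pick out the fields most callers need from a default crawl response."""
    content = response.get("content", {})
    meta = response.get("metadata", {})
    return {
        "title": meta.get("title"),
        "status": meta.get("statusCode"),
        # Assumption: executionTime is reported in milliseconds
        "seconds": meta.get("executionTime", 0) / 1000,
        "markdown": content.get("markdown", ""),
    }
```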

3. Crawl Page (GET)

Same as the POST endpoint, but the parameters are passed in the query string.

Endpoint: GET /api/crawl

Query Parameters:

Parameter	Type	Required	Description
`url`	string	Yes	The URL to crawl
`instruction`	string	No	Task to perform
`format`	string	No	Output format

Example:

curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"
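
When the target URL carries its own query string or special characters, it has to be percent-encoded so that it is not split into separate parameters. A minimal sketch of building the GET URL with the standard library follows; `build_crawl_url` is an illustrative helper, not part of the API.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://getobj.com"


def build_crawl_url(url, instruction=None, fmt=None):
    """Return a GET /api/crawl URL with every parameter percent-encoded."""
    params = {"url": url}
    if instruction is not None:
        params["instruction"] = instruction
    if fmt is not None:
        params["format"] = fmt
    # quote_via=quote encodes spaces as %20 instead of "+"
    return f"{BASE_URL}/api/crawl?{urlencode(params, quote_via=quote)}"
```

For example, `build_crawl_url("https://example.com/?a=1&b=2")` keeps `a=1&b=2` inside the `url` parameter instead of leaking `b=2` into the crawl request itself.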

Available Tasks

Use these instructions in the `instruction` parameter:

Extract Links

Extract all hyperlinks from the page.

Instruction: extract links

Response:

{
  "url": "https://example.com",
  "instruction": "extract links",
  "result": {
    "task": "extract_links",
    "count": 1,
    "links": [
      {
        "href": "https://iana.org/domains/example",
        "text": "Learn more",
        "title": ""
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "extract links"}'
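
Once the links are returned, a common next step is to separate internal from external ones. The sketch below works on a decoded response of the shape shown above; `split_links` is an illustrative name, and comparing exact host names is a simplification (subdomains count as external).

```python
from urllib.parse import urlparse


def split_links(response):
    """Split extracted hrefs into (internal, external) by exact host name."""
    page_host = urlparse(response["url"]).netloc
    internal, external = [], []
    for link in response["result"]["links"]:
        host = urlparse(link["href"]).netloc
        (internal if host == page_host else external).append(link["href"])
    return internal, external
```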

Count Words

Count occurrences of a specific word (case-insensitive by default).

Instruction: count word "WORD"

Response:

{
  "url": "https://example.com",
  "instruction": "count word \"example\"",
  "result": {
    "task": "count_word",
    "word": "example",
    "count": 2,
    "caseSensitive": false
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "count word \"example\""}'
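
The quoted word has to be escaped inside the JSON body, which is easy to get wrong by hand. The sketch below lets `json.dumps` do the escaping; `count_word_instruction` is a hypothetical helper, and rejecting words that contain a double quote is an assumption, since the instruction grammar for that case is not documented.

```python
import json


def count_word_instruction(word):
    """Build the instruction string: count word "WORD"."""
    # Assumption: a literal double quote inside the word is not supported
    if '"' in word:
        raise ValueError("word must not contain a double quote")
    return f'count word "{word}"'


# json.dumps escapes the inner quotes, matching the curl example above
body = json.dumps({
    "url": "https://example.com",
    "instruction": count_word_instruction("example"),
})
```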

Download Files

Download all file attachments from the page.

Instruction: download files

Response:

{
  "url": "https://example.com",
  "instruction": "download files",
  "result": {
    "task": "download_files",
    "filesFound": 9,
    "downloads": [
      {
        "filename": "document1.pdf",
        "path": "/app/downloads/document1.pdf",
        "size": "2.5 MB"
      }
    ],
    "totalSize": "19 MB"
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "download files"}'
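
The response reports both how many files were found and which ones were downloaded, so the two can be compared. The following sketch assumes the response shape shown above; `summarize_downloads` is an illustrative name.

```python
def summarize_downloads(response):
    """Compare the files found on the page with the entries actually listed."""
    result = response["result"]
    downloads = result.get("downloads", [])
    return {
        "found": result.get("filesFound", 0),
        "listed": len(downloads),
        "filenames": [entry["filename"] for entry in downloads],
        "total_size": result.get("totalSize"),  # kept as the API's string, e.g. "19 MB"
    }
```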

Download Images

Extract all image URLs from the page.

Instruction: download images

Response:

{
  "url": "https://example.com",
  "instruction": "download images",
  "result": {
    "task": "download_images",
    "count": 5,
    "images": [
      {
        "src": "https://example.com/image1.jpg",
        "alt": "Image description"
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "download images"}'
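
The image list lends itself to a quick accessibility or SEO check. The sketch below resolves each `src` against the page URL and flags images without alt text; `image_report` is an illustrative name, and handling relative `src` values is a precaution, since the sample only shows absolute URLs.

```python
from urllib.parse import urljoin


def image_report(response):
    """Return (all image URLs, URLs of images with empty or missing alt text)."""
    page_url = response["url"]
    urls, missing_alt = [], []
    for image in response["result"]["images"]:
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        absolute = urljoin(page_url, image["src"])
        urls.append(absolute)
        if not (image.get("alt") or "").strip():
            missing_alt.append(absolute)
    return urls, missing_alt
```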

Take Screenshot

Capture a screenshot of the page.

Instruction: take screenshot

Response:

{
  "url": "https://example.com",
  "instruction": "take screenshot",
  "result": {
    "task": "take_screenshot",
    "screenshot": "data:image/png;base64,iVBORw0KG...",
    "dimensions": {
      "width": 1920,
      "height": 1080
    }
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "take screenshot"}'
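
The screenshot arrives as a base64 data URI, so it has to be decoded before it can be written to disk. A minimal sketch, assuming the `data:image/png;base64,...` form shown above; `decode_data_uri` and `save_screenshot` are illustrative names.

```python
import base64


def decode_data_uri(data_uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, encoded = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(encoded)


def save_screenshot(response, path):
    """Write the screenshot from a crawl response to path; return bytes written."""
    _, image_bytes = decode_data_uri(response["result"]["screenshot"])
    with open(path, "wb") as handle:
        handle.write(image_bytes)
    return len(image_bytes)
```

With a full response in hand, `save_screenshot(response, "page.png")` produces a file that any image viewer can open.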

Extract Text

Extract and analyze plain text content from the page.

Instruction: extract text

Response:

{
  "url": "https://github.com",
  "instruction": "extract text",
  "result": {
    "task": "extract_text",
    "wordCount": 1174,
    "characterCount": 19184,
    "headingCount": 32,
    "paragraphCount": 28,
    "fullText": "The complete text content...",
    "headings": [
      {
        "level": "h1",
        "text": "Main Heading"
      }
    ],
    "paragraphs": ["First paragraph...", "Second paragraph..."]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com", "instruction": "extract text"}'
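
The `headings` array can be turned into an indented outline, which is handy for reviewing page structure. The sketch below assumes every `level` has the form `h1` to `h6`, as in the sample; `heading_outline` is an illustrative name.

```python
def heading_outline(response):
    """Render the headings as a text outline, indented two spaces per level."""
    lines = []
    for heading in response["result"]["headings"]:
        depth = int(heading["level"].lstrip("hH"))  # "h2" -> 2
        lines.append("  " * (depth - 1) + heading["text"])
    return "\n".join(lines)
```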

Find Element

Find elements using CSS selectors.

Instruction: find element "CSS_SELECTOR"

Response:

{
  "url": "https://example.com",
  "instruction": "find element \"h1\"",
  "result": {
    "task": "find_element",
    "selector": "h1",
    "found": true,
    "count": 3,
    "elements": [
      {
        "text": "Example Domain",
        "html": "<h1>Example Domain</h1>"
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "find element \"h1\""}'
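
The sketch below builds the instruction for a selector and reads the matches back out. Both helper names are illustrative. Rejecting selectors that contain a double quote is an assumption (escaping rules are not documented), as is the shape of a response with `"found": false`.

```python
def find_element_instruction(selector):
    """Build the instruction string: find element "CSS_SELECTOR"."""
    # Assumption: use single quotes inside attribute selectors, e.g. a[rel='next']
    if '"' in selector:
        raise ValueError("selector must not contain a double quote")
    return f'find element "{selector}"'


def element_texts(response):
    """Return the text of each matched element, or [] when nothing matched."""
    result = response["result"]
    if not result.get("found"):
        return []
    return [element["text"] for element in result.get("elements", [])]
```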

Error Handling

Error Response Format

{
  "url": "https://example.com",
  "error": "Error message here",
  "metadata": {
    "timestamp": "2025-11-25T10:39:10.160Z",
    "executionTime": 1500
  }
}

Common Errors

Status Code	Error	Description
400	Validation failed	Invalid URL or missing required parameters
500	Internal server error	Browser failed to launch or page load timeout

Example Error (400, validation failed):

{
  "error": "Validation failed",
  "details": [
    {
      "code": "invalid_string",
      "message": "Invalid URL format",
      "path": ["url"]
    }
  ]
}
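
Client code can turn these error bodies into exceptions so that failures are not mistaken for results. A minimal sketch, assuming the two error shapes shown above; `CrawlError` and `check_response` are hypothetical names, not part of any GetObj client library.

```python
class CrawlError(Exception):
    """Raised when the API answers with an error status or an error body."""

    def __init__(self, status_code, message, details=None):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message
        self.details = details or []


def check_response(status_code, body):
    """Return body unchanged on success; raise CrawlError otherwise."""
    if status_code >= 400 or "error" in body:
        raise CrawlError(
            status_code,
            body.get("error", "unknown error"),
            body.get("details"),
        )
    return body
```

With `requests`, this would be called as `check_response(response.status_code, response.json())`.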

Performance Considerations

Simple pages: ~1-2 seconds
Complex pages: ~6-9 seconds
File downloads: Depends on file size and count
Timeout: 30 seconds for page load

Examples

JavaScript (Fetch API)

async function crawlPage(url, instruction) {
  const response = await fetch('https://getobj.com/api/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url: url,
      instruction: instruction,
      format: 'markdown'
    })
  });

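  const data = await response.json();
  if (!response.ok) {
    // Error responses carry an "error" message (see Error Handling)
    throw new Error(data.error || `HTTP ${response.status}`);
  }
  return data;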
}

// Usage (top-level await requires an ES module or an enclosing async function)
const result = await crawlPage('https://example.com', 'extract links');
console.log(result);

Python (requests)

import requests

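def crawl_page(url, instruction=None):
    # Leave instruction out of the body when it is not set, rather than sending null
    payload = {'url': url, 'format': 'markdown'}
    if instruction is not None:
        payload['instruction'] = instruction
    response = requests.post(
        'https://getobj.com/api/crawl',
        json=payload,
        timeout=60  # client-side limit; the service allows 30 seconds for page load
    )
    return response.json()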

# Usage
result = crawl_page('https://example.com', 'extract links')
print(result)

Node.js (axios)

const axios = require('axios');

async function crawlPage(url, instruction) {
  const response = await axios.post('https://getobj.com/api/crawl', {
    url: url,
    instruction: instruction,
    format: 'markdown'
  });

  return response.data;
}

// Usage
crawlPage('https://example.com', 'extract links')
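  .then(result => console.log(result))
  // axios rejects on non-2xx responses; the API error body is in error.response.data
  .catch(error => console.error(error.response ? error.response.data : error.message));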

cURL

# Basic page crawl
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "markdown"}'

# Extract links
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "extract links"}'

# Count words
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "count word \"example\""}'

# Using GET endpoint
curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"

Use Cases

1. Web Scraping

Extract structured data from websites for analysis or monitoring.

2. Content Migration

Convert web pages to markdown for documentation or CMS migration.

3. SEO Analysis

Extract links, headings, and text content for SEO auditing.

4. Competitive Research

Monitor competitor websites for changes in content.

5. Data Collection

Gather specific information from multiple pages programmatically.

6. Testing

Verify website content and structure in automated tests.

Limitations

Maximum page load timeout: 30 seconds
No support for pages that require a login
No natural language parsing of instructions (OpenAI integration not active); use the instruction strings listed under Available Tasks
Downloads are temporary (not permanently stored)
Single-page crawling only (no site-wide crawling)
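
Because each request covers a single page, crawling several pages means issuing one request per URL from the client. The sketch below does this with a small thread pool; `crawl_many` is an illustrative name, `crawl_fn` stands for any single-page call (such as the `crawl_page` function in the Python example), and the default of two workers is an arbitrary choice made to keep load on the service low.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_many(urls, crawl_fn, max_workers=2):
    """Call crawl_fn once per URL; return {url: result or raised exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(crawl_fn, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed page should not stop the rest
                results[url] = exc
    return results
```

A failed page shows up as an exception object in the returned dictionary, so the caller can decide whether to retry it.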

Support

For issues or questions:

GitHub: Create an issue in the repository
Website: https://getobj.com
Health Check: https://getobj.com/api/health

Changelog

v1.0.0 (2025-11-25)

Initial release
Basic page crawling (HTML/Markdown)
Link extraction
Word counting
File downloads
Image extraction
Screenshot capture
Text analysis
CSS selector search