GetObj API Documentation

Base URL: https://getobj.com

Overview

GetObj is a web crawling service that extracts content from websites using a headless browser. It supports various tasks like extracting links, downloading files, counting words, and converting pages to markdown.

Authentication

Currently, no authentication is required. All endpoints are publicly accessible.

Rate Limits

No rate limits are currently enforced. Please use the service responsibly.


Endpoints

1. Health Check

Check if the service is running.

Endpoint: GET /api/health

Response:

{
  "status": "ok",
  "timestamp": "2025-11-25T10:41:19.669Z"
}

Example:

curl https://getobj.com/api/health

2. Crawl Page (POST)

Crawl a webpage and extract content or perform specific tasks.

Endpoint: POST /api/crawl

Headers:

Content-Type: application/json

Request Body:

{
  "url": "https://example.com",
  "instruction": "extract links",  // optional
  "format": "markdown"              // optional: "html", "markdown", "both"
}

Parameters:

Parameter    Type    Required  Description
url          string  Yes       The URL to crawl (must be a valid URL)
instruction  string  No        Task to perform (see Available Tasks below)
format       string  No        Output format: html, markdown, or both. Default: markdown

Response (Default - No Instruction):

{
  "url": "https://example.com",
  "content": {
    "markdown": "# Page Title\n\nContent here...",
    "text": "Plain text content..."
  },
  "metadata": {
    "title": "Example Domain",
    "statusCode": 200,
    "contentType": "text/html",
    "timestamp": "2025-11-25T10:39:10.160Z",
    "executionTime": 6935
  }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

3. Crawl Page (GET)

Same as POST but using query parameters.

Endpoint: GET /api/crawl

Query Parameters:

Parameter    Type    Required  Description
url          string  Yes       The URL to crawl
instruction  string  No        Task to perform
format       string  No        Output format

Example:

curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"

Available Tasks

Use these instructions in the instruction parameter:

Extract Links

Extract all hyperlinks from the page.

Instruction: extract links

Response:

{
  "url": "https://example.com",
  "instruction": "extract links",
  "result": {
    "task": "extract_links",
    "count": 1,
    "links": [
      {
        "href": "https://iana.org/domains/example",
        "text": "Learn more",
        "title": ""
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "extract links"}'
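The links array can be post-processed on the client side. A sketch against the result shape shown above, de-duplicating link targets while preserving order (`unique_hrefs` is an illustrative helper):

```python
def unique_hrefs(result):
    """Return de-duplicated link targets from an 'extract links' result,
    preserving first-seen order."""
    seen = set()
    hrefs = []
    for link in result.get("links", []):
        href = link.get("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs
```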

Count Words

Count occurrences of a specific word (case-insensitive by default).

Instruction: count word "WORD"

Response:

{
  "url": "https://example.com",
  "instruction": "count word \"example\"",
  "result": {
    "task": "count_word",
    "word": "example",
    "count": 2,
    "caseSensitive": false
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "count word \"example\""}'

Download Files

Download all file attachments linked from the page. Files are saved to the server's downloads directory; the response reports each file's path and size.

Instruction: download files

Response:

{
  "url": "https://example.com",
  "instruction": "download files",
  "result": {
    "task": "download_files",
    "filesFound": 9,
    "downloads": [
      {
        "filename": "document1.pdf",
        "path": "/app/downloads/document1.pdf",
        "size": "2.5 MB"
      }
    ],
    "totalSize": "19 MB"
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "download files"}'

Download Images

Extract the URL and alt text of every image on the page. Despite the instruction name, the image files themselves are not downloaded; only their URLs are returned.

Instruction: download images

Response:

{
  "url": "https://example.com",
  "instruction": "download images",
  "result": {
    "task": "download_images",
    "count": 5,
    "images": [
      {
        "src": "https://example.com/image1.jpg",
        "alt": "Image description"
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "download images"}'

Take Screenshot

Capture a screenshot of the page.

Instruction: take screenshot

Response:

{
  "url": "https://example.com",
  "instruction": "take screenshot",
  "result": {
    "task": "take_screenshot",
    "screenshot": "data:image/png;base64,iVBORw0KG...",
    "dimensions": {
      "width": 1920,
      "height": 1080
    }
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "take screenshot"}'
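The screenshot field is a data URL, so the base64 payload after the comma can be decoded and written to disk. A minimal sketch; `save_screenshot` is an illustrative helper, not part of the API:

```python
import base64

def save_screenshot(data_url, path):
    """Decode a data:image/...;base64 URL and write the bytes to `path`.
    Returns the number of bytes written."""
    header, encoded = data_url.split(",", 1)
    if not header.startswith("data:image/"):
        raise ValueError(f"unexpected data URL header: {header!r}")
    raw = base64.b64decode(encoded)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)
```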

Extract Text

Extract and analyze plain text content from the page.

Instruction: extract text

Response:

{
  "url": "https://github.com",
  "instruction": "extract text",
  "result": {
    "task": "extract_text",
    "wordCount": 1174,
    "characterCount": 19184,
    "headingCount": 32,
    "paragraphCount": 28,
    "fullText": "The complete text content...",
    "headings": [
      {
        "level": "h1",
        "text": "Main Heading"
      }
    ],
    "paragraphs": ["First paragraph...", "Second paragraph..."]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com", "instruction": "extract text"}'

Find Element

Find elements using CSS selectors.

Instruction: find element "CSS_SELECTOR"

Response:

{
  "url": "https://example.com",
  "instruction": "find element \"h1\"",
  "result": {
    "task": "find_element",
    "selector": "h1",
    "found": true,
    "count": 3,
    "elements": [
      {
        "text": "Example Domain",
        "html": "<h1>Example Domain</h1>"
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "find element \"h1\""}'

Error Handling

Error Response Format

{
  "url": "https://example.com",
  "error": "Error message here",
  "metadata": {
    "timestamp": "2025-11-25T10:39:10.160Z",
    "executionTime": 1500
  }
}

Common Errors

Status Code  Error                  Description
400          Validation failed      Invalid URL or missing required parameters
500          Internal server error  Browser failed to launch or page load timeout

Example Error:

{
  "error": "Validation failed",
  "details": [
    {
      "code": "invalid_string",
      "message": "Invalid URL format",
      "path": ["url"]
    }
  ]
}
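Validation errors carry a details array that can be flattened into readable messages. A sketch against the error shape above (`summarize_error` is an illustrative helper):

```python
def summarize_error(payload):
    """Turn a GetObj error payload into a list of human-readable messages."""
    details = payload.get("details")
    if details:
        # Join the JSON path (e.g. ["url"]) with the message for each detail
        return [
            "{}: {}".format(".".join(map(str, d.get("path", []))),
                            d.get("message", ""))
            for d in details
        ]
    return [payload.get("error", "unknown error")]
```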

Performance Considerations


Examples

JavaScript (Fetch API)

async function crawlPage(url, instruction) {
  const response = await fetch('https://getobj.com/api/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url: url,
      instruction: instruction,
      format: 'markdown'
    })
  });

  return await response.json();
}

// Usage
const result = await crawlPage('https://example.com', 'extract links');
console.log(result);

Python (requests)

import requests

def crawl_page(url, instruction=None):
    response = requests.post(
        'https://getobj.com/api/crawl',
        json={
            'url': url,
            'instruction': instruction,
            'format': 'markdown'
        }
    )
    return response.json()

# Usage
result = crawl_page('https://example.com', 'extract links')
print(result)
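For production use you will likely want a request timeout and explicit HTTP-error handling around the call above. A hedged sketch; the `session` parameter is an illustrative seam for testing, not something the API requires:

```python
def crawl_page_safe(url, instruction=None, timeout=30, session=None):
    """POST /api/crawl with a request timeout, raising on HTTP errors."""
    if session is None:
        import requests  # deferred so an injected session needs no dependency
        session = requests.Session()
    resp = session.post(
        "https://getobj.com/api/crawl",
        json={"url": url, "instruction": instruction, "format": "markdown"},
        timeout=timeout,
    )
    resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return resp.json()
```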

Node.js (axios)

const axios = require('axios');

async function crawlPage(url, instruction) {
  const response = await axios.post('https://getobj.com/api/crawl', {
    url: url,
    instruction: instruction,
    format: 'markdown'
  });

  return response.data;
}

// Usage
crawlPage('https://example.com', 'extract links')
  .then(result => console.log(result));

cURL

# Basic page crawl
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "markdown"}'

# Extract links
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "extract links"}'

# Count words
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "count word \"example\""}'

# Using GET endpoint
curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"

Use Cases

1. Web Scraping

Extract structured data from websites for analysis or monitoring.

2. Content Migration

Convert web pages to markdown for documentation or CMS migration.

3. SEO Analysis

Extract links, headings, and text content for SEO auditing.

4. Competitive Research

Monitor competitor websites for changes in content.

5. Data Collection

Gather specific information from multiple pages programmatically.

6. Testing

Verify website content and structure in automated tests.


Limitations


Support

For issues or questions:


Changelog

v1.0.0 (2025-11-25)