GetObj API Documentation

Base URL: https://getobj.com

Overview

GetObj is a web crawling service that extracts content from websites using a headless browser. It supports tasks such as extracting links, downloading files, counting words, and converting pages to markdown.

Authentication

Currently, no authentication is required. All endpoints are publicly accessible.

Rate Limits

No rate limits are currently enforced. Please use the service responsibly.


Endpoints

1. Health Check

Check if the service is running.

Endpoint: GET /api/health

Response:

{
  "status": "ok",
  "timestamp": "2025-11-25T10:41:19.669Z"
}

Example:

curl https://getobj.com/api/health
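The health endpoint can also serve as a readiness probe from Python. This is a minimal standard-library sketch; it assumes that anything other than a JSON body with "status": "ok" (including network errors and timeouts) should count as unhealthy. The names check_health and parse_health are illustrative, not part of the API.

```python
import json
import urllib.request

HEALTH_URL = "https://getobj.com/api/health"

def parse_health(payload: dict) -> bool:
    """Return True only when the payload reports status "ok"."""
    return payload.get("status") == "ok"

def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; network errors or non-JSON bodies mean unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_health(json.load(response))
    except (OSError, ValueError):  # URLError/timeouts are OSError; bad JSON is ValueError
        return False
```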

2. Crawl Page (POST)

Crawl a webpage and extract content or perform specific tasks.

Endpoint: POST /api/crawl

Headers:

Content-Type: application/json

Request Body:

{
  "url": "https://example.com",
  "instruction": "extract links",  // optional
  "format": "markdown"              // optional: "html", "markdown", "both"
}

Parameters:

Parameter    Type    Required  Description
url          string  Yes       The URL to crawl (must be a valid URL)
instruction  string  No        Task to perform (see Available Tasks below)
format       string  No        Output format: html, markdown, or both. Default: markdown

Response (Default - No Instruction):

{
  "url": "https://example.com",
  "content": {
    "markdown": "# Page Title\n\nContent here...",
    "text": "Plain text content..."
  },
  "metadata": {
    "title": "Example Domain",
    "statusCode": 200,
    "contentType": "text/html",
    "timestamp": "2025-11-25T10:39:10.160Z",
    "executionTime": 6935
  }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "format": "markdown"
  }'

3. Crawl Page (GET)

Same as the POST endpoint, but parameters are passed in the query string.

Endpoint: GET /api/crawl

Query Parameters:

Parameter    Type    Required  Description
url          string  Yes       The URL to crawl
instruction  string  No        Task to perform
format       string  No        Output format

Example:

curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"
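Because url is itself a URL, it is safest to percent-encode every query value rather than pasting it in raw; an unencoded & or # in the target URL would otherwise split or truncate the query string. The sketch below builds the GET URL with Python's urllib.parse. build_crawl_get_url is a hypothetical helper, and it encodes spaces as %20 to match the example above.

```python
from urllib.parse import quote, urlencode

CRAWL_ENDPOINT = "https://getobj.com/api/crawl"

def build_crawl_get_url(url, instruction=None, fmt=None):
    """Build a GET /api/crawl URL with every parameter percent-encoded."""
    params = {"url": url}
    if instruction is not None:
        params["instruction"] = instruction
    if fmt is not None:
        params["format"] = fmt
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses "+")
    return f"{CRAWL_ENDPOINT}?{urlencode(params, quote_via=quote)}"
```

For example, build_crawl_get_url("https://example.com/?a=1&b=2", "extract links") keeps the target URL's own & inside the url value instead of starting a new parameter.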

Available Tasks

Use these instructions in the instruction parameter:

Extract Links

Extract all hyperlinks from the page.

Instruction: extract links

Response:

{
  "url": "https://example.com",
  "instruction": "extract links",
  "result": {
    "task": "extract_links",
    "count": 1,
    "links": [
      {
        "href": "https://iana.org/domains/example",
        "text": "Learn more",
        "title": ""
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "extract links"}'
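The links array can be post-processed on the client. This sketch, assuming the response shape shown above, keeps only unique links that point to a different host than the crawled page. It resolves each href against the top-level url in case some hrefs are relative, which the sample does not confirm; external_links is a hypothetical helper.

```python
from urllib.parse import urljoin, urlparse

def external_links(response: dict) -> list:
    """Return unique absolute hrefs whose host differs from the crawled page's host."""
    page_url = response["url"]
    page_host = urlparse(page_url).netloc
    found = []
    for link in response.get("result", {}).get("links", []):
        href = urljoin(page_url, link.get("href", ""))  # no-op for absolute hrefs
        host = urlparse(href).netloc
        # Skip same-host links and host-less schemes such as mailto:
        if host not in ("", page_host) and href not in found:
            found.append(href)
    return found
```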

Count Words

Count occurrences of a specific word (case-insensitive by default).

Instruction: count word "WORD"

Response:

{
  "url": "https://example.com",
  "instruction": "count word \"example\"",
  "result": {
    "task": "count_word",
    "word": "example",
    "count": 2,
    "caseSensitive": false
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "count word \"example\""}'
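The escaped quotes in the curl example come from nesting a quoted word inside a JSON string. Serializing the body with a JSON library produces that escaping automatically, as this Python sketch shows (count_word_payload is an illustrative name):

```python
import json

def count_word_payload(url: str, word: str) -> str:
    """Build the JSON request body for a count-word instruction."""
    # json.dumps escapes the inner double quotes, yielding the same
    # \"...\" sequence that the curl example writes by hand.
    return json.dumps({"url": url, "instruction": f'count word "{word}"'})
```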

Download Files

Download all file attachments from the page.

Instruction: download files

Response:

{
  "url": "https://example.com",
  "instruction": "download files",
  "result": {
    "task": "download_files",
    "filesFound": 9,
    "downloads": [
      {
        "filename": "document1.pdf",
        "path": "/app/downloads/document1.pdf",
        "size": "2.5 MB"
      }
    ],
    "totalSize": "19 MB"
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "download files"}'
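The size and totalSize fields appear to be human-readable strings. If you need byte counts, a parser along these lines may work. The unit set (B, KB, MB, GB) and the 1024-based multipliers are assumptions, since the exact format is not specified here; parse_size and total_bytes are hypothetical helpers.

```python
import re

# ASSUMPTION: sizes look like "2.5 MB" and use binary (1024-based) units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

def parse_size(text: str) -> int:
    """Convert a size string such as "2.5 MB" into a byte count."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", text, re.IGNORECASE)
    if not match:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])

def total_bytes(response: dict) -> int:
    """Sum the sizes of every entry in result.downloads."""
    return sum(parse_size(item["size"]) for item in response["result"]["downloads"])
```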

Download Images

Extract all image URLs from the page.

Instruction: download images

Response:

{
  "url": "https://example.com",
  "instruction": "download images",
  "result": {
    "task": "download_images",
    "count": 5,
    "images": [
      {
        "src": "https://example.com/image1.jpg",
        "alt": "Image description"
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "download images"}'
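Judging by the sample response, this task returns image URLs (src and alt) rather than the image files themselves. A minimal Python sketch for fetching them locally follows; it assumes every src is publicly reachable and resolves relative values against the crawled page URL. image_filename and save_images are hypothetical helpers.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urljoin, urlparse

def image_filename(src: str, index: int) -> str:
    """Use the last URL path segment as the filename, else a numbered fallback."""
    name = posixpath.basename(urlparse(src).path)
    return name or f"image_{index}"

def save_images(response: dict, dest_dir: str = "images") -> list:
    """Download every image URL from a download-images response into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, image in enumerate(response["result"]["images"]):
        src = urljoin(response["url"], image["src"])  # handles relative src values
        target = os.path.join(dest_dir, image_filename(src, index))
        urllib.request.urlretrieve(src, target)
        saved.append(target)
    return saved
```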

Take Screenshot

Capture a screenshot of the page.

Instruction: take screenshot

Response:

{
  "url": "https://example.com",
  "instruction": "take screenshot",
  "result": {
    "task": "take_screenshot",
    "screenshot": "...",
    "dimensions": {
      "width": 1920,
      "height": 1080
    }
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "take screenshot"}'
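The contents of the screenshot field are elided above. If it holds base64-encoded image data (an assumption; it may also carry a data URI prefix such as data:image/png;base64,), it could be decoded and written to disk like this. decode_screenshot and save_screenshot are illustrative names.

```python
import base64

def decode_screenshot(value: str) -> bytes:
    """Decode the screenshot field, ASSUMING it is base64 (optionally a data URI)."""
    if value.startswith("data:"):
        value = value.split(",", 1)[1]  # drop the "data:<mime>;base64," prefix
    return base64.b64decode(value)

def save_screenshot(response: dict, path: str = "screenshot.png") -> str:
    """Write the decoded screenshot bytes to path and return the path."""
    with open(path, "wb") as handle:
        handle.write(decode_screenshot(response["result"]["screenshot"]))
    return path
```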

Extract Text

Extract and analyze plain text content from the page.

Instruction: extract text

Response:

{
  "url": "https://github.com",
  "instruction": "extract text",
  "result": {
    "task": "extract_text",
    "wordCount": 1174,
    "characterCount": 19184,
    "headingCount": 32,
    "paragraphCount": 28,
    "fullText": "The complete text content...",
    "headings": [
      {
        "level": "h1",
        "text": "Main Heading"
      }
    ],
    "paragraphs": ["First paragraph...", "Second paragraph..."]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com", "instruction": "extract text"}'
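The headings array can be turned into a quick document outline. This sketch assumes level values of the form h1 through h6, as in the sample, and indents two spaces per level below h1; heading_outline is an illustrative name.

```python
def heading_outline(response: dict) -> str:
    """Render result.headings as an indented plain-text outline."""
    lines = []
    for heading in response["result"]["headings"]:
        level = int(heading["level"].lstrip("hH"))  # "h2" -> 2
        lines.append("  " * (level - 1) + heading["text"])
    return "\n".join(lines)
```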

Find Element

Find elements using CSS selectors.

Instruction: find element "CSS_SELECTOR"

Response:

{
  "url": "https://example.com",
  "instruction": "find element \"h1\"",
  "result": {
    "task": "find_element",
    "selector": "h1",
    "found": true,
    "count": 3,
    "elements": [
      {
        "text": "Example Domain",
        "html": "<h1>Example Domain</h1>"
      }
    ]
  },
  "metadata": { ... }
}

Example:

curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "find element \"h1\""}'
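Since found may be false when the selector matches nothing, client code should check it before reading elements. A small sketch, assuming elements is absent or empty in that case; element_texts is a hypothetical helper.

```python
def element_texts(response: dict) -> list:
    """Return the stripped text of each matched element, or [] when nothing matched."""
    task = response.get("result", {})
    if not task.get("found"):
        return []
    return [element.get("text", "").strip() for element in task.get("elements", [])]
```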

Error Handling

Error Response Format

{
  "url": "https://example.com",
  "error": "Error message here",
  "metadata": {
    "timestamp": "2025-11-25T10:39:10.160Z",
    "executionTime": 1500
  }
}

Common Errors

Status Code  Error                  Description
400          Validation failed      Invalid URL or missing required parameters
500          Internal server error  Browser failed to launch or page load timeout

Example Error:

{
  "error": "Validation failed",
  "details": [
    {
      "code": "invalid_string",
      "message": "Invalid URL format",
      "path": ["url"]
    }
  ]
}
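Validation errors carry a details array in addition to error. This sketch formats either shape into one readable message; it assumes successful responses never contain an error key, which matches the examples in this document. describe_error is a hypothetical helper.

```python
def describe_error(status: int, body: dict):
    """Return a readable error message, or None if the body is not an error."""
    if "error" not in body:
        return None
    message = f"{status}: {body['error']}"
    for detail in body.get("details", []):
        field = ".".join(str(part) for part in detail.get("path", []))
        message += f"\n  - {field or '(request)'}: {detail.get('message', '')}"
    return message
```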

Performance Considerations


Examples

JavaScript (Fetch API)

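async function crawlPage(url, instruction) {
  const response = await fetch('https://getobj.com/api/crawl', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    // JSON.stringify drops undefined values, so a missing instruction is omitted
    body: JSON.stringify({
      url,
      instruction,
      format: 'markdown'
    })
  });

  // Error responses (400/500) also return a JSON body with an "error" field
  const data = await response.json();
  if (!response.ok) {
    throw new Error(`GetObj request failed (${response.status}): ${data.error}`);
  }
  return data;
}

// Usage (top-level await requires an ES module)
const result = await crawlPage('https://example.com', 'extract links');
console.log(result);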

Python (requests)

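import requests

def crawl_page(url, instruction=None):
    payload = {'url': url, 'format': 'markdown'}
    # Omit "instruction" instead of sending null, which a strict validator
    # may reject for an optional string field
    if instruction is not None:
        payload['instruction'] = instruction
    response = requests.post(
        'https://getobj.com/api/crawl',
        json=payload,
        timeout=60,  # headless-browser crawls can take several seconds
    )
    return response.json()

# Usage
result = crawl_page('https://example.com', 'extract links')
print(result)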

Node.js (axios)

const axios = require('axios');

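async function crawlPage(url, instruction) {
  // axios serializes the object as JSON; undefined fields are dropped
  const response = await axios.post('https://getobj.com/api/crawl', {
    url,
    instruction,
    format: 'markdown'
  });

  return response.data;
}

// Usage: axios rejects on non-2xx statuses; the API's JSON error body
// is available on err.response.data
crawlPage('https://example.com', 'extract links')
  .then(result => console.log(result))
  .catch(err => console.error(err.response ? err.response.data : err.message));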

cURL

# Basic page crawl
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "format": "markdown"}'

# Extract links
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "extract links"}'

# Count words
curl -X POST https://getobj.com/api/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "instruction": "count word \"example\""}'

# Using GET endpoint
curl "https://getobj.com/api/crawl?url=https://example.com&instruction=extract%20links"

Use Cases

1. Web Scraping

Extract structured data from websites for analysis or monitoring.

2. Content Migration

Convert web pages to markdown for documentation or CMS migration.

3. SEO Analysis

Extract links, headings, and text content for SEO auditing.

4. Competitive Research

Monitor competitor websites for changes in content.

5. Data Collection

Gather specific information from multiple pages programmatically.

6. Testing

Verify website content and structure in automated tests.


YouTube Transcript API

Extract transcripts/captions from YouTube videos.

Get YouTube Transcript (GET)

Endpoint: GET /api/youtube/transcript

Query Parameters:

Parameter  Type    Required  Description
url        string  Yes       YouTube video URL or video ID
lang       string  No        Language code (default: en). Examples: ko, ja, es, fr

Response:

{
  "success": true,
  "data": {
    "videoId": "dQw4w9WgXcQ",
    "language": "en",
    "segments": [
      {
        "start": 12.645,
        "end": 14.015,
        "text": "So in college,"
      }
    ],
    "fullText": "So in college, I was a government major..."
  }
}

Example:

# Using video ID
curl "https://getobj.com/api/youtube/transcript?url=dQw4w9WgXcQ"

# Using full URL
curl "https://getobj.com/api/youtube/transcript?url=https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# With Korean language
curl "https://getobj.com/api/youtube/transcript?url=jkL7DjPchRo&lang=ko"
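Each segment carries start and end times in seconds, which is enough to produce a standard SRT subtitle file. A minimal Python sketch, assuming the segments shape shown in the response above; to_timestamp and segments_to_srt are illustrative names.

```python
def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

def segments_to_srt(segments: list) -> str:
    """Convert transcript segments into numbered SRT cues separated by blank lines."""
    cues = []
    for number, segment in enumerate(segments, start=1):
        timing = f"{to_timestamp(segment['start'])} --> {to_timestamp(segment['end'])}"
        cues.append(f"{number}\n{timing}\n{segment['text']}\n")
    return "\n".join(cues)
```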

Get YouTube Transcript (POST)

Endpoint: POST /api/youtube/transcript

Headers:

Content-Type: application/json

Request Body:

{
  "url": "VIDEO_ID_OR_URL",
  "lang": "en"
}

Parameters:

Parameter  Type    Required  Description
url        string  Yes       YouTube video URL or video ID
lang       string  No        Language code (default: en)

Example:

curl -X POST https://getobj.com/api/youtube/transcript \
  -H "Content-Type: application/json" \
  -d '{"url": "jkL7DjPchRo", "lang": "ko"}'
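Because a missing caption track returns 404, a client can try several languages in order. The sketch below sends the POST request with the standard library and falls back through a list of language codes; the fetch parameter lets you substitute another transport or a stub. Note that 404 is also returned for unavailable or private videos, in which case the fallback simply exhausts the list. post_transcript and transcript_with_fallback are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request

TRANSCRIPT_URL = "https://getobj.com/api/youtube/transcript"

def post_transcript(video: str, lang: str):
    """POST one request; return (status, parsed JSON body) even for HTTP errors."""
    request = urllib.request.Request(
        TRANSCRIPT_URL,
        data=json.dumps({"url": video, "lang": lang}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Error statuses still carry a JSON body with "success" and "error"
        return err.code, json.load(err)

def transcript_with_fallback(video: str, langs, fetch=post_transcript):
    """Try each language in order; return the first transcript "data" or None."""
    for lang in langs:
        status, body = fetch(video, lang)
        if body.get("success"):
            return body["data"]
        if status != 404:  # only a 404 suggests another language might work
            raise RuntimeError(f"{status}: {body.get('error')}")
    return None
```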

Supported URL Formats


Supported Languages

Use ISO 639-1 language codes:

Code Language
en English
ko Korean
ja Japanese
zh Chinese
es Spanish
fr French
de German
pt Portuguese

Error Responses

Status  Error                            Description
400     Invalid YouTube URL or video ID  The provided URL/ID is not a valid YouTube video
400     Invalid language code            Language code must be ISO 639-1 format (e.g., 'en', 'ko')
400     Video ID or URL is required      Missing the url parameter
404     No subtitle file found           Video doesn't have captions in the requested language
404     No transcript content found      Captions exist but are empty
404     Video is unavailable or private  Video cannot be accessed
503     yt-dlp is not installed          Server dependency missing
504     Request timed out                YouTube rate limiting or network issues
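A 504 may indicate YouTube rate limiting, so retrying with exponential backoff is a reasonable client-side response. This is a sketch under stated assumptions: call is any zero-argument function returning a (status, body) tuple, and the retry policy (only 504, three attempts, delays starting at 2 seconds and doubling) is illustrative rather than a documented recommendation. with_retries is a hypothetical helper.

```python
import time

# ASSUMED policy: only 504 (timeout / possible rate limiting) is retried.
RETRYABLE_STATUSES = {504}

def with_retries(call, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Invoke call() until it returns a non-retryable status or attempts run out.

    call must return a (status, body) tuple; the delay doubles after each retry.
    """
    for attempt in range(attempts):
        status, body = call()
        if status not in RETRYABLE_STATUSES or attempt == attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists so tests (or async callers) can replace the blocking time.sleep.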

Example Error Responses:

// 400 - Invalid input
{
  "success": false,
  "error": "Invalid YouTube URL or video ID: xyz"
}

// 404 - No subtitles
{
  "success": false,
  "error": "No subtitle file found for language: en. The video may not have captions in this language."
}

// 504 - Rate limited
{
  "success": false,
  "error": "Request timed out. The video may be unavailable or YouTube is rate limiting requests."
}

Limitations


Support

For issues or questions:


Changelog

v1.1.0 (2025-12-20)

v1.0.0 (2025-11-25)