Name: document-extraction-api
Author: iterationlayer

Extract structured data from documents using AI-powered field extraction.

install

source · Clone the upstream repo

git clone https://github.com/iterationlayer/skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/iterationlayer/skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/document-extraction-api" ~/.claude/skills/iterationlayer-skills-document-extraction-api && rm -rf "$T"

manifest: skills/document-extraction-api/SKILL.md

source content

Document Extraction API

Extract structured data from documents using AI-powered field extraction.

Cost: 1 credit per page

Prerequisites

You need an Iteration Layer API key. Get one at platform.iterationlayer.com — free trial credits included, no credit card required.

For full integration guidance (SDKs, auth, MCP, error handling), see the Iteration Layer Integration Guide.

API Reference

Extract structured data from any document with a single API call. Send one or more files and a schema defining the fields you need, and receive typed, validated results with confidence scores and source citations.

Key Features

Multi-Format Support — Extract from 40+ file formats: PDF, Office documents (DOCX, PPTX, ODT, ODS, XLSX), EPUB, LaTeX, email (EML), Jupyter notebooks, images, and text/markup formats.
Rich Field Types — Primitive types (text, number, date, boolean, email, enum) plus purpose-built validated types:
`IBAN` (validated against the standard format), `ADDRESS` (returns a structured object with street, city, region, postal code, and country), `CURRENCY_CODE` (normalized to ISO 4217), `CURRENCY_AMOUNT` (numeric monetary value), and `COUNTRY` (normalized to ISO 3166-1 alpha-2). These validated types give you clean, usable data rather than raw strings you have to parse yourself.
Structured Arrays — Extract repeating data like invoice line items with nested schemas.
Calculated Fields — Define arithmetic operations (sum, subtract, multiply, divide) computed from other extracted fields.
Confidence Scores — Every extracted value includes a confidence score between 0 and 1.
Source Citations — Verbatim quotes from the document that support each extracted value.
Schema Validation — Field schemas are validated before extraction, catching errors like circular dependencies or type mismatches early.

Overview

The Document Extraction API analyzes documents and extracts structured data based on a schema you define. You send one or more files (base64 or URL) and a schema with field definitions, and receive a JSON response with typed values, confidence scores, and citations.

Endpoint:

POST /document-extraction/v1/extract

Limits:

Max files per request: 20
Max file size: 50 MB per file
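
A client-side pre-flight check can catch limit violations before anything is uploaded. The sketch below is illustrative and not part of the API: the helper name `check_limits` is hypothetical, and it assumes that 1 MB means 1,000,000 bytes and that the limit applies to the raw file size, neither of which is stated here.

```python
MAX_FILES = 20                    # documented: max files per request
MAX_FILE_BYTES = 50 * 1_000_000   # documented 50 MB; decimal megabytes are an assumption


def check_limits(file_sizes: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the batch looks OK.

    file_sizes maps a filename to its raw size in bytes (before base64 encoding).
    """
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append(f"{len(file_sizes)} files exceeds the limit of {MAX_FILES}")
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is {size} bytes, over the {MAX_FILE_BYTES}-byte limit")
    return problems
```

The server remains the authority on these limits; a check like this only avoids a round trip for obviously oversized batches.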

Supported File Formats

Documents: PDF, DOCX, PPTX, ODT, EPUB, RTF
Spreadsheets: XLSX, XLS, ODS, CSV, TSV
Email: EML, MSG (headers, body, and attachment extraction — attachments are ingested and included in extraction context)
Notebooks: Jupyter (.ipynb)
Academic & Publishing: LaTeX (.tex, .latex), BibTeX (.bib), Typst (.typst, .typ)
Markup & Text: HTML, Markdown, JSON, XML, YAML, TOML, RST, Org, Djot, MDX, TXT
Images: PNG, JPEG, GIF, WebP, AVIF, HEIF, BMP, TIFF, JP2, PNM/PBM/PGM/PPM, SVG

How It Works

Every extraction runs the same pipeline:

Validate — the schema is checked before any files are touched. Three things are validated: CALCULATED source field references (each must exist in the schema and be a numeric type), circular dependencies between CALCULATED fields, and default values matching their field type. The first failure stops validation with a descriptive error. No LLM call is made if the schema is invalid.
Ingest — files are converted to a normalized text representation.
Extract — each field is extracted according to its type and configuration. All non-CALCULATED fields are handled together, with all source files available during extraction. Every extracted value is tagged with the file it came from.
Calculate — CALCULATED fields are computed from the extraction results. Operations are pure arithmetic:
`sum`, `subtract`, `multiply`, `divide`. The confidence of a CALCULATED field is the minimum confidence of its source fields. Division by zero returns `0`. The result is deterministic: if the source values are correct, the computed value is exact.
Consolidate — default values are applied to fields that were not extracted, and required field constraints are checked.
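
To make the Validate step more concrete, here is a minimal local sketch of two of its three checks: CALCULATED source references and circular dependencies. It is an approximation written for illustration, not the service's implementation. The function name `validate_calculated_fields` is hypothetical, and the default-value type check is left out.

```python
NUMERIC_TYPES = {"INTEGER", "DECIMAL", "CURRENCY_AMOUNT", "CALCULATED"}


def validate_calculated_fields(fields: list[dict]) -> None:
    """Raise ValueError on the first problem found (first failure stops validation)."""
    by_name = {f["name"]: f for f in fields}

    # 1. Every CALCULATED source reference must exist and be a numeric type.
    for field in fields:
        if field["type"] != "CALCULATED":
            continue
        for src in field["source_field_names"]:
            if src not in by_name:
                raise ValueError(f"{field['name']}: unknown source field {src!r}")
            if by_name[src]["type"] not in NUMERIC_TYPES:
                raise ValueError(f"{field['name']}: source field {src!r} is not numeric")

    # 2. Depth-first search for cycles among CALCULATED fields.
    state = {}  # field name -> "visiting" or "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular dependency involving {name!r}")
        state[name] = "visiting"
        field = by_name[name]
        if field["type"] == "CALCULATED":
            for src in field["source_field_names"]:
                visit(src)
        state[name] = "done"

    for field in fields:
        visit(field["name"])
```

Running a check like this before sending a request is optional, since the API performs its own validation before any extraction work starts.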

Intelligent extraction — the API automatically selects the best extraction approach based on the complexity of your schema and the nature of the documents. You don't configure this.

Request Format

curl -X POST \
  https://api.iterationlayer.com/document-extraction/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "type": "base64",
        "name": "invoice.pdf",
        "base64": "<base64-encoded-file>"
      }
    ],
    "schema": {
      "fields": [
        {
          "name": "invoice_number",
          "type": "TEXT",
          "description": "The invoice number"
        },
        {
          "name": "total_amount",
          "type": "CURRENCY_AMOUNT",
          "description": "Total invoice amount"
        }
      ]
    }
  }'

import { IterationLayer } from "iterationlayer";
const client = new IterationLayer({
  apiKey: "YOUR_API_KEY",
});

const result = await client.extract({
  files: [
    {
      type: "base64",
      name: "invoice.pdf",
      base64: "<base64-encoded-file>",
    },
  ],
  schema: {
    fields: [
      {
        name: "invoice_number",
        type: "TEXT",
        description: "The invoice number",
      },
      {
        name: "total_amount",
        type: "CURRENCY_AMOUNT",
        description: "Total invoice amount",
      },
    ],
  },
});

from iterationlayer import IterationLayer
client = IterationLayer(api_key="YOUR_API_KEY")

result = client.extract(
    files=[
        {
            "type": "base64",
            "name": "invoice.pdf",
            "base64": "<base64-encoded-file>",
        }
    ],
    schema={
        "fields": [
            {
                "name": "invoice_number",
                "type": "TEXT",
                "description": "The invoice number",
            },
            {
                "name": "total_amount",
                "type": "CURRENCY_AMOUNT",
                "description": "Total invoice amount",
            },
        ]
    },
)

import il "github.com/iterationlayer/sdk-go"
client := il.NewClient("YOUR_API_KEY")

result, err := client.Extract(il.ExtractRequest{
	Files: []il.FileInput{
		il.NewFileFromBase64(
			"invoice.pdf",
			"<base64-encoded-file>",
		),
	},
	Schema: il.ExtractionSchema{
		"invoice_number": il.NewTextFieldConfig(
			"invoice_number",
			"The invoice number",
		),
		"total_amount": il.NewCurrencyAmountFieldConfig(
			"total_amount",
			"Total invoice amount",
		),
	},
})

Response:

{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-001",
      "confidence": 0.98,
      "citations": ["Invoice No: INV-2024-001"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: €1,250.00"],
      "source": "invoice.pdf"
    }
  }
}

Top-Level Fields

Field	Type	Required	Description
`files`	array	Yes	List of file inputs to extract from (see File Input below)
`schema`	object	Yes	Extraction schema defining the fields to extract
`webhook_url`	string	No	HTTPS URL to receive results asynchronously. If provided, returns 201 immediately. See Webhooks.

Async Mode

Add a `webhook_url` parameter to process the request in the background. The API returns `201 Accepted` immediately and delivers the result to your webhook URL when processing completes. See Webhooks for payload format and retry behavior.
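
As a small illustration, a request body can be switched to async mode by adding the `webhook_url` key. The helper below is a hypothetical sketch (`with_webhook` is not an SDK function). It only enforces the documented requirement that the URL uses HTTPS and does not model the webhook payload itself.

```python
from urllib.parse import urlparse


def with_webhook(payload: dict, webhook_url: str) -> dict:
    """Return a copy of the request body with webhook_url set."""
    parsed = urlparse(webhook_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("webhook_url must be an absolute HTTPS URL")
    # Copy instead of mutating so the same payload can still be sent synchronously.
    return {**payload, "webhook_url": webhook_url}
```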

File Input

Provide each file as either base64 or a URL:

Field	Type	Required	Description
`type`	string	Yes	`"base64"` or `"url"`
`name`	string	Yes	Filename with extension (e.g., `invoice.pdf`)
`base64`	string	If type is `base64`	Base64-encoded file content
`url`	string	If type is `url`	Publicly accessible URL to fetch the file from

URL Example

{
  "files": [
    {
      "type": "url",
      "name": "invoice.pdf",
      "url": "https://example.com/invoice.pdf"
    }
  ],
  "schema": {
    "fields": [
      {
        "name": "invoice_number",
        "type": "TEXT",
        "description": "The invoice number"
      }
    ]
  }
}
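
Both file input shapes can be produced with a few lines of standard-library Python. This is a sketch under the field names documented above. The helper names (`file_from_bytes`, `file_from_path`, `file_from_url`) are hypothetical and not part of any SDK.

```python
import base64
from pathlib import Path


def file_from_bytes(name: str, content: bytes) -> dict:
    """Build a base64 file input from raw bytes."""
    encoded = base64.b64encode(content).decode("ascii")
    return {"type": "base64", "name": name, "base64": encoded}


def file_from_path(path: str) -> dict:
    """Build a base64 file input from a local file, keeping its filename and extension."""
    p = Path(path)
    return file_from_bytes(p.name, p.read_bytes())


def file_from_url(name: str, url: str) -> dict:
    """Build a URL file input; the URL must be publicly accessible."""
    return {"type": "url", "name": name, "url": url}
```

The resulting dictionaries can be placed directly in the `files` array of a request body.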

Schema Definition

The `schema` object contains a `fields` array. Each field has these base properties:

Field	Type	Required	Description
`name`	string	Yes	Unique field identifier (used as key in the response)
`type`	string	Yes	One of the supported field types (see Field Types below)
`description`	string	Yes	Natural language description of what to extract
`is_required`	boolean	No	If `true`, returns an error when the field cannot be extracted and has no default. Default: `false`

Each field type supports additional type-specific properties described below.

Field Types

TEXT

Short single-line text value.

Property	Type	Required	Description
`max_length`	integer	No	Maximum character length (> 0)
`default_value`	string	No	Default value if not found

{
  "name": "company_name",
  "type": "TEXT",
  "description": "Name of the company"
}

TEXTAREA

Multi-line text value.

Property	Type	Required	Description
`max_length`	integer	No	Maximum character length (> 0)
`default_value`	string	No	Default value if not found

{
  "name": "notes",
  "type": "TEXTAREA",
  "description": "Additional notes or comments"
}

INTEGER

Whole number value.

Property	Type	Required	Description
`min`	integer	No	Minimum value (inclusive)
`max`	integer	No	Maximum value (inclusive)
`unit`	string	No	Unit label (e.g., `"kg"`, `"items"`)
`default_value`	integer	No	Default value if not found

{
  "name": "quantity",
  "type": "INTEGER",
  "description": "Number of items ordered",
  "min": 1
}

DECIMAL

Floating-point number value.

Property	Type	Required	Description
`min`	float	No	Minimum value (inclusive)
`max`	float	No	Maximum value (inclusive)
`decimal_points`	integer	No	Number of decimal places to round to (>= 0)
`unit`	string	No	Unit label
`default_value`	float	No	Default value if not found

{
  "name": "weight",
  "type": "DECIMAL",
  "description": "Package weight",
  "unit": "kg",
  "decimal_points": 2
}

DATE

Calendar date, extracted as an ISO 8601 string (`YYYY-MM-DD`).

Property	Type	Required	Description
`allow_future_dates`	boolean	No	Whether to allow dates in the future
`allow_past_dates`	boolean	No	Whether to allow dates in the past

{
  "name": "invoice_date",
  "type": "DATE",
  "description": "Date the invoice was issued"
}

DATETIME

Date and time, extracted as an ISO 8601 datetime string.

Property	Type	Required	Description
`allow_future_dates`	boolean	No	Whether to allow dates in the future
`allow_past_dates`	boolean	No	Whether to allow dates in the past

{
  "name": "timestamp",
  "type": "DATETIME",
  "description": "Transaction timestamp"
}

TIME

Time value (e.g., `"14:30:00"`). No additional parameters.

{
  "name": "delivery_time",
  "type": "TIME",
  "description": "Scheduled delivery time"
}

ENUM

One or more values from a predefined list. Extracted as a string array.

Property	Type	Required	Description
`values`	string[]	Yes	Allowed options
`min_selected`	integer	No	Minimum number of selected values (>= 0)
`max_selected`	integer	No	Maximum number of selected values (> 0)
`default_value`	string[]	No	Default selected values

{
  "name": "payment_method",
  "type": "ENUM",
  "description": "How the invoice was paid",
  "values": ["bank_transfer", "credit_card", "cash", "paypal"],
  "max_selected": 1
}

BOOLEAN

True or false value.

Property	Type	Required	Description
`default_value`	boolean	No	Default value if not found

{
  "name": "is_paid",
  "type": "BOOLEAN",
  "description": "Whether the invoice has been paid"
}

EMAIL

Email address string.

Property	Type	Required	Description
`default_value`	string	No	Default value if not found

{
  "name": "contact_email",
  "type": "EMAIL",
  "description": "Contact email address"
}

IBAN

International Bank Account Number. Validated against the pattern `^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$`.

Property	Type	Required	Description
`default_value`	string	No	Default value if not found

{
  "name": "bank_account",
  "type": "IBAN",
  "description": "Recipient IBAN"
}
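
The documented pattern can be applied locally with Python's `re` module, for example to double-check values before storing them. The sketch below is illustrative, and `looks_like_iban` is a hypothetical helper name. The pattern checks the format only: it does not verify the mod-97 check digits or country-specific lengths.

```python
import re

# Pattern copied from the IBAN field type description above.
IBAN_PATTERN = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$")


def looks_like_iban(value: str) -> bool:
    """Format check only: two letters, two digits, then 11 to 30 alphanumerics."""
    return IBAN_PATTERN.match(value) is not None
```

Values printed with spaces or in lowercase fail this check as written, so strip whitespace and uppercase the input first if your data may contain such forms.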

COUNTRY

ISO 3166-1 alpha-2 country code (e.g., `"DE"`, `"US"`).

Property	Type	Required	Description
`default_value`	string	No	Must be a valid ISO 3166-1 alpha-2 code

{
  "name": "origin_country",
  "type": "COUNTRY",
  "description": "Country of origin"
}

CURRENCY_CODE

ISO 4217 currency code (e.g., `"EUR"`, `"USD"`).

Property	Type	Required	Description
`default_value`	string	No	Must be a valid ISO 4217 code

{
  "name": "currency",
  "type": "CURRENCY_CODE",
  "description": "Invoice currency"
}

CURRENCY_AMOUNT

Numeric monetary amount.

Property	Type	Required	Description
`min`	float	No	Minimum value (inclusive)
`max`	float	No	Maximum value (inclusive)
`decimal_points`	integer	No	Number of decimal places to round to (>= 0)
`default_value`	float	No	Default value if not found

{
  "name": "total_amount",
  "type": "CURRENCY_AMOUNT",
  "description": "Total invoice amount",
  "decimal_points": 2
}

ADDRESS

Structured address object. Extracted as an object with `street`, `city`, `region`, `postal_code`, and `country` fields.

Property	Type	Required	Description
`allowed_country_codes`	string[]	No	Restrict to specific ISO 3166-1 alpha-2 country codes

Response value shape:

{
  "street": "123 Main St",
  "city": "Berlin",
  "region": "Berlin",
  "postal_code": "10115",
  "country": "DE"
}

{
  "name": "billing_address",
  "type": "ADDRESS",
  "description": "Billing address",
  "allowed_country_codes": ["DE", "AT", "CH"]
}

ARRAY

A list of structured objects, each conforming to a nested schema. Use this for repeating data like line items.

Property	Type	Required	Description
`fields`	array	Yes	Array of field configurations for each item

The `fields` array uses the same field configuration format as top-level fields.

{
  "name": "line_items",
  "type": "ARRAY",
  "description": "Invoice line items",
  "fields": [
    {
      "name": "description",
      "type": "TEXT",
      "description": "Item description"
    },
    {
      "name": "quantity",
      "type": "INTEGER",
      "description": "Quantity ordered",
      "min": 1
    },
    {
      "name": "unit_price",
      "type": "CURRENCY_AMOUNT",
      "description": "Price per unit",
      "decimal_points": 2
    },
    {
      "name": "total",
      "type": "CURRENCY_AMOUNT",
      "description": "Line item total",
      "decimal_points": 2
    }
  ]
}

CALCULATED

A derived numeric value computed from other extracted fields. Not extracted from the document — calculated locally after all source fields are resolved.

Property	Type	Required	Description
`operation`	string	Yes	One of: `"sum"`, `"subtract"`, `"multiply"`, `"divide"`
`source_field_names`	string[]	Yes	Names of fields to apply the operation to, in order
`unit`	string	No	Unit label

Source fields must be numeric types: `INTEGER`, `DECIMAL`, `CURRENCY_AMOUNT`, or another `CALCULATED`. Circular dependencies are detected and rejected at validation time.

The confidence score of a CALCULATED field is the minimum confidence of its source fields. Division by zero returns `0`.

{
  "name": "tax_amount",
  "type": "CALCULATED",
  "description": "Tax amount (total minus net)",
  "operation": "subtract",
  "source_field_names": ["total_amount", "net_amount"]
}
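
The arithmetic behind a CALCULATED field can be sketched locally, which is handy for cross-checking results. This is an illustration of the documented rules (minimum confidence, division by zero yields `0`), not the service's code. The function name `calculate` is hypothetical, and folding more than two operands from left to right is an assumption.

```python
from functools import reduce


def calculate(operation: str, sources: list[dict]) -> dict:
    """Compute a CALCULATED result from source results, applied in order.

    Each source is a dict with "value" and "confidence", as in the response format.
    """
    values = [s["value"] for s in sources]
    if operation == "sum":
        value = sum(values)
    elif operation == "subtract":
        value = reduce(lambda a, b: a - b, values)  # first minus the rest (assumption)
    elif operation == "multiply":
        value = reduce(lambda a, b: a * b, values)
    elif operation == "divide":
        # Documented behavior: division by zero yields 0 instead of an error.
        value = 0 if any(v == 0 for v in values[1:]) else reduce(lambda a, b: a / b, values)
    else:
        raise ValueError(f"unknown operation: {operation}")
    return {
        "type": "CALCULATED",
        "value": value,
        # Documented behavior: confidence is the minimum of the source confidences.
        "confidence": min(s["confidence"] for s in sources),
    }
```

For the `tax_amount` example above, a `total_amount` of 1250.0 (confidence 0.95) and a `net_amount` of 1000.0 (confidence 0.98) would give a value of 250.0 with confidence 0.95.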

Response Format

Success Response

{
  "success": true,
  "data": {
    "invoice_number": {
      "type": "TEXT",
      "value": "INV-2024-001",
      "confidence": 0.98,
      "citations": ["Invoice No: INV-2024-001"],
      "source": "invoice.pdf"
    },
    "total_amount": {
      "type": "CURRENCY_AMOUNT",
      "value": 1250.00,
      "confidence": 0.95,
      "citations": ["Total: €1,250.00"],
      "source": "invoice.pdf"
    }
  }
}

Each field in `data` contains:

Field	Type	Description
`type`	string	The field type from the schema
`value`	varies	Extracted value (type depends on field type — see below)
`confidence`	float	Confidence score between 0.0 and 1.0
`citations`	string[]	Verbatim quotes from the source document
`source`	string	Filename the value was extracted from

Value types by field type:

Field Type	Value Type
`TEXT`, `TEXTAREA`, `EMAIL`, `IBAN`, `COUNTRY`, `CURRENCY_CODE`, `DATE`, `DATETIME`, `TIME`	string
`INTEGER`, `DECIMAL`, `CURRENCY_AMOUNT`, `CALCULATED`	number
`BOOLEAN`	boolean
`ENUM`	string[]
`ADDRESS`	object
`ARRAY`	object[]

Fields that could not be extracted and have no `default_value` are omitted from `data` (unless `is_required` is `true`, which causes an error). Fields resolved via `default_value` have a confidence of `1.0`.
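
One common way to consume this response shape is to route low-confidence values to manual review. The sketch below is a hypothetical example: the helper `split_by_confidence`, the thresholds, and the `due_date` field are illustrative choices, not API features. It reuses the sample response shown above.

```python
def split_by_confidence(data: dict, threshold: float = 0.9) -> tuple[dict, dict]:
    """Split extracted fields into (accepted, needs_review) by confidence."""
    accepted, needs_review = {}, {}
    for name, result in data.items():
        bucket = accepted if result["confidence"] >= threshold else needs_review
        bucket[name] = result["value"]
    return accepted, needs_review


response = {
    "success": True,
    "data": {
        "invoice_number": {"type": "TEXT", "value": "INV-2024-001", "confidence": 0.98,
                           "citations": ["Invoice No: INV-2024-001"], "source": "invoice.pdf"},
        "total_amount": {"type": "CURRENCY_AMOUNT", "value": 1250.00, "confidence": 0.95,
                         "citations": ["Total: €1,250.00"], "source": "invoice.pdf"},
    },
}

accepted, needs_review = split_by_confidence(response["data"], threshold=0.97)
# Optional fields that were not extracted are absent from "data", so read them with .get().
due_date = response["data"].get("due_date")
```

With a threshold of 0.97, `invoice_number` is accepted and `total_amount` is flagged for review. Because values resolved from `default_value` carry a confidence of `1.0`, they always pass a threshold like this.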

Recipes

For complete, runnable examples see the Recipes page.

Extract Invoice Data -- Extract line items, totals, and vendor details from an invoice into structured JSON.
Extract Resume Data -- Extract contact info, work history, and skills from a resume into structured data.
Extract Medical Record -- Extract patient details, diagnoses, and medications from a medical record into structured JSON.
Extract Receipt Data -- Extract merchant, amount, date, and line items from a receipt image or PDF.
Extract Multi-Invoice Data -- Extract structured data from multiple invoice files in a single API call using an array schema.

Error Responses

All errors return a JSON body of the form `{ "success": false, "error": "<message>" }`.

Status	Description
400	Invalid request (missing files/schema, invalid base64, URL fetch failure, file size exceeded, invalid field config)
401	Missing or invalid API key
402	Insufficient credits or budget cap exceeded
422	Processing error (circular dependency in CALCULATED fields, required field not extractable, LLM parsing failure)
429	Rate limit exceeded
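
Client code usually branches on these status codes. The sketch below maps them to coarse categories using only the table above. The function name `classify_error` is hypothetical, and treating only 429 as retryable is an assumption rather than documented behavior.

```python
RETRYABLE = {429}  # assumption: only rate limiting is worth retrying unchanged

CATEGORIES = {
    400: "invalid request",
    401: "authentication",
    402: "credits",
    422: "processing",
    429: "rate limit",
}


def classify_error(status: int, body: dict) -> dict:
    """Summarize an error response for logging or control flow."""
    return {
        "category": CATEGORIES.get(status, "unknown"),
        "message": body.get("error", ""),
        "retry": status in RETRYABLE,
    }
```

A 400 or 422 generally means the request or schema needs to change, so resending it unmodified is unlikely to help.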