Goose-skills orthogonal-exa

Neural web search - find similar content, extract pages, and run deep research

install

source · Clone the upstream repo

git clone https://github.com/gooseworks-ai/goose-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/gooseworks-ai/goose-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/capabilities/orthogonal-exa" ~/.claude/skills/gooseworks-ai-goose-skills-orthogonal-exa && rm -rf "$T"

manifest: skills/capabilities/orthogonal-exa/SKILL.md

Exa - Neural Web Search & Research

Setup

Read your credentials from ~/.gooseworks/credentials.json:

export GOOSEWORKS_API_KEY=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json'))['api_key'])")
export GOOSEWORKS_API_BASE=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json')).get('api_base','https://api.gooseworks.ai'))")

If ~/.gooseworks/credentials.json does not exist, tell the user to run:

npx gooseworks login

All endpoints use Bearer auth:

-H "Authorization: Bearer $GOOSEWORKS_API_KEY"
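The same setup can be sketched in Python for scripts that do not go through the shell. This is a minimal sketch, not part of the skill itself: the helper names `load_credentials` and `auth_headers` are hypothetical, while the file location, the `api_key` and `api_base` keys, and the default base URL come from the commands above.

```python
import json
from pathlib import Path


def load_credentials(path=None):
    """Return (api_key, api_base) read from the gooseworks credentials file."""
    cred_path = Path(path) if path else Path.home() / ".gooseworks" / "credentials.json"
    if not cred_path.exists():
        # Mirrors the instruction above: the user has to log in first.
        raise FileNotFoundError(f"{cred_path} not found; run `npx gooseworks login`")
    data = json.loads(cred_path.read_text())
    # api_base is optional and falls back to the default used in the shell export.
    return data["api_key"], data.get("api_base", "https://api.gooseworks.ai")


def auth_headers(api_key):
    """Headers sent with every call: Bearer auth plus a JSON content type."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```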

Neural search engine for finding similar content, extracting pages, and deep research.

Capabilities

Exa Research: Retrieve a paginated list of your research tasks
Answer: Get an LLM answer to a question informed by Exa search results
Search: The search endpoint lets you intelligently search the web and extract contents from the results
Get a task: Retrieve the status and results of a previously created research task
Find similar links: Find similar links to the link provided and optionally return the contents of the pages
Create a task: Create an asynchronous research task that explores the web, gathers sources, synthesizes findings, and returns results with citations
Get contents: Get the full page contents, summaries, and metadata for a list of URLs

Usage

Exa Research

Retrieve a paginated list of your research tasks. The response follows a cursor-based pagination pattern. Pass the limit parameter to control page size (max 50) and use the cursor token returned in the response to fetch subsequent pages.

Parameters:

cursor (string) - The cursor to paginate through the results. Minimum string length: 1
limit (number) - Number of results per page (1-50). Required range: 1 <= x <= 50

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"exa","path":"/research/v1"}'
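The cursor-based pattern described above can be sketched as a generator. This is an illustration under stated assumptions: `fetch_page` is a hypothetical caller-supplied function that performs the request, and the response field names `data` and `nextCursor` are assumed rather than taken from this document, so check them against a real response.

```python
def iter_research_tasks(fetch_page, limit=50):
    """Yield research tasks across pages using cursor-based pagination.

    fetch_page(limit, cursor) is supplied by the caller; it performs the
    request and returns the decoded JSON response as a dict.
    """
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    cursor = None  # no cursor on the first request
    while True:
        page = fetch_page(limit, cursor)
        # Assumed response fields: "data" (list of tasks), "nextCursor" (str or None).
        yield from page.get("data", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break  # no further pages
```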

Answer

Get an LLM answer to a question informed by Exa search results. /answer performs an Exa search and uses an LLM to generate a response, such as a direct answer for specific queries.

Parameters:

query* (string) - The question or query to answer.
stream (boolean) - If true, the response is returned as a server-sent events (SSE) stream.
text (boolean) - If true, the response includes full text content in the search results

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"exa","path":"/answer","body":{"query":"What are the best practices for prompt engineering?"}}'
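Every call in this document posts the same envelope (`api`, `path`, and an optional `body`) to the `/v1/proxy/orthogonal/run` URL. A small helper makes that shape explicit. This is a sketch using only the standard library; the function name `build_run_request` is hypothetical.

```python
import json
import urllib.request


def build_run_request(api_base, api_key, path, body=None):
    """Build (but do not send) a POST request for the orthogonal run proxy."""
    payload = {"api": "exa", "path": path}
    if body is not None:
        # Endpoint parameters travel inside "body", as in the curl example above.
        payload["body"] = body
    return urllib.request.Request(
        f"{api_base}/v1/proxy/orthogonal/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; building the request separately keeps the payload easy to inspect before any network call is made.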

Search

The search endpoint lets you intelligently search the web and extract contents from the results. By default, it automatically chooses the best search method using Exa’s embeddings-based model and other techniques to find the most relevant results for your query.

Parameters:

query* (string) - The query string for the search.
additionalQueries (string[]) - Additional query variations for deep search. Only works with type="deep". When provided, these queries are used alongside the main query for comprehensive results.
type (enum<string>) - The type of search. Neural uses an embeddings-based model, auto (default) intelligently combines neural and other search methods, fast uses streamlined versions of the search models, and deep provides comprehensive search with query expansion and detailed context.
category (enum<string>) - A data category to focus on. The people and company categories have improved quality for finding LinkedIn profiles and company pages. Note: The company and people categories only support a limited set of filters. The following parameters are NOT supported for these categories: startPublishedDate, endPublishedDate, startCrawlDate, endCrawlDate, includeText, excludeText, excludeDomains. For people category, includeDomains only accepts LinkedIn domains. Using unsupported parameters will result in a 400 error.
userLocation (string) - The two-letter ISO country code of the user, e.g. US.
numResults (integer) - Number of results to return. Limits vary by search type: with "neural", max 100 results; with "deep", max 100 results. To increase the number of results beyond these limits, contact sales (hello@exa.ai).
includeDomains (string[]) - List of domains to include in the search. If specified, results will only come from these domains.
excludeDomains (string[]) - List of domains to exclude from search results. If specified, no results will be returned from these domains.
startCrawlDate (string<date-time>) - Crawl date refers to the date that Exa discovered a link. Results will include links that were crawled after this date. Must be specified in ISO 8601 format.
endCrawlDate (string<date-time>) - Crawl date refers to the date that Exa discovered a link. Results will include links that were crawled before this date. Must be specified in ISO 8601 format.
startPublishedDate (string<date-time>) - Only links with a published date after this will be returned. Must be specified in ISO 8601 format.
endPublishedDate (string<date-time>) - Only links with a published date before this will be returned. Must be specified in ISO 8601 format.
includeText (string[]) - List of strings that must be present in webpage text of results. Currently, only 1 string is supported, of up to 5 words.
excludeText (string[]) - List of strings that must not be present in webpage text of results. Currently, only 1 string is supported, of up to 5 words. Checks from the first 1000 words of the webpage text.
context (string) - Return page contents as a context string for LLM. When true, combines all result contents into one string. We recommend using 10000+ characters for best results, though no limit works best. Context strings often perform better than highlights for RAG applications.
moderation (boolean) - Enable content moderation to filter unsafe content from search results.
contents (object)

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "api": "exa",
  "path": "/search",
  "body": {
    "query": "startups building AI coding assistants",
    "numResults": 10,
    "contents": {"text": true}
  }
}'
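Several of the search parameters carry constraints that produce a 400 error when violated, so a client-side check before sending can save a round trip. The sketch below encodes only the rules stated in the parameter list above; `check_search_body` is a hypothetical helper, not part of the API.

```python
# Filters the parameter list marks as unsupported for the people and company categories.
UNSUPPORTED_FOR_PEOPLE_AND_COMPANY = {
    "startPublishedDate", "endPublishedDate", "startCrawlDate", "endCrawlDate",
    "includeText", "excludeText", "excludeDomains",
}


def check_search_body(body):
    """Return a list of problems found in a /search body (empty if none)."""
    problems = []
    if not body.get("query"):
        problems.append("query is required")
    if body.get("additionalQueries") and body.get("type") != "deep":
        problems.append('additionalQueries only works with type="deep"')
    if body.get("category") in ("people", "company"):
        for name in sorted(UNSUPPORTED_FOR_PEOPLE_AND_COMPANY & set(body)):
            problems.append(f"{name} is not supported for category {body['category']}")
    for name in ("includeText", "excludeText"):
        values = body.get(name) or []
        # Currently only one string of up to five words is accepted.
        if len(values) > 1 or any(len(v.split()) > 5 for v in values):
            problems.append(f"{name} accepts one string of up to 5 words")
    return problems
```

For example, `check_search_body({"query": "x", "category": "company", "excludeDomains": ["a.com"]})` reports that `excludeDomains` is not supported for the company category.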

Get a task

Retrieve the status and results of a previously created research task. Use the unique researchId returned from POST /research/v1 to poll until the task is finished.

Parameters:

stream (string) - Set to "true" to receive real-time updates via Server-Sent Events (SSE)
events (string) - Set to "true" to include the detailed event log of all operations performed

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"exa","path":"/research/v1/{researchId}"}'
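Polling until the task is finished can be sketched as a bounded loop. The names here are hypothetical: `fetch_task` stands for whatever function performs the request for a given path, and `is_done` is left to the caller because this document does not list the status values a task can take.

```python
import time


def research_task_path(research_id):
    """Fill the {researchId} placeholder in the task path."""
    return f"/research/v1/{research_id}"


def poll_until_done(fetch_task, research_id, is_done,
                    interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_task(path) repeatedly until is_done(task) returns True."""
    path = research_task_path(research_id)
    for attempt in range(max_attempts):
        task = fetch_task(path)
        if is_done(task):
            return task
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"task {research_id} not finished after {max_attempts} attempts")
```

A caller might pass something like `is_done=lambda task: task.get("status") == "completed"`, where both the `status` field and its value are assumptions to verify against a real response.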

Find similar links

Find similar links to the link provided and optionally return the contents of the pages.

Parameters:

url* (string) - The url for which you would like to find similar links.
numResults (integer) - Number of results to return. Limits vary by search type: with "neural", max 100 results; with "deep", max 100 results. To increase the number of results beyond these limits, contact sales (hello@exa.ai).
includeDomains (string[]) - List of domains to include in the search. If specified, results will only come from these domains.
excludeDomains (string[]) - List of domains to exclude from search results. If specified, no results will be returned from these domains.
startCrawlDate (string<date-time>) - Crawl date refers to the date that Exa discovered a link. Results will include links that were crawled after this date. Must be specified in ISO 8601 format.
endCrawlDate (string<date-time>) - Crawl date refers to the date that Exa discovered a link. Results will include links that were crawled before this date. Must be specified in ISO 8601 format.
startPublishedDate (string<date-time>) - Only links with a published date after this will be returned. Must be specified in ISO 8601 format.
endPublishedDate (string<date-time>) - Only links with a published date before this will be returned. Must be specified in ISO 8601 format.
includeText (string[]) - List of strings that must be present in webpage text of results. Currently, only 1 string is supported, of up to 5 words.
excludeText (string[]) - List of strings that must not be present in webpage text of results. Currently, only 1 string is supported, of up to 5 words. Checks from the first 1000 words of the webpage text.
context (string) - Return page contents as a context string for LLM. When true, combines all result contents into one string. We recommend using 10000+ characters for best results, though no limit works best. Context strings often perform better than highlights for RAG applications.
moderation (boolean) - Enable content moderation to filter unsafe content from search results.
contents (object)

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "api": "exa",
  "path": "/findSimilar",
  "body": {
    "url": "https://example.com/article",
    "numResults": 10
  }
}'

Create a task

Create an asynchronous research task that explores the web, gathers sources, synthesizes findings, and returns results with citations.

Parameters:

instructions* (string) - Instructions for what you would like research on. A good prompt clearly defines what information you want to find, how research should be conducted, and what the output should look like.
model (enum<string>) - Research model to use. exa-research is faster and cheaper, while exa-research-pro provides more thorough analysis and stronger reasoning.
outputSchema (object) - JSON Schema to enforce structured output. When provided, the research output will be validated against this schema and returned as parsed JSON.

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"exa","path":"/research/v1","body":{"instructions":"Research the current state of AI coding assistants"}}'
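When structured output is needed, the `outputSchema` parameter takes a JSON Schema alongside the instructions. The sketch below only assembles the request payload; `research_body` is a hypothetical helper, and the example schema and its field names are invented for illustration.

```python
def research_body(instructions, model=None, output_schema=None):
    """Build the body for creating a research task."""
    if not instructions or not instructions.strip():
        raise ValueError("instructions is required")
    body = {"instructions": instructions}
    if model is not None:
        body["model"] = model  # e.g. "exa-research" or "exa-research-pro"
    if output_schema is not None:
        body["outputSchema"] = output_schema  # JSON Schema as a plain dict
    return body


# Illustrative schema: a list of assistants, each with a name and a summary.
example_schema = {
    "type": "object",
    "properties": {
        "assistants": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "summary": {"type": "string"},
                },
                "required": ["name"],
            },
        }
    },
    "required": ["assistants"],
}

payload = {
    "api": "exa",
    "path": "/research/v1",
    "body": research_body(
        "Research the current state of AI coding assistants",
        model="exa-research",
        output_schema=example_schema,
    ),
}
```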

Get contents

Get the full page contents, summaries, and metadata for a list of URLs. Returns instant results from our cache, with automatic live crawling as fallback for uncached pages.

Parameters:

urls* (string[]) - Array of URLs to crawl (backwards compatible with 'ids' parameter).
ids (string[]) - Deprecated - use 'urls' instead. Array of document IDs obtained from searches.
text (string) - If true, returns full page text with default settings. If false, disables text return.
highlights (object) - Text snippets the LLM identifies as most relevant from each page.
summary (object) - Summary of the webpage
livecrawl (enum<string>) - Options for livecrawling pages. 'never': Disable livecrawling (default for neural search). 'fallback': Livecrawl when cache is empty. 'preferred': Always try to livecrawl, but fall back to cache if crawling fails. 'always': Always live-crawl, never use cache. Only use if you cannot tolerate any cached content. This option is not recommended unless consulted with the Exa team.
livecrawlTimeout (integer) - The timeout for livecrawling in milliseconds.
subpages (integer) - The number of subpages to crawl. The actual number crawled may be limited by system constraints.
subpageTarget (string) - Term to find specific subpages of search results. Can be a single string or an array of strings, comma delimited.
extras (object) - Extra parameters to pass.
context (string) - Return page contents as a context string for LLM. When true, combines all result contents into one string. We recommend using 10000+ characters for best results, though no limit works best. Context strings often perform better than highlights for RAG applications.

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "api": "exa",
  "path": "/contents",
  "body": {
    "urls": ["https://example.com"],
    "text": true,
    "summary": true
  }
}'
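The livecrawl options listed above are a closed set, so a body builder can reject typos before the request is sent. This is a sketch with a hypothetical helper name, `contents_body`; it uses `urls` rather than the deprecated `ids` and covers only a few of the documented parameters.

```python
# The four livecrawl values documented in the parameter list.
LIVECRAWL_OPTIONS = ("never", "fallback", "preferred", "always")


def contents_body(urls, text=True, summary=None, livecrawl=None, livecrawl_timeout_ms=None):
    """Build a body for the /contents path from the documented parameters."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    body = {"urls": list(urls), "text": text}
    if summary is not None:
        body["summary"] = summary
    if livecrawl is not None:
        if livecrawl not in LIVECRAWL_OPTIONS:
            raise ValueError(f"livecrawl must be one of {LIVECRAWL_OPTIONS}")
        body["livecrawl"] = livecrawl
    if livecrawl_timeout_ms is not None:
        body["livecrawlTimeout"] = int(livecrawl_timeout_ms)  # milliseconds
    return body
```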

Use Cases

Competitive Research: Find companies similar to competitors
Content Discovery: Find related articles and resources
Market Research: Discover companies in specific niches
Fact-Finding: Get sourced answers to questions
Deep Research: Comprehensive research on complex topics

Discover More

For full endpoint details and parameters:

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/search \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"exa API endpoints"}'   # List all endpoints
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/details \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"exa","path":"/research"}'   # Get endpoint details