Skillshub langsmith-dataset
INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool.
```shell
git clone https://github.com/ComeOnOliver/skillshub
T=$(mktemp -d) && git clone --depth=1 https://github.com/ComeOnOliver/skillshub "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/Harmeet10000/skills/langsmith-dataset" ~/.claude/skills/comeonoliver-skillshub-langsmith-dataset && rm -rf "$T"
```
skills/Harmeet10000/skills/langsmith-dataset/SKILL.md
<setup>
```shell
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here   # Required
LANGSMITH_PROJECT=your-project-name           # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id      # Optional: for org-scoped keys
```
IMPORTANT: Always check the environment variables or `.env` file for LANGSMITH_PROJECT before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.
Python Dependencies
```shell
pip install langsmith
```
JavaScript Dependencies
```shell
npm install langsmith
```
CLI Tool
```shell
curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
```
</setup>
<usage>
Use the `langsmith` CLI to manage datasets and examples.
Dataset Commands
- List datasets in LangSmith: `langsmith dataset list`
- View dataset details: `langsmith dataset get <name-or-id>`
- Create a new empty dataset: `langsmith dataset create --name <name>`
- Delete a dataset: `langsmith dataset delete <name-or-id>`
- Export dataset to local JSON file: `langsmith dataset export <name-or-id> <output-file>`
- Upload a local JSON file as a dataset: `langsmith dataset upload <file> --name <name>`
Example Commands
- List examples in a dataset: `langsmith example list --dataset <name>`
- Add an example to a dataset: `langsmith example create --dataset <name> --inputs <json>`
- Delete an example: `langsmith example delete <example-id>`
Experiment Commands
- List experiments for a dataset: `langsmith experiment list --dataset <name>`
- View experiment results: `langsmith experiment get <name>`
Common Flags
- `--limit N`: Limit number of results
- `--yes`: Skip confirmation prompts (use with caution)
IMPORTANT - Safety Prompts:
- The CLI prompts for confirmation before destructive operations (delete, overwrite)
- If you are running with user input: ALWAYS wait for user input; NEVER use `--yes` unless the user explicitly requests it
- If you are running non-interactively: Use `--yes` to skip confirmation prompts
</usage>
<dataset_types_overview> Common evaluation dataset types:
- final_response - Full conversation with expected output. Tests complete agent behavior.
- single_step - Single node inputs/outputs. Tests specific node behavior (e.g., one LLM call or tool).
- trajectory - Tool call sequence. Tests execution path (ordered list of tool names).
- rag - Question/chunks/answer/citations. Tests retrieval quality. </dataset_types_overview>
<creating_datasets>
Creating Datasets
Datasets are JSON files containing an array of examples. Each example has `inputs` and `outputs`.
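For instance, a minimal dataset file can be produced with nothing but the standard library. A sketch; the file path and example contents here are illustrative:

```python
import json

# A dataset is just a JSON array of examples, each with inputs/outputs
examples = [
    {"inputs": {"query": "What is AI?"}, "outputs": {"answer": "AI is..."}},
    {"inputs": {"query": "Explain RAG"}, "outputs": {"answer": "RAG is..."}},
]

with open("/tmp/minimal_dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```

A file shaped like this can be uploaded directly with `langsmith dataset upload`.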
From Exported Traces (Programmatic)
Export traces first, then process them into dataset format using code:
First, export the traces with the CLI:

```shell
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full
```

<python>
```python
import json
from pathlib import Path

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.read_text().strip().split("\n")]
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"],
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>

<typescript>
```typescript
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

// 2. Process traces into dataset examples
const examples: Array<{ trace_id?: string; inputs: Record<string, any>; outputs: Record<string, any> }> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));
for (const file of files) {
  const lines = readFileSync(join("./traces", file), "utf-8").trim().split("\n");
  const runs = lines.map(line => JSON.parse(line));
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>
Upload to LangSmith
```shell
# Upload local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
```
Using the SDK Directly
<python>
```python
from langsmith import Client

client = Client()

# Create dataset and add examples in one step
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")
client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>

<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});
await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>
<dataset_structures>
Dataset Structures by Type
Final Response

```json
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}
```

Single Step

```json
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}
```

Trajectory

```json
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}
```

RAG

```json
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
```
</dataset_structures>
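Examples can be checked against these shapes before upload. A minimal sketch; the required-key table is inferred from the structures above, and the `validate_example` helper is an assumption, not part of the CLI or SDK:

```python
# Required output keys per dataset type (inferred from the structures above)
REQUIRED_OUTPUT_KEYS = {
    "final_response": ["response"],
    "single_step": ["content"],
    "trajectory": ["expected_trajectory"],
    "rag": ["answer", "retrieved_chunks", "cited_chunks"],
}

def validate_example(example: dict, dataset_type: str) -> list[str]:
    """Return a list of problems; an empty list means the example looks valid."""
    problems = []
    if "inputs" not in example:
        problems.append("missing 'inputs'")
    outputs = example.get("outputs", {})
    for key in REQUIRED_OUTPUT_KEYS.get(dataset_type, []):
        if key not in outputs:
            problems.append(f"outputs missing '{key}'")
    return problems
```

Running this over every example in a file before `langsmith dataset upload` catches shape mistakes early.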
<script_usage>
CLI Usage
```shell
# List all datasets
langsmith dataset list

# Get dataset details
langsmith dataset get "My Dataset"

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation"

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset"

# Export a dataset to local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100

# Delete a dataset
langsmith dataset delete "My Dataset"

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}'

# List experiments
langsmith experiment list --dataset "My Dataset"
langsmith experiment get "eval-v1"
```
</script_usage>
<example_workflow> Complete workflow from traces to uploaded LangSmith dataset:
```shell
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full

# 2. Process traces into dataset format (using Python/JS code)
#    See "Creating Datasets" section above

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response"
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory"

# 4. Verify upload
langsmith dataset list
langsmith dataset get "Skills: Final Response"
langsmith example list --dataset "Skills: Final Response" --limit 3

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response"
```
</example_workflow>
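For automation, the same workflow can be driven from Python. A sketch that only assembles the CLI invocations as argv lists; the function name and argument plumbing are illustrative:

```python
def workflow_commands(project: str, dataset_file: str, dataset_name: str) -> list[list[str]]:
    """Assemble the workflow's CLI calls as argv lists, e.g. for subprocess.run()."""
    return [
        ["langsmith", "trace", "export", "./traces", "--project", project, "--limit", "20", "--full"],
        ["langsmith", "dataset", "upload", dataset_file, "--name", dataset_name],
        ["langsmith", "dataset", "get", dataset_name],
        ["langsmith", "example", "list", "--dataset", dataset_name, "--limit", "3"],
    ]

# To execute, iterate with subprocess.run(argv, check=True) so the
# first failing step stops the pipeline.
```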
<troubleshooting>
**Dataset upload fails:**
- Verify LANGSMITH_API_KEY is set
- Check the JSON file is valid: each element needs `inputs` (and optionally `outputs`)
- Dataset name must be unique, or delete the existing dataset first with `langsmith dataset delete`

**Empty dataset after upload:**
- Verify the JSON file contains an array of objects with an `inputs` key
- Check the file isn't empty, then confirm with `langsmith example list --dataset "Name"`

**Export has no data:**
- Ensure traces were exported with the `--full` flag to include inputs/outputs
- Verify traces have both `inputs` and `outputs` populated

**Example count mismatch:**
- Use `langsmith dataset get "Name"` to check the remote count
- Compare with the local file to verify upload completeness
</troubleshooting>