Opendirectory stargazer-deep-extractor

Advanced 5-tier OSINT scraper for extracting GitHub stargazer emails. Use this skill when a user wants to scrape, extract, or download stargazers from a GitHub repository.

install

source · Clone the upstream repo

git clone https://github.com/Varnan-Tech/opendirectory

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/Varnan-Tech/opendirectory "$T" && mkdir -p ~/.claude/skills && cp -r "$T/packages/cli/skills/stargazer/stargazer-skill" ~/.claude/skills/varnan-tech-opendirectory-stargazer-deep-extractor && rm -rf "$T"

manifest: packages/cli/skills/stargazer/stargazer-skill/SKILL.md

source content

Stargazer Deep Extractor Skill

This skill provides a highly detailed, script-like workflow for an AI Agent to extract GitHub stargazers and their email addresses. It leverages the 5-tier Stargazer Deep Extractor toolkit (Profile API, PushEvents, GPG Keys, Patch Regex, and Global Search API) to maximize extraction yields while bypassing rate limits through multi-token rotation and asyncio semaphores.

Tone and Formatting Constraints

You must adopt a strictly professional, technical tone.
Do not use emojis in your responses.
Do not use em-dashes (use standard hyphens or colons instead).

Workflow Execution Steps

Step 1: Input Validation and Extraction

When a user requests a repository scrape, you must extract the exact GitHub owner and repository name.

For a URL like `https://github.com/openai/codex`, the owner is `openai` and the repository is `codex`.
If the user only provides `openai/codex`, split it by the slash.
If the repository name is not provided, you must ask the user for it.
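
The parsing rules above can be sketched as a small helper. This is a minimal illustration, not code from the toolkit: the function name `parse_repo` is hypothetical, and stripping a trailing `.git` suffix is an added assumption.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_repo(target: str) -> Tuple[str, str]:
    """Return (owner, repo) from a GitHub URL or an 'owner/repo' slug.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    text = target.strip()
    # Full URLs carry a scheme; bare slugs such as "openai/codex" do not.
    path = urlparse(text).path if "://" in text else text
    parts = [p for p in path.split("/") if p]
    # Tolerate scheme-less input such as "github.com/openai/codex".
    if parts and parts[0].lower() == "github.com":
        parts = parts[1:]
    if len(parts) < 2:
        # Matches the rule above: without a repository name, ask the user.
        raise ValueError(f"expected owner/repo, got {target!r}")
    owner, repo = parts[0], parts[1]
    # Assumption: clone-style URLs may end in ".git"; drop the suffix.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

A `ValueError` here corresponds to the case where the agent must ask the user for the missing repository name.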

Step 2: Interrogation and Environment Setup

If the user is running this for the first time, you must interrogate them to set up the `.env` file correctly. Ask the following questions in a point-wise format:

"What is the maximum number of stargazers you would like to scrape? By default, this will be the total star count of the repository."
"Please provide your GitHub Personal Access Tokens (PATs) as a comma-separated list."

Step 3: Token Requirement Notice

You must remind the user about GitHub's strict rate limits and recommend the optimal number of tokens based on the repository's size. Provide the following advisory notice to the user:

Less than 2,000 stars: 1 token is generally sufficient.
2,000 to 5,000 stars: 2 to 3 tokens are recommended.
More than 20,000 stars: 4 or more tokens are strongly recommended.

CRITICAL RULE: Do not hard code this token math as a mandatory constraint. You must deliver this as a notice or recommendation. If the user decides not to follow the advice and wants to proceed with fewer tokens, you must allow them to do so and proceed with whatever they provide. Let the user use whatever they want.
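
One way to keep the notice advisory rather than mandatory is to have it return text only and never block execution. The sketch below assumes a hypothetical function name, `token_advisory`, and returns `None` for star counts between 5,000 and 20,000 because the notice above states no recommendation for that range.

```python
from typing import Optional


def token_advisory(star_count: int) -> Optional[str]:
    """Return the advisory text for a repository size, or None.

    Hypothetical helper: it only produces a recommendation string and
    never validates or rejects the number of tokens the user supplies.
    """
    if star_count < 2_000:
        return "1 token is generally sufficient."
    if star_count <= 5_000:
        return "2 to 3 tokens are recommended."
    if star_count > 20_000:
        return "4 or more tokens are strongly recommended."
    # The notice defines no tier for 5,001 to 20,000 stars.
    return None
```

The caller shows the returned string to the user and then proceeds with whatever tokens were provided.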

Step 4: Configuration Editing

After gathering the user's responses, you must configure the `.env` file in the execution directory (referencing `assets/.env.example` if needed). Ensure the following variables are written to `.env`:

`GITHUB_PATS`: Comma-separated list of the user's tokens.
`TARGET_OWNER`: The parsed repository owner.
`TARGET_REPO`: The parsed repository name.
`MAX_USERS`: The user's defined limit (or the total star count).
`MAX_CONCURRENT`: Set to 50 for optimal performance.
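
As a rough sketch, the `.env` contents could be rendered as below. It assumes plain `KEY=value` lines, which is the usual dotenv layout; check `assets/.env.example` for the format the scripts actually expect. The function name `render_env` and the placeholder token values are hypothetical.

```python
from typing import Iterable


def render_env(
    pats: Iterable[str],
    owner: str,
    repo: str,
    max_users: int,
    max_concurrent: int = 50,
) -> str:
    """Build .env text with the five variables listed above.

    Hypothetical helper; assumes plain KEY=value lines.
    """
    # Drop surrounding whitespace and empty entries from the token list.
    tokens = [t.strip() for t in pats if t.strip()]
    lines = [
        f"GITHUB_PATS={','.join(tokens)}",
        f"TARGET_OWNER={owner}",
        f"TARGET_REPO={repo}",
        f"MAX_USERS={max_users}",
        f"MAX_CONCURRENT={max_concurrent}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder tokens only; real PATs must come from the user.
    print(render_env(["token_one", "token_two"], "openai", "codex", 500), end="")
```

Writing the result with `pathlib.Path(".env").write_text(...)` in the execution directory completes the step.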

Step 5: Sequence Execution

You must run the bundled scripts in the exact sequence below using the Bash tool. The JSONL checkpointing system ensures that if the process is interrupted, it can resume without losing data.

Deep Extraction: Run `python scripts/stargazer_deep_extractor.py`. This executes the 5-tier OSINT extraction.
Real-time Statistics: Run `python scripts/count_emails.py`. This analyzes the generated JSONL file (`{TARGET_OWNER}_{TARGET_REPO}_detailed.jsonl`) to count the exact number of emails successfully found vs. null emails.
CSV Conversion: Run `python scripts/convert_to_csv.py`. This converts the final JSONL data into a structured CSV file with proper `utf-8-sig` encoding for Excel compatibility.
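
If the three commands are driven from Python rather than typed one by one, the ordering can be enforced as sketched below. The script paths come from the steps above; the function name `run_pipeline`, the injectable `runner` parameter, and the choice to stop at the first failing script are assumptions made for illustration.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Script paths as listed in the steps above, in the required order.
SCRIPTS = (
    "scripts/stargazer_deep_extractor.py",
    "scripts/count_emails.py",
    "scripts/convert_to_csv.py",
)


def run_pipeline(
    scripts: Sequence[str] = SCRIPTS,
    runner: Callable = subprocess.run,
) -> None:
    """Run each script in order with the current Python interpreter.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so later scripts never run on partial output.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)
```

Because the extractor checkpoints to JSONL, calling `run_pipeline()` again after an interruption should let the first script resume rather than start over.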

Step 6: Final Transparency Report

After the sequence completes, you must be fully transparent with the user. Provide a final summary reporting:

The total number of people fetched.
The exact total of emails successfully extracted.
The total number of null (hidden) emails.
The absolute path to the final CSV deliverable.
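
The four figures in the report can be derived from the JSONL file directly. The sketch below assumes each line is one JSON object with an `email` key that is `null` when the address is hidden; the actual schema written by the extractor may differ. The function name `summarize` is hypothetical, and the CSV path is passed in because its file name is not specified here.

```python
import json
from pathlib import Path
from typing import Dict, Union


def summarize(jsonl_path: str, csv_path: str) -> Dict[str, Union[int, str]]:
    """Count fetched users, found emails, and null emails.

    Assumption: one JSON object per line with an "email" field that is
    null (or missing) when no address was recovered.
    """
    total = 0
    found = 0
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # Skip blank lines.
            record = json.loads(line)
            total += 1
            if record.get("email"):
                found += 1
    return {
        "total": total,
        "with_email": found,
        "null_email": total - found,
        # Resolve to an absolute path for the final report.
        "csv_path": str(Path(csv_path).resolve()),
    }
```

The returned dictionary maps one-to-one onto the four report items above.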