AutoSkill Periodic Web Scraper with Redis Storage
Develop a Python script to scrape the latest news items from a website, store detailed content (text, URL, date, media links) in Redis, and schedule periodic updates with deduplication.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/periodic-web-scraper-with-redis-storage" ~/.claude/skills/ecnu-icalk-autoskill-periodic-web-scraper-with-redis-storage && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt3.5_8_GLM4.7/periodic-web-scraper-with-redis-storage/SKILL.md
source content
Periodic Web Scraper with Redis Storage
Prompt
Role & Objective
You are a Python developer specializing in web scraping and database integration. Your task is to write a Python program that scrapes the latest news items from a specified website, stores the content in a Redis database, and schedules the task to run periodically.
Operational Rules & Constraints
- Scraping Logic:
- Target the last 10 news items from the source.
- Extract specific fields for each item: news text, news URL, news date, and links to photos and videos.
- Database Storage:
- Use Redis as the database.
- Store the extracted data in Redis.
- Scheduling:
- The program must run periodically every n hours.
- The value of n must be obtained from the user via input.
- Deduplication:
- Implement logic to check if a news item already exists in the database.
- Do not save duplicate news items.
- Output Format:
- Provide the Python code.
- Explain the steps in order.
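The storage, deduplication, and scheduling rules above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the key prefix `news:`, the set name `news:seen`, and the helper names `item_key`, `save_if_new`, `read_interval_hours`, and `run_every` are all hypothetical, and the sketch assumes the item URL uniquely identifies a news item. `client` is expected to be a redis-py style client exposing `sadd` and `hset`.

```python
import hashlib
import json
import time


def item_key(item):
    """Derive a stable Redis key from the item's URL (assumed unique per item)."""
    digest = hashlib.sha256(item["url"].encode("utf-8")).hexdigest()
    return f"news:{digest}"


def save_if_new(client, item, seen_set="news:seen"):
    """Store the item unless its URL was seen before; return True if saved."""
    key = item_key(item)
    # SADD returns how many members were actually added: 0 means a duplicate.
    if client.sadd(seen_set, key) == 0:
        return False
    client.hset(key, mapping={
        "text": item["text"],
        "url": item["url"],
        "date": item["date"],
        # Hash values are flat strings, so the media list is JSON-encoded.
        "media": json.dumps(item.get("media", [])),
    })
    return True


def read_interval_hours(prompt="Run every how many hours? "):
    """Ask the user for n until a positive number is entered."""
    while True:
        try:
            hours = float(input(prompt))
        except ValueError:
            continue
        if hours > 0:
            return hours


def run_every(job, hours, max_runs=None, sleep=time.sleep):
    """Call job(), then wait `hours` hours, repeating until max_runs is reached.

    max_runs=None loops forever; `sleep` is injectable so the loop can be tested.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(hours * 3600)
```

With the `redis` package installed, a client could be created as `redis.Redis(decode_responses=True)` and the loop started with `run_every(job, read_interval_hours())`, where `job` scrapes the items and calls `save_if_new` for each. A plain `time.sleep` loop is only one option; a cron entry or a scheduling library would serve the same purpose.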
Anti-Patterns
- Do not hardcode the website URL or specific news category (e.g., sports) unless provided in the specific request; treat them as variables or placeholders.
- Do not omit the deduplication logic.
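One way to keep the site URL and page layout out of the code is to pass them in as parameters. The sketch below uses only the standard library and assumes a hypothetical page layout: each news item sits in its own container element (here `<article>`), with the first `<a href>` as the item URL, a `<time datetime>` as the date, and `<img>`, `<video>`, or `<source>` tags as media. It also assumes the page lists the newest items first, so the first 10 are taken. Real sites will differ, and the names `NewsParser`, `extract_news`, and `fetch_news` are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class NewsParser(HTMLParser):
    """Collect one dict per <item_tag> element (hypothetical page layout)."""

    def __init__(self, base_url, item_tag="article"):
        super().__init__()
        self.base_url = base_url
        self.item_tag = item_tag
        self.items = []
        self._current = None  # item being filled, or None outside an item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == self.item_tag:
            self._current = {"text": "", "url": "", "date": "", "media": []}
        elif self._current is None:
            return
        elif tag == "a" and attrs.get("href") and not self._current["url"]:
            # The first link inside the item is treated as the news URL.
            self._current["url"] = urljoin(self.base_url, attrs["href"])
        elif tag == "time":
            self._current["date"] = attrs.get("datetime") or ""
        elif tag in ("img", "video", "source") and attrs.get("src"):
            self._current["media"].append(urljoin(self.base_url, attrs["src"]))

    def handle_data(self, data):
        # Concatenate all visible text inside the item, separated by spaces.
        if self._current is not None and data.strip():
            joined = self._current["text"] + " " + data.strip()
            self._current["text"] = joined.strip()

    def handle_endtag(self, tag):
        if tag == self.item_tag and self._current is not None:
            self.items.append(self._current)
            self._current = None


def extract_news(html, base_url, item_tag="article", limit=10):
    """Parse HTML text and return at most `limit` items in page order."""
    parser = NewsParser(base_url, item_tag)
    parser.feed(html)
    parser.close()
    return parser.items[:limit]


def fetch_news(url, item_tag="article", limit=10):
    """Download `url` (supplied by the caller, never hardcoded) and extract items."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_news(html, url, item_tag, limit)
```

The URL and container tag would typically come from user input or command-line arguments. Libraries such as `requests` and BeautifulSoup are common alternatives to the standard-library parser used here.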
Triggers
- write a python scraper to store news in redis
- periodic web scraping script with redis
- scrape website every n hours and save to database
- python program to scrape and deduplicate news
- redis news scraper with scheduling