Vibeship-spawner-skills sentiment-analysis-trading

id: sentiment-analysis-trading

install

source · Clone the upstream repo

git clone https://github.com/vibeforge1111/vibeship-spawner-skills

manifest: trading/sentiment-analysis-trading/skill.yaml

Social Media Sentiment Pipeline

Most social sentiment is noise. Here's how to find signal.

Twitter/X Sentiment Analysis

import pandas as pd
import numpy as np
from transformers import pipeline
from collections import defaultdict
import re

class TwitterSentimentAnalyzer:
    def __init__(self):
        # Use FinBERT for financial sentiment
        self.sentiment_model = pipeline(
            "sentiment-analysis",
            model="ProsusAI/finbert"
        )

    def clean_tweet(self, text: str) -> str:
        """Clean tweet for analysis."""
        # Remove cashtags temporarily to avoid confusion
        text = re.sub(r'\$[A-Za-z]+', '', text)
        # Remove URLs
        text = re.sub(r'http\S+', '', text)
        # Remove mentions
        text = re.sub(r'@\w+', '', text)
        # Remove emojis (optional - they carry signal)
        # text = re.sub(r'[^\x00-\x7F]+', '', text)
        return text.strip()

    def analyze_tweet(self, tweet: dict) -> dict:
        """Analyze a single tweet."""
        text = self.clean_tweet(tweet['text'])

        if len(text) < 10:
            return None

        # Get sentiment
        result = self.sentiment_model(text[:512])[0]

        # Weight by engagement
        engagement_score = (
            tweet.get('likes', 0) * 1 +
            tweet.get('retweets', 0) * 2 +
            tweet.get('replies', 0) * 0.5
        )

        # Weight by account quality
        account_score = min(tweet.get('followers', 0) / 10000, 10)

        return {
            'text': text,
            'sentiment': result['label'],
            'confidence': result['score'],
            'engagement': engagement_score,
            'account_weight': account_score,
            'weighted_sentiment': (
                (1 if result['label'] == 'positive' else -1) *
                result['score'] *
                np.log1p(engagement_score) *
                np.log1p(account_score)
            )
        }

    def aggregate_sentiment(
        self,
        tweets: list,
        time_window_minutes: int = 60
    ) -> dict:
        """Aggregate tweets into sentiment score."""
        analyzed = [self.analyze_tweet(t) for t in tweets]
        analyzed = [a for a in analyzed if a is not None]

        if not analyzed:
            return {'score': 0, 'confidence': 0, 'count': 0}

        # Aggregate weighted sentiments
        total_weight = sum(abs(a['weighted_sentiment']) for a in analyzed)
        weighted_score = sum(a['weighted_sentiment'] for a in analyzed)

        normalized_score = weighted_score / total_weight if total_weight > 0 else 0

        return {
            'score': normalized_score,  # -1 to 1
            'confidence': np.mean([a['confidence'] for a in analyzed]),
            'count': len(analyzed),
            'positive_pct': sum(1 for a in analyzed if a['sentiment'] == 'positive') / len(analyzed),
            'engagement_total': sum(a['engagement'] for a in analyzed)
        }

Reddit Sentiment (WSB, Crypto subs)

def analyze_reddit_post(post: dict) -> dict:
    """Analyze Reddit post with thread context."""
    # Post sentiment
    post_sentiment = analyze_text(post['title'] + ' ' + post.get('selftext', ''))

    # Weight by Reddit-specific metrics
    upvote_ratio = post.get('upvote_ratio', 0.5)
    score = post.get('score', 0)
    awards = post.get('total_awards_received', 0)

    # Comments sentiment (often contrarian to post)
    comment_sentiments = []
    for comment in post.get('comments', [])[:20]:  # Top 20 comments
        if comment.get('score', 0) > 5:  # Only scored comments
            comment_sentiments.append(analyze_text(comment['body']))

    avg_comment_sentiment = np.mean(comment_sentiments) if comment_sentiments else 0

    return {
        'post_sentiment': post_sentiment,
        'comment_sentiment': avg_comment_sentiment,
        'consensus': post_sentiment * avg_comment_sentiment > 0,  # Same direction
        'controversy': upvote_ratio < 0.7,
        'weight': np.log1p(score) * (1 + awards * 0.1)
    }

Signal Quality Filters

Filter	Why	Threshold
Account age	Bots are new	> 30 days
Follower ratio	Quality accounts	> 0.1
Tweet frequency	Spam detection	< 50/day
Engagement rate	Real impact	> 0.5%
success_rate: "Social sentiment IC typically 0.01-0.03 (weak but tradeable at scale)"

News Sentiment Analysis

News alpha decays in seconds. Speed and accuracy matter.

Real-Time News Processing

import pandas as pd
from datetime import datetime, timedelta
from transformers import pipeline
import re

class NewsAnalyzer:
    def __init__(self):
        self.sentiment_model = pipeline(
            "sentiment-analysis",
            model="ProsusAI/finbert"
        )

    def categorize_news(self, headline: str) -> dict:
        """Categorize news type and expected impact."""
        headline_lower = headline.lower()

        categories = {
            'earnings': ['earnings', 'eps', 'revenue', 'profit', 'guidance'],
            'deal': ['acquire', 'merger', 'buyout', 'deal', 'takeover'],
            'product': ['launch', 'announce', 'release', 'unveil'],
            'legal': ['lawsuit', 'sec', 'investigation', 'fine', 'settle'],
            'management': ['ceo', 'resign', 'appoint', 'departure'],
            'macro': ['fed', 'rates', 'inflation', 'gdp', 'jobs']
        }

        detected = []
        for cat, keywords in categories.items():
            if any(kw in headline_lower for kw in keywords):
                detected.append(cat)

        # Expected impact magnitude by category
        impact_magnitude = {
            'earnings': 0.03,  # 3% expected move
            'deal': 0.10,      # 10%+ for M&A
            'product': 0.02,
            'legal': 0.05,
            'management': 0.02,
            'macro': 0.01
        }

        max_impact = max(
            (impact_magnitude.get(c, 0.01) for c in detected),
            default=0.01
        )

        return {
            'categories': detected,
            'expected_impact': max_impact,
            'is_material': max_impact > 0.02
        }

    def analyze_headline(self, headline: str, timestamp: datetime) -> dict:
        """Full headline analysis."""
        # Sentiment
        sentiment = self.sentiment_model(headline[:512])[0]

        # Category
        category = self.categorize_news(headline)

        # Timing assessment
        market_open = datetime.now().replace(hour=9, minute=30)
        market_close = datetime.now().replace(hour=16, minute=0)

        is_market_hours = market_open <= timestamp <= market_close
        is_premarket = timestamp < market_open and timestamp.date() == market_open.date()

        return {
            'headline': headline,
            'sentiment': sentiment['label'],
            'sentiment_score': (
                sentiment['score'] if sentiment['label'] == 'positive'
                else -sentiment['score']
            ),
            'confidence': sentiment['score'],
            'categories': category['categories'],
            'expected_impact': category['expected_impact'],
            'is_material': category['is_material'],
            'market_hours': is_market_hours,
            'premarket': is_premarket,
            'urgency': 'high' if is_market_hours and category['is_material'] else 'normal'
        }

    def detect_news_cluster(
        self,
        headlines: list,
        time_window_minutes: int = 5
    ) -> dict:
        """Detect if multiple sources reporting same event."""
        # If multiple sources = higher confidence
        unique_sources = len(set(h.get('source') for h in headlines))

        # Same story multiple times = confirmed
        is_cluster = len(headlines) >= 2 and unique_sources >= 2

        return {
            'is_cluster': is_cluster,
            'source_count': unique_sources,
            'headline_count': len(headlines),
            'confidence_boost': 1.0 + (unique_sources - 1) * 0.2
        }

News Alpha Decay

Time After News	Remaining Alpha	Action
0-30 seconds	100%	Only if automated
30s - 2 min	50%	Aggressive only
2 - 5 min	20%	Fade opportunity
5+ min	~0%	Price has absorbed
success_rate: "Automated news trading can capture 10-30% of move if fast enough"

name: On-Chain Analytics description: Extract signals from blockchain data detection: "on.*chain|whale|exchange.*flow|wallet" guidance: |

On-Chain Analytics for Trading

Blockchain is transparent, but interpretation is not.

Exchange Flow Analysis

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

class OnChainAnalyzer:
    def __init__(self, data_provider):
        self.provider = data_provider

    def analyze_exchange_flows(
        self,
        token: str,
        hours: int = 24
    ) -> dict:
        """Analyze exchange inflows/outflows."""
        flows = self.provider.get_exchange_flows(token, hours)

        inflows = flows[flows['direction'] == 'in']
        outflows = flows[flows['direction'] == 'out']

        net_flow = outflows['value'].sum() - inflows['value'].sum()
        net_flow_pct = net_flow / self.provider.get_circulating_supply(token)

        # Historical comparison
        avg_net_flow = self.provider.get_avg_net_flow(token, days=30)
        flow_zscore = (net_flow - avg_net_flow) / flows['value'].std()

        return {
            'net_flow': net_flow,
            'net_flow_pct': net_flow_pct,
            'flow_zscore': flow_zscore,
            'inflow_count': len(inflows),
            'outflow_count': len(outflows),
            'interpretation': (
                'accumulation' if net_flow_pct > 0.01 else
                'distribution' if net_flow_pct < -0.01 else
                'neutral'
            ),
            'signal_strength': abs(flow_zscore) if abs(flow_zscore) > 2 else 0
        }

    def whale_wallet_tracking(
        self,
        token: str,
        min_value_usd: float = 1_000_000
    ) -> list:
        """Track large wallet movements."""
        transactions = self.provider.get_large_transactions(
            token, min_value_usd
        )

        whale_moves = []

        for tx in transactions:
            # Determine if whale is buying or selling
            from_is_exchange = self.provider.is_exchange(tx['from'])
            to_is_exchange = self.provider.is_exchange(tx['to'])

            if from_is_exchange and not to_is_exchange:
                action = 'withdrawal'  # Bullish
            elif to_is_exchange and not from_is_exchange:
                action = 'deposit'  # Bearish
            else:
                action = 'transfer'  # Neutral

            whale_moves.append({
                'value_usd': tx['value_usd'],
                'action': action,
                'timestamp': tx['timestamp'],
                'from_label': self.provider.get_label(tx['from']),
                'to_label': self.provider.get_label(tx['to'])
            })

        return whale_moves

    def stablecoin_supply_analysis(self) -> dict:
        """Analyze stablecoin supply on exchanges."""
        # High stablecoin on exchanges = dry powder to buy
        stables = ['USDT', 'USDC', 'DAI', 'BUSD']

        exchange_stables = 0
        for stable in stables:
            exchange_stables += self.provider.get_exchange_balance(stable)

        # Compare to historical
        avg_exchange_stables = self.provider.get_avg_exchange_stables(days=30)
        stable_ratio = exchange_stables / avg_exchange_stables

        return {
            'exchange_stables_usd': exchange_stables,
            'vs_30d_avg': stable_ratio,
            'interpretation': (
                'high_buying_power' if stable_ratio > 1.1 else
                'low_buying_power' if stable_ratio < 0.9 else
                'normal'
            )
        }

On-Chain Signal Reliability

Signal	Reliability	Why
Exchange outflows	Moderate	Can be moved to CEX cold wallet
Whale alerts	Low	Multi-wallet obfuscation
Stablecoin supply	Moderate	Lagging indicator
Active addresses	Low	Sybil attacks easy
Hash rate	Moderate	For mining-based tokens
success_rate: "On-chain signals typically 5-10% hit rate improvement over random"

Positioning Data Analysis

What are others positioned for? Trade with or against the crowd.

Options Flow Analysis

import pandas as pd
import numpy as np

class OptionsFlowAnalyzer:
    def analyze_flow(self, flow_data: pd.DataFrame) -> dict:
        """Analyze options order flow for signals."""
        # Separate calls and puts
        calls = flow_data[flow_data['type'] == 'call']
        puts = flow_data[flow_data['type'] == 'put']

        # Calculate put/call ratio
        call_premium = calls['premium'].sum()
        put_premium = puts['premium'].sum()
        pc_ratio = put_premium / call_premium if call_premium > 0 else 1

        # Unusual activity detection
        avg_volume = flow_data['volume'].mean()
        unusual = flow_data[flow_data['volume'] > avg_volume * 3]

        # Sweep detection (aggressive buying)
        sweeps = flow_data[
            (flow_data['execution'] == 'sweep') &
            (flow_data['premium'] > 100_000)
        ]

        # Sentiment from flow
        bullish_premium = (
            calls[calls['side'] == 'buy']['premium'].sum() +
            puts[puts['side'] == 'sell']['premium'].sum()
        )
        bearish_premium = (
            puts[puts['side'] == 'buy']['premium'].sum() +
            calls[calls['side'] == 'sell']['premium'].sum()
        )

        net_sentiment = (bullish_premium - bearish_premium) / (bullish_premium + bearish_premium)

        return {
            'put_call_ratio': pc_ratio,
            'net_sentiment': net_sentiment,  # -1 to 1
            'unusual_activity_count': len(unusual),
            'sweep_count': len(sweeps),
            'interpretation': (
                'bullish' if net_sentiment > 0.2 else
                'bearish' if net_sentiment < -0.2 else
                'neutral'
            ),
            'unusual_notable': [
                {
                    'strike': u['strike'],
                    'expiry': u['expiry'],
                    'type': u['type'],
                    'premium': u['premium']
                }
                for _, u in unusual.head(5).iterrows()
            ]
        }

class FundingRateAnalyzer:
    def analyze_funding(
        self,
        funding_history: pd.DataFrame,
        current_rate: float
    ) -> dict:
        """Analyze perpetual funding rates."""
        # Funding rate interpretation:
        # Positive = Longs pay shorts (market bullish)
        # Negative = Shorts pay longs (market bearish)

        avg_rate = funding_history['rate'].mean()
        std_rate = funding_history['rate'].std()
        zscore = (current_rate - avg_rate) / std_rate

        # Extreme funding often precedes reversals
        is_extreme = abs(zscore) > 2

        # Cumulative funding cost
        daily_rate = current_rate * 3  # 3 funding periods per day
        annual_rate = daily_rate * 365

        return {
            'current_rate': current_rate,
            'daily_rate': daily_rate,
            'annual_rate': annual_rate,
            'zscore': zscore,
            'is_extreme': is_extreme,
            'crowd_position': 'long' if current_rate > 0 else 'short',
            'contrarian_signal': (
                'consider short' if zscore > 2 else
                'consider long' if zscore < -2 else
                'no signal'
            )
        }

COT Report Analysis

def analyze_cot(cot_data: pd.DataFrame, asset: str) -> dict:
    """Analyze Commitment of Traders data."""
    latest = cot_data.iloc[-1]
    prev_week = cot_data.iloc[-2]

    # Commercials are often "smart money"
    commercial_net = latest['commercial_long'] - latest['commercial_short']
    commercial_change = commercial_net - (
        prev_week['commercial_long'] - prev_week['commercial_short']
    )

    # Large speculators (hedge funds)
    spec_net = latest['large_spec_long'] - latest['large_spec_short']

    # Small speculators (retail) - often wrong
    retail_net = latest['small_spec_long'] - latest['small_spec_short']

    # Historical percentile
    commercial_percentile = (
        (cot_data['commercial_long'] - cot_data['commercial_short']) < commercial_net
    ).mean()

    return {
        'commercial_net': commercial_net,
        'commercial_change': commercial_change,
        'commercial_percentile': commercial_percentile,
        'speculator_net': spec_net,
        'retail_net': retail_net,
        'interpretation': (
            'commercial_bullish' if commercial_percentile > 0.8 else
            'commercial_bearish' if commercial_percentile < 0.2 else
            'neutral'
        )
    }

success_rate: "Extreme positioning signals have 55-60% directional accuracy"

name: Sentiment Indicator Construction description: Build composite sentiment indicators from multiple sources detection: "fear.*greed|composite|indicator|sentiment.*index" guidance: |

Composite Sentiment Indicator

Single signals are weak. Combine for robustness.

Multi-Source Sentiment Index

import pandas as pd
import numpy as np
from scipy import stats

class CompositeSentimentIndex:
    def __init__(self):
        self.component_weights = {
            'social_sentiment': 0.15,
            'news_sentiment': 0.15,
            'options_flow': 0.20,
            'funding_rate': 0.15,
            'exchange_flow': 0.15,
            'fear_greed_index': 0.10,
            'put_call_ratio': 0.10
        }

    def normalize_component(
        self,
        value: float,
        history: pd.Series
    ) -> float:
        """Normalize to z-score, then to 0-100 scale."""
        zscore = (value - history.mean()) / history.std()
        # Clip extreme values
        zscore = np.clip(zscore, -3, 3)
        # Convert to 0-100
        normalized = (zscore + 3) / 6 * 100
        return normalized

    def calculate_composite(
        self,
        components: dict,
        component_histories: dict
    ) -> dict:
        """Calculate composite sentiment index."""
        normalized = {}

        for name, value in components.items():
            if name in component_histories:
                normalized[name] = self.normalize_component(
                    value,
                    component_histories[name]
                )

        # Weighted average
        weighted_sum = 0
        total_weight = 0

        for name, norm_value in normalized.items():
            weight = self.component_weights.get(name, 0)
            weighted_sum += norm_value * weight
            total_weight += weight

        composite = weighted_sum / total_weight if total_weight > 0 else 50

        # Interpretation
        if composite > 80:
            interpretation = 'extreme_greed'
            contrarian_signal = 'bearish'
        elif composite > 60:
            interpretation = 'greed'
            contrarian_signal = 'neutral'
        elif composite > 40:
            interpretation = 'neutral'
            contrarian_signal = 'neutral'
        elif composite > 20:
            interpretation = 'fear'
            contrarian_signal = 'neutral'
        else:
            interpretation = 'extreme_fear'
            contrarian_signal = 'bullish'

        return {
            'composite_score': composite,
            'interpretation': interpretation,
            'contrarian_signal': contrarian_signal,
            'component_scores': normalized,
            'strongest_signal': max(
                normalized.items(),
                key=lambda x: abs(x[1] - 50)
            )[0]
        }

    def calculate_signal_strength(
        self,
        composite: float,
        lookback_days: int = 30
    ) -> dict:
        """Determine if composite is at actionable levels."""
        # Only trade at extremes
        if composite > 85 or composite < 15:
            strength = 'strong'
        elif composite > 75 or composite < 25:
            strength = 'moderate'
        else:
            strength = 'weak'

        return {
            'signal_strength': strength,
            'is_actionable': strength in ['strong', 'moderate'],
            'direction': 'short' if composite > 75 else 'long' if composite < 25 else 'none'
        }

Component Correlation Check

def check_component_divergence(components: dict) -> dict:
    """Check if components are giving conflicting signals."""
    values = list(components.values())

    # Check if all pointing same direction
    all_positive = all(v > 50 for v in values)
    all_negative = all(v < 50 for v in values)

    # Standard deviation of components
    component_std = np.std(values)

    return {
        'consensus': all_positive or all_negative,
        'divergence': component_std,
        'high_divergence': component_std > 20,
        'recommendation': (
            'high_confidence' if component_std < 10 else
            'moderate_confidence' if component_std < 20 else
            'conflicting_signals'
        )
    }

success_rate: "Composite indicators at extremes have 60-65% directional accuracy"

anti_patterns:

name: Following Whale Alerts Blindly description: Trading based on whale transaction alerts detection: "whale.*alert|large.*transaction|whale.*buy" why_harmful: | Whales use hundreds of wallets. The "whale buy" you see might be moving to an exchange to sell. Or moving between their own wallets. The signal is almost always noise or misleading. what_to_do: | Never trade on single wallet movements. Aggregate exchange flows over hours/days, not individual transactions. Verify "known" whale wallets are actually single entities. Most importantly, backtest before trading.
name: Real-Time News Trading Without Automation description: Manually trading on news you read detection: "read.*news|news.*alert|breaking" why_harmful: | By the time you read the headline, algos have already traded it. You're buying from market makers who've adjusted prices. Human reaction time (seconds) can't compete with automated systems (milliseconds). what_to_do: | Either automate news trading (and accept it's very competitive) or trade the second-order effects (how will this affect earnings next quarter?). Never market order on a headline you just read.
name: Treating Fear/Greed as Trading Signal description: Using fear/greed index for timing detection: "fear.*greed|cnn.*index" why_harmful: | Fear/Greed is a lagging composite of other indicators. It's designed for entertainment, not trading. Extreme readings can persist for weeks. "Extreme fear" can go to "more extreme fear." what_to_do: | Build your own composite with faster-moving components. Use extremes as confirmation, not primary signal. Never enter a position solely because "Fear/Greed is at 10" or "at 90."
name: Sentiment-Only Trading description: Trading only on sentiment without price confirmation detection: "sentiment.*bullish|sentiment.*bearish" why_harmful: | Sentiment can stay extreme while price continues moving. "The market can stay irrational longer than you can stay solvent." Sentiment indicates crowd positioning, not imminent reversal. what_to_do: | Use sentiment as a filter or secondary confirmation, not primary signal. Require price structure (technical) to confirm sentiment extremes. Have defined invalidation levels.
name: Trusting Social Media Signals description: Trading based on Twitter/Reddit consensus detection: "twitter.*bullish|reddit.*wsb|discord" why_harmful: | Social media is full of paid promotions, pump groups, and people talking their book. The "consensus" you see is often manufactured. WSB has been co-opted by pump schemes. what_to_do: | Filter aggressively for account quality. Look for sentiment changes, not absolute levels. Be skeptical of sudden volume in discussions. Backtest extensively before trusting any social signal.

handoffs:

trigger: "backtest|validate|statistical" to: quantitative-research context: "Validate sentiment signal statistically" provides:
- Processed sentiment time series
- Historical signal data
- Initial IC estimates
trigger: "risk|position|sizing" to: risk-management-trading context: "Apply risk management to sentiment trade" provides:
- Sentiment signal strength
- Confidence level
- Expected holding period
trigger: "execute|timing|entry" to: execution-algorithms context: "Execute on sentiment signal" provides:
- Signal urgency
- Volatility context
- Optimal execution window
trigger: "technical|chart|level" to: technical-analysis context: "Combine sentiment with technical levels" provides:
- Sentiment direction
- Crowd positioning
- Confirmation status