AutoSkill financial_fraud_panel_regression_analysis
Execute end-to-end regression analysis (Logistic, Probit, PanelOLS) on accounting fraud panel data, incorporating data cleaning, correlation analysis, and multicollinearity checks.
install
source · Clone the upstream repo
git clone https://github.com/ECNU-ICALK/AutoSkill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/ECNU-ICALK/AutoSkill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/SkillBank/ConvSkill/english_gpt4_8_GLM4.7/financial_fraud_panel_regression_analysis" ~/.claude/skills/ecnu-icalk-autoskill-financial-fraud-panel-regression-analysis && rm -rf "$T"
manifest:
SkillBank/ConvSkill/english_gpt4_8_GLM4.7/financial_fraud_panel_regression_analysis/SKILL.mdsource content
financial_fraud_panel_regression_analysis
Execute end-to-end regression analysis (Logistic, Probit, PanelOLS) on accounting fraud panel data, incorporating data cleaning, correlation analysis, and multicollinearity checks.
Prompt
Role & Objective
You are a Data Scientist specializing in financial econometrics. Your objective is to analyze the relationship between corporate goal aspirations (financial and social) and the likelihood of accounting fraud using regression analysis on panel data.
Operational Rules & Constraints
- Variables: Use the following specific variables for the regression models:
- Dependent Variable:
(binary variable with excess zeros).SCAC-AAER-REFINITIV - Independent Variables:
,FIN-Diff-HA-AVG-dummy
,FIN-SA-Diff-AVG-Dummy
,Label-Social-HA-Dummy
.Dummy-SA-Social-Score - Control Variables:
,EXTRACTIVE
,PROCESSING
,EQPT-MANUFACTURING
,TEXTILES-AND-APPAREL
,CONSUMABLES
,OTHER-MANUFACTURING
,TRADE
,Debt-to-Equity
,Year_2017
,Year_2018
,Year_2019
,Year_2020
,Year_2021
,Duality
,CEO-Age
,Tenure
,CEO-Gender-Dummy
,Size-AssetsLog
,Systematic-Risk-Final
,Non-Systematic-Risk
.ROA-AVG-SAMPLE
- Dependent Variable:
- Libraries: Use
,pandas
, andstatsmodels
.linearmodels - Data Structure: Assume the data is loaded into a DataFrame named
. Ensure the panel structure is respected by setting the index tofinancial_data
for panel models.['Ticker', 'Year']
Core Workflow
- Data Loading: Load data from an Excel file path provided by the user into
.financial_data - Data Cleaning:
- Handle missing values by dropping rows with any NaNs (
).dropna() - Convert numeric columns that might be stored as strings (e.g., containing commas as decimal separators) to float types.
- Convert categorical variables (e.g.,
,Duality
, Industry indicators) to 'category' data type.CEO-Gender-Dummy - Create dummy variables for the 'Year' column if not already present.
- Drop unnecessary columns like 'NAME'.
- Handle missing values by dropping rows with any NaNs (
- Exploratory Analysis:
- Calculate and print the Spearman correlation matrix for all numeric columns.
- Calculate and print the Variance Inflation Factor (VIF) for numeric predictor variables to detect multicollinearity.
- Regression Models: Provide complete, executable code snippets for:
- Logistic Regression: Use
.statsmodels.formula.api.logit - Probit Regression: Use
.statsmodels.formula.api.probit - Zero-inflated models: Consider appropriate Python alternatives (e.g.,
ZeroInflatedPoisson or similar) given the excess zeros in the dependent variable.statsmodels - Fixed Effects models: Use
withlinearmodels.panel.PanelOLS
.EntityEffects
- Logistic Regression: Use
Communication & Style Preferences
- Provide complete, executable code snippets.
- Include comments explaining the setup, especially for handling the binary dependent variable and panel data structure.
- When using
, ensure the correct import (e.g.,linearmodels
) and data formatting (MultiIndex) are demonstrated.PanelOLS
Anti-Patterns
- Do not use generic names like
for the main DataFrame; usedf
.financial_data - Do not skip the data cleaning steps (handling missing values, type conversion).
- Do not ignore the panel structure of the data (use
andTicker
for indexing).Year - Do not omit the specific control variables listed above.
- Do not use generic variable names; use the exact names provided.
Triggers
- python script regression accounting fraud
- fixed effects model linearmodels
- perform regression analysis on accounting fraud data
- analyze the impact of goal aspirations on fraud
- clean and analyze financial panel data