Awesome-omni-skill data-privacy
Ensure data privacy compliance covering GDPR obligations, user consent management, data retention policies, PII detection, and data anonymisation with realistic synthetic data
git clone https://github.com/diegosouzapw/awesome-omni-skill
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/data-privacy" ~/.claude/skills/diegosouzapw-awesome-omni-skill-data-privacy && rm -rf "$T"
skills/data-ai/data-privacy/SKILL.md
Critical rules
- ALWAYS use British English in output, comments, and documentation
- ALWAYS verify that a valid legal basis (Article 6 GDPR) exists for every personal data processing operation
- ALWAYS verify that explicit consent is collected before processing data that relies on consent as legal basis
- ALWAYS verify that consent is freely given, specific, informed, and unambiguous (Articles 4(11) and 7 GDPR)
- ALWAYS verify that data retention periods are defined, documented, and enforced for every data category
- ALWAYS detect ALL categories of PII before generating replacements
- ALWAYS replace PII with realistic synthetic data that preserves the same format and structure
- ALWAYS ensure synthetic replacements are deterministic within a single document (same input entity → same synthetic output across all occurrences)
- ALWAYS preserve the semantic structure and readability of the original text
- ALWAYS flag special category data (Article 9 GDPR) with elevated severity
- ALWAYS report confidence level for each detection (HIGH, MEDIUM, LOW)
- NEVER approve a design that collects personal data without a documented legal basis
- NEVER approve pre-ticked consent checkboxes or bundled consent
- NEVER approve indefinite data retention ("keep forever") without explicit legal justification
- NEVER produce synthetic data that accidentally matches real individuals, organisations, or addresses
- NEVER leave partial PII visible (e.g., masking only part of a name but leaving the surname)
- NEVER use obviously fake placeholders like "REDACTED", "XXX", or "John Doe" — generate diverse, realistic synthetic identities
- NEVER store or log the mapping between real and synthetic data unless explicitly requested
- NEVER modify non-PII content; only PII entities are replaced
Workflow
Follow these steps in order when performing a data privacy review.
Step 1: Determine scope
Identify what kind of review is needed:
What is the user requesting?
├── Full data privacy review → Proceed with all sections (GDPR, consent, retention, PII)
├── GDPR compliance check → Focus on legal basis, consent, retention
├── Consent review → Focus on consent management
├── Retention review → Focus on data retention
├── PII detection only → Focus on PII detection (detect, report, no replacement)
├── PII anonymisation → Focus on PII detection + replace with synthetic data
└── Mixed → Combine relevant sections
Step 2: GDPR compliance assessment
For each data processing operation identified:
- Identify what personal data is collected
- Verify a legal basis exists (Article 6)
- Check if special category data is involved (Article 9)
- Verify data subject rights are supported (Articles 13-22)
- Check data protection by design measures (Article 25)
- Verify a Data Protection Impact Assessment (DPIA) exists for high-risk processing (Article 35)
Step 3: Consent validation
If consent is used as legal basis:
- Run through the Consent Checklist
- Verify consent architecture matches an approved pattern (A, B, or C)
- Check for consent anti-patterns — flag any found as CRITICAL
- Verify consent withdrawal mechanism exists and works
Step 4: Retention assessment
For each data category:
- Run through the Retention Checklist
- Apply the Retention Decision Tree
- Verify automated deletion mechanisms exist
- Flag any data category with undefined or indefinite retention as CRITICAL
Step 5: PII detection (if applicable)
Scan the text/code and tag every PII entity found:
- Apply Tier 1 patterns (direct identifiers) — HIGH priority
- Apply Tier 2 patterns (indirect identifiers) — MEDIUM priority
- Apply Tier 3 patterns (quasi-identifiers) — LOW priority
- Apply Special Category patterns — CRITICAL priority
- For each detection, record:
  - Entity text (exact match)
  - Category and tier
  - Position in text (line/offset)
  - Confidence level (HIGH, MEDIUM, LOW)
  - All occurrences of the same entity
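The per-detection record described above could be sketched as a small data structure. This is a minimal illustration, not the skill's own schema: the `Detection` class, its field names, and the `find_occurrences` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Detection:
    """One detected PII entity plus every place it occurs (hypothetical schema)."""
    text: str          # entity text, exact match
    category: str      # e.g. "email", "full_name"
    tier: str          # "Tier 1", "Tier 2", "Tier 3" or "Special"
    confidence: str    # "HIGH", "MEDIUM" or "LOW"
    offsets: List[int] = field(default_factory=list)  # start offset of each occurrence


def find_occurrences(text: str, entity: str) -> List[int]:
    """Return the start offset of every non-overlapping occurrence of entity."""
    if not entity:
        return []
    offsets, start = [], 0
    while (idx := text.find(entity, start)) != -1:
        offsets.append(idx)
        start = idx + len(entity)
    return offsets


sample = "Contact maria.garcia@empresa.es; cc maria.garcia@empresa.es"
detection = Detection(
    text="maria.garcia@empresa.es",
    category="email",
    tier="Tier 1",
    confidence="HIGH",
    offsets=find_occurrences(sample, "maria.garcia@empresa.es"),
)
```

Recording character offsets rather than only line numbers keeps the later replacement step unambiguous when the same entity appears twice on one line.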
Step 6: Build entity map (if anonymisation requested)
Create a mapping of all unique entities to synthetic replacements:
Entity Map:
  "María García López" → "Laura Fernández Ruiz" [name, Tier 1, HIGH confidence]
  "maria.garcia@empresa.es" → "laura.fernandez@example.net" [email, Tier 1, HIGH confidence]
  "+34 612 345 678" → "+34 698 721 043" [phone, Tier 1, HIGH confidence]
  "Calle Mayor 5, 28001" → "Avenida de la Paz 12, 28003" [address, Tier 2, MEDIUM confidence]
Ensure cross-field coherence:
- Email derives from synthetic name
- Phone matches same country
- Address matches same region
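As an illustration of the first coherence rule, the synthetic email can be derived from the synthetic name instead of being generated independently. The helper names `ascii_slug` and `derive_email` are hypothetical, and the `first.second@domain` layout is an assumption taken from the entity map example; `example.net` is a reserved example domain.

```python
import unicodedata


def ascii_slug(word: str) -> str:
    """Lower-case a word and strip diacritics, e.g. 'Fernández' -> 'fernandez'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def derive_email(synthetic_name: str, domain: str = "example.net") -> str:
    """Build 'first.second@domain' from the first two words of the synthetic name."""
    parts = [ascii_slug(part) for part in synthetic_name.split()[:2]]
    return ".".join(parts) + "@" + domain
```

With the entity map above, `derive_email("Laura Fernández Ruiz")` yields the same address that the map lists for the synthetic identity.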
Step 7: Apply replacements (if anonymisation requested)
- Start with longest matches first (avoid partial replacements)
- Replace all occurrences of each entity with its synthetic counterpart
- Verify no original PII remains after replacement
- Verify no synthetic data accidentally matches other real entities in the text
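One way to implement the longest-match-first rule is a single regular-expression pass in which longer entities are listed before shorter ones. This is a sketch under that assumption; `apply_replacements` is a hypothetical name, and the single pass also means synthetic output is never rescanned and replaced a second time.

```python
import re
from typing import Dict


def apply_replacements(text: str, entity_map: Dict[str, str]) -> str:
    """Replace every occurrence of each real entity with its synthetic value."""
    if not entity_map:
        return text
    # Longest keys first, so "María García López" wins over "María García".
    keys = sorted(entity_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: entity_map[match.group(0)], text)


entity_map = {
    "María García": "Laura Fernández",
    "María García López": "Laura Fernández Ruiz",
}
result = apply_replacements("María García López called. María García left.", entity_map)
```

Replacing entities one at a time with `str.replace` would risk turning the full name into "Laura Fernández López", which leaves part of the original surname visible.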
Step 8: Validate output
For anonymisation:
Is any original PII still present?
├── YES → Fix missed replacements, return to Step 7
└── NO → Continue

Does the text remain readable and coherent?
├── YES → Continue
└── NO → Adjust synthetic data for better readability

Are there any synthetic collisions with real data?
├── YES → Regenerate colliding synthetic values
└── NO → Output is ready
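The first and third checks lend themselves to automation; readability still needs a human or model judgement. A minimal sketch, assuming the entity map is a plain dictionary of real to synthetic strings and using the hypothetical function name `validate_anonymised`:

```python
from typing import Dict


def validate_anonymised(original: str, anonymised: str,
                        entity_map: Dict[str, str]) -> Dict[str, bool]:
    """Report residual PII and synthetic collisions as two boolean flags."""
    # Any real entity still visible in the output means a missed replacement.
    residual = any(real in anonymised for real in entity_map)
    # A synthetic value that already appears in the source text is a collision.
    collision = any(synthetic in original for synthetic in entity_map.values())
    return {"residual_pii": residual, "synthetic_collision": collision}
```

A `True` in `residual_pii` corresponds to returning to Step 7, and a `True` in `synthetic_collision` corresponds to regenerating the colliding values.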
Step 9: Report findings
Present findings ordered by severity (critical, high, medium, low) with GDPR article references and remediation advice.
GDPR Compliance Framework
What is personal data? (Article 4 GDPR)
Personal data means any information relating to an identified or identifiable natural person ("data subject"). An identifiable natural person is one who can be identified, directly or indirectly, by reference to an identifier such as:
- A name
- An identification number
- Location data
- An online identifier
- One or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person
Special category data (Article 9 GDPR)
Processing of special category data is prohibited unless one of the conditions in Article 9(2) applies, such as explicit consent:
- Racial or ethnic origin
- Political opinions
- Religious or philosophical beliefs
- Trade union membership
- Genetic data
- Biometric data (for identification purposes)
- Health data
- Sex life or sexual orientation data
Data protection principles (Article 5 GDPR)
All data processing must comply with these principles:
- Lawfulness, fairness, and transparency — Data must be processed lawfully, fairly, and transparently
- Purpose limitation — Data must be collected for specified, explicit, and legitimate purposes
- Data minimisation — Only process data that is necessary for the stated purpose
- Accuracy — Data must be accurate and kept up to date
- Storage limitation — Data must not be kept longer than necessary
- Integrity and confidentiality — Data must be protected against unauthorised access, loss, or destruction
- Accountability — The controller must demonstrate compliance with all principles
Legal bases for processing (Article 6 GDPR)
Processing is lawful only if at least one of the following applies:
| Legal Basis | When to Use | Example |
|---|---|---|
| Consent (Art. 6(1)(a)) | User freely agrees to processing for a specific purpose | Marketing emails, analytics cookies |
| Contract (Art. 6(1)(b)) | Processing is necessary to perform a contract with the data subject | Processing an order, delivering a service |
| Legal obligation (Art. 6(1)(c)) | Processing is required by law | Tax records, anti-money laundering checks |
| Vital interests (Art. 6(1)(d)) | Processing protects someone's life | Emergency medical situations |
| Public task (Art. 6(1)(e)) | Processing is necessary for a task in the public interest | Public health reporting |
| Legitimate interests (Art. 6(1)(f)) | Processing is necessary for legitimate interests, balanced against data subject rights | Fraud prevention, network security |
Data subject rights
The system must support exercising these rights:
| Right | GDPR Article | Implementation Requirement |
|---|---|---|
| Right to be informed | Art. 13-14 | Clear privacy notice at point of collection |
| Right of access | Art. 15 | Data export/download within one month of the request (Art. 12(3)) |
| Right to rectification | Art. 16 | Edit personal data via self-service or request |
| Right to erasure ("right to be forgotten") | Art. 17 | Delete personal data on request (with exceptions) |
| Right to restrict processing | Art. 18 | Pause processing while disputes are resolved |
| Right to data portability | Art. 20 | Export data in machine-readable format (JSON, CSV) |
| Right to object | Art. 21 | Opt out of processing based on legitimate interests |
| Rights related to automated decision-making | Art. 22 | Human review for automated decisions with significant effects |
Data protection by design and by default (Article 25 GDPR)
Every system must implement:
- Privacy by design — Build privacy into the architecture from the start, not as an afterthought
- Privacy by default — Only process data that is strictly necessary; default settings must be the most privacy-friendly
- Technical measures — Anonymisation, pseudonymisation, encryption, access controls
- Organisational measures — Policies, training, DPIAs, data processing agreements
Consent Management
Consent requirements (Article 7 GDPR)
Valid consent must be:
- Freely given — No detriment for refusing; no power imbalance exploitation
- Specific — Separate consent for each distinct processing purpose
- Informed — Clear explanation of what data is collected, why, and by whom
- Unambiguous — Requires a clear affirmative action (opt-in, not opt-out)
Consent checklist
When reviewing a system for consent compliance, verify:
| # | Check |
|---|---|
| 1 | Consent is collected via an affirmative action (checkbox, toggle, click) — never pre-ticked |
| 2 | Each processing purpose has its own separate consent request |
| 3 | The consent request clearly states: what data, why, who processes it, how long |
| 4 | Users can withdraw consent as easily as they gave it (same number of clicks) |
| 5 | Consent withdrawal actually stops the associated processing |
| 6 | Consent records are stored with timestamp, user ID, version, and scope |
| 7 | Consent is re-requested when the processing purpose changes |
| 8 | Minor users (under 16, or under 13 in some jurisdictions) have parental consent |
| 9 | Consent is not bundled with terms of service acceptance |
| 10 | Users can view and manage their active consents in a self-service dashboard |
Consent architecture patterns
Pattern A: Granular consent with preference centre
User registers
→ Present granular consent options:
    [ ] Marketing emails
    [ ] Third-party data sharing
    [ ] Analytics and personalisation
    [ ] Location-based services
→ Store each consent separately with metadata:
    { purpose, granted: true/false, timestamp, version, method }
→ Provide "Manage Preferences" dashboard:
    - View all active consents
    - Toggle each on/off
    - Changes take effect immediately
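The per-purpose storage in Pattern A could be sketched as an append-only ledger, where the most recent record for a user and purpose is the active state. The class names `ConsentRecord` and `ConsentLedger` are hypothetical, and the in-memory list stands in for whatever durable store a real system would use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ConsentRecord:
    user_id: str
    purpose: str
    granted: bool
    timestamp: datetime
    version: str   # version of the consent text shown to the user
    method: str    # e.g. "checkbox", "toggle"


class ConsentLedger:
    """Append-only log; nothing is overwritten, so the history stays auditable."""

    def __init__(self) -> None:
        self._records: List[ConsentRecord] = []

    def record(self, user_id: str, purpose: str, granted: bool,
               version: str, method: str) -> ConsentRecord:
        entry = ConsentRecord(user_id, purpose, granted,
                              datetime.now(timezone.utc), version, method)
        self._records.append(entry)
        return entry

    def is_granted(self, user_id: str, purpose: str) -> bool:
        # The latest decision wins; no record at all means no consent (opt-in).
        for entry in reversed(self._records):
            if entry.user_id == user_id and entry.purpose == purpose:
                return entry.granted
        return False

    def history(self, user_id: str) -> List[ConsentRecord]:
        return [entry for entry in self._records if entry.user_id == user_id]


ledger = ConsentLedger()
ledger.record("u1", "marketing_emails", True, "v1", "checkbox")
ledger.record("u1", "marketing_emails", False, "v1", "toggle")  # withdrawal
```

Keeping withdrawals as new records rather than deleting the grant preserves the evidence of when and how consent was obtained, which the consent checklist asks for.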
Pattern B: Just-in-time consent
User reaches feature requiring additional data
→ Display contextual consent prompt:
    "To enable [feature], we need to collect [data].
     This will be used for [purpose] and stored for [duration].
     [Allow] [Decline]"
→ If declined, gracefully degrade feature
→ Store consent decision with context
Pattern C: Cookie/tracking consent banner
First visit
→ Display non-blocking consent banner:
    "We use cookies for [purposes].
     [Accept All] [Reject All] [Manage Preferences]"
→ No tracking scripts loaded until consent is given
→ Preferences stored in first-party cookie
→ Re-prompt annually or when categories change
Consent anti-patterns (MUST REJECT)
- ❌ Pre-ticked consent checkboxes
- ❌ "By continuing to use this site, you consent to..."
- ❌ Consent buried in terms of service
- ❌ No way to withdraw consent
- ❌ Withdrawal harder than granting (e.g., grant = 1 click, withdraw = email support)
- ❌ Bundled consent (single checkbox for multiple unrelated purposes)
- ❌ Ignoring consent withdrawal (processing continues after opt-out)
- ❌ No consent records (cannot demonstrate when/how consent was obtained)
Data Retention
Retention principles
- Define retention periods for every data category before collection begins
- Document the legal basis for each retention period (regulatory requirement, contract, legitimate interest)
- Automate deletion: manual deletion processes are error-prone and make compliance hard to demonstrate
- Log retention actions — record when data was created, when it expires, and when it was deleted
- Apply the minimum principle — if you don't need it, don't keep it
Retention policy template
Every data category must have a documented retention policy:
| Data Category | Retention Period | Legal Basis | Deletion Method | Review Cycle |
|---|---|---|---|---|
| User account data | Duration of contract + 30 days | Art. 6(1)(b) Contract | Hard delete from DB, remove backups within 90 days | Annual |
| Transaction records | 7 years from transaction date | Art. 6(1)(c) Legal obligation (tax law) | Archive → purge after 7 years | Annual |
| Marketing consent records | 3 years from last interaction | Art. 6(1)(a) Consent | Soft delete, then hard delete after 90 days | Biannual |
| Session/access logs | 90 days | Art. 6(1)(f) Legitimate interests (security) | Auto-purge via log rotation | Quarterly |
| Support tickets | 2 years from resolution | Art. 6(1)(b) Contract | Anonymise PII, keep metadata | Annual |
| Analytics data | 26 months from collection | Art. 6(1)(a) Consent | Auto-purge or aggregate/anonymise | Annual |
| Backup data | Same as source data + 90 days | Mirrors source legal basis | Rotate backups, verify purge | Quarterly |
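Two rows of the template can be turned into an executable expiry check as a rough sketch. The category keys, `RETENTION_RULES`, `expiry_date`, and `is_expired` are hypothetical names, and only the 90-day and 7-year rows are modelled; a category with no rule raises an error, mirroring the requirement that undefined retention be flagged.

```python
from datetime import date, timedelta
from typing import Callable, Dict


def add_years(start: date, years: int) -> date:
    """Add calendar years, mapping 29 February to 28 February when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


# Each rule maps a start date (creation or transaction date) to an expiry date.
RETENTION_RULES: Dict[str, Callable[[date], date]] = {
    "session_logs": lambda start: start + timedelta(days=90),
    "transaction_records": lambda start: add_years(start, 7),
}


def expiry_date(category: str, start: date) -> date:
    try:
        rule = RETENTION_RULES[category]
    except KeyError:
        raise ValueError(f"no retention period defined for {category!r}") from None
    return rule(start)


def is_expired(category: str, start: date, today: date) -> bool:
    return today >= expiry_date(category, start)
```

A scheduled job could call `is_expired` for each record and hand expired ones to the documented deletion method for that category.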
Retention checklist
| # | Check |
|---|---|
| 1 | Every data category has a defined retention period |
| 2 | Retention periods have a documented legal basis |
| 3 | Automated deletion/anonymisation is implemented |
| 4 | Deletion includes backups and replicas (within defined grace period) |
| 5 | Retention schedule is reviewed at the documented review cycle |
| 6 | Data subject erasure requests are fulfilled within one month (Art. 12(3)) |
| 7 | Audit trail exists for all retention actions (creation, expiry, deletion) |
| 8 | "Right to be forgotten" requests cascade to third-party processors |
| 9 | No data category has "indefinite" retention without legal justification |
| 10 | Retention exceptions (legal hold, active investigation) are documented |
Data lifecycle management
Collection → Processing → Storage    → Retention → Deletion/Anonymisation
    ↓            ↓           ↓            ↓            ↓
 Consent      Purpose     Encryption   Schedule     Hard delete
 recorded     validated   at rest      enforced     or anonymise
PII Detection Categories
Tier 1: Direct identifiers (HIGH severity)
These directly identify a natural person:
| Category | Examples | Detection Patterns |
|---|---|---|
| Full name | "María García López", "John Smith" | Two or more capitalised words in name context; salutation + word patterns |
| Email address | "user@example.com" | RFC 5322 pattern: |
| Phone number | "+34 612 345 678", "(555) 123-4567" | International/national phone patterns with optional country codes |
| National ID | "12345678Z" (DNI), "AB123456C" (NI number) | Country-specific ID patterns |
| Passport number | "PAB123456" | Alphanumeric patterns in passport context |
| Social security number | "123-45-6789" (US SSN) | Country-specific SSN patterns |
| Credit/debit card number | "4111 1111 1111 1111" | Luhn-valid 13-19 digit sequences |
| IBAN / Bank account | "ES91 2100 0418 4502 0005 1332" | Country-prefix + check digit + BBAN pattern |
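Several Tier 1 patterns carry a checksum, which helps separate real identifiers from digit runs that merely look similar. The sketch below shows three such checks: Luhn for card numbers, the mod-23 control letter for Spanish DNI, and the mod-97 check for IBAN. The function names are hypothetical, and the IBAN check deliberately ignores country-specific lengths.

```python
import re

DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"  # control letter indexed by number % 23


def luhn_valid(number: str) -> bool:
    """Luhn checksum over a 13-19 digit sequence; separators are ignored."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def dni_valid(dni: str) -> bool:
    """Spanish DNI: eight digits followed by the matching control letter."""
    match = re.fullmatch(r"(\d{8})([A-Z])", dni.upper())
    if match is None:
        return False
    return DNI_LETTERS[int(match.group(1)) % 23] == match.group(2)


def iban_valid(iban: str) -> bool:
    """Generic IBAN mod-97 check (no per-country length validation)."""
    compact = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]
    # Letters become two-digit numbers: A=10 ... Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A passing checksum is a reasonable signal for HIGH confidence, while a format match with a failing checksum may be better reported at a lower level.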
Tier 2: Indirect identifiers (MEDIUM severity)
These may identify a person when combined:
| Category | Examples | Detection Patterns |
|---|---|---|
| Postal address | "Calle Mayor 5, 28001 Madrid" | Street patterns + postcode + city combinations |
| Date of birth | "15/03/1990", "March 15, 1990" | Date patterns in birth/DOB context |
| Place of birth | "Born in Sevilla" | Location in birth context |
| Vehicle registration | "1234 BCD" (ES), "AB12 CDE" (UK) | Country-specific plate patterns |
| IP address | "192.168.1.1", "2001:0db8::1" | IPv4 and IPv6 patterns |
| Geolocation | "40.4168, -3.7038" | Coordinate pairs (lat/lon decimal) |
| Employee/student ID | "EMP-20230045" | Prefixed sequential identifiers |
Tier 3: Quasi-identifiers (LOW severity)
These rarely identify alone but contribute to re-identification:
| Category | Examples | Detection Patterns |
|---|---|---|
| Age | "35 years old" | Number + age-related keywords |
| Gender | "male", "female", "non-binary" | Gender keywords in personal context |
| Job title | "Senior Software Engineer at Acme" | Title + organisation patterns |
| Nationality | "Spanish", "British" | Nationality adjectives in personal context |
| Organisation name | "works at Telefónica" | Named entity in employment context |
Special category data (CRITICAL severity)
| Category | Examples | Detection Patterns |
|---|---|---|
| Health data | "diagnosed with diabetes", "takes metformin" | Medical terms, diagnoses, medications |
| Racial/ethnic origin | "of Moroccan descent" | Ethnicity/race references in personal context |
| Political opinions | "member of Partido X" | Political party/affiliation references |
| Religious beliefs | "practising Muslim", "Catholic" | Religious terms in personal belief context |
| Biometric data | "fingerprint ID: ABC123" | Biometric identifiers |
| Sexual orientation | "gay", "heterosexual" | Sexual orientation terms in personal context |
| Trade union membership | "member of UGT" | Union references in membership context |
| Criminal records (Article 10 GDPR; not an Article 9 category, but handled at the same severity) | "convicted of", "arrested for" | Criminal justice terms in personal context |
| Genetic data | "carries BRCA1 mutation" | Genetic markers, DNA references |
Synthetic Data Generation
General principles
- Format preservation — Synthetic data must match the exact format of the original (length, separators, casing)
- Linguistic consistency — Names must match the apparent cultural/linguistic context (Spanish names for Spanish text, etc.)
- Internal consistency — The same real entity always maps to the same synthetic entity within a document
- Cross-field coherence — Related fields should be coherent (e.g., Spanish name + Spanish phone number + Spanish address)
- Statistical plausibility — Synthetic values should fall within realistic ranges (valid postcodes, plausible dates, etc.)
- Non-collision — Synthetic data must not accidentally match other real entities in the text
Replacement strategies per category
| PII Category | Synthetic Strategy | Example |
|---|---|---|
| Full name | Generate culturally appropriate name from synthetic name pool | "María García" → "Laura Fernández" |
| Email address | Derive from synthetic name + synthetic domain | "maria.garcia@empresa.es" → "laura.fernandez@example.net" |
| Phone number | Preserve country code + generate random valid digits | "+34 612 345 678" → "+34 698 721 043" |
| National ID | Generate format-valid but non-real ID | "12345678Z" → "87654321X" (valid check digit) |
| Postal address | Generate plausible address in same region/country | "Calle Mayor 5, 28001 Madrid" → "Avenida de la Paz 12, 28003 Madrid" |
| Date of birth | Shift by random offset (±1-5 years, ±1-90 days) | "15/03/1990" → "22/07/1988" |
| IBAN | Generate format-valid IBAN with correct check digits | "ES91 2100 0418 4502 0005 1332" → "ES76 0049 1500 0512 3456 7890" |
| Credit card | Generate Luhn-valid number with same BIN prefix category | "4111 1111 1111 1111" → "4532 7895 1234 5676" |
| IP address | Generate from non-routable or documentation ranges | "85.123.45.67" → "198.51.100.42" (RFC 5737) |
| Geolocation | Offset by ±0.01-0.05 degrees | "40.4168, -3.7038" → "40.4312, -3.6891" |
| Health data | Replace specific condition with different plausible condition | "diabetes" → "hypertension" |
| Organisation | Replace with fictional but plausible organisation | "Telefónica" → "Meridian Technologies" |
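A few of these strategies can be sketched as format-preserving generators, combined with a per-document cache so that the same input always receives the same output. All names here (`synthetic_dni`, `synthetic_phone`, `synthetic_ip`, `EntityMapper`) are hypothetical. The phone generator keeps the country code and separators but does not enforce national numbering rules, and the cache is meant to live in memory for one document only and be discarded afterwards.

```python
import ipaddress
import random
import re
from typing import Callable, Dict, Optional

DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def synthetic_dni(rng: random.Random) -> str:
    """Eight random digits plus the control letter that makes the format valid."""
    number = rng.randrange(10 ** 8)
    return f"{number:08d}{DNI_LETTERS[number % 23]}"


def synthetic_phone(original: str, rng: random.Random) -> str:
    """Keep the leading country code and all separators; randomise other digits."""
    prefix = re.match(r"\+\d+", original)
    keep = prefix.end() if prefix else 0
    tail = "".join(str(rng.randrange(10)) if ch.isdigit() else ch
                   for ch in original[keep:])
    return original[:keep] + tail


def synthetic_ip(rng: random.Random) -> str:
    """Pick a host in 198.51.100.0/24, a documentation range from RFC 5737."""
    return f"198.51.100.{rng.randrange(1, 255)}"


class EntityMapper:
    """Per-document cache: the same real entity always maps to the same output."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._cache: Dict[str, str] = {}

    def replace(self, entity: str, generate: Callable[[random.Random], str]) -> str:
        if entity not in self._cache:
            value = generate(self._rng)
            while value == entity:      # never hand back the original value
                value = generate(self._rng)
            self._cache[entity] = value
        return self._cache[entity]


mapper = EntityMapper(seed=1)
dni = mapper.replace("12345678Z", synthetic_dni)
phone = mapper.replace("+34 612 345 678",
                       lambda rng: synthetic_phone("+34 612 345 678", rng))
```

Random generation alone cannot guarantee that a value does not belong to a real person, so the non-collision check against the source text remains necessary.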
Name generation pools
Maintain diverse pools of synthetic names by cultural context:
- Spanish: "Laura Fernández", "Carlos Ruiz", "Ana Moreno", "Pablo Jiménez", "Elena Torres", "Diego Navarro"
- English: "Emily Carter", "James Wilson", "Sophie Brown", "Oliver Davies", "Charlotte Evans", "William Harris"
- French: "Camille Dupont", "Lucas Martin", "Léa Bernard", "Hugo Petit", "Manon Durand", "Théo Lambert"
- German: "Hannah Müller", "Felix Schmidt", "Lena Fischer", "Maximilian Weber", "Sophie Wagner", "Paul Becker"
- Portuguese: "Beatriz Silva", "Tomás Santos", "Inês Costa", "Miguel Oliveira", "Mariana Ferreira", "Diogo Pereira"
(Expand pools as needed for other cultural contexts.)
Decision trees
When to anonymise vs when to flag only
Is the user requesting anonymisation (replace PII)?
├── YES → Detect PII + generate synthetic replacements + produce anonymised text
└── NO → Is the user requesting PII detection only?
    ├── YES → Detect PII + report findings without replacement
    └── NO → Is this a security review that found PII in code?
        ├── YES → Flag as finding in security report format
        │         Suggest: "Use @security with 'anonymise' for synthetic replacement"
        └── NO → Not applicable
How to determine confidence level
Does the pattern match exactly (email regex, phone format, ID format)?
├── YES → HIGH confidence
└── NO → Does the context strongly suggest PII (preceded by "name:", "email:", "tel:")?
    ├── YES → MEDIUM confidence
    └── NO → Is it a partial or ambiguous match?
        ├── YES → LOW confidence (flag but ask user to confirm)
        └── NO → Do not flag
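The confidence tree maps directly onto a short function. In this sketch the two exact patterns and the context-label regex are simplified assumptions (the email pattern is far looser than RFC 5322), and `confidence_level` is a hypothetical name.

```python
import re
from typing import Optional

# Simplified exact-format patterns, for illustration only.
EXACT_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}
# A label such as "name:", "email:" or "tel:" directly before the candidate.
CONTEXT_LABEL = re.compile(r"\b(name|email|tel)\s*:\s*$", re.IGNORECASE)


def confidence_level(candidate: str, preceding_text: str,
                     ambiguous: bool = False) -> Optional[str]:
    """Return "HIGH", "MEDIUM", "LOW", or None when nothing should be flagged."""
    if any(pattern.fullmatch(candidate) for pattern in EXACT_PATTERNS.values()):
        return "HIGH"
    if CONTEXT_LABEL.search(preceding_text):
        return "MEDIUM"
    if ambiguous:
        return "LOW"
    return None
```

The `ambiguous` flag is left to the caller because deciding what counts as a partial match depends on the detector that produced the candidate.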
How to handle mixed-language text
Does the text contain multiple languages?
├── YES → Identify primary language per paragraph/section
│         Apply language-appropriate name pools
│         Preserve original language in synthetic output
└── NO → Apply single-language name pool matching the text
How to handle special category data
Is the detected PII special category (Article 9)?
├── YES → Assign CRITICAL severity
│         Warn: "Special category data detected. Ensure legal basis exists for processing."
│         Replace with different but plausible value from same category
│         Include explicit GDPR Article 9 reference in report
└── NO → Assign severity based on tier (Tier 1 = HIGH, Tier 2 = MEDIUM, Tier 3 = LOW)
Retention decision tree
Is there a legal obligation to retain the data?
├── YES → Set retention period to legal requirement (e.g., 7 years for tax records)
│         Document the specific law or regulation
│         Implement automated deletion on expiry
└── NO → Is the data needed for contract performance?
    ├── YES → Retain for duration of contract + reasonable wind-down period
    │         Define wind-down period (e.g., 30 days)
    │         Delete when contract ends + wind-down expires
    └── NO → Is the data retained based on consent?
        ├── YES → Retain until consent is withdrawn or purpose expires
        │         Define a maximum retention period (e.g., 26 months)
        │         Delete promptly on consent withdrawal
        └── NO → Is there a legitimate interest?
            ├── YES → Document the interest and balancing test
            │         Set shortest reasonable retention period
            │         Review retention regularly
            └── NO → DELETE IMMEDIATELY: no legal basis to retain
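The order of the questions matters: a legal obligation takes precedence over contract, then consent, then legitimate interest. A minimal sketch of that precedence, with a hypothetical function name and hypothetical outcome labels:

```python
def retention_decision(legal_obligation: bool, contract: bool,
                       consent: bool, legitimate_interest: bool) -> str:
    """Walk the retention questions in order and return the first outcome that applies."""
    if legal_obligation:
        return "retain_for_legal_period"
    if contract:
        return "retain_for_contract_plus_wind_down"
    if consent:
        return "retain_until_withdrawal_or_max_period"
    if legitimate_interest:
        return "retain_shortest_reasonable_period"
    return "delete_immediately"
```

For example, transaction records that are both contract data and subject to tax law resolve to the legal period, because the legal-obligation branch is checked first.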
When to require a DPIA
Does the processing involve:
├── Large-scale processing of special category data? → DPIA REQUIRED
├── Systematic monitoring of publicly accessible areas? → DPIA REQUIRED
├── Automated decision-making with legal/significant effects? → DPIA REQUIRED
├── Large-scale profiling? → DPIA REQUIRED
├── New technologies with high risk to rights? → DPIA REQUIRED
├── Cross-border data transfers to non-adequate countries? → DPIA RECOMMENDED
└── None of the above → DPIA not required (but recommended for high-risk processing)
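The DPIA triggers can be expressed as a set lookup, which keeps the list easy to extend. The trait keys and the `dpia_status` function are hypothetical labels for the branches above, not an official taxonomy.

```python
from typing import Iterable

DPIA_REQUIRED = frozenset({
    "large_scale_special_category",
    "systematic_public_monitoring",
    "automated_decisions_significant_effects",
    "large_scale_profiling",
    "high_risk_new_technology",
})
DPIA_RECOMMENDED = frozenset({"transfer_to_non_adequate_country"})


def dpia_status(traits: Iterable[str]) -> str:
    """Return "required", "recommended" or "not_required" for a processing activity."""
    present = set(traits)
    if present & DPIA_REQUIRED:
        return "required"
    if present & DPIA_RECOMMENDED:
        return "recommended"
    return "not_required"
```

Any single required trigger is enough, so a transfer to a non-adequate country combined with large-scale profiling still resolves to "required".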
Examples
Example 1: Consent management review
Input: A registration form that asks users to accept terms and conditions with a single pre-ticked checkbox that also covers marketing emails and third-party sharing.
Review Output:
Critical Issues:
- ❌ Pre-ticked checkbox violates Article 7 GDPR — Consent must require an affirmative action. Pre-ticked boxes do not constitute valid consent.
- Remediation: Replace with unticked checkboxes that users must actively select.
- ❌ Bundled consent is not specific (Articles 4(11) and 7(2) GDPR): separate agreement is required for (a) Terms of Service acceptance, (b) Marketing emails, (c) Third-party data sharing.
- Remediation: Provide three separate checkboxes with clear descriptions of each purpose.
- ❌ No consent withdrawal mechanism — Users cannot revoke marketing or sharing consent after registration.
- Remediation: Implement a preference centre accessible from user settings.
Example 2: Retention policy review
Input: A system that stores user session logs indefinitely and keeps deleted user accounts in a "soft delete" state forever.
Review Output:
Critical Issues:
- ❌ Indefinite session log retention violates storage limitation (Article 5(1)(e)) — Session logs have no defined retention period.
- Remediation: Define a retention period (e.g., 90 days for security logs). Implement automated log rotation and purge.
- ❌ Soft-deleted accounts never purged violates right to erasure (Article 17) — Users who request deletion expect their data to be removed, not merely hidden.
- Remediation: Implement a hard-delete pipeline that purges soft-deleted accounts after a 30-day grace period. Ensure backups are also cleaned within 90 days.
Example 3: Full anonymisation of a customer record
Input:
Customer: María García López
Email: maria.garcia@empresa.es
Phone: +34 612 345 678
DNI: 12345678Z
Address: Calle Mayor 5, 28001 Madrid
Notes: Patient diagnosed with diabetes. Follow-up appointment on 15/03/2025.
Emergency contact: Pedro García at +34 698 111 222.
Output:
PII Detection: 8 entities found (1 special category)
Anonymised Text:
Customer: Laura Fernández Ruiz
Email: laura.fernandez@example.net
Phone: +34 698 721 043
DNI: 87654321X
Address: Avenida de la Paz 12, 28003 Madrid
Notes: Patient diagnosed with hypertension. Follow-up appointment on 22/07/2025.
Emergency contact: Diego Navarro at +34 641 893 507.
Validation: All PII replaced. Internal consistency maintained (García family → separate synthetic identities). Special category data (health) flagged and replaced.
Example 4: PII in code comments
Input:
// Created by María García (maria.garcia@empresa.es)
// Test user: DNI 12345678Z, phone +34 612 345 678
public class UserService {
    private static final String ADMIN_EMAIL = "admin@empresa.es";
Output:
PII Detection: 4 entities found (name, email, DNI, phone in comments)
Anonymised Text:
// Created by Laura Fernández (laura.fernandez@example.net)
// Test user: DNI 87654321X, phone +34 698 721 043
public class UserService {
    private static final String ADMIN_EMAIL = "admin@empresa.es";
Note:
ADMIN_EMAIL constant flagged separately as potential hardcoded credential (escalate to security-review skill). Not treated as PII anonymisation since it is a system account, not a natural person.
Output expectations
- Findings — Ordered by severity (critical, high, medium, low) with GDPR article references
- Remediation — Concrete, actionable fix for each finding
- GDPR mapping — Each finding mapped to the relevant article
- Checklists — Consent and retention checklists completed when applicable
Notes
- This skill extends the security agent's capabilities; it does NOT replace the security-review skill
- The security-review skill detects secrets and vulnerabilities; this skill focuses on data privacy: GDPR, consent, retention, and PII
- Anonymisation is a form of data protection by design (Article 25 GDPR)
- True anonymisation is irreversible, whereas pseudonymisation can be reversed with a key; this skill performs anonymisation by default
- If the user needs pseudonymisation (reversible mapping), they must explicitly request it; in that case, the mapping must be stored securely and separately
- For automated pipelines, consider integrating with libraries like Microsoft Presidio, spaCy NER, or AWS Comprehend for production-grade detection
- The agent performs best-effort detection; it is not a substitute for a formal Data Protection Impact Assessment (DPIA)
- Consent management and retention policies should be reviewed with legal counsel for jurisdiction-specific requirements
- Data retention automation should be tested thoroughly to avoid accidental data loss