Anthropic-Cybersecurity-Skills building-role-mining-for-rbac-optimization
Apply bottom-up and top-down role mining techniques to discover optimal RBAC roles from existing user-permission
git clone https://github.com/mukul975/Anthropic-Cybersecurity-Skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/mukul975/Anthropic-Cybersecurity-Skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/building-role-mining-for-rbac-optimization" ~/.claude/skills/mukul975-anthropic-cybersecurity-skills-building-role-mining-for-rbac-optimizati && rm -rf "$T"
skills/building-role-mining-for-rbac-optimization/SKILL.mdBuilding Role Mining for RBAC Optimization
Overview
Role mining is the process of analyzing existing user-permission assignments to discover optimal roles for a Role-Based Access Control (RBAC) system. Organizations accumulate excessive permissions over time through job changes, project assignments, and ad-hoc access grants, leading to "role explosion" where thousands of granular roles exist with significant overlap. Role mining uses data analysis -- including clustering algorithms, formal concept analysis, and graph-based methods -- to consolidate permissions into a minimal set of roles that accurately represent business functions while enforcing least privilege.
When to Use
- When deploying or configuring building role mining for rbac optimization capabilities in your environment
- When establishing security controls aligned to compliance requirements
- When building or improving security architecture for this domain
- When conducting security assessments that require this implementation
Prerequisites
- Export of current user-permission assignments (CSV/database)
- Identity governance platform or directory service access
- Python 3.9+ with pandas, scikit-learn, numpy
- Understanding of organizational structure and job functions
- Stakeholder access for role validation workshops
Core Concepts
Role Mining Approaches
| Approach | Description | Best For |
|---|---|---|
| Bottom-Up | Analyze existing permissions to discover common patterns | Large datasets with organic permission growth |
| Top-Down | Design roles from business requirements and job descriptions | Greenfield RBAC or organizational restructuring |
| Hybrid | Combine bottom-up analysis with top-down business validation | Most production environments |
Role Mining Algorithms
1. Permission Clustering: Group users with similar permission sets using k-means or hierarchical clustering. Users in the same cluster share a common role.
2. Formal Concept Analysis (FCA): Mathematical framework that identifies complete set of concepts (user groups sharing exact permission sets) from a binary user-permission matrix.
3. Graph-Based Mining: Model users and permissions as a bipartite graph, then find dense subgraphs representing candidate roles.
4. Boolean Matrix Decomposition: Decompose the user-permission matrix U into U ≈ R × P where R maps users to roles and P maps roles to permissions.
Role Mining Metrics
| Metric | Formula | Target |
|---|---|---|
| Role Count | Total distinct roles after mining | Minimize |
| Coverage | Permissions explained by mined roles / Total permissions | > 95% |
| Weighted Structural Complexity (WSC) | Sum of role-user + role-permission assignments | Minimize |
| Deviation | Extra permissions not covered by assigned roles | < 5% |
Workflow
Step 1: Extract User-Permission Data
Collect the current access state from all identity sources:
import pandas as pd import numpy as np # Load user-permission assignments # Format: user_id, permission_id (one row per assignment) assignments = pd.read_csv("user_permissions.csv") # Create binary user-permission matrix (UPA matrix) upa_matrix = assignments.pivot_table( index="user_id", columns="permission_id", aggfunc="size", fill_value=0 ) upa_matrix = (upa_matrix > 0).astype(int) print(f"Users: {upa_matrix.shape[0]}") print(f"Permissions: {upa_matrix.shape[1]}") print(f"Assignments: {assignments.shape[0]}") print(f"Density: {upa_matrix.values.sum() / upa_matrix.size:.2%}")
Step 2: Bottom-Up Role Discovery Using Clustering
from sklearn.cluster import AgglomerativeClustering from sklearn.metrics import silhouette_score def find_optimal_clusters(matrix, max_k=50): """Find optimal number of roles using silhouette analysis.""" scores = [] for k in range(2, min(max_k, matrix.shape[0])): clustering = AgglomerativeClustering( n_clusters=k, metric="jaccard", linkage="average" ) labels = clustering.fit_predict(matrix) score = silhouette_score(matrix, labels, metric="jaccard") scores.append((k, score)) optimal_k = max(scores, key=lambda x: x[1])[0] return optimal_k, scores def mine_roles_clustering(upa_matrix, n_clusters): """Mine roles using hierarchical clustering on Jaccard distance.""" clustering = AgglomerativeClustering( n_clusters=n_clusters, metric="jaccard", linkage="average" ) user_matrix = upa_matrix.values labels = clustering.fit_predict(user_matrix) roles = {} for cluster_id in range(n_clusters): cluster_users = upa_matrix.index[labels == cluster_id] cluster_permissions = upa_matrix.loc[cluster_users] # Core role = permissions held by >80% of cluster members permission_frequency = cluster_permissions.mean() core_permissions = permission_frequency[permission_frequency >= 0.8].index.tolist() roles[f"Role_{cluster_id}"] = { "permissions": core_permissions, "user_count": len(cluster_users), "users": cluster_users.tolist(), "coverage": permission_frequency[permission_frequency >= 0.8].mean() } return roles, labels
Step 3: Formal Concept Analysis
def mine_roles_fca(upa_matrix, min_support=3): """Mine roles using Formal Concept Analysis (frequent closed itemsets).""" from itertools import combinations users = upa_matrix.index.tolist() permissions = upa_matrix.columns.tolist() concepts = [] # Find all maximal permission sets shared by at least min_support users for size in range(len(permissions), 0, -1): for perm_combo in combinations(permissions, size): perm_set = set(perm_combo) # Find users who have ALL permissions in this set matching_users = [] for user in users: user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1]) if perm_set.issubset(user_perms): matching_users.append(user) if len(matching_users) >= min_support: # Check if this is a closed concept (no superset with same extent) is_closed = True for concept in concepts: if set(matching_users) == set(concept["users"]) and \ perm_set.issubset(set(concept["permissions"])): is_closed = False break if is_closed: concepts.append({ "permissions": list(perm_set), "users": matching_users, "support": len(matching_users) }) if len(concepts) > 100: # Limit for performance break return concepts
Step 4: Evaluate and Select Roles
def evaluate_role_set(roles, upa_matrix): """Evaluate the quality of a mined role set.""" total_assignments = upa_matrix.values.sum() covered_assignments = 0 extra_assignments = 0 for role_name, role_data in roles.items(): role_perms = set(role_data["permissions"]) for user in role_data["users"]: user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1]) covered = role_perms.intersection(user_perms) extra = role_perms - user_perms covered_assignments += len(covered) extra_assignments += len(extra) metrics = { "total_roles": len(roles), "total_assignments": total_assignments, "covered_assignments": covered_assignments, "coverage_rate": covered_assignments / total_assignments if total_assignments else 0, "extra_permissions": extra_assignments, "deviation_rate": extra_assignments / (covered_assignments + extra_assignments) if (covered_assignments + extra_assignments) else 0, "avg_role_size": np.mean([len(r["permissions"]) for r in roles.values()]), "avg_users_per_role": np.mean([r["user_count"] for r in roles.values()]), } return metrics
Step 5: Business Validation
After mining candidate roles:
- Map mined roles to business functions (department, job title)
- Conduct workshops with business unit managers to validate role definitions
- Identify outlier permissions that indicate misconfiguration
- Refine roles based on feedback and re-evaluate metrics
- Document role definitions with business justification
Validation Checklist
- User-permission matrix extracted from all identity sources
- Multiple mining algorithms compared (clustering, FCA)
- Optimal role count determined via silhouette analysis or WSC
- Coverage rate exceeds 95% of existing assignments
- Deviation rate below 5% (minimal extra permissions)
- Mined roles validated with business stakeholders
- Role hierarchy defined (parent-child inheritance)
- Exception/outlier permissions documented
- Migration plan created for transitioning to new role model
- Ongoing role governance process defined