Claude-code-plugins-plus glean-data-handling
install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/glean-pack/skills/glean-data-handling" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-glean-data-handling && rm -rf "$T"
manifest:
plugins/saas-packs/glean-pack/skills/glean-data-handling/SKILL.md
Glean Data Handling
Overview
Glean enterprise search ingests documents from dozens of connectors (Google Drive, Confluence, Slack, Jira, Salesforce, etc.) and builds a unified search index with permission-aware access control. Data types include indexed document content, connector metadata, user permission maps, query logs, and search analytics. All document content must be PII-filtered before indexing, permission boundaries must be preserved to prevent data leakage across teams, and retention policies must be enforced to comply with corporate governance and GDPR/CCPA obligations.
Data Classification
| Data Type | Sensitivity | Retention | Encryption |
|---|---|---|---|
| Indexed document content | High (may contain PII) | Per source retention policy | AES-256 at rest |
| User permission maps | High (access control) | Sync lifecycle | TLS in transit, encrypted at rest |
| Connector metadata | Medium | Until connector removed | AES-256 at rest |
| Search query logs | Medium (reveals intent) | 90 days default | AES-256 at rest |
| Search analytics/aggregates | Low | 1 year | TLS in transit |
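
The fixed retention windows in the table can be expressed as a small lookup plus an expiry check. This is a minimal sketch, not part of Glean's API: the names `RETENTION_DAYS` and `isExpired` are hypothetical, and treating "1 year" as 365 days is an assumption.

```typescript
// Hypothetical names; only the two rows with a fixed retention window are modeled.
type DataType = 'queryLog' | 'analytics';

// Retention windows in days. "1 year" is assumed to mean 365 days.
const RETENTION_DAYS: Record<DataType, number> = {
  queryLog: 90,
  analytics: 365,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true when a record is older than the retention window for its type.
function isExpired(type: DataType, createdAt: Date, now: Date = new Date()): boolean {
  const ageDays = (now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  return ageDays > RETENTION_DAYS[type];
}
```

A scheduled purge job could call `isExpired` per record and delete those that return `true`.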
Data Import

    interface GleanDocument {
      id: string;
      datasource: string;
      title: string;
      body: string;
      permissions: { allowedUsers?: string[]; allowAnonymousAccess?: boolean };
      updatedAt: string;
      url: string;
    }

    async function indexDocuments(docs: GleanDocument[], datasource: string) {
      // Strip PII from each document body before indexing
      const sanitized = docs.map(doc => ({
        ...doc,
        body: stripPII(doc.body),
      }));

      // Upload in batches of at most 100 documents per request
      for (let i = 0; i < sanitized.length; i += 100) {
        const batch = sanitized.slice(i, i + 100);
        const res = await fetch(`https://customer-be.glean.com/api/index/v1/bulkindexdocuments`, {
          method: 'POST',
          headers: {
            Authorization: `Bearer ${process.env.GLEAN_INDEXING_TOKEN}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({ datasource, documents: batch }),
        });
        // Fail loudly so a rejected batch is not silently skipped
        if (!res.ok) throw new Error(`Bulk index failed: HTTP ${res.status}`);
      }
    }

    // Replace emails, phone numbers, and SSNs with redaction markers
    function stripPII(text: string): string {
      return text
        .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL_REDACTED]')
        .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
        .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
    }

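
If new PII patterns need to be added over time, a table-driven variant of the redaction step may be easier to extend than a chain of `.replace` calls. The sketch below reuses the three regexes from the import code. The `PiiPattern` type, the `redact` function, and the `CARD` pattern are hypothetical additions for illustration.

```typescript
interface PiiPattern {
  label: string;
  regex: RegExp; // must carry the global flag so every occurrence is replaced
}

const PII_PATTERNS: PiiPattern[] = [
  { label: 'EMAIL', regex: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g },
  { label: 'PHONE', regex: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g },
  { label: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
  // Hypothetical addition: 16-digit card numbers written in groups of four.
  { label: 'CARD', regex: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g },
];

// Applies each pattern in order and substitutes a labeled redaction marker.
function redact(text: string, patterns: PiiPattern[] = PII_PATTERNS): string {
  return patterns.reduce(
    (acc, p) => acc.replace(p.regex, `[${p.label}_REDACTED]`),
    text,
  );
}
```

Adding a pattern then means appending one entry to `PII_PATTERNS` rather than editing the function body. Regex-based redaction of this kind catches only the formats it lists, so it is a baseline rather than a complete PII filter.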
Data Export

    async function exportSearchAnalytics(startDate: string, endDate: string) {
      const res = await fetch(`https://customer-be.glean.com/api/v1/analytics`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${process.env.GLEAN_API_TOKEN}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          startDate,
          endDate,
          metrics: ['query_count', 'click_through', 'zero_results'],
        }),
      });
      // Fail loudly instead of parsing an error response as analytics data
      if (!res.ok) throw new Error(`Analytics export failed: HTTP ${res.status}`);
      const data = await res.json();

      // Redact user identifiers, and queries of 3 characters or fewer, from the export
      return data.results.map((r: any) => ({
        ...r,
        userId: undefined,
        query: r.query?.length > 3 ? r.query : '[SHORT_QUERY_REDACTED]',
      }));
    }

Data Validation

    // Returns a list of validation errors; an empty list means the document is valid
    function validateDocument(doc: GleanDocument): string[] {
      const errors: string[] = [];
      if (!doc.id || doc.id.length > 512) errors.push('Invalid document ID');
      if (!doc.datasource) errors.push('Missing datasource identifier');
      if (!doc.title || doc.title.length > 1000) errors.push('Title missing or exceeds 1000 chars');
      if (!doc.body) errors.push('Empty document body');
      if (!doc.permissions) errors.push('Missing permissions (defaults to deny-all)');
      if (doc.updatedAt && isNaN(Date.parse(doc.updatedAt))) errors.push('Invalid updatedAt timestamp');
      return errors;
    }

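
A validator that returns an error list is typically run before upload so that invalid documents are held back and reported. The following is a minimal sketch of that step. `partitionByValidation`, `Validator`, and `Rejected` are hypothetical names, and the helper is generic so it works with any function shaped like `validateDocument`.

```typescript
// A validator returns a list of error messages; an empty list means valid.
type Validator<T> = (item: T) => string[];

interface Rejected<T> {
  item: T;
  errors: string[];
}

// Splits items into those that pass validation and those that fail,
// keeping the error messages alongside each rejected item.
function partitionByValidation<T>(
  items: T[],
  validate: Validator<T>,
): { valid: T[]; rejected: Rejected<T>[] } {
  const valid: T[] = [];
  const rejected: Rejected<T>[] = [];
  for (const item of items) {
    const errors = validate(item);
    if (errors.length === 0) valid.push(item);
    else rejected.push({ item, errors });
  }
  return { valid, rejected };
}
```

Under these assumptions, `partitionByValidation(docs, validateDocument)` would yield a `valid` array to pass to `indexDocuments` and a `rejected` array to log or review.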
Compliance
- PII stripped from document body before indexing (emails, phones, SSNs)
- Permission boundaries enforced: allowedUsers scope matches source system ACLs
- Connector credentials stored in secret manager, rotated quarterly
- Search query logs retained max 90 days, purged via automated job
- GDPR right-to-erasure: delete all indexed content referencing a specific user on request
- CCPA: honor do-not-sell signals for search analytics data
- SOC 2 Type II audit trail for all indexing and deletion operations
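
One way to check that an indexed `allowedUsers` list still matches the source system ACL is to diff the two lists. This is a sketch under stated assumptions: `diffAllowedUsers` and `AclDrift` are hypothetical names, users are identified by email, and comparison is case-insensitive.

```typescript
interface AclDrift {
  missing: string[]; // in the source ACL but absent from the index (under-sharing)
  extra: string[];   // in the index but absent from the source ACL (leakage risk)
}

// Compares the source system's ACL with the indexed allowedUsers list.
// Entries are trimmed and lowercased before comparison (an assumption).
function diffAllowedUsers(sourceAcl: string[], indexedAcl: string[]): AclDrift {
  const normalize = (xs: string[]) => new Set(xs.map(x => x.trim().toLowerCase()));
  const source = normalize(sourceAcl);
  const indexed = normalize(indexedAcl);
  return {
    missing: Array.from(source).filter(u => !indexed.has(u)).sort(),
    extra: Array.from(indexed).filter(u => !source.has(u)).sort(),
  };
}
```

A non-empty `extra` list is the case to treat as urgent, since it means the index grants access that the source system does not.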
Error Handling
| Issue | Cause | Fix |
|---|---|---|
| 403 on bulk index | Expired or insufficient indexing token | Rotate token, verify datasource permissions |
| Permission mismatch in search | Stale ACL sync from connector | Force re-sync connector permissions via admin API |
| PII detected in indexed content | New PII pattern not in strip regex | Add the pattern to the PII-stripping regex, then re-index the affected datasource |
| Zero-result queries spike | Connector sync failure, stale index | Check connector health dashboard, trigger manual re-crawl |
| Rate limit 429 on indexing | Batch size too large or too frequent | Reduce batch to 50 docs, add 500ms delay between batches |
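
The 429 remedy in the last row (batches of 50 with a 500 ms delay) can be sketched as a paced sender with retries. Everything here is illustrative: `sendWithBackoff`, `SendBatch`, and the `maxRetries` default are hypothetical, and the exponential backoff on repeated 429s is an added assumption beyond the fixed delay the table mentions. The `send` callback stands in for the actual HTTP call.

```typescript
// The caller supplies the function that uploads one batch and reports the HTTP status.
type SendBatch<T> = (batch: T[]) => Promise<{ status: number }>;

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Sends items in batches, pausing between batches and retrying on HTTP 429.
// Returns the number of items sent successfully.
async function sendWithBackoff<T>(
  items: T[],
  send: SendBatch<T>,
  opts = { batchSize: 50, delayMs: 500, maxRetries: 3 },
): Promise<number> {
  let sent = 0;
  for (const batch of chunk(items, opts.batchSize)) {
    let attempt = 0;
    for (;;) {
      const res = await send(batch);
      if (res.status !== 429) {
        if (res.status >= 400) throw new Error(`Batch failed with HTTP ${res.status}`);
        break;
      }
      if (attempt >= opts.maxRetries) throw new Error('Rate limit persisted after retries');
      // Exponential backoff: delayMs, 2 * delayMs, 4 * delayMs, ...
      await sleep(opts.delayMs * 2 ** attempt);
      attempt++;
    }
    sent += batch.length;
    await sleep(opts.delayMs); // pacing delay between batches
  }
  return sent;
}
```

Passing the upload step as a callback keeps the pacing logic independent of the endpoint, so it can be exercised with a fake sender before being wired to the real request.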
Next Steps
See glean-security-basics.