Claude-code-plugins-plus glean-data-handling

install

source · Clone the upstream repo

git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/glean-pack/skills/glean-data-handling" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-glean-data-handling && rm -rf "$T"

manifest: plugins/saas-packs/glean-pack/skills/glean-data-handling/SKILL.md

Glean Data Handling

Overview

Glean enterprise search ingests documents from dozens of connectors (Google Drive, Confluence, Slack, Jira, Salesforce, etc.) and builds a unified search index with permission-aware access control. Data types include indexed document content, connector metadata, user permission maps, query logs, and search analytics. All document content must be PII-filtered before indexing, permission boundaries must be preserved to prevent data leakage across teams, and retention policies must be enforced to comply with corporate governance and GDPR/CCPA obligations.

Data Classification

Data Type	Sensitivity	Retention	Encryption
Indexed document content	High (may contain PII)	Per source retention policy	AES-256 at rest
User permission maps	High (access control)	Sync lifecycle	TLS in transit + encrypted at rest
Connector metadata	Medium	Until connector removed	AES-256 at rest
Search query logs	Medium (reveals intent)	90 days default	AES-256 at rest
Search analytics/aggregates	Low	1 year	TLS in transit
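
The fixed retention windows in the table can be encoded as data so a purge job does not hard-code each limit. The following is a minimal sketch, not part of the skill itself: `RETENTION_DAYS`, `RetainedType`, and `isExpired` are hypothetical names, and only the two rows with a fixed duration are covered, since the other rows depend on source-system or connector state.

```typescript
// Hypothetical retention map derived from the classification table.
// Only rows with a fixed duration are listed.
const RETENTION_DAYS = {
  searchQueryLogs: 90,  // "90 days default"
  searchAnalytics: 365, // "1 year"
} as const;

type RetainedType = keyof typeof RETENTION_DAYS;

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when a record created at `createdAt` is older than its retention window.
function isExpired(type: RetainedType, createdAt: Date, now: Date = new Date()): boolean {
  const ageDays = (now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  return ageDays > RETENTION_DAYS[type];
}
```

A purge job could filter records with `isExpired('searchQueryLogs', record.createdAt)` and delete the matches, assuming each record carries a creation timestamp.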

Data Import

interface GleanDocument {
  id: string; datasource: string; title: string;
  body: string; permissions: { allowedUsers?: string[]; allowAnonymousAccess?: boolean };
  updatedAt: string; url: string;
}

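async function indexDocuments(docs: GleanDocument[], datasource: string) {
  // Strip PII from each body before anything leaves the process
  const sanitized = docs.map(doc => ({
    ...doc,
    body: stripPII(doc.body),
  }));
  // Upload in batches of at most 100 documents per request.
  // uploadId/isFirstPage/isLastPage tie the batches into one bulk upload;
  // confirm these paging fields against your Glean Indexing API version.
  const uploadId = `upload-${datasource}-${Date.now()}`;
  for (let i = 0; i < sanitized.length; i += 100) {
    const batch = sanitized.slice(i, i + 100);
    const res = await fetch(`https://customer-be.glean.com/api/index/v1/bulkindexdocuments`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${process.env.GLEAN_INDEXING_TOKEN}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({
        uploadId, datasource, documents: batch,
        isFirstPage: i === 0, isLastPage: i + 100 >= sanitized.length,
      }),
    });
    // fetch only rejects on network failure, so surface HTTP errors explicitly
    if (!res.ok) throw new Error(`Bulk index failed at offset ${i}: HTTP ${res.status}`);
  }
}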

function stripPII(text: string): string {
  return text
    .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL_REDACTED]')
    .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
}
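
When a new PII pattern has to be added, a table-driven variant keeps the patterns in one place and reports how many matches each one redacted. This is a sketch under assumptions: `PII_PATTERNS`, `PIIPattern`, and `redactWithReport` are hypothetical names, and the three regexes are copied from `stripPII` unchanged.

```typescript
// Hypothetical pattern registry; the regexes are the same three used by stripPII.
interface PIIPattern { label: string; regex: RegExp; }

const PII_PATTERNS: PIIPattern[] = [
  { label: 'EMAIL', regex: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g },
  { label: 'PHONE', regex: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g },
  { label: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

interface RedactionResult { text: string; counts: Record<string, number>; }

// Applies each pattern in order and records how many matches it replaced.
function redactWithReport(input: string, patterns: PIIPattern[] = PII_PATTERNS): RedactionResult {
  const counts: Record<string, number> = {};
  let text = input;
  for (const { label, regex } of patterns) {
    let hits = 0;
    text = text.replace(regex, () => {
      hits += 1;
      return `[${label}_REDACTED]`;
    });
    counts[label] = hits;
  }
  return { text, counts };
}

const report = redactWithReport('Mail jane.doe@example.com or call 555-123-4567; SSN 123-45-6789.');
console.log(report.text);
// → Mail [EMAIL_REDACTED] or call [PHONE_REDACTED]; SSN [SSN_REDACTED].
```

The per-pattern counts can be logged per datasource, which makes a sudden drop or spike in redactions visible. These regexes cover only US-style phone numbers and SSNs, so the counts are a lower bound on PII present.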

Data Export

async function exportSearchAnalytics(startDate: string, endDate: string) {
  const res = await fetch(`https://customer-be.glean.com/api/v1/analytics`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.GLEAN_API_TOKEN}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ startDate, endDate, metrics: ['query_count', 'click_through', 'zero_results'] }),
  });
  // fetch only rejects on network failure, so surface HTTP errors explicitly
  if (!res.ok) throw new Error(`Analytics export failed: HTTP ${res.status}`);
  const data = await res.json();
  // Drop the userId key entirely (not just set it to undefined) and mask
  // missing or very short queries in the export
  return (data.results ?? []).map(({ userId: _userId, ...rest }: any) => ({
    ...rest,
    query: rest.query?.length > 3 ? rest.query : '[SHORT_QUERY_REDACTED]',
  }));
}

Data Validation

function validateDocument(doc: GleanDocument): string[] {
  const errors: string[] = [];
  if (!doc.id || doc.id.length > 512) errors.push('Invalid document ID');
  if (!doc.datasource) errors.push('Missing datasource identifier');
  if (!doc.title || doc.title.length > 1000) errors.push('Title missing or exceeds 1000 chars');
  if (!doc.body) errors.push('Empty document body');
  if (!doc.permissions) errors.push('Missing permissions (defaults to deny-all)');
  if (doc.updatedAt && isNaN(Date.parse(doc.updatedAt))) errors.push('Invalid updatedAt timestamp');
  return errors;
}
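
A validator that returns a list of errors is usually applied to a whole batch before upload, so that invalid documents are set aside instead of failing the request. A minimal sketch, assuming a hypothetical helper `partitionByValidity` that accepts any validator with the same signature as `validateDocument`:

```typescript
interface Partition<T> {
  valid: T[];
  rejected: { doc: T; errors: string[] }[];
}

// Splits documents into those with no validation errors and those with at least one.
function partitionByValidity<T>(docs: T[], validate: (doc: T) => string[]): Partition<T> {
  const result: Partition<T> = { valid: [], rejected: [] };
  for (const doc of docs) {
    const errors = validate(doc);
    if (errors.length === 0) result.valid.push(doc);
    else result.rejected.push({ doc, errors });
  }
  return result;
}
```

In this document's terms, `partitionByValidity(docs, validateDocument)` would yield a `valid` array to pass to `indexDocuments` and a `rejected` array to log or route back to the connector owner.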

Compliance

PII stripped from document body before indexing (emails, phones, SSNs)
Permission boundaries enforced: allowedUsers scope matches source system ACLs
Connector credentials stored in secret manager, rotated quarterly
Search query logs retained max 90 days, purged via automated job
GDPR right-to-erasure: delete all indexed content referencing a specific user on request
CCPA: honor do-not-sell signals for search analytics data
SOC 2 Type II audit trail for all indexing and deletion operations
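
For the right-to-erasure item, the first step is finding which documents reference the user. The sketch below is an illustration only: `IndexedDoc` and `findErasureCandidates` are hypothetical names, matching is by email address in title, body, or ACL, and it assumes a local inventory of submitted documents is available to scan. What counts as "referencing" a user is a policy decision, and the actual deletion call is left out.

```typescript
// Minimal document shape for this sketch; mirrors the GleanDocument fields it needs.
interface IndexedDoc {
  id: string;
  title: string;
  body: string;
  permissions: { allowedUsers?: string[] };
}

// Returns IDs of documents whose title, body, or ACL mentions the user's email
// (case-insensitive). Deleting them from the index is a separate step.
function findErasureCandidates(docs: IndexedDoc[], userEmail: string): string[] {
  const needle = userEmail.toLowerCase();
  return docs
    .filter(doc =>
      doc.title.toLowerCase().includes(needle) ||
      doc.body.toLowerCase().includes(needle) ||
      (doc.permissions.allowedUsers ?? []).some(u => u.toLowerCase() === needle))
    .map(doc => doc.id);
}
```

Because bodies are redacted before indexing, a scan like this is most useful against the pre-redaction source records; the resulting IDs can then be written to the audit trail alongside the deletion.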

Error Handling

Issue	Cause	Fix
403 on bulk index	Expired or insufficient indexing token	Rotate token, verify datasource permissions
Permission mismatch in search	Stale ACL sync from connector	Force re-sync connector permissions via admin API
PII detected in indexed content	New PII pattern not in strip regex	Add pattern to `stripPII`, re-index affected datasource
Zero-result queries spike	Connector sync failure, stale index	Check connector health dashboard, trigger manual re-crawl
Rate limit 429 on indexing	Batch size too large or too frequent	Reduce batch to 50 docs, add 500ms delay between batches
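
The 429 fix in the last row (smaller batches, a pause between them) can be sketched as follows. The names `chunk`, `sleep`, and `sendThrottled` are hypothetical, the caller supplies the `send` function that performs the HTTP request and resolves to a status code, and the doubling of the delay on retry is an added assumption, not something the table specifies.

```typescript
// Split items into fixed-size batches (50 by default below, per the 429 row).
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Sends each batch through the caller-supplied `send`, pausing `delayMs` between
// batches. A batch that gets HTTP 429 is retried up to `maxRetries` times,
// doubling the wait before each retry.
async function sendThrottled<T>(
  items: T[],
  send: (batch: T[]) => Promise<number>, // resolves to an HTTP status code
  { batchSize = 50, delayMs = 500, maxRetries = 3 } = {},
): Promise<void> {
  const batches = chunk(items, batchSize);
  for (let b = 0; b < batches.length; b++) {
    let wait = delayMs;
    for (let attempt = 0; ; attempt++) {
      const status = await send(batches[b]);
      if (status !== 429) {
        if (status >= 400) throw new Error(`Batch ${b} failed with status ${status}`);
        break;
      }
      if (attempt >= maxRetries) throw new Error(`Batch ${b} still rate limited after retries`);
      await sleep(wait);
      wait *= 2;
    }
    if (b < batches.length - 1) await sleep(delayMs);
  }
}
```

Injecting `send` keeps the throttling logic independent of the endpoint and easy to exercise with a fake that returns 429 a few times before succeeding.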

Next Steps

See glean-security-basics.