Claude-code-plugins-plus glean-data-handling

install
source · Clone the upstream repo
git clone https://github.com/jeremylongshore/claude-code-plugins-plus-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/jeremylongshore/claude-code-plugins-plus-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/saas-packs/glean-pack/skills/glean-data-handling" ~/.claude/skills/jeremylongshore-claude-code-plugins-plus-glean-data-handling && rm -rf "$T"
manifest: plugins/saas-packs/glean-pack/skills/glean-data-handling/SKILL.md
source content

Glean Data Handling

Overview

Glean enterprise search ingests documents from dozens of connectors (Google Drive, Confluence, Slack, Jira, Salesforce, etc.) and builds a unified search index with permission-aware access control. Data types include indexed document content, connector metadata, user permission maps, query logs, and search analytics. All document content must be PII-filtered before indexing, permission boundaries must be preserved to prevent data leakage across teams, and retention policies must be enforced to comply with corporate governance and GDPR/CCPA obligations.

Data Classification

Data TypeSensitivityRetentionEncryption
Indexed document contentHigh (may contain PII)Per source retention policyAES-256 at rest
User permission mapsHigh (access control)Sync lifecycleTLS + at rest
Connector metadataMediumUntil connector removedAES-256 at rest
Search query logsMedium (reveals intent)90 days defaultAES-256 at rest
Search analytics/aggregatesLow1 yearTLS in transit

Data Import

interface GleanDocument {
  id: string; datasource: string; title: string;
  body: string; permissions: { allowedUsers?: string[]; allowAnonymousAccess?: boolean };
  updatedAt: string; url: string;
}

async function indexDocuments(docs: GleanDocument[], datasource: string) {
  // PII strip before indexing
  const sanitized = docs.map(doc => ({
    ...doc,
    body: stripPII(doc.body),
  }));
  // Batch upload with pagination (max 100 per request)
  for (let i = 0; i < sanitized.length; i += 100) {
    const batch = sanitized.slice(i, i + 100);
    await fetch(`https://customer-be.glean.com/api/index/v1/bulkindexdocuments`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${process.env.GLEAN_INDEXING_TOKEN}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ datasource, documents: batch }),
    });
  }
}

function stripPII(text: string): string {
  return text
    .replace(/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL_REDACTED]')
    .replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE_REDACTED]')
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN_REDACTED]');
}

Data Export

async function exportSearchAnalytics(startDate: string, endDate: string) {
  const res = await fetch(`https://customer-be.glean.com/api/v1/analytics`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.GLEAN_API_TOKEN}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ startDate, endDate, metrics: ['query_count', 'click_through', 'zero_results'] }),
  });
  const data = await res.json();
  // Redact user identifiers from analytics export
  return data.results.map((r: any) => ({ ...r, userId: undefined, query: r.query?.length > 3 ? r.query : '[SHORT_QUERY_REDACTED]' }));
}

Data Validation

function validateDocument(doc: GleanDocument): string[] {
  const errors: string[] = [];
  if (!doc.id || doc.id.length > 512) errors.push('Invalid document ID');
  if (!doc.datasource) errors.push('Missing datasource identifier');
  if (!doc.title || doc.title.length > 1000) errors.push('Title missing or exceeds 1000 chars');
  if (!doc.body || doc.body.length === 0) errors.push('Empty document body');
  if (!doc.permissions) errors.push('Missing permissions — defaults to deny-all');
  if (doc.updatedAt && isNaN(Date.parse(doc.updatedAt))) errors.push('Invalid updatedAt timestamp');
  return errors;
}

Compliance

  • PII stripped from document body before indexing (emails, phones, SSNs)
  • Permission boundaries enforced: allowedUsers scope matches source system ACLs
  • Connector credentials stored in secret manager, rotated quarterly
  • Search query logs retained max 90 days, purged via automated job
  • GDPR right-to-erasure: delete all indexed content referencing a specific user on request
  • CCPA: honor do-not-sell signals for search analytics data
  • SOC 2 Type II audit trail for all indexing and deletion operations

Error Handling

IssueCauseFix
403 on bulk indexExpired or insufficient indexing tokenRotate token, verify datasource permissions
Permission mismatch in searchStale ACL sync from connectorForce re-sync connector permissions via admin API
PII detected in indexed contentNew PII pattern not in strip regexAdd pattern to
stripPII
, re-index affected datasource
Zero-result queries spikeConnector sync failure, stale indexCheck connector health dashboard, trigger manual re-crawl
Rate limit 429 on indexingBatch size too large or too frequentReduce batch to 50 docs, add 500ms delay between batches

Resources

Next Steps

See

glean-security-basics
.