sf-skills / sf-datacloud-prepare

```bash
git clone https://github.com/Jaganpro/sf-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Jaganpro/sf-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/sf-datacloud-prepare" ~/.claude/skills/jaganpro-sf-skills-sf-datacloud-prepare && rm -rf "$T"
```

skills/sf-datacloud-prepare/SKILL.md

sf-datacloud-prepare: Data Cloud Prepare Phase
Use this skill when the user needs ingestion and lake preparation work: data streams, Data Lake Objects (DLOs), transforms, Document AI, unstructured ingestion, or the handoff from connector setup into a live stream.
When This Skill Owns the Task
Use sf-datacloud-prepare when the work involves:
- `sf data360 data-stream *`
- `sf data360 dlo *`
- `sf data360 transform *`
- `sf data360 docai *`
- choosing how data should enter Data Cloud
- rerunning or rescanning ingestion after a source update
- preparing Ingestion API-backed streams after connector setup is complete
Delegate elsewhere when the user is:
- still creating/testing source connections → sf-datacloud-connect
- mapping to DMOs or designing IR/data graphs → sf-datacloud-harmonize
- querying ingested data → sf-datacloud-retrieve
Required Context to Gather First
Ask for or infer:
- target org alias
- source connection name
- source object / dataset / document source
- desired stream type
- DLO naming expectations
- whether the user is creating, updating, running, or deleting a stream
- whether the source is CRM, a database connector, an unstructured file source, or an Ingestion API feed
Core Operating Rules
- Verify the external plugin runtime before running Data Cloud commands.
- Run the shared readiness classifier before mutating ingestion assets: `node ~/.claude/skills/sf-datacloud/scripts/diagnose-org.mjs -o <org> --phase prepare --json`.
- Prefer inspecting existing streams and DLOs before creating new ingestion assets.
- Suppress linked-plugin warning noise with `2>/dev/null` for normal usage.
- Treat DLO naming and field naming as Data Cloud-specific, not CRM-native.
- Confirm whether each dataset should be treated as `Profile`, `Engagement`, or `Other` before creating the stream.
- Distinguish stream-level refresh from connection-level reruns when working with unstructured sources.
- Use UI setup intentionally when initial stream or unstructured asset creation is platform-gated.
- Hand off to Harmonize only after ingestion assets are clearly healthy.
Recommended Workflow
1. Classify readiness for prepare work
```bash
node ~/.claude/skills/sf-datacloud/scripts/diagnose-org.mjs -o <org> --phase prepare --json
```
2. Inspect existing ingestion assets
```bash
sf data360 data-stream list -o <org> 2>/dev/null
sf data360 dlo list -o <org> 2>/dev/null
```
3. Confirm the stream category before creation
Use these rules when suggesting categories:
| Category | Use for | Typical requirement |
|---|---|---|
| Profile | person/entity records | primary key |
| Engagement | time-based events or interactions | primary key + event time field |
| Other | reference/configuration/supporting datasets | primary key |
When the source is ambiguous, ask the user explicitly whether the dataset should be treated as `Profile`, `Engagement`, or `Other`.
4. Create or inspect streams intentionally
```bash
sf data360 data-stream get -o <org> --name <stream> 2>/dev/null
sf data360 data-stream create-from-object -o <org> --object Contact --connection SalesforceDotCom_Home 2>/dev/null
sf data360 data-stream create -o <org> -f stream.json 2>/dev/null
sf data360 data-stream run -o <org> --name <stream> 2>/dev/null
```
5. Check DLO shape
```bash
sf data360 dlo get -o <org> --name Contact_Home__dll 2>/dev/null
```
6. Choose the right refresh mechanism
Use the smaller refresh scope that matches the user's goal:

```bash
sf data360 data-stream run -o <org> --name <stream> 2>/dev/null
sf data360 connection run-existing -o <org> --name <connection-id> 2>/dev/null
```

- `data-stream run` is the closest match to a stream-level refresh or re-scan.
- `connection run-existing` runs at the connection level and can be useful for some connector workflows, but it is not a reliable replacement for stream refresh on unstructured sources.
- For unstructured document connectors, prefer `data-stream run` when the goal is to re-scan newly added or changed files.
7. Handle unstructured sources deliberately
For SharePoint-style document ingestion, a minimal unstructured DLO payload can look like:
{ "name": "my_udlo", "label": "My UDLO", "category": "Directory_Table", "dataSource": { "sourceType": "SF_DRIVE", "directoryAndFilesDetails": [ { "dirName": "SPUnstructuredDocument/<CONNECTION_ID>/<SITE_ID>", "fileName": "*" } ], "sourceConfig": { "reservedPrefix": "$dcf_content$" } } }
Use the UI for the first-time unstructured setup when the user needs the richer end-to-end pipeline. The UI path can seed additional document metadata fields and downstream assets that a bare CLI DLO create flow may not provision automatically.
8. Use the local Ingestion API example for send-data workflows
For external systems pushing records into Data Cloud:
- create the connector in sf-datacloud-connect
- upload the schema with `sf data360 connection schema-upsert`
- create the stream in the UI when required
- send records with the local example in `examples/ingestion-api/`

```bash
cd examples/ingestion-api
cp .env.example .env
python3 send-data.py
```
Key details:
- auth is a staged flow: JWT → Salesforce token → Data Cloud token
- the ingestion endpoint uses the tenant URL, not the Salesforce instance URL
- `202` means the payload was accepted for processing, not that records are queryable immediately
- validation failures often surface in the Problem Records DLO family
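
The bundled send-data.py is the reference implementation; for orientation only, here is a condensed sketch of the same staged flow. It assumes a JWT-enabled connected app and the standard Data Cloud token-exchange and Ingestion API endpoints; the connector name, object name, and record fields are hypothetical.

```python
"""Condensed sketch of the staged Ingestion API flow: JWT -> Salesforce token -> Data Cloud token -> send.

Assumes a connected app configured for the JWT bearer flow; connector API name,
object name, and record fields below are hypothetical placeholders.
"""
import time

import jwt        # PyJWT, installed with the cryptography extra for RS256
import requests

CLIENT_ID = "<connected-app-consumer-key>"
USERNAME = "<integration-user@example.com>"
LOGIN_URL = "https://login.salesforce.com"
PRIVATE_KEY = open("private.pem").read()

# 1. JWT bearer grant -> Salesforce access token.
assertion = jwt.encode(
    {"iss": CLIENT_ID, "sub": USERNAME, "aud": LOGIN_URL, "exp": int(time.time()) + 300},
    PRIVATE_KEY,
    algorithm="RS256",
)
sf = requests.post(
    f"{LOGIN_URL}/services/oauth2/token",
    data={"grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer", "assertion": assertion},
).json()

# 2. Exchange the Salesforce token for a Data Cloud token. The instance_url in
#    this response is the tenant URL used for ingestion, not the org's URL.
dc = requests.post(
    f"{sf['instance_url']}/services/a360/token",
    data={
        "grant_type": "urn:salesforce:grant-type:external:cdp",
        "subject_token": sf["access_token"],
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
    },
).json()

# 3. Send records to the Ingestion API (connector and object names are examples).
resp = requests.post(
    f"https://{dc['instance_url']}/api/v1/ingest/sources/My_Ingestion_Connector/runner_profile",
    headers={"Authorization": f"Bearer {dc['access_token']}"},
    json={"data": [{"id": "1", "first_name": "Ada", "email": "ada@example.com"}]},
)
# 202 means accepted for asynchronous processing; records are not queryable yet.
print(resp.status_code, resp.text)
```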
9. Only then move into harmonization
Once the stream and DLO are healthy, hand off to sf-datacloud-harmonize.
High-Signal Gotchas
- CRM-backed stream behavior is not the same as fully custom connector-framework ingestion.
- `sf data360 data-stream run` and `sf data360 connection run-existing` are not interchangeable; prefer stream-level refresh for unstructured rescans.
- `SFDC` streams sync on a platform-managed schedule; `data-stream run` is not the general control path for CRM connector refresh.
- Some external database connectors can be created via API while stream creation still requires UI flow or org-specific browser automation. Do not promise a pure CLI stream-creation path for every connector type.
- Initial SharePoint-style unstructured setup can be richer in the UI than in a minimal CLI DLO create flow.
- Stream deletion can also delete the associated DLO unless the delete mode says otherwise.
- DLO field naming differs from CRM field naming, including `__c` → `_c` transformations.
- Query DLO record counts with Data Cloud SQL instead of assuming list output is sufficient (see the sketch after this list).
- `CdpDataStreams` means the stream module is gated for the current org/user; guide the user to provisioning/permissions review instead of retrying blindly.
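
Where a quick count check is useful, a minimal sketch, assuming the standard Data Cloud Query API endpoint and a token/tenant URL obtained as in the Ingestion API example above; the DLO name is the one from step 5.

```python
import requests

# Placeholders: reuse the Data Cloud token and tenant URL from the token
# exchange shown in the Ingestion API sketch above.
DC_TENANT_URL = "<data-cloud-tenant-url>"
DC_TOKEN = "<data-cloud-access-token>"

# Count ingested rows with Data Cloud SQL, addressing the DLO (__dll suffix)
# rather than the CRM object; custom fields surface with the _c suffix.
resp = requests.post(
    f"https://{DC_TENANT_URL}/api/v2/query",
    headers={"Authorization": f"Bearer {DC_TOKEN}"},
    json={"sql": "SELECT COUNT(*) FROM Contact_Home__dll"},
)
print(resp.status_code, resp.json())
```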
Output Format
```
Prepare task: <stream / dlo / transform / docai>
Source: <connection + object>
Target org: <alias>
Artifacts: <stream names / dlo names / json definitions>
Verification: <passed / partial / blocked>
Next step: <harmonize or retrieve>
```