# Claude-skill-registry — AppleFoundationModels

```shell
git clone https://github.com/majiayu000/claude-skill-registry
```

```shell
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/AppleFoundationModels" ~/.claude/skills/majiayu000-claude-skill-registry-applefoundationmodels && rm -rf "$T"
```
`skills/data/AppleFoundationModels/SKILL.md`:

---
name: "AppleFoundationModels"
description: "Use Apple's on-device **Foundation Models** framework (iOS/iPadOS/macOS 26.0+) for natural language understanding, structured data generation, and tool-assisted tasks in apps."
version: "1.0"
dependencies:
  - iOS 26.0 or later (Apple Intelligence enabled)
  - iPadOS 26.0 or later
  - macOS 26.0 or later
  - Mac Catalyst 26.0 or later
  - visionOS 26.0 or later
---
## Instructions
- On-Device Foundation Model: Apple's Foundation Models framework provides access to a ~3 billion-parameter large language model running entirely on-device[1]. This model powers Apple Intelligence features and runs offline at no extra cost, ensuring user data stays private[2]. Always check that the device supports Apple Intelligence and that the model is available before use (e.g. via `SystemLanguageModel.availability`)[3].
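The availability check above can be sketched as follows; this is a minimal illustration assuming the `SystemLanguageModel.default.availability` API shape, so verify the exact case names against the current SDK:

```swift
import FoundationModels

// Gate your feature on model availability before creating a session.
switch SystemLanguageModel.default.availability {
case .available:
    // Safe to create a LanguageModelSession and send prompts.
    print("On-device model ready")
case .unavailable(let reason):
    // e.g. device not eligible, Apple Intelligence disabled, or model still downloading.
    print("Model unavailable: \(reason)")
}
```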
- Session and Context: Interactions with the model occur through a stateful `LanguageModelSession`. Creating a session optionally takes developer-provided instructions (system prompts) to set the model's role, style, or other guidelines[4][5]. Do not inject untrusted user input into these instructions, since the model prioritizes developer instructions over user prompts[6]. The session maintains a Transcript of all turns; multi-turn conversations within one session will automatically include context from previous prompts and responses[7]. Avoid concurrent requests on a single session (the model can handle only one prompt at a time). Use `session.isResponding` or await the response future to ensure the prior request completes before sending another[8].
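A session created with developer instructions, plus the `isResponding` guard, can be sketched like this (a minimal example; the instruction text and helper are illustrative):

```swift
import FoundationModels

// Developer instructions set role and style; never splice raw user input in here.
let session = LanguageModelSession(
    instructions: "You are a concise travel assistant. Answer in two sentences or fewer."
)

// Hypothetical helper: refuses to send a new prompt while one is in flight.
func ask(_ prompt: String) async throws -> String? {
    guard !session.isResponding else { return nil }
    return try await session.respond(to: prompt).content
}
```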
- Natural Language Understanding (NLU): The foundation model can interpret and extract meaning from text to perform tasks like classification, tagging, summarization, or entity extraction. For example, apps have used it to parse a user's free-form input into structured tasks and dates ("Call Sophia Friday" becomes a scheduled call on Friday)[9]. By leveraging specialized adapters and prompts, you can enhance NLU for specific domains. Apple provides a Content Tagging adapter for first-class support of topic tagging, entity recognition, and intent detection[10]. This adapter, used via `SystemLanguageModel(useCase: .contentTagging)`, is fine-tuned for generating topic tags or other labels from text using the same on-device model[10]. You can still define a custom schema for the output (e.g. a struct of tags or entities) to get structured NLU results.
- Guided Structured Output: Use Guided Generation to ensure the model's output follows a specific structure or format. The framework introduces the `@Generable` macro to define Swift types (structs or enums) that represent the desired output schema[11]. Mark all output fields as properties of a `@Generable` struct; these may include basic types (`String`, `Int`, `Bool`, etc.), arrays, or even nested generable types (including recursive types)[12][13]. Optionally use the `@Guide` macro on properties to provide natural language descriptions or constrain acceptable values (e.g. range, format) for that field[14]. When you request a response generating a particular `@Generable` type, the framework employs constrained decoding to guarantee the model's reply can be parsed into that type[15]. This eliminates unreliable string parsing and ensures structural correctness of responses[16]. You no longer need to prompt with "please output JSON"; the framework automatically formats the model output as JSON under the hood and decodes it into your Swift struct[17][18]. Focus your prompts on content and behavior rather than worrying about output syntax.
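A schema combining `@Generable` and `@Guide` might look like the sketch below. The type, field names, and the `.range` guide are illustrative assumptions; confirm the available guide constraints against the SDK:

```swift
import FoundationModels

// Hypothetical schema for a parsed reminder.
@Generable
struct ParsedReminder {
    @Guide(description: "Short imperative title, e.g. 'Call Sophia'")
    let title: String

    @Guide(description: "Priority from 1 (low) to 3 (high)", .range(1...3))
    let priority: Int

    let tags: [String]
}
```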
- Tool Invocation (Tool Calling): Extend the model's capabilities by defining tools -- custom functions in your app that the model can call autonomously[19][20]. Conform your tool types to the `Tool` protocol, which requires a unique `name` and a descriptive explanation of what the tool does[21]. The framework will include this name and description in the model's context so it knows the tool is available and when to use it[22]. Implement the tool's `call(arguments:)` async method to perform the desired action or fetch data. Tool arguments must be a `Generable` type (defined via a `@Generable` struct) because tool invocation is built on guided generation -- the model will only produce a tool call if it can generate valid arguments matching that schema[23]. The framework ensures the model never emits an invalid tool name or malformed arguments[23]. In the `call` method, use any APIs or services needed (e.g. query a database, call a web API, use a system framework like HealthKit or WeatherKit) to fulfill the request. Return the result as a `ToolOutput` -- this can be created from structured data (any `GeneratedContent` that matches a Generable schema) or from a plain string if the tool's output is narrative[24]. Attach tools to the session at initialization (e.g. `LanguageModelSession(tools: [MyTool()])`) so the model is aware of them during that session[25]. The model will autonomously decide if and when to call a tool based on the user's prompt and the tool descriptions[26][27]. When a tool is invoked, the framework executes your `call` method, injects the tool's output back into the conversation transcript, and then lets the model continue to generate its final answer with that additional information[27][28]. Tool calling enables the model to fetch real-time data or perform actions beyond its built-in knowledge[29][30], all while keeping the logic and data access under your app's control.
- Streaming Responses: For a more responsive UI, use the asynchronous streaming API (`session.streamResponse`) to receive partial results as the model generates them[31][32]. Instead of raw text tokens, the Foundation Models framework streams snapshot objects representing partially-filled `@Generable` outputs[33][34]. Each interim snapshot has the same fields as your output type but with incomplete data (fields are `nil` until generated)[35]. By iterating over the AsyncSequence, you can update your SwiftUI views or UI elements progressively as each field in the structure becomes available[36][37]. This structured streaming avoids the need to manually concatenate tokens or parse incremental JSON, since you get well-typed partial objects. It's especially powerful for gradually displaying content like lists or multi-field data with smooth animations[38][39]. Remember that properties are generated in order; consider declaring important summary fields last to improve coherence and animation order[40].
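The snapshot-streaming pattern can be sketched as follows. This is a minimal sketch assuming a `streamResponse(to:generating:)` shape and that each snapshot exposes the partial value via `.content`; check the current SDK for the exact signatures:

```swift
import FoundationModels

@Generable
struct Itinerary {
    let title: String
    let days: [String]
}

let session = LanguageModelSession()
let stream = session.streamResponse(
    to: "Plan a three-day trip to Kyoto.",
    generating: Itinerary.self
)

for try await partial in stream {
    // Fields are nil until the model has generated them.
    print(partial.content.title ?? "…")
}
```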
- Safety and Guardrails: The on-device model has built-in guardrails to reduce harmful or unwanted outputs[41][5]. Handle errors such as `GenerationError.guardrailViolation` (triggered if content violates Apple's safety rules) by catching exceptions from `respond`/`streamResponse`. The model supports multiple languages (15+ locales)[42], but will return an error if asked to process an unsupported language or if Apple Intelligence is disabled/unavailable[3]. Always test your prompts and outputs, especially for sensitive content. Use developer instructions to enforce style or refusals in certain cases as needed. By keeping all AI processing on-device, Apple's framework inherently protects privacy (no user text or model prompts ever leave the device)[2].
## Workflow
- Initialize Model Session: Ensure the device and OS support the Foundation Model. Then create a `LanguageModelSession` (or `SystemLanguageModel`) instance. Optionally pass a custom `Instructions` object with system-level guidance for the model (e.g. role-playing as an expert, response style)[4]. Also provide any `Tool` instances via the session's initializer if your feature will use tool calling[25]. For specialized NLU tasks, you may initialize a `SystemLanguageModel` with a built-in adapter (e.g. `.contentTagging`) instead of the general model[10]. Check `SystemLanguageModel.availability` and only proceed if `.available`[3].
- Define Output Schema (Optional): If you need structured output, declare Swift types to model the response. Use `@Generable` to annotate the type (for example, a struct with fields for the info you want). Add `@Guide` on any fields where you want to constrain the output format or provide hints (for instance, `@Guide(description: "ISO 8601 date string") var dueDate: String?`). The framework will auto-generate a JSON schema from your type at compile time and use it to guide the model's output[11][15].
- Construct Prompt: Formulate the user prompt or query. This can be a simple string or a `Prompt` object if you need to include variables or formatted text. Keep prompts focused on the task/content you want; avoid having to specify output format (the guided generation mechanism will handle that)[17]. If you provided custom instructions in the session, you generally do not need to include those again in the prompt; they are automatically prepended for each query.
- Request Model Response: Call the session's API to generate a response. There are two primary modes:
  - Synchronous generation: Use `try await session.respond(to: prompt, generating: OutputType.self)` to get a complete `Response<OutputType>` in one call[43][44]. If you don't need structured output, you can omit the `generating:` parameter and get a `Response<String>` containing the model's reply text. You can also pass `GenerationOptions` to adjust parameters like temperature (creativity), max tokens, or sampling method[45][46]. The first call may incur a short delay as the model warms up; you can call a lightweight method (e.g. `session.prewarm()`) in advance if needed.
  - Streaming generation: Use `session.streamResponse(generating: OutputType.self) { prompt }` to get an `AsyncSequence` of partial results[31]. Iterate over the sequence with `for await` to handle each `Partial<OutputType>` snapshot as it arrives[47]. Update your UI or state with each partial (`partial.content` will be a partially-filled instance of your struct). Continue until the sequence completes, then use the final snapshot as the full result. Streaming is useful for long responses or when you want to show incremental progress (e.g. revealing parts of an answer one by one)[33].
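The synchronous path with prewarming and generation options can be sketched as below; the `GenerationOptions(temperature:)` initializer and `options:` label follow the API described above, but treat the exact parameter names as assumptions to verify:

```swift
import FoundationModels

let session = LanguageModelSession()

// Warm the model up early (e.g. when the screen appears) to hide first-call latency.
session.prewarm()

// Lower temperature for more deterministic output.
let options = GenerationOptions(temperature: 0.3)
let reply = try await session.respond(
    to: "Summarize today's agenda in one line.",
    options: options
)
print(reply.content)
```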
- Process the Response: Once `respond()` returns (or the stream completes), retrieve the output:
  - For a structured response, you'll get a `Response<T>` where `T` is your Generable type. Access `response.content` to get the fully decoded Swift struct (already populated with the model's output)[44]. You can then use this object directly in your app (e.g. update the UI with these values, or pass it to other logic). The `Response` also contains a `transcriptEntries` property if you need to examine the conversation history or the raw text form.
  - For a text response (`Response<String>`), simply use the `.content` string. For example, display it in a text view or speak it with AVSpeechSynthesizer.
  - If the model invoked any tools during processing, their effects should be reflected in the final content. For instance, if a tool fetched data, the answer will include that data. The tool call and its output are also logged as entries in the session transcript (you can inspect `session.transcript` to see tool invocations and results in sequence)[48][28].
- Handle Errors and Completion: Be prepared to catch errors from the `respond` call. For example, a `guardrailViolation` error indicates the input or requested content violated a safety rule (the model refused), in which case you might show an error message or a sanitized response. An `unavailable` error indicates the model couldn't run (perhaps the device is not in a supported region or Apple Intelligence is off)[3]. Also, after each interaction, `session.isResponding` will go back to `false` -- at this point it's safe to allow the user to submit another prompt or end the session.
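Catching guardrail failures can be sketched as follows, assuming the error surfaces as a `LanguageModelSession.GenerationError` case as described above (verify the case names against the SDK):

```swift
import FoundationModels

let session = LanguageModelSession()

do {
    let answer = try await session.respond(to: "Summarize this note.")
    print(answer.content)
} catch let error as LanguageModelSession.GenerationError {
    switch error {
    case .guardrailViolation:
        // Content tripped a safety rule; show a friendly fallback message.
        print("Sorry, I can't help with that request.")
    default:
        // e.g. unsupported language, context window exceeded.
        print("Generation failed: \(error)")
    }
}
```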
Iterate or End Session: If your feature involves multiple back-and-forth turns, you can continue calling
on the same session object to leverage the accumulated context[7]. The model will remember prior prompts and its answers, enabling follow-up questions or refinements (e.g. "Now do the same for Paris" referring to a previous travel itinerary). If instead each usage is independent, you might reset or discard the session after use. (Currently there's no explicit "reset" method, but you can simply create a new session for a fresh context.)respond() -
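Multi-turn reuse of a single session can be sketched as:

```swift
import FoundationModels

// The session carries context across turns, so follow-ups can be elliptical.
let session = LanguageModelSession()

let first = try await session.respond(to: "Suggest a one-day itinerary for Rome.")
print(first.content)

// Resolves "the same" against the previous turn via the session transcript.
let followUp = try await session.respond(to: "Now do the same for Paris.")
print(followUp.content)
```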
- Optimize & Refine: Once basic integration works, refine your prompts, instructions, and tools for better results. Use developer tools like the Foundation Models Debugger/Instrument to profile token usage, response time, and verify schema adherence. Ensure your `@Generable` schema aligns with what the model can produce (complex schemas may require guiding instructions or breaking tasks into smaller prompts). Leverage the model's strengths in language understanding but provide tools or heuristics for tasks it isn't reliable at (e.g. real-time info lookup, calculations, highly domain-specific answers)[49][30]. Test in different conditions (device types, languages, content scenarios) to validate the on-device model's performance.
## Examples
- Structured Output Example: Suppose you want the model to generate a question-and-answer pair for a trivia app. First, define a Swift struct with the desired fields:

```swift
import FoundationModels

@Generable
struct QuizQA: Equatable {
    let question: String
    let answer: String
}
```

Now you can prompt the model and get a `QuizQA` result:

```swift
let session = LanguageModelSession()
let userPrompt = "Generate a trivia question about WWDC and provide the answer."
let result: Response<QuizQA> = try await session.respond(to: userPrompt, generating: QuizQA.self)
let qa = result.content // QuizQA object with `question` and `answer` filled in.
print("Q: \(qa.question)\nA: \(qa.answer)")
```

The Foundation Models framework ensures the response is JSON that matches `QuizQA` and parses it accordingly -- for example, it might return question: "What does WWDC stand for?" and answer: "Worldwide Developers Conference." (Your actual output will vary, but it will always conform to the `QuizQA` structure)[18][15].
- Tool Invocation Example: Consider an app that needs current weather info. You can define a tool that calls WeatherKit:

```swift
struct WeatherTool: Tool {
    let name = "getWeather"
    let description = "Look up the current temperature for a city."

    @Generable
    struct Arguments {
        let city: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // Use WeatherKit or an API to get weather for arguments.city
        let (temp, condition) = try await fetchWeather(for: arguments.city)
        let reply = "It is \(temp)° and \(condition) now in \(arguments.city)."
        return ToolOutput(reply)
    }
}
```

When initializing the session, attach this tool:

```swift
let session = LanguageModelSession(tools: [WeatherTool()])
```

Now if the user prompt is "What's the weather in Cupertino right now?", the model can decide to invoke `WeatherTool` internally to get live data. It will produce a tool call like `getWeather(city: "Cupertino")` behind the scenes[50], causing your `WeatherTool.call` to run. The tool's output (e.g. "It is 72° and sunny now in Cupertino.") is then inserted into the model's context, and the model uses it to complete the final answer[28][51]. The user ultimately sees a response that includes the real weather info, even though the base model by itself didn't know it. This demonstrates how tool calling can augment the on-device model with up-to-date information and actions[29][30].
- NLU Tagging Example: You can use the content tagging adapter to extract topics or keywords from text. Note that the adapter is a model, so you create a session from it before prompting. For instance:

```swift
@Generable
struct Topics {
    let topics: [String]
}

let model = SystemLanguageModel(useCase: .contentTagging)
let session = LanguageModelSession(model: model)
let text = "Apple unveiled new AR features in iOS at WWDC."
let tags: Topics = try await session.respond(to: text, generating: Topics.self).content
print(tags.topics) // e.g. ["Apple", "AR", "iOS", "WWDC"]
```

Here the model categorizes the input sentence, identifying key topics or entities. Under the hood, the `.contentTagging` adapter uses a fine-tuned model head to improve accuracy for tagging tasks[10]. We still leverage guided generation to output a consistent JSON with an array of topics. This way, the app can display or use these tags (for example, to index content or trigger certain features) without any server calls. All processing is local, and sensitive text never leaves the device[2].