

Building a Simple AI Server for Apple Intelligence Foundation Models (On Device)


When Apple announced the Foundation Models framework at WWDC 2025, I was excited. Finally, a way to run AI locally on my Mac without sending data to the cloud or paying per-request fees. There was just one problem: it's Swift-only 😕

So I built a simple HTTP server that wraps Apple's Foundation Models API, providing on-device AI to JavaScript, Python, or any other language.

Get the source code here: https://github.com/aicodechef/apple-foundation-model-ai-server

The Problem

Apple's Foundation Models framework is powerful: the model runs entirely on-device, your data never leaves your machine, and there are no per-request fees.

But accessing it requires writing Swift, which locks out everyone building in JavaScript, Python, or anything else.

The Solution

I built a lightweight HTTP server in Swift that exposes Foundation Models through a REST API. Now you can do this from JavaScript:

const response = await fetch('http://localhost:8080/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Explain closures in JavaScript',
    systemPrompt: 'You are a patient programming tutor'
  })
});

const data = await response.json();
console.log(data.response);

That's it. No Swift, no Xcode, no cloud APIs.
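For a real app you'll probably want a small wrapper instead of raw fetch calls everywhere. Here's a sketch: the `/completion` endpoint and the `prompt`, `systemPrompt`, and `response` fields match the example above, while the function names and defaults are my own.

```javascript
// Build the fetch options for a completion request, including only the
// fields that were actually provided.
function buildCompletionRequest(prompt, { systemPrompt, temperature } = {}) {
  const body = { prompt };
  if (systemPrompt !== undefined) body.systemPrompt = systemPrompt;
  if (temperature !== undefined) body.temperature = temperature;
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  };
}

// Send a prompt to the local server and return the generated text.
async function complete(prompt, options = {}, baseUrl = 'http://localhost:8080') {
  const res = await fetch(`${baseUrl}/completion`, buildCompletionRequest(prompt, options));
  if (!res.ok) throw new Error(`Server returned ${res.status}`);
  const data = await res.json();
  return data.response;
}
```

With that in place, `await complete('Explain closures in JavaScript', { systemPrompt: 'You are a patient programming tutor' })` reads a lot cleaner than repeating the fetch boilerplate.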

Creating the Swift server wrapper wasn't especially difficult or time-consuming, and it was a nice opportunity to take a closer look at Swift and the Foundation Models API.

I don't currently have any plans to build an application entirely in Swift. My original goal was to access the AI natively available on my MacBook from a JavaScript- and HTML-based application, potentially using Electron. Hence the idea for the Swift server.

Why This Matters

Privacy First

In an era where every API call sends your data to someone else's servers, having truly local AI is powerful. Your prompts, your code, and your ideas all stay on your machine.

Zero Cost

Cloud APIs are amazing, but costs add up fast. With this approach, everything runs locally on your Mac, so there are no per-request fees and no usage bills.

Developer Experience

Web developers shouldn't need to learn Swift to experiment with AI. An HTTP API is universal: any language that can make a POST request can use it.

How It Works

The architecture is surprisingly simple:

┌─────────────┐    HTTP     ┌──────────────┐    Swift API    ┌─────────────┐
│   Your      │ ──────────► │   Server     │ ──────────────► │ Foundation  │
│   Web App   │   JSON      │   (Swift)    │  LanguageModel  │ Models (AI) │
└─────────────┘             └──────────────┘                 └─────────────┘
                                                               (On-Device)

Your app sends an HTTP POST with a prompt. The Swift server passes it to a LanguageModelSession, Foundation Models generates the response on-device, and the server sends the result back as JSON.
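To make that contract concrete, here's the round trip sketched in JavaScript with the model call stubbed out. The field names (`prompt`, `systemPrompt`, `response`) come from the examples in this post; the handler shape itself is hypothetical.

```javascript
// Sketch of what the server does per request: parse the JSON body,
// fold an optional system prompt into the text (the same prepending
// pattern the Swift code uses), call the model, and return JSON.
function handleCompletion(requestJson, generate) {
  const { prompt, systemPrompt } = JSON.parse(requestJson);
  const fullPrompt = systemPrompt
    ? `System: ${systemPrompt}\n\nUser: ${prompt}`
    : prompt;
  return JSON.stringify({ response: generate(fullPrompt) });
}
```

The `generate` parameter stands in for the on-device model; in the real server that's where Foundation Models is invoked.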

The Code

The core is about 200 lines of Swift. Here's the interesting part: handling the actual AI generation:

func generateCompletion(
    prompt: String,
    systemPrompt: String? = nil,
    temperature: Double? = nil
) async throws -> String {
    // System prompts aren't a direct API in LanguageModelSession
    // So we prepend them - this is a common pattern when
    // the API doesn't support native system messages
    let fullPrompt: String
    if let systemPrompt = systemPrompt {
        fullPrompt = """
        System: \(systemPrompt)

        User: \(prompt)
        """
    } else {
        fullPrompt = prompt
    }

    var options = GenerationOptions()
    if let temp = temperature {
        options.temperature = temp
    }

    // This runs on your Mac's Neural Engine + GPU
    let response = try await session.respond(to: fullPrompt, options: options)
    return response.content
}

The trickiest part? Getting the HTTP server to actually wait for responses to send before closing connections. Async networking in Swift has some gotchas:

// Wait for send to complete before closing connection
await withCheckedContinuation { continuation in
    connection.send(content: response, completion: .contentProcessed { _ in
        continuation.resume()
    })
}

Without this, responses would get cut off mid-transmission.

Performance Notes

On my M4 Max, the first request is slower because the model loads into memory. After that, it's incredibly fast.
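Because that first-request load cost is real, it can be worth firing a throwaway request when your app starts so the model is warm before a user asks for anything. A sketch, with the fetch function injected purely so it's easy to test; the endpoint shape follows the earlier example.

```javascript
// Hypothetical warm-up: send one cheap request at startup so the model
// loads before a user is waiting on it.
async function warmUp(doFetch = fetch, baseUrl = 'http://localhost:8080') {
  try {
    await doFetch(`${baseUrl}/completion`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt: 'hi' }),
    });
    return true;
  } catch {
    // Server not up yet; the first real request will pay the load cost.
    return false;
  }
}
```

Call it fire-and-forget during app startup; there's no need to block the UI on the result.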

Limitations

Let's be honest about what this isn't:

Best for:

Not for:

Final Thoughts

We're in an interesting moment. AI is powerful but mostly cloud-based. Privacy concerns are real. Costs add up. Open source models exist but are complex to deploy.

Apple's Foundation Models framework is a middle ground: powerful, private, and built-in. But it needed a bridge to the web development world.

This server is that bridge.