A complete guide to creating a privacy-focused AI chat interface, and giving Siri a new personality
This project is a full-stack web application that creates a ChatGPT-like interface for interacting with locally-hosted AI models. Built using vanilla JavaScript, Node.js, and MongoDB, the application demonstrates how to integrate streaming AI responses, user authentication, and persistent chat history into a modern web interface.
The system runs entirely on a local network, allowing multiple users to have private conversations with an AI assistant through their web browsers. Users can create accounts, maintain separate conversation histories, and enjoy real-time streaming responses, all while keeping their data completely local.
The frontend is built with vanilla JavaScript, HTML5, and CSS3 as a responsive single-page application, featuring login and signup forms, per-user chat histories, and real-time streaming output.
The JavaScript architecture separates concerns cleanly:
// Authentication handling
document.getElementById('loginForm').onsubmit = async (e) => {
  e.preventDefault();
  const username = document.getElementById('loginUsername').value;
  const password = document.getElementById('loginPassword').value;
  // Handle login logic...
};
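The elided login logic would typically POST the credentials to the backend and act on the result; here is a minimal sketch, assuming a hypothetical /api/login endpoint (the real route name and response shape may differ):

// Hypothetical login call; the endpoint and response shape are assumptions
async function login(username, password) {
  const res = await fetch('/api/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  return res.json(); // e.g. the user's id and saved chat list
}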
// Real-time streaming implementation
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value, { stream: true });
  // Process streaming response...
}
The backend is built with Node.js and Express, providing a RESTful API that handles user registration and login, per-user chat persistence, and streaming requests to the locally hosted model.
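As a sketch of what one of these endpoints might look like (the route path, the requireAuth middleware, and the Chat model are illustrative assumptions, not the project's exact API):

// Hypothetical endpoint: list the authenticated user's chats, newest first
app.get('/api/chats', requireAuth, async (req, res) => {
  const chats = await Chat.find({ userId: req.userId })
    .sort({ updatedAt: -1 })
    .select('title updatedAt'); // only metadata is needed for the sidebar
  res.json(chats);
});

Because every service has to be reachable from other devices on the LAN, the backend also detects its own network address instead of assuming localhost: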
// Dynamic IP detection for network accessibility
const os = require('os');

function getLocalIPAddress() {
  const interfaces = os.networkInterfaces();
  for (const name of Object.keys(interfaces)) {
    // 'interface' is a reserved word in strict mode, so use 'iface'
    for (const iface of interfaces[name]) {
      if (iface.family === 'IPv4' && !iface.internal) {
        return iface.address;
      }
    }
  }
  return 'localhost';
}
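The detected address can then be used when starting the server; a short sketch, assuming an Express app and an illustrative port:

const PORT = 3000; // illustrative; the project's actual port may differ

// Bind to all interfaces so other devices on the LAN can connect
app.listen(PORT, '0.0.0.0', () => {
  console.log(`Backend running at http://${getLocalIPAddress()}:${PORT}`);
});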
MongoDB is used to store user accounts and chat histories. The schema design is optimized for chat applications:
const mongoose = require('mongoose');

const chatSchema = new mongoose.Schema({
  userId: { type: mongoose.Schema.Types.ObjectId, ref: 'User', required: true },
  title: { type: String, default: 'New Chat' },
  messages: [{
    content: String,
    isUser: Boolean,
    timestamp: { type: Date, default: Date.now }
  }]
}, { timestamps: true });
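With this schema, appending a new exchange to a conversation is a single update. A sketch, assuming the model is registered as Chat (the helper below is illustrative, not the project's exact persistence code):

const Chat = mongoose.model('Chat', chatSchema);

// Append one user message and the AI's reply to an existing chat
async function appendExchange(chatId, userText, aiText) {
  await Chat.findByIdAndUpdate(chatId, {
    $push: {
      messages: {
        $each: [
          { content: userText, isUser: true },
          { content: aiText, isUser: false }
        ]
      }
    }
  });
}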
The application integrates with Ollama to run Llama 3.1 locally. The streaming implementation provides real-time responses:
const OLLAMA_BASE = 'http://localhost:11434';

const res = await fetch(`${OLLAMA_BASE}/api/generate`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.1',
    prompt: userMessage,
    stream: true
  })
});
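With stream: true, Ollama returns newline-delimited JSON objects, each carrying a fragment of the answer in its response field. A sketch of how the backend can forward those fragments to the browser (clientRes, the Express response being streamed to, is an assumption about how the route is wired):

// Relay Ollama's NDJSON stream to the browser chunk by chunk
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.trim()) continue;
    const json = JSON.parse(line);
    if (json.response) clientRes.write(json.response);
  }
}
clientRes.end();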
The application consists of four main components that need to be configured:
# 1. Ollama model server, bound to all interfaces
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_ORIGINS="*" ollama serve

# 2. MongoDB, accessible on the network
mongod --dbpath /path/to/mongodb --bind_ip 0.0.0.0

# 3. Node.js/Express backend
node backend.js

# 4. Static frontend server
http-server . -p 8000 -a 0.0.0.0
One of the main challenges was implementing smooth real-time streaming of AI responses. The solution involved using the Fetch API's ReadableStream to process chunked responses and update the UI incrementally.
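In practice that means appending each decoded chunk to the current message bubble as it arrives; a minimal sketch, assuming a chatContainer element (a hypothetical name for the scrollable message list):

// Grow the AI's message bubble as chunks stream in
const messageEl = document.createElement('div');
messageEl.className = 'ai-message'; // illustrative class name
chatContainer.appendChild(messageEl);

const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  messageEl.textContent += decoder.decode(value, { stream: true });
  chatContainer.scrollTop = chatContainer.scrollHeight; // keep the view pinned to the bottom
}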
Making the application accessible across different devices on the same network required configuring all services to bind to network interfaces rather than just localhost, along with implementing dynamic IP detection.
Creating a chat interface that works well on both desktop and mobile required careful CSS architecture using flexbox and grid layouts, along with responsive typography and spacing.
This project addresses growing concerns about data privacy in AI interactions. By running everything locally, families can give children and elderly relatives access to AI assistance without worrying about sensitive conversations being stored on external servers. Whether it's homework help, health questions, or personal conversations, everything stays within the home network.
Building this application provided hands-on experience with real-time streaming APIs, user authentication, MongoDB schema design for chat data, and configuring services for local network access.
The project demonstrates how modern web technologies can be combined to create sophisticated applications that prioritize user privacy while maintaining excellent user experience.
Building on the local AI server setup, I extended the project by integrating the Llama model directly into Siri using Apple Shortcuts. This creates a completely voice-driven AI assistant that leverages the same local infrastructure while providing hands-free interaction.
The integration uses Apple's Shortcuts app to create a voice interface that bridges Siri with the local AI server. The shortcut's "Get Contents of URL" action talks directly to Ollama:
// Shortcut API Configuration
URL: http://[YOUR_MAC_IP]:11434/api/generate
Method: POST
Headers: Content-Type: application/json
Request Body:
{
  "model": "llama3.1",
  "prompt": "[Siri Input Text]",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "num_predict": 500
  }
}
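Before wiring this into Shortcuts, it's worth verifying the same request from a terminal (replace [YOUR_MAC_IP] with your machine's address):

curl http://[YOUR_MAC_IP]:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "prompt": "Hello", "stream": false}'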
The shortcut processes the JSON response and extracts the AI's answer:
// Response Processing in Shortcuts
1. Get Contents of URL (API call)
2. Get Value from Dictionary (key: "response")
3. Speak Text (output the AI response)
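With stream set to false, Ollama returns a single JSON object, and the shortcut only needs the response key. The payload looks roughly like this (fields abbreviated):

{
  "model": "llama3.1",
  "created_at": "2024-01-01T00:00:00Z",
  "response": "Here is the answer...",
  "done": true
}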
Ensure your Ollama server is accessible across devices on your network:
# Start Ollama with network access
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_ORIGINS="*" ollama serve
# Verify accessibility from other devices
curl http://[YOUR_MAC_IP]:11434/api/tags
# Find your Mac's IP address
ifconfig | grep "inet " | grep -v 127.0.0.1
You can create different shortcuts for specialized AI assistants by modifying the API request to include a system prompt:
{
  "model": "llama3.1",
  "prompt": "You are a helpful cooking assistant. User question: [Siri Input]",
  "stream": false,
  "system": "Always provide practical, safe cooking advice."
}