Integrating Llama 3.1 Local API with Node.js: Quickstart

Running a large language model locally keeps your prompts and data on your own machine, works offline once the model has been downloaded, and removes per-request API costs. Ollama runs a local background service that exposes HTTP endpoints for interacting with installed models.

This lets you build custom Node.js scripts or local web tools that call a model on your own machine, with no external network dependency.
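Before introducing the SDK, it can help to see what a raw call to the local service looks like. The following is a minimal sketch, assuming Ollama is listening on its default address and that Node.js 18 or later is available for the built-in fetch API. The helper names buildChatRequest and chatOnce are illustrative and are not part of any library.

```typescript
// Assumption: Ollama is listening on its default local address.
const OLLAMA_BASE_URL = "http://127.0.0.1:11434";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request for a single, non-streaming chat completion.
// Kept separate from the network call so it can be inspected on its own.
function buildChatRequest(model: string, messages: ChatMessage[]): ChatRequest {
  return {
    url: `${OLLAMA_BASE_URL}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Send the request with the fetch API built into Node.js 18+.
async function chatOnce(model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(model, [{ role: "user", content: prompt }]);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  // With stream: false, the reply text is in message.content.
  const data = (await res.json()) as { message: { content: string } };
  return data.message.content;
}
```

Calling chatOnce("llama3.1:8b", "Say hello") would return the model's reply as a string, provided the service is running and the model has been pulled. The SDK used in the rest of this guide wraps the same endpoint.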

In this guide, we will write a complete TypeScript integration to connect a Node.js app to local Ollama endpoints.

Prerequisites and Setup

Download and install Ollama from the official website.
Download the Llama 3.1 model using your terminal:
```
ollama pull llama3.1:8b
```
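Initialize a TypeScript Node.js project, install the SDK, and add TypeScript tooling (tsx is one option for running .ts files directly):
```
npm init -y
npm install ollama
npm install --save-dev typescript tsx
```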
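Once the steps above are done, a quick check that the model is actually available can save debugging time later. This is a rough sketch that queries the service's /api/tags endpoint, which lists locally installed models. It assumes the default address and Node.js 18+ for fetch. The names hasModel and isModelInstalled are hypothetical helpers.

```typescript
// Shape of the part of the /api/tags response used here.
interface TagsResponse {
  models: { name: string }[];
}

// Pure check: is the given model tag present in the list?
function hasModel(tags: TagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model);
}

// Ask the local service which models are installed.
// Assumption: Ollama is reachable at the default address.
async function isModelInstalled(
  model: string,
  baseUrl: string = "http://127.0.0.1:11434",
): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  return hasModel((await res.json()) as TagsResponse, model);
}
```

If isModelInstalled("llama3.1:8b") resolves to false, running the ollama pull command again is the likely fix. If the call itself rejects, the service is probably not running.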

TypeScript SDK Integration Code

Below is a complete script demonstrating how to connect to the local Ollama API, stream responses, and handle errors:

```
import ollama from 'ollama';

async function generateAIResponse(): Promise<void> {
  const prompt = "Compare TypeScript with vanilla JavaScript in two paragraphs.";

  try {
    // Invoke chat generation with streaming enabled
    const response = await ollama.chat({
      model: 'llama3.1:8b',
      messages: [{ role: 'user', content: prompt }],
      options: {
        temperature: 0.7,      // Higher values make output more varied
        num_predict: 250,      // Maximum number of tokens to generate
        // No "\n\n" stop sequence here: it would end generation at the
        // first paragraph break, and the prompt asks for two paragraphs.
      },
      stream: true,
    });

    console.log("Response stream started:\n");

    // Print each chunk as soon as it arrives
    for await (const chunk of response) {
      process.stdout.write(chunk.message.content);
    }

    console.log("\n\nStream completed successfully.");
  } catch (error) {
    // Covers connection failures, a missing model, and mid-stream errors
    console.error("Ollama chat request failed:", error);
  }
}

generateAIResponse();
```
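The streaming loop above prints tokens as they arrive but discards them afterwards. When the full text is needed later (for logging or post-processing), the chunks can be accumulated as they stream. The sketch below assumes each chunk exposes message.content, as in the script above. collectStream is an illustrative helper, not an SDK function.

```typescript
// Minimal shape of a streamed chat chunk, matching how the
// script above reads chunk.message.content.
interface ChatChunk {
  message: { content: string };
}

// Consume a stream of chunks, optionally forwarding each token
// to a callback, and return the concatenated text.
async function collectStream(
  stream: AsyncIterable<ChatChunk>,
  onToken?: (token: string) => void,
): Promise<string> {
  let fullText = "";
  for await (const chunk of stream) {
    const token = chunk.message.content;
    if (onToken) {
      onToken(token);
    }
    fullText += token;
  }
  return fullText;
}
```

In the script above, the for await loop could be replaced with a call such as collectStream(response, (t) => process.stdout.write(t)), which keeps the live output and also returns the complete reply.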

Forcing Structured JSON Outputs

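For many automation tasks, you need structured data (like JSON) rather than plain text. Ollama can constrain a response to valid JSON through the format parameter. The parameter controls syntax only, so the prompt should still say that JSON is expected and which fields to include.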

Here is how to configure a structured request:

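```
import ollama from 'ollama';

async function fetchStructuredData(): Promise<void> {
  // Ask for JSON in the prompt as well: format: 'json' constrains the syntax,
  // but the prompt still decides which keys the model produces.
  const schemaPrompt =
    "Generate a user profile. Respond in JSON with the keys name, age, and skills (an array of 3 strings).";

  try {
    const response = await ollama.generate({
      model: 'llama3.1:8b',
      prompt: schemaPrompt,
      format: 'json', // Constrains the output to valid JSON
    });

    // response.response holds the generated text as a string
    const dataObj = JSON.parse(response.response);
    console.log("Parsed JSON object:", dataObj);
  } catch (e) {
    // Covers both request failures and JSON.parse errors
    console.error("Structured request failed:", e);
  }
}

fetchStructuredData();
```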
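Valid JSON is not the same as JSON in the expected shape: the model may rename a key or return age as a string. A small runtime check after parsing catches this early. The sketch below assumes the profile uses the keys name, age, and skills. Those key names come from the prompt and are an assumption about what the model returns, not something Ollama guarantees.

```typescript
// Assumed shape of the profile requested in the prompt.
interface UserProfile {
  name: string;
  age: number;
  skills: string[];
}

// Runtime type guard: verifies the parsed value has the expected fields.
function isUserProfile(value: unknown): value is UserProfile {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.age === "number" &&
    Array.isArray(v.skills) &&
    v.skills.every((s) => typeof s === "string")
  );
}

// Parse the model's raw text and reject anything that does not match.
function parseUserProfile(raw: string): UserProfile {
  const parsed: unknown = JSON.parse(raw);
  if (!isUserProfile(parsed)) {
    throw new Error("Model output did not match the expected profile shape");
  }
  return parsed;
}
```

Passing response.response to parseUserProfile instead of calling JSON.parse directly yields a typed object, or a clear error when the model strays from the requested fields.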

Port Configuration and Environment Variables

By default, Ollama serves its API on port 11434 on localhost (http://127.0.0.1:11434).

If your Node.js application runs inside a Docker container, 127.0.0.1 refers to the container itself, so the app cannot reach an Ollama service on the host machine through localhost. Configure the Ollama background daemon on the host to bind to all network interfaces, then point the app at the host's address instead of localhost.

macOS Command:
```
launchctl setenv OLLAMA_HOST "0.0.0.0"
```
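Windows/Linux Environment Variable: On Windows, set OLLAMA_HOST to 0.0.0.0 in the system environment settings, then restart the Ollama application. On Linux installs managed by systemd, set the variable in the ollama service unit and restart the service. On macOS, restart the Ollama application after running the command above as well.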
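On the Node.js side, the application also needs to know which address to call. One way to handle this is to read the base URL from an environment variable and fall back to the default. The sketch below uses a hypothetical variable name, OLLAMA_BASE_URL, chosen here for illustration and distinct from the server-side OLLAMA_HOST setting. The host.docker.internal hostname is what Docker Desktop provides for reaching the host. On Linux it generally has to be mapped explicitly when starting the container.

```typescript
// Default address of a local Ollama service.
const DEFAULT_OLLAMA_URL = "http://127.0.0.1:11434";

// Resolve the base URL the app should call.
// OLLAMA_BASE_URL is a hypothetical, app-level variable name.
function resolveOllamaHost(env: Record<string, string | undefined>): string {
  const raw = env.OLLAMA_BASE_URL ? env.OLLAMA_BASE_URL.trim() : "";
  if (raw === "") {
    return DEFAULT_OLLAMA_URL;
  }
  // Drop trailing slashes so paths can be appended predictably.
  return raw.replace(/\/+$/, "");
}

// Usage inside a container (not executed here):
//   const host = resolveOllamaHost(process.env);
//   import { Ollama } from 'ollama';
//   const client = new Ollama({ host });
```

Starting the container with OLLAMA_BASE_URL=http://host.docker.internal:11434 would then route SDK calls to the host's Ollama service, while local runs keep using the default.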


Written by Mehmet Demir
