NativeLab
Open Source · Runs Offline · Beta

Your AI.
On your machine.
Answering to you.

NativeLab puts powerful language models on your desktop — no subscriptions, no data leaving your device, no internet required. Build visual AI pipelines, connect cloud APIs when you need them, and run any model you choose.

NativeLab Pro v2
model DeepSeek-R1-Q5_K_M
api GPT-4o · connected
pipeline 3 blocks · ready
status server :8642

──────────────────────

You › Analyse PDFs and write a report.

Intermediate · block 2/3…
Tokens: 2,841 / 4,096
Speed: 18.4 tok/s

Lab ›
✦ What's new in this release

Pipelines, API models
& a smarter interface

🏗️  Visual Pipeline Builder
💾  Save & Load Pipelines
🌐  Cloud API Models
🎨  Custom Prompt Formats
✨  Animated UI
🔗  Run Pipeline from Chat
📜  Scrollable Sidebar
🖱️  Reference Panel Slide-in

Who it's for

Built for work that
demands privacy and depth

NativeLab is for anyone who thinks carefully, works with sensitive material, or simply wants an AI that doesn't phone home.

🔬
Academic Researchers

Feed NativeLab your papers, drafts, and datasets. Build a pipeline that summarises sources, extracts findings, and drafts a report — all without uploading sensitive research to external servers.

Literature review · Pipeline chains
👩‍💻
Developers

Wire a reasoning model into a coding model on the visual canvas. Use GPT-4o for heavy lifting, a local model for proprietary code. Build once, save the pipeline, and reuse it every session.

Pipeline builder · Local + cloud hybrid
⚖️
Legal & Medical Professionals

Analyse case documents and clinical notes without compliance risk. Everything runs on your hardware by default. When you need extra power, connect an API model — that's always your explicit choice.

Air-gapped · Zero telemetry
📓
Writers & Knowledge Workers

Attach notes, books, and reference documents. Build a pipeline that reads, summarises, and synthesises across all of them. Run against GPT-4o when online, a local model when offline.

Document chat · Reference context
🏗️
Builders & Tinkerers

The visual pipeline editor lets you wire any combination of models, context blocks, and output stages. Add a custom API endpoint, define your own prompt format, and iterate freely on an open codebase.

Open source · Custom prompts · Any endpoint
🌐
Offline & Remote Environments

Field researchers, journalists in restricted regions, or anyone without reliable internet can run full AI inference with zero cloud dependency. Back online? Plug in an API model and scale up instantly.

No internet required · API-ready when online

What you can do

Less setup.
More thinking.

NativeLab handles the complexity of local AI so you can focus on your work. Here's what it unlocks.

01
Talk to your PDFs
Ask questions across one or many documents at once
02
Visual Pipeline Builder
Drag, drop, and wire models together on a live canvas
03
Cloud API Models
Connect GPT-4o, Claude, Groq, Mistral — same interface as local
04
Reasoning → Code Pipeline
Chain a reasoning model into a coding model automatically
05
Reference Library
Attach documents to any chat for persistent, searchable context
06
Pause & Resume Long Jobs
Save mid-summarization and pick up exactly where you left off
07
Export & Session History
Save conversations as JSON, Markdown, or plain text

Talk to your documents

Load one PDF or a whole folder. NativeLab chunks, indexes, and feeds relevant passages into the model automatically — no manual copy-pasting. Ask questions naturally and it finds the right sections.

Multi-PDF mode summarises each document separately, then synthesises a final answer across all of them. Ideal for literature reviews, legal discovery, or comparing research papers.

PDF ingestion · Multi-document synthesis · Context-aware chunking
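Curious what that looks like under the hood? Retrieval in this style boils down to paragraph-aware chunking plus keyword ranking with an exact-phrase bonus (the approach named in the comparison matrix below). A minimal sketch in Python; the function names, weights, and chunk size are illustrative assumptions, not NativeLab's actual code.

def chunk_paragraphs(text: str, max_chars: int = 1500) -> list[str]:
    # Split on paragraph boundaries, packing paragraphs up to max_chars.
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(current) + len(para) > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def top_k_chunks(query: str, chunks: list[str], k: int = 4) -> list[str]:
    # Rank chunks by keyword hits, with a bonus for an exact phrase match.
    terms = set(query.lower().split())
    def score(chunk: str) -> float:
        body = chunk.lower()
        s = sum(1.0 for t in terms if t in body)
        if query.lower() in body:          # exact phrase match bonus
            s += 5.0
        return s
    return sorted(chunks, key=score, reverse=True)[:k]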

Visual Pipeline Builder

A full drag-and-drop canvas where you build AI workflows by connecting blocks. Drop in an Input, wire it through Reference or Knowledge context, pass it to a Model block, carry results through an Intermediate block, and collect everything in the Output.

Every connection is a smooth curve you draw yourself. Build loops that repeat automatically, watch each stage stream live in its own tab, and save any pipeline to reload with one click. Run saved pipelines directly from the chat window — no tab-switching needed.

Drag-and-drop canvas · 7 block types · Save & reload · Loop support
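Saved pipelines live as small named .json files on disk. The exact schema isn't documented on this page, so every field name below is an assumption; the sketch only shows the general shape such a file might take.

import json

pipeline = {
    "name": "pdf-report",
    "blocks": [
        {"id": "in1",  "type": "INPUT"},
        {"id": "ref1", "type": "REFERENCE", "source": "papers/"},
        {"id": "m1",   "type": "MODEL", "model": "DeepSeek-R1-Q5_K_M"},
        {"id": "out1", "type": "OUTPUT"},
    ],
    "connections": [["in1", "ref1"], ["ref1", "m1"], ["m1", "out1"]],
}

with open("pdf-report.json", "w") as f:
    json.dump(pipeline, f, indent=2)   # reload later with json.load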

Cloud API Models

Connect to OpenAI, Anthropic, Groq, Mistral, Together AI, OpenRouter, Ollama, or any custom endpoint from the new API Models tab. Pick a provider, paste your key, and click Test & Load. A 1-token check confirms it works before any real request is sent.

Once loaded, the cloud model behaves exactly like a local one — it appears in chat, works inside pipelines, and shows in the status bar. Keys are stored only on your machine. Save configs to reload instantly next session.

8 providers built-in · 1-token verification · Custom endpoints · Saved configs
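That Test & Load step maps onto a single OpenAI-compatible request. A minimal sketch of the 1-token verification using the standard chat-completions endpoint; this illustrates the idea and is not NativeLab's actual code.

import requests

def test_connection(base_url: str, api_key: str, model: str) -> bool:
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,             # the cheapest possible round-trip
        },
        timeout=15,
    )
    resp.raise_for_status()              # surfaces the provider's error reason
    return "choices" in resp.json()

# e.g. test_connection("https://api.openai.com/v1", key, "gpt-4o")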

Reasoning into Code

Assign a reasoning model and a coding model. When you submit a coding task, the reasoning model thinks through intent and architecture first, then passes a concise brief to the coding model to generate the implementation.

You see the full chain of thought in the chat — nothing is hidden. Better code, less back-and-forth. Build this flow visually in the Pipeline Builder or let it trigger automatically from a single prompt.

Two-model chain · Visible reasoning trace · Auto-detects coding tasks
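At pseudocode level the chain is just two generations in sequence. The generate callable below is a placeholder for whatever inference call is available, not a real NativeLab API.

def reasoning_to_code(task: str, generate) -> str:
    # Pass 1: the reasoning model thinks through intent and architecture.
    brief = generate(
        model="reasoning",
        prompt="Think through the intent and architecture of this task, "
               "then write a concise implementation brief:\n\n" + task,
    )
    # Pass 2: the coding model turns the brief into an implementation.
    return generate(
        model="coding",
        prompt="Implement the following brief. Output code only.\n\n" + brief,
    )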

Reference Library

Build a persistent knowledge base for any chat session. Attach PDFs, notes, or text files as references — they're indexed once and retrieved semantically every time you ask something relevant.

The reference panel slides in smoothly from the right edge when you need it and stays out of the way when you don't. Think of it as giving your AI a bookshelf separate from the active conversation.

Semantic retrieval · Persistent index · Per-session libraries
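Semantic retrieval of this kind fits in a few lines: embed each reference once into a persistent index, then rank by cosine similarity per question. The embed callable stands in for a local embedding model; none of this is NativeLab's actual implementation.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, index: list[tuple[str, list[float]]],
             embed, k: int = 3) -> list[str]:
    # index holds (chunk_text, embedding) pairs computed once per document.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]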

Pause & Resume

Summarising a 200-page document? You don't have to sit and wait. NativeLab pauses mid-job, saves exactly which chunks have been processed and what's been generated, and lets you pick up precisely where you left off — even after restarting the app.

State serialization · Auto-pause threshold · Persistent job queue
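The comparison matrix below notes that full state is serialised every 3 chunks. A minimal sketch of that checkpoint pattern, with an assumed file name and field layout.

import json, os

CHECKPOINT = "job_state.json"            # assumed name, for illustration

def summarise_with_resume(chunks: list[str], summarise) -> list[str]:
    state = {"next": 0, "partials": []}
    if os.path.exists(CHECKPOINT):       # resume a previous run
        with open(CHECKPOINT) as f:
            state = json.load(f)
    for i in range(state["next"], len(chunks)):
        state["partials"].append(summarise(chunks[i]))
        state["next"] = i + 1
        if state["next"] % 3 == 0:       # serialise every 3 chunks
            with open(CHECKPOINT, "w") as f:
                json.dump(state, f)
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)            # job finished, drop the checkpoint
    return state["partials"]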

Sessions & Exports

Every conversation is a session with its own history, title, and context window usage indicator. Export to JSON for programmatic use, Markdown for notes and documentation, or plain text for anything else.

Nothing is locked in. Your conversations are yours.

JSON / Markdown / TXT · Context usage meter · Sidebar history
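Markdown export, for instance, is a small transform over the session's message list. The message structure below is an assumption for illustration; the real exporter may differ.

def export_markdown(session: list[dict], path: str) -> None:
    lines = []
    for msg in session:
        speaker = "You" if msg["role"] == "user" else "Lab"
        lines.append(f"**{speaker}:** {msg['content']}\n")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

export_markdown(
    [{"role": "user", "content": "Analyse PDFs and write a report."}],
    "session.md",
)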

Visual pipeline builder

Chain models together
like building blocks.

No code. No config files. Draw a pipeline on a canvas, hit Run, and watch each stage stream its output live — right inside the app.

PIPELINE CANVAS · server ready
INPUT (your prompt) → 📎 REF. (inject docs) → 🤖 MODEL (reason · ↻×2) → INTER. (pass result) → 🌐 API (cloud model) → OUTPUT (final answer)
Input — entry point
Model — any loaded engine
Intermediate — carry context
Output — final result
Reference — inject text or PDF
API Model — any cloud endpoint
Draw connections by hand
Drag from any port dot on a block to any port on another. Connections snap as smooth curves. Delete with a click. It feels like a whiteboard, not a settings panel.
Loops that repeat automatically
Connect a block back to an earlier one and NativeLab detects the loop. Set how many times it repeats — the chain runs that many passes, accumulating context each time. Shown as a cyan ↻×N arrow.
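Back-edge detection like this is a standard depth-first search over the connection graph. A minimal sketch with illustrative names, not NativeLab's actual code.

def find_back_edges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    back, visiting, done = [], set(), set()

    def dfs(node: str) -> None:
        visiting.add(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:          # edge back into the active path: a loop
                back.append((node, nxt))
            elif nxt not in done:
                dfs(nxt)
        visiting.discard(node)
        done.add(node)

    for node in list(graph):
        if node not in done:
            dfs(node)
    return back

# find_back_edges([("A", "B"), ("B", "C"), ("C", "B")]) -> [("C", "B")]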
Every stage streams live
As the pipeline runs, each Intermediate block gets its own live tab in the right panel. You see every model's tokens arriving in real time. No black boxes, no waiting for a final dump.
Save and reload any pipeline
Name a pipeline and save it. All blocks, connections, and model assignments are stored as a small file on your machine. Reload with one click — everything is exactly as you left it.
Run a pipeline from chat
Hit the 🔗 Pipeline button next to Send. Pick a saved pipeline and it runs on your current message without leaving chat. Each stage appears as its own labelled bubble in the conversation.

Cloud & API models

Local when you need privacy.
Cloud when you need scale.

Connect to any major AI provider — or your own server — with a single form. Once verified, a cloud model works exactly like a local one throughout the whole app.

Eight providers are pre-configured with the right URLs and model lists. Pick one, add your key, and go. Or type any custom URL for your own server.

OpenAI · GPT-4o, o1, o3-mini
Anthropic · Claude 3.5, Haiku
Groq · Llama 3.3, Mixtral
🌊 Mistral · Large, Codestral
🤝 Together AI · Llama, Qwen
🔀 OpenRouter · 100+ models
🦙 Ollama · Local server
🔧 Custom · Any endpoint

🎨  Custom prompt formats

Some models need a specific way of wrapping messages. Pick a preset and the fields fill in automatically — or define your own system prompt, user prefix/suffix, and assistant prefix from scratch.

Default · ChatML · Llama-2 · Alpaca · Gemma · Phi-3 · Custom
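Applying a format is plain string wrapping. The sketch below approximates the ChatML preset; the fields mirror the UI's system prompt, user prefix/suffix, and assistant prefix, but the code itself is illustrative.

from dataclasses import dataclass

@dataclass
class PromptFormat:
    system: str
    user_prefix: str
    user_suffix: str
    assistant_prefix: str

CHATML = PromptFormat(
    system="<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n",
    user_prefix="<|im_start|>user\n",
    user_suffix="<|im_end|>\n",
    assistant_prefix="<|im_start|>assistant\n",
)

def wrap(fmt: PromptFormat, message: str) -> str:
    # The model continues generating after the assistant prefix.
    return (fmt.system + fmt.user_prefix + message
            + fmt.user_suffix + fmt.assistant_prefix)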

Adding an API model takes about 30 seconds, and every connection is tested with a single message before anything real is sent — so you know it works before you rely on it.

1
Pick a provider and model

Choose from the list or type any model ID. The base URL fills in automatically for known providers.

2
Paste your API key

Keys are stored only on your machine in a local config file. Nothing is sent anywhere except to the provider you chose.

3
Click Test & Load

NativeLab sends a single 1-token "hi" to verify the connection. Green means ready. Errors appear immediately with the reason so you know exactly what to fix.

4
Use it like any other model

It appears in chat, inside pipelines, and in the status bar. Save the config to reload it instantly next time — no re-entering keys.

Why it exists

AI should be a tool,
not a subscription.

We built NativeLab because powerful inference shouldn't require cloud accounts, billing cycles, or trusting a third party with your data.

01 — PRIVACY

Air-gapped by design

No telemetry. No account required. Everything runs on your CPU and RAM by default. When you choose to connect a cloud API, that's your deliberate decision — nothing dials out without your say-so.

02 — OWNERSHIP

Your models, your rules

Download any GGUF model and drop it in. Connect any API endpoint. Define your own prompt format. You're not locked into one vendor's model family or one company's judgment about what you should be able to ask.

03 — TRANSPARENCY

Open source, fully

The entire codebase is on GitHub. Read it, fork it, audit it. Every parameter — threads, context size, prompt templates, API formats — is exposed in the UI. Nothing happens behind the scenes that you can't see.

04 — RELIABILITY

Works without internet

Field work, travel, restricted networks — NativeLab runs wherever your laptop runs. When online, API models extend your capability without replacing the local foundation.

Getting started

Up and running
in four steps

No Docker, no Python environment, no cloud account required to start. Just download, add a model, and go.

Download NativeLab

Grab the latest Beta release from GitHub. Available for Windows and Linux. Comes bundled with everything it needs — no extra installs.

Add a model

Drop any .gguf file into the /localllm folder — NativeLab detects the family and sets up templates automatically. Or skip this entirely and connect a cloud API model from the API Models tab instead.

Configure to your hardware

Set threads, context size, and generation length in the Config tab. RAM usage and context fill are shown live. Nothing runs until you're ready.

Chat, build, or connect

Open a chat session, build a pipeline on the canvas, attach documents, or connect a cloud API. Everything persists between restarts — sessions, pipelines, API configs, references.
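For the curious: steps 2 and 3 amount to launching llama.cpp's llama-server with your Config tab values. The flags shown (-m, -t, -c, --port) are standard llama.cpp options, and the model path and port mirror the examples on this page; NativeLab's own launch code may differ.

import subprocess

proc = subprocess.Popen([
    "llama-server",
    "-m", "localllm/DeepSeek-R1-Q5_K_M.gguf",  # model file from /localllm
    "-t", "12",                                # threads (the default shown below)
    "-c", "4096",                              # context size
    "--port", "8642",                          # port from the status bar
])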

Model compatibility

Works with the models
you already know

NativeLab automatically detects model families from filenames and applies the correct prompt template. No manual configuration needed for supported families.

Any GGUF-format model that runs on llama.cpp will work. The list below shows families with automatic template detection.

DeepSeek · Mistral · LLaMA 2 / 3 · Qwen / ChatML · Phi · Gemma · CodeLlama · Falcon · Vicuna · OpenChat · Yi · Command-R · Starling · Neural-Chat · + more
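Filename-based detection can be as simple as a pattern table. The patterns and fallback below are illustrative assumptions, not the app's actual table.

FAMILY_PATTERNS = {
    "deepseek": "deepseek", "mistral": "mistral", "codellama": "codellama",
    "llama-3": "llama3", "llama-2": "llama2", "qwen": "chatml",
    "phi": "phi", "gemma": "gemma",
}

def detect_family(filename: str) -> str:
    name = filename.lower()
    for pattern, family in FAMILY_PATTERNS.items():
        if pattern in name:               # first match wins
            return family
    return "default"                      # fall back to a generic template

# detect_family("DeepSeek-R1-Q5_K_M.gguf") -> "deepseek"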

NativeLab supports the full range of GGUF quantization levels, so you can balance quality and speed for your hardware.

Q2_K → Q8_0 · IQ2 / IQ3 / IQ4 · Q4_K_M · Q5_K_M · F16 / F32 · BF16

Not sure which quant to pick? Q4_K_M or Q5_K_M strike the best balance for most desktop hardware.
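A rough rule of thumb behind that advice: file size is roughly parameter count × bits per weight ÷ 8. The bits-per-weight figures below are approximate community numbers, not official specs, so treat the result as a ballpark.

BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0,
}

def approx_size_gb(params_billions: float, quant: str) -> float:
    return params_billions * BITS_PER_WEIGHT[quant] / 8

# A 7B model at Q4_K_M lands around 4.2 GB on disk, and similar in RAM.
print(f"{approx_size_gb(7, 'Q4_K_M'):.1f} GB")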

System requirements

What you
actually need

No GPU required. NativeLab runs on CPU with standard RAM — though more RAM gives you larger models and longer contexts. API models use no local RAM at all.

🖥️ Operating System

  • Windows 10 / 11
  • Linux (most distros)
  • macOS support planned

🧠 Memory (RAM)

  • 8 GB minimum (small models)
  • 16 GB recommended
  • 32 GB for larger models

RAM shown live in-app · API models use zero local RAM

💾 Storage

  • ~200 MB for the app
  • 2–8 GB per GGUF model
  • Space for sessions, pipelines & references

⚡ Processor

  • Any modern multi-core CPU
  • More threads = faster generation
  • GPU acceleration: planned

12 threads default, fully configurable

🌐 Internet

  • Not needed for local models
  • Required only for API model calls
  • No telemetry ever sent

Fully offline with local GGUF models

Already downloaded?

Get your first model
running in 5 minutes.

The setup guide walks you through installing llama.cpp, linking it in the Server tab, and choosing the right model for your hardware — with one-click copy for every HuggingFace repo ID.

Open Setup Guide →
Install llama.cpp
Link in Server tab
Download a model
Feature Comparison Matrix  ·  v2.0  ·  March 2026

NativeLab Pro v2 vs The Field

A comprehensive breakdown of NativeLab Pro v2 against LM Studio, AnythingLLM, Jan AI, and GPT4All — covering core capabilities, hardware support, visual pipeline tooling, and overall developer experience.

v2.0  ·  March 2026  ·  14,000+ LOC  ·  Single Python file
Full support
Partial
Not supported
★ Exclusive · No equivalent in any compared tool
Feature · NativeLab Pro v2 · LM Studio · AnythingLLM · Jan AI · GPT4All
◈ Core
Fully open source
Runs fully offline
Zero telemetry
No account required
Windows
Linux
macOS · v2.0
Zero-config beginner setup · Pre-built binaries, double-click to launch
In-app model download · HuggingFace repo search, quantization badges, live progress
SafeTensor model support · Modern .safetensors format alongside GGUF
⚡ GPU & Hardware
GPU auto-detection on startup · nvidia-smi, system_profiler, vulkaninfo probed automatically
CUDA acceleration · NVIDIA GPU offloading via --ngl flags
Apple Metal acceleration · macOS GPU via system_profiler detection
Vulkan acceleration · AMD / Intel / cross-platform via vulkaninfo
Multi-GPU tensor split · Ratio strings, e.g. 0.6,0.4 across devices
GPU layer control via GUI · −1 = all layers, spin-box control, no CLI required
Server launch flags via GUI · Full llama.cpp flag exposure without command line
RAM watchdog + auto disk spill · Prevents OOM crashes, LRU chunk cache
★ Exclusive
Live RAM usage display
Live context fill indicator
Per-model context size / temperature / sampling control
limited
Code-level inference optimisation · 6.2 tok/s on a $250 APU via algorithmic chunking
★ Exclusive
◎ Model Management
GGUF model support
Auto prompt template detection · 20+ families detected from filename
Custom prompt formats · 7 presets + fully custom fields
Quantization format detection + quality labels · K-quants, IQ, legacy, float — colour coded
Specialised model roles · General, Reasoning, Summarization, Coding, Secondary
★ Exclusive
llama-server + llama-cli fallback · Auto-switches if server mode is unavailable
★ Exclusive
Auto coding prompt detection + routing
★ Exclusive
⊕ Server & API
In-app server management
basic
OpenAI-compatible local API
Cloud API support · OpenAI, Anthropic, Groq, Mistral, Together, OpenRouter, Ollama, Custom
8 providers
In-app API connection test · 1-token verification before use
MCP server management tab · Add / start / stop stdio + SSE servers with live logs
Multi-MCP server support · Multiple simultaneous MCP servers running in parallel
★ Exclusive
limited
⬡ Visual Pipelines
Visual pipeline builder (canvas) · Drag, drop, Bézier connections, 8-port blocks, grid snap
★ Exclusive
Pipeline save / load · Named .json per pipeline, one-click reload
★ Exclusive
Loop / cycle stages with ×N repeat · Auto-detected back-edges, dashed badge at midpoint
★ Exclusive
Live token streaming per pipeline stage · Each block gets its own live output tab
★ Exclusive
Manual prompt injection between stages · Intermediate block wraps context with custom instruction
★ Exclusive
Drag model from sidebar onto canvas · Ghost block preview while dragging
★ Exclusive
Branch label badges on connections · Colour-coded TRUE / FALSE rendered along curve
★ Exclusive
Horizontal model pill scroll bar · Click to jump to any block on large canvases
★ Exclusive
In-app pipeline manual · Themed HTML manual, adapts to user colour scheme
★ Exclusive
Pipeline validation · Refuses to run without INPUT, catches direct MODEL→MODEL
★ Exclusive
♨ Python & LLM Logic Blocks
Python IF / ELSE block · Boolean expression routes TRUE → E / FALSE → W
★ Exclusive
Python SWITCH block · Expression returns string key, routes to matching arm
★ Exclusive
Python FILTER block · Gates pipeline — clean [FILTER DROPPED] on false condition
★ Exclusive
Python TRANSFORM block · Prefix, suffix, find-replace, upper, lower, strip, truncate
★ Exclusive
MERGE block · Concat, prepend, append, JSON array from multiple incoming connections
★ Exclusive
SPLIT block · Broadcasts identical context to all outgoing connections
★ Exclusive
Custom Code block · Sandboxed Python editor, live syntax check, test runner
★ Exclusive
LLM IF / ELSE — natural language condition · Model answers YES / NO, routes to corresponding branch
★ Exclusive
LLM SWITCH — model classification routing · Categories from arrow labels, default fallback arm
★ Exclusive
LLM FILTER — model gates pipeline · PASS / STOP with structured explanation on drop
★ Exclusive
LLM TRANSFORM — model rewrites context · Auto-strips preamble, output-only system prompt
★ Exclusive
LLM SCORE — rates 1–10, routes LOW / MID / HIGH · Score label arm receives raw numeric value downstream
★ Exclusive
Vector search + RAG block in pipeline · Semantic retrieval as a first-class pipeline node
★ Exclusive
▤ Documents & References
PDF / document chat (RAG)
basic
Multi-PDF cross-document synthesis
Structured source code parsing (AST) · 22 languages: Python, JS, TS, Rust, Go, C/C++, Java, SQL…
★ Exclusive
basic
Keyword-ranked chunk retrieval · Exact phrase match bonus, top-K injection
vector
Per-session reference store (persists across restarts)
Script reference panel with AST detail pane · Function / class list, dedicated Scripts tab
★ Exclusive
▦ Summarization
Chunked long-document summarization · Paragraph-boundary splits, context carryover between chunks
★ Exclusive
basic
Pause / resume long jobs · Full state serialised every 3 chunks
★ Exclusive
Final consolidation pass (dedicated engine) · Reasoning / summarization model for synthesis step
★ Exclusive
Multi-PDF with cross-document synthesis · Per-file summaries → unified theme analysis
★ Exclusive
basic
▣ Sessions & Export
Session history persistence
limited
Export (JSON / Markdown / plain text)
Session search, date grouping, rename, delete
◈ UI & Experience
Dark theme
Appearance tab (theme customisation)
Markdown rendering in chat
Syntax highlighting in code blocks
Smooth UI (no flicker, no tearing) · Custom QPainter, well-isolated threading model
Code copy buttons in chat bubbles
Collapsible long messages · Auto-collapse at 260 px, show / hide toggle
Thinking / progress block (summarization) · Collapsible section log, turns green on completion
★ Exclusive
Pipeline intermediate chat bubbles · Amber labelled bubbles per stage in chat view
★ Exclusive
Colour-coded log console
Keyboard shortcuts
limited
limited
⬡ Agents & Extras
Visual agentic pipeline builder · Full reasoning agent via LLM logic blocks, no code required
★ Exclusive
AI agents / tool use (MCP)
basic
Web search (built-in)
Multi-user / team support
Docker
Multimodal (image input)
Built-in benchmarking
IPC (Inter-Process Communication) · Other applications on the same machine can communicate with NativeLab
⚙ In progress
v2.0 Release — March 2026  ·  GPU auto-detection · CUDA · Apple Metal · Vulkan · Multi-GPU tensor split · HuggingFace model download · SafeTensor support · Multi-MCP server management · Vector RAG pipeline block · 7 Python logic blocks · 5 LLM logic blocks · Visual agentic pipelines · Ghost drag-drop preview · Pill scroll bar · Branch label badges · In-app pipeline manual · macOS binaries · 22 bug fixes.
NativeLab Pro v2
62+
features tracked
Best for power users, pipeline builders, agentic AI workflows, privacy-critical environments, and long-document research on modest hardware.
Open Source · Free
LM Studio
28
features tracked
Best for beginners, multi-GPU setups, Apple Silicon, model discovery, and benchmarking.
Proprietary
AnythingLLM
26
features tracked
Best for teams, agent workflows, web search, vector RAG, and Docker deployment.
Open Source
Jan AI
18
features tracked
Best for simple clean chat, Apple Silicon, and connecting to multiple cloud APIs.
Open Source
GPT4All
14
features tracked
Best for absolute beginners, Windows / Linux offline chat, and zero-setup deployments.
Open Source

Open source · Free forever

Own your AI.
Start today.

Download NativeLab Beta and bring local AI to your workflow. Build pipelines, connect cloud APIs, or run fully offline. No account. No credit card. No data leaving your machine.

↓  Download Linux · ↓  Download Windows · ↓  Download macOS · View on GitHub →