NativeLab puts powerful language models on your desktop — no subscriptions, no data leaving your device, no internet required. Build visual AI pipelines, connect cloud APIs when you need them, and run any model you choose.
Who it's for
NativeLab is for anyone who works with sensitive material, needs to run offline, or simply wants an AI that doesn't phone home.
Feed NativeLab your papers, drafts, and datasets. Build a pipeline that summarises sources, extracts findings, and drafts a report — all without uploading sensitive research to external servers.
Literature review · Pipeline chains

Wire a reasoning model into a coding model on the visual canvas. Use GPT-4o for heavy lifting, a local model for proprietary code. Build once, save the pipeline, and reuse it every session.
Pipeline builder · Local + cloud hybrid

Analyse case documents and clinical notes without compliance risk. Everything runs on your hardware by default. When you need extra power, connect an API model — that's always your explicit choice.
Air-gapped · Zero telemetry

Attach notes, books, and reference documents. Build a pipeline that reads, summarises, and synthesises across all of them. Run against GPT-4o when online, a local model when offline.
Document chat · Reference context

The visual pipeline editor lets you wire any combination of models, context blocks, and output stages. Add a custom API endpoint, define your own prompt format, and iterate freely on an open codebase.
Open source · Custom prompts · Any endpoint

Field researchers, journalists in restricted regions, or anyone without reliable internet can run full AI inference with zero cloud dependency. Back online? Plug in an API model and scale up instantly.
No internet required · API-ready when online

What you can do
NativeLab handles the complexity of local AI so you can focus on your work. Here's what it unlocks.
Load one PDF or a whole folder. NativeLab chunks, indexes, and feeds relevant passages into the model automatically — no manual copy-pasting. Ask questions naturally and it finds the right sections.
Multi-PDF mode summarises each document separately, then synthesises a final answer across all of them. Ideal for literature reviews, legal discovery, or comparing research papers.
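Under the hood, multi-PDF mode follows the classic map-reduce pattern: summarise each document independently, then synthesise across the summaries. A minimal sketch of that flow in Python — `generate` stands in for whichever model is loaded, and all names here are illustrative, not NativeLab's actual API:

```python
# Sketch of multi-PDF map-reduce: summarise each file, then synthesise.
# `generate(prompt)` is a placeholder for any loaded model (local or API).

def generate(prompt: str) -> str:
    raise NotImplementedError("stand-in for the loaded model")

def summarise_each(documents: dict[str, str]) -> dict[str, str]:
    # Map step: one focused summary per document.
    return {
        name: generate(f"Summarise the key findings of:\n\n{text}")
        for name, text in documents.items()
    }

def synthesise(question: str, summaries: dict[str, str]) -> str:
    # Reduce step: answer once across all per-document summaries.
    joined = "\n\n".join(f"[{name}]\n{s}" for name, s in summaries.items())
    return generate(f"Using these summaries:\n\n{joined}\n\nAnswer: {question}")
```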
A full drag-and-drop canvas where you build AI workflows by connecting blocks. Drop in an Input, wire it through Reference or Knowledge context, pass it to a Model block, carry results through an Intermediate block, and collect everything in the Output.
Every connection is a smooth curve you draw yourself. Build loops that repeat automatically, watch each stage stream live in its own tab, and save any pipeline to reload with one click. Run saved pipelines directly from the chat window — no tab-switching needed.
Connect to OpenAI, Anthropic, Groq, Mistral, Together AI, OpenRouter, Ollama, or any custom endpoint from the new API Models tab. Pick a provider, paste your key, and click Test & Load. A 1-token check confirms it works before any real request is sent.
Once loaded, the cloud model behaves exactly like a local one — it appears in chat, works inside pipelines, and shows in the status bar. Keys are stored only on your machine. Save configs to reload instantly next session.
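The 1-token check corresponds to a single chat-completions request with `max_tokens` set to 1. Here's the general shape of such a probe against any OpenAI-compatible endpoint — a sketch of the idea, not NativeLab's exact request:

```python
import requests

def verify_endpoint(base_url: str, api_key: str, model: str) -> bool:
    """Send a 1-token probe to an OpenAI-compatible chat endpoint."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,  # cheapest possible round trip
        },
        timeout=15,
    )
    return resp.ok  # green: ready; otherwise resp.json() carries the reason

# e.g. verify_endpoint("https://api.openai.com/v1", key, "gpt-4o")
```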
Assign a reasoning model and a coding model. When you submit a coding task, the reasoning model thinks through intent and architecture first, then passes a concise brief to the coding model to generate the implementation.
You see the full chain of thought in the chat — nothing is hidden. Better code, less back-and-forth. Build this flow visually in the Pipeline Builder or let it trigger automatically from a single prompt.
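Conceptually this is a two-stage chain: one model plans, the other implements. A hypothetical sketch of the flow (function names are illustrative, not NativeLab's API):

```python
def plan_then_code(task: str, reasoner, coder) -> str:
    # Stage 1: the reasoning model thinks through intent and architecture.
    brief = reasoner(
        "Analyse this coding task and write a concise implementation brief "
        f"(requirements, structure, edge cases):\n\n{task}"
    )
    # Stage 2: the coding model implements from the brief alone.
    return coder(f"Implement exactly this brief:\n\n{brief}")
```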
Build a persistent knowledge base for any chat session. Attach PDFs, notes, or text files as references — they're indexed once and retrieved semantically every time you ask something relevant.
The reference panel slides in smoothly from the right edge when you need it and stays out of the way when you don't. Think of it as giving your AI a bookshelf separate from the active conversation.
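"Indexed once and retrieved semantically" describes embedding-based retrieval: each reference chunk is embedded one time, and at question time the closest chunks by cosine similarity go to the model. A minimal numpy sketch under that assumption — `embed` is a placeholder for whichever embedding model is in use:

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    raise NotImplementedError("stand-in for an embedding model")

class ReferenceShelf:
    """Index chunks once; retrieve nearest neighbours per question."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        vecs = embed(chunks)
        self.vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

    def retrieve(self, question: str, k: int = 4) -> list[str]:
        q = embed([question])[0]
        q = q / np.linalg.norm(q)
        scores = self.vecs @ q  # cosine similarity on unit vectors
        return [self.chunks[i] for i in np.argsort(scores)[::-1][:k]]
```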
Summarising a 200-page document? You don't have to sit and wait. NativeLab pauses mid-job, saves exactly which chunks have been processed and what's been generated, and lets you pick up precisely where you left off — even after restarting the app.
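Resumable jobs come down to checkpointing: persist which chunks are done and what has been generated, then skip finished work on restart. A simple illustration of the idea (not NativeLab's actual file format):

```python
import json
import pathlib

CKPT = pathlib.Path("job_checkpoint.json")  # hypothetical checkpoint file

def run_job(chunks: list[str], summarise) -> list[str]:
    # Reload progress if a previous run was paused or interrupted.
    state = json.loads(CKPT.read_text()) if CKPT.exists() else {"done": 0, "out": []}
    for i in range(state["done"], len(chunks)):
        state["out"].append(summarise(chunks[i]))
        state["done"] = i + 1
        CKPT.write_text(json.dumps(state))  # survive pause or app restart
    return state["out"]
```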
Every conversation is a session with its own history, title, and context window usage indicator. Export to JSON for programmatic use, Markdown for notes and documentation, or plain text for anything else.
Nothing is locked in. Your conversations are yours.
Visual pipeline builder
No code. No config files. Draw a pipeline on a canvas, hit Run, and watch each stage stream its output live — right inside the app.
Cloud & API models
Connect to any major AI provider — or your own server — with a single form. Once verified, a cloud model works exactly like a local one throughout the whole app.
Eight providers are pre-configured with the right URLs and model lists. Pick one, add your key, and go. Or type any custom URL for your own server.
Some models need a specific way of wrapping messages. Pick a preset and the fields fill in automatically — or define your own system prompt, user prefix/suffix, and assistant prefix from scratch (a sketch of what those fields amount to follows these steps).
Adding an API model takes about 30 seconds, and every connection is tested with one message before anything real is sent — so you know it works before you rely on it.
Choose from the list or type any model ID. The base URL fills in automatically for known providers.
Keys are stored only on your machine in a local config file. Nothing is sent anywhere except to the provider you chose.
NativeLab sends a single 1-token "hi" to verify the connection. Green means ready. Errors appear immediately with the reason so you know exactly what to fix.
It appears in chat, inside pipelines, and in the status bar. Save the config to reload it instantly next time — no re-entering keys.
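For reference, here's what a fully manual format amounts to: a ChatML-style preset wraps each turn in role markers, and "define your own" means supplying those strings yourself. A hypothetical illustration — the field names mirror the form but are not NativeLab's actual config keys:

```python
# ChatML-style wrapping as one common preset; the prefix/suffix strings
# are exactly the kind of fields the form exposes. Names are illustrative.
TEMPLATE = {
    "system_prompt": "You are a helpful assistant.",
    "user_prefix": "<|im_start|>user\n",
    "user_suffix": "<|im_end|>\n",
    "assistant_prefix": "<|im_start|>assistant\n",
}

def build_prompt(t: dict, user_message: str) -> str:
    return (
        f"<|im_start|>system\n{t['system_prompt']}<|im_end|>\n"
        f"{t['user_prefix']}{user_message}{t['user_suffix']}"
        f"{t['assistant_prefix']}"
    )
```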
Why it exists
We built NativeLab because powerful inference shouldn't require cloud accounts, billing cycles, or trusting a third party with your data.
No telemetry. No account required. Everything runs on your CPU and RAM by default. When you choose to connect a cloud API, that's your deliberate decision — nothing dials out without your say-so.
Download any GGUF model and drop it in. Connect any API endpoint. Define your own prompt format. You're not locked into one vendor's model family or one company's judgment about what you should be able to ask.
The entire codebase is on GitHub. Read it, fork it, audit it. Every parameter — threads, context size, prompt templates, API formats — is exposed in the UI. Nothing happens behind the scenes that you can't see.
Field work, travel, restricted networks — NativeLab runs wherever your laptop runs. When online, API models extend your capability without replacing the local foundation.
Getting started
No Docker, no Python environment, no cloud account required to start. Just download, add a model, and go.
Grab the latest Beta release from GitHub. Available for Windows and Linux. Comes bundled with everything it needs — no extra installs.
Drop any .gguf file into the /localllm folder — NativeLab detects the family and sets up templates automatically. Or skip this entirely and connect a cloud API model from the API Models tab instead.
Set threads, context size, and generation length in the Config tab. RAM usage and context fill are shown live. Nothing runs until you're ready.
Open a chat session, build a pipeline on the canvas, attach documents, or connect a cloud API. Everything persists between restarts — sessions, pipelines, API configs, references.
Model compatibility
NativeLab automatically detects model families from filenames and applies the correct prompt template. No manual configuration needed for supported families.
Any GGUF-format model that runs on llama.cpp will work. The list below shows families with automatic template detection.
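Filename-based detection is straightforward pattern matching: the family name embedded in a GGUF filename selects the matching chat template. A hypothetical sketch of the idea — the patterns and families shown are examples, not NativeLab's actual table:

```python
import re

# Illustrative filename-to-family patterns; not NativeLab's real table.
FAMILIES = {
    r"llama": "llama",
    r"mistral": "mistral",
    r"qwen": "chatml",
    r"phi": "phi",
}

def detect_family(filename: str) -> str | None:
    name = filename.lower()
    for pattern, family in FAMILIES.items():
        if re.search(pattern, name):
            return family
    return None  # unknown family: fall back to a manual template

# detect_family("mistral-7b-instruct-v0.2.Q4_K_M.gguf") -> "mistral"
```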
Quantization formats
NativeLab supports the full range of GGUF quantization levels, so you can balance quality and speed for your hardware.
Not sure which quant to pick? Q4_K_M or Q5_K_M strike the best balance for most desktop hardware.
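The trade-off is roughly arithmetic: weight memory ≈ parameter count × bits per weight ÷ 8, plus context overhead on top. A back-of-envelope estimator with approximate bits-per-weight figures for common GGUF quants (Q4_0 and Q8_0 are exact by format; the K-quant values are rough):

```python
# Approximate bits-per-weight for common GGUF quants.
BPW = {"Q4_0": 4.5, "Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5}

def weight_ram_gb(params_billions: float, quant: str) -> float:
    return params_billions * 1e9 * BPW[quant] / 8 / 1e9

# e.g. a 7B model at Q4_K_M: ~4.2 GB for the weights alone,
# before KV-cache/context overhead.
print(f"{weight_ram_gb(7, 'Q4_K_M'):.1f} GB")
```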
System requirements
No GPU required. NativeLab runs on CPU with standard RAM — though more RAM gives you larger models and longer contexts. API models use no local RAM at all.
RAM shown live in-app · API models use zero local RAM
12 threads default, fully configurable
Fully offline with local GGUF models
Already downloaded?
The setup guide walks you through installing llama.cpp, linking it in the Server tab, and choosing the right model for your hardware — with one-click copy for every HuggingFace repo ID.
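For the model-fetch step, one common route is pulling a GGUF straight from a HuggingFace repo ID. A sketch using the `huggingface_hub` library — the repo and filename below are examples, not a specific recommendation from the guide:

```python
from huggingface_hub import hf_hub_download

# Example repo/file; substitute the repo ID copied from the setup guide.
path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)
print(path)  # move or link this file into NativeLab's /localllm folder
```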
A comprehensive breakdown of NativeLab Pro v2 against LM Studio, AnythingLLM, Jan AI, and GPT4All — covering core capabilities, hardware support, visual pipeline tooling, and overall developer experience.
Open source · Free forever
Download NativeLab Beta and bring local AI to your workflow. Build pipelines, connect cloud APIs, or run fully offline. No account. No credit card. No data leaving your machine.