Install the backend engine, connect it to NativeLab, then download a model that fits your hardware. Takes about five minutes.
Video Walkthrough
See every step demonstrated live — from downloading llama.cpp to running your first conversation locally. No cloud, no account, everything on your machine.
Step 1 of 3
NativeLab uses llama.cpp as its inference engine. You need to download the pre-built release that matches your operating system and hardware.
On the llama.cpp GitHub releases page, scroll down past the changelog to find the Assets section. Expand it if it's collapsed, then download the file matching your system from the table below.
| OS | Release asset | What you get |
|---|---|---|
| Windows | `llama-bXXXX-bin-win-avx2-x64.zip` | `llama-cli.exe` and `llama-server.exe` |
| macOS | `llama-bXXXX-bin-macos-arm64.zip` (M-series) or `x64` (Intel) | `llama-cli` and `llama-server` |
| Linux | `llama-bXXXX-bin-ubuntu-x64.tar.gz` | `llama-cli` and `llama-server` |

On Linux, extract the archive and mark the binaries executable: `tar -xzf llama-bXXXX-bin-ubuntu-x64.tar.gz`, then `chmod +x llama-cli llama-server`. After extraction, a binary lives at a path like `/home/user/llama/llama-cli`.

Hardware capability guide
Your CPU generation determines which binary flags are supported. When in doubt, AVX2 covers the vast majority of hardware made after 2013.
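To confirm which instruction sets your CPU actually supports, and that the binary you downloaded runs at all, you can check from a terminal. This is a sketch: the flag-listing commands differ by OS, and it assumes `llama-cli` accepts `--version` as in recent llama.cpp builds.

```bash
# Which SIMD extensions does this CPU report? (Linux; exact flag names vary)
grep -om1 'avx512\|avx2\|avx' /proc/cpuinfo
# Intel macOS equivalent (these sysctl keys are absent on Apple Silicon, hence the redirect)
sysctl -n machdep.cpu.features machdep.cpu.leaf7_features 2>/dev/null | tr ' ' '\n' | grep -i avx
# Sanity-check the extracted binary: it should print build info and exit
./llama-cli --version
```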
Step 2 of 3
Open NativeLab Pro and navigate to the Server tab. You'll browse to the two binaries you extracted in Step 1.
Understanding the active mode
NativeLab can talk to llama.cpp in two ways. The mode indicator in the Server tab shows which one is currently active and what it means for your workflow.
| OS | Example path to llama-cli |
|---|---|
| Windows | `C:\Users\you\Downloads\llama\llama-cli.exe` |
| macOS | `/Users/you/Downloads/llama/llama-cli` |
| Linux | `/home/you/llama/llama-cli` |
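One macOS snag worth knowing about: Gatekeeper often quarantines executables downloaded from a browser, and the binaries may refuse to launch when NativeLab points at them. Removing the quarantine attribute (a standard fix, run from the folder holding the binaries) clears this up:

```bash
# Strip the quarantine flag macOS attaches to downloaded files
xattr -d com.apple.quarantine llama-cli llama-server
```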
Once connected
After setting both paths, NativeLab will start llama-server automatically in the background. The green status indicator in the Server tab confirms it's running. You're now ready to load a model.
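If you'd like to verify the server independently of the app, llama-server ships a health endpoint. A minimal check, assuming llama.cpp's default `127.0.0.1:8080` binding (NativeLab may configure a different port):

```bash
# Ask the running llama-server whether it's ready
curl -s http://127.0.0.1:8080/health
# {"status":"ok"} means the server is up; a "loading model" status means wait a moment
```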
Step 3 of 3
Switch to the Download tab inside NativeLab. Paste a HuggingFace repo ID, click Search, choose a quantization, and download directly to your models folder.
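The search is essentially a lookup against HuggingFace's public API, so you can reproduce the file listing from a terminal if you're curious. A sketch; the repo ID here is only an example:

```bash
# List a repo's files via the HuggingFace API, keeping only GGUF quantizations
curl -s "https://huggingface.co/api/models/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/tree/main" \
  | grep -o '"path": *"[^"]*\.gguf"'
```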
Quantization guide
Quantization compresses model weights. Lower = smaller file and less RAM needed, but slightly lower quality. Q4_K_M is the sweet spot for most setups.
| Format | RAM (7B model) | Best for |
|---|---|---|
| Q2_K | ~3 GB | 8 GB RAM, speed only |
| Q3_K_M | ~3.9 GB | 8 GB RAM, better than Q2 |
| Q4_K_M (recommended) | ~4.8 GB | 16 GB RAM, best balance |
| Q5_K_M (recommended) | ~5.7 GB | 16–32 GB RAM |
| Q6_K | ~6.6 GB | 32 GB RAM |
| Q8_0 | ~8.7 GB | 32 GB+ RAM |
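These figures follow a rough rule of thumb, not an exact formula: memory ≈ parameter count × bits per weight ÷ 8, plus headroom for the context window. A quick sanity check against your own machine (the 4.8 bits/weight figure is an approximation for Q4_K_M-class quants):

```bash
# Back-of-envelope: 7B parameters at ~4.8 bits per weight
echo "7 * 4.8 / 8" | bc -l   # ≈ 4.2 GB of weights, before KV-cache overhead
# How much RAM is actually available?
free -h                      # Linux: see the "available" column
sysctl -n hw.memsize         # macOS: total RAM in bytes
```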
Recommended repos by hardware
Click Copy on any repo ID below and paste it directly into NativeLab's Download tab search field.
Downloads are saved to `C:\NativeLabPro\localllm` on Windows (or the equivalent path on macOS/Linux). Once downloaded, models appear automatically in the Models tab and can be loaded from there.
You're all set
Switch to the Chat tab, type a message, and the model will respond. Everything runs locally — no cloud, no account, no data leaving your machine.
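Since llama-server exposes an OpenAI-compatible API, the same model will answer from outside NativeLab too. A sketch, again assuming the default `127.0.0.1:8080` binding:

```bash
# Send a chat request directly to the local server
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello from the terminal"}]}'
```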