emuV v2.0 - Kernel VRAM emulator with auto spillover for GeForce
29.11.2025
Field notes on turning a vanilla RTX 4060 into a proper LLaMA 30B workhorse: kernel driver, automatic GPU->RAM spillover, monitoring toolkit, real benchmarks, and the roadmap.
https://github.com/bogdanovby/emuv
Why virtual VRAM matters in 2025
Every modern ML experiment hits the VRAM wall. The RTX 4060 ships with 8 GB and the RTX 4070 Ti with 12 GB, while LLaMA 13B in fp16 already wants 24 GB. An A100 80 GB still costs about as much as a used Tesla, so we went the opposite way: write a driver that gently glues VRAM and system RAM together.
- Problem: GeForce cards are VRAM-starved and have no official spillover mechanism like Hopper-class parts do.
- Goal: expose a "virtual" 25-30 GB GPU to PyTorch without patching the framework.
- Answer: emuV, a kernel module that fills VRAM first and automatically pours overflow into system RAM.
Signature features in emuV v2.0
- Auto-detect any NVIDIA GeForce via PCI scan (vendor 0x10DE) - no hardcoded device IDs.
- Priority memory hierarchy: VRAM (Priority 1) -> system RAM (Priority 2) with lazy allocation.
- sysfs + char device interface: /sys/class/emuv/emuv/vram_info (see the read sketch after this list).
- Tooling: emuv-top (nvtop-style live view of both tiers).
- Real LLMs: GPT-2 XL, LLaMA 13B/30B (8-bit and 4-bit) validated on 7.5 GB VRAM + 20 GB RAM.
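
The sysfs node can be polled from any process, no CUDA context required. A minimal sketch, assuming vram_info exposes plain-text `key: value` lines (the exact field names depend on the emuV build you are running):

```python
# Poll the emuV sysfs node from Python.
# Assumption: vram_info is a plain-text file with "key: value" lines;
# field names are build-dependent, so we parse generically.
from pathlib import Path
import time

VRAM_INFO = Path("/sys/class/emuv/emuv/vram_info")

def read_vram_info() -> dict[str, str]:
    """Parse the sysfs node into a dict, skipping malformed lines."""
    info = {}
    for line in VRAM_INFO.read_text().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info

if __name__ == "__main__":
    while True:
        print(read_vram_info())   # feed this into your own dashboard if you like
        time.sleep(1)
```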
Architecture
Version 2.0 focuses on predictable behavior: VRAM is used for as long as it lasts; when an allocation fails with cudaErrorMemoryAllocation, PyTorch catches the error and the next chunk lands in system RAM. emuV keeps live statistics for both tiers.
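
As a user-space illustration of the same "VRAM first, RAM second" rule (not the kernel code path itself), the pattern a PyTorch workload ends up following looks roughly like this:

```python
# User-space sketch of the priority rule emuV enforces in the kernel.
# Illustration only: the real spillover happens below the CUDA allocator.
import torch

def alloc_tiered(shape, dtype=torch.float16):
    """Try VRAM (tier 1); on an out-of-memory error fall back to RAM (tier 2)."""
    try:
        return torch.empty(shape, dtype=dtype, device="cuda")
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()          # let the caching allocator shrink first
        return torch.empty(shape, dtype=dtype, device="cpu", pin_memory=True)

chunk = alloc_tiered((4096, 4096))
print(chunk.device)  # cuda:0 while VRAM lasts, cpu once the 7.75 GB are gone
```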
What fits on an RTX 4060 (7.75 GB)
| Model | VRAM used | RAM spillover |
|---|---|---|
| GPT-2 | fits entirely | — |
| LLaMA 13B (8-bit) | 7 GB | 3 GB |
| LLaMA 30B (4-bit) | 7 GB | 8 GB |
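
For reference, the LLaMA 13B 8-bit row boils down to a standard transformers + bitsandbytes load; nothing emuV-specific is needed on the Python side. The checkpoint name below is a hypothetical stand-in for whatever weights you use:

```python
# Sketch of the LLaMA 13B 8-bit run from the table above.
# Assumptions: transformers + bitsandbytes installed; "huggyllama/llama-13b"
# is a placeholder model id, swap in your own checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "huggyllama/llama-13b"          # hypothetical checkpoint name
quant = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map={"": 0},                    # keep everything on the "virtual" GPU emuV exposes
    torch_dtype=torch.float16,
)

prompt = "Field notes on virtual VRAM:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```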
Other products I build
- AppLikeWeb - applikeweb.com
- MegaV VPN - megav.app
- PinVPS - pinvps.com