Pavel Bahdanau

emuV v2.0 - Kernel VRAM emulator with auto spillover for GeForce

29.11.2025

Field notes on turning a vanilla RTX 4060 into a proper LLaMA 30B workhorse: kernel driver, automatic GPU->RAM spillover, monitoring toolkit, real benchmarks, and the roadmap.

https://github.com/bogdanovby/emuv

Why virtual VRAM matters in 2025

Every modern ML experiment hits the VRAM wall. The RTX 4060 ships with 8 GB and the RTX 4070 Ti with 12 GB, while LLaMA 13B in fp16 already wants 24 GB for the weights alone (13 billion params × 2 bytes ≈ 24 GiB). An A100 80G still costs like a used Tesla, so we went the opposite way: write a driver that gently glues VRAM and system RAM together.

The pain

GeForce cards are VRAM-starved and, unlike Hopper-class datacenter parts, have no official spillover.

The goal

Expose a "virtual" 25-30 GB GPU to PyTorch without patching the framework.

The solution

emuV, a kernel module that puts VRAM first and automatically pours overflow into RAM.

Signature features in emuV v2.0

  • Auto-detect any NVIDIA GeForce via PCI scan (vendor 0x10DE) - no hardcoded device IDs (see the probe sketch after this list).
  • Priority memory hierarchy: VRAM (Priority 1) -> system RAM (Priority 2) with lazy allocation.
  • sysfs + char device interface: /sys/class/emuv/emuv/vram_info (a minimal reader follows below).
  • Tooling: emuv-top (nvtop-style).
  • Real LLMs: GPT-2 XL, LLaMA 13B/30B (8-bit and 4-bit) validated on 7.5 GB VRAM + 20 GB RAM.
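
To make the vendor-based detection concrete, here is a minimal standalone kernel module that walks the PCI bus for devices with NVIDIA's vendor ID. It is a sketch of the approach, not emuV's actual probe code: the nvscan names are mine, and the display-class filter (to skip the card's HDMI audio function, which shares vendor 0x10DE) is an extra refinement.

    /* nvscan: minimal sketch of vendor-0x10DE GPU discovery.
     * Illustrative only; emuV's real probe does more than this. */
    #include <linux/module.h>
    #include <linux/pci.h>

    static int __init nvscan_init(void)
    {
        struct pci_dev *pdev = NULL;
        int found = 0;

        /* Match on vendor only, never on device ID,
         * so any GeForce generation is picked up. */
        while ((pdev = pci_get_device(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, pdev))) {
            /* Skip non-display functions such as the GPU's HDMI audio. */
            if ((pdev->class >> 16) != PCI_BASE_CLASS_DISPLAY)
                continue;
            pr_info("nvscan: NVIDIA GPU %04x:%04x at %s\n",
                    pdev->vendor, pdev->device, pci_name(pdev));
            found++;
        }
        return found ? 0 : -ENODEV;
    }

    static void __exit nvscan_exit(void)
    {
    }

    module_init(nvscan_init);
    module_exit(nvscan_exit);
    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("PCI scan sketch for NVIDIA GPUs");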
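
The sysfs node, in turn, can be polled like any other text file. A minimal reader, assuming only the path quoted above (the record format of vram_info is whatever the module prints, which I don't reproduce here):

    /* Dump emuV's per-tier stats from sysfs. The node path comes from
     * the post; the file's exact format is the module's business. */
    #include <stdio.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/sys/class/emuv/emuv/vram_info", "r");
        if (!f) {
            perror("open vram_info");
            return 1;
        }
        while (fgets(line, sizeof line, f))
            fputs(line, stdout);   /* pass tier stats through as-is */
        fclose(f);
        return 0;
    }

This is also all emuv-top needs underneath: read the node on a timer and redraw.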

Architecture

Version 2.0 focuses on predictable behavior: VRAM is used while it lasts; once an allocation no longer fits, PyTorch catches cudaErrorMemoryAllocation and places the next chunk in RAM. emuV keeps live stats on both tiers.
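
To make the allocation order concrete, here is a userspace analogue of that flow using the CUDA runtime: try device memory first, and on cudaErrorMemoryAllocation fall back to pinned host RAM, which the GPU can still reach over PCIe. This is a sketch of the behavior as described, not emuV's kernel-side mechanism; alloc_vram_first is my name.

    /* VRAM-first allocation with a RAM fallback, mirroring the flow
     * described above. Userspace illustration only. Build: nvcc spill.cu */
    #include <stdio.h>
    #include <cuda_runtime.h>

    static void *alloc_vram_first(size_t bytes, int *on_gpu)
    {
        void *p = NULL;
        cudaError_t err = cudaMalloc(&p, bytes);       /* tier 1: VRAM */
        if (err == cudaSuccess) {
            *on_gpu = 1;
            return p;
        }
        if (err != cudaErrorMemoryAllocation) {
            fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
            return NULL;
        }
        cudaGetLastError();                            /* clear the OOM status */
        if (cudaMallocHost(&p, bytes) != cudaSuccess)  /* tier 2: pinned RAM */
            return NULL;
        *on_gpu = 0;
        return p;
    }

    int main(void)
    {
        int on_gpu = 0;
        void *buf = alloc_vram_first(1UL << 30, &on_gpu);  /* ask for 1 GiB */
        if (!buf)
            return 1;
        printf("1 GiB landed in %s\n", on_gpu ? "VRAM" : "system RAM");
        return 0;
    }

Pinned host memory keeps the fallback tier GPU-addressable, which is the same reason a kernel-side emulator can present both tiers behind one allocation path.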

What fits on an RTX 4060 (7.75 GB)

Model               VRAM    RAM spillover
GPT-2                       -
LLaMA 13B (8-bit)   7 GB    3 GB
LLaMA 30B (4-bit)   7 GB    8 GB

The 30B row matches the back-of-the-envelope math: roughly 15 GB of 4-bit weights split into 7 GB resident in VRAM and 8 GB spilled to RAM.
