Georgi Gerganov, Author of llama.cpp and the Acoustic Keylogger

A profile of Bulgarian developer Georgi Gerganov, whose open-source llama.cpp library powers most local LLM inference but remains overshadowed by wrappers like Ollama. The article explores his 30+ projects spanning AI inference, acoustic keylogging, sound-based data transmission, and more.

Many users know Ollama as a tool for running local AI models without realizing that it is essentially a wrapper around llama.cpp, the open-source C/C++ library that performs the actual inference. Despite this dependence, Ollama often receives credit that belongs to Georgi Gerganov's underlying engine.

The Ollama Attribution Problem

Meta's recent announcement of multimodal LLaMA support thanked "partners in the AI community" including Ollama but failed to mention llama.cpp. Similarly, the local model support in GitHub Copilot for VS Code credits Ollama rather than the underlying engine doing the actual work. Gerganov remarked on the omission with some irony but did not complain.

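More seriously, Ollama violates the terms of the MIT license, which requires the copyright and permission notice to be preserved in copies of the software, by failing to credit the llama.cpp authors. The broader LLM enthusiast community also criticizes Ollama for: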

Minimal contribution back to the parent project
Misleading marketing claims suggesting that full ChatGPT-scale models run on phones (in practice only tiny models run, and slowly)
Poor default configurations (originally a 2048-token context window, inadequate for most tasks)
Improper model naming that presents small distilled models as full LLaMA models
Forking open protocols like OCI while making them incompatible with standard tools

llama.cpp: The Core Technology

Gerganov began work on the underlying GGML library in September 2022 and first applied it in whisper.cpp, his implementation of OpenAI's Whisper speech recognition model. llama.cpp followed in March 2023. It is a C/C++ library for LLM inference that lets modern large language models run on ordinary computers and Android devices using only the CPU, without requiring a specialized GPU.

The library is built on GGML, a general-purpose tensor library inspired by Fabrice Bellard's LibNC. Key technical achievements include:

Multi-platform support: x86, ARM, CUDA, Metal, Vulkan, and SYCL
Quantization: Models are quantized ahead of time rather than on the fly, and the CPU kernels are optimized for SIMD instruction sets (AVX, AVX2, AVX-512 on x86; Neon on ARM)
GGUF format (introduced August 2023): Universal binary format storing tensors and metadata in a single file, supporting 2- to 8-bit integer quantization, standard floating-point formats, and even exotic 1.56-bit quantization
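
To make the GGUF item above more concrete, here is a minimal Python sketch that reads the fixed-size start of a GGUF file. It assumes the header layout used by GGUF version 2 and later as I understand it from the format specification (a 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian). The function name `read_gguf_header` is hypothetical and is not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of a GGUF file
HEADER_SIZE = 24       # 4 (magic) + 4 (version) + 8 + 8 (two counts)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header of a GGUF file.

    Assumption: little-endian layout of GGUF version 2 or later,
    where both counts are unsigned 64-bit integers.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes to read a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic %r" % data[:4])
    # "<" = little-endian without padding, I = uint32, Q = uint64
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Reading the first 24 bytes of a model file (for example with `open(path, "rb").read(24)`, where `path` is whatever GGUF file is at hand) and passing them to this function is enough to see how many tensors and metadata entries the file declares. The metadata key-value pairs and tensor descriptions that follow the header are not handled by this sketch.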

In March 2024, Justine Tunney contributed optimized matrix multiplication kernels for x86 and ARM CPUs, improving performance for FP16 and 8-bit quantized data types. Tunney also created llamafile, which bundles a model and llama.cpp into a single cross-platform executable.

Quantization and the GGUF Format

Quantization in neural networks encodes parameters as 8-bit (or lower) integers instead of 16- or 32-bit floating-point values, dramatically reducing memory requirements and accelerating computation on resource-constrained devices. GGUF's design prioritizes fast model loading while supporting the range of precision levels needed for edge deployment.
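
The idea can be illustrated with a short Python sketch of blockwise symmetric 8-bit quantization: weights are split into small blocks, and each block stores one floating-point scale plus one small integer per weight. This is similar in spirit to the simpler block formats used with GGML, but it is an illustration under stated assumptions, not llama.cpp's actual code. The block size of 32, the 16-bit scale, and all function names are assumptions chosen for the example.

```python
BLOCK_SIZE = 32  # assumed number of weights sharing one scale


def quantize_block(values):
    """Map one block of floats to (scale, list of ints in [-127, 127])."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def quantize(weights, block_size=BLOCK_SIZE):
    """Quantize a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Reconstruct approximate float weights from quantized blocks."""
    out = []
    for scale, q in blocks:
        out.extend(scale * x for x in q)
    return out


def bits_per_weight(block_size=BLOCK_SIZE, weight_bits=8, scale_bits=16):
    """Average storage cost per weight, including the per-block scale."""
    return weight_bits + scale_bits / block_size
```

With these assumed parameters the cost works out to 8.5 bits per weight, so a hypothetical 7-billion-parameter model would need roughly 7.4 GB instead of about 14 GB at 16 bits per weight. The price is a small rounding error in each weight, bounded by half of that block's scale.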

The Acoustic Keylogger and Other Projects

Gerganov has created over 30 projects spanning a remarkable range of domains, demonstrating extraordinary productivity and creativity.

AI/ML Projects

whisper.cpp: High-performance inference for OpenAI's Whisper speech recognition model
GPT-J: CPU inference implementation
keytap2 & keytap3: Acoustic keyboard eavesdropping based on language n-gram frequencies. This is the "acoustic keylogger" referenced in the title: the tools analyze the sound of typing to determine which keys were pressed

Audio and Communications

ggwave: Sound-based data transmission library
Waver: Ultrasonic file and message exchange (iOS and Android apps available)
r2t2: Data transmission through computer speakers
wave-share: Browser-based file sharing via sound
Spectrogram: Real-time audio spectrum visualization
GGMorse: Real-time Morse code decoding from audio

Text Interfaces

ImTui: Immediate-mode text UI library
slack (tui): Terminal Slack client
hnterm: Hacker News console viewer
wtf-tui: Dashboard utility interface

Games and Interactive Projects

hnguessr: Guess Hacker News headlines game
typing-battles: Multiplayer typing competition
keytap-challenge: Guess typed text challenge
the-story: Collaborative word-voting narrative
wordle-bg: Bulgarian Wordle clone
Diff Challenge: Bash-based puzzle game

Miscellaneous

@tweet2btc: Bitcoin price prediction via Twitter polls
@tweet2doom: Twitter bot playing Doom
morse-meme: Morse code meme generator
dot-to-ascii: Graphviz to ASCII converter
lottery-check: Bulgarian lottery statistics tool
ImGui-WS: Dear ImGui over WebSockets

Georgi Gerganov remains one of the most impactful yet under-recognized developers in the modern AI ecosystem. His work forms the invisible foundation upon which millions of users interact with large language models locally, even if most of them have never heard his name.
