How Open Source Is Made: Personal Experience

The author of sane-airscan and ipp-usb — two packages included in virtually every Linux distribution — shares a candid account of building, debugging, and promoting open source software that became the standard for driverless scanning on Linux.

I am the author of two packages that are included in more or less every Linux distribution: sane-airscan and ipp-usb.

Additionally, sane-airscan is included in all major BSD distributions (FreeBSD, NetBSD, and OpenBSD) and in ChromeOS. They didn't take ipp-usb into ChromeOS because it's written in Go, and they have very strict requirements on executable file sizes. Instead, they wrote their own in Rust, but would have preferred to take my project if they could. Very recently a port of ipp-usb to FreeBSD appeared, and other BSDs will likely follow soon.

Together, these two packages form the stack for "driverless" document scanning on Linux and *BSD, and within a few years, when old scanners finally die out, there probably won't be any other drivers left.

Additionally, ipp-usb makes "driverless" printing on USB devices possible.

Here I want to share what it's like to be the author of popular open source packages. While this work didn't bring me much money (which I wasn't particularly counting on anyway), it brought me invaluable experience.

Overall, I believe promoting open source packages is structurally similar to bringing software products to market. Engaging in this activity, you begin to understand very well the difference between (1) writing a program that works for me, (2) writing a program that can be called a product, and (3) bringing a product to market.

The first takes much less time than the second. The second takes much less time than the third.

How It All Started

It all started quite mundanely: I bought myself a multifunction printer with a scanner that wouldn't scan under Linux.

After looking around a bit, I learned that my scanner's protocol is called eSCL, and that its specification isn't published but a reverse-engineered description exists. The protocol is not complex overall, and there was even a Python script implementing it, though it didn't really work with my device.

The protocol itself consists of device discovery using DNS-SD (also known as mDNS, Rendezvous, Bonjour; this protocol has been blessed with many names) and communication with the device over HTTP with straightforward XML inside.
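
To give a feel for the "straightforward XML" part, here is a minimal Python sketch that pulls two fields out of a scanner capabilities document. The sample document is made up and heavily trimmed, and the element names and namespace URIs are taken from publicly circulated reverse-engineered captures, so treat them as assumptions rather than as the specification. The function name parse_capabilities is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as seen in publicly shared eSCL captures (assumption,
# not taken from the unpublished specification).
NS = {
    "scan": "http://schemas.hp.com/imaging/escl/2011/05/03",
    "pwg": "http://www.pwg.org/schemas/2010/12/sm",
}

# Hypothetical, heavily trimmed capabilities document, for illustration only.
SAMPLE_CAPABILITIES = """<?xml version="1.0" encoding="UTF-8"?>
<scan:ScannerCapabilities
    xmlns:scan="http://schemas.hp.com/imaging/escl/2011/05/03"
    xmlns:pwg="http://www.pwg.org/schemas/2010/12/sm">
  <pwg:Version>2.6</pwg:Version>
  <pwg:MakeAndModel>Example MFP 1000</pwg:MakeAndModel>
</scan:ScannerCapabilities>
"""


def parse_capabilities(xml_text):
    """Extract the protocol version and model name from the document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "version": root.findtext("pwg:Version", namespaces=NS),
        "model": root.findtext("pwg:MakeAndModel", namespaces=NS),
    }


if __name__ == "__main__":
    print(parse_capabilities(SAMPLE_CAPABILITIES))
```

A real driver has to be far more tolerant than this, because devices differ in what they send and how well-formed it is.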

I looked at what a SANE driver is and thought: "You're a programmer, so why not try writing a driver yourself?" Besides, the task didn't look difficult at all.

So I sat down and started writing. I began, as one should, with the build system, logging, general driver infrastructure, the configuration file parser, and so on, rather than hacking together a couple of requests with libcurl and then somehow coaxing the result into something resembling a SANE driver. I'm a professional programmer, after all, not a third-year student :)

I had to quickly abandon libcurl, by the way. A SANE driver, by its nature, is a shared object, a DLL that runs in the context of an application. Any application. For example, one as large and complex as LibreOffice. And libcurl has static state that must be explicitly initialized before use and cleaned up afterward. If two DLLs using libcurl end up in the same application context, they can easily fight with each other.

In the process, it turned out I had acquired competitors — students from France with a driver called sane-escl, who had beaten me by literally two weeks. I even considered abandoning the project, but after looking at their source code, I decided to continue anyway.

The students, meanwhile, were accepted into the SANE project, which I had originally planned to join myself. The SANE project was badly lacking an eSCL driver. And I had to fight my way in.

What Is "Driverless" Scanning and Printing

Here I should explain a bit about what "driverless" scanning and printing means, and what a driver for a driverless scanner is.

If anyone remembers, all printers used to be different. And each required its own driver. Drivers, naturally, existed only for Windows, and users of other operating systems had to carefully choose a device that would work under their OS.

Microsoft was apparently fine with the hassle of dealing with hardware vendors and their drivers, but Apple at some point got tired of it. Apple took the position that printers with IPP support work on the Mac out of the box, and for everything else, manufacturers are free to supply drivers if they want; it's not Apple's problem. Considering how accustomed Mac users are to things "just working," how numerous they were by the time this happened, and their purchasing power, it's not surprising that this technology won. Now more or less all printers support IPP to some degree.

This is called "driverless" printing — in the sense that it's not the driver adapting to the device, but the device adapting to the driver, which implements a single, universal protocol.

Then Apple did the same thing with scanners. In their terminology it's called Bonjour Scanning, which is part of the unpublished Bonjour Printing 1.4 (only available under NDA — they didn't give it to me, but I didn't really ask). Unofficially it's called Apple Airscan (by analogy with AirPrint — the earlier name for Bonjour Printing), and in narrower circles it's known as eSCL. The author of the protocol itself appears to be HP, but that's not certain.

Not all scanners support eSCL yet, but things are clearly heading that way.

Since HTTP implies a network, and some devices don't have networking — only USB — something had to be done about that too. As a result, the organization responsible for USB standardization invented a protocol called IPP over USB, which would more accurately be called HTTP over USB, because that's what it is: an HTTP request is sent directly to a USB endpoint, and an HTTP response comes back. Everything HTTP-based works through it: printing via IPP (which uses HTTP transport), scanning via eSCL, and even the printer's web console amazingly works through it too.

And it's not entirely simple there either, because HTTP is fundamentally designed for TCP. A TCP connection can be closed at any moment, and the server will correctly understand that the client isn't interested in continuing the request. But USB provides no indication that the client has disconnected, so once a client starts sending a request, it must be sent to completion and the response must be read. Otherwise, a piece of an unsent request or a piece of an unreceived response will get stuck in the USB buffers, and synchronization between the host and device will be lost. And it's not easy to restore afterward. Different devices react differently to USB reset, for example...
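
The rule "once started, finish the transaction" can be sketched in a few lines of Python. This is an illustration of the principle only, not the code of any real proxy: the file-like objects stand in for the USB endpoint and the client's TCP socket, the response length is assumed to be known in advance (chunked responses are not handled), and all names are hypothetical.

```python
import io


class DisconnectingClient:
    """Stand-in for a TCP client that goes away after the first chunk."""

    def __init__(self):
        self.received = b""

    def write(self, chunk):
        if self.received:
            raise BrokenPipeError("client closed the connection")
        self.received += chunk
        return len(chunk)


def relay_response(usb_in, client_out, content_length, chunk_size=4096):
    """Copy a response body from the device to the client.

    The body is always read from the device to the end, even if the
    client disappears halfway. Returns the number of bytes read.
    """
    client_alive = True
    remaining = content_length
    while remaining > 0:
        chunk = usb_in.read(min(chunk_size, remaining))
        if not chunk:
            break  # the device ended the transfer early
        remaining -= len(chunk)
        if client_alive:
            try:
                client_out.write(chunk)
            except OSError:
                # The client is gone: stop forwarding, but keep reading
                # so that nothing is left behind in the USB buffers.
                client_alive = False
    return content_length - remaining


if __name__ == "__main__":
    device = io.BytesIO(b"x" * 10000)
    print(relay_response(device, DisconnectingClient(), 10000))
```

The important part is the except branch: losing the client changes where the data goes, but not how much of it is read from the device.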

Continuation of the sane-airscan Story

And so at some point my driver reached a state where I wasn't ashamed to show it to people (I'm one of those people who doesn't like showing or announcing unfinished work).

So I, all naive, filed bug #202 in sane-backends with a proposal to throw out sane-escl and replace it with my creation.

You can imagine what happened next...

The main maintainer of the SANE project asked the folks who wrote sane-escl what they thought of my creation. Well, you can guess what they said. They also had a test base of about a dozen different devices, while I had one, so they naturally found incompatible devices in their test base and wrote only about those. In the end, a market share war broke out, with the following results:

  1. sane-airscan was not included in the SANE distribution. At first they wouldn't take it, and later I no longer wanted to be included.
  2. sane-airscan gained support for another "driverless" protocol — WSD from Microsoft.
  3. sane-airscan was included in all Linux distributions.
  4. sane-escl is part of SANE, but is disabled in most distributions.

Ah, if I had been a more adept politician, I would have proposed putting both side by side rather than replacing one with the other. Maybe I wouldn't have had to write WSD support :)

Ultimately, the fact that my project exists separately from SANE is good for me. I have my own release cycle, as convenient for me, and all the glory goes to me :)

One consequence of the failed attempt to join SANE was that sane-airscan was released under SANE's license (GPL with exceptions, effectively LGPL). I don't like this license — I consider it too greedy — and if I hadn't been looking at SANE, I would have used 2-clause BSD.

Users actually benefited though. As it turned out, roughly 1/3 of devices for which I received user responses (and I don't have my own test base — all debugging is done by correspondence) support only WSD but not eSCL.

How many users do I have in total? I don't know. The list currently has 170 devices. Behind each one is some story. For the most part, people come to say they have some problem. Many problems were resolved without my involvement — I was just told that another device started working. Some users came simply to say thanks. A few even sent PRs with their device added to the list — I never refused to merge them.

I think the number of users who silently use the software without knowing I exist is in the thousands. Possibly tens of thousands. How to find out, I have no idea.

Promotion

Promotion mainly consisted of me periodically visiting English-language Linux user forums and advising people with problems to try my driver.

To make this possible, I had to learn to build packages for the major distributions. Because for a normal user, the idea of building something yourself is roughly equivalent to performing a heroic feat (though there are exceptions).

The openSUSE Build Service exists for this purpose, and, surprisingly, it can build packages for a whole range of distributions, not just SUSE. Figuring it out isn't easy, because its documentation is rather peculiar. But once you get it working, everything runs smoothly, and you don't need to teach users how to build software.

Another important component of success in promotion is supporting users with problems. This requires a lot of patience. Not all users have sufficient technical skills. For some, the request to send a specific file from a specific directory is a non-trivial task. Some show excessive zeal. They'll send you a log from which almost everything is already clear, and you just need a little more verification, but while you're writing back, they've reinstalled the system three times, changed distributions, and stuck their hands into various unexpected places, and you have to start all over.

The first distribution to accept my driver was Arch. Then Ubuntu, Debian, and Fedora followed. Then it snowballed on its own.

Russian distributions (Alt and Astra) came very late, with a significant delay compared to their Western counterparts. I didn't track the others and have no idea what's happening there.

Interestingly, new packages get into Ubuntu through Debian. The two projects actively interact with each other, and Ubuntu strongly prefers to base its packages on Debian's. A package will bypass Debian and go directly into Ubuntu only if Ubuntu really wants the package but Debian doesn't want it for some reason.

Also interesting is the difference in approaches. Debian/Ubuntu did a very thorough manual code review, while Fedora ran it through a static analyzer. The rest of the distributions didn't bother and took it as-is.

It's also worth noting that such scrutiny only happens during the initial inclusion of a package in a distribution. After that, nobody really looked at my patches anymore — except out of curiosity.

Debugging by Correspondence

From the very beginning, I understood that I would have to deal with incompatible devices in the hands of unqualified users. So I immediately took care of a quality logging system. The logs should make everything that's happening clear. Of course, this didn't work out perfectly right away — sometimes situations arose where something was happening but the logs didn't show the details, and I had to add what was missing.

That said, the idea of writing absolutely everything to logs, sometimes the same thing five times (because it is logged at several different levels of the code), is questionable. Such logs are very hard to read; they confuse you with an excess of unnecessary details.

Ideally, the log should reflect all branching points in the program and the information on which the branching decision was based. And preferably without gaps and without repetition.

Years of debugging through logs rather than debuggers shape a particular style of code organization. You try to keep the main code paths as simple and straightforward as possible. Otherwise, it becomes very difficult to figure things out later. I believe this substantially improves and structures the code.

There's great value in logging your program's configuration and system configuration relevant to the issue, hardware details, and so on — this is far simpler and more reliable than trying to gather this information by asking the user questions (some of which will be incomprehensible to them or misinterpreted).

Since I assumed from the start that some problems would involve decoding images received from the scanner, the obtained images also needed to go somewhere. Writing them into the main log is inconvenient — they're large and binary. Creating lots of separate files means you'll exhaust yourself explaining to the user exactly what you need from all of it.

In the end, I decided to write a .tar file alongside the .log file, putting received images and responses I couldn't decode into it. The .tar format is very simple, and generating it directly from the program is quite easy. Essentially, .tar is a sequence of files where each file gets a straightforward header followed by the file body.
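
As an illustration of the idea, here is a minimal Python sketch that writes in-memory blobs into a .tar archive. It uses Python's standard tarfile module instead of generating the headers by hand, so it shows the resulting layout rather than the driver's own code. The function names and file names are hypothetical.

```python
import io
import tarfile
import time


def append_blob(tar, name, data):
    """Add an in-memory blob to an open tar archive as a regular file."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)          # the header must carry the body length
    info.mtime = int(time.time())
    tar.addfile(info, io.BytesIO(data))


def write_trace(fileobj, blobs):
    """Write (name, bytes) pairs to fileobj as an uncompressed tar."""
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for name, data in blobs:
            append_blob(tar, name, data)


if __name__ == "__main__":
    with open("trace.tar", "wb") as out:
        write_trace(out, [
            ("0001-image.jpeg", b"\xff\xd8\xff\xe0"),
            ("0002-reply.xml", b"<error/>"),
        ])
```

The user then only has to send two files, the .log and the .tar, and everything inside the archive keeps its order and its name.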

What Problems I Had to Deal With

Mostly problems in firmware. Printer firmware is surprisingly buggy. An insignificant whitespace character in XML will be ignored by some devices, cause a response with an obscure error from others, and send a third group of devices into a reboot (I'm not joking).

Enterprise-class devices are no less buggy than budget consumer devices. Moreover, they can also go to sleep and take half a minute to wake up, behaving strangely while half-asleep, and this has to be accounted for.

Some HP devices insist, for some reason, that the Host: field in the HTTP request says "localhost." Otherwise, they don't work. What is this — a naive protection against network access? Unclear. How did I figure out that's what they wanted? I don't know — intuition kicked in. But I figured it out somehow.

Due to the nature of SANE, immediately after returning from the function that starts the scanning process, the driver must correctly answer questions about the actual image parameters (not the requested ones — they can differ). sane-escl solves this problem by hanging in the scan-start request until it's actually complete, but I took a more complex path: my driver returns from the scan-start request immediately, speculatively returns the requested parameters as the actual ones, and if the image doesn't match them, adjusts the actual image to conform to the promised parameters.

Thanks to this, cancel works correctly in my driver.

Some devices send an image larger than requested, rounding dimensions up to a size convenient for them — I had to implement cropping. This mechanism later came in handy when it turned out that in WSD mode, some devices completely ignore the requested size.
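
Fitting a received image to the promised geometry can be sketched as follows. This Python fragment works on raw scanlines and is only an illustration: the function name is hypothetical, and padding a too-small image with white is my assumption, since the text above only mentions cropping.

```python
def fit_rows(rows, want_width, want_height, bytes_per_pixel=1, fill=0xFF):
    """Clip or pad raw scanlines so the result has the promised size.

    rows is a list of bytes objects, one per scanline. Extra pixels and
    extra lines are dropped; missing ones are filled with `fill`
    (white for 8-bit samples).
    """
    line_len = want_width * bytes_per_pixel
    blank = bytes([fill]) * line_len
    out = []
    for row in rows[:want_height]:
        row = row[:line_len]                     # crop a line that is too long
        out.append(row + blank[len(row):])       # pad a line that is too short
    while len(out) < want_height:                # pad missing lines
        out.append(blank)
    return out


if __name__ == "__main__":
    oversized = [bytes([1, 2, 3, 4])] * 3        # device sent 4x3
    print(fit_rows(oversized, 2, 2))             # caller was promised 2x2
```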

Some devices, when asked for a black-and-white scan, may respond with an RGB file, which might contain a black-and-white image encoded as RGB or might actually be in color. I had to implement color-to-grayscale conversion. Later I generalized this mechanism a bit and added software brightness and contrast control. (I really should use hardware control on devices that support it: some of them implement brightness by adjusting the backlight rather than recalculating the sensor signal, and that gives better results on poor images.)
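
A color-to-grayscale conversion of one scanline might look like the Python sketch below. The weights are the common ITU-R BT.601 luma coefficients; which formula the driver actually uses is not stated here, so consider the choice an assumption. The function name is hypothetical.

```python
def rgb_to_gray(rgb_line):
    """Convert one scanline of packed 8-bit RGB to 8-bit grayscale."""
    if len(rgb_line) % 3:
        raise ValueError("RGB line length must be a multiple of 3")
    out = bytearray(len(rgb_line) // 3)
    for i in range(len(out)):
        r, g, b = rgb_line[3 * i:3 * i + 3]
        # Integer BT.601 weights scaled by 1000 (299 + 587 + 114 = 1000);
        # adding 500 rounds to the nearest integer.
        out[i] = (299 * r + 587 * g + 114 * b + 500) // 1000
    return bytes(out)


if __name__ == "__main__":
    white_black_red = bytes([255, 255, 255, 0, 0, 0, 255, 0, 0])
    print(list(rgb_to_gray(white_black_red)))
```

Because the conversion is applied line by line, it fits naturally into a pipeline that also crops and adjusts brightness as the image streams in.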

Recently I encountered a device that advertised PNG support but actually sent JPEG. I had to implement format auto-detection by file signature.
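
Detection by signature comes down to comparing the first bytes of the payload with well-known magic numbers. A minimal Python sketch, covering only the two formats mentioned above (the function name is hypothetical):

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # SOI marker plus the first marker byte


def detect_image_format(data):
    """Guess the image format from the leading bytes, ignoring what the
    device claimed in its headers. Returns "png", "jpeg" or None."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


if __name__ == "__main__":
    print(detect_image_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```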

Some devices claim the ability to scan at high resolution but don't actually implement it. Apparently, they don't have enough memory. I have to recognize them and not trust their promises — my choice to return image parameters before actually receiving the image has its advantages but also its costs.

WSD Support

I deliberated for a long time over whether it was worth getting into this. But I wanted to have an indisputable competitive advantage over sane-escl.

Microsoft, as you know, is its own special universe, filled with its own stars and planets, inhabited by its own special humanity. And all their network protocols are their own too.

The idea of driverless printing and scanning didn't pass them by, but they did everything their own way.

It's called WSD. Web Services for Devices. WS-Print and WS-Scan. What "Web" has to do with it isn't entirely clear — nobody is going to expose their own printer to the Internet; it wouldn't survive long. But Microsoft, for some reason, pushed WSD through as W3C standards. And there are two dialects: one described on MSDN, and another in the W3C standards. They're sort of the same, but the URLs that designate XML namespaces are different. My printer understands both variants — I haven't tested others. But naturally, it's better to implement the variant that works in Windows. Obviously, printer manufacturers never tested their products with any other variant.

WSD is an entire world of its own. They don't even use DNS-SD for device discovery in the network — they have their own, based on XML documents broadcast via UDP multicasts.

And unlike DNS-SD, there was no ready-made service like that in Linux, so I had to write one myself, right inside the driver.
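
To show what "XML documents broadcast via UDP multicasts" means in practice, here is a rough Python sketch of a WS-Discovery probe. The multicast address, port, and namespace URIs are the ones I recall from the 2005/04 WS-Discovery specification and should be checked against it. The probe deliberately carries no device type filter, and the function names are hypothetical. A real implementation also has to repeat the probe and cover every network interface, for IPv6 as well as IPv4.

```python
import socket
import uuid

# Well-known WS-Discovery IPv4 multicast group and port (assumption:
# quoted from memory of the 2005/04 specification).
WSD_MULTICAST = ("239.255.255.250", 3702)

PROBE_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
            xmlns:a="http://schemas.xmlsoap.org/ws/2004/08/addressing"
            xmlns:d="http://schemas.xmlsoap.org/ws/2005/04/discovery">
  <s:Header>
    <a:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</a:Action>
    <a:MessageID>urn:uuid:{message_id}</a:MessageID>
    <a:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</a:To>
  </s:Header>
  <s:Body><d:Probe/></s:Body>
</s:Envelope>
"""


def build_probe():
    """Return a probe message with a fresh message ID, as UTF-8 bytes."""
    return PROBE_TEMPLATE.format(message_id=uuid.uuid4()).encode("utf-8")


def send_probe(quiet_period=2.5):
    """Send one probe and collect raw replies until no reply has
    arrived for quiet_period seconds."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(quiet_period)
        sock.sendto(build_probe(), WSD_MULTICAST)
        try:
            while True:
                data, addr = sock.recvfrom(65535)
                replies.append((addr[0], data))
        except socket.timeout:
            pass
    return replies


if __name__ == "__main__":
    for address, payload in send_probe():
        print(address, len(payload), "bytes")
```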

Ultimately, as it turned out, roughly 1/3 of devices that understand WSD don't understand eSCL. So it was worth it — I opened Linux support for many people's devices. Although in the longer term, WSD is probably a dying protocol. For printing, it was never really used because IPP existed; for scanning, Windows 11 reportedly finally added an eSCL client. So hardware manufacturers seemingly have no particular motivation to support WSD anymore. Well, maybe only on some very high-end enterprise market oriented toward the Microsoft ecosystem...

Device Discovery

A few more words about this.

Imagine your device supports IPv4 and IPv6, HTTP and HTTPS, eSCL and WSD. And it's connected via both WiFi and cable. As a result, you'll find it on the network... 16 times, in all possible combinations.

Dumping everything found on the poor user's head — let the students do that. That's not our approach.

As a result, roughly 1/4 of sane-airscan's codebase is dedicated to consolidating all the variants in which the same device can be discovered. The fact that DNS-SD and WSD use completely different namespaces does not simplify this task at all.
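
The arithmetic behind those 16 sightings, and the shape of the consolidation step, can be shown with a toy Python sketch. It assumes that every sighting already carries a common device identifier, which is exactly the part that is hard in reality, so the grouping here is far simpler than what the driver has to do. All names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def sightings_for(device_id):
    """Enumerate every way a single device can show up on the network."""
    return [
        {"device": device_id, "ip": ip, "scheme": scheme,
         "proto": proto, "link": link}
        for ip, scheme, proto, link in product(
            ("IPv4", "IPv6"),      # address family
            ("http", "https"),     # transport
            ("eSCL", "WSD"),       # scanning protocol
            ("wifi", "cable"),     # physical connection
        )
    ]


def consolidate(sightings):
    """Group sightings by device, so the user sees each device once."""
    by_device = defaultdict(list)
    for sighting in sightings:
        by_device[sighting["device"]].append(sighting)
    return dict(by_device)


if __name__ == "__main__":
    found = sightings_for("mfp-1") + sightings_for("mfp-2")
    for device, variants in consolidate(found).items():
        print(device, "seen", len(variants), "times")
```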

Additionally, DNS-SD discovery essentially comes down to reading the list of already-discovered devices from the Avahi daemon's cache. But WS-Discovery has to do everything on its own, at driver startup. And for all of this to work reliably, it's not enough to send one multicast query, wait a bit for responses, and call it a day. You have to scan for several seconds; I took the risk of setting this time to 2.5 seconds. But even that is too long.

So I had to invent a clever algorithm: if all devices found as IPP printers (via DNS-SD) have responded either as WSD devices or as eSCL scanners, there's no need to continue searching. I'm counting on the fact that pure network scanners without a printer component are rare in the wild, and scanners combined with printers respond to the IPP protocol because it's still the primary one for printing. In most cases, this algorithm significantly reduces search time. Theoretically, in some cases it might discover a device unreliably, and to combat this, it can be disabled through driver settings, switching to straightforward full scanning. But nobody has complained yet...
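
The stopping condition itself is easy to express. The Python sketch below assumes devices are matched by network address, which is my simplification, and the function name is hypothetical. Returning False when no IPP printers are known is also my assumption; it simply falls back to the full scan.

```python
def can_stop_early(ipp_printers, escl_scanners, wsd_devices):
    """Decide whether WS-Discovery may stop before its full timeout.

    Each argument is a collection of addresses of devices seen so far:
    IPP printers found via DNS-SD, eSCL scanners found via DNS-SD, and
    devices that answered WS-Discovery probes.
    """
    printers = set(ipp_printers)
    if not printers:
        # Nothing to compare against: keep scanning for the full period.
        return False
    answered = set(escl_scanners) | set(wsd_devices)
    # Stop once every known printer has also shown up as a scanner.
    return printers <= answered


if __name__ == "__main__":
    print(can_stop_early({"192.0.2.10", "192.0.2.11"},
                         {"192.0.2.10"}, {"192.0.2.11"}))
```

The check would run each time a new response arrives, so the search ends at the moment the last known printer answers rather than after the full 2.5 seconds.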

ipp-usb

I originally had no intention of writing it. Moreover, I didn't even know such a protocol existed.

The first person to file a bug report on sane-airscan was Till Kamppeter — the person responsible for the printing system in Linux in general and in Ubuntu in particular.

Till is an absolutely amazing engineer; working with him is a pleasure. During remote debugging, he understands what I need from him instantly and does exactly what's needed — as if I were sitting on the other end myself, only better. It's very, very rare to have the pleasure of working with such a person.

The printer he had was also absolutely remarkable. Some inexpensive HP that implemented all the interesting protocols. I think this wonderful device collected if not all, then at least 90% of firmware bugs. If something worked on Till's printer, well, I was almost certain it would work on other devices too.

At some point, Till asked why sane-escl worked on his printer via USB, but my driver didn't.

In essence, ipp-usb was the answer to that question. Before it, Linux had ippusbxd — a daemon that simply accepted TCP connections and relayed them to USB. I wasn't planning to get into that, but I had to.

At the time, I didn't have an IPP over USB device (well, I didn't have the cable that plugs into the printer — they're not like regular ones; they have a square connector). So debugging had to be done with Till by correspondence.

It looked like sometimes the response to an HTTP request came back truncated. At first, I thought ippusbxd was buggy. After all, it's multithreaded — one thread per TCP connection — and written very simply. It could have gotten confused among its threads. I read through the entire thing carefully. But there were no obvious bugs, no matter how hard I looked.

Then it hit me that perhaps a piece of the previous response was remaining in the USB buffer itself. And that perhaps this was happening because sane-airscan in some cases didn't bother reading the response to the end, just dropping the connection when it already understood everything.

I quickly — literally in a few hours — wrote a prototype HTTP-to-USB proxy in Go and debugged it with Till's help. It always completed the HTTP transaction to the end once started, regardless of the client's interest in continuing.

And everything just worked: sane-airscan, printing, the web console — which had never worked through ippusbxd and nobody knew why.

But it took a month for this working prototype to turn into something resembling a product. During that time I had to:

  • Implement device announcement via DNS-SD. For this I had to learn to extract device parameters via IPP for the printer part and via eSCL for the scanner part. And for that I had to write an IPP client library, because to my surprise, there was no working one for Go. The one that existed panicked on errors, and to fix it I would have had to understand the protocol, and understanding the protocol was easiest by writing my own implementation that doesn't panic. And once I wrote it (which only took two days), I no longer wanted to fix the existing one.
  • Abandon http.Client, because in Go it's mainly an automatic HTTP connection manager, and in this case I absolutely didn't want any automation — I wanted full control over what was happening.
  • Abandon the ready-made Go wrapper for libusb and interface with libusb directly through cgo, because some other buggy devices appeared, and I wanted to be closer to the hardware to deal with them more easily.
  • Implement log rotation.
  • Create a configuration file. Then Till asked me not to use a third-party library because Go downloads them automatically, and the Linux distribution build system works in network isolation, and every extra external dependency is a nightmare. So I had to rewrite the parser by hand (or rather, port my old C one to Go).
  • Implement daemon launching from systemd and without it.
  • Write a man page.
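
As for the hand-written configuration parser from the list above: a parser without third-party dependencies really can be small. The Python sketch below handles a minimal INI-like dialect. The dialect and the section and key names in the example are assumptions made for illustration, not the actual ipp-usb configuration format.

```python
def parse_ini(text):
    """Parse a minimal INI dialect into a dict of dicts.

    Supported: [section] headers, key = value pairs, blank lines,
    and whole-line comments starting with '#' or ';'.
    """
    config = {}
    section = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = config.setdefault(line[1:-1].strip(), {})
            continue
        key, sep, value = line.partition("=")
        if not sep or section is None:
            raise ValueError(
                f"line {lineno}: expected [section] or key = value")
        section[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    # Hypothetical section and key names.
    print(parse_ini("[logging]\n# verbosity\nlevel = debug\n"))
```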

And when all of this was done, the working prototype had turned into a product. That took a month (remember, the prototype was written in a few hours). But getting this already working and polished product into the major distributions took many more months, about a year, and that's with Till helping me significantly on the organizational side.

This lesson — about the difference between writing a program, turning it into a product, and bringing it to market — I will remember forever. It was a very instructive experience (usually an organization does this, and most of this activity passes by programmers and goes unnoticed by them).

On Equal Footing with Google

Usually we programmers interact with Google at job interviews or by getting hired there, or something along those lines.

As the author of open source projects that Google was interested in, I interacted with Google as two equal entities.

They sent me their code for review, and I was picky and asked them to redo things (well, I'm not actually that demanding, but they were solving their problems, and I'd be the one owning all of it afterward). They revised and came back again.

They discussed with me how best to do certain things and listened to what I had to say. Sometimes they agreed, sometimes they objected, but it was a conversation between two equal partners.

They boasted that they'd rewrite ipp-usb in C in a couple of weeks (they couldn't take Go into ChromeOS due to executable size), but they couldn't do it, and came back to complain. They hinted that maybe I could take it on, but they didn't offer money, and I didn't take it on. Then they rewrote it in Rust, and that worked out for them. Because Rust has a proper HTTP library, while C doesn't. And they didn't have the stamina to write one by hand, and I didn't have the motivation.

It doesn't matter that they have billions of dollars and own the entire world's Internet, while I'm just an individual person. We spoke as equals, with equal respect for each other. It's a very unusual experience.

How Not to Break Software Used by Thousands

Open source has a bad reputation for code that might break with each update (although looking at recent news about Windows updates, you start to wonder whether this reputation is really deserved).

The problem is that if I were a corporation, I'd have a test stand with a hundred devices, and every release would be tested on all of them. But I'm not a corporation; I don't have a test stand. And I can't ask my already-satisfied users to retest every update. Their problem is already solved; they don't want to think about it anymore; they want everything to just work like before.

You have to be very careful. You have to think about making sure each next update, as much as possible, doesn't change the program's behavior on already-verified devices. Sometimes behavior has to change — then you need to carefully think through the consequences.

Overall, I've managed this so far. Despite the fact that working with a wide range of hardware is like walking on rakes — you never know where the next one will hit — I've broken surprisingly little.

This is probably due to a longstanding habit of not working on the principle of "we tweaked something, it somehow worked, we don't know why, but it works, so good enough." Instead, I always try to get to the bottom of things and not leave mountains of technical debt behind me. After all, that's exactly where the explosions come from: layers of code that work for reasons you don't fully understand. You touch something in them, and they stop working. If you have little (or no) such code, your software doesn't unexpectedly blow up. And no, this doesn't consume enormous amounts of time. In fact, if you fully control your code, development proceeds easily and quickly. You just shouldn't cut corners or ship code that works by miracle rather than by design.

Open Source Under Sanctions

Surprisingly, nothing changed. The attitude of foreign colleagues and users remained the same. Someone asked me once if I was having access problems and whether I needed help with anything, and that was it. Nobody else touched the topic of politics. It's good that there are still areas in the world where troubling political issues haven't seeped in yet. And it's good that software remains the heritage of humanity, not of any one particular nation. At least the software with open source code.

Was It Worth the Time Spent?

In my opinion — yes.

A person with two successful projects, even fairly niche ones, is looked at differently in job interviews.

The experience of independently running projects — where everything is on you, rather than closing tickets in TFS on a manager's command — is absolutely incomparable and extremely valuable. It teaches self-reliance, planning, reasonable goal-setting, and so on. It teaches you not to scatter your efforts — I don't have unlimited resources, and this work itself doesn't bring me money.

Technically, working on your own project is very comfortable. You're your own architect, your own programmer, your own quality assurance department. There are no inter-department approvals; decisions are made thoughtfully and holistically. If, of course, you enjoy and know how to do all of that.

The experience of bringing something to market is something programmers rarely encounter. Usually, their employer does it for them. Your own experience strips away many illusions. For example, you begin to understand what the company does for you, not just what you do for the company. You begin to understand (not in theory but from experience) that writing working, debugged, tested code is significantly less than half the work of creating a product. The bulk of the time isn't spent on bugfixing, as people sometimes like to say, but on creating those very little things that collectively distinguish a prototype from a product — each of which individually seems not very essential and not very substantial, but together they add up. A huge amount of time is spent on promotion itself. Not even effort, but specifically time — a product nobody knows about is a product nobody needs, and recognition grows quite slowly — even if the software itself is good.