3D Acceleration using VirtIO

, Nathan Gauër

Several months ago started the GSoC 2017. Among all the projects available one got my attention:

Add OpenGL support on a Windows guest using VirGL

In a VM, to access real hardware, we have two methods: passthrough, and virtualization extensions (Intel VT-x, AMD-V..). When it comes to GPUs possibilities drop down to one: passtrough. Intel has a virtualization extension (GVT), but we want to support every devices. Thus, we need to fall-back to a software based method.

Since a couple years, VirtIO devices became a good standard on QEMU. Then, Dave Airlie started to work on VirGL and a VirtIO-gpu. Both help provide a descent virtual-GPU which rely on the host graphic stack.

This article will present VirtIO devices, and what kind of operations a guest can do using VirGL.

I also invite you to read a previous article I wrote about Linux's graphic stack

VirtIO devices

Since we will use a VirtIO based device, let's see how it works. First, these devices behave as regular PCI devices. We have a config space, some dedicated memory, and interruptions. Second very important point, VirtIO devices communicate with ring-buffers used as FIFO queues. This device is entirely emulated in QEMU, and can realize DMA transfers by sharing common pages between the guest and the host.

Communication queues

On our v-gpu, we have 2 queues. One dedicated to the hardware cursor, and another for everything else. To send a command in the queue, it goes like this:

virtio device communication

VirGL

VirGL can be summed up as a simple state-machine, keeping track of resources, and translating command buffers to a sequence of OpenGL calls. It exposes two kinds of commands: let's say 2D and 3D.

2D commands are mainly focused on resources management. We can allocate memory on the host by creating a 2D resource. Then initialize a DMA transfer by linking this resource's memory areas to guest's physical pages. To ease resource management between applications on the guest, VirGL also adds a simple context feature. Resource creation is global, but to use them, you must attach them to the context.

Then, 3D commands. These are close to what we can find in a API like Vulkan. We can setup a viewport, scissor state, create a VBO, and draw it. Shaders are also supported, but we first need to translate them to TGSI; an assembly-like representation. Once on the host, they will be re-translated to GLSL and sent to OpenGL.

You can find a part of the spec on this repository

OpenGL on Windows

Windows graphic stack can be decomposed as follows:

windows graphic stack

Interresting parts are:

OpenGL ICD (Installable client driver):

This is our OpenGL implementation -> the state machine, which can speak to our kernel driver.

GDI.dll:

A simple syscall wrapper for us.

D3D Subsystem:

First part of the kernel graphic stack. It exposes a 3D and 2D API. Since we are not a licensed developer, let's try to avoid this. From the documentation, we have a some functions to bypass it: DxgkDdiEscape is one. This functions takes a buffer, a size, and lets it pass trough this subsystem, directly to the underlying driver.

DOD (Display Only Driver)

Our kernel driver. This part will have to communicate to both kernel/ICD and VirtIO-gpu.

OpenGL State-Tracker

OpenGL rely on a state machine we have to implement. Let's start by drawing on the frame-buffer.

We start a new application, want to split it from the rest. So we start by creating a VirGL context. Then create a 2D resource (800x600 RGBA seams great), and attach it to our VGL-context.

We might want to draw something now. We have two options, either use the 3D command INLINE_WRITE, or DMA. Using INLINE_WRITE means sending all our pixels through a VirtIO queue. So let's use DMA !

Now, let's draw some pixels on this frame-buffer. We will need :

A 3D Command is a set of UINT32. The first one is used as a header, followed by N arguments. A command buffer can contains several commands stacked together in one big UINT32 array.

Earlier, we created resources in VGL-Contexts. Now we will need 3D objects. These are created sending 3D commands, and are not shared between VGL contexts. Once created, we have to bind them to the current opengl-context.

Now, if everything goes well, we should be able to display something like that:

opengl on windows with qemu

Once more, explaining all the commands would be uninteresting, but there is a spec for that !

If you are still interested, here are couple of links:

Comments

GSoC | Log#2: API Forwarding

, Nathan Gauër

Writing an ICD is a problem in itself. Add to this Windows kernel interfaces, virtIO queues management, resources transfer between host and guest, and BOOM, you are lost. This brings us to our first step: something not efficient but simpler, API Forwarding.

Tasks

Realization

ICD

The ICD part (Userland) is pretty straightforward. Make your own opengl32.dll, serialize the calls. Now find a sweet function in gdi32.dll to throw your mumbo-jumbo on the kernel side. Fortunately, we have this:

NTSTATUS APIENTRY DxgkDdiEscape(
  _In_ const HANDLE         hAdapter,
  _In_ const DXGKARG_ESCAPE *pEscape
)
{ ... }

A beautiful function available on both DOD and full display driver. It takes a pointer on an userland buffer, and send it to our display driver. Wait... userland buffer, no check, kernel part ? Mmmmm.... What could go wrong ?

Kernel part

To initialize a display driver, you must call a function: DxgkInitialize This function will take a big structure, containing function pointers to your driver. For a display only driver, you will have a reduced set of function to implement. And for a full featured driver, well...

Anyway, now the game is to run the driver, and see where we crash. Sadly we cannot just hope to add some functions, and run only using the working DOD code base. Windows wants something more, and the game is to find what, Yay ! Since we have a working DOD driver, let's find how we could trick.

ICD <=> Kernel communication

We can register two type of driver: a DOD driver using DxgkInitializeDisplayOnlyDriver and DxgkInitialize. Windows will then know which kind of features each driver can support (fine tune will be done using query callbacks). Both drivers can implement DxgkDdiEscape. Great, we will fool Windows and use this DOD as a fully featured 3D driver ! WRONG !

Setup of the ICD part, sending everything through our escape functions ? check. But return values seams off. After investigation, and any function taking a userland buffer, I came to a conclusion: OpenGL ICD part cannot communicate with a DOD driver. Windows knows we are display only, and fall-back our ICD calls on it's own driver.

So now, what's the plan ? Let's put this problem aside, and try to focus on the real part: create proper commands for the host.

Comments

GSoC | Log#1: Project presentation

, Nathan Gauër

On my arrival at the lab, I started a little project: working on a display only driver for Windows. A good way to start learning what was hidden under the hood of an OpenGL application. Google Summer of Code 2017 arrived, and subject were published. Among these, QEMU's 'Windows Virgl driver'. Great ! Let's apply !

Applications closed early April. I took a look at the already existing DOD driver (non official repo) and also decided to learn a bit more about Vulkan. Results came, and I was selected, excellent !

Mission

The idea is to bring 3d acceleration on Windows guests running with QEMU. Using VirtIO devices and Virgl3d.

Context

windows_stack

On this stack, we can work on three parts: opengl32.dll, ICD and Miniport driver.

Problems

Comments

First steps with Vulkan

, Nathan Gauër
Vulkan is great, vulkan is love, vulkan is $(COMPLIMENT)

I heard a lot about this API, but never took some time to try it... So why not now ? The goal was to do a simple OBJ file viewer. Then, try to improve performances, and of course, since it's Vulkan, go multithread ! The API is pretty simple to use. We fill out some structs, and call a vkSomething function.

Some samples are available in the SDK, and of course, there is the famous Vulkan-tutorial website. So commits after commits, the little viewer is built, until the first 'Hello, world !' And now, realization: to display a white triangle, I needed more than 1000 lines of code. Anyway, it's OK, let's continue, and finally, a simple obj viewer ! (project repository link)

Program running under i3

But we are the May the 5th, and GSoC results are out. Time to switch lanes and go back to the big project: OpenGL driver for a Windows on QEMU VMs.

Comments

Linux graphic stack: an overview

, Nathan Gauër

In January 2017, results arrived. I was accepted at the LSE, a system laboratory in my school. We were 4, and had to find a new project to work on. One wanted to work on the linux kernel security, another on Valgrind, and then, there is me. I didn't knew how to start, but I wanted to work on something related to GPUs.

My teacher arrived, and explained the current problem with Windows and QEMU: we don't have any hardware acceleration. Might be useful to do something about it ! I was not ready...

The first step was to understand Linux graphic stack, and then find out how Windows could have done it. Finally, how we can bring this together using Virgl3D and VirtIO queues.

This article will try present you a rapid overview of the graphic stack on Linux. There already is some pretty good articles about the userland part, so I won't focus on that, and put some links.


OpenGL 101

Let's begin with a simple OpenGL application:

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitWindowSize(300, 300);
    glutCreateWindow("Hello world :D");

    glClear(GL_COLOR_BUFFER_BIT);
    glBegin(GL_TRIANGLE);
        glVertex3f(0.0, 0.0, 0.0);
        glVertex3f(0.5, 0.0, 0.0);
        glVertex3f(0.0, 0.5, 0.0);
    glEnd();
    glFlush();
    return 0;
}

This is a non working dummy sample, for the idea

As we can see, there is three main steps:

But how can we do that ?

Linux graphic stack

linux_graphic_stack

Level 1: Userland, X and libGL

The first part of our code looked like this:

glutInit(&argc, argv);
glutInitDisplayMode(GLUT_SINGLE);
glutInitWindowSize(300, 300);
glutInitWindowPosition(100, 100);

But in fact, actions can be resumed to something like this:

CTX = glCreateContext()
CONNECTION = xcb_connect()
xcb_create_window(CONNECTION, PARAMS, SURFACE, WINDOW)
What ? a connection, a context ?

To manage our display, Linux can use several programs. A well known is the X server. Since it's a real server, we have to connect to it first before being able to request anything. To ease our work, we will use the lib XCB. Once a window is created, any desktop manager compatible with X will be able to display it. For more informations about an OpenGL context -> Khronos wiki

Meet Mesa

Mesa is an implementation of OpenGL on Linux. Our entry point is libGL, just a dynamic library letting us interface with the openGL runtime. The idea is the following:

For more informations about Mesa and Gallium -> Wikipedia Another good article on the userland part -> Igalia blog

Welcome to KernelLand !

linux_graphic_stack

Meet the DRM

DRM: Direct Rendering Manager. This is more or less an IOCTL API composed of several modules. Each driver can add some specific entry points, but there is a common API designed to provide a minimal support. Two modules will be described: KMS and the infamous couple TTM & GEM.

Meet KMS

Remember the first step of our OpenGL application ? Ask for a window, getting a place to put some fancy pixels ? That's the job of the KMS: Kernel Mode-setting.

A long time ago, we used UMS: user mode setting. The idea was to manage our hardware directly from userland. Problems: every application needed to support all the devices. It means a lot of code was written, again and again. And what if two applications wanted to access to the same resources ? So, KMS. But why ?

linux_graphic_stack
Framebuffer: a buffer in memory, designed to store pixels

The story begins with a plane. Picture it like a group of resources used to create a image. A plane can contains several framebuffers. A big one, to store the full picture, and maybe a small one, something like 64x64 for an hardware cursor ? These framebuffers can be mixed together on the hardware to generate a final framebuffer.

Now, we have a buffer storing our picture. We assigned it to a CRTC (Cathode Ray Tube Controller). A CRTC is directly linked to an ouput. It means if your card has two CRTCs, you can have two different output. Final step, printing something on the screen. A screen is connected using a standard port, HDMI, DVI, VGA... this means encoding our stream to a well defined protocol. That's it, we have some pixels on our screen !

TTM & GEM

We can print some pixels, great ! But how can we do some fancy 3D stuff ? We have our GL calls going through some mumbo-jumbo stuff, and then what ? How can I actually write something on my GPU's memory ?

There globally two kind of memory architecture: UMA and dedicated

TTM and GEM are two different APIs designed to manage this. TTM is the old one, designed to covering every possible cases. The result is a big and complex interface no sane developer would use. Around 2008, GEM was introduced. A new and lighter API, designed to manage UMA architectures. Nowadays, GEM is often used as a frontend, but when dedicated memory management is needed, TTM is used as backend.

GEM for dummies

linux_graphic_stack

The main idea is to link a resource to a GEM handle. Now you only need to tell when a GEM is needed, and memory will be moved on and out our vram. But there is a small problem. To share resources, GEM uses global identifiers. A GEM is linked to a unique, global identifier. This means any program could ask for a specific GEM and get access to the resource... any.

Gladly, we have DMA-BUF. The idea is to link a buffer to a file descriptor. We add some functions to convert a local GEM identifier to a fd, and can safely share our resources.

I'll stop here for now, but I invite you to check some articles on DMA (Direct memory access) and read this article about TTM & GEM

Comments