Device File Palacios Extension

Akhil Guliani and William Gross
(Advised by Peter Dinda)
Northwestern University

What Is This?

This is a proof-of-concept implementation of device file virtualization
in the spirit of the Paradice system. Unlike Paradice, it is implemented
using only a preload library in the guest (no guest modifications) and a
kernel module in the host (no kernel modifications). There are also no
modifications to the Palacios VMM, as the core mechanisms used are the
host hypercall interface and the user-level guest memory access
capabilities implemented within Palacios.

Note that it is a proof of concept. It does not do many of the things
that Paradice can do. For example, a hypercall currently involves a hard
stop of the guest core. Address space integration in the shadow process
is at page granularity only. Pointer detection for system call forwarding
is done at a coarse granularity, meaning that ioctls would need to be
hand coded. Select across guest and host fds is completely ignored.

Theory of Operation

The basic idea is that we introduce a preload library into a guest
process. This library hijacks system calls. When you open a /dev device
that is in the list of devices we are proxying, the library handles the
open by converting it to a hypercall and binding the result of the
"open" hypercall to the fd returned. Subsequent system calls involving
that fd are also converted to hypercalls. The preload library merges
fds/syscalls handled by the guest and those handled by the host.

The preload library also ensures that any data accessed is touched (so
that a page table entry exists). It could also pin the data, but
currently does not. It also limits any pointer argument to point to a
block that fits within one page (e.g., read(1K offset, 4K length) turns
into read(1K offset, 3K length)). The hypercall is directed to the second
component, a kernel module.
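The one-page limit described above can be sketched as a small length
clamp. This is only an illustration, not code taken from
devfile_preload.c: the helper name clamp_len_to_page and the 4 KiB page
size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86_64 Linux guests. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper (not from devfile_preload.c): return the largest
 * length <= len such that [addr, addr + len) stays inside the page that
 * contains addr.
 */
static size_t clamp_len_to_page(uintptr_t addr, size_t len)
{
    size_t offset_in_page = (size_t)(addr % SKETCH_PAGE_SIZE);
    size_t room_in_page = SKETCH_PAGE_SIZE - offset_in_page;

    /* Sanity check: there is always between 1 byte and a full page left. */
    assert(room_in_page >= 1 && room_in_page <= SKETCH_PAGE_SIZE);

    return (len < room_in_page) ? len : room_in_page;
}
```

With a buffer that starts 1K into a page and a 4K request, the clamp
yields 3K, matching the read() example in the text. A caller would then
see a short read or write and is expected to retry for the remainder, as
with any partial transfer.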
The kernel module swizzles pointers involved in the system call from
their GVAs (guest virtual addresses) to their GPAs (guest physical
addresses). It then queues them for interaction with a host user space
process called the shadow process. The kernel module and the shadow share
a page used to transfer the system call arguments, the return value, and
errno. The kernel module signals the shadow that a new swizzled system
call is available by letting a poll/select complete. The shadow signals
the kernel module that it is finished with the system call via an ioctl.

The shadow process maps the guest's physical memory into its address
space using the guest memory access mechanisms. It then goes into a
select, waiting for the kernel module. When it receives a system call
from the kernel module, it swizzles any pointer arguments in the system
call from their GPAs (provided by the kernel module) to their
corresponding HVAs (host virtual addresses, where the guest is mapped).
It then issues the system call and writes both the return code and the
current errno value back to the shared page. It then signals completion
via the ioctl.

The kernel module then returns from the hypercall to Palacios, which
returns to the guest preload library, which copies out the relevant
results so that it appears (to the guest) that a system call has
completed.

Note that this model could potentially also be used as a general system
call forwarding mechanism.

What's Here and There

In this directory (palacios/gears/services/devfile) you will find the
preload library and the kernel module. There is also a simple test
program for the guest.

In the palacios user space directory (palacios/linux_usr), you will find
the shadow process code. The latter is separate as it has dependencies on
the guest memory access library and build config.

In palacios/gears/services/devfile/scripts, you will find scripts which
may help to evaluate the system.
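As a rough illustration of the GPA-to-HVA swizzling step from the theory
of operation, the sketch below translates a guest physical address into a
pointer inside the shadow's mapping. The struct mapped_region type, its
fields, and the gpa_to_hva function are hypothetical names invented for
this example. The actual guest memory access library may expose the
mapping differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical description of one contiguous guest-physical region that
 * the shadow has mapped into its own address space.
 */
struct mapped_region {
    uint64_t gpa_start;  /* first guest physical address in the region */
    uint64_t length;     /* size of the region in bytes */
    uint8_t *hva_base;   /* where the region is mapped in the shadow */
};

/* Translate a GPA to an HVA; returns NULL if no region covers the GPA. */
static void *gpa_to_hva(const struct mapped_region *regions, size_t count,
                        uint64_t gpa)
{
    assert(regions != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const struct mapped_region *r = &regions[i];

        /* Subtract before comparing so the check cannot overflow. */
        if (gpa >= r->gpa_start && gpa - r->gpa_start < r->length) {
            return r->hva_base + (gpa - r->gpa_start);
        }
    }
    return NULL;
}
```

Returning NULL for an unmapped GPA lets the caller fail the forwarded
system call instead of dereferencing an address outside the guest
mapping.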
General Setup

    [root@v-test-r415-3 linux_usr]# ./v3_devfile_shadow
    v3_devfile_shadow

    Shadow process to support device file-level virtualization
    in the spirit of Paradice.

    This operates with the devfile_host.ko kernel module
    in the host and devfile_preload.so preload library in
    the guest. These can be found, along with example scripts
    in palacios/gears/services/devfile.

    The general steps are:
     1. insmod kernel module into host
     2. copy preload library into guest
     3. instantiate guest
     4. use v3_hypercall to bind the devfile hypercall (99993)
        to the kernel module for your guest
     5. run v3_devfile_shadow
     6. run process in guest that uses preload library

****Scripts and More Detail Below****

Setup:

Copy the patched_start_guest, patched_mem_script,
patched_insert_hypercall, and patched_view_console scripts to the top of
the palacios directory, then cd to the top of that directory so that you
are at path/to/palacios/ and the scripts are at path/to/palacios/patched_*

Guest Requirements:

For this system to work, the guest needs a few things. It needs to be
configured with a CGA console (to get v3_console to work). It needs to
have sufficient memory (1024kb seems to work). It needs to accept an
LD_PRELOAD library. It needs to be an x86_64 architecture OS. It needs to
be running a Linux kernel. It needs a second drive set up as a CD-ROM.

Once that second drive is set up, replace the backing file with a
handmade loopback file system (perhaps created using dd). For our scripts
to work exactly, this handmade file system should be called littlefs.dat,
and an empty directory called tmp (for mounting) should be present in the
guest folder. The scripts are short and few, so the relevant paths in
them can be modified as necessary to fit your setup.
Start Guest with Device File Forwarding:

    In that terminal, source patched_mem_script.
    In another terminal, go to the same position and source
    insert_hypercall.
    If the terminal doesn't open into the guest, resize your terminal
    appropriately and then source patched_view_console.
    Then, from the guest, do the following:

        mkdir mnt
        mount /dev/hdb mnt
        cd /mnt/dev_file
        source load_lib_in_guest
        ./test_preload

    EXAMPLE: ./test_preload r 10 /dev/urandom

This will attempt to read 10 bytes from /dev/urandom, where /dev/urandom
is a host device. If that last argument (/dev/urandom) is present in the
devfile_preload.c library, then the system will access the host version
of that device. If not, it will just perform the regular system calls on
the guest's version of that device, if it is present.

Close Running Guest with Device File Forwarding:

    To shut down smoothly, kill the shadow process run from the first
    terminal (patched_mem_script).
    Make sure the second terminal has left the guest by pressing \
    Then, in either terminal, source patched_close_guest.

Adding Host Devices in /dev to the list of Supported Devices:

    Go to path/to/palacios/gears/services/devfile.
    Edit guest_devices.txt.
    Then run from the command line 'python guest_device_setup.py'.
    This will update the dev_file_ld_lib.c file, which can be made from
    that directory.
    The object file is copied over to the guest's /dev/hdb drive within
    the patched_mem_script.

NOTE: Adding devices requires the preload library (devfile_preload.c) to
be rebuilt and copied to the guest, so devices cannot be added while a
guest is running with this feature. Close the guest first, add the new
device changes, run the python script, run make in the
/gears/services/dev_file directory, and then start the guest.

Extending the System Call Interface:

devfile_preload.c is the LD_PRELOAD library for the guest that hijacks
some basic system calls to see if they should be forwarded to the host.
To support a broader range of devices, this basic set can be extended.
Any addition should follow a structure similar to that of the read or
write system call wrappers in that file. Make sure that all pointer
arguments get pinned into memory and that the file descriptor argument is
checked against the set of active fds in the dtrack struct (also in the
library file).
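The fd check that every new wrapper needs can be sketched as follows.
This is a minimal illustration under stated assumptions: struct
fd_tracker is a hypothetical stand-in for the dtrack struct, whose real
layout is not shown in this README, and the function names and the fixed
table size are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TRACKED 64  /* assumption: small fixed-size table */

/* Hypothetical stand-in for the library's dtrack struct. */
struct fd_tracker {
    int    fds[SKETCH_MAX_TRACKED];  /* fds bound to host devices */
    size_t count;                    /* number of valid entries in fds */
};

/* Record an fd returned by a forwarded open; false if the table is full. */
static bool tracker_add(struct fd_tracker *t, int fd)
{
    assert(t != NULL);
    if (t->count >= SKETCH_MAX_TRACKED) {
        return false;
    }
    t->fds[t->count++] = fd;
    return true;
}

/* True if fd was opened through the host and must be forwarded. */
static bool tracker_contains(const struct fd_tracker *t, int fd)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->fds[i] == fd) {
            return true;
        }
    }
    return false;
}

/* Forget an fd on close; false if it was not tracked. */
static bool tracker_remove(struct fd_tracker *t, int fd)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->fds[i] == fd) {
            t->fds[i] = t->fds[t->count - 1];  /* fill hole with last entry */
            t->count--;
            return true;
        }
    }
    return false;
}

enum route { ROUTE_GUEST, ROUTE_HOST };

/* Decision each wrapper makes before touching its arguments. */
static enum route route_for_fd(const struct fd_tracker *t, int fd)
{
    return tracker_contains(t, fd) ? ROUTE_HOST : ROUTE_GUEST;
}
```

A new wrapper would call something like route_for_fd first, fall through
to the normal guest system call on ROUTE_GUEST, and only prepare its
pointer arguments and issue the hypercall on ROUTE_HOST.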