
Virtualization for Reliable Embedded Systems

Master's Thesis 2013 260 pages




List of Figures

List of Tables

1 Preamble
1.1 Problem domain
1.1.1 What is virtualization?
1.1.2 What is an embedded system?
1.1.3 What is reliability?
1.2 Motivation
1.2.1 Operating system multiplicity
1.2.2 License separation
1.2.3 Hardware longevity
1.2.4 Hardware obliteration
1.2.5 Multi-core leverage
1.2.6 Energy efficiency and partial networking
1.2.7 Memory de-duplication
1.2.8 Self-healing systems
1.2.9 Summary

2 Virtualization
2.1 Basic principles
2.1.1 Machine model
2.1.2 Traps
2.1.3 Instruction behavior
2.1.4 Virtualizability
2.1.5 Virtual machine map
2.1.6 Hypervisor
2.1.7 Types of hypervisors
2.1.8 Hybrid virtual machines
2.1.9 Implementation details
2.1.10 Process and resource maps
2.2 I/O virtualization
2.2.1 Device pass-through
2.2.2 Device driver in the hypervisor
2.2.3 Special I/O partition
2.2.4 Device paravirtualization
2.2.5 Summary
2.3 History
2.4 Security aspects
2.4.1 Hypervisor
2.4.2 I/O MMUs
2.4.3 Covert and side channel attacks
2.5 Hardware virtualization support
2.5.1 ARM
2.5.2 MIPS
2.5.3 PowerPC
2.5.4 x86
2.6 Embedded virtualization
2.6.1 Requirements
2.6.2 Microvisor
2.6.3 Summary

3 Hypervisors
3.1 Server
3.1.1 VMware ESX
3.1.2 Xen
3.1.3 Bhyve
3.1.4 KVM
3.2 Embedded systems
3.2.1 Hellfire and SPUMONE
3.2.2 INTEGRITY® and PikeOS
3.2.3 Xen

4 Reliability through virtualization
4.1 Related work
4.1.1 Lockstep
4.1.2 Mixed-mode multi-core
4.1.4 Virtualization-assisted application checkpointing
4.2 Remus
4.2.1 Distributed Replicated Block Device
4.2.2 Summary

5 Demonstrator
5.1 Overview
5.2 Components and configuration
5.2.1 Hardware: D3003-S1 industrial Mini-ITX mainboard; Back-to-Back Ethernet media converter; Vulcan-based Ethernet switch
5.2.2 Software: ECU-west and ECU-east; remus; migration; node
5.3 Evaluation
5.3.1 Network performance
5.3.2 Disk performance
5.3.3 Page access performance
5.3.4 Migration: Basic xm-usage; Behavior of migration variants
5.3.5 Remus replication
5.4 Demonstrable use cases
5.4.1 Streaming media from a standby DMR
5.4.2 Migrating a partition to another host
5.4.3 Redundant network backbone
5.5 Caveats
5.5.1 Software optimizations
5.5.2 Fault detection and fencing
5.5.3 Network interface bonding
5.5.4 Network traffic priority

6 Conclusion


A Design Drawings
A.1 “Dali” Back-to-Back stands
A.2 Control panel

B Network traffic traces
B.1 Standby host taking over media streaming with a timeout of 1 s
B.2 Standby host taking over media streaming with a timeout of 100 ms

C Demonstrator configuration files, patches and scripts
C.1 Parts common to ECU-east, ECU-west and node
C.1.1 Console configuration file config
C.1.2 File systems configuration file fstab
C.1.3 Bonding configuration script patch ifenslave.diff
C.1.4 Package repositories configuration file sources.list
C.2 Parts common to ECU-east and ECU-west
C.2.1 Global kernel modules blacklist configuration file blacklist.conf
C.2.2 Framebuffer kernel modules blacklist configuration file fbdev-blacklist.conf
C.2.3 DRBD® global configuration file global_common.conf
C.2.4 GRUB configuration file grub
C.2.5 System initialization configuration file inittab
C.2.6 Xen distribution patch install-remus1.patch
C.2.7 Xen configuration file migration.cfg
C.2.8 DRBD® resource configuration file migration.res
C.2.9 Global kernel modules configuration file modules
C.2.10 Remus startup script remus
C.2.11 Xen configuration file remus.cfg
C.2.12 DRBD® resource configuration file remus.res
C.2.13 Plug network scheduler patch sch_plug.c.diff
C.2.14 System controls configuration file sysctl.conf
C.2.15 Xen domains configuration file xendomains
C.3 Parts specific to ECU-east
C.3.1 Host database configuration file hosts
C.3.2 Network interfaces configuration file interfaces
C.3.3 Remus domains configuration file remus
C.3.4 Xen daemon configuration file xend-config.sxp
C.4 Parts specific to ECU-west
C.4.1 Host database configuration file hosts
C.4.2 Network interfaces configuration file interfaces
C.4.3 Remus domains configuration file remus
C.4.4 Xen daemon configuration file xend-config.sxp
C.5 Parts specific to node
C.5.1 ALSA kernel modules configuration file alsa-base.conf
C.5.2 Shell configuration file .bash_profile
C.5.3 Global kernel modules blacklist configuration file blacklist.conf
C.5.4 GRUB configuration file grub
C.5.5 Host database configuration file hosts
C.5.6 System initialization configuration file inittab
C.5.7 Network interfaces configuration file interfaces
C.5.8 System control configuration file sysctl.conf
C.5.9 Xorg configuration file xorg.conf
C.6 Parts specific to migration
C.6.1 Kernel configuration file XEN
C.6.2 File systems configuration file fstab
C.6.3 Host database configuration file hosts
C.6.4 PyGRUB configuration file menu.lst
C.6.5 System configuration file rc.conf
C.6.6 SSH daemon configuration file sshd_config
C.6.7 Terminal initialization configuration file ttys
C.7 Parts specific to remus
C.7.1 Kernel configuration file .config
C.7.2 Nginx configuration file default
C.7.3 Kernel patch evtchn.c.diff
C.7.4 File systems configuration file fstab
C.7.5 Host database configuration file hosts
C.7.6 PyGRUB configuration file menu.lst
C.7.7 Kernel patch setup-xen.c.diff
C.7.8 Package repositories configuration file sources.list
C.7.9 Kernel patch xenbus_probe.c.diff


List of abbreviations


Hereby I would like to thank the A S&T CDS TCD advance development department of Continental Automotive GmbH in Regensburg for providing the hardware comprising the virtualization demonstrator built as part of this master’s thesis and its members for their support. Further thanks go to Hans-Jörg Sirtl and the University of Regensburg Computer Centre for providing the HP E2915-8G-PoE switch used as part of evaluating said demonstrator.

I would also like to express my gratitude to Christiane Abspacher and Anja Ruckdäschel for the thankless task of proofreading this thesis.

The research leading to the results presented in this thesis is supported by “Regionale Wettbewerbsfähigkeit und Beschäftigung”, Bayern, 2007 - 2013 (EFRE) as part of the SECBIT project.


Virtualization has come a long way since its beginnings in the 1960s. Nowadays, Virtual Machine Monitor (VMM)- or hypervisor-based virtualization of servers is the de facto standard in data centers. In recent years, virtualization has also been adopted for embedded devices such as avionics systems and mobile phones, although its first mass deployment in the embedded domain was probably in video game consoles. However, the functionality and possibilities provided by embedded virtualization today are, for the most part, still where they were when virtualization was in its infancy in the mainframe era. Moreover, it is not employed in automotive electronics at all thus far. This thesis presents the advancements achieved in hardware virtualization since then as well as their possible merits for embedded virtualization. The emphasis lies on increasing the reliability of the resulting embedded systems. Additionally, the focus is on automotive Electronic Control Units (ECUs) and especially the upcoming automotive domain controller architecture.

This work is divided into five parts: The first one constitutes an introduction to the problem domain and highlights benefits of virtualizing embedded devices - in particular Domain Controller Units (DCUs), i. e. “server” variants of ECUs - beyond the mere partitioning into time and space. In the second part, an overview of virtualization technology centered around its basic principles is given. Against this background, the complications of virtualizing Input/Output (I/O) operations with current hardware architectures and the adaptation of their processors to virtualization are elaborated on. Moreover, the most prominent implications for security as well as the requirements and peculiarities encountered in the context of embedded virtualization are discussed. The third chapter then summarizes embedded virtualization solutions found on today’s market. In addition, it details hypervisor techniques representing the state of the art for servers. In the fourth part, concepts for enhancing reliability - in some aspects also availability - through virtualization and eventually the approach proposed for reliable embedded systems are described. Finally, the fifth chapter presents the demonstrator - foremost a technology preview of the latter - built as part of working on this thesis.

List of Figures

1.1 Standby DMR reliability comparison

1.2 Current automotive E/E architecture

1.3 Fully meshed redundant automotive backbone

1.4 Automotive domain controller architecture

1.5 Long-term automotive E/E architecture

2.1 The virtual machine map

2.2 Type I hypervisor system architecture

2.3 Type II hypervisor system architecture

2.4 CPU protection ring usage on i

2.5 CPU protection ring usage on a two-ring processor

2.6 Context switches of an I/O operation virtualized via a special I/O partition

2.7 Type I hypervisor evolution

2.8 Translations performed by an IOMMU versus an MMU

2.9 Microkernel-based type I hypervisor system architecture

3.1 Xen architecture

3.2 Xen device I/O for paravirtualized guest and basic network interface configuration

3.3 Xen device I/O for paravirtualized guest and bonded network interface configuration

3.4 Bhyve implementation

4.1 Mixed-mode multi-core over-commitment approach

4.2 SPUMONE distributed hypervisor architecture

4.3 Statue of the Capitoline Wolf suckling Romulus and Remus

4.4 Remus high-level architecture

4.5 Remus speculative execution and asynchronous replication timeline

4.6 Remus network buffering timeline

4.7 Disk write buffering in Remus

4.8 DRBD® overview

5.1 RAINER topology


5.3 RAINER demonstrator - top view

5.4 RAINER demonstrator - oblique view

5.5 RAINER demonstrator - isometric view

5.6 Fujitsu Technology Solutions GmbH D3003-S1 mainboard

5.7 Back-to-Back POE-based BR-100 to IEEE 802.3-2012 100BASE-TX media converter

5.8 POE-based 6-port BR-100 and IEEE 802.3-2012 10/100/1000BASE-T(X) Vulcan-based Ethernet switch

5.9 Broadcom Corporation BCM53115M block diagram

5.10 Freescale Semiconductor, Inc. Qorivva MPC5668G block diagram

5.11 Flow chart of remus domain startup script

5.12 ECU-east xentop display on tty

5.13 Disk partition table of ECU-west and ECU-east

5.14 Output of drbd-overview

5.15 Attributes of the logical LVM volume /dev/vgxen/migration

5.16 Desktop of node

5.17 Progress of Ethernet traffic associated with VM replication while sorting over time

5.18 Progress of Ethernet traffic associated with VM migration over time

5.19 Progress of Ethernet traffic associated with streaming “Tears of Steel” from an initially replicated VM over time

5.20 RAINER demonstrator control panel

List of Tables

3.1 Overview of embedded hypervisors

5.1 Host names and associated IP addresses used in the RAINER demonstrator

5.2 SW1 and SW2 port wiring and usage

5.3 Xen domain states

5.4 Movies provided by the RAINER demonstrator, their resolutions and associated streaming resource usages

5.5 Network performance as a function of kernel environment and network medium

5.6 Disk sequential block access performance and resulting network throughput as a function of kernel environment, network medium and synchronization rate limitation

5.7 Page access performance and resulting network throughput as a function of kernel environment and network medium

1 Preamble

“Virtual machines have finally arrived. Dismissed for a number of years as merely academic curiosities, they are now seen as cost-effective techniques for organizing computer systems resources to provide extraordinary system flexibility and support for certain unique applications.” - Robert P. Goldberg, 1974 [74]

As illustrated by the above quotation, hardware virtualization - or virtualization for short - is rather old technology. The original intent which led to its development in the mid-1960s [134, p. 5] was Operating System (OS) development [72, p. XII]. Mainframes were expensive [178]; thus, disrupting their production use by reboots for testing experimental kernels, or by running a potentially unstable OS in general, could not be afforded. Virtualization solved this problem nicely [74] by providing an efficient, isolated duplicate of the real machine [181]. Soon afterwards, however, additional uses of the versatility and flexibility provided by virtualization were found. For instance, these included [74]:

- the graceful migration of users and their applications to new versions of an OS by running its predecessor in parallel to the current one on the same hardware,
- the establishment of a virtual network on a single piece of hardware by letting virtual machines communicate,
- the deployment of older OS versions on newer hardware with compatible Central Processing Units (CPUs) but unsupported peripherals by simulating the devices of previous generations, and
- the increased system availability and reliability in case of OS failures; such errors then only affect applications executing on a particular partition due to the high degree of isolation provided by virtualization.

Although R. P. Goldberg’s 1973 Ph.D. thesis “Architectural Principles for Virtual Computer Systems” [72] also dates from the mainframe era, its fundamental concepts and groundwork still hold true today as-is. The reason why virtualization of the predominant [133], [205, p. 17] x86 architecture took until the 2000s was that this processor family had not been designed with self-virtualization in mind [20, p. 35]. Moreover, it was even declared unvirtualizable [201], at least regarding the requirements Popek and Goldberg had defined in [188], and thus virtualizable solely in an insecure way. This situation changed when in 2002 details of the first [52] of two distinct approaches to circumvent this architectural limitation were revealed, with a product built on top of the former - VMware Workstation - available since 1999 [288]. Finally, in 2005 both Advanced Micro Devices (AMD), Inc. [3] and the Intel Corporation [106] extended the Instruction Set Architectures (ISAs) of their respective x86 CPUs to support virtualization. Those events have triggered a virtualization explosion in data centers and associated server products on the market [205, p. 5].

Some time afterwards, virtualization also gained traction in the area of embedded systems, mainly for consumer electronics like mobile phones and set-top boxes for Internet Protocol (IP)-based Television (IPTV) [15]. For instance, Open Kernel Labs, Inc. claims [172] that the Toshiba W47T available since 2006 was the first mobile phone - devices which are typically powered by Advanced RISC Machine (ARM) processors [15] but since 2012 also by the x86 Intel® Atom™ CPUs [24] - on the market applying virtualization technology. However, the first mass deployment of embedded virtualization probably was seen in form of the PowerPC (Performance optimization with enhanced RISC - Performance Computing) processor based Microsoft Xbox 360™ [33] video game console sold since 2005.

1.1 Problem domain

This master’s thesis is concerned with the virtualization of embedded systems with an emphasis on possible gains in reliability of the resulting systems. In this section, the problem domain in which the following work is performed is outlined, including brief descriptions of the underlying terms.

1.1.1 What is virtualization?

The term “virtualization” describes the creation of a Virtual Computer System (VCS) or Virtual Machine (VM), which Goldberg defined as follows [72, p. 15]:

A Virtual Computer System (VCS) is a hardware-software duplicate of a real existing computer system in which a statistically dominant subset of the virtual processor’s instructions execute directly on the host processor in native mode.

This definition differentiates virtualization from the older technique of a Complete Software Interpreter Machine (CSIM) [72, p. 21]. With a CSIM, all of the instructions of a guest machine are emulated, which results in significant slowdown and makes that approach inappropriate for production use [74].

Goldberg further defined [72, p. 16]:

The program executing on the host machine that creates the VCS environment is called the Virtual Machine Monitor (VMM).

In [181], Popek and Goldberg appoint the following essential characteristics of a VMM:

I) “... the VMM provides an environment for programs which is essentially identical with the original machine”. Consequently, it is transparent for OSes atop.
II) “... programs run in this environment show at worst only minor decreases in speed”.
III) “... the VMM is in complete control of system resources”.

Later on, VMMs were also referred to as “hypervisors” by analogy to the already established term “supervisor”. The latter describes the function of an OS kernel to supervise the execution of applications and tasks as well as their access to hardware resources. As such, hypervisors can also be described as multiplexers for the resources of host machines to guest OSes[15].

Regarding the second characteristic of hypervisors, Goldberg states more specifically in [72, p. 15] that, given the VM may run at a different speed from the real machine, timing-dependent processor and Input/Output (I/O) code may not perform exactly as intended. Since embedded systems often, but not always, have real-time requirements [213], this already indicates that virtualizing such systems does not come easy. The problem becomes apparent when thinking of virtualizing an existing embedded device. Previously, the hardware could dedicate its CPU cycles in their entirety to meeting the deadlines of a single OS and its applications. Now, when running N OS instances on top of it, the hardware can, at worst, devote only 1/N of its capacity to the real-time requirements of a single guest.

Even when switching to more powerful hardware at the same time, differences in run-time behavior between a virtualized and a non-virtualized OS on the same host will remain. This results from the VMM also being a piece of software and, thus, requiring CPU cycles to run, which then add to, e. g., interrupt latency.

In fact, this topic is not totally neglected herein. But given that the Ph.D. dissertation “Virtualisierung von Mehrprozessorsystemen mit Echtzeitanwendungen” [130] is devoted to virtualizing embedded systems with real-time requirements, this area clearly is out of scope for a master’s thesis intended to give an overview of embedded virtualization.

Nowadays, the term “virtual machine” is also applied to programming language environments which do not reflect a real machine. A typical example for this would be the Java™ Virtual Machine (JVM) [155]. For differentiating this extended meaning from the original definition introduced above, the term “process VM” was coined, whereas a VM corresponding to actual hardware is called a “system VM” [208]. In that regard, this thesis only covers the topic of system VMs. Concepts seemingly related to system VMs are OS-level virtualization [205, p. 38], e. g. containers and jails involving a single kernel and applications separated from each other, as well as user mode Linux [53]. Neither of these is dealt with here.

1.1.2 What is an embedded system?

Intuitively, one might describe an embedded system as any device equipped with a microcontroller or microprocessor, but which is not a desktop or laptop Personal Computer (PC) nor a server. However, finding a formal definition for the term “embedded system” turned out to be difficult. In [245], it is stated that the first embedded systems were the banking and transaction processing systems running on mainframes and arrays of disks. While this definition would come in handy as far as this thesis is concerned - it would basically be done here -, it at least illustrates that one cannot necessarily think of embedded systems as being small - by some definition of small.

Actually, according to [79, p. 2] there are many definitions for embedded systems, but the best way to categorize them is in terms of what they are not and with examples of how they are used. The main characteristic specified there is the fact that a user cannot program an embedded device or install arbitrary software on it in the same way these operations are possible with a PC.

The most formal definition encountered can be found in [83]:

“An embedded system is an engineering artifact involving computation that is subject to physical constraints. The physical constraints arise through two kinds of interactions of computational processes with the physical world: (1) reaction to a physical environment, and (2) execution on a physical platform”.

Remark: “Accordingly, the two types of physical constraints are reaction constraints and execution constraints. Common reaction constraints specify deadlines, throughput, and jitter; they originate from the behavioral requirements of the system. Common execution constraints put bounds on available processor speeds, power, and hardware failure rates; they originate from the implementation requirements of the system. Reaction constraints are studied in control theory; execution constraints, in computer engineering.”

Given that this master’s thesis does not target a specific physical platform or environment - apart from focusing on, but not being limited to, automotive Electronic Control Units (ECUs) - it was concluded that it cannot be determined up front whether or not a given hardware device is an embedded system or suitable as one.

However, for isolating VMs from each other, so that one guest cannot alter the memory assigned to another one, some form of memory protection support by the processor is indispensable [128]. Modern CPUs have a Memory Management Unit (MMU) which performs this task [231]. It is also possible to virtualize MMU-less designs, e. g. by implementing virtual memory management entirely in software [7], albeit in an obviously insecure way, and no indication could be found that this actually has been done. Thus, it makes sense to restrict research on embedded virtualization to hardware equipped with an MMU. The typical contenders in the area of embedded systems today are ARM, MIPS (Microprocessor without Interlocked Pipeline Stages), PowerPC and x86 processors [7].

Regarding x86 CPUs, even the mobile, desktop and server variants are used in embedded systems [79, p. 28 ff.]. However, there exist x86 ISA implementations specifically designed for embedded use as well. Members of the family of Intel® Atom™ [7] CPUs are probably known best in this regard. Among others, the G-Series [5] of the AMD Fusion family of x86 processors is also targeted for use in embedded systems, though. Members of this series consist of a CPU with an integrated Graphics Processing Unit (GPU). AMD termed this combination Accelerated Processing Unit (APU).

1.1.3 What is reliability?

In ISO/IEC/IEEE 24765:2012 [125], a standard covering the vocabulary for systems and software engineering, reliability is defined as [125, p. 297]:

1. the ability of a system or component to perform its required functions under stated conditions for a specified period of time. 2. capability of the software product to maintain a specified level of performance when used under specified conditions.

An - as far as systems are concerned - extended and more formal version of this definition is provided in [142, p. 10 f.]:

The Reliability R(t) of a system is the probability that a system will provide the specified service until time t, given that the system was operational at the beginning, i. e., t = t0. The probability that a system will fail in a given interval of time is expressed by the failure rate, measured in FITs (Failure In Time). A failure rate of 1 FIT means that the mean time to a failure (MTTF) of a device is 10^9 h, i. e., one failure occurs in about 115,000 years. If a system has a constant failure rate of λ failures/h, then the reliability at time t is given by

$$R(t) = \exp(-\lambda (t - t_0))$$

where t − t0 is given in hours.

Remark: “Safety is reliability regarding critical failure modes.”
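To make the relationship between FIT, MTTF and R(t) quoted above more concrete, the following Python sketch evaluates these definitions for a constant failure rate. It is an illustration only and not taken from the cited source; the function names and the choice of 8,766 hours per year (a Julian year) are assumptions made here.

```python
import math

# Assumption: one year is taken as a Julian year of 365.25 days.
HOURS_PER_YEAR = 24 * 365.25


def failure_rate_from_fit(fit):
    """Convert a failure rate in FIT (failures per 10^9 hours) to failures/h."""
    return fit * 1e-9


def mttf_hours(lam):
    """Mean time to failure in hours for a constant failure rate lam (1/h)."""
    return 1.0 / lam


def reliability(lam, t, t0=0.0):
    """R(t) = exp(-lam * (t - t0)), with t and t0 given in hours."""
    return math.exp(-lam * (t - t0))


if __name__ == "__main__":
    lam = failure_rate_from_fit(1.0)  # 1 FIT
    print("MTTF in years:", mttf_hours(lam) / HOURS_PER_YEAR)
    print("R after 10 years:", reliability(lam, 10 * HOURS_PER_YEAR))
    print("R at t = MTTF:", reliability(lam, mttf_hours(lam)))
```

For a failure rate of 1 FIT, the computed MTTF comes out at roughly 114,000 years, in line with the rounded figure quoted above, and the reliability at t = MTTF is e^-1, i. e. about 37 %.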

Therefore, one way of ensuring the reliability of a given system is to use components having the necessary failure rate. Another approach is to build a system upon Fault-Tolerant Units (FTUs), where the paralleled Fault-Containment Units (FCUs) constituting an FTU may have a failure rate higher than the one required for the system - as long as the resulting reliability of the FTU is sufficient [142, p. 155]. If an FCU is a fail-silent component, i. e. may only omit a value but not provide an incorrect result [142, p. 139], then an FTU consisting of two FCUs is adequate [142, p. 155]. Such a configuration is called Dual Modular Redundancy (DMR) [240]. If an FCU may also exhibit false results, failure masking is possible with a Triple Modular Redundancy (TMR) setup of three FCUs per FTU [142, p. 156] forming a quorum [163].
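The difference between the two redundancy schemes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an implementation from the cited literature: the function names are hypothetical, FCU outputs are modeled as integers, and an omitted value of a fail-silent FCU is modeled as None.

```python
from collections import Counter
from typing import Optional, Sequence


def dmr_output(values: Sequence[Optional[int]]) -> Optional[int]:
    """DMR with fail-silent FCUs: any delivered value is assumed correct,
    so the first value that is not omitted (None) can be used."""
    for value in values:
        if value is not None:
            return value
    return None  # both FCUs stayed silent


def tmr_vote(values: Sequence[Optional[int]]) -> Optional[int]:
    """TMR with possibly incorrect FCUs: accept a value only if at least
    two of the three FCUs agree on it (majority quorum)."""
    delivered = [value for value in values if value is not None]
    if not delivered:
        return None
    value, count = Counter(delivered).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    print(dmr_output([None, 42]))   # one silent FCU is tolerated
    print(tmr_vote([42, 7, 42]))    # one incorrect FCU is masked
    print(tmr_vote([1, 2, 3]))      # no quorum, no output
```

The sketch shows why two units suffice only under the fail-silent assumption: with two units that may deliver incorrect values, a disagreement can be detected but not resolved, whereas a third unit allows the incorrect value to be outvoted.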

illustration not visible in this excerpt

Figure 1.1: Standby DMR reliability comparison

In this thesis, DMR using a standby FCU and virtualization as a vehicle to the actual implementation is considered for increasing the reliability of embedded systems. Generally, the reliability Rs of a standby redundancy system composed of two subsystems having a reliability R1 each is expressed as [132, p. 149]:

illustration not visible in this excerpt

where the fault-detection coverage C is the number of covered faults divided by the number of all faults. Consequently, and as shown in figure 1.1, the resulting reliability of the targeted standby DMR is always higher than that of one of its single subsystems, except when the latter always or never fails, and provided that C exceeds 50 %.

In general, together with availability, maintainability, safety and security, reliability is one of the metrics of system dependability [142, p. 10 ff.].

1.2 Motivation

In order to give this master’s thesis a direction nevertheless, the possibilities of increasing the reliability of automotive ECUs using technology based on virtualization were investigated as part of this work. To prove the feasibility and viability of this approach, the demonstrator presented in chapter 5 has been built and tested.

illustration not visible in this excerpt

Figure 1.2: Current automotive E/E architecture [229, p. 4]

Today, the reliability of automotive software is insufficient and the MTTF of cars is unknown, whereas in avionics reliabilities of 10^9 hours and greater in terms of mean time between failures are state of the art [31]. One of the reasons for the superiority of the avionics over the automotive industry in this regard is the existence of widely adopted and accepted specifications such as the series of ARINC 653 [10, 11, 9, 12] standards [249]. ARINC 653 defines an Avionics Application Standard Software Interface (ARINC) for space and time partitioning of safety-critical Real-Time Operating Systems (RTOS) including their isolation from each other [249]. Interestingly, while these standards do not specify the use of a hypervisor for achieving the partitioning, the latter lends itself to being implemented via a VMM; in fact, this is common practice [46]. Thus, it obviously is still possible to meet the avionic reliability requirements specified in the RTCA DO-178B [203] and DO-178C [204] standards respectively, while taking advantage of virtualization. As will be elaborated in the following, a reason for the cost of the Electrical/Electronic architecture (E/E architecture) of today’s cars is its inefficiency. Therefore, getting to where the avionics industry already is in terms of technology and reliability may in effect be cost neutral or even result in a net win when switching to an improved E/E architecture at the same time.

As depicted in figure 1.2, current automotive E/E architectures typically consist of five or more bus systems and a multitude of ECUs communicating via one or more Central Gateways (CGWs) [219]. Usually, the ISO 11898-1:2003 [109] Controller Area Network (CAN) or its Flexible Data-Rate (FD) [21] variant, ISO 17458-1:2013 [121] FlexRay™, Local Interconnect Network (LIN) [154] and Media Oriented Systems Transport (MOST®) [166] but also IEEE 802.3-2012 [103] Ethernet are applied for the networking part. The actual number of ECUs varies by class, generation and model of the vehicle. For example, Mercedes-Benz S-Class cars employ at least 70 networked

illustration not visible in this excerpt

Figure 1.3: Fully meshed redundant automotive backbone [229, p. 5]

illustration not visible in this excerpt

Figure 1.4: Automotive domain controller architecture [159, p. 8]

Regarding Ethernet, the introduction of BroadR-Reach® - a single unshielded pair of twisted wires replacement for the physical layer of IEEE 802.3-2012 - is imminent. The first product embedding it is predicted to hit the market in 2013 [159, p. 12]. This adoption in the automotive industry is driven by the One Pair Ethernet (OPEN) Alliance [27]. BroadR-Reach® provides a data rate of 100 Megabit per Second (Mbps) full-duplex. Thus, in analogy to IEEE 802.3-2012 100BASE-TX Ethernet, its media type is called BR-100 [30].

However, even the automotive industry has identified the above-mentioned deficiencies of current E/E architectures and is proposing a consolidated one - above all the Bayerische Motoren Werke (BMW) AG [197]. As shown in figure 1.4, the vision is to primarily use four powerful “server” ECUs - one for each of the four domains drive/power train, chassis, comfort as well as information and communication/infotainment in a car - instead of a conglomeration of ECUs. Consequently, such a capable ECU is called Domain Control Unit (DCU) or domain controller.

On the networking side, the intention is to use RFC 791[183] IP Version 4 (IPv4)- or RFC 2460[51] IP Version 6 (IPv6)-based communication over Ethernet as a backbone between the DCUs. As for the topology, as illustrated in figure 1.3, even a fully meshed redundant variant and - as can be seen in figure 1.5 - optionally also the use of separate switches is considered. In any case, the new architecture shall at most only use CAN FD in addition to Ethernet or BroadR-Reach®.

illustration not visible in this excerpt

Figure 1.5: Long-term automotive E/E architecture [229, p. 4]

For the same reasons mandated by ARINC 653 and by BMW for E/E architectures but again without specifying the actual implementation, partitioning DCUs in time and space with isolated guests corresponding to the former ECUs - at least as a transition aid to the new architecture - by means of virtualization is beneficial.

Moreover, and as will be detailed later, for the following two reasons the migration to a combination of Ethernet, i. e. BroadR-Reach®, and powerful DCUs harmonizes particularly well with virtualization and allows introducing additional features in E/E architectures:

1. With 100 Mbps, BroadR-Reach® provides a significantly higher data rate than FlexRay™ (10 Mbps) and MOST® over copper wires (50 Mbps). Thus, it currently is only beaten by optical MOST® (150 Mbps). As shown by H. Zinner et al. in[252], BroadR-Reach® can fulfill the automotive real-time requirements as well. Besides, in form of the Reduced Twisted Pair Gigabit Ethernet (RTPGE) Study Group[96], there are also efforts underway to bring a data rate of 1 Gigabit per Second (Gbps) into cars, which then will be unparalleled.
2. When moving to more capable and different CPUs than those currently used in vehicles anyway, it makes sense to choose processors that are designed for virtualization and potent enough to also deal with its overhead.

In particular, using a topology involving separate Ethernet switches rather than having the DCUs act as communication hubs - as is typically the case with ECUs and the legacy bus systems in cars - will foster bringing in dynamic allocation and redundancy across DCUs. This is due to the fact that such an architecture effectively decouples the computing power from the rest of the network, allowing software to be run on basically any of its nodes as far as their locality is concerned.

Actually, using Powerline Communication (PLC) instead of Ethernet would be even more advantageous in this regard, given that the ubiquitous vehicular battery powerline serves as the physical network medium in that case. Consequently, network components like switches are superfluous - at the cost of using a shared medium of course. As was shown in[219], using Ethernet-PLC-bridges further detailed in[220], it is even viable to apply PLC as the physical layer for Ethernet and also use a seamless combination of media devoted to Ethernet, i. e. those specified in IEEE 802.3-2012 as well as BroadR-Reach®, and PLC.

Obviously, a switch to IP-based communication in cars potentially will also have an impact on the rest of the E/E architecture and not just the ECUs and DCUs, though. Specifically, it may be necessary to replace “dumb” actuators and sensors with so called “smart” ones[55] incorporating a networked transducer.

Finally, motivations for introducing virtualization alongside a domain controller architecture - and in embedded systems in general - besides temporal and spatial separation as well as providing a framework for inter-system redundancy are given. Partly, these are derived from use cases in the mainframe and server domains - such as those presented at the beginning of this chapter - mapped into the automotive realm.

1.2.1 Operating system multiplicity

Depending on whether an embedded system has real-time requirements or not, either an RTOS or a General Purpose Operating System (GPOS) is used. For instance, an Automotive Open System Architecture (AUTOSAR)[16] conformant ISO 17356-3:2005[110] OSEK/VDX1 OS is used as RTOS in the vehicular domain [196, p. 43]. A typical application of OSEK/VDX, thus, is in the real-time dependent drive train [196, p. 14]. On the other hand, even in a car, infotainment components like navigation systems do not necessarily employ an RTOS [196, p. 342].

The feature-richness[153] of GPOSes makes them attractive for embedded systems as well.

Additionally, several GPOSes and applications for them are available royalty-free as open source. In particular, the Linux[156] kernel is frequently deployed in embedded systems [80, p. 12], together with GNU’s Not Unix (GNU)[63] userland2. Using a real-time capable hypervisor, the best of both these worlds - GPOS and RTOS - can be combined on a single CPU.

1.2.2 License separation

However, a disadvantage of open source software may be its license. In particular, Linux is provided under the GNU General Public License (GPL) version 2[64]. While the GPL does not hinder commercial use in principle, it requires distributions in binary form to be accompanied by the corresponding source code or at least a written offer to provide it. Moreover, the GPL is viral, i. e. in general it requires any derived works and source code eventually linked with GPLed code into binaries to also be placed under that license. Essentially, this may require companies to reveal their intellectual property in form of source code, which they might not be willing to do.

If one’s own code can be separated from the GPL-covered source by placing the binaries in adjacent partitions, virtualization allows for license separation and, thus, circumventing the downside of said license. Generally, this approach is orthogonal to subsection 1.2.1, as one might want to cherry-pick GPLed source code from Linux - for instance a device driver, file system or protocol stack - and integrate it into a custom RTOS, requiring it to be GPLed. In that case, the separation of more confidential parts within a particular device is still possible by placing them into another partition and dual-licensing the RTOS for instance.

1.2.3 Hardware longevity

As mentioned in the beginning of this chapter, one additional application of virtualization that emerged quickly was the use of older OS versions on newer hardware by simulating the former peripherals. For economic, certification and similar reasons, it might likewise be desirable to run the OS and applications of a previous generation embedded system unchanged on current hardware. As long as the succeeding processor is able to execute the ISA of the predecessor natively and only complementary devices need to be emulated - otherwise the whole approach would no longer count as virtualization -, the use of VMs also provides for the hardware longevity of embedded systems. This practice has the additional advantage of allowing software fully aware of the current platform to run side by side with an aged software stack on the same device.

One area in which this property can be especially convenient is the use of 32-bit software images on the transition path to 64-bit CPUs. This results from the fact that - across all ISAs - the latter typically can execute programs written and built for their 32-bit precursors but require support by the OS to do so. Here, virtualization eases the coexistence of 32-bit and 64-bit software in an embedded system by avoiding the need to implement the compatibility shims otherwise necessary at the OS level.

1.2.4 Hardware obliteration

Another use of virtualization already described above was the creation of virtual networks among VMs running on the same hypervisor of a mainframe. Again, this use case can be mapped to embedded systems - in particular in the context of a convergence to an automotive domain architecture - by letting a set of former ECUs now implemented as partitions communicate “directly” without the need for a physical layer. That way, virtualization can facilitate the obliteration of Ethernet, CAN, FlexRay™, LIN, MOST® etc. Media Access Controllers (MACs), given that these are no longer required per guest or VMs at least may share single controllers.

1.2.5 Multi-core leverage

It is often stated, for example in[197] and[153], that - from a chip design perspective - adding cores to a package scales better than increasing the performance of a single core processor. Thus, the move to multi-core CPUs is also supposed to be the remedy for the increasing demand of embedded systems for processing power. However, what is often not stated in this context is that software does not scale to multiple cores alike. As a first approximation, the performance of software increases linearly with the speed of the processor it is running on. Consequently, raising the clock rate in order to supply more horsepower was the approach taken in the past.

By contrast and as detailed in[170], throwing in more cores does not add up the same way and it becomes worse with every ancillary core. In short, this limitation in speedup is due to communication overhead and precautions necessary in an OS for handling multiple cores as well as poorly written applications - which might not even be suitable for parallelization due to the nature of their tasks - and mediocre code generated by compilers out of the source code for them.

A better way to leverage the potential performance of multi-core embedded systems - where not all computing power needs to be devoted to a single application or only to a small set of these -, thus, is partitioning via virtualization. With this kind of setup, it is possible to tie VMs to one core each - also enabling the efficient use of OSes unaware of multiprocessors on otherwise multi-core machinery - or to a subset of the cores available on a platform in order to reduce or even totally avoid the overhead imposed by a multiprocessing (MP) environment.

While not every OS instance on a multi-processor system in turn necessarily requires a dedicated core, the latter configuration commonly is referred to as Asymmetric Multiprocessing (AMP) [243, p. 3]. This contrasts with Symmetric Multiprocessing (SMP), which involves a single OS spanning multiple CPUs of multi-core hardware [227, p. 536 ff.]. Partitioning AMP systems via a VMM is known as “supervised AMP” (sAMP) [243, p. 5].

1.2.6 Energy efficiency and partial networking

A technology that evolved in 2005 with the VMware ESX (Elastic Sky X)[169] and - even in a “live” form minimizing the down time of the guest during transition - with the Xen [42] hypervisors is the migration of VMs between physical hosts. More precisely, this feature allows for moving guests running atop a hypervisor on one server - in these cases using IP- and Ethernet-based communication - to be hosted by the VMM of another machine along with keeping all states of the VM intact.

While this capability already is beneficial for keeping the guests online and services provided by them available during scheduled maintenance of the underlying hardware [37, p. 7], it also can be used as a foundation for advanced facilities. As outlined in [237, p. 3], it is possible to employ VM migration for building an automated system that constantly balances the guests across a pool of hosts according to the actual resource usage of the former. This permits optimizations in hardware usage up to the point of completely shutting down servers during periods of reduced load such as night hours and on weekends. Thus, guest migration facilitates the reduction of energy consumption without the need to turn off the VMs as well.

The automotive industry is facing a related problem with its current myriad of ECUs[59]. This is due to the fact that not all functions provided by these are required in all operational states of a car: cruise control, for example, does not need to be available while a vehicle is parked and it is actually undesirable that the boot lid is functional when driving. Yet, the present E/E architecture may even require keeping all ECUs on a given bus system online at all times.

The solution pursued for increasing the energy efficiency of cars is to introduce partial networking[59]. This technology renders it possible to selectively power down unused ECUs and even whole functional clusters. However, in a pure domain controller architecture, this approach can be insufficient to achieve the desired goal. The problem arising in such an environment is that it might be possible to disable most but not all of the functionality of a specific domain, so that shutting off the corresponding DCU is not viable. In an Electric Vehicle (EV), for instance, the drive train might be entirely unused when parking, except for the battery charger subsystem. By migrating the partition housing the latter during normal operations to another DCU which needs to be active for other reasons anyway, it again could be feasible to completely take down the DCU associated with hosting that task natively.

It is even conceivable that migrating VMs can be of use in an intra-chip fashion for increasing energy efficiency. Basically, the SPUMONE (Software Processing Unit, Multiplexing One into two or more) hypervisor presented in[153] is a distributed virtualization concept with one VMM per core of a multi-core processor for enhanced reliability. Moreover, the 2011 ARM Cortex™-A15 MPCore™[13] CPU design arranges for capable but also power demanding A15 cores to be accompanied by a less performant but power-optimized Cortex™-A7 or similar core within the same package. This is termed a “big.LITTLE” configuration. Again, virtualization and the migration of VMs - in this case from an A7 to an A15 core and vice versa - with a SPUMONE-like VMM allows for the adaptation to the current workload including turning off freed cores and thus for implementing energy efficiency in this scenario.

1.2.7 Memory de-duplication

In[174] Y. S. Pan et al. describe a potential merit of virtualization particularly interesting for resource constrained embedded systems. With several VMs executing concurrently on the same physical machine, it is likely that these require pages of the exact same content in Random-Access Memory (RAM). This is especially true in case the guests are running the same OS and/or applications, not necessarily in terms of data but at least in code segments.

Using a hypervisor or a virtualization supplement capable of identifying these common memory pages and a de-duplication engine, these can be reduced to only one copy that is shared across all VMs. Consequently, such techniques would permit either decreasing the total amount of RAM required by a virtualized embedded system or providing more of the freed up precious resources to guests. Similar methods may be applied for optimizing the Read-Only Memory (ROM) utilization across partitions as well.
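The de-duplication idea can be illustrated with a minimal Python sketch. It is not the engine described in[174]; the page size, the function name deduplicate and the combination of SHA-256 digests with a byte-wise comparison are assumptions made purely for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def deduplicate(vm_pages):
    """Map every (VM, page number) to a frame shared by identical pages.

    vm_pages maps a VM name to a list of page contents (bytes).
    Returns (frames, mapping): the list of distinct frames and a dict
    (vm, page number) -> index into frames.
    """
    frames = []          # one entry per distinct page content
    index_by_hash = {}   # digest -> indices of candidate frames
    mapping = {}
    for vm, pages in vm_pages.items():
        for number, content in enumerate(pages):
            digest = hashlib.sha256(content).digest()
            # Compare byte-wise as well, so that a hash collision can
            # never merge two pages that actually differ.
            for candidate in index_by_hash.setdefault(digest, []):
                if frames[candidate] == content:
                    mapping[(vm, number)] = candidate
                    break
            else:
                frames.append(content)
                index_by_hash[digest].append(len(frames) - 1)
                mapping[(vm, number)] = len(frames) - 1
    return frames, mapping


code = b"\x90" * PAGE_SIZE  # identical code page present in both guests
guests = {
    "vm_a": [code, b"a" * PAGE_SIZE],
    "vm_b": [code, b"b" * PAGE_SIZE],
}
frames, mapping = deduplicate(guests)
print(len(frames))  # 3 frames back the 4 guest pages
```

In a real system, shared frames would additionally have to be mapped read-only and copied on the first write by a guest, which this sketch leaves out.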

1.2.8 Self-healing systems

H. Momeni et al. picture another application of virtualization of specific benefit for highly integrated yet safety-critical embedded systems in[165], namely self-healing - one of the four essential features of an autonomic system. The basic idea that they are following essentially builds on the well-known practice of increasing the availability and reliability of the whole system by keeping fatal errors local to partitions, as outlined in the beginning of this chapter.

Another way to look at this idea is to reimplement the concept of a microkernel [227, p. 62 ff.] approach using a hypervisor. A microkernel architecture limits the amount of code run as kernel to the bare minimum absolutely necessary [227, p. 63]. All other parts of a traditional monolithic kernel including device drivers, protocol stacks etc. instead are run as services in user mode upon the microkernel and can be restarted on failures.

The analogy that can be drawn here is that a hypervisor is similar to a microkernel with everything else - in this case whole OSes including other kernels - run as a service on top of it. Herein, self-healing can be implemented via an automatism that restarts VMs which have crashed due to a fatal software exception.
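Such a restart automatism can be sketched in a few lines of Python. The state names, the function heal and the restart callback are hypothetical and do not correspond to the interface of any particular hypervisor; a real VMM would detect crashes by other means, e. g. a watchdog or a trap from the guest.

```python
def heal(vms, restart):
    """Restart every VM found in the 'crashed' state and return their names."""
    restarted = []
    for name, state in vms.items():
        if state == "crashed":
            restart(name)            # hypothetical hook into the VMM
            vms[name] = "running"
            restarted.append(name)
    return restarted


vms = {"brake_control": "running", "infotainment": "crashed"}
log = []
restarted = heal(vms, restart=log.append)
print(restarted)  # ['infotainment']
```

The partition that did not crash is left untouched, which is the point of confining failures to individual guests.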

Once more, this mechanism can be of notable advantage in an automotive domain controller architecture with exceedingly converged DCUs. Using a single OS instance on these, a critical failure in a core software component could lead to the outage of all functions provided by the affected DCU. Partitioning embedded devices by means of virtualization and providing a self-healing mechanism helps to confine the impact of such malign failures to a subset of the functionality and to recover from them - provided that the VMM itself is not affected, of course.

1.2.9 Summary

The possible advantages of embedded virtualization described in the above subsections certainly are not exhaustive. Nevertheless, it is unlikely that any single one of these motivations will drive the decision to virtualize an embedded system, but a combination of the resulting merits for the targeted application probably will.

One argument in favor of virtualization which might be particularly missed in the above list but keeps being mentioned in that context - for instance in[153],[150],[7],[87] and [80, p. 11 f.] - is security. While it is true that - as described at the beginning of this chapter - the isolation of VMs from each other was and is one of the goals of virtualization, it is inappropriate to consider the latter as a technique for bringing security into a system.

This results from the simple fact that using VMs can never be as secure as running a single OS instance on dedicated physical hardware, above all because within a virtualized environment, security additionally depends on the correctness of the hypervisor[135]. Moreover, the catalog of VMM security vulnerabilities accumulated in[173] illustrates that this is not a hypothetical problem either. Thus, in [230], A. van Cleeff et al. rightfully conclude that “the overall security impact of virtualization is not well understood”. Lessons should also be learned from the failed attempts to secure the Microsoft Xbox 360™ as well as the Sony PlayStation 3 video game console by means of a hypervisor[33].

Actually, as will be elaborated on in section 2.4, extreme precautions need to be taken in order to not introduce additional security risks and attack vectors with virtualization.

2 Virtualization

The intention of this chapter is to present an overview of the basic principles of virtualization, the history and evolution of hypervisors as well as various implications of virtualization such as hardware support, security considerations and its use in embedded systems.

2.1 Basic principles

This section is derived from Goldberg’s Ph.D. thesis[72] and - due to better readability of the copies preserved - also from his later papers “Architecture of Virtual Machines”[73] of essentially the same sub-content as well as “Formal Requirements for Virtualizable Third Generation Architectures”[181]. The latter paper was co-authored by G. J. Popek and describes what is known as “trap-and-emulate”[81] virtualization which still is the state of the art today - at least when it comes to processors that are virtualizable according to their definition (see subsection 2.1.4).

“Third generation” (III generation) hereby refers to the 1965 to 1980 era of computers based on Integrated Circuits (ICs), starting with the IBM System/360 series launched by the International Business Machines (IBM) Corporation, and multiprogramming, which also gave birth to the UNIX® OS [227, p. 10 ff.]. The 1980 to present PCs are referred to as the “fourth generation” (IV generation), heralded by the Large Scale Integration (LSI) circuits technology [227, p. 13 ff.]. However, the machine model used by Goldberg is abstract enough to apply to the processors of the fourth generation, too, in particular also to that of today’s servers, PCs and embedded systems.

Goldberg additionally gave advice for the development of fourth generation machines, which shall include a Hardware Virtualizer (HV) [72, p. 83 ff.]. HVs are supposed to provide a hardware-firmware mechanism for elegantly virtualizing what later on became known as MMUs, but otherwise still require trap-and-emulate virtualization. With the addition of hardware support for virtualization to processors further described in section 2.5, HVs eventually came into reality.

2.1.1 Machine model

For their definition of virtualizability, Popek and Goldberg use a simplified model of a real machine with a single processor and linear, uniformly addressable memory presented in the following. The processor in this model has two modes of operation: supervisor and user mode - a characteristic of third generation computers. In supervisor mode, the complete instruction set of the CPU is available, while in user mode it is not. Apart from that, the ISA contains the typical instructions of arithmetic, testing, branching and memory movement operations. Memory is virtually addressed via the contents of a Relocation-Bounds (R-B) register.

This machine can exist in any of a finite number of states composed of:

- executable memory E,
- processor mode M (either s for supervisor or u for user mode),
- program counter P and
- R-B register R.

The current state of a machine thus is defined by the quadruple S:

S = 〈 E,M,P,R 〉

Executable memory is conventional, of size q and word- or byte-addressable. Elements in memory are referred to as E[i], where 0 ≤ i < q. The R-B register is always active - independently of the current state of the machine - and defined by the tuple R = 〈l, b〉. Herein, l refers to the absolute or physical address corresponding to the virtual address 0, and b - the bounds part - refers to the absolute size. Thus, R = (0, q − 1) addresses the entire memory. Moreover, out of bounds addresses a produced by an instruction cause memory traps further discussed in subsection 2.1.2, i. e.:

if a + l ≥ q then memory trap

else if a ≥ b then memory trap

else use E[a + l]
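The address check can be sketched in Python as follows. The sketch assumes the rule as given by Popek and Goldberg in[181], i. e. a memory trap if a + l ≥ q, otherwise a memory trap if a ≥ b, and otherwise an access to E[a + l]. The names MemoryTrap and translate are chosen for illustration only.

```python
class MemoryTrap(Exception):
    """Raised when an address falls outside the relocation-bounds window."""


def translate(a, R, q):
    """Return the physical address for the virtual address a.

    R is the relocation-bounds register (l, b) and q the memory size.
    """
    l, b = R
    if a + l >= q:      # the access would leave physical memory
        raise MemoryTrap(a)
    if a >= b:          # the access exceeds the bounds part of R
        raise MemoryTrap(a)
    return a + l


q = 16
print(translate(3, (8, 4), q))  # 11
```

With R = (8, 4), the virtual addresses 0 to 3 map to the physical addresses 8 to 11, while translate(4, (8, 4), q) raises a MemoryTrap.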

The program counter P is an address relative to the contents of R, acting as an index into E and indicating the next instruction to be executed.

A Program Status Word (PSW) is the content of the triplet 〈M, P, R〉. For the implementation of traps, E[0] is used to store the old PSW and E[1] to fetch a new PSW. Each component of S can take only a finite number of values and the set of finite states of S is C. An instruction i then is a function i: C → C. Thus, the execution of an instruction can be represented in one of the following two ways:

i(S1) = S2 or i(〈E1, M1, P1, R1〉) = 〈E2, M2, P2, R2〉

In summary, this model so far specifies a machine with a primitive protection system based on a supervisor/user mode concept and a simple memory addressing built around an R-B system which still holds true for present processors when removing all superficial complexities. Given that this model only serves to define ISA virtualization, it specifically excludes I/O and interrupts, which are seen as an optional extension.

2.1.2 Traps

The above machine model is further refined by the addition of traps. According to Popek and Goldberg, an instruction i with i(〈E1, M1, P1, R1〉) = 〈E2, M2, P2, R2〉 is said to trap, where:

E1[j] = E2[j] for 0 < j < q,
E2[0] = 〈M1, P1, R1〉 and
〈M2, P2, R2〉 = E1[1].

Hence, memory is left intact when an instruction traps - except for E[0], which is used to store the PSW that was in effect before the trap occurred - and a new PSW is fetched from E[1]. For real machines, M2 = s and R2 = (0, q − 1) typically hold, i. e. on a trap the processor automatically switches to supervisor mode and the entire memory space is accessible.

This model can be extended to provide different PSWs based on the type of a trap by storing these PSWs in E[1] through E[x] for x trap types. One such type is the memory trap already introduced in subsection 2.1.1.
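The effect of a trap on the machine state can be illustrated by a short Python sketch. It models only the single-PSW variant using E[0] and E[1]; the type PSW and the function trap are names introduced here for illustration.

```python
from typing import NamedTuple, Tuple


class PSW(NamedTuple):
    M: str              # processor mode, 's' or 'u'
    P: int              # program counter
    R: Tuple[int, int]  # relocation-bounds register (l, b)


def trap(E, psw):
    """Return (E2, new PSW): the old PSW goes to E[0], the new one comes from E[1]."""
    E2 = list(E)        # all other memory cells stay untouched
    E2[0] = psw
    return E2, E[1]


q = 8
handler = PSW("s", 2, (0, q - 1))     # PSW of the trap handler
E = [None, handler] + [0] * (q - 2)
E2, new_psw = trap(E, PSW("u", 5, (4, 4)))
print(new_psw.M, new_psw.R)  # s (0, 7)
```

After the trap, the processor runs in supervisor mode with access to the entire memory, and the interrupted PSW can later be restored from E2[0].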

2.1.3 Instruction behavior

Popek and Goldberg classify instructions based on their behavior as a function of the state S into one of the following groups:

- privileged
- sensitive
- innocuous

Whether a physical machine is virtualizable is determined by the groups which its instructions fall into.

Privileged instructions are only executable in supervisor mode, otherwise they trigger a trap, which therefore typically is called a privileged instruction trap. An instruction i, thus, is privileged if for any pair of states S1 = 〈e, s, p, r〉 and S2 = 〈e, u, p, r〉, i. e. only differing in the processor mode, i(S2) traps while i(S1) does not. Therefore, this notion of “privileged” is close to the conventional one with the possible exception that in the context of virtualization such instructions are required to trap. As will be shown later, the latter behavior is crucial for the virtualizability of machines and it is insufficient that privileged instructions are merely ignored or behave differently when in user mode.

A new group of instructions are the so called sensitive ones, which are further subdivided into control and behavior sensitive instructions. Control sensitive instructions are those for which a state S1 = 〈e1, m1, p1, r1〉 exists and i(S1) = S2 = 〈e2, m2, p2, r2〉 such that i(S1) does not trap and either r1 ≠ r2 or m1 ≠ m2 or both hold. Put into words, these are instructions that change the amount of memory or other resources available, or the processor mode, without trapping. Behavior sensitive instructions are those whose execution depends on the value of the R-B register, i. e. upon their location in real memory, or on the processor mode. In other words, behavior sensitive instructions can be used to determine the current state of the processor, e. g. its mode. Obviously, the whole set of sensitive instructions must not be executed directly within a VM but virtualized. Otherwise, the requirement of an equivalent environment stated in subsection 1.1.1 as well as that of the hypervisor being in full control of the system would be violated.

All instructions of a processor which are neither privileged nor sensitive are innocuous.
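The definitions of privileged and control sensitive instructions can be tried out on a toy machine in Python. The following is a simplified sketch under several assumptions: memory E is omitted from the state, a trap is reduced to a boolean flag, behavior sensitivity is not checked, and the three instructions nop, set_rb and drop_to_user are hypothetical.

```python
from itertools import product

MODES = ("s", "u")
# Small state space of (mode, program counter, R-B register) triples.
STATES = [(m, p, r) for m, p, r in product(MODES, range(2), ((0, 4), (0, 8)))]


def nop(state):
    m, p, r = state
    return (m, p + 1, r), False            # (next state, trapped?)


def set_rb(state):
    m, p, r = state
    if m == "u":
        return state, True                 # traps in user mode
    return (m, p + 1, (0, 4)), False


def drop_to_user(state):
    m, p, r = state
    return ("u", p + 1, r), False          # changes the mode, never traps


def is_privileged(instr):
    """Traps in user mode and does not trap in supervisor mode, for all states."""
    return all(
        instr(("u", p, r))[1] and not instr(("s", p, r))[1]
        for _, p, r in STATES
    )


def is_control_sensitive(instr):
    """Some state exists where the mode or R changes without a trap."""
    for state in STATES:
        (m2, _, r2), trapped = instr(state)
        if not trapped and (m2 != state[0] or r2 != state[2]):
            return True
    return False


for instr in (nop, set_rb, drop_to_user):
    print(instr.__name__, is_privileged(instr), is_control_sensitive(instr))
```

In this toy ISA, nop is innocuous, set_rb is both privileged and control sensitive, and drop_to_user is control sensitive without being privileged.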

2.1.4 Virtualizability

Based on the above machine model, Popek and Goldberg state their basic theorem (proven in[181]):

Theorem 1. “ For any conventional third generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions. ”

Herein, “conventional third generation computer” refers to machines that behave according to the model specified in subsections 2.1.1 and 2.1.2.

So what does theorem 1 imply? As elaborated on in subsection 2.1.3, all sensitive instructions must be caught and virtualized by the hypervisor in order to be able to execute a VM - including the kernel running there - in user instead of supervisor mode. In order to be caught by the VMM, these instructions therefore are required to trap when executed in user mode, which only holds for privileged instructions. Hence, if all sensitive instructions are privileged ones, this requirement is fulfilled.
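Expressed as code, the condition of theorem 1 is a plain subset test. The Python sketch below uses two hypothetical instruction sets, isa_a and isa_b, whose instruction names are invented for illustration and do not describe any real processor.

```python
def non_trapping_sensitive(sensitive, privileged):
    """Return the sensitive instructions that are not privileged."""
    return set(sensitive) - set(privileged)


def is_virtualizable(sensitive, privileged):
    """Condition of theorem 1: sensitive instructions ⊆ privileged instructions."""
    return not non_trapping_sensitive(sensitive, privileged)


privileged = {"SET_RB", "LOAD_PSW", "HALT"}
isa_a = {"SET_RB", "LOAD_PSW"}                # all sensitive ones trap
isa_b = {"SET_RB", "LOAD_PSW", "READ_MODE"}   # READ_MODE does not trap

print(is_virtualizable(isa_a, privileged))          # True
print(non_trapping_sensitive(isa_b, privileged))    # {'READ_MODE'}
```

A single sensitive instruction that silently executes in user mode, like READ_MODE in the second set, is enough to violate the condition.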

Moreover, they state that as long as a physical machine is not running out of resources, i. e. memory, the original machine is recursively virtualizable (proven in[181]):

Theorem 2. “ A conventional third generation computer is recursively virtualizable if it is:

(a) virtualizable, and (b) a VMM without any timing dependencies can be constructed for it. ”

2.1.5 Virtual machine map

Based on theorem 1, a homomorphism on C can be defined. For this, Popek and Goldberg subdivide C into Cv, which is the set of all states trapping into a hypervisor, and Cr consisting of the remainder. Moreover, instruction sequences en of finite length l are defined as en (S 1) = ij · · · k (S 1) = S 2. I is the set of all these sequences.

Instructions i can be thought of as unary operators on the set of processor states: i(Sj) = Sk. Likewise, instruction sequences en can also be thought of as unary operators on C. Thus, I contains the operators the homomorphism will be concerned with. The one-one homomorphism with respect to all the operators ei in the instruction sequence set I is called the virtual machine map (VM map) f: Cr → Cv. That is, as illustrated in figure 2.1, for any state Si ∈ Cr and any instruction sequence ei, an instruction sequence e′i exists such that f(ei(Si)) = e′i(f(Si)) holds.

illustration not visible in this excerpt

Figure 2.1: The virtual machine map [20, p. 29], adapted from[181]

Using the VM map, it is now possible to precisely define what is meant by equivalency of the environment: If both a real and a virtual machine are started in state S1 and S′1 = f(S1) respectively, they are equivalent if S′2 = f(S2) holds when both halt, where S2 and S′2 denote the respective final states.
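The commuting property f(e(S)) = e′(f(S)) can be checked on a toy example in Python. The sketch assumes a strongly reduced state consisting only of the program counter and the R-B register, and a VM map that merely relocates the guest by a hypothetical offset BASE; step and read_base are illustrative instructions.

```python
BASE = 100  # hypothetical offset at which the guest resides in real memory


def f(state):
    """Toy VM map: relocate the guest's memory window by BASE."""
    p, (l, b) = state
    return (p, (l + BASE, b))


def step(state):
    """An innocuous instruction: advance the program counter."""
    p, r = state
    return (p + 1, r)


def read_base(state):
    """A location-dependent instruction: load the relocation part l into p."""
    _, (l, b) = state
    return (l, (l, b))


def commutes(states, e, e_virtual):
    """Check f(e(S)) == e'(f(S)) for all given states."""
    return all(f(e(s)) == e_virtual(f(s)) for s in states)


states = [(p, (0, 8)) for p in range(4)]
print(commutes(states, step, step))            # True
print(commutes(states, read_base, read_base))  # False
```

Executing the innocuous step directly preserves equivalence, whereas executing read_base directly exposes the relocation to the guest, which is why such behavior sensitive instructions must be virtualized.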

2.1.6 Hypervisor

A hypervisor or VMM is a piece of software called the Control Program (CP) by Popek and Goldberg. As already indicated in subsection 2.1.4, a consequence of theorem 1 is that the CP is run in supervisor mode - so it is in full control of the system resources -, and the OS within a VM is entirely executed in user mode.

Thus, all sensitive instructions issued by the latter are privileged and trigger privileged instruction traps which are caught and handled by the CP. Consequently, innocuous instructions of the VM - a statistically dominant subset - are executed directly, while sensitive ones are emulated. Hence, this virtualization technique became known as trap-and-emulate.

The CP is composed of modules falling into the following three groups:

- dispatcher D,
- allocator A and
- interpreter.

By letting the program counter of the PSW stored in E[1] or - when, as envisaged in subsection 2.1.2, a trap table of size x is used - in the respective E[i], where 1 ≤ i ≤ x, point to the first instruction of the dispatcher, the latter acts as the trap handler for the sensitive instructions of the VMs. In order to pass full control of the system and the entire memory to the hypervisor, M = s and R = (0, q − 1) hold in this PSW. Similarly, on exit from the CP, the old PSW saved in E[0] is restored, and with it user mode, the program location before entry to the hypervisor and the R-B register. The dispatcher can be thought of as the top-level control module of the CP, which analyzes the conditions under which the privileged instruction trap occurred and then calls other modules as necessary.

One of these is the allocator A, which manages the system resources and allocates them - hence its name - to the VMs. It is also the task of the allocator to take care of keeping the resources, i. e. memory, of the hypervisor and the VMs separate and to only pass disjoint sets to the guests concurrently. The allocator is typically called by the dispatcher when a VM alters its resources, mainly by modifying the R-B register.

Lastly, the third set of CP modules are the interpreters, which handle the remainder of the trapping instructions, one interpreter per privileged instruction. The purpose of these interpreters is to simulate the effects of the instruction that trapped. Therefore, for each privileged instruction i an instruction sequence v exists, where:

illustration not visible in this excerpt

Thus, {vi}, i = 1 to m, where m is the count of privileged instructions, represents the set of all interpreters. Hence, a CP is specified by its three parts:

CP = 〈D, A, {vi}〉
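The interplay of dispatcher, allocator and interpreters can be sketched as a small Python class. This is a toy model under stated assumptions and not Popek and Goldberg's construction: the opcodes SET_RB and SET_MODE, the dictionary-based VM state and the memory accounting are invented for illustration.

```python
class ControlProgram:
    """Toy control program: dispatcher D, allocator A and interpreters {vi}."""

    def __init__(self, memory_size):
        self.free = memory_size          # memory not yet handed to any VM
        self.vms = {}                    # VM name -> virtual machine state
        # One interpreter routine per remaining privileged instruction.
        self.interpreters = {"SET_MODE": self._set_mode}

    def create_vm(self, name):
        self.vms[name] = {"mode": "s", "bound": 0}

    def dispatcher(self, vm, opcode, operand):
        """Entered on a privileged instruction trap raised by a guest."""
        if opcode == "SET_RB":           # the guest alters its resources
            return self.allocator(vm, operand)
        return self.interpreters[opcode](vm, operand)

    def allocator(self, vm, size):
        """Grant memory only from the free pool, keeping the VMs disjoint."""
        growth = size - self.vms[vm]["bound"]
        if growth > self.free:
            raise MemoryError(f"{vm} requests more memory than is available")
        self.free -= growth
        self.vms[vm]["bound"] = size

    def _set_mode(self, vm, mode):
        # Only the virtual mode changes; the real processor stays in user
        # mode while the guest is running.
        self.vms[vm]["mode"] = mode


cp = ControlProgram(memory_size=16)
cp.create_vm("vm_a")
cp.create_vm("vm_b")
cp.dispatcher("vm_a", "SET_RB", 10)
cp.dispatcher("vm_a", "SET_MODE", "u")
print(cp.free, cp.vms["vm_a"])  # 6 {'mode': 'u', 'bound': 10}
```

A subsequent cp.dispatcher("vm_b", "SET_RB", 10) raises a MemoryError, because only 6 units remain and the allocator never hands out overlapping memory.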

2.1.7 Types of hypervisors

Goldberg distinguishes two types of hypervisors - type I and type II - whose architecture is depicted in figure 2.2 and figure 2.3 respectively.

illustration not visible in this excerpt

Figure 2.2: Type I hypervisor system architecture

Figure 2.3: Type II hypervisor system architecture

A type I hypervisor runs directly on the hardware of the real machine1, which in this context is also called a bare machine, metal and later on also bare metal. In this case, a VMM can be thought of as a minimal OS, which is running on the physical system instead of an OS. As should be clear by now, the hypervisor then is used to host one or more VMs. These guests in turn execute a complete OS with their own set of applications on top of it or - in case of recursive virtualization - again another VMM. At least in the x86 world, type I hypervisors typically are booted instead of an OS [37, p. 17] and optionally provide a minimal text-based console or even management capabilities over a network. However, for computer architectures where virtualization is an inherent part of the platform, the hypervisor may also entirely be part of the respective machines’ firmware. Examples of the latter are implementations of the Sun Microsystems, Inc. sun4v architecture[222].

By contrast, type II hypervisors are applications running atop a conventional OS directly hosted by hardware and thus on the third level. That latter combination of OS and machine generally is referred to as an extended host [72, p. 24]. Again, recursive virtualization is possible in this case. However, depending on the host OS, virtualization technique used, implementation details of the VMM etc., a hypervisor may require special interfaces to the underlying OS in order to be able to fulfill its tasks such as resource allocation (see subsection 3.1.3). In practice, type II VMMs nowadays typically are feature-rich in terms of providing a Graphical User Interface (GUI) and additional comfort functions. A by now classical example for such a hypervisor is VMware Workstation, available on the market since 1999[228], or its “light” variant VMware Player. Yet, still today - based on the exact purpose - type II VMMs do not necessarily have a GUI. The bhyve hypervisor[167] is an example for the latter class of VMMs only fitted with a Command-Line Interface (CLI).

The choice of the type of hypervisor is mainly a question of the intended application.

For the occasional user with a PC or workstation, a type II VMM generally is more appropriate, while for the permanent use on a mainframe or server, bare metal virtualization is adequate. The latter is due to the fact that the hosting OS may pose unnecessary overhead and is potentially more error-prone than a hypervisor being lightweight in nature.

Independently of the virtualization technique (see subsections 3.1.1 and 3.1.2 for two alternatives to the trap-and-emulate approach), this categorization of hypervisors into type I and II is applied.

2.1.8 Hybrid virtual machines

As it turned out, only few third generation machines met the virtualization requirement of theorem 1 [181]. Therefore, Popek and Goldberg specified a more general but less efficient form of virtualization labeled a Hybrid Virtual Machine (HVM) system. An HVM operates similarly to a VMM, with the difference that all of virtual supervisor mode is interpreted, regardless of whether its instructions are sensitive or not. Thus, compared to a VMM, an HVM monitor interprets more instructions instead of executing them directly. Still, this approach performs considerably better than a CSIM.

For the formal definition of an HVM, the class of sensitive instructions is subdivided into the not necessarily disjoint sets of user sensitive and supervisor sensitive instructions. An instruction i is user sensitive if there exists a state S = 〈E, u, P, R〉 for which i is either control or behavior sensitive. Obviously, this includes instructions used to switch from user to supervisor mode. Likewise, an instruction i is supervisor sensitive if there exists a state S = 〈E, s, P, R〉 for which i is either control or behavior sensitive.

This addition allowed Popek and Goldberg to state their third theorem:

Theorem 3. “A hybrid virtual machine monitor may be constructed for any conventional third generation machine in which the set of user sensitive instructions is a subset of the set of privileged instructions.”

This is a relaxed version of theorem 1 because only user sensitive instructions are considered. Specifically, no requirements are imposed on the behavior of supervisor sensitive instructions, as these are always interpreted. Therefore, the HVM monitor is in absolute control of virtual supervisor mode, and instructions sensitive in user mode are trapped and emulated as before. Consequently, equivalence between an HVM monitor and a hypervisor can be shown. As will be explained in more detail in subsection 3.1.1, however, the original x86 ISA fails to fulfill even the relaxed virtualization requirements of the HVM approach.
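The difference between theorem 1 and theorem 3 boils down to which set of instructions has to be contained in the privileged set. The following minimal sketch checks both conditions for a toy ISA. The instruction names, their classification and the helper functions are hypothetical and serve for illustration only; they do not describe any real processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    privileged: bool            # traps when executed in user mode
    user_sensitive: bool        # control or behavior sensitive in some user-mode state
    supervisor_sensitive: bool  # control or behavior sensitive in some supervisor-mode state

    @property
    def sensitive(self) -> bool:
        return self.user_sensitive or self.supervisor_sensitive


def satisfies_theorem_1(isa):
    """All sensitive instructions must be privileged (requirement for a VMM)."""
    return all(i.privileged for i in isa if i.sensitive)


def satisfies_theorem_3(isa):
    """Only the user sensitive instructions must be privileged (requirement for an HVM)."""
    return all(i.privileged for i in isa if i.user_sensitive)


# Hypothetical toy ISA, made up for this example.
TOY_ISA = [
    Instruction("add", privileged=False, user_sensitive=False, supervisor_sensitive=False),
    Instruction("load_relocation", privileged=True, user_sensitive=True, supervisor_sensitive=True),
    # Sensitive in supervisor mode only, but it does not trap: this violates
    # theorem 1, yet is harmless for an HVM, which interprets all of virtual
    # supervisor mode anyway.
    Instruction("return_to_user", privileged=False, user_sensitive=False, supervisor_sensitive=True),
]

vmm_possible = satisfies_theorem_1(TOY_ISA)
hvm_possible = satisfies_theorem_3(TOY_ISA)
```

In this toy ISA, only the HVM condition holds. Adding a single unprivileged instruction that is sensitive in user mode would break that condition as well.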

2.1.9 Implementation details

Figure 2.4: CPU protection ring usage on i386 without (left) and with (right) virtualization, adapted from [37, p. 17]

This subsection is intended as a sidenote regarding the implementation details of privilege separation with real processors. Several ISAs have the notion of protection rings, a concept that has its origins in the MULTICS (Multiplexed Information and Computing Service) OS [205, p. 17]. Protection rings are a mechanism of hierarchical protection domains, with ring 0 being the innermost and most privileged one and the level of privilege decreasing outwards with every additional ring. Thus, code executing in ring 0 can access all system resources and issue privileged instructions, neither of which typically is permitted in the outer rings. The i386 ISA has four such rings and - as illustrated on the left side of figure 2.4 - an extended machine is run with the kernel in ring 0 and userland in ring 3 [20, p. 17]. As explained in the above subsections, with virtualization the hypervisor now has the most privileged role. Consequently, as can be seen on the right half of figure 2.4, on an i386 machine type I hypervisors are executed in ring 0, guest kernels in ring 1 and virtual userland still in ring 3 [20, p. 37].

Figure 2.5: CPU protection ring usage on a two-ring processor without (left) and with (right) virtualization, adapted from [37, p. 18]

However, not every state-of-the-art ISA necessarily has more than two protection rings - this is even true for AMD64 [37, p. 16] - and some use other means of privilege separation, such as a two-stage execution context, in the first place. In that case, the privilege separation without virtualization, depicted on the left of figure 2.5, is obvious. As can be seen on the right of that figure, virtualizing such an ISA requires all of a VM to share the unprivileged ring. Therefore, separation of the virtualized kernel and userland has to be achieved by paging or other means; for AMD64, this takes the place of segmentation [20, p. 38]. This technique of running an entire VM in a single ring is known as ring compression [133]. An additional downside of ring compression is that it renders self-virtualization impossible [136].
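The ring assignments described above can be summarized in a small sketch. The function names and the dictionary layout are assumptions made for illustration; the assignment itself follows the text, with the hypervisor in ring 0, the guest kernel in ring 1 and userland in the outermost ring.

```python
def assign_rings(num_rings, virtualized):
    """Return the ring each software layer runs in (0 = most privileged)."""
    if num_rings < 2:
        raise ValueError("at least two protection rings are required")
    outermost = num_rings - 1
    if not virtualized:
        return {"kernel": 0, "userland": outermost}
    # The hypervisor takes over ring 0, so the guest kernel moves out to ring 1.
    # On a two-ring processor, ring 1 already is the outermost ring.
    return {"hypervisor": 0, "guest kernel": 1, "guest userland": outermost}


def is_ring_compressed(layout):
    """True if guest kernel and guest userland have to share one ring."""
    guest_kernel = layout.get("guest kernel")
    return guest_kernel is not None and guest_kernel == layout["guest userland"]


i386_layout = assign_rings(4, virtualized=True)      # as in figure 2.4, right
two_ring_layout = assign_rings(2, virtualized=True)  # as in figure 2.5, right
```

For the two-ring case, `is_ring_compressed` reports that the guest kernel and userland end up in the same ring, which is exactly the situation where paging or other means have to provide the separation.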

2.1.10 Process and resource maps

So far, only the virtualization of the ISA has been considered, but neither that of memory nor of resources in general. In the context of what were then future fourth generation architectures, Goldberg gave advice on how to virtualize resources, primarily memory [72, p. 87 ff.]. Since I/O devices, for instance, may be memory mapped, this model is in principle also applicable to resources of other types. He introduced two maps, termed φ-map and f-map, for this approach.

One φ-map or process map [72, p. 88 ff.] exists for each VM, is visible to the guest and can be modified by it. This table is used to translate process names to resource names.

Herein, “process” and “resource” are abstract notions. For a processor with an MMU - whose main task is to translate virtual memory addresses to physical ones - the φ-map corresponds to the page table. Although Goldberg never mentioned MMUs explicitly in his publications regarding virtualization, it becomes fairly clear what he had in mind when inventing this mechanism.

The second map is the f-map or resource map [72, p. 92 ff.], which translates resource names into real, i. e. physical, resources on the host. Unlike φ-maps, f-maps are only accessible by the hypervisor. In the context of virtualizing MMU-based architectures, f-maps later on became known as shadow page tables [133].

Putting φ-maps and f-maps together, a VM uses its φ-map to translate its virtual resources, i. e. addresses, into resources that are pseudo-physical to the hypervisor, which in turn uses an f-map to translate these into physical resources on the real machine. In other words, resources of a VM are virtualized using the new map f ◦ φ, and recursive virtualization uses f ◦ f ◦ · · · ◦ f ◦ φ with one f-map per level of virtualization [72, p. 95 ff.].
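The composition of the two maps can be illustrated with plain lookup tables. This is a minimal sketch that assumes page-granular maps represented as Python dictionaries; the page numbers and the helper `compose` are made up for illustration and are not taken from Goldberg's work.

```python
def compose(*maps):
    """Build the composition of lookup tables, applied from right to left.

    compose(f, phi)(x) evaluates f[phi[x]], mirroring the notation f ◦ φ.
    A missing entry raises KeyError, which stands in for a fault here.
    """
    def lookup(name):
        for table in reversed(maps):
            name = table[name]
        return name
    return lookup


# φ-map of one guest: virtual page -> pseudo-physical page (guest-visible).
phi = {0x0: 0x4, 0x1: 0x7}
# f-map of the hypervisor: pseudo-physical page -> physical page (hypervisor-only).
f = {0x4: 0x12, 0x7: 0x30}

translate = compose(f, phi)
physical_page = translate(0x1)  # f[phi[0x1]] = f[0x7] = 0x30

# Recursive virtualization: one additional f-map per level.
f_inner = {0x4: 0x9, 0x7: 0x2}    # maintained by the nested hypervisor
f_outer = {0x9: 0x12, 0x2: 0x30}  # maintained by the hypervisor on the real machine
translate_nested = compose(f_outer, f_inner, phi)
```

A guest only ever sees and modifies `phi`; every `f` table stays under the control of the respective hypervisor, so the guest cannot reach physical pages that are not mapped for it.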

While an f-map can be implemented in software, Goldberg described this approach as “unnatural and unnecessary” [72, p. 114]. Thus, he envisioned the HV mentioned in the beginning of this section, which provides a hardware implementation of the f-map and dynamic composition of the f-map and φ-map at execution time [72, p. 118 ff.]. As also already noted, HVs became reality¹ with the addition of hardware virtualization support to CPUs, which will be detailed further in section 2.5.

2.2 I/O virtualization

Considerable parts of this section are based on P. A. Karger’s and D. R. Safford’s 2008 article “I/O for Virtual Machine Monitors: Security and Performance Issues”[136].

Goldberg et al. only ever specified processor virtualization and - as already noted in subsection 2.1.1 regarding the machine model used - neglected I/O virtualization.

This is unfortunate, given that in general, virtualizing I/O is more difficult than CPU virtualization.

On the other hand, at least with mainframes, I/O virtualization is rather straightforward. This is due to the fact that mainframes have I/O “helpers” called channel programs. A “channel” essentially is a special-purpose stored-program computer optimized for high-performance I/O. Mainframes typically have many of these, including for both disk and network I/O. From the OS’s point of view, there is a single interface to all channels via the privileged SIO (Start I/O) instruction. Independently of whether a computer architecture uses channels or not, the I/O operation overhead for userland typically consists of two context switches¹: one from user to supervisor mode and - once the I/O has finished - one back again.

Virtualizing mainframe I/O, thus, is as simple as virtualizing the special SIO instruction. However, although the hypervisor basically only has to do some bounds and sanity checking on these accesses, this approach already comes at the significant performance cost of doubling the number of context switches necessary - now from user to supervisor, then from supervisor to hypervisor and back to user via supervisor mode again. While performing several I/O operations per SIO trap can mitigate this effect to some extent, this essentially is why virtualization generally performs best on compute-bound workloads - processor virtualization is simple and associated with minimal overhead - and worse on I/O-bound ones.
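The effect of the additional context switches, and of batching several I/O operations per SIO trap, can be made concrete with a deliberately simplified cost model. The function below is an assumption made for illustration: it only counts mode transitions as described above (two per SIO natively, four when virtualized) and ignores the actual cost of each switch.

```python
def context_switches(io_operations, virtualized, ops_per_sio=1):
    """Count mode transitions caused by a number of I/O operations.

    Simplified model: every SIO invocation costs 2 context switches on the
    bare machine (user -> supervisor -> user) and 4 when virtualized
    (user -> supervisor -> hypervisor -> supervisor -> user).
    """
    if io_operations < 0 or ops_per_sio < 1:
        raise ValueError("invalid operation count or batch size")
    sio_invocations = -(-io_operations // ops_per_sio)  # ceiling division
    switches_per_sio = 4 if virtualized else 2
    return sio_invocations * switches_per_sio


native = context_switches(8, virtualized=False)                 # 8 * 2
virtual = context_switches(8, virtualized=True)                 # 8 * 4
batched = context_switches(8, virtualized=True, ops_per_sio=4)  # 2 * 4
```

In this model, batching four operations per trap brings the virtualized case below the unbatched native one, which shows why amortizing traps is attractive even though it cannot remove the extra transitions per trap.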

With any computer architecture that came after mainframes, this situation became more complicated and - depending on the approach taken - comes with a performance degradation of up to one order of magnitude. The former is due to the fact that helpers in the form of channel programs are no longer used² and the OS, i. e.


1 The term “x86” might be ambiguous. In this thesis, the established convention of letting it refer to both the families of 32-bit and 64-bit processors from AMD and Intel® based on the Intel® 8086 ISA as well as compatible ones is used. The former class is typically alluded to as i386, among others. For disambiguation with the CPU of a similar name, the designation Intel® Architecture 32 (IA-32) was also coined for it. The 64-bit enhancement was initially developed by AMD as AMD64 [3] and later renamed as x86-64. Intel® first called their identical incarnation Extended Memory 64 Technology (EM64T) and then Intel® 64. The term “x86” also accounts for the fact that there are also other manufacturers of compatible processors besides AMD and Intel®, for instance Centaur Technology, Inc. By contrast, the Intel® Architecture 64 (IA-64) - also known as Itanium® by the name of the first CPU implementing it - does not use an x86 ISA.

2 Actually, x86 processors could be used to naturally virtualize a 8086 predecessor - in contrast to self-virtualization, a concept generally known as family-virtualization [72, p. 17] - right from the beginning but not themselves [37, p. 9],[1].

3 Reduced Instruction Set Computer.

1 As will be detailed below, depending on the actual use within a car - for example drive train or infotainment -, the physical environments and platforms of ECUs still vary considerably.

2 When reading Goldberg’s Ph.D. thesis [72], it becomes evident that at that time processors certainly already had hardware support for protecting memory, but MMUs were only on the horizon.

1 Offene Systeme und deren Schnittstellen für die Elektronik im Kraftfahrzeug/Vehicle Distributed Executive.

2 Userland comprises all parts of an OS that are not the kernel and, thus, run in user mode [227, p. 1 f.].

1 In case of recursive virtualization, a VM becomes the “real” machine for the next level of virtualization and so forth.

1 Actually, it is likely that mainframes had HVs well before the other, now more widely deployed processor architectures did. However, no explicit references to this effect could be found in due time and, given that mainframes are not the topic of this thesis, that question was not pursued further.

1 A context switch involves saving one set of CPU registers and restoring another one, cache flushing and any other operation required for changing the CPU processing from one process to another.

2 In modern computers, including embedded systems, there often are offload engines of some kind [127, p. 11], which will be dealt with later on to an extent. However, regarding I/O virtualization, these are of no advantage. On the contrary, offload hardware often complicates virtualization even more and/or has a negative impact on performance [34].


Institution: Fachhochschule Regensburg – Fakultät Elektro- und Informationstechnik
Keywords: Virtualization, Hypervisor, VMM, Embedded Systems, Reliability, Automotive Electronics



