Thesis abstract
Virtual machine monitors (VMMs), or hypervisors, play a crucial role in the software stack of cloud computing platforms. Their design and implementation significantly impact the performance, security, and robustness of cloud tenants’ applications. Hypervisors classified as Type-I are the most efficient, since they offer stronger isolation and better performance than their Type-II counterparts. In most of today’s Type-I virtualized systems (e.g., Xen or Hyper-V), the hypervisor relies on a privileged virtual machine (pVM). The pVM performs work both for the hypervisor (e.g., VM life cycle management) and for client VMs (I/O management). On uniform and non-uniform memory access (UMA and NUMA) architectures, this pVM-based architecture raises two challenging problems:
• (1) pVM resource sizing (CPU and memory) and placement. Inappropriate sizing and resource placement of the pVM degrade the performance of guest applications. The issue is tricky because the pVM’s needs are tightly correlated with guest activity. Existing solutions either rely on static approaches, which lead to over- or under-provisioning, or do not consider resource placement on NUMA architectures.
• (2) pVM fault tolerance. As a central component, the pVM is critical and has a large blast radius in case of failure. Existing approaches to improving the pVM’s fault tolerance either provide limited resilience guarantees or incur prohibitive overheads.
This dissertation presents several design changes to the pVM, from both architectural and logical perspectives, to tackle these problems. Concretely, this thesis introduces:
1. Closer, a principle for designing a suitable OS for the pVM. Closer consists of scheduling the pVM’s tasks and allocating its memory as close to the target guest as possible. Because Closer is a dynamic approach, it alleviates the need to size the pVM, and its locality strategy handles resource placement on NUMA architectures.
2. Two new mechanisms that reduce the overhead of page flipping (an efficient scheme used in network I/O virtualization) when used on NUMA architectures. By carefully selecting pVM pages for page flipping depending on their location, these mechanisms achieve better performance than the current network virtualization protocol.
3. A set of three design principles (disaggregation, specialization, and pro-activity) and optimized implementation techniques for building a resilient pVM without sacrificing guest application performance.
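As a rough illustration of the locality idea behind Closer (item 1 above), the sketch below places each pVM task on the NUMA node of the guest it serves. It is a minimal sketch under stated assumptions, not the thesis prototype: the names `Guest`, `PvmTask`, and `place_task`, the single-node guest model, and the fallback to a default node are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guest:
    name: str
    numa_node: int  # assumption: the guest's vCPUs and memory sit on one node


@dataclass(frozen=True)
class PvmTask:
    name: str
    target: Optional[Guest]  # None for pVM work not tied to a specific guest


def place_task(task: PvmTask, default_node: int = 0) -> int:
    """Return the NUMA node on which to run the task and allocate its memory."""
    if task.target is None:
        # Hypothetical fallback: guest-independent work stays on a default node.
        return default_node
    # Locality rule: run as close as possible to the guest being served.
    return task.target.numa_node
```

Because the placement is recomputed per task from the guest it targets, no static CPU or memory budget for the pVM appears anywhere in the sketch.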
We build prototypes of pVM-based hypervisors (relying on the Xen hypervisor) that implement all the principles above. We validate the effectiveness of our prototypes through several evaluations with a series of benchmarks. The results show better performance than state-of-the-art approaches, with low overhead.
This dissertation highlights the critical role of the pVM in a virtualized environment and shows that it requires more attention from the research community.
Keywords: Virtualization, NUMA, hypervisor, pVM, sizing, resilience.
The jury committee is composed of:
Mr. Pascal Felber, Professor, University of Neuchâtel, Reviewer
Mr. Willy Zwaenepoel, Professor, University of Sydney, Reviewer
Mr. Marc Shapiro, Professor, Inria, Examiner
Mr. Renaud Lachaize, Assistant Professor, University of Grenoble Alpes, Examiner
Mr. Alain Tchana, Professor, ENS Lyon, Co-advisor
Mr. Noël De Palma, Professor, University of Grenoble Alpes, Advisor