From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, HTML_MESSAGE,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6FF23C04AA7 for ; Mon, 13 May 2019 21:42:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1CAE22085A for ; Mon, 13 May 2019 21:42:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1CAE22085A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A69336B0269; Mon, 13 May 2019 17:42:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A193A6B026A; Mon, 13 May 2019 17:42:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 908016B026B; Mon, 13 May 2019 17:42:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id 565C36B0269 for ; Mon, 13 May 2019 17:42:55 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id e20so10417272pfn.8 for ; Mon, 13 May 2019 14:42:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-original-authentication-results:x-gm-message-state:from:to:cc :subject:thread-topic:thread-index:date:message-id:references :in-reply-to:accept-language:content-language:mime-version; bh=4WsEukJt4QL54JGtqy+DlOk3x8JqGQbOVxe7TpPlfcg=; b=Tmb3izpoMuENvJzh5/7QWQzrxJV6EgXHwSA7k1P+dOJOlYPP88+IFn5JgKxBRYaYe6 gwByGHpcyZgKFo0ErDhnNtLxsqU+OX/f2wiWTGr+5przsGVrlPDLH/k6s1W9WpyMkpE6 Mz7B8cm98AidyOdRCIPASaIgc94UxudXbhp8gRUz9y6vsvaXYD50lq06R2Kx17JVcmmG 0HPPGePd+J8pnhAv93YnIa8eW/tZ+XKKOKc2YoD1bMJtpRSdu5H1436fTC65Q6+/U2sw 1TLRT30B5OGEWVYE30aqfFDrocIdZJM8szZ2lBklF2VHK8Q91Ex/wvLQEFXoW0NtUIZZ Wkkw== X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of jun.nakajima@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=jun.nakajima@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Gm-Message-State: APjAAAWxsktIzrahCCvthWX8620GjBlVjYXJ0z380iBRu7/+e3fXc41j /CNCrXIaLoS7EvwFOGtB2RqcTD7zphZlTXqu6wDJzBJ4Fr45AvAlUl4+4eT8zU8xLSc3wWeHwzT 74GSJj13wtn5+ALESKULClNO8wDC2aF5tzNs6PKcJl/bW+H5c/SkDewebNjeaEU/sqQ== X-Received: by 2002:a63:c50c:: with SMTP id f12mr33954811pgd.71.1557783774853; Mon, 13 May 2019 14:42:54 -0700 (PDT) X-Google-Smtp-Source: APXvYqwoTbDe+wGvvGWhqOxPc3Jno4ZgXobm52xZxYUjjTrrWJW0lyw2W32tkbyymOuyB9L0h9MU X-Received: by 2002:a63:c50c:: with SMTP id f12mr33954775pgd.71.1557783774084; Mon, 13 May 2019 14:42:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1557783774; cv=none; d=google.com; s=arc-20160816; b=ASxA3oK6NZuLKkUr4uv024nRg+ScqgMop+r0BR0JU0gQXDFjh9rbVtUouo0Vit3l7i 4GtfFvwD7C5c7Jh5NNLIfej3/nfUrPdvxlOZHZgyoYpGM6AFO35OSfXHC6Po4ybxlIYO Sxd4bn+7xXSoA9eXsunLO307EolVOLmOaZXsPMJHlCkv78GOKSsexT4uWv59qjgyGj8R kte/oJg/cuJ1AxASOxxyLY0hw1LpmaI8vREY+vvXPJc3jA87Ik2HwlGNDGM7ytVZC27/ CnjkCcUfC6D2YmsRjpvzm1rkMiIP2CyLUf+gObKlKmuUf6nytzZpcrztLNdD1SbeZmdR 0nHQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:content-language:accept-language:in-reply-to :references:message-id:date:thread-index:thread-topic:subject:cc:to :from; bh=4WsEukJt4QL54JGtqy+DlOk3x8JqGQbOVxe7TpPlfcg=; b=r5m0eXYm8rVwWvQ0u3j/qMRmleS7gJt45JorYfo8rPRqdT/pQ8GNNxiL/40XsVf+Te hGt+qJMz2fXiIoc6AAMph60+azkyRRMkugAGh+3CsIuKTzNr2JLupuGVMKrj8KOdHkiN aXLJ1dZZGGer6qZAyfb1Xd3V2umhMEwqwnZSCEJ8b9IbGJDmZne9o+PMdCCHn2sGcKvy OVe6Yaq9sq2hwgXFEp7AcRKTjTx4FDzRjEMvmuhIIZl318hJruqGXGKqauv7HLqJ2SS7 WeESVKSSn7kLLc9M1DxRIUbQMPNKJeqBDgJ9Xhz1B+mhRh/f7F3gJplwWVR143+4lBDc kJOw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of jun.nakajima@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=jun.nakajima@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTPS id 21si7889664pfc.98.2019.05.13.14.42.53 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 May 2019 14:42:54 -0700 (PDT) Received-SPF: pass (google.com: domain of jun.nakajima@intel.com designates 134.134.136.24 as permitted sender) client-ip=134.134.136.24; Authentication-Results: mx.google.com; spf=pass (google.com: domain of jun.nakajima@intel.com designates 134.134.136.24 as permitted sender) smtp.mailfrom=jun.nakajima@intel.com; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 May 2019 14:42:53 -0700 X-ExtLoop1: 1 Received: from fmsmsx103.amr.corp.intel.com ([10.18.124.201]) by fmsmga005.fm.intel.com with ESMTP; 13 May 2019 14:42:53 -0700 Received: from fmsmsx101.amr.corp.intel.com ([169.254.1.164]) by FMSMSX103.amr.corp.intel.com ([169.254.2.58]) with mapi id 14.03.0415.000; Mon, 13 May 2019 14:42:53 -0700 From: "Nakajima, Jun" To: Liran Alon CC: Alexandre Chartre , "pbonzini@redhat.com" , "rkrcmar@redhat.com" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "hpa@zytor.com" , "dave.hansen@linux.intel.com" , "luto@kernel.org" , "peterz@infradead.org" , "kvm@vger.kernel.org" , "x86@kernel.org" , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , "konrad.wilk@oracle.com" , "jan.setjeeilers@oracle.com" , "jwadams@google.com" Subject: Re: [RFC KVM 00/27] KVM Address Space Isolation Thread-Topic: [RFC KVM 00/27] KVM Address Space Isolation Thread-Index: AQHVCZo0+3UN0P2FyECwVvtVyp68J6ZpcaEAgACSxACAAAdfgA== Date: Mon, 13 May 2019 21:42:52 +0000 Message-ID: References: <1557758315-12667-1-git-send-email-alexandre.chartre@oracle.com> <11F6D766-EC47-4283-8797-68A1405511B0@intel.com> <46FF68B2-3284-4705-A904-328992449D43@oracle.com> In-Reply-To: <46FF68B2-3284-4705-A904-328992449D43@oracle.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.254.35.195] Content-Type: multipart/alternative; boundary="_000_D07C8F51F2DF4C8BAB3B0DFABD5F4C33intelcom_" MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --_000_D07C8F51F2DF4C8BAB3B0DFABD5F4C33intelcom_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable On May 13, 2019, at 2:16 PM, Liran Alon > wrote: On 13 May 2019, at 22:31, Nakajima, Jun > wrote: On 5/13/19, 7:43 AM, "kvm-owner@vger.kernel.org on behalf of Alexandre Chartre" wrote: Proposal =3D=3D=3D=3D=3D=3D=3D=3D To handle both these points, this series introduce the mechanism of KVM address space isolation. Note that this mechanism completes (a)+(b) and don't contradict. In case this mechanism is also applied, (a)+(b) should still be applied to the full virtual address space as a defence-in-depth)= . The idea is that most of KVM #VMExit handlers code will run in a special KVM isolated address space which maps only KVM required code and per-VM information. Only once KVM needs to architectually access other (sensitiv= e) data, it will switch from KVM isolated address space to full standard host address space. At this point, KVM will also need to kick all sibling hyperthreads to get out of guest (note that kicking all sibling hyperthre= ads is not implemented in this serie). Basically, we will have the following flow: - qemu issues KVM_RUN ioctl - KVM handles the ioctl and calls vcpu_run(): . KVM switches from the kernel address to the KVM address space . KVM transfers control to VM (VMLAUNCH/VMRESUME) . VM returns to KVM . KVM handles VM-Exit: . if handling need full kernel then switch to kernel address space . else continues with KVM address space . KVM loops in vcpu_run() or return - KVM_RUN ioctl returns So, the KVM_RUN core function will mainly be executed using the KVM addre= ss space. The handling of a VM-Exit can require access to the kernel space and, in that case, we will switch back to the kernel address space. Once all sibling hyperthreads are in the host (either using the full kernel= address space or user address space), what happens to the other sibling hy= perthreads if one of them tries to do VM entry? That VCPU will switch to th= e KVM address space prior to VM entry, but others continue to run? Do you t= hink (a) + (b) would be sufficient for that case? The description here is missing and important part: When a hyperthread need= s to switch from KVM isolated address space to kernel full address space, i= t should first kick all sibling hyperthreads outside of guest and only then= safety switch to full kernel address space. Only once all sibling hyperthr= eads are running with KVM isolated address space, it is safe to enter guest= . Okay, it makes sense. So, it will require some synchronization among the si= blings there. The main point of this address space is to avoid kicking all sibling hypert= hreads on *every* VMExit from guest but instead only kick them when switchi= ng address space. The assumption is that the vast majority of exits can be = handled in KVM isolated address space and therefore do not require to kick = the sibling hyperthreads outside of guest. --- Jun Intel Open Source Technology Center --_000_D07C8F51F2DF4C8BAB3B0DFABD5F4C33intelcom_ Content-Type: text/html; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable


On May 13, 2019, at 2:16 PM, Liran Alon <liran.alon@oracle.com> wrote:

On 13 May 2019, at 22:31, Nakajima, Ju= n <jun.nakajima@int= el.com> wrote:

On 5/13/19, 7:43 AM, "kvm-owner@vger.kernel.org on behalf of Alexandre Chartre" = wrote:

  Proposal
  =3D=3D=3D=3D=3D=3D=3D=3D

  To handle both these points, this series introduce the mechanis= m of KVM
  address space isolation. Note that this mechanism completes (a)= +(b) and
  don't contradict. In case this mechanism is also applied, (a)&#= 43;(b) should
  still be applied to the full virtual address space as a defence= -in-depth).

  The idea is that most of KVM #VMExit handlers code will run in = a special
  KVM isolated address space which maps only KVM required code an= d per-VM
  information. Only once KVM needs to architectually access other= (sensitive)
  data, it will switch from KVM isolated address space to full st= andard
  host address space. At this point, KVM will also need to kick a= ll sibling
  hyperthreads to get out of guest (note that kicking all sibling= hyperthreads
  is not implemented in this serie).

  Basically, we will have the following flow:

    - qemu issues KVM_RUN ioctl
    - KVM handles the ioctl and calls vcpu_run():
      . KVM switches from the kernel address = to the KVM address space
      . KVM transfers control to VM (VMLAUNCH= /VMRESUME)
      . VM returns to KVM
      . KVM handles VM-Exit:
        . if handling need full ker= nel then switch to kernel address space
        . else continues with KVM a= ddress space
      . KVM loops in vcpu_run() or return
    - KVM_RUN ioctl returns

  So, the KVM_RUN core function will mainly be executed using the= KVM address
  space. The handling of a VM-Exit can require access to the kern= el space
  and, in that case, we will switch back to the kernel address sp= ace.

Once all sibling hyperthreads are in the host (either using the full kernel= address space or user address space), what happens to the other sibling hy= perthreads if one of them tries to do VM entry? That VCPU will switch to th= e KVM address space prior to VM entry, but others continue to run? Do you think (a) + (b) would be suf= ficient for that case?

The description here is missing and important part: When a hyperthread need= s to switch from KVM isolated address space to kernel full address space, i= t should first kick all sibling hyperthreads outside of guest and only then= safety switch to full kernel address space. Only once all sibling hyperthreads are running with KVM isolated ad= dress space, it is safe to enter guest.


Okay, it makes sense. So, it will require some synchronization among t= he siblings there.

The main point of this address space is to avoid kicking al= l sibling hyperthreads on *every* VMExit from guest but instead only kick t= hem when switching address space. The assumption is that the vast majority = of exits can be handled in KVM isolated address space and therefore do not require to kick the sibling hyperthread= s outside of guest.


---
= Jun
= Intel Open Source Technology Center
--_000_D07C8F51F2DF4C8BAB3B0DFABD5F4C33intelcom_--