Date: Mon, 23 May 2022 15:22:32 +0000
From: Sean Christopherson
To: Chao Peng
Cc: Andy Lutomirski, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
 linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
 Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 "H. Peter Anvin", Hugh Dickins, Jeff Layton, "J. Bruce Fields",
 Andrew Morton, Mike Rapoport, Steven Price, "Maciej S. Szmigiero",
 Vlastimil Babka, Vishal Annapurve, Yu Zhang, "Kirill A. Shutemov",
 jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
 david@redhat.com, aarcange@redhat.com, ddutile@redhat.com,
 dhildenb@redhat.com, Quentin Perret, Michael Roth, mhocko@suse.com
Subject: Re: [PATCH v6 4/8] KVM: Extend the memslot to support fd-based private memory
References: <20220519153713.819591-1-chao.p.peng@linux.intel.com>
 <20220519153713.819591-5-chao.p.peng@linux.intel.com>
 <8840b360-cdb2-244c-bfb6-9a0e7306c188@kernel.org>
 <20220523132154.GA947536@chaop.bj.intel.com>
In-Reply-To: <20220523132154.GA947536@chaop.bj.intel.com>

On Mon, May 23, 2022, Chao Peng wrote:
> On Fri, May 20, 2022 at 06:31:02PM +0000, Sean Christopherson wrote:
> > On Fri, May 20, 2022, Andy Lutomirski wrote:
> > > The alternative would be to have some kind of separate table or bitmap (part
> > > of the memslot?) that tells KVM whether a GPA should map to the fd.
> > >
> > > What do you all think?
> >
> > My original proposal was to have explicit shared vs. private memslots, and punch
> > holes in KVM's memslots on conversion, but due to the way KVM (and userspace)
> > handle memslot updates, conversions would be painfully slow. That's how we ended
> > up with the current proposal.
> >
> > But a dedicated KVM ioctl() to add/remove shared ranges would be easy to implement
> > and wouldn't necessarily even need to interact with the memslots. It could be a
> > consumer of memslots, e.g. if we wanted to disallow registering regions without an
> > associated memslot, but I think we'd want to avoid even that because things will
> > get messy during memslot updates, e.g. if dirty logging is toggled or a shared
> > memory region is temporarily removed then we wouldn't want to destroy the tracking.
>
> Even if we don't tie that to memslots, that info can only be effective
> for private memslots, right? Applying this ioctl to memory ranges defined
> in traditional non-private memslots just makes no sense; I guess we can
> comment on that in the API document.

Hrm, applying it universally would be funky, e.g. emulated MMIO would need to be
declared "shared". But applying it selectively would arguably be worse, e.g.
letting userspace map memory into the guest as shared for a region that's
registered as private...

One option to that mess would be to make memory shared by default, and so
userspace must declare regions that are private. Then there's no weirdness with
emulated MMIO or "legacy" memslots.

On page fault, KVM does a lookup to see if the GPA is shared or private. If the
GPA is private, but there is no memslot or the memslot doesn't have a private fd,
KVM exits to userspace. If there's a memslot with a private fd, the
shared/private flag is used to resolve the

And to handle the ioctl(), KVM can use kvm_zap_gfn_range(), which will bump the
notifier sequence, i.e. force the page fault to retry if the GPA may have been
(un)registered between checking the type and acquiring mmu_lock.

> > I don't think we'd want to use a bitmap, e.g. for a well-behaved guest, XArray
> > should be far more efficient.
>
> What about a misbehaved guest? I don't want to design for the worst
> case, but people may raise concerns about attacks from such a guest.
That's why cgroups exist. E.g. a malicious/broken L1 can similarly abuse nested
EPT/NPT to generate a large number of shadow page tables.

> > One benefit to explicitly tracking this in KVM is that it might be useful for
> > software-only protected VMs, e.g. KVM could mark a region in the XArray as "pending"
> > based on guest hypercalls to share/unshare memory, and then complete the transaction
> > when userspace invokes the ioctl() to complete the share/unshare.
>
> OK, then this can be another field of states/flags/attributes. Let me
> dig into some of the details:
>
> First, introduce the KVM ioctl below:
>
>   KVM_SET_MEMORY_ATTR

Actually, if the semantics are that userspace declares memory as private, then we
can reuse KVM_MEMORY_ENCRYPT_REG_REGION and KVM_MEMORY_ENCRYPT_UNREG_REGION. It'd
be a little gross because we'd need to slightly redefine the semantics for TDX,
SNP, and software-protected VM types, e.g. the ioctls() currently require a
pre-existing memslot. But I think it'd work...

I'll think more on this...
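For illustration only, the shared-by-default scheme discussed in this thread can be modelled in userspace C. This is a hedged sketch under explicit assumptions, not KVM code and not a proposed API: every name (`model_vm`, `register_private()`, `resolve_fault()`, `fault_begin()`, `fault_commit()`, `handle_fault()`) is hypothetical; a flat array of ranges stands in for the XArray; `zap_seq` stands in for the MMU notifier sequence that kvm_zap_gfn_range() bumps; and mapping a private GFN from the slot's private fd is an assumption, since the mail leaves that sentence unfinished. Unregistration is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: every name below is hypothetical, not KVM code. */
typedef uint64_t gfn_t;

struct model_slot {
	gfn_t base_gfn;
	uint64_t npages;
	bool has_private_fd;	/* slot is backed by a private fd */
};

#define MODEL_MAX_PRIVATE 16

struct model_vm {
	const struct model_slot *slots;
	int nr_slots;
	/* Registered private ranges [start, end); stands in for an XArray. */
	gfn_t priv_start[MODEL_MAX_PRIVATE];
	gfn_t priv_end[MODEL_MAX_PRIVATE];
	int nr_priv;
	unsigned long zap_seq;	/* stands in for the MMU notifier sequence */
};

enum fault_action { FAULT_SHARED, FAULT_PRIVATE, FAULT_EXIT_USERSPACE };

/* Shared by default: a GFN is private only if userspace registered it. */
bool gfn_is_private(const struct model_vm *vm, gfn_t gfn)
{
	for (int i = 0; i < vm->nr_priv; i++)
		if (gfn >= vm->priv_start[i] && gfn < vm->priv_end[i])
			return true;
	return false;
}

const struct model_slot *find_slot(const struct model_vm *vm, gfn_t gfn)
{
	for (int i = 0; i < vm->nr_slots; i++) {
		const struct model_slot *s = &vm->slots[i];

		if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
			return s;
	}
	return NULL;
}

/* Classify a fault; mapping private GFNs from the private fd is assumed. */
enum fault_action resolve_fault(const struct model_vm *vm, gfn_t gfn)
{
	const struct model_slot *slot;

	if (!gfn_is_private(vm, gfn))
		return FAULT_SHARED;	/* normal path, including emulated MMIO */
	slot = find_slot(vm, gfn);
	if (!slot || !slot->has_private_fd)
		return FAULT_EXIT_USERSPACE;
	return FAULT_PRIVATE;
}

/* Model of the register ioctl: record the range, then "zap" it. */
bool register_private(struct model_vm *vm, gfn_t start, gfn_t end)
{
	assert(start < end);	/* model precondition */
	if (vm->nr_priv == MODEL_MAX_PRIVATE)
		return false;
	vm->priv_start[vm->nr_priv] = start;
	vm->priv_end[vm->nr_priv] = end;
	vm->nr_priv++;
	vm->zap_seq++;	/* like kvm_zap_gfn_range() bumping the sequence */
	return true;
}

unsigned long fault_begin(const struct model_vm *vm)
{
	return vm->zap_seq;	/* snapshot before looking up the type */
}

/* Recheck under the (modelled) mmu_lock; false means retry the fault. */
bool fault_commit(const struct model_vm *vm, unsigned long seq)
{
	return seq == vm->zap_seq;
}

enum fault_action handle_fault(const struct model_vm *vm, gfn_t gfn)
{
	unsigned long seq;
	enum fault_action act;

	do {
		seq = fault_begin(vm);
		act = resolve_fault(vm, gfn);
	} while (!fault_commit(vm, seq));
	return act;
}
```

In this model, a fault that snapshots the sequence and then races with `register_private()` fails `fault_commit()`, so the retry re-resolves the GFN against the updated ranges. A private GFN with no memslot, or whose memslot has no private fd, yields `FAULT_EXIT_USERSPACE`, while anything never registered stays on the shared path.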