From: Vishal Annapurve
Date: Mon, 6 Jun 2022 13:09:50 -0700
Subject: Re: [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory
To: Chao Peng
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
 linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini,
 Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
 "J . Bruce Fields", Andrew Morton, Mike Rapoport, Steven Price,
 "Maciej S . Szmigiero", Vlastimil Babka, Yu Zhang, "Kirill A . Shutemov",
 Andy Lutomirski, Jun Nakajima, dave.hansen@intel.com, ak@linux.intel.com,
 david@redhat.com, aarcange@redhat.com, ddutile@redhat.com,
 dhildenb@redhat.com, Quentin Perret, Michael Roth, mhocko@suse.com
References: <20220519153713.819591-1-chao.p.peng@linux.intel.com>
In-Reply-To: <20220519153713.819591-1-chao.p.peng@linux.intel.com>

>
> Private memory map/unmap and conversion
> ---------------------------------------
> Userspace's map/unmap operations are done by fallocate() ioctl on the
> backing store fd.
> - map: default fallocate() with mode=0.
> - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
> The map/unmap will trigger above memfile_notifier_ops to let KVM map/unmap
> secondary MMU page tables.
>
....
> QEMU: https://github.com/chao-p/qemu/tree/privmem-v6
>
> An example QEMU command line for TDX test:
> -object tdx-guest,id=tdx \
> -object memory-backend-memfd-private,id=ram1,size=2G \
> -machine q35,kvm-type=tdx,pic=no,kernel_irqchip=split,memory-encryption=tdx,memory-backend=ram1
>

There should be more discussion around double allocation scenarios
when using the private fd approach. A malicious guest or buggy
userspace VMM can cause physical memory to be allocated for both the
shared (memory accessible from host) and private fds backing the guest
memory.
The userspace VMM will need to unback the shared guest memory while
handling the conversion from shared to private, in order to prevent
double allocation even with malicious guests or bugs in the userspace
VMM.

Options to unback shared guest memory seem to be:
1) madvise(.., MADV_DONTNEED/MADV_REMOVE) - This option won't stop the
kernel from backing the shared memory again on subsequent write
accesses.
2) fallocate(..., FALLOC_FL_PUNCH_HOLE...) - For file-backed shared
guest memory, this option is similar to madvise, since it would still
allow shared memory to get backed on write accesses.
3) munmap - This would give up the contiguous virtual memory region
reservation and leave holes in the guest backing memory, which might
make guest memory management difficult.
4) mprotect(... PROT_NONE) - This would keep the virtual memory
address range backing the guest memory preserved.

ram_block_discard_range_fd from the reference implementation
(https://github.com/chao-p/qemu/tree/privmem-v6) seems to be relying on
fallocate/madvise.

Any thoughts/suggestions around better ways to unback the shared
memory in order to avoid double allocation scenarios?

Regards,
Vishal
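
For illustration, the difference between options 2 and 4 above can be
sketched as follows. This is a minimal sketch, not code from the patch
series or from QEMU: the function name demo_rebacking, the helper
backed_blocks and the 4 MiB region size are made up for the example,
and it assumes Linux with memfd-backed (tmpfs) shared memory, where
st_blocks tracks the pages currently allocated to the file.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define REGION_SIZE (4UL << 20) /* hypothetical 4 MiB of shared guest memory */

/* Number of 512-byte blocks currently allocated to the file, or -1. */
static long backed_blocks(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? (long)st.st_blocks : -1;
}

/*
 * Fill a shared memfd mapping, punch it out, then touch it again.
 * Reports the allocated block count after each step.
 * Returns 0 on success, -1 on failure.
 */
int demo_rebacking(long *filled, long *punched, long *retouched)
{
    int ret = -1;
    char *p;
    int fd = memfd_create("shared-guest-mem", 0);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, REGION_SIZE) != 0)
        goto out_close;

    p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    memset(p, 0xaa, REGION_SIZE); /* back every page */
    *filled = backed_blocks(fd);

    /* Option 2: frees the pages but leaves the mapping writable. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, REGION_SIZE) != 0)
        goto out_unmap;
    *punched = backed_blocks(fd);

    p[0] = 1; /* a later write through the mapping backs the page again */
    *retouched = backed_blocks(fd);

    /* Option 4: keep the address range reserved, fault on any access. */
    ret = mprotect(p, REGION_SIZE, PROT_NONE);

out_unmap:
    munmap(p, REGION_SIZE);
out_close:
    close(fd);
    return ret;
}
```

Calling demo_rebacking() and comparing the three counts should show
the allocation dropping after the hole punch and growing again after
the single write, which is the re-backing concern with options 1 and 2.
The mprotect() call only revokes access through this one mapping; it
does not free memory by itself, so a VMM would presumably still pair it
with a hole punch.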