Date: Fri, 2 Sep 2022 18:27:57 +0800
From: Chao Peng
To: "Kirill A . Shutemov"
Cc: Hugh Dickins, "Kirill A. Shutemov", kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
	linux-kselftest@vger.kernel.org, Paolo Bonzini, Jonathan Corbet,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86@kernel.org, "H . Peter Anvin", Jeff Layton, "J . Bruce Fields",
	Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
	"Maciej S . Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
	luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com,
	ak@linux.intel.com, david@redhat.com, aarcange@redhat.com,
	ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret, Michael Roth,
	mhocko@suse.com, Muchun Song, "Gupta, Pankaj", Elena Reshetova
Subject: Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory
Message-ID: <20220902102757.GB1712673@chaop.bj.intel.com>
Reply-To: Chao Peng
References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>
	<20220818132421.6xmjqduempmxnnu2@box>
	<20220820002700.6yflrxklmpsavdzi@box.shutemov.name>
	<20220831142439.65q2gi4g2d2z4ofh@box.shutemov.name>
In-Reply-To: <20220831142439.65q2gi4g2d2z4ofh@box.shutemov.name>

On Wed, Aug 31, 2022 at 05:24:39PM +0300, Kirill A . Shutemov wrote:
> On Sat, Aug 20, 2022 at 10:15:32PM -0700, Hugh Dickins wrote:
> > > I will try next week to rework it as shim to top of shmem. Does it work
> > > for you?
> >
> > Yes, please do, thanks. It's a compromise between us: the initial TDX
> > case has no justification to use shmem at all, but doing it that way
> > will help you with some of the infrastructure, and will probably be
> > easiest for KVM to extend to other more relaxed fd cases later.
>
> Okay, below is my take on the shim approach.
>
> I don't hate how it turned out. It is easier to understand without
> callback exchange thing.
>
> The only caveat is I had to introduce external lock to protect against
> race between lookup and truncate. Otherwise, looks pretty reasonable to me.
>
> I did very limited testing. And it lacks integration with KVM, but API
> changed not substantially, any it should be easy to adopt.

I have integrated this patch with the other KVM patches and verified that
it works well in a TDX environment, with the minor fix below.

>
> Any comments?
> ...
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 08f5f8304746..1853a90f49ff 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -261,7 +261,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg)
>  #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
>  #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
>  
> -#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
> +#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
> +		       MFD_INACCESSIBLE)
>  
>  SYSCALL_DEFINE2(memfd_create,
>  		const char __user *, uname,
> @@ -283,6 +284,14 @@ SYSCALL_DEFINE2(memfd_create,
>  			return -EINVAL;
>  	}
>  
> +	/* Disallow sealing when MFD_INACCESSIBLE is set. */
> +	if ((flags & MFD_INACCESSIBLE) && (flags & MFD_ALLOW_SEALING))
> +		return -EINVAL;
> +
> +	/* TODO: add hugetlb support */
> +	if ((flags & MFD_INACCESSIBLE) && (flags & MFD_HUGETLB))
> +		return -EINVAL;
> +
>  	/* length includes terminating zero */
>  	len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
>  	if (len <= 0)
> @@ -331,10 +340,24 @@ SYSCALL_DEFINE2(memfd_create,
>  		*file_seals &= ~F_SEAL_SEAL;
>  	}
>  
> +	if (flags & MFD_INACCESSIBLE) {
> +		struct file *inaccessible_file;
> +
> +		inaccessible_file = memfd_mkinaccessible(file);
> +		if (IS_ERR(inaccessible_file)) {
> +			error = PTR_ERR(inaccessible_file);
> +			goto err_file;
> +		}

The new file should also be marked as O_LARGEFILE; otherwise ftruncate()
will refuse to set an initial size greater than 2^31 on the fd.

+		inaccessible_file->f_flags |= O_LARGEFILE;
+
> +
> +		file = inaccessible_file;
> +	}
> +
>  	fd_install(fd, file);
>  	kfree(name);
>  	return fd;
>  
> +err_file:
> +	fput(file);
>  err_fd:
>  	put_unused_fd(fd);
>  err_name:
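
To illustrate why the flag matters, here is a rough userspace model of the
large-file check that, as far as I understand it, ftruncate() applies. It is
only a sketch: MODEL_O_LARGEFILE, MODEL_MAX_NON_LFS, model_ftruncate_check()
and model_selfcheck() are made-up names for illustration, and the limit of
2^31 - 1 bytes is my assumption rather than something copied from the kernel
source.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins only; these are NOT the kernel's definitions. */
#define MODEL_O_LARGEFILE 0100000u
#define MODEL_MAX_NON_LFS ((1LL << 31) - 1)	/* assumed non-LFS size limit */

/*
 * Model of the size check: returns 0 when a resize to `length` would be
 * accepted for a file opened with `f_flags`, and -EINVAL otherwise.
 */
int model_ftruncate_check(unsigned int f_flags, long long length)
{
	/* A file without the large-file flag is treated as "small". */
	int small = !(f_flags & MODEL_O_LARGEFILE);

	if (length < 0)
		return -EINVAL;
	if (small && length > MODEL_MAX_NON_LFS)
		return -EINVAL;
	return 0;
}

/* Usage: the same 4 GiB resize fails without the flag and passes with it. */
void model_selfcheck(void)
{
	assert(model_ftruncate_check(0, 1LL << 32) == -EINVAL);
	assert(model_ftruncate_check(MODEL_O_LARGEFILE, 1LL << 32) == 0);
}
```

In this model a 4 GiB resize is rejected on a file that lacks the flag and
accepted once the flag is set, which is the behaviour the one-line fix above
is meant to restore for the inaccessible file.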