Date: Mon, 25 Jul 2022 21:42:12 +0800
From: Chao Peng
To: David Hildenbrand
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
	linux-kselftest@vger.kernel.org, Paolo Bonzini, Jonathan Corbet,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton,
	"J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport,
	Steven Price, "Maciej S . Szmigiero", Vlastimil Babka,
	Vishal Annapurve, Yu Zhang, "Kirill A . Shutemov", luto@kernel.org,
	jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com,
	aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret, Michael Roth, mhocko@suse.com, Muchun Song
Subject: Re: [PATCH v7 01/14] mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd
Message-ID: <20220725134212.GB304216@chaop.bj.intel.com>
Reply-To: Chao Peng
References: <20220706082016.2603916-1-chao.p.peng@linux.intel.com>
	<20220706082016.2603916-2-chao.p.peng@linux.intel.com>

On Thu, Jul 21, 2022 at 11:44:11AM +0200, David Hildenbrand wrote:
> On 06.07.22 10:20, Chao Peng wrote:
> > Normally, a write to unallocated space of a file or the hole of a sparse
> > file automatically causes space allocation, for memfd, this equals to
> > memory allocation. This new seal prevents such automatically allocating,
> > either this is from a direct write() or a write on the previously
> > mmap-ed area. The seal does not prevent fallocate() so an explicit
> > fallocate() can still cause allocating and can be used to reserve
> > memory.
> >
> > This is used to prevent unintentional allocation from userspace on a
> > stray or careless write and any intentional allocation should use an
> > explicit fallocate(). One of the main usecases is to avoid memory double
> > allocation for confidential computing usage where we use two memfds to
> > back guest memory and at a single point only one memfd is alive and we
> > want to prevent memory allocation for the other memfd which may have
> > been mmap-ed previously.
> > More discussion can be found at:
> >
> >   https://lkml.org/lkml/2022/6/14/1255
> >
> > Suggested-by: Sean Christopherson
> > Signed-off-by: Chao Peng
> > ---
> >  include/uapi/linux/fcntl.h |  1 +
> >  mm/memfd.c                 |  3 ++-
> >  mm/shmem.c                 | 16 ++++++++++++++--
> >  3 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 2f86b2ad6d7e..98bdabc8e309 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -43,6 +43,7 @@
> >  #define F_SEAL_GROW 0x0004 /* prevent file from growing */
> >  #define F_SEAL_WRITE 0x0008 /* prevent writes */
> >  #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
> > +#define F_SEAL_AUTO_ALLOCATE 0x0020 /* prevent allocation for writes */
>
> Why only "on writes" and not "on reads". IIRC, shmem doesn't support the
> shared zeropage, so you'll simply allocate a new page via read() or on
> read faults.

Right, it also prevents read faults.

>
>
> Also, I *think* you can place pages via userfaultfd into shmem. Not sure
> if that would count "auto alloc", but it would certainly bypass fallocate().

Userfaultfd sounds interesting, I will investigate it further. But from a
rough look, it seems to fault to userspace only on read/write page
faults, not on write()? It also seems to operate on VMAs, and
userfaultfd_register() takes mmap_lock, which is what we want to avoid
given the frequent register/unregister during private/shared memory
conversion.

Chao
>
> --
> Thanks,
>
> David / dhildenb
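
For readers following the thread, here is a minimal userspace sketch of the two allocation paths the patch description contrasts: implicit allocation by writing into a hole of a memfd, and explicit reservation with fallocate(). It uses only existing interfaces (memfd_create(), ftruncate(), pwrite(), fallocate(), fstat()). It deliberately does not use F_SEAL_AUTO_ALLOCATE, which exists only in this patch series. The helper names (backing_blocks, make_sparse_memfd, touch_by_write, reserve_range) are made up for illustration and are not part of the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() and fallocate() */
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 512-byte blocks currently backing fd, or -1 on error. */
long long backing_blocks(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return (long long)st.st_blocks;
}

/* Hypothetical helper: create a memfd that is one big hole of `size` bytes. */
int make_sparse_memfd(off_t size)
{
	int fd = memfd_create("auto-alloc-demo", MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Implicit path: writing one byte into a hole makes the kernel allocate
 * backing memory as a side effect. Per the patch description, this is the
 * path the proposed seal is meant to block.
 */
int touch_by_write(int fd, off_t off)
{
	char byte = 1;

	return pwrite(fd, &byte, 1, off) == 1 ? 0 : -1;
}

/*
 * Explicit path: reserve backing memory up front. Per the patch
 * description, the proposed seal leaves this path working.
 */
int reserve_range(int fd, off_t off, off_t len)
{
	return fallocate(fd, 0, off, len);
}
```

Calling backing_blocks() before and after touch_by_write() or reserve_range() on a fresh make_sparse_memfd() descriptor shows how much memory each path pulled in. With the seal from this series applied, the intent is that only reserve_range() would still be able to grow that number.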