Date: Fri, 2 Dec 2022 14:49:09 +0800
From: Chao Peng
To: Vishal Annapurve
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org, Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price, "Maciej S . Szmigiero", Vlastimil Babka, Yu Zhang, "Kirill A . Shutemov", luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com, Muchun Song, wei.w.wang@intel.com
Subject: Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory
Message-ID: <20221202064909.GA1070297@chaop.bj.intel.com>
References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com> <20221025151344.3784230-2-chao.p.peng@linux.intel.com>

On Thu, Dec 01, 2022 at 06:16:46PM -0800, Vishal Annapurve wrote:
> On Tue, Oct 25, 2022 at 8:18 AM Chao Peng wrote:
> > ...
> > +}
> > +
> > +SYSCALL_DEFINE1(memfd_restricted, unsigned int, flags)
> > +{
>
> Looking at the underlying shmem implementation, there seems to be no
> way to enable transparent huge pages specifically for restricted memfd
> files.
>
> Michael discussed earlier tweaking the
> /sys/kernel/mm/transparent_hugepage/shmem_enabled setting to allow
> hugepages to be used while backing restricted memfd. Such a change
> will affect the rest of the shmem use cases as well. Even setting the
> shmem_enabled policy to "advise" wouldn't help unless file-based
> advice for hugepage allocation is implemented.

I had a look at fadvise(), and it looks like it does not support
HUGEPAGE for any filesystem yet.

>
> Does it make sense to provide a flag here to allow creating restricted
> memfds possibly backed by huge pages, to give more granular control?

We do have an unused 'flags' argument that can be extended for such
usage, but I would let Kirill take a further look; this probably needs
more discussion.
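For illustration only: the quoted patch currently rejects every non-zero value with `if (flags) return -EINVAL;`. A common way to extend such an argument is to define a mask of accepted bits and reject anything outside it. The sketch below is plain userspace C, not kernel code, and the flag `RMFD_HUGEPAGE` and the helpers `restricted_flags_check()` / `restricted_wants_hugepage()` are hypothetical; nothing like them exists in the posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* HYPOTHETICAL flag, not part of the posted patch: request a backing
 * file that may use huge pages. Name and value are assumptions. */
#define RMFD_HUGEPAGE	0x1U

/* Every flag bit the (hypothetical) extended syscall would accept. */
#define RMFD_ALL_FLAGS	(RMFD_HUGEPAGE)

/* Kernel-style validation: return 0 if every set bit is known,
 * -EINVAL otherwise. With RMFD_ALL_FLAGS == 0 this reduces to the
 * patch's current "if (flags) return -EINVAL;". */
static int restricted_flags_check(unsigned int flags)
{
	if (flags & ~RMFD_ALL_FLAGS)
		return -EINVAL;
	return 0;
}

/* Callers are expected to validate first, then test individual bits. */
static bool restricted_wants_hugepage(unsigned int flags)
{
	assert(restricted_flags_check(flags) == 0);
	return (flags & RMFD_HUGEPAGE) != 0;
}
```

Rejecting unknown bits up front is what keeps such an argument extensible: a kernel that does not understand a newer bit fails cleanly with -EINVAL instead of silently ignoring it. How a bit like this would actually be plumbed into the shmem setup is the open question in this thread.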
Chao

>
> > +	struct file *file, *restricted_file;
> > +	int fd, err;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	fd = get_unused_fd_flags(0);
> > +	if (fd < 0)
> > +		return fd;
> > +
> > +	file = shmem_file_setup("memfd:restrictedmem", 0, VM_NORESERVE);
> > +	if (IS_ERR(file)) {
> > +		err = PTR_ERR(file);
> > +		goto err_fd;
> > +	}
> > +	file->f_mode |= FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE;
> > +	file->f_flags |= O_LARGEFILE;
> > +
> > +	restricted_file = restrictedmem_file_create(file);
> > +	if (IS_ERR(restricted_file)) {
> > +		err = PTR_ERR(restricted_file);
> > +		fput(file);
> > +		goto err_fd;
> > +	}
> > +
> > +	fd_install(fd, restricted_file);
> > +	return fd;
> > +err_fd:
> > +	put_unused_fd(fd);
> > +	return err;
> > +}
> > +
> > +void restrictedmem_register_notifier(struct file *file,
> > +				     struct restrictedmem_notifier *notifier)
> > +{
> > +	struct restrictedmem_data *data = file->f_mapping->private_data;
> > +
> > +	mutex_lock(&data->lock);
> > +	list_add(&notifier->list, &data->notifiers);
> > +	mutex_unlock(&data->lock);
> > +}
> > +EXPORT_SYMBOL_GPL(restrictedmem_register_notifier);
> > +
> > +void restrictedmem_unregister_notifier(struct file *file,
> > +				       struct restrictedmem_notifier *notifier)
> > +{
> > +	struct restrictedmem_data *data = file->f_mapping->private_data;
> > +
> > +	mutex_lock(&data->lock);
> > +	list_del(&notifier->list);
> > +	mutex_unlock(&data->lock);
> > +}
> > +EXPORT_SYMBOL_GPL(restrictedmem_unregister_notifier);
> > +
> > +int restrictedmem_get_page(struct file *file, pgoff_t offset,
> > +			   struct page **pagep, int *order)
> > +{
> > +	struct restrictedmem_data *data = file->f_mapping->private_data;
> > +	struct file *memfd = data->memfd;
> > +	struct page *page;
> > +	int ret;
> > +
> > +	ret = shmem_getpage(file_inode(memfd), offset, &page, SGP_WRITE);
> > +	if (ret)
> > +		return ret;
> > +
> > +	*pagep = page;
> > +	if (order)
> > +		*order = thp_order(compound_head(page));
> > +
> > +	SetPageUptodate(page);
> > +	unlock_page(page);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(restrictedmem_get_page);
> > --
> > 2.25.1
> >
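The quoted register/unregister helpers follow a familiar kernel pattern: a list of notifiers protected by a mutex. Below is a minimal userspace analogue, assuming pthreads in place of the kernel mutex and a hand-rolled list in place of `list_head`. All names here (`rm_data`, `rm_notifier`, `rm_notify_invalidate`, the single `invalidate` callback, the `calls` counter) are hypothetical simplifications; the series' real `struct restrictedmem_notifier` is defined elsewhere and is not shown in this excerpt.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Recover the enclosing struct from an embedded list node, as in the kernel. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal circular doubly linked list modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Insert right after head, like the kernel's list_add(). */
static void list_insert_head(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink node, like list_del(), then leave it self-linked. */
static void list_remove(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = node;
}

/* HYPOTHETICAL stand-ins for restrictedmem_notifier / restrictedmem_data. */
struct rm_notifier {
	struct list_node list;
	void (*invalidate)(struct rm_notifier *n, unsigned long start,
			   unsigned long end);
	int calls;		/* demo counter only */
};

struct rm_data {
	pthread_mutex_t lock;
	struct list_node notifiers;
};

static void rm_data_init(struct rm_data *d)
{
	pthread_mutex_init(&d->lock, NULL);
	list_init(&d->notifiers);
}

static void rm_register(struct rm_data *d, struct rm_notifier *n)
{
	assert(d != NULL && n != NULL && n->invalidate != NULL);
	pthread_mutex_lock(&d->lock);
	list_insert_head(&n->list, &d->notifiers);
	pthread_mutex_unlock(&d->lock);
}

static void rm_unregister(struct rm_data *d, struct rm_notifier *n)
{
	pthread_mutex_lock(&d->lock);
	list_remove(&n->list);
	pthread_mutex_unlock(&d->lock);
}

/* Walk the list under the same lock and invoke every callback. */
static void rm_notify_invalidate(struct rm_data *d, unsigned long start,
				 unsigned long end)
{
	pthread_mutex_lock(&d->lock);
	for (struct list_node *p = d->notifiers.next; p != &d->notifiers;
	     p = p->next) {
		struct rm_notifier *n = container_of(p, struct rm_notifier, list);
		n->invalidate(n, start, end);
	}
	pthread_mutex_unlock(&d->lock);
}

/* Example callback that just counts invocations. */
static void count_invalidate(struct rm_notifier *n, unsigned long start,
			     unsigned long end)
{
	(void)start;
	(void)end;
	n->calls++;
}
```

Holding the lock while walking the list keeps a concurrent unregister from unlinking a notifier mid-iteration; whether the series itself invokes callbacks under `data->lock` is not visible in the quoted hunks. On older glibc versions, build with `-pthread`.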