Message-Id: <02e18c90-196e-409e-b2ac-822aceea8891@www.fastmail.com>
References: <20220310140911.50924-1-chao.p.peng@linux.intel.com>
 <20220310140911.50924-5-chao.p.peng@linux.intel.com>
Date: Thu, 07 Apr 2022 10:09:55 -0700
From: "Andy Lutomirski"
To: "Sean Christopherson", "Chao Peng"
Cc: "kvm list", "Linux Kernel Mailing List", linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, "Linux API", qemu-devel@nongnu.org,
 "Paolo Bonzini", "Jonathan Corbet", "Vitaly Kuznetsov", "Wanpeng Li",
 "Jim Mattson", "Joerg Roedel", "Thomas Gleixner", "Ingo Molnar",
 "Borislav Petkov", "the arch/x86 maintainers", "H. Peter Anvin",
 "Hugh Dickins", "Jeff Layton", "J . Bruce Fields", "Andrew Morton",
 "Mike Rapoport", "Steven Price", "Maciej S . Szmigiero",
 "Vlastimil Babka", "Vishal Annapurve", "Yu Zhang", "Kirill A. Shutemov",
 "Nakajima, Jun", "Dave Hansen", "Andi Kleen", "David Hildenbrand"
Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory
 against RLIMIT_MEMLOCK

On Thu, Apr 7, 2022, at 9:05 AM, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
>> Since page migration / swapping is not supported yet, MFD_INACCESSIBLE
>> memory behave like longterm pinned pages and thus should be accounted to
>> mm->pinned_vm and be restricted by RLIMIT_MEMLOCK.
>>
>> Signed-off-by: Chao Peng
>> ---
>>  mm/shmem.c | 25 ++++++++++++++++++++++++-
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 7b43e274c9a2..ae46fb96494b 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -915,14 +915,17 @@ static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t end)
>>  static void notify_invalidate_page(struct inode *inode, struct folio *folio,
>>  				    pgoff_t start, pgoff_t end)
>>  {
>> -#ifdef CONFIG_MEMFILE_NOTIFIER
>>  	struct shmem_inode_info *info = SHMEM_I(inode);
>>
>> +#ifdef CONFIG_MEMFILE_NOTIFIER
>>  	start = max(start, folio->index);
>>  	end = min(end, folio->index + folio_nr_pages(folio));
>>
>>  	memfile_notifier_invalidate(&info->memfile_notifiers, start, end);
>>  #endif
>> +
>> +	if (info->xflags & SHM_F_INACCESSIBLE)
>> +		atomic64_sub(end - start, &current->mm->pinned_vm);
>
> As Vishal's to-be-posted selftest discovered, this is broken as
> current->mm may
> be NULL.  Or it may be a completely different mm, e.g. AFAICT there's
> nothing that
> prevents a different process from punching hole in the shmem backing.
>

How about just not charging the mm in the first place?  There’s precedent: ramfs and hugetlbfs (at least sometimes; I’ve lost track of the current status).

In any case, for an administrator to try to assemble the various rlimits into a coherent policy is, and always has been, quite messy.  ISTM cgroup limits, which can actually add across processes usefully, are much better.

So, aside from the fact that these fds aren’t in a filesystem and are thus available by default, I’m not convinced that this accounting is useful or necessary.

Maybe we could just have some switch required to enable creation of private memory in the first place, and anyone who flips that switch without configuring cgroups is subject to DoS.