Date: Tue, 9 Feb 2021 10:53:29 +0100
From: Michal Hocko
To: David Hildenbrand
Cc: Mike Rapoport, Andrew Morton, Alexander Viro, Andy Lutomirski,
 Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter,
 Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
 Ingo Molnar, James Bottomley, "Kirill A. Shutemov", Matthew Wilcox,
 Mark Rutland, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley,
 Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt,
 Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-riscv@lists.infradead.org, x86@kernel.org
Subject: Re: [PATCH v17 00/10] mm: introduce memfd_secret system call to create "secret" memory areas

On Tue 09-02-21 10:15:17, David Hildenbrand wrote:
> On 09.02.21 09:59, Michal Hocko wrote:
> > On Mon 08-02-21 22:38:03, David Hildenbrand wrote:
> > >
> > > > On 08.02.2021 at 22:13, Mike Rapoport wrote:
> > > >
> > > > On Mon, Feb 08, 2021 at 10:27:18AM +0100, David Hildenbrand wrote:
> > > > > On 08.02.21 09:49, Mike Rapoport wrote:
> > > > >
> > > > > Some questions (and request to document the answers) as we now allow to have
> > > > > unmovable allocations all over the place and I don't see a single comment
> > > > > regarding that in the cover letter:
> > > > >
> > > > > 1. How will the issue of plenty of unmovable allocations for user space be
> > > > > tackled in the future?
> > > > >
> > > > > 2. How has this issue been documented?
> > > > > E.g., interaction with ZONE_MOVABLE
> > > > > and CMA, alloc_contig_range()/alloc_contig_pages()?
> > > >
> > > > Secretmem sets the mapping's gfp mask to GFP_HIGHUSER, so it does not
> > > > allocate movable pages in the first place.
> > >
> > > That is not the point. Secretmem cannot go on CMA / ZONE_MOVABLE
> > > memory and behaves like long-term pinnings in that sense. This is a
> > > real issue when using a lot of secretmem.
> >
> > A lot of unevictable memory is a concern regardless of CMA/ZONE_MOVABLE.
> > As I've said it is quite easy to land in a similar situation even with
> > tmpfs/MAP_ANON|MAP_SHARED on a swapless system. Neither of the two is
> > really uncommon. It would be even worse that those would be allowed to
> > consume both CMA/ZONE_MOVABLE.
>
> IIRC, tmpfs/MAP_ANON|MAP_SHARED memory
> a) Is movable, can land in ZONE_MOVABLE/CMA
> b) Can be limited by sizing tmpfs appropriately
>
> AFAIK, what you describe is a problem with memory overcommit, not with zone
> imbalances (below). Or what am I missing?

It can be a problem for both. If you have just too much shm (do not
forget about MAP_SHARED|MAP_ANON, which is much harder to size from an
admin POV) then migratability doesn't really help because you need
free memory to migrate to. Without reclaimability this can easily become
a problem. That is why I am saying this is not really a new problem.
Swapless systems are not all that uncommon.

> > One has to be very careful when relying on CMA or movable zones. This is
> > definitely worth a comment in the kernel command line parameter
> > documentation. But this is not a new problem.
>
> I see the following thing worth documenting:
>
> Assume you have a system with 2GB of ZONE_NORMAL/ZONE_DMA and 4GB of
> ZONE_MOVABLE/CMA.
>
> Assume you make use of 1.5GB of secretmem.
> Your system might run into OOM
> any time although you still have plenty of memory on ZONE_MOVABLE (and even
> swap!), simply because you are making excessive use of unmovable allocations
> (for user space!) in an environment where you should not make excessive use
> of unmovable allocations (e.g., where should page tables go?).

Yes, you are right of course and I am not really disputing this. But I
would argue that with 2:1 Movable/Normal problems are to be expected
already. "Lowmem" allocations can easily trigger OOM even without secret
mem in the picture. All it takes is to allocate a lot of GFP_KERNEL or
even GFP_{HIGH}USER memory. Really, it is CMA/MOVABLE that are the
elephant in the room and one has to be really careful when relying on
them.

> The existing controls (mlock limit) don't really match the current semantics
> of that memory. I repeat it once again: secretmem *currently* resembles
> long-term pinned memory, not mlocked memory.

Well, if we had proper user space pinning accounting then I would
agree that there is a better model to use. But we don't. And previous
attempts to achieve that have failed. So I am afraid that we do not have
much choice left other than using mlock as a model.

> Things will change when
> implementing migration support for secretmem pages. Until then, the
> semantics are different and this should be spelled out.
>
> For long-term pinnings this is kind of obvious, still we're now documenting
> it because it's dangerous to not be aware of. Secretmem behaves exactly the
> same and I think this is worth spelling out: secretmem has the potential of
> being used much more often than fairly special vfio/rdma/ ...

Yeah, I do agree that pinning is a problem for movable/CMA but most
people simply do not care about those. Movable is the thing for hotplug
and a really weird fragmentation avoidance IIRC and CMA is mostly to
handle crap HW.
If those are to be used along with secret mem or longterm GUP
then they will constantly bump into corner cases. Do not get me wrong, we
should be looking at those problems, we should even document them, but I
do not see this as anything new. We should probably have a central place
in Documentation explaining all those problems. I would even be happy to
see an explicit note in the tunables - e.g. configuring movable/normal
in 2:1 will get you back to 32b times wrt. low mem problems.
-- 
Michal Hocko
SUSE Labs
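
To make the zone arithmetic from the thread concrete, here is a minimal
sketch of the quoted scenario (2GB ZONE_NORMAL/ZONE_DMA, 4GB
ZONE_MOVABLE/CMA, 1.5GB of secretmem). It is only a back-of-the-envelope
model: the function name and its parameters are hypothetical, not kernel
interfaces, and GB is treated as GiB for simplicity.

```python
# Illustrative model only. The zone sizes are the figures quoted in the
# thread; unmovable_headroom() is a hypothetical helper, not a kernel API.

GIB = 1024 ** 3


def unmovable_headroom(normal_bytes, secretmem_bytes, other_unmovable_bytes=0):
    """Bytes left in the non-movable zones for further unmovable allocations.

    Unmovable allocations (secretmem, long-term pins, page tables,
    GFP_KERNEL) cannot be placed in ZONE_MOVABLE/CMA, so only
    normal_bytes counts as capacity for them.
    """
    return normal_bytes - secretmem_bytes - other_unmovable_bytes


normal = 2 * GIB              # ZONE_NORMAL/ZONE_DMA
movable = 4 * GIB             # ZONE_MOVABLE/CMA, not usable for secretmem
secretmem = int(1.5 * GIB)    # secretmem usage assumed in the thread

headroom = unmovable_headroom(normal, secretmem)
print(f"total memory:           {(normal + movable) / GIB:.1f} GiB")
print(f"headroom for unmovable: {headroom / GIB:.1f} GiB")
```

Under these assumptions only a quarter of the non-movable memory remains
for everything else that cannot live in ZONE_MOVABLE/CMA, regardless of
how much movable memory or swap is still free.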