From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40DB4C41513 for ; Mon, 8 Jul 2024 01:46:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 24F706B008C; Sun, 7 Jul 2024 21:46:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 1FE956B0092; Sun, 7 Jul 2024 21:46:23 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0EE356B0093; Sun, 7 Jul 2024 21:46:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id E42CE6B008C for ; Sun, 7 Jul 2024 21:46:22 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 5C9B840F91 for ; Mon, 8 Jul 2024 01:46:22 +0000 (UTC) X-FDA: 82314895404.30.16D30D1 Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by imf30.hostedemail.com (Postfix) with ESMTP id C99D180010 for ; Mon, 8 Jul 2024 01:46:19 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=zx2c4.com header.s=20210105 header.b=ATv78rgt; dmarc=pass (policy=quarantine) header.from=zx2c4.com; spf=pass (imf30.hostedemail.com: domain of "SRS0=B+7m=OI=zx2c4.com=Jason@kernel.org" designates 145.40.73.55 as permitted sender) smtp.mailfrom="SRS0=B+7m=OI=zx2c4.com=Jason@kernel.org" ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720403159; a=rsa-sha256; cv=none; b=ODNYw2hqqPXB3Nt7Ky242X8m+dU6tEy6X/2rrFX/EzI0XxuL1RUiZWjPS6LS4UCRyM7LwX w7GT7C0xvHYVV3Bd5Bae/EjbjMl41hE1M5fy7ZEM61bm3RT7nDwWv/AGE+N2F9EugkoRBp P8kzPpc8tmhNvhao3RiAUHkjzla89NY= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=zx2c4.com header.s=20210105 header.b=ATv78rgt; dmarc=pass (policy=quarantine) header.from=zx2c4.com; spf=pass (imf30.hostedemail.com: domain of "SRS0=B+7m=OI=zx2c4.com=Jason@kernel.org" designates 145.40.73.55 as permitted sender) smtp.mailfrom="SRS0=B+7m=OI=zx2c4.com=Jason@kernel.org" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720403159; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=wNm2R1n+MIEThUcyIX2REGbOX7o31ZK6mTAc1tqY4Ec=; b=ho/0b0EIBOWw3w9PVPvvAog9MGh/J3dWKSzroUoxKsjqDWSc3Tx2sIzGRjunSEq2aP2wGI J2maU1tzWUF+/eYFupMmXUJRH9/ZkdNhVRrbx9dapgQwHZ1SFM51dcG1fN5bwxUm9dF4e7 dOieqfeXd6HRUBQ1/oULDQU6paq2uMA= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sin.source.kernel.org (Postfix) with ESMTP id 6862FCE08CC; Mon, 8 Jul 2024 01:46:16 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 21C9CC3277B; Mon, 8 Jul 2024 01:46:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=zx2c4.com; s=20210105; t=1720403171; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=wNm2R1n+MIEThUcyIX2REGbOX7o31ZK6mTAc1tqY4Ec=; b=ATv78rgtmz5EROWymcyF0jsKj4PQq8A5vhpa/bgzqkPjwOB73zr8dHZFY5p4T+o6ZQlnwR WuQ3N6Av5WPkQcXiUpeUqBmncDWFxQRetmsNY8og2J7Pxsfq48KswCB9NeoCx3NPN0r4fb nEUmg3vf01hoKcKxOwFSom7FscFR/x8= Received: by mail.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 2900d469 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 8 Jul 2024 01:46:11 +0000 (UTC) Date: Mon, 8 Jul 2024 03:46:07 +0200 From: "Jason A. Donenfeld" To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev, tglx@linutronix.de, linux-crypto@vger.kernel.org, linux-api@vger.kernel.org, x86@kernel.org, Linus Torvalds , Greg Kroah-Hartman , Adhemerval Zanella Netto , Carlos O'Donell , Florian Weimer , Arnd Bergmann , Jann Horn , Christian Brauner , David Hildenbrand , linux-mm@kvack.org Subject: Re: [PATCH v21 1/4] mm: add VM_DROPPABLE for designating always lazily freeable mappings Message-ID: References: <20240707002658.1917440-1-Jason@zx2c4.com> <20240707002658.1917440-2-Jason@zx2c4.com> <1583c837-a4d5-4a8a-9c1d-2c64548cd199@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <1583c837-a4d5-4a8a-9c1d-2c64548cd199@redhat.com> X-Rspamd-Queue-Id: C99D180010 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: gad48sq9gcjei8t9q4qb333zikn1bb1t X-HE-Tag: 1720403179-801328 X-HE-Meta: U2FsdGVkX18fsc6LSaSQhp2J3vsqJOwqh3U5S/LiSS3fEVlEG+/l2FqgSvs9nnB3lQHn8t0ZfGef7koEoAq0LJ2S5MB/UYvgcHxDi9Oz2p1sGyScx5uyvaQM+h7t5p1zXHPhW+/CyPua7hVyJiwe6SZLDVGGskkqM/qKdDXUA9kf01js7Ubgh2WVLWMDOSRhu90IAs7hUALg5EttgTt2itWpTK7p1jCwf7Z2XdY/+6Py3f4Iu+klbV/F54noKGYNCew2w2Dn0SdG/EEMmNgd/GNG1p4VyLj+kgWxPVJnvTx30Yd+7V0s/JXOpryFtWWMPPQ3di0wUbABZVdo9JvUn1JzMxafgjr1o5PsrakVF+jga+NIg0QD1OThzECA05N/zr+VbyIf2cgW0IM54QS2JQydL/vPgTudVjPYRglXqzKbIatCmCO1h6SQhBSlTqcvxmMOaFLx8PBsn8BNBlD0TKnIyIcU7YR+X6s6cWQm2Sfqs23bcVS5daa5XwSfivRz2Xb/5/884+kvKo1iqcLWbVs/e4rLSW+zSlMyuovFhBhr1espwnttjp6hHMxiYuNEacwT2R7KO5zEh7WSVRQ86P0qTykwFKpPZXkZSsL16mXnLrAwoXXGSek+QpZXYGVjK0ZeYcoo7wfQ7E6oyyhS+fPdwKZKrTXw4MeK9TNd0W098qX39u/nJaSZDBrMTQncemK3eFyrmVGY8sCfEGuBDRrXmlY0k+3x1Y6xi12bZ6KWX52a5XkKouQnUC9pYnZRIYNdwF8FTs9oM4wYH52LZ3lgUR+n8narak4YBQLupGx3wj5NnZO8Xz0P1x0KRmal3B4XZtQLCegr4rbKtNLc86t9qNlPp+OzqAc4tFuxFlRVt23WFrImMs0sFjDozSZipGb4A9W8vLqiOe+GxeqZxbTmbfUdwKXU4Tjjt1oblFRyA8/gXq2c/PEsCS79ml1N8a/u9NPVpUol5BVuO1o gULFm4iX CNus5sY2hK4cp6ZHkrVQKwAHGdwg1NRYcjpLsjCM4KP98Vo6+fqUvDMerOW3PSA76Xov4+vA/EF96d/1l3TW5IJNJzQKgtr3XgpPAsETcr4ZlgGY2Z2e6vybu5kKFr6POCljKx1ADsP1SNMkDqLOwdL6J6v88kY4rikQYAmvishHXTwyvAgHnKdnYhBQrAY1PB8VXaarIqh3t1Da/jpakGp7wp0lFXZRuOPqDjmkFvLwRUXdlGa6EMYnUB3ksBhUfQDFP4/0ouuwtfdDcAdJ1FqVbOSsVhaTk58Y8PzmcAzywS7LuYuzZl07SSnunxqKbe7QE7WxRmgZJA/vJsUs31meWPuSwInMVTg6hqGqmSmqBq8fKV+0zDqm8ANiHTS5s+yZMP+OeaLWmBxFj1XAKbYmeeI9vrCN4qHLEaH7vOHVyJ2X0F0OOT+LZYoZgoD7CTKYTDyDYoJWvicY= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi David, Thanks a lot for the review of the code. Am very glad to have somebody who knows this code take a careful look at it. On Sun, Jul 07, 2024 at 09:42:38AM +0200, David Hildenbrand wrote: > Patch subject would be better to talk about MAP_DROPPABLE now. Will do. Or, well, in light of the conversation downthread, MAP_DROPPABLE. > But I don't immediately see why MAP_WIPEONFORK and MAP_DONTDUMP have to > be mmap() flags. Using mmap(MAP_NORESERVE|MAP_DROPPABLE) with madvise() > to configure these (for users that require that) should be good enough, > just like they are for existing users. I looked into that too, and coming up with some clunky mechanism for automating several calls to madvise() for each thing. I could make it work need be, but it's really not nice. And it sort of then leads in the direction, "this interface isn't great; why don't you just make a dedicated syscall that does everything you need in one fell swoop," which is explicitly what Linus doesn't want. Making it accessible to mmap() instead makes it more of a direct thing that isn't a whole new syscall. Anyway, it indeed looks like there are more PROT_ bits available, and also that PROT_ has been used this way before. In addition to PROT_SEM, there are a few arch-specific PROT_ bits that seem similar enough. The distinction is pretty blurry between MAP_ and PROT_. So I'll just move this to PROT_ for v+1. > > diff --git a/mm/mprotect.c b/mm/mprotect.c > > index 8c6cd8825273..57b8dad9adcc 100644 > > --- a/mm/mprotect.c > > +++ b/mm/mprotect.c > > @@ -623,7 +623,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, > > may_expand_vm(mm, oldflags, nrpages)) > > return -ENOMEM; > > if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_HUGETLB| > > - VM_SHARED|VM_NORESERVE))) { > > + VM_SHARED|VM_NORESERVE|VM_DROPPABLE))) { > > charged = nrpages; > > if (security_vm_enough_memory_mm(mm, charged)) > > return -ENOMEM; > > I don't quite understand this change here. If MAP_DROPPABLE does not > affect memory accounting during mmap(), it should not affect the same > during mprotect(). VM_NORESERVE / MAP_NORESERVE is responsible for that. > > Did I missing something where MAP_DROPPABLE changes the memory > accounting during mmap()? Actually, I think I errored by not adding it to mmap() (via the check in accountable_mapping(), I believe), and I should add it there. That also might be another reason why this is better as a MAP_ (or, rather PROT_) bit, rather than an madvise call. Tell me if you disagree, as I might be way off here. But I was thinking that because the system can just "drop" this memory, it's not sensible to account for it, because it can be taken right back. > > diff --git a/mm/rmap.c b/mm/rmap.c > We use > > /* > * Comment start > * Comment end > */ > > styled comments in MM. Fixed. > > > + (vma->vm_flags & VM_DROPPABLE))) { > > dec_mm_counter(mm, MM_ANONPAGES); > > goto discard; > > } > > @@ -1851,7 +1858,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, > > * discarded. Remap the page to page table. > > */ > > set_pte_at(mm, address, pvmw.pte, pteval); > > - folio_set_swapbacked(folio); > > + /* Unlike MADV_FREE mappings, VM_DROPPABLE ones > > + * never get swap backed on failure to drop. */ > > + if (!(vma->vm_flags & VM_DROPPABLE)) > > + folio_set_swapbacked(folio); > > ret = false; > > page_vma_mapped_walk_done(&pvmw); > > break; > > A note that in mm/mm-stable, "madvise_free_huge_pmd" exists to optimize > MADV_FREE on PMDs. I suspect we'd want to extend that one as well for > dropping support, but likely it would also only be a performance > improvmeent and not affect functonality if not handled. That's for doing the freeing of PTEs after the fact, right? If the mapping was created, got filled with some data, and then sometime later it got MADV_FREE'd, which is the pattern people follow typically with MADV_FREE. If we do this as PROT_/MAP_, then that's not a case we need to worry about, if I understand this code correctly. Jason