Date: Thu, 11 Jul 2024 19:09:36 +0200
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: Linus Torvalds
Cc: David Hildenbrand, linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org, Greg Kroah-Hartman,
	Adhemerval Zanella Netto, Carlos O'Donell, Florian Weimer,
	Arnd Bergmann, Jann Horn, Christian Brauner, linux-mm@kvack.org
Subject: Re: [PATCH v22 1/4] mm: add MAP_DROPPABLE for designating always
 lazily freeable mappings
References: <20240709130513.98102-1-Jason@zx2c4.com>
 <20240709130513.98102-2-Jason@zx2c4.com>
 <378f23cb-362e-413a-b221-09a5352e79f2@redhat.com>
 <9b400450-46bc-41c7-9e89-825993851101@redhat.com>

Hi Linus, David,

On Wed, Jul 10, 2024 at 10:07:03PM -0700, Linus Torvalds wrote:
> The other approach might be to just let all the dirty handling happen
> - make droppable pages have a "page->mapping" (and not be anonymous),
> and have the mapping->a_ops->writepage() just always return success
> immediately.

This is somewhat similar to the initial approach I was taking earlier
this year, when I was working on this patchset with the syscall:
setting up a special mapping. It turned into kind of a mess and I
couldn't get it working. There's a lot of functionality built around
anonymous pages that would need to be duplicated (I think?).
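For illustration, the rough shape of what's being described -- with
hypothetical names, not code from any version of this patchset -- would
be something like:

#include <linux/fs.h>		/* struct address_space_operations */
#include <linux/pagemap.h>	/* unlock_page() */
#include <linux/writeback.h>	/* struct writeback_control */

/*
 * Hypothetical sketch: back droppable pages with a mapping whose
 * writeback "succeeds" without doing anything, so reclaim believes
 * the page was cleaned and can simply toss it.
 */
static int droppable_writepage(struct page *page,
			       struct writeback_control *wbc)
{
	/* Pretend writeback succeeded; the contents are droppable. */
	unlock_page(page);	/* ->writepage must return with the page unlocked */
	return 0;
}

static const struct address_space_operations droppable_aops = {
	.writepage = droppable_writepage,
};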
I'll revisit it if need be, but let's see if I can make avoiding the
dirty bit propagation work.

> It's mainly the pte_dirty games in mm/vmscan.c that does it
> (walk_pte_range), but also the tear-down in mm/memory.c
> (zap_present_folio_ptes). Possibly others that I didn't think of.
>
> Both do have access to the vma, although in the case of
> walk_pte_range() we don't actually pass it down because we haven't
> needed it).

Actually, it's there hanging out in args->vma, and the function makes
use of that member already. So not so bad.

> There's also page_vma_mkclean_one(), try_to_unmap_one() and
> try_to_migrate_one(). And possibly many others I haven't even thought
> about.
>
> So quite a few places that do that "transfer dirty bit from pte to
> folio".

Alright, an hour of fiddling later, and it doesn't actually work (yet?)
-- the selftest fails. A diff follows below. So, hmm... The swapbacked
thing really seemed so simple... I wonder if there's a way of
recovering that.

Jason

diff --git a/mm/gup.c b/mm/gup.c
index ca0f5cedce9b..38745cc4fa06 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -990,7 +990,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	}
 	if (flags & FOLL_TOUCH) {
 		if ((flags & FOLL_WRITE) &&
-		    !pte_dirty(pte) && !PageDirty(page))
+		    !pte_dirty(pte) && !PageDirty(page) &&
+		    !(vma->vm_flags & VM_DROPPABLE))
 			set_page_dirty(page);
 		/*
 		 * pte_mkyoung() would be more correct here, but atomic care
diff --git a/mm/ksm.c b/mm/ksm.c
index 34c4820e0d3d..2401fc4203ba 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1339,7 +1339,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct folio *folio,
 		goto out_unlock;
 	}
 
-	if (pte_dirty(entry))
+	if (pte_dirty(entry) && !(vma->vm_flags & VM_DROPPABLE))
 		folio_mark_dirty(folio);
 	entry = pte_mkclean(entry);
 
@@ -1518,7 +1518,7 @@ static int try_to_merge_one_page(struct vm_area_struct *vma,
 		 * Page reclaim just frees a clean page with no dirty
 		 * ptes: make sure that the ksm page would be swapped.
 		 */
-		if (!PageDirty(page))
+		if (!PageDirty(page) && !(vma->vm_flags & VM_DROPPABLE))
 			SetPageDirty(page);
 		err = 0;
 	} else if (pages_identical(page, kpage))
diff --git a/mm/memory.c b/mm/memory.c
index d10e616d7389..6a02d16309be 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1479,7 +1479,7 @@ static __always_inline void zap_present_folio_ptes(struct mmu_gather *tlb,
 
 	if (!folio_test_anon(folio)) {
 		ptent = get_and_clear_full_ptes(mm, addr, pte, nr, tlb->fullmm);
-		if (pte_dirty(ptent)) {
+		if (pte_dirty(ptent) && !(vma->vm_flags & VM_DROPPABLE)) {
 			folio_mark_dirty(folio);
 			if (tlb_delay_rmap(tlb)) {
 				delay_rmap = true;
@@ -6140,7 +6140,8 @@ static int __access_remote_vm(struct mm_struct *mm, unsigned long addr,
 		if (write) {
 			copy_to_user_page(vma, page, addr,
 					  maddr + offset, buf, bytes);
-			set_page_dirty_lock(page);
+			if (!(vma->vm_flags & VM_DROPPABLE))
+				set_page_dirty_lock(page);
 		} else {
 			copy_from_user_page(vma, page, addr,
 					    buf, maddr + offset, bytes);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index aecc71972a87..72d3f8eaae6e 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -216,7 +216,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			migrate->cpages++;
 
 			/* Set the dirty flag on the folio now the pte is gone. */
-			if (pte_dirty(pte))
+			if (pte_dirty(pte) && !(vma->vm_flags & VM_DROPPABLE))
 				folio_mark_dirty(folio);
 
 			/* Setup special migration page table entry */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1f9b5a9cb121..1688d06bb617 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1397,12 +1397,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 	VM_BUG_ON_VMA(address < vma->vm_start ||
 			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
-	/*
-	 * VM_DROPPABLE mappings don't swap; instead they're just dropped when
-	 * under memory pressure.
-	 */
-	if (!(vma->vm_flags & VM_DROPPABLE))
-		__folio_set_swapbacked(folio);
+	__folio_set_swapbacked(folio);
 	__folio_set_anon(folio, vma, address, true);
 
 	if (likely(!folio_test_large(folio))) {
@@ -1777,7 +1772,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
 
 		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
+		if (pte_dirty(pteval) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);
 
 		/* Update high watermark before we lower rss */
@@ -1822,7 +1817,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			}
 
 			/* MADV_FREE page check */
-			if (!folio_test_swapbacked(folio)) {
+			if (!folio_test_swapbacked(folio) || (vma->vm_flags & VM_DROPPABLE)) {
 				int ref_count, map_count;
 
 				/*
@@ -1846,13 +1841,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * plus the rmap(s) (dropped by discard:).
 				 */
 				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
+				    !folio_test_dirty(folio)) {
 					dec_mm_counter(mm, MM_ANONPAGES);
 					goto discard;
 				}
@@ -1862,12 +1851,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * discarded. Remap the page to page table.
 				 */
 				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
-					folio_set_swapbacked(folio);
+				folio_set_swapbacked(folio);
 				ret = false;
 				page_vma_mapped_walk_done(&pvmw);
 				break;
@@ -2151,7 +2135,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		}
 
 		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
+		if (pte_dirty(pteval) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);
 
 		/* Update high watermark before we lower rss */
@@ -2397,7 +2381,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 
 		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
+		if (pte_dirty(pteval) && !(vma->vm_flags & VM_DROPPABLE))
 			folio_mark_dirty(folio);
 
 		/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2e34de9cd0d4..cf5b26bd067a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3396,6 +3396,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		walk->mm_stats[MM_LEAF_YOUNG]++;
 
 		if (pte_dirty(ptent) && !folio_test_dirty(folio) &&
+		    !(args->vma->vm_flags & VM_DROPPABLE) &&
 		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
@@ -3476,6 +3477,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
 		walk->mm_stats[MM_LEAF_YOUNG]++;
 
 		if (pmd_dirty(pmd[i]) && !folio_test_dirty(folio) &&
+		    !(vma->vm_flags & VM_DROPPABLE) &&
 		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
@@ -4076,6 +4078,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		young++;
 
 		if (pte_dirty(ptent) && !folio_test_dirty(folio) &&
+		    !(vma->vm_flags & VM_DROPPABLE) &&
 		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 		      !folio_test_swapcache(folio)))
 			folio_mark_dirty(folio);
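
P.S. For anyone following along who wants to poke at this, userspace
consumption of the flag is expected to look roughly like the sketch
below. Illustrative only -- it assumes a kernel with this series
applied, and it defines MAP_DROPPABLE locally since libc headers won't
have it yet:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_DROPPABLE
#define MAP_DROPPABLE 0x08	/* uapi value added by patch 1/4 of this series */
#endif

int main(void)
{
	/* A cache whose pages the kernel may free at any time under pressure. */
	unsigned char *cache = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				    MAP_ANONYMOUS | MAP_PRIVATE | MAP_DROPPABLE,
				    -1, 0);
	if (cache == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(cache, 'A', 4096);

	/*
	 * If reclaim dropped the pages in the meantime, reads observe
	 * zeroes, so callers must treat zeroed data as a cache miss and
	 * regenerate it.
	 */
	printf("first byte: 0x%02x\n", cache[0]);
	return 0;
}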