Date: Thu, 26 Sep 2024 01:35:05 +0000
In-Reply-To: <20240926013506.860253-1-jthoughton@google.com>
Mime-Version: 1.0
References: <20240926013506.860253-1-jthoughton@google.com>
X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog
Message-ID: <20240926013506.860253-18-jthoughton@google.com>
Subject: [PATCH v7 17/18] mm: multi-gen LRU: Have secondary MMUs participate in aging
From: James Houghton <jthoughton@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: Andrew Morton, David Matlack, David Rientjes, James Houghton,
    Jason Gunthorpe, Jonathan Corbet, Marc Zyngier, Oliver Upton,
    Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Content-Type: text/plain; charset="UTF-8"

Secondary MMUs are currently consulted for access/age information only
at eviction time, so before then we get no accurate age information.
That is, pages that are mostly accessed through a secondary MMU (like
guest memory, used by KVM) will simply proceed down to the oldest
generation, and then at eviction time, if KVM reports the page to be
young, the page will be activated/promoted back to the youngest
generation.

The added feature bit (0x8), if disabled, makes MGLRU behave as if
there are no secondary MMUs subscribed to MMU notifiers except at
eviction time.

Implement aging with the new mmu_notifier_clear_young_fast_only()
notifier. For architectures that do not support this notifier, aging
becomes a no-op. For architectures that do implement it, it should be
fast enough to make aging worth it (usually the case if the notifier
is implemented locklessly).

Suggested-by: Yu Zhao
Signed-off-by: James Houghton <jthoughton@google.com>
---
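A quick usage sketch for reviewers (illustrative only; it assumes the
capability bits shown in the multigen_lru.rst hunk below, where 0x0008
is LRU_GEN_SECONDARY_MMU_WALK, and that the other lru_gen capabilities
are already enabled on the running kernel):

    # With all four capabilities on, the mask reads back as 0x000f.
    cat /sys/kernel/mm/lru_gen/enabled
    0x000f

    # Keep core + mm walk + nonleaf young (0x0007) but stop clearing
    # secondary MMU accessed bits during aging.
    echo 7 >/sys/kernel/mm/lru_gen/enabled
    cat /sys/kernel/mm/lru_gen/enabled
    0x0007
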
 Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
 include/linux/mmzone.h                        |   6 +-
 mm/rmap.c                                     |   9 +-
 mm/vmscan.c                                   | 148 ++++++++++++++----
 4 files changed, 127 insertions(+), 42 deletions(-)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
index 33e068830497..e1862407652c 100644
--- a/Documentation/admin-guide/mm/multigen_lru.rst
+++ b/Documentation/admin-guide/mm/multigen_lru.rst
@@ -48,6 +48,10 @@ Values Components
        verified on x86 varieties other than Intel and AMD. If it is
        disabled, the multi-gen LRU will suffer a negligible performance
        degradation.
+0x0008 Clear the accessed bit in secondary MMU page tables when aging
+       instead of waiting until eviction time. This results in accurate
+       page age information for pages that are mainly used by a
+       secondary MMU.
 [yYnN] Apply to all the components above.
 ====== ===============================================================
 
@@ -56,7 +60,7 @@ E.g.,
 
     echo y >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
-    0x0007
+    0x000f
     echo 5 >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
     0x0005
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1dc6248feb83..dbfb868c3708 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -400,6 +400,7 @@ enum {
 	LRU_GEN_CORE,
 	LRU_GEN_MM_WALK,
 	LRU_GEN_NONLEAF_YOUNG,
+	LRU_GEN_SECONDARY_MMU_WALK,
 	NR_LRU_GEN_CAPS
 };
 
@@ -557,7 +558,7 @@ struct lru_gen_memcg {
 
 void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
 
@@ -576,8 +577,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
 
-static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
+	return false;
 }
 
 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
diff --git a/mm/rmap.c b/mm/rmap.c
index 2490e727e2dc..51bbda3bae60 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -870,13 +870,10 @@ static bool folio_referenced_one(struct folio *folio,
 			continue;
 		}
 
-		if (pvmw.pte) {
-			if (lru_gen_enabled() &&
-			    pte_young(ptep_get(pvmw.pte))) {
-				lru_gen_look_around(&pvmw);
+		if (lru_gen_enabled() && pvmw.pte) {
+			if (lru_gen_look_around(&pvmw))
 				referenced++;
-			}
-
+		} else if (pvmw.pte) {
 			if (ptep_clear_flush_young_notify(vma, address,
 						pvmw.pte))
 				referenced++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cfa839284b92..6ab87dd1c6d9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include <linux/mmu_notifier.h>
 #include
 #include
@@ -2594,6 +2595,11 @@ static bool should_clear_pmd_young(void)
 	return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG);
 }
 
+static bool should_walk_secondary_mmu(void)
+{
+	return get_cap(LRU_GEN_SECONDARY_MMU_WALK);
+}
+
 /******************************************************************************
  *                          shorthand helpers
  ******************************************************************************/
@@ -3291,7 +3297,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk
 	return false;
 }
 
-static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pte_pfn(pte);
 
@@ -3306,10 +3313,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;
 
+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
 
-static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pmd_pfn(pmd);
 
@@ -3324,6 +3336,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;
 
+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
@@ -3332,10 +3348,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 {
 	struct folio *folio;
 
-	/* try to avoid unnecessary memory loads */
-	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
-		return NULL;
-
 	folio = pfn_folio(pfn);
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
@@ -3358,6 +3370,26 @@ static bool suitable_to_scan(int total, int young)
 	return young * n >= total;
 }
 
+static bool lru_gen_notifier_clear_young(struct mm_struct *mm,
+					 unsigned long start,
+					 unsigned long end)
+{
+	return should_walk_secondary_mmu() &&
+	       mmu_notifier_clear_young_fast_only(mm, start, end);
+}
+
+static bool lru_gen_pmdp_test_and_clear_young(struct vm_area_struct *vma,
+					      unsigned long addr,
+					      pmd_t *pmd)
+{
+	bool young = pmdp_test_and_clear_young(vma, addr, pmd);
+
+	if (lru_gen_notifier_clear_young(vma->vm_mm, addr, addr + PMD_SIZE))
+		young = true;
+
+	return young;
+}
+
 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
@@ -3372,8 +3404,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	struct mm_struct *mm = args->mm;
 
-	pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
+	pte = pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl);
 	if (!pte)
 		return false;
 	if (!spin_trylock(ptl)) {
@@ -3391,11 +3424,11 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		total++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;
 
-		pfn = get_pte_pfn(ptent, args->vma, addr);
+		pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
 
-		if (!pte_young(ptent)) {
+		if (!pte_young(ptent) && !mm_has_notifiers(mm)) {
 			walk->mm_stats[MM_LEAF_OLD]++;
 			continue;
 		}
@@ -3404,8 +3437,14 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		if (!lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE) &&
+		    !pte_young(ptent)) {
+			walk->mm_stats[MM_LEAF_OLD]++;
+			continue;
+		}
+
+		if (pte_young(ptent))
+			ptep_test_and_clear_young(args->vma, addr, pte + i);
 
 		young++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3471,22 +3510,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		/* don't round down the first address */
 		addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
 
-		pfn = get_pmd_pfn(pmd[i], vma, addr);
-		if (pfn == -1)
-			goto next;
-
-		if (!pmd_trans_huge(pmd[i])) {
-			if (should_clear_pmd_young())
+		if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) {
+			if (should_clear_pmd_young() &&
+			    !should_walk_secondary_mmu())
 				pmdp_test_and_clear_young(vma, addr, pmd + i);
 			goto next;
 		}
 
+		pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat);
+		if (pfn == -1)
+			goto next;
+
 		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
 		if (!folio)
 			goto next;
 
-		if (!pmdp_test_and_clear_young(vma, addr, pmd + i))
+		if (!lru_gen_pmdp_test_and_clear_young(vma, addr, pmd + i)) {
+			walk->mm_stats[MM_LEAF_OLD]++;
 			goto next;
+		}
 
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3543,19 +3585,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		}
 
 		if (pmd_trans_huge(val)) {
-			unsigned long pfn = pmd_pfn(val);
 			struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
+			unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat);
 
 			walk->mm_stats[MM_LEAF_TOTAL]++;
 
-			if (!pmd_young(val)) {
-				walk->mm_stats[MM_LEAF_OLD]++;
+			if (pfn == -1)
 				continue;
-			}
 
-			/* try to avoid unnecessary memory loads */
-			if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+			if (!pmd_young(val) && !mm_has_notifiers(args->mm)) {
+				walk->mm_stats[MM_LEAF_OLD]++;
 				continue;
+			}
 
 			walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
 			continue;
@@ -3563,7 +3604,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 
 		walk->mm_stats[MM_NONLEAF_TOTAL]++;
 
-		if (should_clear_pmd_young()) {
+		if (should_clear_pmd_young() && !should_walk_secondary_mmu()) {
 			if (!pmd_young(val))
 				continue;
 
@@ -4030,6 +4071,31 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  *                          rmap/PT walk feedback
  ******************************************************************************/
 
+static bool should_look_around(struct vm_area_struct *vma, unsigned long addr,
+			       pte_t *pte, int *young)
+{
+	int secondary_young = mmu_notifier_clear_young(
+			vma->vm_mm, addr, addr + PAGE_SIZE);
+
+	/*
+	 * Look around if (1) the PTE is young, or (2) the secondary PTE was
+	 * young and the mm has "fast" young notifiers, so the young report
+	 * may have come from a secondary MMU that is cheap to scan.
+	 */
+	if (pte_young(ptep_get(pte))) {
+		ptep_test_and_clear_young(vma, addr, pte);
+		*young = true;
+		return true;
+	}
+
+	if (secondary_young) {
+		*young = true;
+		return mm_has_fast_young_notifiers(vma->vm_mm);
+	}
+
+	return false;
+}
+
 /*
  * This function exploits spatial locality when shrink_folio_list() walks the
  * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
@@ -4037,7 +4103,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 * the PTE table to the Bloom filter. This forms a feedback loop between the
 * eviction and the aging.
 */
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
 	unsigned long start;
@@ -4055,16 +4121,20 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	struct mm_struct *mm = pvmw->vma->vm_mm;
 
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
 
+	if (!should_look_around(vma, addr, pte, &young))
+		return young;
+
 	if (spin_is_contended(pvmw->ptl))
-		return;
+		return young;
 
 	/* exclude special VMAs containing anon pages from COW */
 	if (vma->vm_flags & VM_SPECIAL)
-		return;
+		return young;
 
 	/* avoid taking the LRU lock under the PTL when possible */
 	walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
@@ -4072,6 +4142,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	start = max(addr & PMD_MASK, vma->vm_start);
 	end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1;
 
+	if (end - start == PAGE_SIZE)
+		return young;
+
 	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
 		if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			end = start + MIN_LRU_BATCH * PAGE_SIZE;
@@ -4085,7 +4158,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	/* folio_update_gen() requires stable folio_memcg() */
 	if (!mem_cgroup_trylock_pages(memcg))
-		return;
+		return young;
 
 	arch_enter_lazy_mmu_mode();
 
@@ -4095,19 +4168,23 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		unsigned long pfn;
 		pte_t ptent = ptep_get(pte + i);
 
-		pfn = get_pte_pfn(ptent, vma, addr);
+		pfn = get_pte_pfn(ptent, vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
 
-		if (!pte_young(ptent))
+		if (!pte_young(ptent) && !mm_has_notifiers(mm))
 			continue;
 
 		folio = get_pfn_folio(pfn, memcg, pgdat, can_swap);
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young(vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		if (!lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE) &&
+		    !pte_young(ptent))
+			continue;
+
+		if (pte_young(ptent))
+			ptep_test_and_clear_young(vma, addr, pte + i);
 
 		young++;
 
@@ -4137,6 +4214,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	/* feedback from rmap walkers to page table walkers */
 	if (mm_state && suitable_to_scan(i, young))
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
+
+	return young;
 }
 
 /******************************************************************************
@@ -5140,6 +5219,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c
 	if (should_clear_pmd_young())
 		caps |= BIT(LRU_GEN_NONLEAF_YOUNG);
 
+	if (should_walk_secondary_mmu())
+		caps |= BIT(LRU_GEN_SECONDARY_MMU_WALK);
+
 	return sysfs_emit(buf, "0x%04x\n", caps);
 }
-- 
2.46.0.792.g87dc391469-goog