Date: Fri, 3 Jun 2022 17:39:56 -0700
In-Reply-To: <20220604004004.954674-1-zokeefe@google.com>
Message-Id: <20220604004004.954674-8-zokeefe@google.com>
References: <20220604004004.954674-1-zokeefe@google.com>
Subject: [PATCH v6 07/15] mm/khugepaged: add flag to ignore khugepaged heuristics
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko,
	Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park, Song Liu,
	Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
	Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
	Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
	"Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
	Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
	"Zach O'Keefe"

Add an enforce_page_heuristics flag to struct collapse_control that
allows a collapse context to ignore the heuristics originally designed
to guide khugepaged:

1) the sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]

2) the requirement that some pages in the region being collapsed be
   young or referenced

The flag is set in the khugepaged collapse context to preserve existing
khugepaged behavior. It will be left unset when the madvise collapse
context is introduced, since there the user presumably has reason to
believe the collapse will be beneficial, and khugepaged's heuristics
shouldn't tell the user they are wrong.
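For illustration only (not something this patch adds): a minimal sketch,
assuming a later patch in the series introduces a madvise collapse entry
point, of how such a caller might fill in struct collapse_control with the
heuristics disabled. The function name madvise_collapse() and its signature
are hypothetical here; only the .enforce_page_heuristics and
.last_target_node fields come from this series.

	/* Hypothetical caller -- sketch only, not part of this patch. */
	static int madvise_collapse(struct vm_area_struct *vma,
				    unsigned long start, unsigned long end)
	{
		struct collapse_control cc = {
			/*
			 * The user explicitly asked for the collapse, so skip
			 * the khugepaged_max_ptes_* limits and the
			 * young/referenced requirement.
			 */
			.enforce_page_heuristics = false,
			.last_target_node = NUMA_NO_NODE,
		};

		/* ... scan [start, end) and attempt collapses using &cc ... */
		return 0;
	}

khugepaged itself keeps .enforce_page_heuristics = true (see the change to
khugepaged() at the end of the diff), so its existing behavior is unchanged.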
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 55 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 03e0da0008f1..c3589b3e238d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -87,6 +87,13 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 #define MAX_PTE_MAPPED_THP 8
 
 struct collapse_control {
+	/*
+	 * Heuristics:
+	 * - khugepaged_max_ptes_[none|swap|shared]
+	 * - require memory to be young / referenced
+	 */
+	bool enforce_page_heuristics;
+
 	/* Num pages scanned per node */
 	int node_load[MAX_NUMNODES];
 
@@ -604,6 +611,7 @@ static bool is_refcount_suitable(struct page *page)
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
+					struct collapse_control *cc,
 					struct list_head *compound_pagelist)
 {
 	struct page *page = NULL;
@@ -617,7 +625,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_none(pteval) || (pte_present(pteval) &&
 				is_zero_pfn(pte_pfn(pteval)))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->enforce_page_heuristics)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -637,8 +646,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
 
-		if (page_mapcount(page) > 1 &&
-				++shared > khugepaged_max_ptes_shared) {
+		if (cc->enforce_page_heuristics && page_mapcount(page) > 1 &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out;
@@ -705,9 +714,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			list_add_tail(&page->lru, compound_pagelist);
 next:
 		/* There should be enough young pte to collapse the page */
-		if (pte_young(pteval) ||
-		    page_is_young(page) || PageReferenced(page) ||
-		    mmu_notifier_test_young(vma->vm_mm, address))
+		if (cc->enforce_page_heuristics &&
+		    (pte_young(pteval) || page_is_young(page) ||
+		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
+								     address)))
 			referenced++;
 
 		if (pte_write(pteval))
@@ -716,7 +726,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 	if (unlikely(!writable)) {
 		result = SCAN_PAGE_RO;
-	} else if (unlikely(!referenced)) {
+	} else if (unlikely(cc->enforce_page_heuristics && !referenced)) {
 		result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		result = SCAN_SUCCEED;
@@ -1096,7 +1106,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmu_notifier_invalidate_range_end(&range);
 
 	spin_lock(pte_ptl);
-	result = __collapse_huge_page_isolate(vma, address, pte,
+	result = __collapse_huge_page_isolate(vma, address, pte, cc,
 					      &compound_pagelist);
 	spin_unlock(pte_ptl);
 
@@ -1185,7 +1195,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 	       _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
-			if (++unmapped <= khugepaged_max_ptes_swap) {
+			if (++unmapped <= khugepaged_max_ptes_swap ||
+			    !cc->enforce_page_heuristics) {
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries. Please see
@@ -1204,7 +1215,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->enforce_page_heuristics)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1234,8 +1246,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 			goto out_unmap;
 		}
 
-		if (page_mapcount(page) > 1 &&
-				++shared > khugepaged_max_ptes_shared) {
+		if (cc->enforce_page_heuristics &&
+		    page_mapcount(page) > 1 &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
@@ -1289,14 +1302,17 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
 			result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
-		if (pte_young(pteval) ||
-		    page_is_young(page) || PageReferenced(page) ||
-		    mmu_notifier_test_young(vma->vm_mm, address))
+		if (cc->enforce_page_heuristics &&
+		    (pte_young(pteval) || page_is_young(page) ||
+		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
+								     address)))
 			referenced++;
 	}
 	if (!writable) {
 		result = SCAN_PAGE_RO;
-	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
+	} else if (cc->enforce_page_heuristics &&
+		   (!referenced ||
+		    (unmapped && referenced < HPAGE_PMD_NR / 2))) {
 		result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
 		result = SCAN_SUCCEED;
@@ -1966,7 +1982,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 			continue;
 
 		if (xa_is_value(page)) {
-			if (++swap > khugepaged_max_ptes_swap) {
+			if (cc->enforce_page_heuristics &&
+			    ++swap > khugepaged_max_ptes_swap) {
 				result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
@@ -2017,7 +2034,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
 	rcu_read_unlock();
 
 	if (result == SCAN_SUCCEED) {
-		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
+		    cc->enforce_page_heuristics) {
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
@@ -2258,6 +2276,7 @@ static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
+		.enforce_page_heuristics = true,
 		.last_target_node = NUMA_NO_NODE,
 		/* .gfp set later */
 	};
-- 
2.36.1.255.ge46751e96f-goog