Date: Tue, 12 Jul 2022 10:06:46 -0700
From: Zach O'Keefe <zokeefe@google.com>
To: Yang Shi
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
	Michal Hocko, Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park,
	Song Liu, Vlastimil Babka, Zi Yan, Linux MM, Andrea Arcangeli,
	Andrew Morton, Arnd Bergmann, Axel Rasmussen, Chris Kennelly,
	Chris Zankel, Helge Deller, Hugh Dickins, Ivan Kokshaysky,
	"James E.J. Bottomley", Jens Axboe, "Kirill A. Shutemov",
	Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia,
	Pavel Begunkov, Thomas Bogendoerfer
Subject: Re: [mm-unstable v7 06/18] mm/khugepaged: add flag to predicate khugepaged-only behavior
References: <20220706235936.2197195-1-zokeefe@google.com>
	<20220706235936.2197195-7-zokeefe@google.com>

On Jul 11 13:43, Yang Shi wrote:
> On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe wrote:
> >
> > Add .is_khugepaged flag to struct collapse_control so
> > khugepaged-specific behavior can be elided by MADV_COLLAPSE context.
> >
> > Start by protecting khugepaged-specific heuristics by this flag. In
> > MADV_COLLAPSE, the user presumably has reason to believe the collapse
> > will be beneficial and khugepaged heuristics shouldn't prevent the user
> > from doing so:
> >
> > 1) sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]
> >
> > 2) requirement that some pages in region being collapsed be young or
> >    referenced
> >
> > Signed-off-by: Zach O'Keefe
> > ---
> >
> > v6 -> v7: There is no functional change here from v6, just a renaming of
> > flags to explicitly be predicated on khugepaged.
>
> Reviewed-by: Yang Shi
>
> Just a nit, some conditions check is_khugepaged first, some don't. Why
> not make them more consistent to check is_khugepaged first?
>

Again, thank you for taking the time to review. Agreed the inconsistency is
ugly; I've updated the series so the is_khugepaged check consistently comes
first (rough sketch of the shape at the end of this mail). Thanks for the
suggestion.

Zach

> > ---
> >  mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 43 insertions(+), 19 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 147f5828f052..d89056d8cbad 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -73,6 +73,8 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
> >   * default collapse hugepages if there is at least one pte mapped like
> >   * it would have happened if the vma was large enough during page
> >   * fault.
> > + *
> > + * Note that these are only respected if collapse was initiated by khugepaged.
> >   */
> >  static unsigned int khugepaged_max_ptes_none __read_mostly;
> >  static unsigned int khugepaged_max_ptes_swap __read_mostly;
> > @@ -86,6 +88,8 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
> >  #define MAX_PTE_MAPPED_THP 8
> >
> >  struct collapse_control {
> > +	bool is_khugepaged;
> > +
> >  	/* Num pages scanned per node */
> >  	int node_load[MAX_NUMNODES];
> >
> > @@ -554,6 +558,7 @@ static bool is_refcount_suitable(struct page *page)
> >  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >  					unsigned long address,
> >  					pte_t *pte,
> > +					struct collapse_control *cc,
> >  					struct list_head *compound_pagelist)
> >  {
> >  	struct page *page = NULL;
> > @@ -567,7 +572,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >  		if (pte_none(pteval) || (pte_present(pteval) &&
> >  				is_zero_pfn(pte_pfn(pteval)))) {
> >  			if (!userfaultfd_armed(vma) &&
> > -			    ++none_or_zero <= khugepaged_max_ptes_none) {
> > +			    (++none_or_zero <= khugepaged_max_ptes_none ||
> > +			     !cc->is_khugepaged)) {
> >  				continue;
> >  			} else {
> >  				result = SCAN_EXCEED_NONE_PTE;
> > @@ -587,8 +593,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >
> >  		VM_BUG_ON_PAGE(!PageAnon(page), page);
> >
> > -		if (page_mapcount(page) > 1 &&
> > -				++shared > khugepaged_max_ptes_shared) {
> > +		if (cc->is_khugepaged && page_mapcount(page) > 1 &&
> > +				++shared > khugepaged_max_ptes_shared) {
> >  			result = SCAN_EXCEED_SHARED_PTE;
> >  			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >  			goto out;
> > @@ -654,10 +660,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >  		if (PageCompound(page))
> >  			list_add_tail(&page->lru, compound_pagelist);
> > next:
> > -		/* There should be enough young pte to collapse the page */
> > -		if (pte_young(pteval) ||
> > -		    page_is_young(page) || PageReferenced(page) ||
> > -		    mmu_notifier_test_young(vma->vm_mm, address))
> > +		/*
> > +		 * If collapse was initiated by khugepaged, check that there is
> > +		 * enough young pte to justify collapsing the page
> > +		 */
> > +		if (cc->is_khugepaged &&
> > +		    (pte_young(pteval) || page_is_young(page) ||
> > +		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
> > +								     address)))
> >  			referenced++;
> >
> >  		if (pte_write(pteval))
> > @@ -666,7 +676,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >
> >  	if (unlikely(!writable)) {
> >  		result = SCAN_PAGE_RO;
> > -	} else if (unlikely(!referenced)) {
> > +	} else if (unlikely(cc->is_khugepaged && !referenced)) {
> >  		result = SCAN_LACK_REFERENCED_PAGE;
> >  	} else {
> >  		result = SCAN_SUCCEED;
> > @@ -745,6 +755,7 @@ static void khugepaged_alloc_sleep(void)
> >
> >
> >  struct collapse_control khugepaged_collapse_control = {
> > +	.is_khugepaged = true,
> >  	.last_target_node = NUMA_NO_NODE,
> >  };
> >
> > @@ -1023,7 +1034,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >  	mmu_notifier_invalidate_range_end(&range);
> >
> >  	spin_lock(pte_ptl);
> > -	result = __collapse_huge_page_isolate(vma, address, pte,
> > +	result = __collapse_huge_page_isolate(vma, address, pte, cc,
> >  					      &compound_pagelist);
> >  	spin_unlock(pte_ptl);
> >
> > @@ -1114,7 +1125,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	     _pte++, _address += PAGE_SIZE) {
> >  		pte_t pteval = *_pte;
> >  		if (is_swap_pte(pteval)) {
> > -			if (++unmapped <= khugepaged_max_ptes_swap) {
> > +			if (++unmapped <= khugepaged_max_ptes_swap ||
> > +			    !cc->is_khugepaged) {
> >  				/*
> >  				 * Always be strict with uffd-wp
> >  				 * enabled swap entries. Please see
> > @@ -1133,7 +1145,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		}
> >  		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> >  			if (!userfaultfd_armed(vma) &&
> > -			    ++none_or_zero <= khugepaged_max_ptes_none) {
> > +			    (++none_or_zero <= khugepaged_max_ptes_none ||
> > +			     !cc->is_khugepaged)) {
> >  				continue;
> >  			} else {
> >  				result = SCAN_EXCEED_NONE_PTE;
> > @@ -1163,8 +1176,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >  			goto out_unmap;
> >  		}
> >
> > -		if (page_mapcount(page) > 1 &&
> > -				++shared > khugepaged_max_ptes_shared) {
> > +		if (cc->is_khugepaged &&
> > +		    page_mapcount(page) > 1 &&
> > +		    ++shared > khugepaged_max_ptes_shared) {
> >  			result = SCAN_EXCEED_SHARED_PTE;
> >  			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >  			goto out_unmap;
> > @@ -1218,14 +1232,22 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >  			result = SCAN_PAGE_COUNT;
> >  			goto out_unmap;
> >  		}
> > -		if (pte_young(pteval) ||
> > -		    page_is_young(page) || PageReferenced(page) ||
> > -		    mmu_notifier_test_young(vma->vm_mm, address))
> > +
> > +		/*
> > +		 * If collapse was initiated by khugepaged, check that there is
> > +		 * enough young pte to justify collapsing the page
> > +		 */
> > +		if (cc->is_khugepaged &&
> > +		    (pte_young(pteval) || page_is_young(page) ||
> > +		     PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
> > +								     address)))
> >  			referenced++;
> >  	}
> >  	if (!writable) {
> >  		result = SCAN_PAGE_RO;
> > -	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
> > +	} else if (cc->is_khugepaged &&
> > +		   (!referenced ||
> > +		    (unmapped && referenced < HPAGE_PMD_NR / 2))) {
> >  		result = SCAN_LACK_REFERENCED_PAGE;
> >  	} else {
> >  		result = SCAN_SUCCEED;
> > @@ -1894,7 +1916,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> >  			continue;
> >
> >  		if (xa_is_value(page)) {
> > -			if (++swap > khugepaged_max_ptes_swap) {
> > +			if (cc->is_khugepaged &&
> > +			    ++swap > khugepaged_max_ptes_swap) {
> >  				result = SCAN_EXCEED_SWAP_PTE;
> >  				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> >  				break;
> > @@ -1945,7 +1968,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> >  	rcu_read_unlock();
> >
> >  	if (result == SCAN_SUCCEED) {
> > -		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
> > +		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
> > +		    cc->is_khugepaged) {
> >  			result = SCAN_EXCEED_NONE_PTE;
> >  			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >  		} else {
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
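
P.S. For the consistency nit: here's roughly the shape the none/zero check in
__collapse_huge_page_isolate() takes once is_khugepaged is tested first. This
is an untested sketch of the direction, not the final hunk; note the counter
increment gets hoisted out of the condition, since otherwise the short-circuit
on !cc->is_khugepaged would skip the ++ and leave none_or_zero wrong for
tracing:

		if (pte_none(pteval) || (pte_present(pteval) &&
				is_zero_pfn(pte_pfn(pteval)))) {
			/* Count unconditionally; the limit only gates khugepaged */
			++none_or_zero;
			if (!userfaultfd_armed(vma) &&
			    (!cc->is_khugepaged ||
			     none_or_zero <= khugepaged_max_ptes_none)) {
				continue;
			} else {
				result = SCAN_EXCEED_NONE_PTE;
				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
				goto out;
			}
		}

The shared and swap checks get the same treatment.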