From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Nico Pache <npache@redhat.com>
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org,
mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
akpm@linux-foundation.org, baohua@kernel.org,
willy@infradead.org, peterx@redhat.com,
wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
sunnanyong@huawei.com, vishal.moola@gmail.com,
thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
kirill.shutemov@linux.intel.com, aarcange@redhat.com,
raquini@redhat.com, anshuman.khandual@arm.com,
catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org,
dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
jglisse@google.com, surenb@google.com, zokeefe@google.com,
hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
rdunlap@infradead.org
Subject: Re: [PATCH v7 02/12] introduce khugepaged_collapse_single_pmd to unify khugepaged and madvise_collapse
Date: Fri, 16 May 2025 13:12:01 -0400
Message-ID: <db37bakzupqagevhjvngsu7vzcqugp6coy635bvhoy6cdrzk53@mrldbtuep3gk>
In-Reply-To: <20250515032226.128900-3-npache@redhat.com>
* Nico Pache <npache@redhat.com> [250514 23:23]:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create khugepaged_collapse_single_pmd to increase code
> reuse and create an entry point for future khugepaged changes.
>
> Refactor madvise_collapse and khugepaged_scan_mm_slot to use
> the new khugepaged_collapse_single_pmd function.
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 96 +++++++++++++++++++++++++------------------------
> 1 file changed, 49 insertions(+), 47 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 806bcd8c5185..5457571d505a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2353,6 +2353,48 @@ static int khugepaged_scan_file(struct mm_struct *mm, unsigned long addr,
> return result;
> }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static int khugepaged_collapse_single_pmd(unsigned long addr,
> + struct vm_area_struct *vma, bool *mmap_locked,
> + struct collapse_control *cc)
> +{
> + int result = SCAN_FAIL;
> + struct mm_struct *mm = vma->vm_mm;
> +
> + if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
Why IS_ENABLED(CONFIG_SHMEM) here?  It seems new.
> + struct file *file = get_file(vma->vm_file);
> + pgoff_t pgoff = linear_page_index(vma, addr);
> +
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> + result = khugepaged_scan_file(mm, addr, file, pgoff, cc);
> + fput(file);
> + if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> + mmap_read_lock(mm);
> + *mmap_locked = true;
> + if (khugepaged_test_exit_or_disable(mm)) {
> + result = SCAN_ANY_PROCESS;
> + goto end;
> + }
> + result = collapse_pte_mapped_thp(mm, addr,
> + !cc->is_khugepaged);
> + if (result == SCAN_PMD_MAPPED)
> + result = SCAN_SUCCEED;
> + mmap_read_unlock(mm);
> + *mmap_locked = false;
> + }
> + } else {
> + result = khugepaged_scan_pmd(mm, vma, addr, mmap_locked, cc);
> + }
> + if (cc->is_khugepaged && result == SCAN_SUCCEED)
> + ++khugepaged_pages_collapsed;
> +end:
> + return result;
This function can return with the mmap lock read-locked or unlocked..
> +}
> +
> static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> struct collapse_control *cc)
> __releases(&khugepaged_mm_lock)
> @@ -2427,34 +2469,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> VM_BUG_ON(khugepaged_scan.address < hstart ||
> khugepaged_scan.address + HPAGE_PMD_SIZE >
> hend);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma,
> - khugepaged_scan.address);
> -
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - *result = hpage_collapse_scan_file(mm,
> - khugepaged_scan.address, file, pgoff, cc);
> - fput(file);
> - if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> - mmap_read_lock(mm);
> - if (hpage_collapse_test_exit_or_disable(mm))
> - goto breakouterloop;
> - *result = collapse_pte_mapped_thp(mm,
> - khugepaged_scan.address, false);
> - if (*result == SCAN_PMD_MAPPED)
> - *result = SCAN_SUCCEED;
> - mmap_read_unlock(mm);
> - }
> - } else {
> - *result = hpage_collapse_scan_pmd(mm, vma,
> - khugepaged_scan.address, &mmap_locked, cc);
> - }
> -
> - if (*result == SCAN_SUCCEED)
> - ++khugepaged_pages_collapsed;
>
> + *result = khugepaged_collapse_single_pmd(khugepaged_scan.address,
> + vma, &mmap_locked, cc);
> + /* If we return SCAN_ANY_PROCESS we are holding the mmap_lock */
But this comment makes it obvious that you know that..
> + if (*result == SCAN_ANY_PROCESS)
> + goto breakouterloop;
But later..
breakouterloop:
mmap_read_unlock(mm); /* exit_mmap will destroy ptes after this */
breakouterloop_mmap_lock:
So if you return SCAN_ANY_PROCESS, we are holding the lock, jump to
breakouterloop, and immediately drop it.  This seems unnecessarily
complicated and involves taking the lock just to release it again.
That would leave just the khugepaged_scan_pmd() path with the
unfortunate locking mess - which is a static function and called in one
location..
Looking at what happens after the return seems to indicate we could
clean that up as well, sometime later.
> /* move to next address */
> khugepaged_scan.address += HPAGE_PMD_SIZE;
> progress += HPAGE_PMD_NR;
> @@ -2773,36 +2793,18 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> mmap_assert_locked(mm);
> memset(cc->node_load, 0, sizeof(cc->node_load));
> nodes_clear(cc->alloc_nmask);
> - if (!vma_is_anonymous(vma)) {
> - struct file *file = get_file(vma->vm_file);
> - pgoff_t pgoff = linear_page_index(vma, addr);
>
> - mmap_read_unlock(mm);
> - mmap_locked = false;
> - result = hpage_collapse_scan_file(mm, addr, file, pgoff,
> - cc);
> - fput(file);
> - } else {
> - result = hpage_collapse_scan_pmd(mm, vma, addr,
> - &mmap_locked, cc);
> - }
> + result = khugepaged_collapse_single_pmd(addr, vma, &mmap_locked, cc);
> +
> if (!mmap_locked)
> *prev = NULL; /* Tell caller we dropped mmap_lock */
>
> -handle_result:
> switch (result) {
> case SCAN_SUCCEED:
> case SCAN_PMD_MAPPED:
> ++thps;
> break;
> case SCAN_PTE_MAPPED_HUGEPAGE:
> - BUG_ON(mmap_locked);
> - BUG_ON(*prev);
> - mmap_read_lock(mm);
> - result = collapse_pte_mapped_thp(mm, addr, true);
> - mmap_read_unlock(mm);
> - goto handle_result;
All of the above should probably be replaced with a BUG_ON(1) since it's
not expected now? Or at least WARN_ON_ONCE(), but it should be safe to
continue if that's the case.
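E.g. something along these lines, entirely untested, if we really can't
see that result here any more:

	case SCAN_PTE_MAPPED_HUGEPAGE:
		/*
		 * khugepaged_collapse_single_pmd() already retried via
		 * collapse_pte_mapped_thp(), so this result should not
		 * reach us; warn and keep going rather than BUG.
		 */
		WARN_ON_ONCE(1);
		break;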
It looks like the mmap_locked boolean is used to ensure that *prev is
safe, but we are now dropping the lock and re-acquiring it (and
potentially returning here) with it set to true, so *prev will not be
set to NULL like it should be.
I think you can handle this by ensuring that
khugepaged_collapse_single_pmd() returns with mmap_locked false in the
SCAN_ANY_PROCESS case.
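That is, something like this in the helper (untested):

		if (khugepaged_test_exit_or_disable(mm)) {
			result = SCAN_ANY_PROCESS;
			/* Report a consistent lock state to the caller */
			mmap_read_unlock(mm);
			*mmap_locked = false;
			goto end;
		}

Then madvise_collapse() clears *prev through the existing !mmap_locked
check, and khugepaged_scan_mm_slot() can jump to
breakouterloop_mmap_lock instead of breakouterloop since the lock is
already gone.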
> - /* Whitelisted set of results where continuing OK */
This seems worth keeping?
> case SCAN_PMD_NULL:
> case SCAN_PTE_NON_PRESENT:
> case SCAN_PTE_UFFD_WP:
I guess SCAN_ANY_PROCESS should be handled by the default case
statement? It should probably be added to the switch?
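Something like this, as an untested sketch - I'm assuming the existing
last_fail bookkeeping is what you want here, and whether to break or
bail out of the loop entirely is your call:

	case SCAN_ANY_PROCESS:
		/*
		 * The mm is exiting or khugepaged got disabled under us;
		 * record it as a failed attempt for this address.
		 */
		last_fail = result;
		break;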
That is to say, before your change the result would come from either
hpage_collapse_scan_file() or hpage_collapse_scan_pmd(), with the file
case then leading to collapse_pte_mapped_thp() above.
Now, you can have khugepaged_test_exit_or_disable() happen to return
SCAN_ANY_PROCESS and it will fall through to the default in this switch
statement, which seems like new behaviour?
At the very least, this information should be added to the git log
describing what this patch does - if it's expected?
Thanks,
Liam