linux-mm.kvack.org archive mirror
From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com, Chris Mason <clm@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Kiryl Shutsemau <kirill@shutemov.name>,
	Brendan Jackman <jackmanb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Suren Baghdasaryan <surenb@google.com>, Zi Yan <ziy@nvidia.com>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 4/4] mm/page_alloc: Batch page freeing in free_frozen_page_commit
Date: Wed,  1 Oct 2025 08:55:23 -0700
Message-ID: <20251001155523.2470826-1-joshua.hahnjy@gmail.com>
In-Reply-To: <689e36d4-d828-42c5-9e57-ba663adc9ea9@suse.cz>

Hi Vlastimil,

Thank you for your feedback!

On Wed, 1 Oct 2025 12:04:50 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 9/29/25 5:17 PM, Joshua Hahn wrote:
> > On Sun, 28 Sep 2025 13:17:37 +0800 kernel test robot <oliver.sang@intel.com> wrote:
> > 
> > Hello Kernel Test Robot,
> > 
> >> Hello,
> >>
> >> kernel test robot noticed "WARNING:bad_unlock_balance_detected" on:

[...snip...]

> >> [  414.880298][ T7549] WARNING: bad unlock balance detected!
> >> [  414.881071][ T7549] 6.17.0-rc6-00147-g7e86100bfb0d #1 Not tainted
> >> [  414.881924][ T7549] -------------------------------------
> >> [  414.882695][ T7549] date/7549 is trying to release lock (&pcp->lock) at:
> >> [  414.883649][ T7549] free_frozen_page_commit+0x425/0x9d0
> >> [  414.884764][ T7549] but there are no more locks to release!
> >> [  414.885539][ T7549]
> >> [  414.885539][ T7549] other info that might help us debug this:
> >> [  414.886704][ T7549] 2 locks held by date/7549:
> >> [  414.887353][ T7549] #0: ffff888104f29940 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap (include/linux/seqlock.h:431 include/linux/mmap_lock.h:88 include/linux/mmap_lock.h:398 mm/mmap.c:1288)
> >> [  414.888591][ T7549] #1: ffff8883ae40e858 (&pcp->lock){+.+.}-{3:3}, at: free_frozen_page_commit+0x46a/0x9d0
> > 
> > So based on this, it seems like I must have overlooked a pretty important
> > consideration here. When I unlock the pcp, it allows both the zone and pcp
> > lock to be picked up by another task (pcp lock less likely), but it also
> > means that this process can be migrated to a different CPU, where it will
> > be trying to unlock & acquire a completely different pcp.
> 
> Yes.
> 
> > For me the most simple solution looks to be migrate_disable() and
> > migrate_enable() in the function to ensure that this task is bound to the
> > CPU it originally started running on.
> > 
> > I'm not sure how this will affect performance, but I think in terms of
> 
> It is somewhat expensive, I'd rather avoid if possible.
> 
> > desired behavior it does seem like this is the correct way to do it.
> 
> I'd rather detect this happened (new pcp doesn't match old pcp after a
> relock) and either give up (should be rare enough hopefully so won't
> cause much imbalance) or recalculate how much to free on the other cpu
> and continue there (probably subtract how much we already did so we
> don't end up unlucky flushing all kinds of cpus "forever").

I think this idea makes sense. Since I am dropping patch 2/4, the remaining
call sites will be in decay_pcp_high and free_frozen_page_commit.
Here, I think it makes sense to just give up when we notice after relocking
that we are on a different CPU. If the new CPU's pcp needs flushing, the
next call to free_frozen_page_commit will take care of it (as in
free_unref_folios). In the __free_frozen_pages case, it doesn't really make
sense to flush a pcp that isn't related to the current caller anyway.
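
To make the "detect and give up" idea concrete, here is a rough userspace
sketch. It is not kernel code and not what v3 will look like: struct pcp_sim,
sim_cpu, pcp_lock_current(), pcp_unlock(), free_batched() and the
migrate_after parameter are all hypothetical names made up for illustration,
the lock is modeled as a plain flag, and a migration is simulated by changing
sim_cpu while the lock is dropped.

```c
#include <assert.h>

#define NR_SIM_CPUS 2

/* Hypothetical stand-in for a per-CPU page list; not the kernel struct. */
struct pcp_sim {
	int locked;	/* 1 while the simulated pcp lock is held */
	int count;	/* pages currently on this list */
};

static struct pcp_sim pcps[NR_SIM_CPUS];
static int sim_cpu;	/* CPU the task is "running" on (assumption) */

/* Lock and return the pcp of whichever CPU we are on right now. */
static struct pcp_sim *pcp_lock_current(void)
{
	struct pcp_sim *pcp = &pcps[sim_cpu];

	assert(!pcp->locked);
	pcp->locked = 1;
	return pcp;
}

static void pcp_unlock(struct pcp_sim *pcp)
{
	assert(pcp->locked);	/* crude model of an unlock balance check */
	pcp->locked = 0;
}

/*
 * Free up to to_free pages in chunks of at most batch pages, dropping the
 * lock between chunks. migrate_after simulates a migration after that many
 * chunks (-1 for never). Returns the number of pages actually freed.
 */
static int free_batched(int to_free, int batch, int migrate_after)
{
	struct pcp_sim *pcp = pcp_lock_current();
	int freed = 0, chunks = 0;

	assert(batch > 0);
	while (to_free > 0 && pcp->count > 0) {
		int n = to_free < batch ? to_free : batch;
		struct pcp_sim *new_pcp;

		if (n > pcp->count)
			n = pcp->count;
		pcp->count -= n;
		freed += n;
		to_free -= n;
		if (to_free == 0)
			break;

		/* Drop the lock so others can make progress. */
		pcp_unlock(pcp);
		if (++chunks == migrate_after)
			sim_cpu = (sim_cpu + 1) % NR_SIM_CPUS;

		/* Relock whatever pcp belongs to the CPU we are on now. */
		new_pcp = pcp_lock_current();
		if (new_pcp != pcp) {
			/* Migrated: unlock the lock we really hold, give up. */
			pcp_unlock(new_pcp);
			return freed;
		}
	}
	pcp_unlock(pcp);
	return freed;
}
```

Starting from 100 pages on pcps[0] with sim_cpu = 0, free_batched(64, 16, -1)
frees all 64 pages. From the same starting state, free_batched(64, 16, 1)
stops after the first 16, leaves the other CPU's list untouched, and returns
with neither lock held.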

One concern I do have is that it is now possible to flush fewer pages
than with the current behavior, which guarantees that the specified number
of pages gets flushed. But like you suggested, hopefully this is rare
enough that it doesn't matter in practice. FWIW, I never hit this during my
own testing; the first time I saw it was in the kernel test robot report,
so hopefully it's not very likely.

Thank you for the idea Vlastimil. I'll make these changes in v3!
I hope you have a great day!
Joshua

