From: Vlastimil Babka <vbabka@suse.cz>
To: Mel Gorman <mgorman@techsingularity.net>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>, Yu Zhao <yuzhao@google.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Michal Hocko <mhocko@kernel.org>,
Marek Szyprowski <m.szyprowski@samsung.com>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] mm/page_alloc: Leave IRQs enabled for per-cpu page allocations
Date: Fri, 18 Nov 2022 15:30:57 +0100 [thread overview]
Message-ID: <f9acd363-bf53-c582-78ec-347fd7ec5c37@suse.cz> (raw)
In-Reply-To: <20221118101714.19590-3-mgorman@techsingularity.net>
On 11/18/22 11:17, Mel Gorman wrote:
> The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task
> allocating from the PCP must not re-enter the allocator from IRQ context.
> In each instance where IRQ-reentrancy is possible, the lock is acquired
> using pcp_spin_trylock_irqsave() even though IRQs are disabled and
> re-entrancy is impossible.
>
> Demote the lock to pcp_spin_lock avoids an IRQ disable/enable in the common
> case at the cost of some IRQ allocations taking a slower path. If the PCP
> lists need to be refilled, the zone lock still needs to disable IRQs but
> that will only happen on PCP refill and drain. If an IRQ is raised when
> a PCP allocation is in progress, the trylock will fail and fallback to
> using the buddy lists directly. Note that this may not be a universal win
> if an interrupt-intensive workload also allocates heavily from interrupt
> context and contends heavily on the zone->lock as a result.
>
> [yuzhao@google.com: Reported lockdep issue on IO completion from softirq]
> [hughd@google.com: Fix list corruption, lock improvements, micro-optimsations]
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Some nits below:
> @@ -3516,10 +3485,10 @@ void free_unref_page(struct page *page, unsigned int order)
> */
> void free_unref_page_list(struct list_head *list)
> {
> + unsigned long __maybe_unused UP_flags;
> struct page *page, *next;
> struct per_cpu_pages *pcp = NULL;
> struct zone *locked_zone = NULL;
> - unsigned long flags;
> int batch_count = 0;
> int migratetype;
>
> @@ -3550,11 +3519,26 @@ void free_unref_page_list(struct list_head *list)
>
> /* Different zone, different pcp lock. */
> if (zone != locked_zone) {
> - if (pcp)
> - pcp_spin_unlock_irqrestore(pcp, flags);
> + if (pcp) {
> + pcp_spin_unlock(pcp);
> + pcp_trylock_finish(UP_flags);
> + }
>
> + /*
> + * trylock is necessary as pages may be getting freed
> + * from IRQ or SoftIRQ context after an IO completion.
> + */
> + pcp_trylock_prepare(UP_flags);
> + pcp = pcp_spin_trylock(zone->per_cpu_pageset);
> + if (!pcp) {
Perhaps use unlikely() here?
> + pcp_trylock_finish(UP_flags);
> + free_one_page(zone, page, page_to_pfn(page),
> + 0, migratetype, FPI_NONE);
Not critical for correctness, but the migratepage here might be stale and we
should do get_pcppage_migratetype(page);
> + locked_zone = NULL;
> + continue;
> + }
> locked_zone = zone;
> - pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
> + batch_count = 0;
> }
>
> /*
> @@ -3569,18 +3553,23 @@ void free_unref_page_list(struct list_head *list)
> free_unref_page_commit(zone, pcp, page, migratetype, 0);
>
> /*
> - * Guard against excessive IRQ disabled times when we get
> - * a large list of pages to free.
> + * Guard against excessive lock hold times when freeing
> + * a large list of pages. Lock will be reacquired if
> + * necessary on the next iteration.
> */
> if (++batch_count == SWAP_CLUSTER_MAX) {
> - pcp_spin_unlock_irqrestore(pcp, flags);
> + pcp_spin_unlock(pcp);
> + pcp_trylock_finish(UP_flags);
> batch_count = 0;
> - pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
> + pcp = NULL;
> + locked_zone = NULL;
AFAICS if this block was just "locked_zone = NULL;" then the existing code
would do the right thing.
Or maybe to have simpler code, just do batch_count++ here and
make the relocking check do
if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX)
> }
> }
>
> - if (pcp)
> - pcp_spin_unlock_irqrestore(pcp, flags);
> + if (pcp) {
> + pcp_spin_unlock(pcp);
> + pcp_trylock_finish(UP_flags);
> + }
> }
>
> /*
next prev parent reply other threads:[~2022-11-18 14:31 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-18 10:17 [PATCH v3 0/2] " Mel Gorman
2022-11-18 10:17 ` [PATCH 1/2] mm/page_alloc: Always remove pages from temporary list Mel Gorman
2022-11-18 13:24 ` Vlastimil Babka
2022-11-18 10:17 ` [PATCH 2/2] mm/page_alloc: Leave IRQs enabled for per-cpu page allocations Mel Gorman
2022-11-18 14:30 ` Vlastimil Babka [this message]
2022-11-21 12:01 ` Mel Gorman
2022-11-21 16:03 ` Mel Gorman
2022-11-22 9:11 ` Vlastimil Babka
2022-11-22 9:09 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f9acd363-bf53-c582-78ec-347fd7ec5c37@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=m.szyprowski@samsung.com \
--cc=mgorman@techsingularity.net \
--cc=mhocko@kernel.org \
--cc=mtosatti@redhat.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox