From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>, Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Miaohe Lin <linmiaohe@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Date: Wed, 20 Sep 2023 09:48:11 -0400 [thread overview]
Message-ID: <20230920134811.GB124289@cmpxchg.org> (raw)
In-Reply-To: <bc4bd049-1e29-0f23-cca6-493abb5e774f@suse.cz>
On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
> On 9/20/23 03:38, Zi Yan wrote:
> > On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
> >
> >> On 09/19/23 16:57, Zi Yan wrote:
> >>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
> >>>
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
> >>>> end = pageblock_end_pfn(pfn) - 1;
> >>>>
> >>>> /* Do not cross zone boundaries */
> >>>> +#if 0
> >>>> if (!zone_spans_pfn(zone, start))
> >>>> start = zone->zone_start_pfn;
> >>>> +#else
> >>>> + if (!zone_spans_pfn(zone, start))
> >>>> + start = pfn;
> >>>> +#endif
> >>>> if (!zone_spans_pfn(zone, end))
> >>>> return false;
> >>>> I can still trigger warnings.
> >>>
> >>> OK. One thing to note is that the page type in the warning changed from
> >>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
> >>>
> >>
> >> Just to be really clear,
> >> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
> >> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
> >> path WITHOUT your change.
> >>
> >> I am guessing the difference here has more to do with the allocation path?
> >>
> >> I went back and reran focusing on the specific migrate type.
> >> Without your patch, and coming from the alloc_contig_range call path,
> >> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
> >> With your patch I got one 'page type is 0, passed migratetype is 1'
> >> warning and one 'page type is 1, passed migratetype is 0' warning.
> >>
> >> I could be wrong, but I do not think your patch changes things.
> >
> > Got it. Thanks for the clarification.
> >>
> >>>>
> >>>> One idea about recreating the issue is that it may have to do with size
> >>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
> >>>> to really stress the allocations by increasing the number of hugetlb
> >>>> pages requested and that did not help. I also noticed that I only seem
> >>>> to get two warnings and then they stop, even if I continue to run the
> >>>> script.
> >>>>
> >>>> Zi asked about my config, so it is attached.
> >>>
> >>> With your config, I still have no luck reproducing the issue. I will keep
> >>> trying. Thanks.
> >>>
> >>
> >> Perhaps try running both scripts in parallel?
> >
> > Yes. It seems to do the trick.
> >
> >> Adjust the number of hugetlb pages allocated to equal 25% of memory?
> >
> > I am able to reproduce it with the script below:
> >
> > while true; do
> > echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
> > echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
> > wait
> > echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > done
> >
> > I will look into the issue.
Nice!
I managed to reproduce it ONCE, triggering it not even a second after
starting the script. But I can't seem to do it twice, even after
several reboots and letting it run for minutes.
> With migratetypes 0 and 1 and somewhat harder to reproduce scenario (= less
> deterministic, more racy) it's possible we now see what I suspected can
> happen here:
> https://lore.kernel.org/all/37dbd4d0-c125-6694-dec4-6322ae5b6dee@suse.cz/
> In that there are places reading the migratetype outside of zone lock.
Good point!
I had already written up a fix for this issue. Still trying to get the
reproducer to work, but attaching the fix below in case somebody with
a working environment beats me to it.
---
From 94f67bfa29a602a66014d079431b224cacbf79e9 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri, 15 Sep 2023 16:23:38 -0400
Subject: [PATCH] mm: page_alloc: close migratetype race between freeing and
stealing
There are three freeing paths that read the page's migratetype
optimistically before grabbing the zone lock. When this races with
block stealing, those pages go on the wrong freelist.
The paths in question are:
- when freeing >costly orders that aren't THP
- when freeing pages to the buddy upon pcp lock contention
- when freeing pages that are isolated
- when freeing pages initially during boot
- when freeing the remainder in alloc_pages_exact()
- when "accepting" unaccepted VM host memory before first use
- when freeing pages during unpoisoning
None of these are so hot that they would need this optimization at the
cost of hampering defrag efforts. Especially when contrasted with the
fact that the most common buddy freeing path - free_pcppages_bulk - is
checking the migratetype under the zone->lock just fine.
In addition, isolated pages need to look up the migratetype under the
lock anyway, which adds branches to the locked section, and results in
a double lookup when the pages are in fact isolated.
Move the lookups into the lock.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/page_alloc.c | 47 +++++++++++++++++------------------------------
1 file changed, 17 insertions(+), 30 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0ca999d24a00..d902a8aaa3fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1222,18 +1222,15 @@ static void free_pcppages_bulk(struct zone *zone, int count,
spin_unlock_irqrestore(&zone->lock, flags);
}
-static void free_one_page(struct zone *zone,
- struct page *page, unsigned long pfn,
- unsigned int order,
- int migratetype, fpi_t fpi_flags)
+static void free_one_page(struct zone *zone, struct page *page,
+ unsigned long pfn, unsigned int order,
+ fpi_t fpi_flags)
{
unsigned long flags;
+ int migratetype;
spin_lock_irqsave(&zone->lock, flags);
- if (unlikely(has_isolate_pageblock(zone) ||
- is_migrate_isolate(migratetype))) {
- migratetype = get_pfnblock_migratetype(page, pfn);
- }
+ migratetype = get_pfnblock_migratetype(page, pfn);
__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
spin_unlock_irqrestore(&zone->lock, flags);
}
@@ -1249,18 +1246,8 @@ static void __free_pages_ok(struct page *page, unsigned int order,
if (!free_pages_prepare(page, order, fpi_flags))
return;
- /*
- * Calling get_pfnblock_migratetype() without spin_lock_irqsave() here
- * is used to avoid calling get_pfnblock_migratetype() under the lock.
- * This will reduce the lock holding time.
- */
- migratetype = get_pfnblock_migratetype(page, pfn);
-
spin_lock_irqsave(&zone->lock, flags);
- if (unlikely(has_isolate_pageblock(zone) ||
- is_migrate_isolate(migratetype))) {
- migratetype = get_pfnblock_migratetype(page, pfn);
- }
+ migratetype = get_pfnblock_migratetype(page, pfn);
__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
spin_unlock_irqrestore(&zone->lock, flags);
@@ -2404,7 +2391,7 @@ void free_unref_page(struct page *page, unsigned int order)
struct per_cpu_pages *pcp;
struct zone *zone;
unsigned long pfn = page_to_pfn(page);
- int migratetype, pcpmigratetype;
+ int migratetype;
if (!free_pages_prepare(page, order, FPI_NONE))
return;
@@ -2416,23 +2403,23 @@ void free_unref_page(struct page *page, unsigned int order)
* get those areas back if necessary. Otherwise, we may have to free
* excessively into the page allocator
*/
- migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn);
+ migratetype = get_pfnblock_migratetype(page, pfn);
if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
if (unlikely(is_migrate_isolate(migratetype))) {
- free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
+ free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
return;
}
- pcpmigratetype = MIGRATE_MOVABLE;
+ migratetype = MIGRATE_MOVABLE;
}
zone = page_zone(page);
pcp_trylock_prepare(UP_flags);
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (pcp) {
- free_unref_page_commit(zone, pcp, page, pcpmigratetype, order);
+ free_unref_page_commit(zone, pcp, page, migratetype, order);
pcp_spin_unlock(pcp);
} else {
- free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
+ free_one_page(zone, page, pfn, order, FPI_NONE);
}
pcp_trylock_finish(UP_flags);
}
@@ -2465,7 +2452,7 @@ void free_unref_page_list(struct list_head *list)
migratetype = get_pfnblock_migratetype(page, pfn);
if (unlikely(is_migrate_isolate(migratetype))) {
list_del(&page->lru);
- free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
+ free_one_page(page_zone(page), page, pfn, 0, FPI_NONE);
continue;
}
}
@@ -2498,8 +2485,7 @@ void free_unref_page_list(struct list_head *list)
pcp = pcp_spin_trylock(zone->per_cpu_pageset);
if (unlikely(!pcp)) {
pcp_trylock_finish(UP_flags);
- free_one_page(zone, page, pfn,
- 0, migratetype, FPI_NONE);
+ free_one_page(zone, page, pfn, 0, FPI_NONE);
locked_zone = NULL;
continue;
}
@@ -6537,13 +6523,14 @@ bool take_page_off_buddy(struct page *page)
bool put_page_back_buddy(struct page *page)
{
struct zone *zone = page_zone(page);
- unsigned long pfn = page_to_pfn(page);
unsigned long flags;
- int migratetype = get_pfnblock_migratetype(page, pfn);
bool ret = false;
spin_lock_irqsave(&zone->lock, flags);
if (put_page_testzero(page)) {
+ unsigned long pfn = page_to_pfn(page);
+ int migratetype = get_pfnblock_migratetype(page, pfn);
+
ClearPageHWPoisonTakenOff(page);
__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
if (TestClearPageHWPoison(page)) {
--
2.42.0
next prev parent reply other threads:[~2023-09-20 13:48 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-11 19:41 Johannes Weiner
2023-09-11 19:41 ` [PATCH 1/6] mm: page_alloc: remove pcppage migratetype caching Johannes Weiner
2023-09-11 19:59 ` Zi Yan
2023-09-11 21:09 ` Andrew Morton
2023-09-12 13:47 ` Vlastimil Babka
2023-09-12 14:50 ` Johannes Weiner
2023-09-13 9:33 ` Vlastimil Babka
2023-09-13 13:24 ` Johannes Weiner
2023-09-13 13:34 ` Vlastimil Babka
2023-09-12 15:03 ` Johannes Weiner
2023-09-14 7:29 ` Vlastimil Babka
2023-09-14 9:56 ` Mel Gorman
2023-09-27 5:42 ` Huang, Ying
2023-09-27 14:51 ` Johannes Weiner
2023-09-30 4:26 ` Huang, Ying
2023-10-02 14:58 ` Johannes Weiner
2023-09-11 19:41 ` [PATCH 2/6] mm: page_alloc: fix up block types when merging compatible blocks Johannes Weiner
2023-09-11 20:01 ` Zi Yan
2023-09-13 9:52 ` Vlastimil Babka
2023-09-14 10:00 ` Mel Gorman
2023-09-11 19:41 ` [PATCH 3/6] mm: page_alloc: move free pages when converting block during isolation Johannes Weiner
2023-09-11 20:17 ` Zi Yan
2023-09-11 20:47 ` Johannes Weiner
2023-09-11 20:50 ` Zi Yan
2023-09-13 14:31 ` Vlastimil Babka
2023-09-14 10:03 ` Mel Gorman
2023-09-11 19:41 ` [PATCH 4/6] mm: page_alloc: fix move_freepages_block() range error Johannes Weiner
2023-09-11 20:23 ` Zi Yan
2023-09-13 14:40 ` Vlastimil Babka
2023-09-14 13:37 ` Johannes Weiner
2023-09-14 10:03 ` Mel Gorman
2023-09-11 19:41 ` [PATCH 5/6] mm: page_alloc: fix freelist movement during block conversion Johannes Weiner
2023-09-13 19:52 ` Vlastimil Babka
2023-09-14 14:47 ` Johannes Weiner
2023-09-11 19:41 ` [PATCH 6/6] mm: page_alloc: consolidate free page accounting Johannes Weiner
2023-09-13 20:18 ` Vlastimil Babka
2023-09-14 4:11 ` Johannes Weiner
2023-09-14 23:52 ` [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene Mike Kravetz
2023-09-15 14:16 ` Johannes Weiner
2023-09-15 15:05 ` Mike Kravetz
2023-09-16 19:57 ` Mike Kravetz
2023-09-16 20:13 ` Andrew Morton
2023-09-18 7:16 ` Vlastimil Babka
2023-09-18 14:52 ` Johannes Weiner
2023-09-18 17:40 ` Mike Kravetz
2023-09-19 6:49 ` Johannes Weiner
2023-09-19 12:37 ` Zi Yan
2023-09-19 15:22 ` Zi Yan
2023-09-19 18:47 ` Mike Kravetz
2023-09-19 20:57 ` Zi Yan
2023-09-20 0:32 ` Mike Kravetz
2023-09-20 1:38 ` Zi Yan
2023-09-20 6:07 ` Vlastimil Babka
2023-09-20 13:48 ` Johannes Weiner [this message]
2023-09-20 16:04 ` Johannes Weiner
2023-09-20 17:23 ` Zi Yan
2023-09-21 2:31 ` Zi Yan
2023-09-21 10:19 ` David Hildenbrand
2023-09-21 14:47 ` Zi Yan
2023-09-25 21:12 ` Zi Yan
2023-09-26 17:39 ` Johannes Weiner
2023-09-28 2:51 ` Zi Yan
2023-10-03 2:26 ` Zi Yan
2023-10-10 21:12 ` Johannes Weiner
2023-10-11 15:25 ` Johannes Weiner
2023-10-11 15:45 ` Johannes Weiner
2023-10-11 15:57 ` Zi Yan
2023-10-13 0:06 ` Zi Yan
2023-10-13 14:51 ` Zi Yan
2023-10-16 13:35 ` Zi Yan
2023-10-16 14:37 ` Johannes Weiner
2023-10-16 15:00 ` Zi Yan
2023-10-16 18:51 ` Johannes Weiner
2023-10-16 19:49 ` Zi Yan
2023-10-16 20:26 ` Johannes Weiner
2023-10-16 20:39 ` Johannes Weiner
2023-10-16 20:48 ` Zi Yan
2023-09-26 18:19 ` David Hildenbrand
2023-09-28 3:22 ` Zi Yan
2023-10-02 11:43 ` David Hildenbrand
2023-10-03 2:35 ` Zi Yan
2023-09-18 7:07 ` Vlastimil Babka
2023-09-18 14:09 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230920134811.GB124289@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=mike.kravetz@oracle.com \
--cc=vbabka@suse.cz \
--cc=wangkefeng.wang@huawei.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox