From: David Hildenbrand <david@redhat.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
Matthew Wilcox <willy@infradead.org>
Cc: Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840
Date: Tue, 28 May 2024 16:24:05 +0200
Message-ID: <209ff705-fe6e-4d6d-9d08-201afba7d74b@redhat.com>
In-Reply-To: <162cb2a8-1b53-4e86-8d49-f4e09b3255a4@redhat.com>
On 28.05.24 at 15:57, David Hildenbrand wrote:
> On 28.05.24 at 08:05, Mikhail Gavrilov wrote:
>> On Thu, May 23, 2024 at 12:05 PM Mikhail Gavrilov
>> <mikhail.v.gavrilov@gmail.com> wrote:
>>>
>>> On Thu, May 9, 2024 at 10:50 PM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> The only known workload that causes this is updating a large
>>> container. Unfortunately, not every container update reproduces the
>>> problem.
>>
>> Is it possible to add more debugging information to make it clearer
>> what's going on?
>
> If we knew who originally allocated that problematic page, that might help.
> Maybe page_owner could give some hints?
>
>>
>> BUG: Bad page state in process kcompactd0 pfn:605811
>> page: refcount:0 mapcount:0 mapping:0000000082d91e3e index:0x1045efc4f pfn:0x605811
>> aops:btree_aops ino:1
>> flags: 0x17ffffc600020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x1fffff)
>> raw: 0017ffffc600020c dead000000000100 dead000000000122 ffff888159075220
>> raw: 00000001045efc4f 0000000000000000 00000000ffffffff 0000000000000000
>> page dumped because: non-NULL mapping
>
> Seems to be an order-0 page, otherwise we would have another "head: ..." report.
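This is consistent with the raw flags word from the dump (0x0017ffffc600020c), which does not have PG_head set. A tail page would not have PG_head either, but the dump would then show the "head: ..." lines. The sketch below is a minimal userspace decoder of the low flag bits. It assumes the v6.9 ordering of the first ten enum pageflags entries, and the MODEL_* names are hypothetical rather than kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model, not kernel code. Assumed bit positions: the first ten
 * entries of enum pageflags as of v6.9 (include/linux/page-flags.h). The
 * upper bits of the word hold further flags and packed fields (node, zone,
 * last_cpupid, ...), which this sketch ignores. */
enum model_pageflags {
	MODEL_PG_locked,      /* bit 0 */
	MODEL_PG_writeback,   /* bit 1 */
	MODEL_PG_referenced,  /* bit 2 */
	MODEL_PG_uptodate,    /* bit 3 */
	MODEL_PG_dirty,       /* bit 4 */
	MODEL_PG_lru,         /* bit 5 */
	MODEL_PG_head,        /* bit 6: set only on the head page of a compound page */
	MODEL_PG_waiters,     /* bit 7 */
	MODEL_PG_active,      /* bit 8 */
	MODEL_PG_workingset,  /* bit 9 */
	MODEL_NR_DECODED
};

static const char *const model_flag_names[MODEL_NR_DECODED] = {
	"locked", "writeback", "referenced", "uptodate", "dirty",
	"lru", "head", "waiters", "active", "workingset",
};

/* Test a single low flag bit in the raw flags word. */
bool model_test_flag(unsigned long long flags, enum model_pageflags bit)
{
	assert(bit < MODEL_NR_DECODED);
	return (flags >> bit) & 1ULL;
}

/* Print the set low flags joined by '|', similar in spirit to %pGp. */
void model_print_flags(unsigned long long flags)
{
	const char *sep = "";

	for (int bit = 0; bit < MODEL_NR_DECODED; bit++) {
		if (model_test_flag(flags, (enum model_pageflags)bit)) {
			printf("%s%s", sep, model_flag_names[bit]);
			sep = "|";
		}
	}
	printf("\n");
}
```

With the word from the report, model_print_flags(0x0017ffffc600020cULL) prints "referenced|uptodate|workingset", which matches the kernel's own decoding. Note that PG_lru and PG_locked are clear as well.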
>
> It's not an anon/ksm/non-lru migration folio, because we clear the page->mapping
> field for them manually on the page freeing path. Likely it's a pagecache folio.
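The sketch below is a minimal model of that argument. The names are hypothetical and the logic is simplified, so it is not the actual free_pages_prepare()/page_bad_reason() code. Anon, KSM and movable pages carry type bits in the low bits of ->mapping, and the free path clears such mappings. A pagecache folio instead stores a plain address_space pointer, which must already be NULL by the time the page is freed; otherwise we get the "non-NULL mapping" report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type bits stored in the low bits of page->mapping (values as in
 * include/linux/page-flags.h; the MODEL_* names are hypothetical). */
#define MODEL_PAGE_MAPPING_ANON     0x1UL
#define MODEL_PAGE_MAPPING_MOVABLE  0x2UL
#define MODEL_PAGE_MAPPING_FLAGS    (MODEL_PAGE_MAPPING_ANON | MODEL_PAGE_MAPPING_MOVABLE)

struct model_page {
	uintptr_t mapping; /* pointer value, possibly tagged with type bits; 0 == NULL */
};

/* Simplified model of the freeing path: tagged (anon/KSM/movable) mappings
 * are cleared unconditionally; a plain pagecache mapping must already be
 * NULL. Returns the bad_page() reason, or NULL if the page looks fine. */
const char *model_check_free(struct model_page *page)
{
	assert(page != NULL);
	if (page->mapping & MODEL_PAGE_MAPPING_FLAGS)
		page->mapping = 0;
	if (page->mapping != 0)
		return "non-NULL mapping";
	return NULL;
}
```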
>
> So one option is that something seems to not properly set folio->mapping to
> NULL. But that problem would then also show up without page migration? Hmm.
>
>> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, BIOS 2611 04/07/2024
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x84/0xd0
>> bad_page.cold+0xbe/0xe0
>> ? __pfx_bad_page+0x10/0x10
>> ? page_bad_reason+0x9d/0x1f0
>> free_unref_page+0x838/0x10e0
>> __folio_put+0x1ba/0x2b0
>> ? __pfx___folio_put+0x10/0x10
>> ? __pfx___might_resched+0x10/0x10
>
> I suspect we come via
> migrate_pages_batch()->migrate_folio_unmap()->migrate_folio_done().
>
> Maybe this is the "Folio was freed from under us. So we are done." path
> when "folio_ref_count(src) == 1".
>
> Alternatively, we might come via
> migrate_pages_batch()->migrate_folio_move()->migrate_folio_done().
>
> For ordinary migration, move_to_new_folio() will clear src->mapping if
> the folio was migrated successfully. That's the very first thing that
> migrate_folio_move() does, so I doubt that is the problem.
>
> So I suspect we are in the migrate_folio_unmap() path. But for
> a !anon folio, who would be freeing the folio concurrently without clearing
> folio->mapping? After all, we have to hold the folio lock while migrating.
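The sketch below is a toy model of the two routes to migrate_folio_done() discussed above. It uses hypothetical names and assumes that the isolating caller holds exactly one reference and that the move always succeeds. Only the migrate_folio_move() route clears src->mapping before the folio is freed. The "freed from under us" shortcut leaves it untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified folio: only what the reasoning above uses. */
struct model_folio {
	int refcount;      /* folio_ref_count(); isolation holds one reference */
	uintptr_t mapping; /* folio->mapping; 0 == NULL */
};

enum model_route {
	MODEL_ROUTE_FREED_UNDER_US, /* migrate_folio_unmap() shortcut */
	MODEL_ROUTE_MOVED,          /* migrate_folio_move() after a successful copy */
};

/* With only the isolation reference left, nobody else uses the folio. */
enum model_route model_pick_route(const struct model_folio *src)
{
	assert(src->refcount >= 1);
	return src->refcount == 1 ? MODEL_ROUTE_FREED_UNDER_US : MODEL_ROUTE_MOVED;
}

/* Value of src->mapping when src reaches migrate_folio_done(). Assumes the
 * move always succeeds, in which case move_to_new_folio() clears it first. */
uintptr_t model_mapping_at_done(struct model_folio *src)
{
	if (model_pick_route(src) == MODEL_ROUTE_MOVED)
		src->mapping = 0;
	return src->mapping; /* the shortcut leaves ->mapping untouched */
}
```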
>
> In khugepaged:collapse_file() we manually set folio->mapping = NULL, before
> dropping the reference.
>
> Something to try (to see whether the problem goes away):
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index dd04f578c19c..45e92e14c904 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1124,6 +1124,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
> /* Folio was freed from under us. So we are done. */
> folio_clear_active(src);
> folio_clear_unevictable(src);
> + /*
> + * Anonymous and movable src->mapping will be cleared by
> + * free_pages_prepare so don't reset it here for keeping
> + * the type to work PageAnon, for example.
> + */
> + if (!folio_mapping_flags(src))
> + src->mapping = NULL;
> /* free_pages_prepare() will clear PG_isolated. */
> list_del(&src->lru);
> migrate_folio_done(src, reason);
>
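The sketch below models the effect of that change (hypothetical names, simplified free-path check). A plain pagecache mapping that reaches the "freed from under us" branch trips the check without the change. With the change, the check passes, while tagged anon/movable mappings are left for the free path as before. In other words, the change would only tell us whether the symptom goes away, not who freed the folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_MAPPING_FLAGS 0x3UL /* PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE */

struct model_folio {
	uintptr_t mapping; /* 0 == NULL; low bits tag anon/KSM/movable */
};

/* Rough stand-in for folio_mapping_flags(). */
bool model_mapping_flags(const struct model_folio *f)
{
	return (f->mapping & MODEL_PAGE_MAPPING_FLAGS) != 0;
}

/* The "Folio was freed from under us" branch, optionally with the proposed change. */
void model_freed_under_us(struct model_folio *src, bool with_patch)
{
	assert(src != NULL);
	if (with_patch && !model_mapping_flags(src))
		src->mapping = 0; /* the two lines added by the diff */
}

/* Simplified free-path check: tagged mappings are cleared, plain ones must be NULL. */
const char *model_bad_reason(struct model_folio *f)
{
	if (model_mapping_flags(f))
		f->mapping = 0;
	return f->mapping ? "non-NULL mapping" : NULL;
}
```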
> But it does feel weird: who would have freed the page concurrently without
> clearing folio->mapping?
>
> We don't hold the folio lock of src, though, but we do have the only reference.
> So another possibility is folio refcount mis-counting: folio_ref_count() == 1
> although there are other references (e.g., from the pagecache).
Hmm, your original report mentions kswapd, so I'm getting the feeling that someone
does one folio_put() too many and we are freeing a pagecache folio that is still
in the pagecache and, therefore, still has folio->mapping set ... bisecting would
really help.
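To make the suspected failure mode concrete, here is a toy model with hypothetical names. It assumes the only legitimate references to an isolated order-0 pagecache folio are the pagecache's and the isolation reference. A single unbalanced folio_put() drops the count to 1. The migration path then concludes that the folio was freed from under us and hands it to the free path while ->mapping still points into the pagecache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an order-0 pagecache folio isolated for migration.
 * Expected references: one from the pagecache, one from isolation. */
struct model_folio {
	int refcount;
	uintptr_t mapping; /* non-zero while the folio is still in the pagecache */
};

/* folio_put() stand-in: just drops one reference. */
void model_folio_put(struct model_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* The migrate_folio_unmap() shortcut: a count of 1 is taken to mean that only
 * the isolation reference is left, i.e., everybody else is gone. */
bool model_looks_freed_under_us(const struct model_folio *f)
{
	return f->refcount == 1;
}
```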
--
Thanks,
David / dhildenb
Thread overview: 13+ messages
2024-03-18 9:55 Mikhail Gavrilov
2024-05-08 10:16 ` Mikhail Gavrilov
2024-05-08 17:45 ` David Hildenbrand
2024-05-09 11:59 ` Mikhail Gavrilov
2024-05-09 17:50 ` David Hildenbrand
2024-05-23 7:05 ` Mikhail Gavrilov
2024-05-28 6:05 ` Mikhail Gavrilov
2024-05-28 13:57 ` David Hildenbrand
2024-05-28 14:24 ` David Hildenbrand [this message]
2024-05-29 6:57 ` David Hildenbrand
2024-05-29 19:00 ` David Sterba
2024-05-29 22:37 ` Qu Wenruo
2024-05-30 5:26 ` Qu Wenruo