From: David Hildenbrand <david@redhat.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840
Date: Tue, 28 May 2024 15:57:58 +0200
Message-ID: <162cb2a8-1b53-4e86-8d49-f4e09b3255a4@redhat.com>
In-Reply-To: <CABXGCsP3Yf2g6e7pSi71pbKpm+r1LdGyF5V7KaXbQjNyR9C_Rw@mail.gmail.com>

On 28.05.24 at 08:05, Mikhail Gavrilov wrote:
> On Thu, May 23, 2024 at 12:05 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
>>
>> On Thu, May 9, 2024 at 10:50 PM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> Do you have the other stack trace as well?
>>>
>>> Maybe triggering memory reclaim (e.g., using "stress" or "memhog") could
>>> trigger it; that might be reasonable to try. Once we have a reproducer
>>> we could at least bisect.
>>>
>>
>> The only known workload that causes this is updating a large
>> container. Unfortunately, not every container update reproduces the
>> problem.
> 
> Is it possible to add more debugging information to make it clearer
> what's going on?

If we knew who originally allocated that problematic page, that might help. 
Maybe page_owner could give some hints?
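
(From memory of Documentation/mm/page_owner.rst: build with CONFIG_PAGE_OWNER=y
and boot with "page_owner=on". With that enabled, the "Bad page state" dump
should already include the stack of whoever last allocated (and, IIRC, freed)
the page, because dump_page() also calls dump_page_owner(). The live state can
additionally be dumped via debugfs and sorted with the helper from tools/mm/
in the kernel tree:

	cat /sys/kernel/debug/page_owner > page_owner_full.txt
	./page_owner_sort page_owner_full.txt sorted_page_owner.txt

That should tell us who allocated the problematic page.)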

> 
> BUG: Bad page state in process kcompactd0  pfn:605811
> page: refcount:0 mapcount:0 mapping:0000000082d91e3e index:0x1045efc4f
> pfn:0x605811
> aops:btree_aops ino:1
> flags: 0x17ffffc600020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x1fffff)
> raw: 0017ffffc600020c dead000000000100 dead000000000122 ffff888159075220
> raw: 00000001045efc4f 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: non-NULL mapping

Seems to be an order-0 page; otherwise we would have another "head: ..." line in the report.

It's not an anon/ksm/non-lru-migration folio, because for those we clear the
page->mapping field manually on the page freeing path. So it's likely a pagecache folio.
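
(For reference, paraphrasing from memory what I mean in mm/page_alloc.c;
details may differ per kernel version:

	/*
	 * free_pages_prepare(): anon/ksm/movable folios encode their type
	 * in the low bits of page->mapping, so the freeing path clears
	 * the field itself.
	 */
	if (PageMappingFlags(page))
		page->mapping = NULL;

whereas a plain pagecache pointer left behind in page->mapping later trips,
in page_bad_reason():

	if (unlikely(page->mapping != NULL))
		bad_reason = "non-NULL mapping";

which is exactly the report above.)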

So one option is that something fails to properly set folio->mapping to
NULL. But then that problem would also show up without page migration? Hmm.

> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI,
> BIOS 2611 04/07/2024
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x84/0xd0
>   bad_page.cold+0xbe/0xe0
>   ? __pfx_bad_page+0x10/0x10
>   ? page_bad_reason+0x9d/0x1f0
>   free_unref_page+0x838/0x10e0
>   __folio_put+0x1ba/0x2b0
>   ? __pfx___folio_put+0x10/0x10
>   ? __pfx___might_resched+0x10/0x10

I suspect we come via
	migrate_pages_batch()->migrate_folio_unmap()->migrate_folio_done().

Maybe this is the "Folio was freed from under us. So we are done." path
when "folio_ref_count(src) == 1".

Alternatively, we might come via
	migrate_pages_batch()->migrate_folio_move()->migrate_folio_done().

For ordinary migration, move_to_new_folio() will clear src->mapping if
the folio was migrated successfully. That's the very first thing that 
migrate_folio_move() does, so I doubt that is the problem.
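
That is, from memory, the success path there looks roughly like:

	if (rc == MIGRATEPAGE_SUCCESS) {
		/*
		 * For pagecache folios, src->mapping must be cleared before
		 * src is freed. Anonymous folios must stay anonymous until
		 * freed.
		 */
		if (!folio_mapping_flags(src))
			src->mapping = NULL;
		...
	}

so after a successful migration the stale ->mapping is gone before anyone
can free src.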

So I suspect we are in the migrate_folio_unmap() path. But for
a !anon folio, who would be freeing the folio concurrently (and not clearing
folio->mapping)? After all, we have to hold the folio lock while migrating.

In khugepaged's collapse_file() we manually set folio->mapping = NULL before
dropping the reference.
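
(Paraphrasing loosely from memory, and the exact page/folio spelling depends
on the kernel version, the tail of collapse_file() does something like

	folio->mapping = NULL;
	folio_clear_active(folio);
	folio_clear_unevictable(folio);
	folio_unlock(folio);
	folio_put_refs(folio, ...);	/* drop the refs we pinned */

for each old pagecache folio, so that path shouldn't leave a stale ->mapping
behind either.)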

Something to try (to see whether the problem goes away) might be:

diff --git a/mm/migrate.c b/mm/migrate.c
index dd04f578c19c..45e92e14c904 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1124,6 +1124,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
                 /* Folio was freed from under us. So we are done. */
                 folio_clear_active(src);
                 folio_clear_unevictable(src);
+               /*
+                * Anonymous and movable src->mapping will be cleared by
+                * free_pages_prepare(), so don't reset it here: keeping it
+                * preserves the type so that, e.g., PageAnon() still works.
+                */
+               if (!folio_mapping_flags(src))
+                       src->mapping = NULL;
                 /* free_pages_prepare() will clear PG_isolated. */
                 list_del(&src->lru);
                 migrate_folio_done(src, reason);

But it does feel weird: who freed the page concurrently and didn't clear 
folio->mapping ...

We don't hold the folio lock of src, though, only (what we believe is) the sole
reference. So another possibility might be folio refcount mis-accounting:
folio_ref_count() == 1 although there are other references (e.g., from the pagecache).
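
If we wanted to catch that in the act, an untested debug hack (not for
merging) right in that migrate_folio_unmap() branch could dump the folio
while we still know we came via migration, instead of waiting for the
generic bad-page check:

	if (folio_ref_count(src) == 1) {
		/*
		 * Debug hack: a pagecache folio should never hit the free
		 * path with ->mapping still set.
		 */
		if (src->mapping && !folio_mapping_flags(src))
			dump_page(&src->page, "migration: freeing with mapping set");
		...
	}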


>   ? migrate_folio_done+0x1de/0x2b0
>   migrate_pages_batch+0xe73/0x2880
>   ? __pfx_compaction_alloc+0x10/0x10
>   ? __pfx_compaction_free+0x10/0x10
>   ? __pfx_migrate_pages_batch+0x10/0x10
>   ? trace_irq_enable.constprop.0+0xce/0x110
>   ? __pfx_remove_migration_pte+0x10/0x10
>   ? rcu_is_watching+0x12/0xc0
>   migrate_pages+0x194f/0x22f0
>   ? __pfx_compaction_alloc+0x10/0x10
>   ? __pfx_compaction_free+0x10/0x10
>   ? __pfx_migrate_pages+0x10/0x10
>   ? trace_irq_enable.constprop.0+0xce/0x110
>   ? rcu_is_watching+0x12/0xc0
>   ? isolate_migratepages_block+0x2b02/0x4560
>   ? __pfx_isolate_migratepages_block+0x10/0x10
>   ? __pfx___might_resched+0x10/0x10
>   compact_zone+0x1a7c/0x3860
>   ? rcu_is_watching+0x12/0xc0
>   ? __pfx___free_object+0x10/0x10
>   ? __pfx_compact_zone+0x10/0x10
>   ? rcu_is_watching+0x12/0xc0
>   ? lock_acquire+0x457/0x540
>   ? kcompactd+0x2fa/0xc70
>   ? rcu_is_watching+0x12/0xc0
>   compact_node+0x144/0x240
>   ? __pfx_compact_node+0x10/0x10
>   ? rcu_is_watching+0x12/0xc0
>   kcompactd+0x686/0xc70
>   ? __pfx_kcompactd+0x10/0x10
>   ? __pfx_autoremove_wake_function+0x10/0x10
>   ? __kthread_parkme+0xb1/0x1d0
>   ? __pfx_kcompactd+0x10/0x10
>   ? __pfx_kcompactd+0x10/0x10
>   kthread+0x2d2/0x3a0
>   ? _raw_spin_unlock_irq+0x28/0x60
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x31/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
> 

-- 
Thanks,

David / dhildenb


