From: Hugh Dickins <hughd@google.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Suleiman Souhlal <suleiman@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: madvise(MADV_REMOVE) deadlocks on shmem THP
Date: Wed, 13 Jan 2021 20:31:23 -0800 (PST)
Message-ID: <alpine.LSU.2.11.2101132000500.4777@eggly.anvils>
In-Reply-To: <X/+7dkbhNtAVV+wd@google.com>
On Thu, 14 Jan 2021, Sergey Senozhatsky wrote:
> Hi,
>
> We are running into lockups during the memory pressure tests on our
> boards, which essentially NMI panic them. In short the test case is
>
> - THP shmem
> echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled
>
> - And a user-space process doing madvise(MADV_HUGEPAGE) on new mappings,
> and madvise(MADV_REMOVE) when it wants to remove the page range
>
> The problem boils down to the reverse locking chain:
> kswapd does
>
> lock_page(page) -> down_read(page->mapping->i_mmap_rwsem)
>
> madvise() process does
>
> down_write(page->mapping->i_mmap_rwsem) -> lock_page(page)
>
>
>
> CPU0
>
>   kswapd
>   shrink_node()
>   shrink_active_list()
>   page_referenced()                   << lock page:PG_locked >>
>   rmap_walk_file()
>   down_read(mapping->i_mmap_rwsem)    << W-locked on CPU1 >>
>   rwsem_down_read_failed()
>   __rwsem_down_read_failed_common()
>   schedule()
>
> CPU1
>
>   vfs_fallocate()
>   shmem_fallocate()
>   unmap_mapping_range()
>   unmap_mapping_pages()               << down_write(mapping->i_mmap_rwsem) >>
>   zap_page_range_single()
>   unmap_page_range()
>   __split_huge_pmd()
>   __lock_page()                       << PG_locked on CPU0 >>
>   wait_on_page_bit_common()
>   io_schedule()
Very interesting, Sergey: many thanks for this report.
There is no doubt that kswapd is right in its lock ordering:
__split_huge_pmd() is in the wrong to be attempting lock_page().
That lock_page() was not there originally: it was added in 5.8's
c444eb564fb1 ("mm: thp: make the THP mapcount atomic against
__split_huge_pmd_locked()").
That explains why this deadlock was not seen years ago, which
surprised me at first: the case you show to reproduce it is good,
but I'd expect there to be more common ways for it to show up.
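
To make the inversion concrete, here is a minimal user-space sketch
(not kernel code) of the sort of lock-order bookkeeping that would flag
it. The names lock_class, lock_graph and record_chain are made up for
illustration, and the check only catches a direct A->B versus B->A
inversion between two lock classes, so it is far simpler than the
kernel's lockdep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock classes for illustration; not kernel identifiers. */
enum lock_class { PAGE_LOCK, I_MMAP_RWSEM, NR_CLASSES };

/* order[a][b] is true once some path has taken b while holding a. */
struct lock_graph {
	bool order[NR_CLASSES][NR_CLASSES];
};

/*
 * Record one acquisition chain (outermost lock first).
 * Returns true if the chain takes any pair of locks in the opposite
 * order to a chain recorded earlier, i.e. a potential ABBA deadlock.
 */
static bool record_chain(struct lock_graph *g,
			 const enum lock_class *chain, size_t n)
{
	bool inverted = false;

	/* Validate every entry before using it as an array index. */
	for (size_t i = 0; i < n; i++)
		assert(chain[i] < NR_CLASSES);

	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			/* Was chain[j] ever held while taking chain[i]? */
			if (g->order[chain[j]][chain[i]])
				inverted = true;
			g->order[chain[i]][chain[j]] = true;
		}
	}
	return inverted;
}
```

Feeding it the kswapd chain { PAGE_LOCK, I_MMAP_RWSEM } first records
that order without complaint; feeding it the madvise() chain
{ I_MMAP_RWSEM, PAGE_LOCK } afterwards makes record_chain() return
true, which is the reverse locking chain from the report.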
And your report is remarkably timely too: I have two other reasons
for looking at that change at the moment (I'm currently catching up
with recent discussion of page_count versus mapcount when deciding
COW page reuse).
I won't say more tonight, but should have more to add tomorrow.
Hugh