From: Jiaqi Yan <jiaqiyan@google.com>
To: jane.chu@oracle.com
Cc: David Hildenbrand <david@redhat.com>,
nao.horiguchi@gmail.com, linmiaohe@huawei.com,
sidhartha.kumar@oracle.com, muchun.song@linux.dev,
akpm@linux-foundation.org, osalvador@suse.de,
rientjes@google.com, jthoughton@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/2] How HugeTLB handle HWPoison page at truncation
Date: Mon, 20 Jan 2025 21:08:06 -0800 [thread overview]
Message-ID: <CACw3F52E4DvXtmpxU7WH_2vt+OeO7qbXZCWpJmgbH-mnCckVjA@mail.gmail.com> (raw)
In-Reply-To: <673b0353-ad8c-471b-8670-25d9f06d232b@oracle.com>
On Mon, Jan 20, 2025 at 9:01 PM <jane.chu@oracle.com> wrote:
>
>
> On 1/20/2025 5:21 PM, Jiaqi Yan wrote:
> > On Mon, Jan 20, 2025 at 2:59 AM David Hildenbrand <david@redhat.com> wrote:
> >> On 19.01.25 19:06, Jiaqi Yan wrote:
> >>> While I was working on userspace MFR via memfd [1], I spent some time
> >>> understanding what the current kernel does when a HugeTLB-backed memfd
> >>> is truncated. My expectation is that if a HWPoison HugeTLB folio is
> >>> mapped to userspace via the memfd, it will be unmapped right away but
> >>> still kept in the page cache [2]; however, when the memfd is truncated
> >>> to zero or after the memfd is closed, the kernel should dissolve the
> >>> HWPoison folio in the page cache and free only the clean raw pages to
> >>> the buddy allocator, excluding the poisoned raw page.
> >>>
> >>> So I wrote a hugetlb-mfr-base.c selftest and expect:
> >>> 0. say nr_hugepages is initially 64 as the system configuration.
> >>> 1. after MADV_HWPOISON, nr_hugepages should still be 64 as we keep even
> >>> the HWPoison huge folio in the page cache. free_hugepages should be
> >>> nr_hugepages minus whatever amount is in use.
> >>> 2. after truncating the memfd to zero, nr_hugepages should be reduced
> >>> to 63 as the kernel dissolved and freed the HWPoison huge folio.
> >>> free_hugepages should also be 63 (rough sketch of such a test below).
> >>>
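For context, here is a rough sketch of the kind of test I mean -- not the
actual hugetlb-mfr-base.c, just an illustration that assumes the default
2MB hugepage size, root privileges for MADV_HWPOISON, and the usual sysfs
counters; error handling omitted:

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  static long read_counter(const char *path)
  {
          long val = -1;
          FILE *f = fopen(path, "r");

          if (f) {
                  fscanf(f, "%ld", &val);
                  fclose(f);
          }
          return val;
  }

  int main(void)
  {
          const char *nr = "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages";
          const char *fr = "/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages";
          size_t len = 2UL << 20;          /* one 2MB huge page */
          int fd = memfd_create("hugetlb-mfr", MFD_HUGETLB);
          char *p;

          ftruncate(fd, len);
          p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          memset(p, 1, len);               /* fault the huge folio in */

          printf("baseline:       nr=%ld free=%ld\n", read_counter(nr), read_counter(fr));
          madvise(p, 4096, MADV_HWPOISON); /* step 1: poison one raw page */
          printf("after poison:   nr=%ld free=%ld\n", read_counter(nr), read_counter(fr));

          ftruncate(fd, 0);                /* step 2: truncate memfd to zero */
          printf("after truncate: nr=%ld free=%ld\n", read_counter(nr), read_counter(fr));
          return 0;
  }

With the behavior described below, the last line reports nr=64 free=63
instead of the 63/63 I expected.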
> >>> However, when testing at the head of mm-stable commit 2877a83e4a0a
> >>> ("mm/hugetlb: use folio->lru int demote_free_hugetlb_folios()"), I found
> >>> that although free_hugepages is reduced to 63, nr_hugepages is not
> >>> reduced and stays at 64.
> >>>
> >>> Is my expectation outdated? Or is this some kind of bug?
> >>>
> >>> I assume this is a bug and then dug a little bit more. It seems there
> >>> are two issues, or two things I don't really understand.
> >>>
> >>> 1. During try_memory_failure_hugetlb, we take a refcount on the target
> >>> in-use folio via get_hwpoison_hugetlb_folio. However, by the end of
> >>> try_memory_failure_hugetlb, this refcount is never put. I can make
> >>> sense of this given we keep the in-use huge folio in the page cache.
> >> Isn't the general rule that hwpoisoned folios have a raised refcount
> >> such that they won't get freed + reused? At least that's how the buddy
> >> deals with them, and I suspect also hugetlb?
> > Thanks, David.
> >
> > I see, so it is expected that the _entire_ huge folio will always have
> > at least a refcount of 1, even when the folio can become "free".
> >
> > For a *free* huge folio, try_memory_failure_hugetlb dissolves it and
> > frees the clean pages (a lot of them) to the buddy allocator. This made
> > me think the same thing would happen for an *in-use* huge folio
> > _eventually_ (i.e. somehow the refcount taken due to HWPoison could be
> > put). I feel this is a little unfortunate for the clean pages, but if
> > that is how it is, fair enough; it is not a bug.
>
> Agreed with David. For *in use* hugetlb pages, including unused shmget
> pages, hugetlb shouldn't dissolve the page, not until an explicit
> freeing action is taken, like IPC_RMID or echo 0 > nr_hugepages.
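For the shmget case, the explicit free Jane refers to would look roughly
like this (illustration only; the SHM_HUGETLB fallback define is there in
case the libc headers don't expose it; error handling omitted):

  #include <string.h>
  #include <sys/ipc.h>
  #include <sys/shm.h>

  #ifndef SHM_HUGETLB
  #define SHM_HUGETLB 04000       /* from <linux/shm.h> */
  #endif

  int main(void)
  {
          size_t len = 2UL << 20; /* one 2MB huge page */

          /* The segment pins a huge page even when nothing is attached. */
          int id = shmget(IPC_PRIVATE, len, IPC_CREAT | SHM_HUGETLB | 0600);
          char *p = shmat(id, NULL, 0);

          memset(p, 1, len);
          shmdt(p);

          /*
           * Only this explicit action (plus the last detach) lets the
           * huge page go back to the hugetlb free pool.
           */
          shmctl(id, IPC_RMID, NULL);
          return 0;
  }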
To clarify myself, I am not asking memory-failure.c to dissolve the
hugepage while it is in use, but rather when it becomes free (truncated,
or the process exited).
>
> -jane
>
> >
> >>> [ 1069.320976] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2780000
> >>> [ 1069.320978] head: order:18 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> >>> [ 1069.320980] flags: 0x400000000100044(referenced|head|hwpoison|node=0|zone=1)
> >>> [ 1069.320982] page_type: f4(hugetlb)
> >>> [ 1069.320984] raw: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
> >>> [ 1069.320985] raw: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
> >>> [ 1069.320987] head: 0400000000100044 ffffffff8760bbc8 ffffffff8760bbc8 0000000000000000
> >>> [ 1069.320988] head: 0000000000000000 0000000000000000 00000001f4000000 0000000000000000
> >>> [ 1069.320990] head: 0400000000000012 ffffdd53de000001 ffffffffffffffff 0000000000000000
> >>> [ 1069.320991] head: 0000000000040000 0000000000000000 00000000ffffffff 0000000000000000
> >>> [ 1069.320992] page dumped because: track hwpoison folio's ref
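(Side note for anyone reproducing this: with the head pfn from the dump
above, the hwpoison flag can also be double-checked from userspace via
/proc/kpageflags -- needs root; KPF_HWPOISON is bit 19 and KPF_HUGE is
bit 17 per include/uapi/linux/kernel-page-flags.h. A minimal sketch:

  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
          /* usage: ./kpageflags 0x2780000 (the head pfn from the dump) */
          unsigned long pfn = strtoul(argv[1], NULL, 0);
          uint64_t flags = 0;
          int fd = open("/proc/kpageflags", O_RDONLY);

          /* /proc/kpageflags is an array of u64 flag words indexed by pfn */
          pread(fd, &flags, sizeof(flags), pfn * sizeof(flags));
          printf("pfn 0x%lx flags 0x%llx hwpoison=%d huge=%d\n",
                 pfn, (unsigned long long)flags,
                 !!(flags & (1ULL << 19)),  /* KPF_HWPOISON */
                 !!(flags & (1ULL << 17))); /* KPF_HUGE */
          close(fd);
          return 0;
  }
)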
> >>>
> >>> 2. Even if the folio's refcount does drop to zero and we get into
> >>> free_huge_folio, it is not clear to me which part of free_huge_folio
> >>> handles the case where the folio is HWPoison. In my test, what I
> >>> observed is that eventually the folio is enqueue_hugetlb_folio()-ed.
> >> How would we get a refcount of 0 if we assume the raised refcount on a
> >> hwpoisoned hugetlb folio?
> >>
> >> I'm probably missing something: are you saying that you can trigger a
> >> hwpoisoned hugetlb folio to get reallocated again, in upstream code?
> > No, I think it is just my misunderstanding. From what you said, the
> > expectation for a HWPoison hugetlb folio is just that it won't get
> > reallocated again, which is true.
> >
> > My (wrong) expectation is that, in addition to the "won't be
> > reallocated again" part, some (large) portion of the huge folio will
> > be freed to the buddy allocator. On the other hand, is this something
> > worth having / improving? (1G - some_single_digit * 4KB) seems
> > valuable to the system, even though it comes back as 4K pages (a 1G
> > folio is 262144 4K pages, so all but a handful could return to buddy).
> > #1 and #2 above are then what would need to be done if the improvement
> > is worth chasing.
> >
> >>
> >> --
> >> Cheers,
> >>
> >> David / dhildenb
> >>