Re: [RFC PATCH v1 0/3] Userspace MFR Policy via memfd

linux-mm.kvack.org archive mirror

From: Harry Yoo <harry.yoo@oracle.com>
To: Jiaqi Yan <jiaqiyan@google.com>
Cc: "Miaohe Lin" <linmiaohe@huawei.com>,
	"William Roche" <william.roche@oracle.com>,
	"Ackerley Tng" <ackerleytng@google.com>,
	jgg@nvidia.com, akpm@linux-foundation.org, ankita@nvidia.com,
	dave.hansen@linux.intel.com, david@redhat.com,
	duenwen@google.com, jane.chu@oracle.com, jthoughton@google.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, muchun.song@linux.dev,
	nao.horiguchi@gmail.com, osalvador@suse.de, peterx@redhat.com,
	rientjes@google.com, sidhartha.kumar@oracle.com,
	tony.luck@intel.com, wangkefeng.wang@huawei.com,
	willy@infradead.org, vbabka@suse.cz, surenb@google.com,
	mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org,
	ziy@nvidia.com
Subject: Re: [RFC PATCH v1 0/3] Userspace MFR Policy via memfd
Date: Thu, 6 Nov 2025 16:53:30 +0900
Message-ID: <aQxSSjyPsI0MT8mp@harry>
In-Reply-To: <CACw3F503FG01yQyA53hHAo7q0yE3qQtMuT9kOjNHpp8Q9qHKPQ@mail.gmail.com>

On Mon, Nov 03, 2025 at 08:57:08AM -0800, Jiaqi Yan wrote:
> On Mon, Nov 3, 2025 at 12:53 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > On Mon, Nov 03, 2025 at 05:16:33PM +0900, Harry Yoo wrote:
> > > On Thu, Oct 30, 2025 at 10:28:48AM -0700, Jiaqi Yan wrote:
> > > > On Thu, Oct 30, 2025 at 4:51 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> > > > > On 2025/10/28 15:00, Harry Yoo wrote:
> > > > > > On Mon, Oct 27, 2025 at 09:17:31PM -0700, Jiaqi Yan wrote:
> > > > > >> On Wed, Oct 22, 2025 at 6:09 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > > > >>> On Mon, Oct 13, 2025 at 03:14:32PM -0700, Jiaqi Yan wrote:
> > > > > >>>> On Fri, Sep 19, 2025 at 8:58 AM William Roche <william.roche@oracle.com> wrote:
> > > > > >>> But even after fixing that we need to fix the race condition.
> > > > > >>
> > > > > >> What exactly is the race condition you are referring to?
> > > > > >
> > > > > > When you free a high-order page, the buddy allocator does not check
> > > > > > PageHWPoison() on the page and its subpages. It checks PageHWPoison()
> > > > > > only when you free a base (order-0) page, see free_pages_prepare().
> > > > >
> > > > > I think we could check PageHWPoison() on the subpages, as free_page_is_bad()
> > > > > does. If any subpage has the HWPoison flag set, simply drop the folio. We could even
> > > >
> > > > Agreed. As a start, I could try, for example, letting
> > > > free_pages_prepare() scan HWPoison-ed subpages if the page being freed
> > > > is high order. In the optimal case, HugeTLB moves the PageHWPoison flag
> > > > from the head page to the raw error pages.
> > >
> > > [+Cc page allocator folks]
> > >
> > > AFAICT enabling page sanity checks in the page alloc/free paths would go
> > > against past efforts to reduce sanity check overhead.
> > >
> > > [1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/ 
> > > [2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/ 
> > > [3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz 
> > >
> > > I'd recommend checking the hwpoison flag before freeing the folio to the
> > > buddy when we know a memory error has occurred (I guess that's also what
> > > Miaohe suggested).
> > >
> > > > > do better -- split the folio and let the healthy subpages join the buddy while rejecting
> > > > > the hwpoisoned one.
> > > > >
> > > > > >
> > > > > > AFAICT there is nothing that prevents the poisoned page from being
> > > > > > allocated back to users, because the buddy doesn't check PageHWPoison()
> > > > > > on allocation either (by default).
> > > > > >
> > > > > > So rather than freeing the high-order page as-is in
> > > > > > dissolve_free_hugetlb_folio(), I think we have to split it to base pages
> > > > > > and then free them one by one.
> > > > >
> > > > > It might not be worth doing that, as this would significantly increase the overhead
> > > > > of the function while memory failure events are really rare.
> > > >
> > > > IIUC, Harry's idea is to do the split in dissolve_free_hugetlb_folio
> > > > only if folio is HWPoison-ed, similar to what Miaohe suggested
> > > > earlier.
> > >
> > > Yes, and if we do the check before moving the HWPoison flag to the raw pages,
> > > it'll be just a single folio_test_hwpoison() call.
> > >
> > > > BTW, I believe this race condition already exists today when
> > > > memory_failure handles HWPoison-ed free hugetlb page; it is not
> > > > something introduced via this patchset. I will fix or improve this in
> > > > a separate patchset.
> > >
> > > That makes sense.
> >
> > Wait, without this patchset, do we even free the hugetlb folio when
> > one of its subpages is hwpoisoned? I don't think we do, but I'm not an expert on MFR...
> 
> Based on my reading of try_memory_failure_hugetlb(), me_huge_page(), and
> __page_handle_poison(), I think the mainline kernel frees a dissolved hugetlb
> folio to the buddy allocator in two cases:
> 1. it was a free hugetlb page at the moment of try_memory_failure_hugetlb

Right.

> 2. it was an anonymous hugetlb page

Right.

Thanks. I think you're right that poisoned hugetlb folios can be freed
to the buddy even without this series (and that the poisoned pages can then
be allocated back to users instead of being isolated, because the buddy
doesn't check PageHWPoison() on high-order free or on alloc).
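To illustrate the free/alloc behavior discussed above, here is a minimal userspace model (not kernel code). It assumes, as stated earlier in this thread, that free_pages_prepare() keeps a page out of the buddy only when the page is order-0 and itself has PageHWPoison() set, and that allocation does not check the flag by default. All names (struct model_page, model_free_pages(), model_alloc_page()) are hypothetical stand-ins, and the free pool is a flat array rather than real buddy free lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the buddy free/alloc paths; not kernel code. */
#define NR_MODEL_PAGES 8

struct model_page {
	bool hwpoison;   /* stands in for PageHWPoison() */
	bool free;       /* page currently sits in the modeled free pool */
};

static struct model_page model_pages[NR_MODEL_PAGES];

/*
 * Models the check described in the thread: only an order-0 page that is
 * itself marked hwpoison is kept out of the pool. For order > 0, neither
 * the head nor any subpage is inspected.
 */
static bool model_free_pages_prepare(size_t pfn, unsigned int order)
{
	if (model_pages[pfn].hwpoison && order == 0)
		return false;
	return true;
}

static void model_free_pages(size_t pfn, unsigned int order)
{
	size_t nr = (size_t)1 << order;

	assert(pfn + nr <= NR_MODEL_PAGES);
	if (!model_free_pages_prepare(pfn, order))
		return;            /* poisoned order-0 page kept out of the pool */
	for (size_t i = 0; i < nr; i++)
		model_pages[pfn + i].free = true;
}

/* Allocation does not look at the hwpoison flag either (by default). */
static long model_alloc_page(void)
{
	for (size_t i = 0; i < NR_MODEL_PAGES; i++) {
		if (model_pages[i].free) {
			model_pages[i].free = false;
			return (long)i;
		}
	}
	return -1;
}
```

In this model, freeing an order-2 block whose second page is poisoned puts all four pages into the pool, so the second allocation hands out the poisoned page; the same poisoned page freed at order 0 would have been kept out.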

So the plan is to post RFC v2 of this series and the race condition fix
as a separate series, right? (that sounds good to me!)

I still think it'd be best to split the hugetlb folio into order-0 pages and
free them individually when we know the hugetlb folio is poisoned, because:

- We don't have to implement a special version of __free_pages() that
  knows how to free a high-order page when one or more of its subpages
  are poisoned.

- We can avoid re-enabling page sanity checks on every alloc/free (and the
  overhead that comes with them).
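A rough sketch of the split-and-free idea, again as a userspace model with hypothetical names (struct sketch_folio, sketch_free_dissolved()), not the actual dissolve_free_hugetlb_folio() change. It assumes the folio-level hwpoison flag is still set when the free path runs (i.e. the check happens before the flag is moved to the raw pages), so the common case costs a single flag test; only a poisoned folio is split and freed page by page through an order-0 path that skips poisoned pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of split-and-free; not kernel code. */
#define NR_SKETCH_PAGES 8

struct sketch_page {
	bool hwpoison;   /* raw-page poison, like PageHWPoison() on a subpage */
	bool free;       /* page is in the modeled free pool */
};

struct sketch_folio {
	size_t pfn;           /* first page of the high-order block */
	unsigned int order;
	bool hwpoison;        /* folio-level flag, like folio_test_hwpoison() */
};

static struct sketch_page sketch_pages[NR_SKETCH_PAGES];

/* Order-0 free: a poisoned page never reaches the free pool. */
static bool sketch_free_one(size_t pfn)
{
	if (sketch_pages[pfn].hwpoison)
		return false;        /* keep the poisoned page isolated */
	sketch_pages[pfn].free = true;
	return true;
}

/*
 * Free a dissolved folio. Common case: one flag test, then the block is
 * freed as a whole. Rare case (memory error recorded): split into order-0
 * pages so the per-page check applies to every subpage.
 * Returns the number of pages that reached the free pool.
 */
static size_t sketch_free_dissolved(const struct sketch_folio *folio)
{
	size_t nr = (size_t)1 << folio->order;
	size_t freed = 0;

	assert(folio->pfn + nr <= NR_SKETCH_PAGES);

	if (!folio->hwpoison) {
		for (size_t i = 0; i < nr; i++)
			sketch_pages[folio->pfn + i].free = true;
		return nr;
	}

	for (size_t i = 0; i < nr; i++)
		freed += sketch_free_one(folio->pfn + i);
	return freed;
}
```

The per-subpage work is paid only after a memory error has been recorded, which keeps the normal high-order free path free of extra sanity checks.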

-- 
Cheers,
Harry / Hyeonggon

Thread overview: 23+ messages
2025-01-18 23:15 Jiaqi Yan
2025-01-18 23:15 ` [RFC PATCH v1 1/3] mm: memfd/hugetlb: introduce userspace memory failure recovery policy Jiaqi Yan
2025-01-18 23:15 ` [RFC PATCH v1 2/3] selftests/mm: test userspace MFR for HugeTLB 1G hugepage Jiaqi Yan
2025-01-18 23:15 ` [RFC PATCH v1 3/3] Documentation: add userspace MF recovery policy via memfd Jiaqi Yan
2025-01-20 17:26 ` [RFC PATCH v1 0/3] Userspace MFR Policy " Jason Gunthorpe
2025-01-21 21:45   ` Jiaqi Yan
2025-01-22 16:41 ` Zi Yan
2025-09-19 15:58 ` William Roche
2025-10-13 22:14   ` Jiaqi Yan
2025-10-14 20:57     ` William Roche
2025-10-28  4:17       ` Jiaqi Yan
2025-10-22 13:09     ` Harry Yoo
2025-10-28  4:17       ` Jiaqi Yan
2025-10-28  7:00         ` Harry Yoo
2025-10-30 11:51           ` Miaohe Lin
2025-10-30 17:28             ` Jiaqi Yan
2025-10-30 21:28               ` Jiaqi Yan
2025-11-03  8:16               ` Harry Yoo
2025-11-03  8:53                 ` Harry Yoo
2025-11-03 16:57                   ` Jiaqi Yan
2025-11-04  3:44                     ` Miaohe Lin
2025-11-06  7:53                     ` Harry Yoo [this message]
2025-11-12  1:28                       ` Jiaqi Yan
