From: Mike Kravetz <mike.kravetz@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
akpm@linux-foundation.org, songmuchun@bytedance.com,
almasrymina@google.com, linmiaohe@huawei.com,
minhquangbui99@gmail.com, aneesh.kumar@linux.ibm.com
Subject: Re: [PATCH v2 5/9] mm/hugetlb: convert isolate_or_dissolve_huge_page to folios
Date: Mon, 12 Jun 2023 16:34:51 -0700
Message-ID: <20230612233451.GF3704@monkey>
In-Reply-To: <ZIdYzZGSUzYumrCT@casper.infradead.org>
On 06/12/23 18:41, Matthew Wilcox wrote:
> On Tue, Nov 01, 2022 at 03:30:55PM -0700, Sidhartha Kumar wrote:
> > +++ b/mm/hugetlb.c
> > @@ -2815,7 +2815,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> > int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
> > {
> > struct hstate *h;
> > - struct page *head;
> > + struct folio *folio = page_folio(page);
>
> Is this safe? I was reviewing a different patch today, and I spotted
> this. With THP, we can relatively easily hit this case:
>
> struct page points to a page with pfn 0x40305, in a folio of order 2.
> We call page_folio() on it and the resulting pointer is for the folio
> with pfn 0x40304.
> If we don't have our own refcount (or some other protection ...) against
> freeing, the folio can now be freed and reallocated. Say it's now part
> of an order-3 folio.
> Our 'folio' pointer is now actually a pointer to a tail page, and we
> have various assertions that a folio pointer doesn't point to a tail
> page, so they trigger.
>
> It seems to me that this ...
>
> 	/*
> 	 * The page might have been dissolved from under our feet, so make sure
> 	 * to carefully check the state under the lock.
> 	 * Return success when racing as if we dissolved the page ourselves.
> 	 */
> 	spin_lock_irq(&hugetlb_lock);
> 	if (folio_test_hugetlb(folio)) {
> 		h = folio_hstate(folio);
> 	} else {
> 		spin_unlock_irq(&hugetlb_lock);
> 		return 0;
> 	}
>
> implies that we don't have our own reference on the folio, so we might
> find a situation where the folio pointer we have is no longer a folio
> pointer.

Your analysis is correct.

This is not safe because we hold no locks or references. The folio
pointer obtained via page_folio(page) may not be valid when calling
folio_test_hugetlb(folio) and later.

My bad for the Reviewed-by: :(

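To make the pfn arithmetic in the example above concrete, here is a minimal
user-space sketch. It assumes a folio of order N is naturally aligned to
2^N pages, so its head pfn is the page's pfn with the low N bits cleared.
head_pfn() and is_tail_pfn() are hypothetical helpers for illustration only,
not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: head pfn of the naturally aligned order-'order'
 * block that contains 'pfn'. Assumes blocks are aligned to their size.
 */
static uint64_t head_pfn(uint64_t pfn, unsigned int order)
{
	assert(order < 64);
	return pfn & ~((UINT64_C(1) << order) - 1);
}

/* Nonzero if 'pfn' is not the first page of its order-'order' block. */
static int is_tail_pfn(uint64_t pfn, unsigned int order)
{
	return head_pfn(pfn, order) != pfn;
}
```

With these helpers, head_pfn(0x40305, 2) is 0x40304, which is the folio
page_folio() would return. If that memory is later reused as part of an
order-3 folio, head_pfn(0x40304, 3) is 0x40300, so the cached pointer for
pfn 0x40304 now refers to a tail page.
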
>
> Maybe the page_folio() call should be moved inside the hugetlb_lock
> protection? Is that enough? I don't know enough about how hugetlb
> pages are split, freed & allocated to know what's going on.
>
> But then we _drop_ the lock, and keep referring to ...
>
> > @@ -2841,10 +2840,10 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
> > if (hstate_is_gigantic(h))
> > return -ENOMEM;
> >
> > - if (page_count(head) && !isolate_hugetlb(head, list))
> > + if (folio_ref_count(folio) && !isolate_hugetlb(&folio->page, list))
> > ret = 0;
> > - else if (!page_count(head))
> > - ret = alloc_and_dissolve_huge_page(h, head, list);
> > + else if (!folio_ref_count(folio))
> > + ret = alloc_and_dissolve_huge_page(h, &folio->page, list);

The above was OK when using struct page instead of folio. The 'racy'
part was getting the ref count on the head page. That was OK because it
was only a check to see if we should TRY to isolate or dissolve. The
code to actually isolate or dissolve would take the appropriate locks.

I'm afraid the code is now making even more use of a potentially invalid
folio. Here is how the above now looks in v6.3:

	spin_unlock_irq(&hugetlb_lock);

	/*
	 * Fence off gigantic pages as there is a cyclic dependency between
	 * alloc_contig_range and them. Return -ENOMEM as this has the effect
	 * of bailing out right away without further retrying.
	 */
	if (hstate_is_gigantic(h))
		return -ENOMEM;

	if (folio_ref_count(folio) && isolate_hugetlb(folio, list))
		ret = 0;
	else if (!folio_ref_count(folio))
		ret = alloc_and_dissolve_hugetlb_folio(h, folio, list);

It looks like that potentially invalid folio is being passed to other
routines. The previous code would take the lock and revalidate that the
struct page was still a hugetlb page. We cannot do the same with a folio.

--
Mike Kravetz