From: Mike Kravetz <mike.kravetz@oracle.com>
To: Tarun Sahu <tsahu@linux.ibm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
muchun.song@linux.dev, aneesh.kumar@linux.ibm.com,
willy@infradead.org, sidhartha.kumar@oracle.com,
gerald.schaefer@linux.ibm.com, linux-kernel@vger.kernel.org,
jaypatel@linux.ibm.com
Subject: Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
Date: Thu, 8 Jun 2023 16:52:34 -0700 [thread overview]
Message-ID: <20230608235234.GC88798@monkey> (raw)
In-Reply-To: <873532jmot.fsf@linux.ibm.com>
On 06/08/23 15:33, Tarun Sahu wrote:
> Hi Mike,
>
> Please find my comments inline.
>
> Mike Kravetz <mike.kravetz@oracle.com> writes:
>
> > On 06/06/23 10:32, Tarun Sahu wrote:
> >>
> >> Hi Mike,
> >>
> >> Thanks for your inputs.
> >> I wanted to know if you find it okay, Can I send it again adding your Reviewed-by?
> >
> > Hi Tarun,
> >
> > Just a few more comments/questions.
> >
> > On 05/15/23 22:38, Tarun Sahu wrote:
> >> folio_set_order(folio, 0) is used in kernel at two places
> >> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> >> Currently, It is called to clear out the folio->_folio_nr_pages and
> >> folio->_folio_order.
> >>
> >> For __destroy_compound_gigantic_folio:
> >> In past, folio_set_order(folio, 0) was needed because page->mapping used
> >> to overlap with _folio_nr_pages and _folio_order. So if these fields were
> >> left uncleared during freeing gigantic hugepages, they were causing
> >> "BUG: bad page state" due to non-zero page->mapping. Now, After
> >> Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> >> CMA") page->mapping has explicitly been cleared out for tail pages. Also,
> >> _folio_order and _folio_nr_pages no longer overlaps with page->mapping.
> >
> > I believe the same logic/reasoning as above also applies to
> > __prep_compound_gigantic_folio.
> > Why?
> > In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> > in the case of error. If __prep_compound_gigantic_folio fails, the caller
> > will then call free_gigantic_folio() on the "gigantic page". However, it is
> > not really a gigantic at this point in time, and we are simply calling
> > cma_release() or free_contig_range().
> > The end result is that I do not believe the existing call to
> > folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> > required. ???
> No, there is a difference. IIUC, __destroy_compound_gigantic_folio
> explicitly reset page->mapping for each page of compound page which
> makes sure, even if in future some fields of struct page/folio overlaps
> with page->mapping, it won't cause `BUG: bad page state` error. But If we
> just remove folio_set_order(folio, 0) from __prep_compound_gigantic_folio
> without moving folio_set_order(folio, order), this will cause extra
> maintenance overhead to track if page->_folio_order overlaps with
> page->mapping everytime struct page fields are changed. As in case of
> overlapping page->mapping will be non-zero. IMHO, To avoid it,
> moving the folio_set_order(folio, order) after all error checks are
> done on tail pages. So, _folio_order is either set on success and not
> set in case of error. (which is the original proposal). But for
> folio_set_head, I agree the way you suggested below.
>
> WDYT?
Right. It is more 'future proof' to only set folio order on success as
done in your original patch.
> >
> > If my reasoning above is correct, then we could just have one patch to
> > remove the folio_set_order(folio, 0) calls and remove special casing for
> > order 0 in folio_set_order.
> >
> > However, I still believe your restructuring of __prep_compound_gigantic_folio,
> > is of value. I do not believe there is an issue as questioned by Matthew. My
> > reasoning has been stated previously. We could make changes like the following
> > to retain the same order of operations in __prep_compound_gigantic_folio and
> > totally avoid Matthew's question. Totally untested.
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ea24718db4af..a54fee663cb1 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > int nr_pages = 1 << order;
> > struct page *p;
> >
> > - __folio_clear_reserved(folio);
> > - __folio_set_head(folio);
> > /* we rely on prep_new_hugetlb_folio to set the destructor */
> > - folio_set_order(folio, order);
> > +
> > for (i = 0; i < nr_pages; i++) {
> > p = folio_page(folio, i);
> >
> > @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > * on the head page when they need know if put_page() is needed
> > * after get_user_pages().
> > */
> > - if (i != 0) /* head page cleared above */
> > + if (i != 0) /* head page cleared below */
> > __ClearPageReserved(p);
> > /*
> > * Subtle and very unlikely
> > @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > } else {
> > VM_BUG_ON_PAGE(page_count(p), p);
> > }
> > - if (i != 0)
> > +
> > + if (i == 0) {
> > + __folio_clear_reserved(folio);
> > + __folio_set_head(folio);
> > + folio_set_order(folio, order);
> With folio_set_head, I agree to this, But does not feel good with
> folio_set_order as per my above reasoning. WDYT?
Agree with your reasoning. We should just move __folio_set_head and
folio_set_order after the loop as you originally suggested.
>
> > + } else {
> > set_compound_head(p, &folio->page);
> > + }
> > }
> > atomic_set(&folio->_entire_mapcount, -1);
> > atomic_set(&folio->_nr_pages_mapped, 0);
> > @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > p = folio_page(folio, j);
> > __ClearPageReserved(p);
> > }
> > - folio_set_order(folio, 0);
> > __folio_clear_head(folio);
> > return false;
> > }
> >
> >
> >>
> >> struct page {
> >> ...
> >> struct address_space * mapping; /* 24 8 */
> >> ...
> >> }
> >>
> >> struct folio {
> >> ...
> >> union {
> >> struct {
> >> long unsigned int _flags_1; /* 64 8 */
> >> long unsigned int _head_1; /* 72 8 */
> >> unsigned char _folio_dtor; /* 80 1 */
> >> unsigned char _folio_order; /* 81 1 */
> >>
> >> /* XXX 2 bytes hole, try to pack */
> >>
> >> atomic_t _entire_mapcount; /* 84 4 */
> >> atomic_t _nr_pages_mapped; /* 88 4 */
> >> atomic_t _pincount; /* 92 4 */
> >> unsigned int _folio_nr_pages; /* 96 4 */
> >> }; /* 64 40 */
> >> struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */
> >> }
> >> ...
> >> }
> >
> > I do not think the copy of page/folio definitions adds much value to the
> > commit message.
> Yeah, Will remove it.
> >
I think we are finally on the same page. I am good with this v2 patch.
Only change is to update commit message to remove the definitions.
--
Mike Kravetz
prev parent reply other threads:[~2023-06-08 23:52 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-15 17:08 Tarun Sahu
2023-05-15 17:15 ` Tarun Sahu
2023-05-15 17:16 ` Matthew Wilcox
2023-05-15 17:45 ` Mike Kravetz
2023-05-16 13:09 ` Tarun Sahu
2023-05-22 5:49 ` Tarun Sahu
[not found] ` <20230606155853.GA4150@monkey>
2023-06-08 10:03 ` Tarun Sahu
2023-06-08 23:52 ` Mike Kravetz [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230608235234.GC88798@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=gerald.schaefer@linux.ibm.com \
--cc=jaypatel@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=muchun.song@linux.dev \
--cc=sidhartha.kumar@oracle.com \
--cc=tsahu@linux.ibm.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox