linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 "Kirill A. Shutemov" <kirill@shutemov.name>,
	 Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	 Vlastimil Babka <vbabka@suse.cz>, Peter Xu <peterx@redhat.com>,
	 Yang Shi <shy828301@gmail.com>,
	John Hubbard <jhubbard@nvidia.com>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	 Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	 Muchun Song <songmuchun@bytedance.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	 Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	 Mina Almasry <almasrymina@google.com>,
	 James Houghton <jthoughton@google.com>,
	Zach O'Keefe <zokeefe@google.com>,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 4/3] mm,thp,rmap: handle the normal !PageCompound case first
Date: Wed, 9 Nov 2022 20:21:47 -0800 (PST)	[thread overview]
Message-ID: <22eebc5b-6fcb-5561-634-f984f82f533@google.com> (raw)
In-Reply-To: <CAHk-=wiJfLx9dVJcQOhQsAseoPmLhpVvHgf4GYu6frfhmBAuMg@mail.gmail.com>

On Wed, 9 Nov 2022, Linus Torvalds wrote:
> On Wed, Nov 9, 2022 at 6:18 PM Hugh Dickins <hughd@google.com> wrote:
> >
> > Commit ("mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts")
> > propagated the "if (compound) {lock} else if (PageCompound) {lock} else
> > {atomic}" pattern throughout; but Linus hated the way that gives primacy
> > to the uncommon case: switch to "if (!PageCompound) {atomic} else if
> > (compound) {lock} else {lock}" throughout.
> 
> Side note, that 'compound' naming is also on my list of "I'm _really_
> not a fan".
> 
> We actually have a completely different meaning for PageCompound()
> than the meaning of 'compound' in the rmap functions, and those
> functions literally mix those meanings if  not on the same line, then
> at least right next to each other.
> 
> What 'rmap' actually means with 'compound' in the add/remove functions
> is basically 'not PAGE_SIZE' as far as I can tell.
> 
> So if I get the energy to do the rmap counts,

See other mail, I got some zest to try your idea on the counts.

> I will *also* be renaming that horrible thing completely.

But I don't suppose I'll spend time on that part, I don't really
see the problem.  "compound" might be better named, say, "large_rmap"
(I'd have said "pmd_mapped" or "pmd_rmap", but you raise the spectre
of hugetlb below, and powerpc as usual does hugetlb very differently),
but compound seems okay to me, and consistent with usage elsewhere.

> 
> In fact, I'd be inclined to just pass in the actual page size
> (possibly a page shift order), which some of the functions want
> anyway, and which would be a lot clearer than the horrid "compound"
> name.

But yes, I think that would be an improvement; yet you might find a
reason why so often we don't do that - there's often an awkward
BUILD_BUG when you build without CONFIG_TRANSPARENT_HUGEPAGE=y.
And much as I've often wanted to remove it, it does give some
assurance that we're not bloating THP-disabled configs.  Maybe the
steady growth of compound_nr() usage gets around that better now
(or will you be renaming that too ?-)

> 
> One reason I find the "compound" name so horrifying is that it is used
> very much for HUGETLB pages, which I don't think end up ever being
> marked as PageCompund(), and which are - for various historical

hugetlb pages are always PageCompound.  Shoot me if they're not.

> reasons - doubly confusing because they use a "pte_t" to describe
> themselves, even when they are actually using a "pmd_t" or a "pud_t"
> to actually map the page.

Yes, I wish we would undo that hugetlb deception: it would probably
be much more (un)doable, were it not for powerpc (and ia64 iirc).

> 
> So a HUGETLB entry really is (for historical reasons) designed to look
> like a single last-level pte_t entry, but from an rmap perspective it
> is explicitly *not* made to look like that at all, completely
> violating the HUGETLB design.
> 
> So the rmap code has several really REALLY confusing cases:
> 
>  - the common one: just a page mapped at a *real* pte_t level.
> 
>    To make that more confusing, it can actually be a single-page
> _part_ of a compound page in the PageCompound() sense, but the rmap
> 'compound' argument will *not* be set, because from a *mmap*
> standpoint it's mapped as a single page.

Yes.  Most pages are unambiguous, but when a PageHead page arrives
at page_add/remove_rmap(), we have to do different things, according
to whether it's mapped with a large or a small entry.

But I'm going away at this point, you write much faster than I can
read and understand and respond.  I'm responding in part to "fix"
my stupid typo on Johannes's address.

Hugh

> 
>    This is generally recognized by the rmap code by 'compound' being zero.
> 
>  - a HUGETLB mapping, which uses '->pte' in the page walking (please
> kill me now) and is *not* using a PageCompund() page, but 'compound'
> is still set, because from a *mapping* standpoint it's not a final
> pte_t entry (buit from a MM design standpoint it _was_ supposed to be
> designed like a single page).
> 
>    This is randomly recognized by looking at the vma flags (using
> "is_vm_hugetlb_page(vma)") or just inherent in the code itself (ie the
> 'hugetlb()' functions are only called by code that has tested this
> situation one way or another)
> 
>    To make things more confusing, some places use PageHeadHuge()
> instead (but the folio version of said test is called
> "folio_test_hugetlb()", just so that nobody could possibly ever accuse
> the HUGETLB code to have consistency).
> 
>     You'd think that PageHeadHuge() is one of the functions that
> checks the page flag bits. You'd be wrong. It's very very special.
> 
>  - an *actual* PageCompound() page, mapped as such as a THP page (ie
> mapped by a pmd, not a pte).
> 
>    This may be the same page size as a HUGETLB mapping (and often is),
> but it's a completely different case in every single way.
> 
>    But like the HUGETLB case, the 'compound' argument will be set, and
> now it's actually a compound page (but hey, so could the single page
> mapping case be too).
> 
>    Unlike the HUGETLB case, the page walker does not use ->pte for
> this, and some of the walkers will very much use that, ie
> folio_referenced_one() will do
> 
>                 if (pvmw.pte) {
> 
>    to distinguish the "natively mapped PageCompound()" case (no pte)
> from the "map a single page" or from the HUGETLB case (yes pte).
> 
> There may be more cases than those three, and I may have confused
> myself and gotten some of the details wrong, but I did want to write
> the above diatribe out to
> 
>  (a) see if somebody corrects me for any of the cases I enumerated
> 
>  (b) see if somebody can point to yet another odd case
> 
>  (c) see if somebody has suggestions for better and more obvious names
> for that 'compound' argument in the rmap code
> 
> I do wish the HUGETLB case didn't use 'pte' for its notion of how
> HUGETLB entries are mapped, but that's literally how HUGETLB is
> designed: it started life as a larger last-level pte.
> 
> It just means that it ends up being very confusing when from a page
> table walk perspective, you're walking a pud or a pmd entry, and then
> you see a 'pte_t' instead.
> 
> An example of that confusion is visible in try_to_unmap_one(), which
> can be called with a HUGEPTE page (well, folio), and that does
> 
>         while (page_vma_mapped_walk(&pvmw)) {
> 
> to find the rmap entries, but it can't do that
> 
>                 if (pvmw.pte) {
> 
> test to see what mapping it's walking (since both regular pages and
> HUGETLB pages use that), so then it just keeps testing what kind of
> page that was passed in.
> 
> Which really smells very odd to me, but apparently it works,
> presumably because unlike THP there can be no splitting.  But it's a
> case where the whole "was it a regular page or a HUGETLB page" is
> really really confusing/
> 
> And mm/hugetlb.c (and places like mm/pagewalk.c too) has a fair number
> of random casts as a result of this "it's not really a pte_t, but it's
> designed to look like one" thing.
> 
> This all really is understandable from a historical context, and from
> HUGETLB really being just kind of designed to be a bigger page (not a
> collection of small pages that can be mapped as a bigger entity), but
> it really does mean that 'rmap' calling those HUGETLB pages 'compound'
> is conceptually very very wrong.
> 
> Oh well. That whole HUGETLB model isn't getting fixed, but I think the
> naming confusion about 'compound' *can* be fixed fairly easily, and we
> could try to at least avoid having 'compound' and 'PageCompound()'
> mean totally different things in the same function.
> 
> I'm not going to do any of this cleanup now, but I wanted to at least
> voice my concerns. Maybe I'll get around to actually trying to clarify
> the code later.
> 
> Because this was all stuff that was *very* confusing when I did the
> rmap simplification in that (now completely rewritten to explicitly
> _not_ touch rmap at all) original version of the delayed rmap patch
> series.
> 
>                  Linus


  reply	other threads:[~2022-11-10  4:22 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-03  1:44 [PATCH 0/3] mm,huge,rmap: unify and speed up compound mapcounts Hugh Dickins
2022-11-03  1:48 ` [PATCH 1/3] mm,hugetlb: use folio fields in second tail page Hugh Dickins
2022-11-03 21:18   ` Sidhartha Kumar
2022-11-04  4:29     ` Hugh Dickins
2022-11-10  0:11       ` Sidhartha Kumar
2022-11-10  2:10         ` Hugh Dickins
2022-11-10  2:13           ` [PATCH 1/3 fix] mm,hugetlb: use folio fields in second tail page: fix Hugh Dickins
2022-11-05 19:13   ` [PATCH 1/3] mm,hugetlb: use folio fields in second tail page Kirill A. Shutemov
2022-11-10  1:58     ` Hugh Dickins
2022-11-03  1:51 ` [PATCH 2/3] mm,thp,rmap: simplify compound page mapcount handling Hugh Dickins
2022-11-05 19:51   ` Kirill A. Shutemov
2022-11-10  2:49     ` Hugh Dickins
2022-11-03  1:53 ` [PATCH 3/3] mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts Hugh Dickins
2022-11-05 20:06   ` Kirill A. Shutemov
2022-11-10  3:31     ` Hugh Dickins
2022-11-10  2:18 ` [PATCH 4/3] mm,thp,rmap: handle the normal !PageCompound case first Hugh Dickins
2022-11-10  3:23   ` Linus Torvalds
2022-11-10  4:21     ` Hugh Dickins [this message]
2022-11-10 16:31     ` Matthew Wilcox
2022-11-10 16:58       ` Linus Torvalds
2022-11-18  9:08 ` [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount Hugh Dickins
2022-11-18  9:12   ` [PATCH 1/3] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages Hugh Dickins
2022-11-19  0:12     ` Yu Zhao
2022-11-19  0:37       ` Hugh Dickins
2022-11-19  1:35         ` [PATCH 1/3 fix] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages: fix Hugh Dickins
2022-11-21 12:38           ` Kirill A. Shutemov
2022-11-22  9:13             ` Hugh Dickins
2022-11-21 12:36     ` [PATCH 1/3] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages Kirill A. Shutemov
2022-11-22  9:03       ` Hugh Dickins
2022-11-18  9:14   ` [PATCH 2/3] mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped Hugh Dickins
2022-11-21 13:09     ` Kirill A. Shutemov
2022-11-22  9:33       ` Hugh Dickins
2022-11-18  9:16   ` [PATCH 3/3] mm,thp,rmap: clean up the end of __split_huge_pmd_locked() Hugh Dickins
2022-11-21 13:24     ` Kirill A. Shutemov
2022-11-18 20:18   ` [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount Linus Torvalds
2022-11-18 20:42     ` Johannes Weiner
2022-11-18 20:51     ` Hugh Dickins
2022-11-18 22:03       ` Andrew Morton
2022-11-18 22:07         ` Linus Torvalds
2022-11-18 22:10         ` Hugh Dickins
2022-11-18 22:23           ` Andrew Morton
2022-11-21 16:59   ` Shakeel Butt
2022-11-21 17:16     ` Linus Torvalds
2022-11-22 16:27       ` Shakeel Butt
2022-11-21 18:52     ` Johannes Weiner
2022-11-22  1:32       ` Hugh Dickins
2022-11-22  5:57       ` Matthew Wilcox
2022-11-22  6:55         ` Johannes Weiner
2022-11-22 16:30           ` Shakeel Butt
2022-11-22  9:38   ` [PATCH v2 " Hugh Dickins
2022-11-22  9:42     ` [PATCH v2 1/3] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages Hugh Dickins
2022-11-22  9:49     ` [PATCH v2 2/3] mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped Hugh Dickins
2022-11-22  9:51     ` [PATCH v2 3/3] mm,thp,rmap: clean up the end of __split_huge_pmd_locked() Hugh Dickins
2022-12-05  1:38       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=22eebc5b-6fcb-5561-634-f984f82f533@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=kirill@shutemov.name \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=shy828301@gmail.com \
    --cc=sidhartha.kumar@oracle.com \
    --cc=songmuchun@bytedance.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox