linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxcg.org>,
	 "Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	 David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Peter Xu <peterx@redhat.com>,
	 Yang Shi <shy828301@gmail.com>,
	John Hubbard <jhubbard@nvidia.com>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	 Muchun Song <songmuchun@bytedance.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	 Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Mina Almasry <almasrymina@google.com>,
	 James Houghton <jthoughton@google.com>,
	"Zach O'Keefe" <zokeefe@google.com>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [PATCH 4/3] mm,thp,rmap: handle the normal !PageCompound case first
Date: Wed, 9 Nov 2022 19:23:08 -0800	[thread overview]
Message-ID: <CAHk-=wiJfLx9dVJcQOhQsAseoPmLhpVvHgf4GYu6frfhmBAuMg@mail.gmail.com> (raw)
In-Reply-To: <fca2f694-2098-b0ef-d4e-f1d8b94d318c@google.com>

On Wed, Nov 9, 2022 at 6:18 PM Hugh Dickins <hughd@google.com> wrote:
>
> Commit ("mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts")
> propagated the "if (compound) {lock} else if (PageCompound) {lock} else
> {atomic}" pattern throughout; but Linus hated the way that gives primacy
> to the uncommon case: switch to "if (!PageCompound) {atomic} else if
> (compound) {lock} else {lock}" throughout.

Side note, that 'compound' naming is also on my list of "I'm _really_
not a fan".

We actually have a completely different meaning for PageCompound()
than the meaning of 'compound' in the rmap functions, and those
functions literally mix those meanings if  not on the same line, then
at least right next to each other.

What 'rmap' actually means with 'compound' in the add/remove functions
is basically 'not PAGE_SIZE' as far as I can tell.

So if I get the energy to do the rmap counts, I will *also* be
renaming that horrible thing completely.

In fact, I'd be inclined to just pass in the actual page size
(possibly a page shift order), which some of the functions want
anyway, and which would be a lot clearer than the horrid "compound"
name.

One reason I find the "compound" name so horrifying is that it is used
very much for HUGETLB pages, which I don't think end up ever being
marked as PageCompund(), and which are - for various historical
reasons - doubly confusing because they use a "pte_t" to describe
themselves, even when they are actually using a "pmd_t" or a "pud_t"
to actually map the page.

So a HUGETLB entry really is (for historical reasons) designed to look
like a single last-level pte_t entry, but from an rmap perspective it
is explicitly *not* made to look like that at all, completely
violating the HUGETLB design.

So the rmap code has several really REALLY confusing cases:

 - the common one: just a page mapped at a *real* pte_t level.

   To make that more confusing, it can actually be a single-page
_part_ of a compound page in the PageCompound() sense, but the rmap
'compound' argument will *not* be set, because from a *mmap*
standpoint it's mapped as a single page.

   This is generally recognized by the rmap code by 'compound' being zero.

 - a HUGETLB mapping, which uses '->pte' in the page walking (please
kill me now) and is *not* using a PageCompund() page, but 'compound'
is still set, because from a *mapping* standpoint it's not a final
pte_t entry (buit from a MM design standpoint it _was_ supposed to be
designed like a single page).

   This is randomly recognized by looking at the vma flags (using
"is_vm_hugetlb_page(vma)") or just inherent in the code itself (ie the
'hugetlb()' functions are only called by code that has tested this
situation one way or another)

   To make things more confusing, some places use PageHeadHuge()
instead (but the folio version of said test is called
"folio_test_hugetlb()", just so that nobody could possibly ever accuse
the HUGETLB code to have consistency).

    You'd think that PageHeadHuge() is one of the functions that
checks the page flag bits. You'd be wrong. It's very very special.

 - an *actual* PageCompound() page, mapped as such as a THP page (ie
mapped by a pmd, not a pte).

   This may be the same page size as a HUGETLB mapping (and often is),
but it's a completely different case in every single way.

   But like the HUGETLB case, the 'compound' argument will be set, and
now it's actually a compound page (but hey, so could the single page
mapping case be too).

   Unlike the HUGETLB case, the page walker does not use ->pte for
this, and some of the walkers will very much use that, ie
folio_referenced_one() will do

                if (pvmw.pte) {

   to distinguish the "natively mapped PageCompound()" case (no pte)
from the "map a single page" or from the HUGETLB case (yes pte).

There may be more cases than those three, and I may have confused
myself and gotten some of the details wrong, but I did want to write
the above diatribe out to

 (a) see if somebody corrects me for any of the cases I enumerated

 (b) see if somebody can point to yet another odd case

 (c) see if somebody has suggestions for better and more obvious names
for that 'compound' argument in the rmap code

I do wish the HUGETLB case didn't use 'pte' for its notion of how
HUGETLB entries are mapped, but that's literally how HUGETLB is
designed: it started life as a larger last-level pte.

It just means that it ends up being very confusing when from a page
table walk perspective, you're walking a pud or a pmd entry, and then
you see a 'pte_t' instead.

An example of that confusion is visible in try_to_unmap_one(), which
can be called with a HUGEPTE page (well, folio), and that does

        while (page_vma_mapped_walk(&pvmw)) {

to find the rmap entries, but it can't do that

                if (pvmw.pte) {

test to see what mapping it's walking (since both regular pages and
HUGETLB pages use that), so then it just keeps testing what kind of
page that was passed in.

Which really smells very odd to me, but apparently it works,
presumably because unlike THP there can be no splitting.  But it's a
case where the whole "was it a regular page or a HUGETLB page" is
really really confusing/

And mm/hugetlb.c (and places like mm/pagewalk.c too) has a fair number
of random casts as a result of this "it's not really a pte_t, but it's
designed to look like one" thing.

This all really is understandable from a historical context, and from
HUGETLB really being just kind of designed to be a bigger page (not a
collection of small pages that can be mapped as a bigger entity), but
it really does mean that 'rmap' calling those HUGETLB pages 'compound'
is conceptually very very wrong.

Oh well. That whole HUGETLB model isn't getting fixed, but I think the
naming confusion about 'compound' *can* be fixed fairly easily, and we
could try to at least avoid having 'compound' and 'PageCompound()'
mean totally different things in the same function.

I'm not going to do any of this cleanup now, but I wanted to at least
voice my concerns. Maybe I'll get around to actually trying to clarify
the code later.

Because this was all stuff that was *very* confusing when I did the
rmap simplification in that (now completely rewritten to explicitly
_not_ touch rmap at all) original version of the delayed rmap patch
series.

                 Linus


  reply	other threads:[~2022-11-10  3:23 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-03  1:44 [PATCH 0/3] mm,huge,rmap: unify and speed up compound mapcounts Hugh Dickins
2022-11-03  1:48 ` [PATCH 1/3] mm,hugetlb: use folio fields in second tail page Hugh Dickins
2022-11-03 21:18   ` Sidhartha Kumar
2022-11-04  4:29     ` Hugh Dickins
2022-11-10  0:11       ` Sidhartha Kumar
2022-11-10  2:10         ` Hugh Dickins
2022-11-10  2:13           ` [PATCH 1/3 fix] mm,hugetlb: use folio fields in second tail page: fix Hugh Dickins
2022-11-05 19:13   ` [PATCH 1/3] mm,hugetlb: use folio fields in second tail page Kirill A. Shutemov
2022-11-10  1:58     ` Hugh Dickins
2022-11-03  1:51 ` [PATCH 2/3] mm,thp,rmap: simplify compound page mapcount handling Hugh Dickins
2022-11-05 19:51   ` Kirill A. Shutemov
2022-11-10  2:49     ` Hugh Dickins
2022-11-03  1:53 ` [PATCH 3/3] mm,thp,rmap: lock_compound_mapcounts() on THP mapcounts Hugh Dickins
2022-11-05 20:06   ` Kirill A. Shutemov
2022-11-10  3:31     ` Hugh Dickins
2022-11-10  2:18 ` [PATCH 4/3] mm,thp,rmap: handle the normal !PageCompound case first Hugh Dickins
2022-11-10  3:23   ` Linus Torvalds [this message]
2022-11-10  4:21     ` Hugh Dickins
2022-11-10 16:31     ` Matthew Wilcox
2022-11-10 16:58       ` Linus Torvalds
2022-11-18  9:08 ` [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount Hugh Dickins
2022-11-18  9:12   ` [PATCH 1/3] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages Hugh Dickins
2022-11-19  0:12     ` Yu Zhao
2022-11-19  0:37       ` Hugh Dickins
2022-11-19  1:35         ` [PATCH 1/3 fix] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages: fix Hugh Dickins
2022-11-21 12:38           ` Kirill A. Shutemov
2022-11-22  9:13             ` Hugh Dickins
2022-11-21 12:36     ` [PATCH 1/3] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages Kirill A. Shutemov
2022-11-22  9:03       ` Hugh Dickins
2022-11-18  9:14   ` [PATCH 2/3] mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped Hugh Dickins
2022-11-21 13:09     ` Kirill A. Shutemov
2022-11-22  9:33       ` Hugh Dickins
2022-11-18  9:16   ` [PATCH 3/3] mm,thp,rmap: clean up the end of __split_huge_pmd_locked() Hugh Dickins
2022-11-21 13:24     ` Kirill A. Shutemov
2022-11-18 20:18   ` [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount Linus Torvalds
2022-11-18 20:42     ` Johannes Weiner
2022-11-18 20:51     ` Hugh Dickins
2022-11-18 22:03       ` Andrew Morton
2022-11-18 22:07         ` Linus Torvalds
2022-11-18 22:10         ` Hugh Dickins
2022-11-18 22:23           ` Andrew Morton
2022-11-21 16:59   ` Shakeel Butt
2022-11-21 17:16     ` Linus Torvalds
2022-11-22 16:27       ` Shakeel Butt
2022-11-21 18:52     ` Johannes Weiner
2022-11-22  1:32       ` Hugh Dickins
2022-11-22  5:57       ` Matthew Wilcox
2022-11-22  6:55         ` Johannes Weiner
2022-11-22 16:30           ` Shakeel Butt
2022-11-22  9:38   ` [PATCH v2 " Hugh Dickins
2022-11-22  9:42     ` [PATCH v2 1/3] mm,thp,rmap: subpages_mapcount of PTE-mapped subpages Hugh Dickins
2022-11-22  9:49     ` [PATCH v2 2/3] mm,thp,rmap: subpages_mapcount COMPOUND_MAPPED if PMD-mapped Hugh Dickins
2022-11-22  9:51     ` [PATCH v2 3/3] mm,thp,rmap: clean up the end of __split_huge_pmd_locked() Hugh Dickins
2022-12-05  1:38       ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wiJfLx9dVJcQOhQsAseoPmLhpVvHgf4GYu6frfhmBAuMg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxcg.org \
    --cc=hughd@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=jthoughton@google.com \
    --cc=kirill@shutemov.name \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=shy828301@gmail.com \
    --cc=sidhartha.kumar@oracle.com \
    --cc=songmuchun@bytedance.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox