From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
xen-devel@lists.xenproject.org, linux-fsdevel@vger.kernel.org,
nvdimm@lists.linux.dev, Andrew Morton <akpm@linux-foundation.org>,
Juergen Gross <jgross@suse.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>,
Dan Williams <dan.j.williams@intel.com>,
Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Jann Horn <jannh@google.com>,
Pedro Falcato <pfalcato@suse.de>, Hugh Dickins <hughd@google.com>,
Oscar Salvador <osalvador@suse.de>,
Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH v2 6/9] mm/memory: convert print_bad_pte() to print_bad_page_map()
Date: Fri, 18 Jul 2025 13:55:46 +0100 [thread overview]
Message-ID: <432662e3-e043-41cc-b7a2-acf14387eb95@lucifer.local> (raw)
In-Reply-To: <c8b9c805-2760-4b90-951a-3666cad6a4a4@redhat.com>
On Fri, Jul 18, 2025 at 01:04:30PM +0200, David Hildenbrand wrote:
> >
> > Yeah sorry I was in 'what locks do we need' mode and hadn't shifted back here,
> > but I guess the intent is that the caller _must_ hold this lock.
> >
> > I know it's nitty and annoying (sorry!) but as asserting seems to not be a
> > possibility here, could we spell these out as a series of points like:
> >
> > /*
> > * The caller MUST hold the following locks:
> > *
> > * - Leaf page table lock
> > * - Appropriate VMA lock to keep VMA stable
> > */
> >
> > I don't _actually_ think you need the rmap lock then, as none of the page tables
> > you access would be impacted by any rmap action afaict, with these locks held.
>
> I don't enjoy wrong comments ;)
>
> This can be called from rmap code when doing a vm_normal_page() while
> holding the PTL.
OK so this is just underlining the confusion here (not your fault! I mean in
general with page table code, really).
Would this possibly work better then?:
/*
* The caller MUST prevent page table teardown (by holding mmap, vma or rmap lock)
* and MUST hold the leaf page table lock.
*/
>
> Really, I think we are over-thinking a helper that is triggered in specific
> context when the world is about to collide.
Indeed, but I think it's important to be clear as to what is required for this
to work (even though, as I understand it, we're already in trouble and if page
tables are corrupted or something like this we may even NULL ptr deref anyway so
it's best effort).
>
> This is not your general-purpose API.
>
> Maybe I should have never added a comment. Maybe I should just not have done
> this patch, because I really don't want to do more than the bare minimum to
> print_bad_page_map().
No no David no :P come back to the light sir...
This is a good patch I don't mean to dissuade you, I just want to make things
clear in a confusing bit of the kernel as we in mm as usual make life hard for
ourselves...
I think the locking comment _is_ important, as for far too long in mm we've had
_implicit_ understanding of where the locks should be at any given time, which
constantly blows up underneath us.
I'd rather keep things as clear as possible so even the intellectually
challenged such as myself are subject to less confusion.
Anyway TL;DR if you do something similar to suggestion above it's all good. Just
needs clarification.
>
> Because I deeply detest it, and no comments we will add will change that.
So do I! *clinks glass* but yes, it's horrid. But there's value in improving
quality of horrid code and refactoring to the least-worst version.
And we can attack it in a more serious way later.
[snip]
> > > > > +static void print_bad_page_map(struct vm_area_struct *vma,
> > > > > + unsigned long addr, unsigned long long entry, struct page *page)
> > > > > +{
> > > > > + struct address_space *mapping;
> > > > > + pgoff_t index;
> > > > > +
> > > > > + if (is_bad_page_map_ratelimited())
> > > > > + return;
> > > > >
> > > > > mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
> > > > > index = linear_page_index(vma, addr);
> > > > >
> > > > > - pr_alert("BUG: Bad page map in process %s pte:%08llx pmd:%08llx\n",
> > > > > - current->comm,
> > > > > - (long long)pte_val(pte), (long long)pmd_val(*pmd));
> > > > > + pr_alert("BUG: Bad page map in process %s entry:%08llx", current->comm, entry);
> > > >
> > > > Sort of wonder if this is even useful if you don't know what the 'entry'
> > > > is? But I guess the dump below will tell you.
> > >
> > > You probably missed in the patch description:
> > >
> > > "Whether it is a PTE or something else will usually become obvious from the
> > > page table dump or from the dumped stack. If ever required in the future, we
> > > could pass the entry level type similar to "enum rmap_level". For now, let's
> > > keep it simple."
> >
> > Yeah sorry I glossed over the commit msg, and now I pay for it ;) OK this
> > is fine then.
>
> Let me play with indicating the page table level, but it's the kind of stuff
> I wouldn't want to do in this series here.
Sure understood. I don't mean to hold this up with nits. Am happy to be
flexible, just thinking out loud generally.
>
> > >
> > > >
> > > > Then we have VM_IO, which strictly must not have an associated page right?
> > >
> > > VM_IO just means read/write side-effects, I think you could have ones with
> > > an memmap easily ... e.g., memory section (128MiB) spanning both memory and
> > > MMIO regions.
> >
> > Hmm, but why not have two separate VMAs? I guess I need to look into more
> > what this flag actually effects.
>
> Oh, I meant, that we might have a "struct page" for MMIO memory (pfn_valid()
> == true).
>
> In a MIXEDMAP that will get refcounted. Not sure if there are users that use
> VM_IO in a MIXEDMAP, I would assume so but didn't check.
>
> So VM_IO doesn't really interact with vm_normal_page(), really. It's all
> about PFNMAP and MIXEDMAP.
Thanks, yeah am thinking more broadly about VM_SPECIAL here. May go off and do
some work on that... VM_UNMERGEABLE might be better.
>
> --
> Cheers,
>
> David / dhildenb
>
Cheers, Lorenzo
next prev parent reply other threads:[~2025-07-18 12:56 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-17 11:52 [PATCH v2 0/9] mm: vm_normal_page*() improvements David Hildenbrand
2025-07-17 11:52 ` [PATCH v2 1/9] mm/huge_memory: move more common code into insert_pmd() David Hildenbrand
2025-07-17 15:34 ` Lorenzo Stoakes
2025-07-25 2:47 ` Wei Yang
2025-07-17 11:52 ` [PATCH v2 2/9] mm/huge_memory: move more common code into insert_pud() David Hildenbrand
2025-07-17 15:42 ` Lorenzo Stoakes
2025-07-25 2:56 ` Wei Yang
2025-07-17 11:52 ` [PATCH v2 3/9] mm/huge_memory: support huge zero folio in vmf_insert_folio_pmd() David Hildenbrand
2025-07-17 15:47 ` Lorenzo Stoakes
2025-07-25 8:07 ` Wei Yang
2025-07-17 11:52 ` [PATCH v2 4/9] fs/dax: use vmf_insert_folio_pmd() to insert the huge zero folio David Hildenbrand
2025-07-17 18:09 ` Lorenzo Stoakes
2025-07-17 11:52 ` [PATCH v2 5/9] mm/huge_memory: mark PMD mappings of the huge zero folio special David Hildenbrand
2025-07-17 18:29 ` Lorenzo Stoakes
2025-07-17 20:31 ` David Hildenbrand
2025-07-18 10:41 ` Lorenzo Stoakes
2025-07-18 10:54 ` David Hildenbrand
2025-07-18 13:06 ` Lorenzo Stoakes
2025-07-28 8:49 ` Wei Yang
2025-07-17 11:52 ` [PATCH v2 6/9] mm/memory: convert print_bad_pte() to print_bad_page_map() David Hildenbrand
2025-07-17 19:17 ` Lorenzo Stoakes
2025-07-17 20:03 ` David Hildenbrand
2025-07-18 10:15 ` Lorenzo Stoakes
2025-07-18 11:04 ` David Hildenbrand
2025-07-18 12:55 ` Lorenzo Stoakes [this message]
2025-07-17 22:06 ` Demi Marie Obenour
2025-07-18 7:44 ` David Hildenbrand
2025-07-18 7:59 ` Demi Marie Obenour
2025-07-18 8:26 ` David Hildenbrand
2025-07-17 11:52 ` [PATCH v2 7/9] mm/memory: factor out common code from vm_normal_page_*() David Hildenbrand
2025-07-17 19:51 ` Lorenzo Stoakes
2025-07-17 19:55 ` Lorenzo Stoakes
2025-07-17 20:03 ` David Hildenbrand
2025-07-18 12:43 ` Lorenzo Stoakes
2025-07-30 12:54 ` David Hildenbrand
2025-07-30 13:24 ` Lorenzo Stoakes
2025-07-17 20:12 ` David Hildenbrand
2025-07-18 12:35 ` Lorenzo Stoakes
2025-07-17 11:52 ` [PATCH v2 8/9] mm: introduce and use vm_normal_page_pud() David Hildenbrand
2025-07-17 20:03 ` Lorenzo Stoakes
2025-07-17 20:14 ` David Hildenbrand
2025-07-18 10:47 ` Lorenzo Stoakes
2025-07-18 11:06 ` David Hildenbrand
2025-07-18 12:44 ` Lorenzo Stoakes
2025-07-29 7:52 ` Wei Yang
2025-07-17 11:52 ` [PATCH v2 9/9] mm: rename vm_ops->find_special_page() to vm_ops->find_normal_page() David Hildenbrand
2025-07-17 20:07 ` Lorenzo Stoakes
2025-07-29 7:53 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=432662e3-e043-41cc-b7a2-acf14387eb95@lucifer.local \
--to=lorenzo.stoakes@oracle.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=brauner@kernel.org \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jgross@suse.com \
--cc=lance.yang@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=npache@redhat.com \
--cc=nvdimm@lists.linux.dev \
--cc=oleksandr_tyshchenko@epam.com \
--cc=osalvador@suse.de \
--cc=pfalcato@suse.de \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=sstabellini@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=xen-devel@lists.xenproject.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox