linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Replacing walk_page_range
@ 2023-08-08 17:09 Matthew Wilcox
  2023-08-09  7:35 ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2023-08-08 17:09 UTC (permalink / raw)
  To: Sidhartha Kumar; +Cc: linux-mm

Sid was incautious enough to say he'd like to take on fixing
walk_page_range() so that hugetlb isn't treated specially.  This is
going to subject him to one of my rants, so I thought I'd share with
everyone before we meet to talk about it later today.

1. I dislike the callback approach.  Indirect function calls are not
cheap (thanks Spectre!) and it forces separation of code into two
functions, often necessitating some awkward passing of state between
them through the mm_walk->private void pointer.

2. If you want to support PUDs, and the page tables happen to contain PTEs,
you get passed a PUD, even though you need to do the ACTION_CONTINUE.

3. There's separate handling for hugetlb, even though there really
shoudn't be.

4. It's not used everywhere.  unmap_page_range() opencodes the page
table walk.  It's so hard to use that ptdump_walk_pgd() wraps it
and has its own callback!

5. There's no help for systems with cont_pte/cont_pmd.  You have to
manage that yourself.


I think a new interface looks like ...

	struct folio *folio;
	PAGEWALK(my_walk, mm, start, end);

	pagewalk_for_each_folio(folio, &my_walk) {
		... do things with folio ...
	}

but I haven't spent enough time looking at all the consumers to be
sure that it'll work.  I think there are other consumers that want
to do things that aren't per-folio iterations, such as

	pagewalk_for_each_pfn(pfn, &my_walk) {
		... do things with pfn ...
	}

I think the pre-vma, post-vma hooks can be replaced by remembering what
the prev vma was, and doing whatever is needed if current vma is
different from prev vma.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Replacing walk_page_range
  2023-08-08 17:09 Replacing walk_page_range Matthew Wilcox
@ 2023-08-09  7:35 ` David Hildenbrand
  2023-08-09 13:09   ` Matthew Wilcox
  0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand @ 2023-08-09  7:35 UTC (permalink / raw)
  To: Matthew Wilcox, Sidhartha Kumar; +Cc: linux-mm

On 08.08.23 19:09, Matthew Wilcox wrote:
> Sid was incautious enough to say he'd like to take on fixing
> walk_page_range() so that hugetlb isn't treated specially.  This is
> going to subject him to one of my rants, so I thought I'd share with
> everyone before we meet to talk about it later today.

Are we only talking about walk_page_range() or also 
walk_page_range_vma() / walk_page_vma() ?

I tend to like the VMA variants ...

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Replacing walk_page_range
  2023-08-09  7:35 ` David Hildenbrand
@ 2023-08-09 13:09   ` Matthew Wilcox
  2023-08-09 13:44     ` David Hildenbrand
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2023-08-09 13:09 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Sidhartha Kumar, linux-mm

On Wed, Aug 09, 2023 at 09:35:01AM +0200, David Hildenbrand wrote:
> On 08.08.23 19:09, Matthew Wilcox wrote:
> > Sid was incautious enough to say he'd like to take on fixing
> > walk_page_range() so that hugetlb isn't treated specially.  This is
> > going to subject him to one of my rants, so I thought I'd share with
> > everyone before we meet to talk about it later today.
> 
> Are we only talking about walk_page_range() or also walk_page_range_vma() /
> walk_page_vma() ?
> 
> I tend to like the VMA variants ...

We're talking about getting rid of mm_walk_ops.  There aren't exactly a
lot of callers of either of those functions -- 4 of walk_page_vma()
and 1 of walk_page_range_vma().


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Replacing walk_page_range
  2023-08-09 13:09   ` Matthew Wilcox
@ 2023-08-09 13:44     ` David Hildenbrand
  0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand @ 2023-08-09 13:44 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Sidhartha Kumar, linux-mm

On 09.08.23 15:09, Matthew Wilcox wrote:
> On Wed, Aug 09, 2023 at 09:35:01AM +0200, David Hildenbrand wrote:
>> On 08.08.23 19:09, Matthew Wilcox wrote:
>>> Sid was incautious enough to say he'd like to take on fixing
>>> walk_page_range() so that hugetlb isn't treated specially.  This is
>>> going to subject him to one of my rants, so I thought I'd share with
>>> everyone before we meet to talk about it later today.
>>
>> Are we only talking about walk_page_range() or also walk_page_range_vma() /
>> walk_page_vma() ?
>>
>> I tend to like the VMA variants ...
> 
> We're talking about getting rid of mm_walk_ops.  There aren't exactly a
> lot of callers of either of those functions -- 4 of walk_page_vma()
> and 1 of walk_page_range_vma().

Okay, I see. For some use cases, it's probably sufficient to walk folios.

But we do have some advanced users like fs/proc/task_mmu.c.

[I do have two more follow_page() -> walk_page_range_vma() conversions 
lying around here; they want to also know if a page is mapped writable]

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-08-09 13:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-08 17:09 Replacing walk_page_range Matthew Wilcox
2023-08-09  7:35 ` David Hildenbrand
2023-08-09 13:09   ` Matthew Wilcox
2023-08-09 13:44     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox