linux-mm.kvack.org archive mirror
* mbind MPOL_INTERLEAVE existing pages
From: Mike Kravetz @ 2023-05-01 18:58 UTC (permalink / raw)
  To: linux-mm, linux-kernel; +Cc: Michal Hocko, Vlastimil Babka, Lorenzo Stoakes

I received a question from a customer who was trying to move pages via
the mbind system call.  In this specific case, the system had two nodes
and all pages in the range were already present on node 0.  They then
called mbind with mode MPOL_INTERLEAVE and the MPOL_MF_MOVE_ALL flag.  Their
expectation was that half the pages in the range would be moved to node 1
in an interleaved pattern.
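A rough userspace sketch of that sequence, for anyone who wants to reproduce it.  This is an illustration under stated assumptions, not the customer's code: an anonymous mapping stands in for their real range, nodes 0 and 1 are assumed to exist, mbind is invoked through syscall(2) because glibc provides no wrapper (libnuma's <numaif.h> does), and reproduce_interleave_move() is a hypothetical name.  MPOL_MF_MOVE_ALL requires CAP_SYS_NICE, so an unprivileged run should get -EPERM from the second call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mempolicy.h>  /* MPOL_BIND, MPOL_INTERLEAVE, MPOL_MF_MOVE_ALL */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no mbind() wrapper, so go through syscall(2). */
static long sys_mbind(void *addr, unsigned long len, int mode,
                      const unsigned long *nodemask, unsigned long maxnode,
                      unsigned int flags)
{
    return syscall(SYS_mbind, addr, len, (unsigned long)mode, nodemask,
                   maxnode, (unsigned long)flags);
}

/*
 * Hypothetical helper: fault npages pages in on node 0, then request
 * MPOL_INTERLEAVE over nodes {0,1} with MPOL_MF_MOVE_ALL.
 * Returns 0 if both mbind() calls succeed, otherwise -errno.
 */
int reproduce_interleave_move(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    unsigned long node0 = 1UL << 0;
    unsigned long nodes01 = (1UL << 0) | (1UL << 1);
    unsigned long maxnode = 8 * sizeof(unsigned long); /* bits in the mask */
    int ret = 0;

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -errno;

    /* Step 1: bind the range to node 0 and populate every page there. */
    if (sys_mbind(buf, len, MPOL_BIND, &node0, maxnode, 0) != 0) {
        ret = -errno;
        goto out;
    }
    memset(buf, 1, len);

    /* Step 2: the customer's call.  All pages already sit on a node in
     * the {0,1} mask, so none is queued and none is migrated. */
    if (sys_mbind(buf, len, MPOL_INTERLEAVE, &nodes01, maxnode,
                  MPOL_MF_MOVE_ALL) != 0)
        ret = -errno;

out:
    munmap(buf, len);
    return ret;
}
```

Page placement after the second call can be checked with move_pages(2) and a NULL nodes array, or with numastat -p.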

In the above situation, no pages actually get moved.  This is because mbind
creates a list of pages to be moved via:

	ret = queue_pages_range(mm, start, end, nmask,
                          flags | MPOL_MF_INVERT, &pagelist);

No page will be added to the list.  queue_folio_required() is called for
each page to determine whether it resides within the set of nodes, and
with MPOL_MF_INVERT only pages outside the set are queued.  All pages
are already within the set.
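To make the filtering concrete, here is a small userspace model of that predicate.  It is a sketch, not kernel code: nodemask_model, page_selected() and count_queued() are hypothetical names, and the invert argument stands in for MPOL_MF_INVERT.  With invert set, only pages outside the mask are selected, so a range that sits entirely on node 0 yields an empty list for a {0,1} mask.

```c
#include <stdbool.h>
#include <stddef.h>

/* A node set modelled as a bitmask: bit n set => node n is in the set. */
typedef unsigned long nodemask_model;

static bool node_in_set(int nid, nodemask_model mask)
{
    return (mask >> nid) & 1UL;
}

/* Models the queue_folio_required() test: without invert, select pages
 * on nodes in the mask; with invert (what mbind passes), select pages
 * on nodes outside the mask. */
static bool page_selected(int nid, nodemask_model mask, bool invert)
{
    return node_in_set(nid, mask) == !invert;
}

/* Count how many pages of the range would be queued for migration. */
size_t count_queued(const int *page_nid, size_t npages,
                    nodemask_model mask, bool invert)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++)
        if (page_selected(page_nid[i], mask, invert))
            n++;
    return n;
}
```

For example, eight pages on node 0 with mask 0x3 (nodes 0 and 1) and invert set give a count of zero, while mask 0x2 (node 1 only) selects all eight.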

I have reread the mbind man page several times and agree that one might
expect MPOL_INTERLEAVE with MPOL_MF_MOVE_ALL to move pages and create an
interleaved pattern.  My question is should we:
- Change mbind so that pages are moved to an interleaved pattern?
- Update the documentation to be more explicit?

I can do either, but just wanted to get opinions before starting.
-- 
Mike Kravetz



* Re: mbind MPOL_INTERLEAVE existing pages
From: Vlastimil Babka @ 2023-05-02  7:45 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel; +Cc: Michal Hocko, Lorenzo Stoakes

On 5/1/23 20:58, Mike Kravetz wrote:
> [...]
> My question is should we:
> - Change mbind so that pages are moved to an interleaved pattern?

I guess it could be worth trying, if there's a use case. And hope nobody
else is depending on the current behavior and will complain afterwards :)

> - Update the documentation to be more explicit?
> 
> I can do either, but just wanted to get opinions before starting.




* Re: mbind MPOL_INTERLEAVE existing pages
From: Michal Hocko @ 2023-05-02 13:12 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Mike Kravetz, linux-mm, linux-kernel, Lorenzo Stoakes

On Tue 02-05-23 09:45:40, Vlastimil Babka wrote:
> On 5/1/23 20:58, Mike Kravetz wrote:
> > [...]
> > My question is should we:
> > - Change mbind so that pages are moved to an interleaved pattern?
> 
> I guess it could be worth trying, if there's a use case. And hope nobody
> else is depending on the current behavior and will complain afterwards :)

I am not sure this is worth it wrt. complexity. Essentially it would
require building up the distribution for the whole range first, so two
passes. Also it could become more tricky if the final node mask has
nodes of different distances (it would be a reasonable expectation to
distribute with minimum total distances, right ;)).
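For illustration only, a sketch of what such a two-pass scheme might look like in userspace terms.  plan_interleave() and SKETCH_MAX_NODES are hypothetical names.  The sketch aims for balanced per-node counts with the fewest moves rather than a strict alternating pattern, and it ignores node distances entirely, which is the part that would get tricky.

```c
#include <stddef.h>

/* Sketch limit: one bit per node in an unsigned long mask. */
#define SKETCH_MAX_NODES ((int)(8 * sizeof(unsigned long)))

/*
 * Hypothetical planner: spread npages pages evenly over the nodes in
 * `mask` while moving as few pages as possible.  page_nid[i] is the
 * page's current node; target[i] receives its planned node.  Returns
 * the number of pages that would migrate (0, target untouched, if the
 * mask is empty).
 */
size_t plan_interleave(const int *page_nid, int *target, size_t npages,
                       unsigned long mask)
{
    int nodes[SKETCH_MAX_NODES];
    size_t have[SKETCH_MAX_NODES] = {0};
    size_t quota[SKETCH_MAX_NODES] = {0};
    int nnodes = 0;
    size_t moves = 0;
    int k;

    for (int n = 0; n < SKETCH_MAX_NODES; n++)
        if ((mask >> n) & 1UL)
            nodes[nnodes++] = n;
    if (nnodes == 0)
        return 0;

    /* Equal share per node; the first npages % nnodes nodes get one extra,
     * so the quotas sum to exactly npages. */
    for (k = 0; k < nnodes; k++)
        quota[nodes[k]] = npages / nnodes + ((size_t)k < npages % nnodes);

    /* Pass 1: keep pages already on an in-mask node that has spare quota. */
    for (size_t i = 0; i < npages; i++) {
        int nid = page_nid[i];

        if (nid >= 0 && nid < SKETCH_MAX_NODES && have[nid] < quota[nid]) {
            have[nid]++;
            target[i] = nid;
        } else {
            target[i] = -1;   /* must move */
        }
    }

    /* Pass 2: place each remaining page on the next node with room. */
    k = 0;
    for (size_t i = 0; i < npages; i++) {
        if (target[i] != -1)
            continue;
        while (have[nodes[k]] >= quota[nodes[k]])
            k++;              /* node full; total room always suffices */
        target[i] = nodes[k];
        have[nodes[k]]++;
        moves++;
    }
    return moves;
}
```

For the case in the original report (every page on node 0, mask {0,1}), this plans to move half of the pages to node 1.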
 
> > - Update the documentation to be more explicit?

Yes, please. While this sounds like a neat feature, I think the
additional complexity is likely not worth it. A strong use case
might make a difference though.
-- 
Michal Hocko
SUSE Labs



* Re: mbind MPOL_INTERLEAVE existing pages
From: Mike Kravetz @ 2023-05-02 16:34 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Vlastimil Babka, linux-mm, linux-kernel, Lorenzo Stoakes

On 05/02/23 15:12, Michal Hocko wrote:
> On Tue 02-05-23 09:45:40, Vlastimil Babka wrote:
> > On 5/1/23 20:58, Mike Kravetz wrote:
> > > [...]
> > > My question is should we:
> > > - Change mbind so that pages are moved to an interleaved pattern?
> > 
> > I guess it could be worth trying, if there's a use case. And hope nobody
> > else is depending on the current behavior and will complain afterwards :)
> 
> I am not sure this is worth it wrt. complexity. Essentially it would
> require building up the distribution for the whole range first, so two
> passes. Also it could become more tricky if the final node mask has
> nodes of different distances (it would be a reasonable expectation to
> distribute with minimum total distances, right ;)).

Yes, I was worried about the complexity of such a change.  At a high
level, interleave sounds easy.  But, like most things, the details
could add a bunch of complexity.

> > > - Update the documentation to be more explicit?
> 
> Yes, please. While this sounds like a neat feature, I think the
> additional complexity is likely not worth it. A strong use case
> might make a difference though.

Well, this user has a workaround.  They simply make sure to set the
policy of this area (a shared memory segment) before populating it.  And,
I don't think they would really be happy with the cost of potentially
migrating hundreds of GB of data.
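The workaround amounts to something like this sketch.  Assumptions: an anonymous MAP_SHARED mapping stands in for their shared memory segment, nodes 0 and 1 are the interleave targets, and populate_interleaved() is a hypothetical name.  Because the policy is in place before any page is faulted in, pages are allocated according to the interleave policy and nothing has to migrate.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mempolicy.h>  /* MPOL_INTERLEAVE */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of shared memory, set an interleave
 * policy over nodes {0,1} while the range is still unpopulated, then
 * fault it in.  Returns the mapping, or NULL with errno set.
 */
void *populate_interleaved(size_t len)
{
    unsigned long nodes01 = (1UL << 0) | (1UL << 1);
    unsigned long maxnode = 8 * sizeof(unsigned long); /* bits in the mask */

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Policy first: no pages exist yet, so nothing needs migrating.
     * glibc has no mbind() wrapper, hence syscall(2). */
    if (syscall(SYS_mbind, buf, (unsigned long)len,
                (unsigned long)MPOL_INTERLEAVE, &nodes01, maxnode, 0UL) != 0) {
        int saved = errno;

        munmap(buf, len);
        errno = saved;
        return NULL;
    }

    /* Now populate: each page is allocated as it is first touched,
     * following the interleave policy set above. */
    memset(buf, 0, len);
    return buf;
}
```

Unlike MPOL_MF_MOVE_ALL, setting a policy with no move flags needs no special privilege from the kernel, although container seccomp profiles may still block mbind.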

I'll send out a documentation update.
-- 
Mike Kravetz


