From: Paul Jackson <pj@sgi.com>
To: Robin Holt <holt@sgi.com>
Cc: raybry@sgi.com, linux-mm@kvack.org, ak@muc.de,
haveblue@us.ibm.com, marcello@cyclades.com,
stevel@mwwireless.net, peterc@gelato.unsw.edu.au
Subject: Re: manual page migration -- issue list
Date: Wed, 16 Feb 2005 07:45:50 -0800
Message-ID: <20050216074550.313b1300.pj@sgi.com>
In-Reply-To: <20050216113047.GA8388@lnx-holt.americas.sgi.com>
Robin wrote:
> Until then, there is no clear win over first
> touch for their type of application.

Huh? So what was the point of this rant? <grin>

You seem to explain why first touch is used instead of the Linux 2.6
NUMA placement calls mbind/set_mempolicy, in some third party code that
runs on multiple operating systems.

But I thought this was the page migration thread, not the placement
policy thread.

Now I am as mystified by your latest comments as I was by Andi's
discussion of using these memory policy calls.

Regardless of what mechanisms we use to guide future allocations to
their proper nodes, how best can we provide a facility to migrate
already allocated physical memory pages to other nodes? That's the
question, or so I thought, on this thread.
To repeat myself ...
> The next concern that rises to the top for me was best expressed by Andi:
> >
> > The main reasons for that is that I don't think external
> > processes should mess with virtual addresses of another process.
> > It just feels unclean and has many drawbacks (parsing /proc/*/maps
> > needs complicated user code, racy, locking difficult).
> >
> > In kernel space handling full VMs is much easier and safer due to better
> > locking facilities.
>
> I share Andi's concerns, but I don't see what to do about this.
Perhaps a part of the answer is that we aren't messing with (as in
"changing") the virtual addresses of other processes. The migration
call is only reading these addresses. What it messes with is the
_physical_ addresses ;).

Though this proposed call still seems to have some of the same drawbacks.
One of my motivations for pursuing the no-array version of this call
that you loved so much was that it (my latest variant, anyway) didn't
pass any virtual address ranges in, further simplifying what crossed the
user-kernel boundary and leaving details of parsing the virtual address
layout of tasks strictly to the kernel (no need to read /proc/*/maps).

But it seems that if we are going to achieve the fairly significant
optimizations you enumerated in your example a few hours ago, we at
least have to parse the /proc/*/maps files.
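Just to make that user-space burden concrete, here is a minimal sketch
of what parsing a single maps line involves (illustrative only; a real
tool would also have to cope with the file changing under it, which is
exactly the raciness Andi objects to):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps into a start/end address range
 * and a permission string ("r-xp" etc.).  Returns 0 on success, -1 on
 * a malformed line.  The remaining fields (offset, device, inode,
 * pathname) are ignored here for brevity. */
static int parse_maps_line(const char *line, unsigned long *start,
                           unsigned long *end, char perms[5])
{
    if (sscanf(line, "%lx-%lx %4s", start, end, perms) != 3)
        return -1;
    return 0;
}
```

And even this is only half the job; the caller still has to decide,
per VMA, which ranges are worth handing back to the kernel.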
Hmmm ... wait just a minute ... isn't parsing the maps files in /proc
really scanning the virtual addresses of tasks? In your example of a
few hours ago, which seemed to require only 3 system calls and one full
scan of any task address space, did you read all the /proc/*/maps files
for all 256 of the tasks involved? I would think you must have done so,
or else one of these tasks could be holding onto some private memory of
its own that we would need to migrate. Are the stack pages and any
per-thread private data on pages visible to all the threads, or are
some of these pages private to each thread? Does anything prevent a
thread from having additional private pages invisible to the other
threads?

Could you redo your example, including the scans implied by reading
maps files, the system calls needed to do those reads, and those needed
to migrate any private pages the tasks might have? Perhaps your
preferred API doesn't have such an insane advantage after all.
I'll soon consider another variant of this call: one that takes an
_array_ of pids, along with the old and new arrays of nodes, but no
virtual address range. The kernel would scan each pid in the array,
migrating anything found on any old node to the corresponding new node,
all in one system call. If my speculations above are right, this does
the minimum number of scans, one per pid, and the minimum number of
system calls: one. And it does so without involving user space in racy
maps file reading to determine what to call (though the kernel code
would probably still have more than its share of races to fuss over).
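To make the shape of that variant concrete, a sketch of what the call
and its per-page node remapping might look like. Everything here is
hypothetical: migrate_tasks is an invented name, not an existing
syscall, and the kernel-side scan is reduced to the one helper that
matters for the semantics:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical prototype for the array-of-pids variant: for each pid,
 * the kernel scans that task's address space once, moving any page
 * found on old_nodes[i] to new_nodes[i].  User space never touches
 * /proc/<pid>/maps. */
int migrate_tasks(const pid_t *pids, size_t npids,
                  const int *old_nodes, const int *new_nodes,
                  size_t nnodes);

/* The node remapping the kernel-side scan would apply per page:
 * return the new node for a page currently on `node`, or -1 if the
 * page is not on any old node and should be left where it is. */
static int remap_node(int node, const int *old_nodes,
                      const int *new_nodes, size_t nnodes)
{
    for (size_t i = 0; i < nnodes; i++)
        if (old_nodes[i] == node)
            return new_nodes[i];
    return -1;
}
```

One scan per pid, one system call total; the old/new node pairing is
positional, so the arrays must be the same length.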
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373, 1.925.600.0401