Re: [RFC 4/5] page migration: Support moving of individual pages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@osdl.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, bls@sgi.com, jes@sgi.com,
	lee.schermerhorn@hp.com, kamezawa.hiroyu@jp.fujitsu.com,
	mtk-manpages@gmx.net
Subject: Re: [RFC 4/5] page migration: Support moving of individual pages
Date: Fri, 19 May 2006 16:45:39 -0700	[thread overview]
Message-ID: <20060519164539.401a8eec.akpm@osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0605191603110.26870@schroedinger.engr.sgi.com>

Christoph Lameter <clameter@sgi.com> wrote:
>
> On Fri, 19 May 2006, Andrew Morton wrote:
> 
> > > Possible page states:
> > > 
> > > 0..MAX_NUMNODES		The page is now on the indicated node.
> > > 
> > > -ENOENT		Page is not present or target node is not present
> > 
> > So the caller has no way of distinguishing one case from the other? 
> > Perhaps it would be better to permit that.
> 
> But then we would not follow the meaning of the -Exx codes?

If we're returning this fine-grained info back to userspace (good) then we
should go all the way.  If that's hard to do with the current
map-it-onto-existing-errnos approach then we've hit the limits of that
approach.

> > But it still feels a bit kludgy to me.  Perhaps it would be nicer to define
> > a specific set of return codes for this application.
> 
> The -Exx cocdes are in use thoughout the migration code for error 
> conditions. We could do another pass through all of this and define 
> specific error codes for page migration alone?

They're syscall return codes, not page-migration-per-page-result codes.

I'd have thought that would produce a cleaner result, really.  I don't know
how much impact that would have from a back-compatibility POV though.

> > > Test program for this may be found with the patches
> > > on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm1
> > 
> > The syscall is ia64-only at present.  And that's OK, but if anyone has an
> > interest in page migration on other architectures (damn well hope so) then
> > let's hope they wire the syscall up and get onto it..
> 
> Well I expecteed a longer discussion on how to do this, why are we doing 
> it this way etc etc before the patch got in and before I would have to 
> polish it up for prime time. Hopefully this whole thing does not become 
> too volatile.

The patches looked fairly straightforward to me.  Maybe I missed something ;)

> You are keeping this separate from the other material that 
> is intended for 2.6.18 right?

yup.

> > > +			const int __user *nodes,
> > > +			int __user *status, int flags)
> > > +{
> > 
> > I expect this is going to be a bitch to write compat emulation for.  If we
> > want to support this syscall for 32-bit userspace.
> 
> Page migration on a 32 bit platform? Do we really need that?

sys_migrate_pages is presently wired up in the x86 syscall table.  And it's
available in x86_64's 32-bit mode.

> > If there's any possibility of that then perhaps we should revisit these
> > types, see if we can design this syscall so that it doesn't need a compat
> > wrapper.
> > 
> > The `status' array should be char*, surely?
> 
> Could be. But then its an integer status and not a character so I thought 
> that an int would be cleaner.

As it's just a status result it's hard to see that we'd ever need more
bits.  Might as well get the speed and space savings of using a char?

> > > +	/*
> > > +	 * Check if this process has the right to modify the specified
> > > +	 * process. The right exists if the process has administrative
> > > +	 * capabilities, superuser privileges or the same
> > > +	 * userid as the target process.
> > > +	 */
> > > +	if ((current->euid != task->suid) && (current->euid != task->uid) &&
> > > +	    (current->uid != task->suid) && (current->uid != task->uid) &&
> > > +	    !capable(CAP_SYS_NICE)) {
> > > +		err = -EPERM;
> > > +		goto out2;
> > > +	}
> > 
> > We have code which looks very much like this in maybe five or more places. 
> > Someone should fix it ;)
> 
> hmmm. yes this seems to be duplicated quite a bit.
> 
> > > +	task_nodes = cpuset_mems_allowed(task);
> > > +	pm = kmalloc(GFP_KERNEL, (nr_pages + 1) * sizeof(struct page_to_node));
> > 
> > A horrid bug.  If userspace passes in a sufficiently large nr_pages, the
> > multiplication will overflow and we'll allocate far too little memory and
> > we'll proceed to scrog kernel memory.
> 
> nr_pages is a 32 bit entity. On a 64 bit platform it will be difficult to 
> overflow the result. So we only have an issue if we support move_pages() 
> on 32 bit.

nr_pages is declared as unsigned long.

> > (OK, that's what would happen if you'd got the kmalloc args the correct way
> > around.  As it stands, heaven knows what it'll do ;))
> 
> It survived the test (ROTFL). But why did we add this gfp_t type if it 
> does not cause the compiler to spit out a warning? We only get a warning 
> with sparse checking?
> 
> > > +		err = -EFAULT;
> > > +		if (get_user(addr, pages + i))
> > > +			goto putback;
> > 
> > No, we cannot run get_user() inside down_read(mmap_sem).  Because that ends
> > up taking mmap_sem recursively and an intervening down_write() from another
> > process will deadlock the kernel.
> 
> Ok. Will fix the numerous bugs next week unless there are more concerns on 
> a basic conceptual level.

Who else is interested in these features apart from the high-end ia64
people?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2006-05-19 23:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-18 18:21 [RFC] page migration: patches for later than 2.6.18 Christoph Lameter
2006-05-18 18:21 ` [RFC 1/5] page migration: simplify migrate_pages() Christoph Lameter
2006-05-18 18:21 ` [RFC 2/5] page migration: handle freeing of pages in migrate_pages() Christoph Lameter
2006-05-18 18:21 ` [RFC 3/5] page migration: use allocator function for migrate_pages() Christoph Lameter
2006-05-18 18:21 ` [RFC 4/5] page migration: Support moving of individual pages Christoph Lameter
2006-05-19 19:27   ` Andrew Morton
2006-05-19 23:23     ` Christoph Lameter
2006-05-19 23:45       ` Andrew Morton [this message]
2006-05-20  0:46         ` Christoph Lameter
2006-05-22  8:02         ` Jes Sorensen
2006-05-18 18:21 ` [RFC 5/5] page migration: Detailed status for " Christoph Lameter
2006-05-18 18:21 ` [RFC 6/6] page migration: Support a vma migration function Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060519164539.401a8eec.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=bls@sgi.com \
    --cc=clameter@sgi.com \
    --cc=jes@sgi.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=mtk-manpages@gmx.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox