From: Andrew Morton <akpm@osdl.org>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, bls@sgi.com, jes@sgi.com,
lee.schermerhorn@hp.com, kamezawa.hiroyu@jp.fujitsu.com,
mtk-manpages@gmx.net
Subject: Re: [RFC 4/5] page migration: Support moving of individual pages
Date: Fri, 19 May 2006 16:45:39 -0700 [thread overview]
Message-ID: <20060519164539.401a8eec.akpm@osdl.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0605191603110.26870@schroedinger.engr.sgi.com>
Christoph Lameter <clameter@sgi.com> wrote:
>
> On Fri, 19 May 2006, Andrew Morton wrote:
>
> > > Possible page states:
> > >
> > > 0..MAX_NUMNODES The page is now on the indicated node.
> > >
> > > -ENOENT Page is not present or target node is not present
> >
> > So the caller has no way of distinguishing one case from the other?
> > Perhaps it would be better to permit that.
>
> But then we would not follow the meaning of the -Exx codes?
If we're returning this fine-grained info back to userspace (good) then we
should go all the way. If that's hard to do with the current
map-it-onto-existing-errnos approach then we've hit the limits of that
approach.
> > But it still feels a bit kludgy to me. Perhaps it would be nicer to define
> > a specific set of return codes for this application.
>
> The -Exx cocdes are in use thoughout the migration code for error
> conditions. We could do another pass through all of this and define
> specific error codes for page migration alone?
They're syscall return codes, not page-migration-per-page-result codes.
I'd have thought that would produce a cleaner result, really. I don't know
how much impact that would have from a back-compatibility POV though.
> > > Test program for this may be found with the patches
> > > on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm1
> >
> > The syscall is ia64-only at present. And that's OK, but if anyone has an
> > interest in page migration on other architectures (damn well hope so) then
> > let's hope they wire the syscall up and get onto it..
>
> Well I expecteed a longer discussion on how to do this, why are we doing
> it this way etc etc before the patch got in and before I would have to
> polish it up for prime time. Hopefully this whole thing does not become
> too volatile.
The patches looked fairly straightforward to me. Maybe I missed something ;)
> You are keeping this separate from the other material that
> is intended for 2.6.18 right?
yup.
> > > + const int __user *nodes,
> > > + int __user *status, int flags)
> > > +{
> >
> > I expect this is going to be a bitch to write compat emulation for. If we
> > want to support this syscall for 32-bit userspace.
>
> Page migration on a 32 bit platform? Do we really need that?
sys_migrate_pages is presently wired up in the x86 syscall table. And it's
available in x86_64's 32-bit mode.
> > If there's any possibility of that then perhaps we should revisit these
> > types, see if we can design this syscall so that it doesn't need a compat
> > wrapper.
> >
> > The `status' array should be char*, surely?
>
> Could be. But then its an integer status and not a character so I thought
> that an int would be cleaner.
As it's just a status result it's hard to see that we'd ever need more
bits. Might as well get the speed and space savings of using a char?
> > > + /*
> > > + * Check if this process has the right to modify the specified
> > > + * process. The right exists if the process has administrative
> > > + * capabilities, superuser privileges or the same
> > > + * userid as the target process.
> > > + */
> > > + if ((current->euid != task->suid) && (current->euid != task->uid) &&
> > > + (current->uid != task->suid) && (current->uid != task->uid) &&
> > > + !capable(CAP_SYS_NICE)) {
> > > + err = -EPERM;
> > > + goto out2;
> > > + }
> >
> > We have code which looks very much like this in maybe five or more places.
> > Someone should fix it ;)
>
> hmmm. yes this seems to be duplicated quite a bit.
>
> > > + task_nodes = cpuset_mems_allowed(task);
> > > + pm = kmalloc(GFP_KERNEL, (nr_pages + 1) * sizeof(struct page_to_node));
> >
> > A horrid bug. If userspace passes in a sufficiently large nr_pages, the
> > multiplication will overflow and we'll allocate far too little memory and
> > we'll proceed to scrog kernel memory.
>
> nr_pages is a 32 bit entity. On a 64 bit platform it will be difficult to
> overflow the result. So we only have an issue if we support move_pages()
> on 32 bit.
nr_pages is declared as unsigned long.
> > (OK, that's what would happen if you'd got the kmalloc args the correct way
> > around. As it stands, heaven knows what it'll do ;))
>
> It survived the test (ROTFL). But why did we add this gfp_t type if it
> does not cause the compiler to spit out a warning? We only get a warning
> with sparse checking?
>
> > > + err = -EFAULT;
> > > + if (get_user(addr, pages + i))
> > > + goto putback;
> >
> > No, we cannot run get_user() inside down_read(mmap_sem). Because that ends
> > up taking mmap_sem recursively and an intervening down_write() from another
> > process will deadlock the kernel.
>
> Ok. Will fix the numerous bugs next week unless there are more concerns on
> a basic conceptual level.
Who else is interested in these features apart from the high-end ia64
people?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-05-19 23:45 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-05-18 18:21 [RFC] page migration: patches for later than 2.6.18 Christoph Lameter
2006-05-18 18:21 ` [RFC 1/5] page migration: simplify migrate_pages() Christoph Lameter
2006-05-18 18:21 ` [RFC 2/5] page migration: handle freeing of pages in migrate_pages() Christoph Lameter
2006-05-18 18:21 ` [RFC 3/5] page migration: use allocator function for migrate_pages() Christoph Lameter
2006-05-18 18:21 ` [RFC 4/5] page migration: Support moving of individual pages Christoph Lameter
2006-05-19 19:27 ` Andrew Morton
2006-05-19 23:23 ` Christoph Lameter
2006-05-19 23:45 ` Andrew Morton [this message]
2006-05-20 0:46 ` Christoph Lameter
2006-05-22 8:02 ` Jes Sorensen
2006-05-18 18:21 ` [RFC 5/5] page migration: Detailed status for " Christoph Lameter
2006-05-18 18:21 ` [RFC 6/6] page migration: Support a vma migration function Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060519164539.401a8eec.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=bls@sgi.com \
--cc=clameter@sgi.com \
--cc=jes@sgi.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=lee.schermerhorn@hp.com \
--cc=linux-mm@kvack.org \
--cc=mtk-manpages@gmx.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox