linux-mm.kvack.org archive mirror
From: Ray Bryant <raybry@engr.sgi.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>, Ray Bryant <raybry@sgi.com>,
	Hirokazu Takahashi <taka@valinux.co.jp>,
	Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
	Dave Hansen <haveblue@us.ibm.com>, linux-mm <linux-mm@kvack.org>,
	Nathan Scott <nathans@sgi.com>, Ray Bryant <raybry@austin.rr.com>,
	lhms-devel@lists.sourceforge.net,
	Jes Sorensen <jes@wildopensource.com>,
	Steve Longerbeam <stevel@mwwireless.net>
Subject: Re: [Lhms-devel] Re: [PATCH 2.6.12-rc3 1/8] mm: manual page migration-rc2 -- xfs-extended-attributes-rc2.patch
Date: Fri, 20 May 2005 17:26:47 -0500
Message-ID: <428E6427.7060401@engr.sgi.com>
In-Reply-To: <20050512104543.GA14799@infradead.org>

Christoph Hellwig wrote:
> On Wed, May 11, 2005 at 09:32:07PM +0200, Andi Kleen wrote:
> 
>>A minor change for that is probably ok, as long as the actual logic
>>who uses this is generic. 
>>
>>hch: if you still are against this please reread the original thread
>>with me and Ray and see why we decided that ld.so changes are not
>>a good idea.
> 
> 
> So reading through the thread I think using mempolicies to mark shared
> libraries is better than the mmap flag I proposed.  I still don't think
> xattrs interpreted by the kernel is a good way to store them.  Setting
> up libraries is the job of the dynamic linker, and reading pre-defined
> memory policies from an ELF header fits the approach we do for related
> things.
> 
> 
> 

Christoph and Andi,

OK, here are the alternatives I have figured out; I'd appreciate feedback
on which of these would be acceptable.  (In each case, the migration
attribute being set is either MIGRATE_NONE, to indicate that nothing
in this mapped file should be migrated, or MIGRATE_NS, to indicate that
the non-shared pages should be migrated; the latter is the normal setting
for shared library files.  And, since madvise() is mostly about I/O-related
things, I'm assuming here that I extend mbind() to set the migration
attributes.):

(1)  Use mbind() to set "shallow" vm attributes.  (I use shallow
versus deep here to indicate whether or not other processes that map
the same object can see the attributes -- this basically also maps
to whether we put the attributes in the vma [shallow] or in the
memory_object [deep].)

In the shallow case, mbind() has to be called in
each address space in order to properly set the migration flags the
same way in each address space that maps a shared object.  So, we
basically have to call mbind() from ld.so.

As far as I am concerned this is a fundamental show stopper, since
without broad glibc support we will never get the changes
into ld.so for just Altix and page migration.  It also doesn't handle
the case of shared, mapped r/o data files.  We can leverage Steve
Longerbeam's work here, but he also doesn't have a time frame as to
when his ld.so changes might be accepted by the glibc developers.

It does allow one to mark anonymous memory with migration policy.
However, any use of that I've been able to think of (e.g. marking
some anonymous pages as MIGRATE_NONE and then calling migrate_pages())
could equally well be handled by mbind(.., MPOL_MF_STRICT | MPOL_MF_MOVE).
(MPOL_MF_MOVE is in Steve Longerbeam's patch and says to move the pages
that don't match the memory policy; we plan to hook this up to the migration
code at some point in the future.)
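
As a rough illustration of the mbind(.., MPOL_MF_STRICT | MPOL_MF_MOVE)
call mentioned above, here is a minimal user-space sketch.  It assumes the
flag values found in current <linux/mempolicy.h> (when this was written,
MPOL_MF_MOVE existed only in Steve Longerbeam's patch), goes through the raw
syscall rather than libnuma, and handles a single nodemask word only.  The
toy_* and TOY_* names are invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed values, matching current <linux/mempolicy.h>. */
#define TOY_MPOL_BIND      2
#define TOY_MPOL_MF_STRICT (1u << 0)
#define TOY_MPOL_MF_MOVE   (1u << 1)

/*
 * Bind [addr, addr + len) to one NUMA node and ask the kernel to move
 * pages that do not already conform to that policy.  addr must be
 * page-aligned for the real call to succeed.
 * Returns 0 on success, -1 with errno set on failure.
 */
int toy_bind_and_move(void *addr, size_t len, unsigned int node)
{
    unsigned long mask;
    const unsigned long bits = 8 * sizeof(mask);

    assert(addr != NULL);

    if (node >= bits) {         /* this sketch uses one mask word only */
        errno = EINVAL;
        return -1;
    }
    mask = 1UL << node;

#ifdef SYS_mbind
    /* maxnode is bits + 1 because the kernel reads maxnode - 1 bits. */
    return (int)syscall(SYS_mbind, addr, (unsigned long)len, TOY_MPOL_BIND,
                        &mask, bits + 1,
                        TOY_MPOL_MF_STRICT | TOY_MPOL_MF_MOVE);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

A caller would pass a page-aligned anonymous region (for example one
obtained from mmap()) and a target node number; on a kernel without NUMA
support the call is expected to fail and set errno.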

(2)  Use mbind() to set "deep" vm attributes.  There appear to be
two places where the deep attributes could be set:  in the
address space object vma->vm_file->f_mapping or in the inode
vma->vm_file->f_mapping->host.  Some upper-order bits of
address_space.flags could be used, but there appear to be concurrency
issues there.  Bits in inode.i_flags also appear to be available.

The advantage of setting "deep" vm attributes is that this interface
could be used by ld.so, but in advance of getting the changes
accepted there, we could also set the deep attributes in a migration
library before calling migrate_pages().  (Deep attributes can be
seen from any address space that maps the object.)  Then when ld.so
changes are in, we can reduce the work done by the migration library.
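
To make the shallow versus deep distinction concrete, here is a small
user-space model.  It is only a sketch: the toy_* types stand in for a vma
and for the shared object behind it (inode or address_space), they are not
kernel structures, and the numeric encoding of the attributes is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding; the numeric values are invented. */
enum toy_migrate { TOY_MIGRATE_UNSET = 0, TOY_MIGRATE_NONE, TOY_MIGRATE_NS };

/* Stand-in for the shared object (inode / address_space). */
struct toy_object {
    enum toy_migrate deep_attr;
};

/* Stand-in for one address space's mapping (vma) of that object. */
struct toy_vma {
    struct toy_object *object;
    enum toy_migrate shallow_attr;
};

/* Shallow: stored in the mapping, so only this address space sees it. */
void toy_set_shallow(struct toy_vma *vma, enum toy_migrate attr)
{
    assert(vma != NULL);
    vma->shallow_attr = attr;
}

/* Deep: stored in the shared object, so every mapping sees it. */
void toy_set_deep(struct toy_vma *vma, enum toy_migrate attr)
{
    assert(vma != NULL && vma->object != NULL);
    vma->object->deep_attr = attr;
}

enum toy_migrate toy_shallow_view(const struct toy_vma *vma)
{
    assert(vma != NULL);
    return vma->shallow_attr;
}

enum toy_migrate toy_deep_view(const struct toy_vma *vma)
{
    assert(vma != NULL && vma->object != NULL);
    return vma->object->deep_attr;
}
```

With two toy_vma instances pointing at the same toy_object, a shallow set
through one mapping leaves the other untouched, while a deep set through
either mapping is visible from both.  That is the property that lets a
migration library set the attribute once on behalf of every process.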

(3)  The problem with (2) is that to set a deep attribute, one has
to do 4 system calls: open, mmap, mbind, munmap.  If we add the
migration attributes to fcntl() [as Paul Jackson has suggested],
then we could set them directly in the inode with one system call.
Perhaps not a big deal, but something to think about.  It's also
simpler, easier-to-maintain code.
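
For reference, the four-call sequence from (2) might look roughly like the
sketch below.  The mbind() extension for migration attributes is only a
proposal, so the example passes MPOL_DEFAULT with no flags as a placeholder
for wherever the new attribute would go.  The function name is invented,
and the extra close() is housekeeping that the count above leaves out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Walk the open/mmap/mbind/munmap sequence for one file.
 * Placeholder only: MPOL_DEFAULT (0) with no flags stands in for the
 * proposed migration attribute, which does not exist in mainline.
 * Returns 0 on success, -1 with errno set on failure.
 */
int toy_mark_file_via_mbind(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    void *addr;
    long rc;
    int fd, saved;

    assert(path != NULL);

    fd = open(path, O_RDONLY);                               /* call 1 */
    if (fd < 0)
        return -1;

    /* Mapping one page is enough to reach the file's shared object. */
    addr = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0); /* call 2 */
    saved = errno;
    close(fd);                  /* the mapping keeps its own reference */
    if (addr == MAP_FAILED) {
        errno = saved;
        return -1;
    }

#ifdef SYS_mbind
    rc = syscall(SYS_mbind, addr, (unsigned long)page,
                 0 /* MPOL_DEFAULT */, NULL, 0UL, 0U);       /* call 3 */
#else
    rc = -1;
    errno = ENOSYS;
#endif
    saved = errno;
    munmap(addr, (size_t)page);                              /* call 4 */
    errno = saved;
    return rc == 0 ? 0 : -1;
}
```

An fcntl()-based interface would collapse the mmap/mbind/munmap steps into
a single call on the open file descriptor, which is the simplification
alternative (3) is after.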

(4)  Then there is the original, extended attribute approach.  I'm
including this one last time just to observe that:
      (i)  This correctly handles regular data (non-ELF) files.
     (ii)  If one wants to migrate just a portion of anonymous
           memory, one could still use mbind(...MPOL_MF_STRICT | MPOL_MF_MOVE).
    (iii)  How to set the migration policy is based on how a shared file
           is mapped in multiple address spaces.  It is not so much
           a characteristic of an individual address space's usage of
           the file.  So, it seems natural to associate these with
           the file and not the particular instance in one address space
           (that is, alternative (1)).

If using a system attribute is too much change to fs code,
then let's use a user attribute here.  It's not perfect, but it is
doable, and doesn't require any fs changes.  (We'll just not support
migration policy in file systems that don't have extended attributes.)
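
The user-attribute variant could be driven from user space with the
ordinary xattr calls, roughly as sketched here.  The attribute name
"user.migration" and the choice to store the policy as a string are
assumptions made for the example; nothing in this thread fixes either one.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical attribute name; not defined anywhere in the patch set. */
#define TOY_MIGRATE_XATTR "user.migration"

/*
 * Tag a file with a migration policy stored as a user extended attribute.
 * Only the two policies discussed here are accepted.
 * Returns 0 on success, -1 with errno set on failure (for example
 * ENOTSUP on a file system without user xattr support).
 */
int toy_set_migration_xattr(const char *path, const char *policy)
{
    assert(path != NULL && policy != NULL);

    if (strcmp(policy, "MIGRATE_NONE") != 0 &&
        strcmp(policy, "MIGRATE_NS") != 0) {
        errno = EINVAL;
        return -1;
    }
    return setxattr(path, TOY_MIGRATE_XATTR, policy, strlen(policy), 0);
}

/*
 * Read the policy back into buf (not NUL-terminated).
 * Returns the value length, or -1 with errno set.
 */
ssize_t toy_get_migration_xattr(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL);
    return getxattr(path, TOY_MIGRATE_XATTR, buf, size);
}
```

Because the attribute lives with the file, it is set once and applies to
every address space that maps the file, with no ld.so involvement.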

In short, as near as I can tell, alternative (1) really doesn't do
what we want, and is the hardest to implement and get into a production
kernel.  I still like (4) best, but I can live with (2) or (3).
Both (2) and (3) have interim approaches that can be made to work
until Steve Longerbeam's stuff makes it into ld.so, at which point
I can easily merge my required changes in with his.

-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------
