Re: question on page-migration code


* Re: question on page-migration code
  2005-04-07 22:16 question on page-migration code Ray Bryant
@ 2005-04-07 18:08 ` Marcelo Tosatti
  2005-04-11 14:20   ` Ray Bryant
                     ` (3 more replies)
  2005-04-07 22:44 ` Ray Bryant
  1 sibling, 4 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2005-04-07 18:08 UTC (permalink / raw)
  To: Ray Bryant; +Cc: Hirokazu Takahashi, Dave Hansen, linux-mm

On Thu, Apr 07, 2005 at 05:16:30PM -0500, Ray Bryant wrote:
> Hirokazu (and Marcelo),
> 
> In testing my manual page migration code, I've run up against a situation
> where the migrations are occasionally very slow.  They work ok, but they
> can take minutes to migrate a few megabytes of memory.
> 
> Dropping into kdb shows that the migration code is waiting in msleep() in
> migrate_page_common() due to an -EAGAIN return from page_migratable().
> A little further digging shows that the specific return in page_migratable()
> is the very last one there at the bottom of the routine.
> 
> I'm puzzled as to why the page is still busy in this case.  Earlier code
> in page_migratable() has unmapped the page, and it's not in PageWriteback(),
> because we would have taken a different return statement in that case.
> 
> According to /proc/meminfo, there are no pages in either SwapCache or
> Dirty state, and the system has been sync'd before the migrate_pages()
> call was issued.

Who is using the page? 

A little debugging, similar to what bad_page() does, might help:

        printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
                (int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
                page->mapping, page_mapcount(page), page_count(page));
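
For anyone reproducing the flags output outside the kernel: the '*' in
"%0*lx" takes the field width from the next int argument, so the flags word
is zero-padded to two hex digits per byte.  A minimal userspace sketch,
assuming page_flags_t has the size of unsigned long (format_page_flags() is a
made-up helper for illustration, not a kernel function):

```c
#include <stdio.h>

/* Hypothetical userspace helper mirroring the flags part of the printk above.
 * Assumption: page_flags_t has the size of unsigned long. */
int format_page_flags(char *buf, size_t len, unsigned long flags)
{
    /* '*' consumes an int argument as the minimum field width;
     * '0' pads with zeros, so every byte shows as two hex digits. */
    return snprintf(buf, len, "flags:0x%0*lx",
                    (int)(2 * sizeof(unsigned long)), flags);
}
```

Where unsigned long is 8 bytes, format_page_flags(buf, sizeof(buf), 0x105d)
produces flags:0x000000000000105d.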
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: aart@kvack.org


* question on page-migration code
@ 2005-04-07 22:16 Ray Bryant
  2005-04-07 18:08 ` Marcelo Tosatti
  2005-04-07 22:44 ` Ray Bryant
  0 siblings, 2 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-07 22:16 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: Marcelo Tosatti, Dave Hansen, linux-mm

Hirokazu (and Marcelo),

In testing my manual page migration code, I've run up against a situation
where the migrations are occasionally very slow.  They work ok, but they
can take minutes to migrate a few megabytes of memory.

Dropping into kdb shows that the migration code is waiting in msleep() in
migrate_page_common() due to an -EAGAIN return from page_migratable().
A little further digging shows that the specific return in page_migratable()
is the very last one there at the bottom of the routine.

I'm puzzled as to why the page is still busy in this case.  Earlier code
in page_migratable() has unmapped the page, and it's not in PageWriteback(),
because we would have taken a different return statement in that case.

According to /proc/meminfo, there are no pages in either SwapCache or
Dirty state, and the system has been sync'd before the migrate_pages()
call was issued.
-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

* question on page-migration code
  2005-04-07 22:16 question on page-migration code Ray Bryant
  2005-04-07 18:08 ` Marcelo Tosatti
@ 2005-04-07 22:44 ` Ray Bryant
  1 sibling, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-07 22:44 UTC (permalink / raw)
  To: Ray Bryant; +Cc: Hirokazu Takahashi, Marcelo Tosatti, Dave Hansen, linux-mm

Hirokazu (and Marcelo),

A little more information on this.  The first time I migrate the test process,
the migration is quite rapid.  The next time I migrate the same process (to
a new set of nodes, or back to the old set of nodes where it came from),
it is quite slow, and the migration remains this way, albeit with widely
varying times (e.g. 40 s to 220 s), from that point forward.

I'm wondering if the page state is not being set quite correctly in the
migrated pages, thus causing needless waiting in migrate_page_common()
when the pages are migrated for a second time.

-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-07 18:08 ` Marcelo Tosatti
@ 2005-04-11 14:20   ` Ray Bryant
  2005-04-11 18:31   ` Ray Bryant
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-11 14:20 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Hirokazu Takahashi, Dave Hansen, linux-mm

Marcelo Tosatti wrote:
> Who is using the page? 
> 
> A little debugging might help similar to what bad_page does can help: 
> 
>         printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
>                 (int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
>                 page->mapping, page_mapcount(page), page_count(page));
> --

The suspect pages all have a flags field of 0x105d, a mapcount of 0, and a
page count of 3.  If I'm decoding the bits correctly, we've got the following
bits set:

Locked
Referenced
Uptodate
Dirty
Active
PG_arch_1

Doesn't tell me much.  Anything spring to mind when you look at these
bits, Marcelo?

-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-07 18:08 ` Marcelo Tosatti
  2005-04-11 14:20   ` Ray Bryant
@ 2005-04-11 18:31   ` Ray Bryant
  2005-04-11 23:41     ` Hirokazu Takahashi
  2005-04-11 19:00   ` Ray Bryant
  2005-04-11 19:59   ` Ray Bryant
  3 siblings, 1 reply; 22+ messages in thread
From: Ray Bryant @ 2005-04-11 18:31 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Hirokazu Takahashi, Dave Hansen, linux-mm

Marcelo Tosatti wrote:
> Who is using the page? 
> 
> A little debugging might help similar to what bad_page does can help: 
> 
>         printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
>                 (int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
>                 page->mapping, page_mapcount(page), page_count(page));
> --
> 
A little further digging shows that when we go into the -EAGAIN case in
migrate_page_common(), we have flag bits 0x104D set, and when we finally
exit the routine, we have flag bits 0x004D set.  The 1 bit there is
PG_private, as near as I can tell (not PG_arch_1; I guess I can't count).

PagePrivate() is cleared by truncation-specific code in migrate_onepage(),
but it doesn't appear to be cleared (directly) by code on the
generic_migrate_page() path.  I wonder if this has something to do with
the problem I am seeing.
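
The flag values in this thread (0x105d, 0x104d, 0x004d) can be decoded with a
small userspace helper rather than by counting bits by hand.  This is only a
sketch: the bit positions are assumed from the 2.6.11-era
include/linux/page-flags.h and should be checked against the tree actually
under test, and decode_page_flags() is a hypothetical helper, not a kernel
function.

```c
#include <stdio.h>

/* Assumed bit positions (2.6.11-era include/linux/page-flags.h).
 * Verify against the kernel tree being debugged. */
struct flag_name {
    int bit;
    const char *name;
};

static const struct flag_name page_flag_names[] = {
    { 0, "PG_locked" },    { 1, "PG_error" },     { 2, "PG_referenced" },
    { 3, "PG_uptodate" },  { 4, "PG_dirty" },     { 5, "PG_lru" },
    { 6, "PG_active" },    { 7, "PG_slab" },      { 8, "PG_highmem" },
    { 9, "PG_checked" },   { 10, "PG_arch_1" },   { 11, "PG_reserved" },
    { 12, "PG_private" },  { 13, "PG_writeback" },
};

/* Write the space-separated names of all set bits into buf and return buf.
 * Output is silently truncated if buf is too small. */
char *decode_page_flags(unsigned long flags, char *buf, size_t len)
{
    size_t i, used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(page_flag_names) / sizeof(page_flag_names[0]); i++) {
        int n;

        if (!(flags & (1UL << page_flag_names[i].bit)))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", page_flag_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* out of room: stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

With that table, 0x104d decodes to "PG_locked PG_referenced PG_uptodate
PG_active PG_private", and 0x004d is the same set without PG_private.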

-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-07 18:08 ` Marcelo Tosatti
  2005-04-11 14:20   ` Ray Bryant
  2005-04-11 18:31   ` Ray Bryant
@ 2005-04-11 19:00   ` Ray Bryant
  2005-04-11 19:59   ` Ray Bryant
  3 siblings, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-11 19:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Hirokazu Takahashi, Dave Hansen, linux-mm

Marcelo Tosatti wrote:

> Who is using the page? 
> 
> A little debugging might help similar to what bad_page does can help: 
> 
>         printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
>                 (int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
>                 page->mapping, page_mapcount(page), page_count(page));
> --

Marcelo,

I wrote:

"PagePrivate() is cleared by truncation specific code in migrate_onepage(),
but it doesn't appear to be cleared (directly) by code on the
generic_migrate_page() patch.  I wonder if this has something to do with
the problem I am seeing. "

Ooops.  I didn't look deep enough.  migrate_page_common() calls
writeback_and_free_buffers(), which in turn calls try_to_release_page()
which will eventually call down to __clear_page_buffers() which will
clear PagePrivate().

So it looks like the following is perhaps what is happening:

(1)  We come into migrate_onepage() with the pages dirty.  (The first
      time we enter the -EAGAIN section of migrate_page_common() we have
      flags = 105d; the last time through before succeeding, flags are
      104d; and when we do return, flags = 004d.)
(2)  We have to wait around until the pages get paged out before we can
      migrate them (flags = 004d).

I'll have to check and see if I believe it might take 3 minutes to page
out all of the pages of my application.  If so, then this explains what
is happening.

Does that make sense?

-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-07 18:08 ` Marcelo Tosatti
                     ` (2 preceding siblings ...)
  2005-04-11 19:00   ` Ray Bryant
@ 2005-04-11 19:59   ` Ray Bryant
  3 siblings, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-11 19:59 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Hirokazu Takahashi, Dave Hansen, linux-mm

Marcelo,

Checking pgpgout in /proc/vmstat appears to indicate that the pages I am
migrating are being swapped out when I see the migration slow down,
although something is fishy with pgpgout.  pgpgout is supposed to be
KB of page I/O, and I know that I am migrating 8685 pages at 16 KB/page,
or 138960 KB, yet pgpgout gets incremented by roughly twice this.
So it looks like either:

(1)  pgpgout is really sectors written, or
(2)  pages are being paged out twice as part of memory migration.

I still don't understand why this pageout process doesn't happen
every time I do a migration (e.g. never on the first time),
and why it is taking 210 s to page out 138960 KB.  That's roughly 660 KB/s
of I/O to the paging disk.
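
The arithmetic behind those figures, as a small sketch (the helper names are
made up for illustration; the inputs are the page count, page size, and
elapsed time quoted above):

```c
/* Total KB covered by `pages` pages of `page_kb` KB each. */
unsigned long migrated_kb(unsigned long pages, unsigned long page_kb)
{
    return pages * page_kb;
}

/* Average rate in KB/s for moving `kb` KB in `seconds` seconds. */
double kb_per_second(unsigned long kb, double seconds)
{
    return (double)kb / seconds;
}
```

migrated_kb(8685, 16) is 138960, and kb_per_second(138960, 210.0) comes to
a little under 662 KB/s.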
-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-11 18:31   ` Ray Bryant
@ 2005-04-11 23:41     ` Hirokazu Takahashi
  2005-04-12  4:57       ` Ray Bryant
                         ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Hirokazu Takahashi @ 2005-04-11 23:41 UTC (permalink / raw)
  To: raybry; +Cc: marcelo.tosatti, haveblue, linux-mm

Hi Ray,

> > Who is using the page? 
> > 
> > A little debugging might help similar to what bad_page does can help: 
> > 
> >         printk(KERN_EMERG "flags:0x%0*lx mapping:%p mapcount:%d count:%d\n",
> >                 (int)(2*sizeof(page_flags_t)), (unsigned long)page->flags,
> >                 page->mapping, page_mapcount(page), page_count(page));
> > --
> > 
> A little further digging shows that when we go into -EAGAIN case in
> migrate_page_common(), we have flag bits 0x104D set, and when we finally
> exit the routine, we have flags bits 0x004D set.  The 1 bit there is
> PG_private, as near as I can tell (not PG_arch_1, I guess I can't count).
> 
> PagePrivate() is cleared by truncation specific code in migrate_onepage(),
> but it doesn't appear to be cleared (directly) by code on the
> generic_migrate_page() patch.  I wonder if this has something to do with
> the problem I am seeing.

I understand what happened on your machine.

PG_private is a filesystem-specific flag; it indicates that page->private
holds filesystem-dependent data.  When the flag is set on a page, only
the filesystem that owns the page can handle it.

Most filesystems use page->private to manage buffers, while others may
use it for different purposes.  Each filesystem can implement a
migrate_page method to handle page->private.  At the moment, only ext2
and ext3 have this method, which migrates buffers without any I/O.

If the method isn't implemented for the page, the migration code
calls pageout() and try_to_release_page() to release page->private
instead. 
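
The dispatch described above can be sketched in userspace as follows.  This
is a toy model under stated assumptions, not the patch's code: struct
toy_page, struct toy_aops and the toy_* functions are all hypothetical names,
and the fallback merely counts one write where the real path would go through
pageout() and try_to_release_page().

```c
#include <stddef.h>

/* Toy stand-ins; none of these are real kernel types. */
struct toy_page {
    int has_private;   /* models PG_private / page->private being in use */
    int io_count;      /* number of write-outs this page has caused */
};

struct toy_aops {
    /* Optional per-filesystem hook, as the migrate_page method is described. */
    int (*migrate_page)(struct toy_page *newpage, struct toy_page *page);
};

/* Models a filesystem-provided method: hand private data over, no I/O. */
int toy_buffer_migrate(struct toy_page *newpage, struct toy_page *page)
{
    newpage->has_private = page->has_private;
    page->has_private = 0;
    return 0;
}

/* Models the generic fallback: write the page out and drop private data. */
int toy_fallback_migrate(struct toy_page *newpage, struct toy_page *page)
{
    if (page->has_private) {
        page->io_count++;          /* stands in for pageout() */
        page->has_private = 0;     /* stands in for try_to_release_page() */
    }
    newpage->has_private = 0;
    return 0;
}

/* Use the filesystem's method when present, otherwise fall back. */
int toy_migrate_page(const struct toy_aops *aops,
                     struct toy_page *newpage, struct toy_page *page)
{
    if (aops && aops->migrate_page)
        return aops->migrate_page(newpage, page);
    return toy_fallback_migrate(newpage, page);
}
```

In this model a filesystem with the hook migrates with io_count left at 0,
while one without it pays one write per page that carries private data.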

Which filesystem are you using? I guess it might be XFS which
doesn't have the method yet.

Thank you,
Hirokazu Takahashi.

* Re: question on page-migration code
  2005-04-11 23:41     ` Hirokazu Takahashi
@ 2005-04-12  4:57       ` Ray Bryant
  2005-04-12  5:43       ` Ray Bryant
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-12  4:57 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: marcelo.tosatti, haveblue, linux-mm

Hirokazu Takahashi wrote:
> Hi Ray,
> 
> 
> 
  <snip>
> 
> I understand what happened on your machine.
> 
> PG_private is a filesystem specific flag, setting some filesystem
> depending data in page->private. When the flag is set on a page,
> only the local filesystem on which the page depends can handle it. 
> 
> Most of the filesystems uses page->private to manage buffers while
> others may use it for different purposes. Each filesystem can
> implement migrate_page method to handles page->private.
> At this moment, only ext2 and ext3 have this method, which migrates
> buffers without any I/Os.
> 
> If the method isn't implemented for the page, the migration code
> calls pageout() and try_to_release_page() to release page->private
> instead. 
> 
> Which filesystem are you using? I guess it might be XFS which
> doesn't have the method yet.
> 
> Thank you,
> Hirokazu Takahashi.
> 
Yes, I am using XFS.  However, what I still don't understand is why
the migration is fast the first time I use it but slow the next time.
It is the case that swap I/O is apparently happening for the pages
when I see the slowdown, so I agree that you've probably diagnosed
that part of the problem.  (Well, I would wonder why pageout()
followed by try_to_release_page() is soooo slow.  But hey, perhaps we
are doing I/O in one-page units or some such, and that could explain
why the I/O takes so long.)

But why does the first migration happen so quickly?  I'm wondering
if the migration process doesn't leave the page in a state that
requires cleaning, whereas the pages as originally found didn't
need to be cleaned.  It would seem to me we would want the page
state after migration to be effectively the same as the page
state before migration.

-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-11 23:41     ` Hirokazu Takahashi
  2005-04-12  4:57       ` Ray Bryant
@ 2005-04-12  5:43       ` Ray Bryant
  2005-04-13  2:30         ` IWAMOTO Toshihiro
                           ` (2 more replies)
  2005-04-12 16:46       ` Dave Hansen
  2005-04-12 19:29       ` Ray Bryant
  3 siblings, 3 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-12  5:43 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: marcelo.tosatti, haveblue, linux-mm

Hi Hirokazu,

What appears to be happening is the following:

dirty pte bits are being swept into the page dirty bit as a side effect
of migration.  That is, if a page had pte_dirty(pte) set, then after
migration, it will have PageDirty(page) = true.

Only pages with PageDirty() set will be written to swap as part of the
process of trying to clear PG_private.  So, when I do the first migration,
the PG_dirty bit is not set on the page, but the dirty bit is set in the
pte.  Because PG_dirty is not set, the page does not get written to swap,
and the migration is fast.  However, at the end of the migration process,
the pages all have PG_dirty set and the pte dirty bits are cleared.

The second time I do the migration, the PG_dirty bits are still set
(left over from the first migration), so they have to be written to swap
and the migration is slow.  As part of the pageout(), try_to_release_page()
process, the PG_dirty is cleared, along with the pte dirty bits, as before.

When the program is resumed, it will cause the pte dirty bits to be set,
and then we will be back in the situation we started with before the first
migration.

Hence the third migration will be fast, and the 4th migration will be slow,
etc.  This is a stable, repeatable process.
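
The alternation can be reproduced with a two-bit toy model of the sequence
described above.  It encodes only the hypothesis in this message (pte dirty
swept into PG_dirty on a fast migration, PG_dirty cleared by writeback on a
slow one); it is not kernel code, and struct dirty_model, model_migrate() and
model_resume() are made-up names.

```c
#include <stdbool.h>

/* Toy state: one pte dirty bit and one PG_dirty bit for a page. */
struct dirty_model {
    bool pte_dirty;
    bool pg_dirty;
};

/* One migration pass.  Returns true if the slow (writeback) path was taken. */
bool model_migrate(struct dirty_model *p)
{
    bool slow = p->pg_dirty;    /* only pages already PG_dirty get written out */

    if (slow)
        p->pg_dirty = false;    /* writeback leaves the page clean */
    else if (p->pte_dirty)
        p->pg_dirty = true;     /* pte dirty bit swept into PG_dirty */
    p->pte_dirty = false;       /* pte dirty bits end up cleared either way */
    return slow;
}

/* The program runs again and touches the page, dirtying the pte. */
void model_resume(struct dirty_model *p)
{
    p->pte_dirty = true;
}
```

Starting from pte_dirty set and pg_dirty clear, and calling model_resume()
between passes, successive model_migrate() calls report fast, slow, fast,
slow.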

I guess it seems to me that if a page has pte dirty set, but doesn't have
PG_dirty set, then that state should be carried over to the newpage after
a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.

Another way to do this would be to implement in XFS the ext2/3 trick of
migrating dirty buffers without swap I/O, but that is somewhat far afield
for me to try.  :-)  I'll discuss this with Nathan Scott et al and see
if that is something that would be straightforward to do.

But I have a nagging suspicion that this covers up, rather than fixes,
the state transition from oldpage to newpage that really shouldn't be
happening, as near as I can tell.

BTW, the program that I am testing creates a relatively large mapped file,
and, as you guessed, this file is backed by XFS.  Programs that just use
large amounts of anonymous storage are not affected by this problem, I
would imagine.
-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-11 23:41     ` Hirokazu Takahashi
  2005-04-12  4:57       ` Ray Bryant
  2005-04-12  5:43       ` Ray Bryant
@ 2005-04-12 16:46       ` Dave Hansen
  2005-04-13 10:48         ` Hirokazu Takahashi
  2005-04-12 19:29       ` Ray Bryant
  3 siblings, 1 reply; 22+ messages in thread
From: Dave Hansen @ 2005-04-12 16:46 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: raybry, Marcelo Tosatti, linux-mm

On Tue, 2005-04-12 at 08:41 +0900, Hirokazu Takahashi wrote:
> If the method isn't implemented for the page, the migration code
> calls pageout() and try_to_release_page() to release page->private
> instead. 
> 
> Which filesystem are you using? I guess it might be XFS which
> doesn't have the method yet.

Can we more easily detect and work around this in the code, so that this
won't happen for more filesystems?

-- Dave


* Re: question on page-migration code
  2005-04-11 23:41     ` Hirokazu Takahashi
                         ` (2 preceding siblings ...)
  2005-04-12 16:46       ` Dave Hansen
@ 2005-04-12 19:29       ` Ray Bryant
  3 siblings, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-12 19:29 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: marcelo.tosatti, haveblue, linux-mm

Hirokazu Takahashi wrote:

> 
> I understand what happened on your machine.
> 
> PG_private is a filesystem specific flag, setting some filesystem
> depending data in page->private. When the flag is set on a page,
> only the local filesystem on which the page depends can handle it. 
> 
> Most of the filesystems uses page->private to manage buffers while
> others may use it for different purposes. Each filesystem can
> implement migrate_page method to handles page->private.
> At this moment, only ext2 and ext3 have this method, which migrates
> buffers without any I/Os.
> 
> If the method isn't implemented for the page, the migration code
> calls pageout() and try_to_release_page() to release page->private
> instead. 
> 
> Which filesystem are you using? I guess it might be XFS which
> doesn't have the method yet.
> 
> Thank you,
> Hirokazu Takahashi.
> 

Hi Hirokazu,

Just to make sure, I re-ran my test case with the test program's
home directory (and hence where its mapped files reside) on an
ext3 file system instead of on XFS.  In this case, the
migrations are all fast; however, there is still a significant
number of page I/Os occurring (135 MB worth; I am migrating
138 MB).  So it doesn't appear that an I/O-less migration is
going on here either.

-- 
Best Regards,
Ray

* Re: question on page-migration code
  2005-04-12  5:43       ` Ray Bryant
@ 2005-04-13  2:30         ` IWAMOTO Toshihiro
  2005-04-13  4:43         ` Hirokazu Takahashi
  2005-04-15  6:41         ` IWAMOTO Toshihiro
  2 siblings, 0 replies; 22+ messages in thread
From: IWAMOTO Toshihiro @ 2005-04-13  2:30 UTC (permalink / raw)
  To: Ray Bryant; +Cc: Hirokazu Takahashi, marcelo.tosatti, haveblue, linux-mm

At Tue, 12 Apr 2005 00:43:42 -0500,
Ray Bryant wrote:
> What appears to be happening is the following:
> 
> dirty pte bits are being swept into the page dirty bit as a side effect
> of migration.  That is, if a page had pte_dirty(pte) set, then after
> migration, it will have PageDirty(page) = true.
> 
> Only pages with PageDirty() set will be written to swap as part of the
> process of trying to clear PG_private.  So, when I do the first migration,
> the PG_dirty bit is not set on the page, but the dirty bit is set in the
> pte.  Because PG_dirty is not set, the page does not get written to swap,
> and the migration is fast.  However, at the end of the migration process,
> the pages all have PG_dirty set and the pte dirty bits are cleared.

When I wrote the migration code long ago, I didn't think about such
side effects and just followed the kswapd operation.

> I guess it seems to me that if a page has pte dirty set, but doesn't have
> PG_dirty set, then that state should be carried over to the newpage after
> a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.

I think this is the correct solution.

--
IWAMOTO Toshihiro

* Re: question on page-migration code
  2005-04-12  5:43       ` Ray Bryant
  2005-04-13  2:30         ` IWAMOTO Toshihiro
@ 2005-04-13  4:43         ` Hirokazu Takahashi
  2005-04-15  6:41         ` IWAMOTO Toshihiro
  2 siblings, 0 replies; 22+ messages in thread
From: Hirokazu Takahashi @ 2005-04-13  4:43 UTC (permalink / raw)
  To: raybry; +Cc: marcelo.tosatti, haveblue, linux-mm

Hi Ray,

> Hi Hirokazu,
> 
> What appears to be happening is the following:
> 
> dirty pte bits are being swept into the page dirty bit as a side effect
> of migration.  That is, if a page had pte_dirty(pte) set, then after
> migration, it will have PageDirty(page) = true.
> 
> Only pages with PageDirty() set will be written to swap as part of the
> process of trying to clear PG_private.  So, when I do the first migration,
> the PG_dirty bit is not set on the page, but the dirty bit is set in the
> pte.  Because PG_dirty is not set, the page does not get written to swap,
> and the migration is fast.  However, at the end of the migration process,
> the pages all have PG_dirty set and the pte dirty bits are cleared.
>
> The second time I do the migration, the PG_dirty bits are still set
> (left over from the first migration), so they have to be written to swap
> and the migration is slow.  As part of the pageout(), try_to_release_page()
> process, the PG_dirty is cleared, along with the pte dirty bits, as before.
> 
> When the program is resumed, it will cause the pte dirty bits to be set,
> and then we will be back in the situation we started with before the first
> migration.

In both cases, the PG_dirty flag is always set before
writeback_and_free_buffers() is called, as try_to_unmap() moves
the pte dirty bits to PG_dirty on the page prior to starting
the migration.

My guess is that the difference is the PG_private flag.
In the first migration, the pages may not have the PG_private flag,
while they may have it the second time.
If the PG_dirty flag is set, the Linux VM tends to attach private
data to the pages in preparation for the write-back I/O.

The scenario might be like this:
The first time, the pages can be migrated without any I/O,
as PG_private isn't set even though PG_dirty is set.
The Linux VM may then set PG_private on the pages since they have
PG_dirty set.
The second time, write-back is required as both
PG_private and PG_dirty are set; it clears both flags.
The third time, the pages don't have PG_private and can
be migrated easily.

But, this is not what we expected;(

> Hence the third migration will be fast, and the 4th migration will be slow,
> etc.  This is a stable, repeatable process.
> 
> I guess it seems to me that if a page has pte dirty set, but doesn't have
> PG_dirty set, then that state should be carried over to the newpage after
> a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.
> 
> Another way to do this would be to implement the migrate dirty buffers
> without swap I/O trick of ext2/3 in XFS, but that is somewhat far afield
> for me to try.  :-)  I'll discuss this with Nathan Scott et al and see
> if that is something that would be straightforward to do.
> 
> But I have a nagging suspicion that this covers up, rather than fixes
> the state transition from oldpage to newpage that really shouldn't be
> happening, as near as I can tell.
> 
> BTW, the program that I am testing creates a relatively large mapped file,
> and, as you guessed, this file is backed by XFS.  Programs that just use
> large amounts of anonymous storage are not affected by this problem, I
> would imagine.
> -- 
> Best Regards,
> Ray
> -----------------------------------------------
>                    Ray Bryant
> 512-453-9679 (work)         512-507-7807 (cell)
> raybry@sgi.com             raybry@austin.rr.com
> The box said: "Requires Windows 98 or better",
>             so I installed Linux.
> -----------------------------------------------

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-12 16:46       ` Dave Hansen
@ 2005-04-13 10:48         ` Hirokazu Takahashi
  2005-04-14 15:57           ` Marcelo Tosatti
  2005-04-19  2:46           ` Ray Bryant
  0 siblings, 2 replies; 22+ messages in thread
From: Hirokazu Takahashi @ 2005-04-13 10:48 UTC (permalink / raw)
  To: haveblue; +Cc: raybry, marcelo.tosatti, linux-mm

Hi,

> > If the method isn't implemented for the page, the migration code
> > calls pageout() and try_to_release_page() to release page->private
> > instead. 
> > 
> > Which filesystem are you using? I guess it might be XFS which
> > doesn't have the method yet.
> 
> Can we more easily detect and work around this in the code, so that this
> won't happen for more filesystems?

As Ray said, the following seems to be a straightforward approach.
I don't have any other ideas for working around it.

ray> I guess it seems to me that if a page has pte dirty set, but doesn't have
ray> PG_dirty set, then that state should be carried over to the newpage after
ray> a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.

The implementation might be as follows:
   - make try_to_unmap_one() record the dirty bit somewhere
     instead of calling set_page_dirty().
   - make touch_unmapped_address() call get_user_pages() with
     the recorded dirty bit.

However, we have to remember that there must exist some race conditions.
For example, it may fail to restore the dirty bit since the process
address space might be deleted during the memory migration.
This may occur as the process isn't suspended during the migration.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-13 10:48         ` Hirokazu Takahashi
@ 2005-04-14 15:57           ` Marcelo Tosatti
  2005-04-19  2:46           ` Ray Bryant
  1 sibling, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2005-04-14 15:57 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: haveblue, raybry, linux-mm

On Wed, Apr 13, 2005 at 07:48:00PM +0900, Hirokazu Takahashi wrote:
> Hi,
> 
> > > If the method isn't implemented for the page, the migration code
> > > calls pageout() and try_to_release_page() to release page->private
> > > instead. 
> > > 
> > > Which filesystem are you using? I guess it might be XFS which
> > > doesn't have the method yet.
> > 
> > Can we more easily detect and work around this in the code, so that this
> > won't happen for more filesystems?
> 
> As Ray said, the following seems to be a straight approach.
> I haven't had any other ideas to work around it. 

From my understanding there are two problems:

1) PG_private set on file pages whose filesystems do not implement 
->migrate_page() method.

Not much can be done about it, except implementing migrate_page() for all 
filesystems using page->private for uses other than buffer_head's.

BTW: only ext2/3 are implementing migrate_page(), all buffer_head 
based filesystems should do the same on a final version. 
Have you guys tried fs'es other than ext2/3? 

Dave, I don't understand what you mean by "workaround". The page is 
not migratable, thus the memory area which contains it can't 
be migrated either.

2) PG_dirty bit set on anonymous pages which have been migrated.

> ray> I guess it seems to me that if a page has pte dirty set, but doesn't have
> ray> PG_dirty set, then that state should be carried over to the newpage after
> ray> a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.

The dirty bit is set by swap allocation and freeing code. 

> The implementation might be as follows:
>    - to make try_to_unmap_one() record dirty bit in anywhere
>      instead of calling set_page_dirty().
>    - to make touch_unmapped_address() call get_user_pages() with
>      the record of the dirty bit.

Quoting Ray:
"Checking /proc/vmstat/pgpgout appears to indicate that the pages I am
migrating are being swapped out when I see the migration slow down,
although something is fishy with pgpgout."

Anonymous pages seem to be the problem Ray is seeing, except (1) which 
vanishes with ext2/ext3 as he reports.

Anon pages _should_ be removed from the swapcache at the end of 
generic_migrate_page (__remove_exclusive_swap_page()).

So, it does not matter if they have PG_dirty bit set, as long as
they are not swap-allocated (PageSwapCache).

Ray, please confirm that anon pages are removed from the swapcache after
being migrated (watching /proc/meminfo should do it).

One point is that if free memory is below the safe watermarks, the
system will vmscan, allocating swap & writing out, which is expected.

How much memory is free during said tests? 

> However, we have to remember that there must exist some race conditions.
> For example, it may fail to restore the dirty bit since the process
> address spaces might be deleted during the memory migration.
> This may occur as the process isn't suspended during the migration.

The PG_dirty bit is set, by the migration code, for anonymous pages only.

That said, I see no need to reset PG_dirty in case it was not set before
migration, as you propose.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-12  5:43       ` Ray Bryant
  2005-04-13  2:30         ` IWAMOTO Toshihiro
  2005-04-13  4:43         ` Hirokazu Takahashi
@ 2005-04-15  6:41         ` IWAMOTO Toshihiro
  2005-04-15 12:53           ` Marcelo Tosatti
  2 siblings, 1 reply; 22+ messages in thread
From: IWAMOTO Toshihiro @ 2005-04-15  6:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Hirokazu Takahashi, haveblue, raybry, linux-mm

At Thu, 14 Apr 2005 12:57:34 -0300,
Marcelo Tosatti wrote:
> 
> On Wed, Apr 13, 2005 at 07:48:00PM +0900, Hirokazu Takahashi wrote:

> 2) PG_dirty bit set on anonymous pages which have been migrated.
> 
> > ray> I guess it seems to me that if a page has pte dirty set, but doesn't have
> > ray> PG_dirty set, then that state should be carried over to the newpage after
> > ray> a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.
> 
> The dirty bit is set by swap allocation and freeing code. 
> 
> > The implementation might be as follows:
> >    - to make try_to_unmap_one() record dirty bit in anywhere
> >      instead of calling set_page_dirty().
> >    - to make touch_unmapped_address() call get_user_pages() with
> >      the record of the dirty bit.
> 
> Quoting Ray:
> "Checking /proc/vmstat/pgpgout appears to indicate that the pages I am
> migrating are being swapped out when I see the migration slow down,
> although something is fishy with pgpgout."
> 
> Anonymous pages seem to be the problem Ray is seeing, except (1) which 
> vanishes with ext2/ext3 as he reports.

I think Ray is using the word "swap" to mean "page out" and anonymous
pages are irrelevant here, judging from another mail of his (quoted below).

At Tue, 12 Apr 2005 00:43:42 -0500,
Ray Bryant wrote:
: BTW, the program that I am testing creates a relatively large mapped file,
: and, as you guessed, this file is backed by XFS.  Programs that just use
: large amounts of anonymous storage are not affected by this problem, I
: would imagine.

> One point is that if free memory is below the safe watermarks, the
> system will vmscan, allocating swap & writing out, which is expected.

If there is enough RAM, mmaped dirty pages shouldn't be written back.
However, memory migration triggers writebacks.

> > However, we have to remember that there must exist some race conditions.
> > For example, it may fail to restore the dirty bit since the process
> > address spaces might be deleted during the memory migration.
> > This may occur as the process isn't suspended during the migration.
> 
> The PG_dirty bit is set, by the migration code, for anonymous pages only.

If a file page is mmaped and its PTE is dirty, the page gets PG_dirty
bit when it is unmapped.

> That said, I see no need to reset PG_dirty in case it was not set before
> migration, as you propose.

I think PG_dirty should be reset, as the side effect is probably
unacceptable for Ray's application.  It would be a bit more
complicated than just changing page and PTE bits, but I think it's
doable.


--
IWAMOTO Toshihiro

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-15  6:41         ` IWAMOTO Toshihiro
@ 2005-04-15 12:53           ` Marcelo Tosatti
  2005-04-18 10:37             ` IWAMOTO Toshihiro
  0 siblings, 1 reply; 22+ messages in thread
From: Marcelo Tosatti @ 2005-04-15 12:53 UTC (permalink / raw)
  To: IWAMOTO Toshihiro; +Cc: Hirokazu Takahashi, haveblue, raybry, linux-mm

Hi Toshihiro,

On Fri, Apr 15, 2005 at 03:41:38PM +0900, IWAMOTO Toshihiro wrote:
> At Thu, 14 Apr 2005 12:57:34 -0300,
> Marcelo Tosatti wrote:
> > 
> > On Wed, Apr 13, 2005 at 07:48:00PM +0900, Hirokazu Takahashi wrote:
> 
> > 2) PG_dirty bit set on anonymous pages which have been migrated.
> > 
> > > ray> I guess it seems to me that if a page has pte dirty set, but doesn't have
> > > ray> PG_dirty set, then that state should be carried over to the newpage after
> > > ray> a migration, rather than sweeping the pte dirty bit into the PG_dirty bit.
> > 
> > The dirty bit is set by swap allocation and freeing code. 
> > 
> > > The implementation might be as follows:
> > >    - to make try_to_unmap_one() record dirty bit in anywhere
> > >      instead of calling set_page_dirty().
> > >    - to make touch_unmapped_address() call get_user_pages() with
> > >      the record of the dirty bit.
> > 
> > Quoting Ray:
> > "Checking /proc/vmstat/pgpgout appears to indicate that the pages I am
> > migrating are being swapped out when I see the migration slow down,
> > although something is fishy with pgpgout."
> > 
> > Anonymous pages seem to be the problem Ray is seeing, except (1) which 
> > vanishes with ext2/ext3 as he reports.
> 
> I think Ray is using the word "swap" to mean "page out" and anonymous
> pages are irrelevant here, judging from another mail of his (quoted below).

Ah, OK.

> At Tue, 12 Apr 2005 00:43:42 -0500,
> Ray Bryant wrote:
> : BTW, the program that I am testing creates a relatively large mapped file,
> : and, as you guessed, this file is backed by XFS.  Programs that just use
> : large amounts of anonymous storage are not affected by this problem, I
> : would imagine.
> 
> > One point is that if free memory is below the safe watermarks, the
> > system will vmscan, allocating swap & writing out, which is expected.
> 
> If there is enough RAM, mmaped dirty pages shouldn't be written back.
> However, memory migration triggers writebacks.
> 
> > > However, we have to remember that there must exist some race conditions.
> > > For example, it may fail to restore the dirty bit since the process
> > > address spaces might be deleted during the memory migration.
> > > This may occur as the process isn't suspended during the migration.
> > 
> > The PG_dirty bit is set, by the migration code, for anonymous pages only.
> 
> If a file page is mmaped and its PTE is dirty, the page gets PG_dirty
> bit when it is unmapped. 

Right. 

> > That said, I see no need to reset PG_dirty in case it was not set before
> > migration, as you propose.
> 
> I think PG_dirty should be reset, as the side effect is probably
> unacceptable for Ray's application.  It would be a bit more
> complicated than just changing page and PTE bits, but I think it's
> doable.

Yes, makes sense.

Question: Who is causing the writeouts here? 

Is there memory pressure or is it pdflush? 

It's not the migration code? (That would be a problem, I think.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-15 12:53           ` Marcelo Tosatti
@ 2005-04-18 10:37             ` IWAMOTO Toshihiro
  0 siblings, 0 replies; 22+ messages in thread
From: IWAMOTO Toshihiro @ 2005-04-18 10:37 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: IWAMOTO Toshihiro, Hirokazu Takahashi, haveblue, raybry, linux-mm

Hi,

At Fri, 15 Apr 2005 09:53:55 -0300,
Marcelo Tosatti wrote:
> On Fri, Apr 15, 2005 at 03:41:38PM +0900, IWAMOTO Toshihiro wrote:
> > At Thu, 14 Apr 2005 12:57:34 -0300,
> > Marcelo Tosatti wrote:

> > > That said, I see no need to reset PG_dirty in case it was not set before
> > > migration, as you propose.
> > 
> > I think PG_dirty should be reset, as the side effect is probably
> > unacceptable for Ray's application.  It would be a bit more
> > complicated than just changing page and PTE bits, but I think it's
> > doable.
> 
> Yes, makes sense.
> 
> Question: Who is causing the writeouts here? 
> 
> Is there memory pressure or is it pdflush? 
> 
> It's not the migration code? (That would be a problem, I think.)

If I understand correctly, writebacks happen in the following way.

1. The migration code unmaps dirty PTEs.
2. try_to_unmap() calls set_page_dirty() for such pages, setting
   PG_dirty and the dirty radix tree tag.
3. When pdflush is woken, it calls do_writepages().
4. At least for ext2 (I assume it is true for most file systems),
   do_writepages() calls result in mpage_writepages() calls, which scan
   radix trees for dirty tags.

--
IWAMOTO Toshihiro

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-13 10:48         ` Hirokazu Takahashi
  2005-04-14 15:57           ` Marcelo Tosatti
@ 2005-04-19  2:46           ` Ray Bryant
  2005-04-20 18:16             ` Marcelo Tosatti
  1 sibling, 1 reply; 22+ messages in thread
From: Ray Bryant @ 2005-04-19  2:46 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: haveblue, raybry, marcelo.tosatti, linux-mm

Hirokazu et al,

I'm sorry, I've been kind of out of the loop here since last Wednesday
(that's the day I left Austin to fly to Melbourne, Australia which is
where I am now, visiting the SGI lab in Melbourne).

Nathan Scott (who works at SGI Melbourne) looked at the ext2/ext3
migrate_page code and realized that basically the same implementation
would work for xfs.  So I now have a kernel that implements that
function for xfs and, as you predicted, the "slow down" in the 2nd
migration that I was seeing before has gone away.  I'll add Nathan's
patch to my manual page migration stuff in the next version (later
this week, I hope).

So I guess it doesn't matter to me at the moment whether or not
the PG_dirty bit is set on the pages, except that I philosophically
dislike the fact that migration changes the state of the page.
I'm not sure it matters, but I would prefer it if this didn't
happen.  However, I'm not adamant about this, since what I really
want to happen is to have a functioning manual page migration
system call.  It does seem to be a bother to have to add that
migrate_page method to each file system, since in most cases
the addition is going to look somewhat like it does for ext2/3.
For xfs, Nathan did add an additional bit to make sure that
xfs metadata pages were not considered migratable.

WRT Marcelo's question as to who is causing the page out I/O
to occur during migration, let me go back and verify this is
actually what is happening.

Otherwise, is there a consensus about what to do about the
PG_dirty bits being set on the migrated pages?  As I read
things Marcelo says it is not worth it, but others think
that it should be fixed?
-- 
-----------------------------------------------
Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
	 so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: question on page-migration code
  2005-04-19  2:46           ` Ray Bryant
@ 2005-04-20 18:16             ` Marcelo Tosatti
  0 siblings, 0 replies; 22+ messages in thread
From: Marcelo Tosatti @ 2005-04-20 18:16 UTC (permalink / raw)
  To: Ray Bryant; +Cc: Hirokazu Takahashi, haveblue, raybry, linux-mm

Ray,

On Mon, Apr 18, 2005 at 09:46:03PM -0500, Ray Bryant wrote:
> Hirokazu et al,
> 
> I'm sorry, I've been kind of out of the loop here since last Wednesday
> (that's the day I left Austin to fly to Melbourne, Australia which is
> where I am now, visiting the SGI lab in Melbourne).
> 
> Nathan Scott (who works at SGI Melbourne) looked at the ext2/ext3
> migrate_page code and realized that basically the same implementation
> would work for xfs.  So I now have a kernel that implements that
> function for xfs and, as you predicted, the "slow down" in the 2nd
> migration that I was seeing before has gone away.  I'll add Nathan's
> patch to my manual page migration stuff in the next version (later
> this week, I hope).
> 
> So I guess it doesn't matter to me at the moment whether or not
> the PG_dirty bit is set on the pages, except that I philosophically
> dislike the fact that migration changes the state of the page.
> I'm not sure it matters, but I would prefer it if this didn't
> happen.  However, I'm not adamant about this, since what I really
> want to happen is to have a functioning manual page migration
> system call.  It does seem to be a bother to have to add that
> migrate_page method to each file system, since in most cases
> the addition is going to look somewhat like it does for ext2/3. 

One could create "block_migrate_page()" in fs/buffer.c so as to avoid a
migrate_page definition in each filesystem which uses buffer_heads.

But all address_space_operations need to be updated anyway.

> For xfs, Nathan did add an additional bit to make sure that
> xfs metadata pages were not considered migratable.
> 
> WRT Marcelo's question as to who is causing the page out I/O
> to occur during migration, let me go back and verify this is
> actually what is happening.
> 
> Otherwise, is there a consensus about what to do about the
> PG_dirty bits being set on the migrated pages?  As I read
> things Marcelo says it is not worth it, but others think
> that it should be fixed?

Dirty mmaped file pages will have their dirty tag migrated from  
ptes to pages via unmapping (try_to_unmap), which causes
pdflush to sync these pages when their inodes get aged, as 
Toshihiro notes.

I dislike the idea of "saving the dirty state to reinstantiate 
it later", but it seems it's the only way of avoiding the dirty 
mmaped file writeouts.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* question on page-migration code
@ 2005-04-07 23:05 Ray Bryant
  0 siblings, 0 replies; 22+ messages in thread
From: Ray Bryant @ 2005-04-07 23:05 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: Marcelo Tosatti, Dave Hansen, linux-mm

Well, even my previous description is not quite correct.
Here are the times for a series of 20 migrations,
from nodes 0-3 to 4-7, and then back again:

0.134u 1.425s 0:02.98 52.0%     0+0k 0+0io 1pf+0w
0.124u 0.395s 3:22.11 0.2%      0+0k 0+0io 24pf+0w
0.154u 1.494s 0:03.03 54.1%     0+0k 0+0io 8pf+0w
0.134u 1.137s 1:04.38 1.9%      0+0k 0+0io 28pf+0w
0.119u 0.723s 1:20.16 1.0%      0+0k 0+0io 8pf+0w
0.142u 1.299s 0:39.06 3.6%      0+0k 0+0io 28pf+0w
0.124u 0.526s 2:20.03 0.4%      0+0k 0+0io 0pf+0w
0.135u 1.336s 0:22.18 6.5%      0+0k 0+0io 0pf+0w
0.125u 1.128s 0:36.73 3.3%      0+0k 0+0io 8pf+0w
0.129u 1.099s 0:59.17 2.0%      0+0k 0+0io 28pf+0w
0.130u 0.679s 1:53.12 0.7%      0+0k 0+0io 8pf+0w
0.139u 1.193s 0:52.88 2.4%      0+0k 0+0io 28pf+0w
0.121u 0.621s 1:57.64 0.6%      0+0k 0+0io 8pf+0w
0.127u 1.241s 0:43.46 3.1%      0+0k 0+0io 28pf+0w
0.127u 0.734s 1:19.92 1.0%      0+0k 0+0io 8pf+0w
0.126u 1.317s 0:51.17 2.7%      0+0k 0+0io 28pf+0w
0.137u 0.613s 2:19.44 0.5%      0+0k 0+0io 8pf+0w
0.113u 1.290s 0:42.33 3.3%      0+0k 0+0io 28pf+0w
0.125u 0.538s 2:06.91 0.5%      0+0k 0+0io 7pf+0w
0.128u 1.328s 0:41.59 3.4%      0+0k 0+0io 28pf+0w

So trial #3 is an anomaly, since it completed quickly
as well.  All the rest of the trials completed very
slowly, in comparison.

Any idea what is going on here?

AFAIK, the test program is in steady state and doesn't
do any I/O.  So its behavior should not be a factor.
-- 
Best Regards,
Ray
-----------------------------------------------
                   Ray Bryant
512-453-9679 (work)         512-507-7807 (cell)
raybry@sgi.com             raybry@austin.rr.com
The box said: "Requires Windows 98 or better",
            so I installed Linux.
-----------------------------------------------

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2005-04-20 18:16 UTC | newest]

Thread overview: 22+ messages
2005-04-07 22:16 question on page-migration code Ray Bryant
2005-04-07 18:08 ` Marcelo Tosatti
2005-04-11 14:20   ` Ray Bryant
2005-04-11 18:31   ` Ray Bryant
2005-04-11 23:41     ` Hirokazu Takahashi
2005-04-12  4:57       ` Ray Bryant
2005-04-12  5:43       ` Ray Bryant
2005-04-13  2:30         ` IWAMOTO Toshihiro
2005-04-13  4:43         ` Hirokazu Takahashi
2005-04-15  6:41         ` IWAMOTO Toshihiro
2005-04-15 12:53           ` Marcelo Tosatti
2005-04-18 10:37             ` IWAMOTO Toshihiro
2005-04-12 16:46       ` Dave Hansen
2005-04-13 10:48         ` Hirokazu Takahashi
2005-04-14 15:57           ` Marcelo Tosatti
2005-04-19  2:46           ` Ray Bryant
2005-04-20 18:16             ` Marcelo Tosatti
2005-04-12 19:29       ` Ray Bryant
2005-04-11 19:00   ` Ray Bryant
2005-04-11 19:59   ` Ray Bryant
2005-04-07 22:44 ` Ray Bryant
2005-04-07 23:05 Ray Bryant
