* unuse_pte: set pte dirty if the page is dirty
@ 2006-02-28 1:33 Christoph Lameter
2006-02-28 1:53 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-02-28 1:33 UTC (permalink / raw)
To: Hugh Dickins; +Cc: linux-mm, akpm
When replacing a swap pte with a real pte in unuse_pte, we simply generate
a pte that has no dirty bit set regardless of what state the page is in.
If a process wants to write to a dirty page after replacement then a
page fault has to first set the dirty bit in the pte.
This patch generates a pte with the dirty bit already set and so avoids
that fault.
Page migration moves a page from regular ptes to swap ptes and back
for anonymous page and so may generate lots of ptes that are not marked
dirty. This patch will increase the efficiency of page migration.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.16-rc5/mm/swapfile.c
===================================================================
--- linux-2.6.16-rc5.orig/mm/swapfile.c 2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/mm/swapfile.c 2006-02-27 17:17:38.000000000 -0800
@@ -425,10 +425,14 @@ void free_swap_and_cache(swp_entry_t ent
static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, swp_entry_t entry, struct page *page)
{
+ pte_t new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
+
inc_mm_counter(vma->vm_mm, anon_rss);
get_page(page);
+
set_pte_at(vma->vm_mm, addr, pte,
- pte_mkold(mk_pte(page, vma->vm_page_prot)));
+ PageDirty(page) ? pte_mkdirty(new_pte) : new_pte);
+
page_add_anon_rmap(page, vma, addr);
swap_free(entry);
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 1:33 unuse_pte: set pte dirty if the page is dirty Christoph Lameter
@ 2006-02-28 1:53 ` Andrew Morton
2006-02-28 1:57 ` Christoph Lameter
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2006-02-28 1:53 UTC (permalink / raw)
To: Christoph Lameter; +Cc: hugh, linux-mm
Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> When replacing a swap pte with a real pte in unuse_pte, we simply generate
> a pte that has no dirty bit set regardless of what state the page is in.
>
> If a process wants to write to a dirty page after replacement then a
> page fault has to first set the dirty bit in the pte.
>
> This patch generates a pte with the dirty bit already set and so avoids
> that fault.
>
> Page migration moves a page from regular ptes to swap ptes and back
> for anonymous page and so may generate lots of ptes that are not marked
> dirty. This patch will increase the efficiency of page migration.
>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
>
> Index: linux-2.6.16-rc5/mm/swapfile.c
> ===================================================================
> --- linux-2.6.16-rc5.orig/mm/swapfile.c 2006-02-26 21:09:35.000000000 -0800
> +++ linux-2.6.16-rc5/mm/swapfile.c 2006-02-27 17:17:38.000000000 -0800
> @@ -425,10 +425,14 @@ void free_swap_and_cache(swp_entry_t ent
> static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
> unsigned long addr, swp_entry_t entry, struct page *page)
> {
> + pte_t new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
> +
> inc_mm_counter(vma->vm_mm, anon_rss);
> get_page(page);
> +
> set_pte_at(vma->vm_mm, addr, pte,
> - pte_mkold(mk_pte(page, vma->vm_page_prot)));
> + PageDirty(page) ? pte_mkdirty(new_pte) : new_pte);
> +
> page_add_anon_rmap(page, vma, addr);
> swap_free(entry);
> /*
Are we sure this is race-free? Say, someone is in the process of cleaning
the page? munmap, conceivably swapout? We end up with a dirty pte
pointing at a now-clean page. The page will later become dirty again. Is
that a problem? It would generate a surprise if the vma had ben set
read-only in the interim, for example.
I can't immediately see a problem, but haven't thought about it a lot..
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 1:53 ` Andrew Morton
@ 2006-02-28 1:57 ` Christoph Lameter
2006-02-28 2:21 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-02-28 1:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: hugh, linux-mm
On Mon, 27 Feb 2006, Andrew Morton wrote:
> Are we sure this is race-free? Say, someone is in the process of cleaning
> the page? munmap, conceivably swapout? We end up with a dirty pte
> pointing at a now-clean page. The page will later become dirty again. Is
> that a problem? It would generate a surprise if the vma had ben set
> read-only in the interim, for example.
munmap sets the dirty bit in pages rather than clearing the dirty bits.
If we would set a dirty bit in a pte pointing to a now clean page then
unmapping (or the swaper) will mark the page dirty again and its going to
be rewritten again.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 1:57 ` Christoph Lameter
@ 2006-02-28 2:21 ` Andrew Morton
2006-02-28 4:20 ` Christoph Lameter
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2006-02-28 2:21 UTC (permalink / raw)
To: Christoph Lameter; +Cc: hugh, linux-mm
Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> On Mon, 27 Feb 2006, Andrew Morton wrote:
>
> > Are we sure this is race-free? Say, someone is in the process of cleaning
> > the page? munmap, conceivably swapout? We end up with a dirty pte
> > pointing at a now-clean page. The page will later become dirty again. Is
> > that a problem? It would generate a surprise if the vma had ben set
> > read-only in the interim, for example.
>
> munmap sets the dirty bit in pages rather than clearing the dirty bits.
>
> If we would set a dirty bit in a pte pointing to a now clean page then
> unmapping (or the swaper) will mark the page dirty again and its going to
> be rewritten again.
Precisely.
And will that crash the kernel, corrupt swapspace, or any other such
exciting things?
There are cases (eg, mprotect) in which a subsequent page-dirtying is
"impossible". Only we've now gone and made it possible. The worst part of
it is that we're made it possible in exceedingly rare circumstances.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 2:21 ` Andrew Morton
@ 2006-02-28 4:20 ` Christoph Lameter
2006-02-28 4:39 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-02-28 4:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: hugh, linux-mm
On Mon, 27 Feb 2006, Andrew Morton wrote:
> Christoph Lameter <clameter@engr.sgi.com> wrote:
> >
> > On Mon, 27 Feb 2006, Andrew Morton wrote:
> >
> > > Are we sure this is race-free? Say, someone is in the process of cleaning
> > > the page? munmap, conceivably swapout? We end up with a dirty pte
> > > pointing at a now-clean page. The page will later become dirty again. Is
> > > that a problem? It would generate a surprise if the vma had ben set
> > > read-only in the interim, for example.
> >
> > munmap sets the dirty bit in pages rather than clearing the dirty bits.
> >
> > If we would set a dirty bit in a pte pointing to a now clean page then
> > unmapping (or the swaper) will mark the page dirty again and its going to
> > be rewritten again.
>
> Precisely.
>
> And will that crash the kernel, corrupt swapspace, or any other such
> exciting things?
How could it do that if user space could accomplish the same dirtying
of the pte without harm?
> There are cases (eg, mprotect) in which a subsequent page-dirtying is
> "impossible". Only we've now gone and made it possible. The worst part of
> it is that we're made it possible in exceedingly rare circumstances.
Yes we need to check the VM_WRITE bit like in maybe_mkwrite() in
memory.c.... Thanks...
unuse_pte: set pte dirty if the page is dirty
When replacing a swap pte with a real pte in unuse_pte, we simply generate
a pte that has no dirty bit set regardless of what state the page is in.
If a process wants to write to a dirty page after replacement then a
page fault has to first set the dirty bit in the pte.
This patch generates a pte with the dirty bit already set and so avoids
that fault (if the vma is writable ....).
Page migration moves a page from regular ptes to swap ptes and back
for anonymous page. This patch will increase the efficiency of page migration.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Index: linux-2.6.16-rc5/mm/swapfile.c
===================================================================
--- linux-2.6.16-rc5.orig/mm/swapfile.c 2006-02-26 21:09:35.000000000 -0800
+++ linux-2.6.16-rc5/mm/swapfile.c 2006-02-27 20:12:44.000000000 -0800
@@ -425,10 +425,15 @@ void free_swap_and_cache(swp_entry_t ent
static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, swp_entry_t entry, struct page *page)
{
+ pte_t new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
+
inc_mm_counter(vma->vm_mm, anon_rss);
get_page(page);
+
set_pte_at(vma->vm_mm, addr, pte,
- pte_mkold(mk_pte(page, vma->vm_page_prot)));
+ (PageDirty(page) && (vma->vm_flags & VM_WRITE)) ?
+ pte_mkdirty(new_pte) : new_pte);
+
page_add_anon_rmap(page, vma, addr);
swap_free(entry);
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 4:20 ` Christoph Lameter
@ 2006-02-28 4:39 ` Andrew Morton
2006-02-28 5:32 ` Christoph Lameter
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2006-02-28 4:39 UTC (permalink / raw)
To: Christoph Lameter; +Cc: hugh, linux-mm
Christoph Lameter <clameter@engr.sgi.com> wrote:
>
> > There are cases (eg, mprotect) in which a subsequent page-dirtying is
> > "impossible". Only we've now gone and made it possible. The worst part of
> > it is that we're made it possible in exceedingly rare circumstances.
>
> Yes we need to check the VM_WRITE bit like in maybe_mkwrite() in
> memory.c.... Thanks...
>
I dunno - I said I hadn't thought about it much. But I'd like you guys to,
please - this is tricky stuff and bugs in there can reveal themselves in
horridly subtle ways. We need to spend much time, care and thought over
each change.
> static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
> unsigned long addr, swp_entry_t entry, struct page *page)
> {
> + pte_t new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
> +
> inc_mm_counter(vma->vm_mm, anon_rss);
> get_page(page);
> +
> set_pte_at(vma->vm_mm, addr, pte,
> - pte_mkold(mk_pte(page, vma->vm_page_prot)));
> + (PageDirty(page) && (vma->vm_flags & VM_WRITE)) ?
> + pte_mkdirty(new_pte) : new_pte);
> +
> page_add_anon_rmap(page, vma, addr);
> swap_free(entry);
argh. Whenever you find yourself thinking of the question-mark operator,
take a cold shower.
This?
--- devel/mm/swapfile.c~unuse_pte-set-pte-dirty-if-the-page-is-dirty 2006-02-27 20:33:19.000000000 -0800
+++ devel-akpm/mm/swapfile.c 2006-02-27 20:34:32.000000000 -0800
@@ -480,10 +480,16 @@ unsigned int count_swap_pages(int type,
static void unuse_pte(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, swp_entry_t entry, struct page *page)
{
+ pte_t new_pte;
+
inc_mm_counter(vma->vm_mm, anon_rss);
get_page(page);
- set_pte_at(vma->vm_mm, addr, pte,
- pte_mkold(mk_pte(page, vma->vm_page_prot)));
+
+ new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
+ if (PageDirty(page) && (vma->vm_flags & VM_WRITE))
+ new_pte = pte_mkdirty(new_pte);
+ set_pte_at(vma->vm_mm, addr, pte, new_pte);
+
page_add_anon_rmap(page, vma, addr);
swap_free(entry);
/*
_
I think it has the same race - if the page gets cleaned and someone
mprotects the vma to remove VM_WRITE, we dirty an undirtiable page.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 4:39 ` Andrew Morton
@ 2006-02-28 5:32 ` Christoph Lameter
2006-02-28 14:05 ` Hugh Dickins
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-02-28 5:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: hugh, linux-mm
On Mon, 27 Feb 2006, Andrew Morton wrote:
> argh. Whenever you find yourself thinking of the question-mark operator,
> take a cold shower.
Hehe.... Yes I love it...
> This?
Okay lets continue the work based on that...
> I think it has the same race - if the page gets cleaned and someone
> mprotects the vma to remove VM_WRITE, we dirty an undirtiable page.
unuse_pte is used:
1. To switch off a swap device.
2. To reestablish ptes for a migrated anonymous page.
In both cases we are only dealing with anonymous pages. The only writer
can be the swap code and as far as I can tell the only risk is writing a
swap page out once again. That is if it would be cleaned by pageout().
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 5:32 ` Christoph Lameter
@ 2006-02-28 14:05 ` Hugh Dickins
2006-02-28 16:06 ` Christoph Lameter
0 siblings, 1 reply; 9+ messages in thread
From: Hugh Dickins @ 2006-02-28 14:05 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-mm
On Mon, 27 Feb 2006, Christoph Lameter wrote:
>
> unuse_pte is used:
>
> 1. To switch off a swap device.
>
> 2. To reestablish ptes for a migrated anonymous page.
>
> In both cases we are only dealing with anonymous pages. The only writer
> can be the swap code and as far as I can tell the only risk is writing a
> swap page out once again. That is if it would be cleaned by pageout().
I shared Andrew's unease, but couldn't put my finger on any actual
problem. But in the course of writing a much more hesitant reply,
came to realize the patch is just bogus. Did you ever measure any
improvement from it, on any architecture? 0% is my estimate.
I was recommending that the VM_WRITE test be replaced by a pte_write
test, when I remembered that vm_page_prot on any vma which contains
anonymous pages (excepting the very rare Linus ptrace case) will not
grant write access (see comment above unuse_pte). So if this pte is
actually written to afterwards, you'll have to handle a write fault
on it, won't you? No saving whatever from presetting dirty - or am
I misunderstanding how the architecture closest to your heart works?
I guess you could work around that by checking mapcount+swapcount
and granting write access in the common uniquely-mapped case; but
swapoff has never bothered to do so. Unless you can come up with
convincing numbers, I'd say let it die - halve the time of a
significant migration testcase? yes, we should make a patch;
shave 5% off it? no, for peace of mind let's not worry about it.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: unuse_pte: set pte dirty if the page is dirty
2006-02-28 14:05 ` Hugh Dickins
@ 2006-02-28 16:06 ` Christoph Lameter
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2006-02-28 16:06 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, linux-mm
On Tue, 28 Feb 2006, Hugh Dickins wrote:
> I shared Andrew's unease, but couldn't put my finger on any actual
> problem. But in the course of writing a much more hesitant reply,
> came to realize the patch is just bogus. Did you ever measure any
> improvement from it, on any architecture? 0% is my estimate.
I have not actually measured it. What I see though are dirty pages
referred to by ptes that are not dirty after page migration. The
ptes were dirty before migration.
> I was recommending that the VM_WRITE test be replaced by a pte_write
> test, when I remembered that vm_page_prot on any vma which contains
> anonymous pages (excepting the very rare Linus ptrace case) will not
> grant write access (see comment above unuse_pte). So if this pte is
> actually written to afterwards, you'll have to handle a write fault
> on it, won't you? No saving whatever from presetting dirty - or am
> I misunderstanding how the architecture closest to your heart works?
Yuck....
> I guess you could work around that by checking mapcount+swapcount
> and granting write access in the common uniquely-mapped case; but
> swapoff has never bothered to do so. Unless you can come up with
> convincing numbers, I'd say let it die - halve the time of a
> significant migration testcase? yes, we should make a patch;
> shave 5% off it? no, for peace of mind let's not worry about it.
Right. Thanks. At least I can now justify the vanishing of the
dirty bit from the ptes.
Hmm.. Maybe ultimately we need to have a special mechanism (like Marcelo's
migration cache) to remove ptes and add ptes during page migration. That
could take this into account but it seems that such a mechanism better be
separate from the common swap code.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2006-02-28 16:06 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-28 1:33 unuse_pte: set pte dirty if the page is dirty Christoph Lameter
2006-02-28 1:53 ` Andrew Morton
2006-02-28 1:57 ` Christoph Lameter
2006-02-28 2:21 ` Andrew Morton
2006-02-28 4:20 ` Christoph Lameter
2006-02-28 4:39 ` Andrew Morton
2006-02-28 5:32 ` Christoph Lameter
2006-02-28 14:05 ` Hugh Dickins
2006-02-28 16:06 ` Christoph Lameter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox