From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Mon, 22 May 2006 13:29:05 -0700 From: Andrew Morton Subject: Re: tracking dirty pages patches Message-Id: <20060522132905.6e1a711c.akpm@osdl.org> In-Reply-To: References: Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Hugh Dickins Cc: a.p.zijlstra@chello.nl, torvalds@osdl.org, dhowells@redhat.com, linux-mm@kvack.org List-ID: Hugh Dickins wrote: > > Belated observations on your "tracking dirty pages" patches. Thanks, Hugh. > page_wrprotect is a nice use of rmap, but I see a couple of problems. > One is in the lock ordering (there's info on mm lock ordering at the > top of filemap.c, but I find the list at the top of rmap.c easier). > > set_page_dirty has always (awkwardly) been liable to be called from > very low in the hierarchy; whereas you're assuming clear_page_dirty > is called from much higher up. And in most cases there's no problem > (please cross-check to confirm that); but try_to_free_buffers in fs/ > buffer.c calls it while holding mapping->private_lock - page_wrprotect > called from test_clear_page_dirty then violates the order. > > If we're lucky and that is indeed the only violation, maybe Andrew > can recommend a change to try_to_free_buffers to avoid it: I have > no appreciation of the issues at that end myself. I had troubles with that as well - tree_lock is a very "inner" lock. So I moved test_clear_page_dirty()'s call to page_wrprotect() to be outside tree_lock. But I don't think you were referring to that - I am unable to evaluate your expression "the order". The running of page_wrprotect_file() inside private_lock is a worry, yes. We can move the clear_page_dirty() call in try_to_free_buffers() to be outside private_lock. But I don't know which particular ranking violation you've identified. > ... > > (Why does follow_pages set_page_dirty at all? I _think_ it's in case > the get_user_pages caller forgets to set_page_dirty when releasing. > But that's not how we usually write kernel code, to hide mistakes most > of the time, Yes, that would be bad. > and your mods may change the balance there. Andrew will > remember better whether that set_page_dirty has stronger justification.) It was added by the below, which nobody was terribly happy with at the time. (Took me 5-10 minutes to hunt this down. Insert rote comment about comments). Date: Mon, 19 Jan 2004 18:43:46 +0000 From: Linux Kernel Mailing List To: bk-commits-head@vger.kernel.org Subject: [PATCH] s390: endless loop in follow_page. ChangeSet 1.1490.3.215, 2004/01/19 10:43:46-08:00, akpm@osdl.org [PATCH] s390: endless loop in follow_page. From: Martin Schwidefsky Fix endless loop in get_user_pages() on s390. It happens only on s/390 because pte_dirty always returns 0. For all other architectures this is an optimization. In the case of "write && !pte_dirty(pte)" follow_page() returns NULL. On all architectures except s390 handle_pte_fault() will then create a pte with pte_dirty(pte)==1 because write_access==1. In the following, second call to follow_page() all is fine. With the physical dirty bit patch pte_dirty() is always 0 for s/390 because the dirty bit doesn't live in the pte. # This patch includes the following deltas: # ChangeSet 1.1490.3.214 -> 1.1490.3.215 # mm/memory.c 1.145 -> 1.146 # memory.c | 21 +++++++++++++-------- 1 files changed, 13 insertions(+), 8 deletions(-) diff -Nru a/mm/memory.c b/mm/memory.c --- a/mm/memory.c Mon Jan 19 15:47:24 2004 +++ b/mm/memory.c Mon Jan 19 15:47:24 2004 @@ -651,14 +651,19 @@ pte = *ptep; pte_unmap(ptep); if (pte_present(pte)) { - if (!write || (pte_write(pte) && pte_dirty(pte))) { - pfn = pte_pfn(pte); - if (pfn_valid(pfn)) { - struct page *page = pfn_to_page(pfn); - - mark_page_accessed(page); - return page; - } + if (write && !pte_write(pte)) + goto out; + if (write && !pte_dirty(pte)) { + struct page *page = pte_page(pte); + if (!PageDirty(page)) + set_page_dirty(page); + } + pfn = pte_pfn(pte); + if (pfn_valid(pfn)) { + struct page *page = pfn_to_page(pfn); + + mark_page_accessed(page); + return page; } } - To unsubscribe from this list: send the line "unsubscribe bk-commits-head" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org