Date: Mon, 21 Dec 2009 19:57:40 +0000
From: Mel Gorman
Subject: Re: [PATCH 14 of 28] pte alloc trans splitting
Message-ID: <20091221195740.GC23345@csn.ul.ie>
References: <20091218190334.GF21194@csn.ul.ie> <20091219155948.GA29790@random.random>
In-Reply-To: <20091219155948.GA29790@random.random>
To: Andrea Arcangeli
Cc: linux-mm@kvack.org, Marcelo Tosatti, Adam Litke, Avi Kivity,
    Izik Eidus, Hugh Dickins, Nick Piggin, Rik van Riel, Andi Kleen,
    Dave Hansen, Benjamin Herrenschmidt, Ingo Molnar, Mike Travis,
    KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright, Andrew Morton

On Sat, Dec 19, 2009 at 04:59:48PM +0100, Andrea Arcangeli wrote:
> On Fri, Dec 18, 2009 at 07:03:34PM +0000, Mel Gorman wrote:
> > On Thu, Dec 17, 2009 at 07:00:17PM -0000, Andrea Arcangeli wrote:
> > > From: Andrea Arcangeli
> > >
> > > pte alloc routines must wait for split_huge_page if the pmd is not
> > > present and not null (i.e. pmd_trans_splitting).
> >
> > More stupid questions. When a large page is about to be split, you clear the
> > present bit to cause faults and hold those accesses until the split completes?
>
> That was previous version. New version doesn't clear the present bit
> but sets its own reserved bit in the pmd. All we have to protect is
> kernel code, not userland. We have to protect against anything that
> will change the mapcount. The mapcount is the key here, as it is only
> accounted in the head page and it has to be transferred to all tail
> pages during the split. So during the split the mapcount can't
> change. But that doesn't mean userland can't keep changing and reading
> the page contents while we transfer the mapcount.
>

Ok, that makes sense. By having pte_alloc wait on split_huge_page, it should
be safe even if userspace calls fork(). No other gotcha springs to mind.

> > Again, no doubt this is obvious later but a description in the leader of
> > the basic approach to splitting huge pages wouldn't kill.
>
> Yes sure good idea, I added a comment in the most crucial point... not
> in the header.
>

Thanks.

> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -628,11 +628,28 @@ static void __split_huge_page_refcount(s
>  		 */
>  		smp_wmb();
>
> +		/*
> +		 * __split_huge_page_splitting() already set the
> +		 * splitting bit in all pmd that could map this
> +		 * hugepage, that will ensure no CPU can alter the
> +		 * mapcount on the head page. The mapcount is only
> +		 * accounted in the head page and it has to be
> +		 * transferred to all tail pages in the below code. So
> +		 * for this code to be safe, the split the mapcount
> +		 * can't change. But that doesn't mean userland can't
> +		 * keep changing and reading the page contents while
> +		 * we transfer the mapcount, so the pmd splitting
> +		 * status is achieved setting a reserved bit in the
> +		 * pmd, not by clearing the present bit.
> +		 */
>  		BUG_ON(page_mapcount(page_tail));
>  		page_tail->_mapcount = page->_mapcount;
> +
>  		BUG_ON(page_tail->mapping);
>  		page_tail->mapping = page->mapping;
> +
>  		page_tail->index = ++head_index;
> +
>  		BUG_ON(!PageAnon(page_tail));
>  		BUG_ON(!PageUptodate(page_tail));
>  		BUG_ON(!PageDirty(page_tail));
>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
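
For readers following the thread, the protocol discussed above can be modelled in a few lines of userspace C. This is only a toy sketch and not the kernel code: every toy_* name and TOY_* constant is invented for illustration, there is a single pmd instead of one per mapping, the mapcount is a plain counter where 0 means unmapped, and a false return stands in for the point where the real pte alloc path would wait for split_huge_page. It shows just the two properties from the discussion: the present bit stays set so userland access continues during the split, and the splitting bit keeps the head mapcount stable while it is copied to the tail pages.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SUBPAGES   4          /* hypothetical; real huge pages have far more */
#define TOY_PMD_PRESENT   (1u << 0)
#define TOY_PMD_SPLITTING (1u << 1)  /* stand-in for the reserved pmd bit */

struct toy_page {
	int mapcount;                /* simplified: 0 means not mapped */
};

struct toy_hugepage {
	unsigned int pmd;                       /* one mapping pmd, for simplicity */
	struct toy_page sub[TOY_NR_SUBPAGES];   /* sub[0] plays the head page */
};

static bool toy_pmd_trans_splitting(const struct toy_hugepage *hp)
{
	return (hp->pmd & TOY_PMD_SPLITTING) != 0;
}

/* Userland access only depends on the present bit, so it works mid-split. */
static bool toy_user_can_access(const struct toy_hugepage *hp)
{
	return (hp->pmd & TOY_PMD_PRESENT) != 0;
}

/*
 * Kernel-side path that would change the mapcount (fork() copying the
 * mapping, for example). Returns false where the real code would wait
 * for the split to finish.
 */
static bool toy_try_map(struct toy_hugepage *hp)
{
	if (toy_pmd_trans_splitting(hp))
		return false;
	hp->sub[0].mapcount++;
	return true;
}

/* Mark the pmd as splitting without clearing the present bit. */
static void toy_split_begin(struct toy_hugepage *hp)
{
	hp->pmd |= TOY_PMD_SPLITTING;
}

/* Copy the head mapcount to every tail page; it cannot change meanwhile. */
static void toy_split_transfer(struct toy_hugepage *hp)
{
	assert(toy_pmd_trans_splitting(hp));
	for (int i = 1; i < TOY_NR_SUBPAGES; i++) {
		assert(hp->sub[i].mapcount == 0);   /* mirrors the BUG_ON in the patch */
		hp->sub[i].mapcount = hp->sub[0].mapcount;
	}
}

/* Simplification: the real split replaces the pmd rather than clearing a bit. */
static void toy_split_end(struct toy_hugepage *hp)
{
	hp->pmd &= ~TOY_PMD_SPLITTING;
}
```

A caller would go through toy_split_begin(), toy_split_transfer() and toy_split_end() in that order. Between the first and last call, toy_try_map() refuses to touch the mapcount while toy_user_can_access() still reports true.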