From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with ESMTP id 8A87B6001DA for ; Thu, 28 Jan 2010 11:12:11 -0500 (EST) Date: Thu, 28 Jan 2010 16:11:53 +0000 From: Mel Gorman Subject: Re: [PATCH 04 of 31] update futex compound knowledge Message-ID: <20100128161153.GE7139@csn.ul.ie> References: <2503a08ae3183f675931.1264689198@v2.random> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <2503a08ae3183f675931.1264689198@v2.random> Sender: owner-linux-mm@kvack.org To: Andrea Arcangeli Cc: linux-mm@kvack.org, Marcelo Tosatti , Adam Litke , Avi Kivity , Izik Eidus , Hugh Dickins , Nick Piggin , Rik van Riel , Dave Hansen , Benjamin Herrenschmidt , Ingo Molnar , Mike Travis , KAMEZAWA Hiroyuki , Christoph Lameter , Chris Wright , Andrew Morton , bpicco@redhat.com, Christoph Hellwig , KOSAKI Motohiro , Balbir Singh , Arnd Bergmann List-ID: On Thu, Jan 28, 2010 at 03:33:18PM +0100, Andrea Arcangeli wrote: > From: Andrea Arcangeli > > Futex code is smarter than most other gup_fast O_DIRECT code and knows about > the compound internals. However now doing a put_page(head_page) will not > release the pin on the tail page taken by gup-fast, leading to all sort of > refcounting bugchecks. Getting a stable head_page is a little tricky. > > page_head = page is there because if this is not a tail page it's also the > page_head. Only in case this is a tail page, compound_head is called, otherwise > it's guaranteed unnecessary. And if it's a tail page compound_head has to run > atomically inside irq disabled section __get_user_pages_fast before returning. > Otherwise ->first_page won't be a stable pointer. > > Disableing irq before __get_user_page_fast and releasing irq after running > compound_head is needed because if __get_user_page_fast returns == 1, it means > the huge pmd is established and cannot go away from under us. > pmdp_splitting_flush_notify in __split_huge_page_splitting will have to wait > for local_irq_enable before the IPI delivery can return. This means > __split_huge_page_refcount can't be running from under us, and in turn when we > run compound_head(page) we're not reading a dangling pointer from > tailpage->first_page. Then after we get to stable head page, we are always safe > to call compound_lock and after taking the compound lock on head page we can > finally re-check if the page returned by gup-fast is still a tail page. in > which case we're set and we didn't need to split the hugepage in order to take > a futex on it. > > Signed-off-by: Andrea Arcangeli When I rebased my tree, I found that the get_user_pages_fast() write is 1 unconditionally now where it wasn't on 2.6.32 so sorry about that confusion. Acked-by: Mel Gorman > --- > > diff --git a/kernel/futex.c b/kernel/futex.c > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -218,7 +218,7 @@ get_futex_key(u32 __user *uaddr, int fsh > { > unsigned long address = (unsigned long)uaddr; > struct mm_struct *mm = current->mm; > - struct page *page; > + struct page *page, *page_head; > int err; > > /* > @@ -250,10 +250,36 @@ again: > if (err < 0) > return err; > > - page = compound_head(page); > - lock_page(page); > - if (!page->mapping) { > - unlock_page(page); > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > + page_head = page; > + if (unlikely(PageTail(page))) { > + put_page(page); > + /* serialize against __split_huge_page_splitting() */ > + local_irq_disable(); > + if (likely(__get_user_pages_fast(address, 1, 1, &page) == 1)) { > + page_head = compound_head(page); > + local_irq_enable(); > + } else { > + local_irq_enable(); > + goto again; > + } > + } > +#else > + page_head = compound_head(page); > +#endif > + > + lock_page(page_head); > + if (unlikely(page_head != page)) { > + compound_lock(page_head); > + if (unlikely(!PageTail(page))) { > + compound_unlock(page_head); > + unlock_page(page_head); > + put_page(page); > + goto again; > + } > + } > + if (!page_head->mapping) { > + unlock_page(page_head); > put_page(page); > goto again; > } > @@ -265,19 +291,21 @@ again: > * it's a read-only handle, it's expected that futexes attach to > * the object not the particular process. > */ > - if (PageAnon(page)) { > + if (PageAnon(page_head)) { > key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */ > key->private.mm = mm; > key->private.address = address; > } else { > key->both.offset |= FUT_OFF_INODE; /* inode-based key */ > - key->shared.inode = page->mapping->host; > - key->shared.pgoff = page->index; > + key->shared.inode = page_head->mapping->host; > + key->shared.pgoff = page_head->index; > } > > get_futex_key_refs(key); > > - unlock_page(page); > + if (unlikely(PageTail(page))) > + compound_unlock(page_head); > + unlock_page(page_head); > put_page(page); > return 0; > } > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org