linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Mel Gorman <mel@csn.ul.ie>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	bpicco@redhat.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>
Subject: Re: [PATCH 32 of 32] khugepaged
Date: Tue, 2 Feb 2010 14:56:34 +0100	[thread overview]
Message-ID: <20100202135634.GK4135@random.random> (raw)
In-Reply-To: <4B670968.7090801@redhat.com>

On Mon, Feb 01, 2010 at 12:03:36PM -0500, Rik van Riel wrote:
> On 01/31/2010 03:27 PM, Andrea Arcangeli wrote:
> 
> > +	/* stop anon_vma rmap pagetable access */
> > +	spin_lock(&vma->anon_vma->lock);
> 
> This is no longer enough.  The anon_vma changes that
> went into -mm recently mean that a VMA can be associated
> with multiple anon_vmas.
> 
> Of course, forcefully COW copying/writing every page in
> the VMA will ensure that they are all in the anon_vma
> you lock with the code above.
> 
> I suspect the easiest fix would be to lock all the
> anon_vmas attached to a VMA.  That should not lead to
> any deadlocks, since multiple siblings of the same
> parent process would be encountering their anon_vma
> structs in the same order, due to the way that
> anon_vma_clone and anon_vma_fork work.
> 
> This may be too subtle for lockdep, though :/

I hope I can do this in split_huge_page_mm/vma:

if (pmd_trans_huge(*pmd)) {
	spin_lock(&mm->page_table_lock); /* stop pmd updates */
	if (pmd_trans_huge(*pmd)) {
	   unsigned long anon_mapping;
	   struct page *page;
	   struct anon_vma *anon_vma;
	   page = pmd_page(*pmd); /* works even while pmd_splitting is set */
	   anon_mapping = page->mapping; /* ACCESS_ONCE not needed page has to go away before anon_vma */
	   VM_BUG_ON((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON);
	   VM_BUG_ON(!page_mapped(page));
	   anon_vma = (struct anon_vma *)((unsigned long)page->mapping 	- PAGE_MAPPING_ANON); 
	   rcu_read_lock();
	   spin_unlock(&mm->page_table_lock);
	   spin_lock(&anon_vma->lock);
	   rcu_read_unlock();

If this only causes at most 1 more memory allocation for each new vma
created in the child, at the very first cow on the newly allocated vma
in the child (and no more memory allocation) it should be unmeasurable
overhead so it should be worth it and main downside seems to be in
making the tricky vma adjusting more complex.

I think the most urgent info that is missing is if this enough to fix
the regressions in AIM that grinds to an halt now and prevents to get
results. That is what started the development of this patch (for the
other pages not cowed there is nothing we can do about it and current
anon-vma is already as efficient as it can be in complexity
terms). Optimizing for one-process-per-connection not so important,
even if I know proprietary db do that. I've no idea how many pages
there are in AIM that are cowed and how many that are totally shared
and purely readonly for all processes in the system, and for the
latter this is a noop... If it won't be proven that this fixes AIM
we'll have to still to bisect previous patches.

So in short I recommend to test this patch on Larry's system but again
I think this change is good idea regardless (just I'm not convinced
it'll be enough to prevent aim to grind to an halt, if you call
page_referenced or try_to_unmap in a loop it won't be showing less in
the profiling just because it returns faster, still a loop that will
be and I suspect more the caller not the callee given the callee was
fine in previous kernels... but I didn't have time to look into it so
I may as well be wrong and maybe some legitimate change happened that
requires a more frequent page_referenced calls, dunno).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-02-02 13:57 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-31 20:27 [PATCH 00 of 32] Transparent Hugepage support #9 Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 01 of 32] define MADV_HUGEPAGE Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 02 of 32] compound_lock Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 03 of 32] alter compound get_page/put_page Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 04 of 32] update futex compound knowledge Andrea Arcangeli
2010-02-16 11:33   ` Peter Zijlstra
2010-03-01 17:58     ` Andrea Arcangeli
2010-03-01 18:07       ` Peter Zijlstra
2010-03-01 18:23         ` Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 05 of 32] fix bad_page to show the real reason the page is bad Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 06 of 32] clear compound mapping Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 07 of 32] add native_set_pmd_at Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 08 of 32] add pmd paravirt ops Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 09 of 32] no paravirt version of pmd ops Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 10 of 32] export maybe_mkwrite Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 11 of 32] comment reminder in destroy_compound_page Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 12 of 32] config_transparent_hugepage Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 13 of 32] special pmd_trans_* functions Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 14 of 32] add pmd mangling generic functions Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 15 of 32] add pmd mangling functions to x86 Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 16 of 32] bail out gup_fast on splitting pmd Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 17 of 32] pte alloc trans splitting Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 18 of 32] add pmd mmu_notifier helpers Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 19 of 32] clear page compound Andrea Arcangeli
2010-02-01 21:37   ` Christoph Lameter
2010-01-31 20:27 ` [PATCH 20 of 32] add pmd_huge_pte to mm_struct Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 21 of 32] split_huge_page_mm/vma Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 22 of 32] split_huge_page paging Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 23 of 32] clear_copy_huge_page Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 24 of 32] kvm mmu transparent hugepage support Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 25 of 32] transparent hugepage core Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 26 of 32] verify pmd_trans_huge isn't leaking Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 27 of 32] madvise(MADV_HUGEPAGE) Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 28 of 32] pmd_trans_huge migrate bugcheck Andrea Arcangeli
2010-02-01 21:46   ` Christoph Lameter
2010-02-03 15:49     ` Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 29 of 32] memcg compound Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 30 of 32] memcg huge memory Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 31 of 32] transparent hugepage vmstat Andrea Arcangeli
2010-01-31 20:27 ` [PATCH 32 of 32] khugepaged Andrea Arcangeli
2010-02-01 17:03   ` Rik van Riel
2010-02-02 13:56     ` Andrea Arcangeli [this message]
2010-02-01 22:18   ` Christoph Lameter
2010-02-01 22:56     ` Andrea Arcangeli
2010-02-02 19:52       ` Christoph Lameter
2010-02-02 20:24         ` Andrea Arcangeli
2010-02-03 16:13           ` Christoph Lameter
2010-02-03 16:30             ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100202135634.GK4135@random.random \
    --to=aarcange@redhat.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=avi@redhat.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=bpicco@redhat.com \
    --cc=chrisw@sous-sol.org \
    --cc=cl@linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=ieidus@redhat.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=mingo@elte.hu \
    --cc=mtosatti@redhat.com \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    --cc=travis@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox