linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: paulmck@linux.vnet.ibm.com, peterz@infradead.org,
	akpm@linux-foundation.org, kirill@shutemov.name,
	ak@linux.intel.com, mhocko@kernel.org, dave@stgolabs.net,
	jack@suse.cz, Matthew Wilcox <willy@infradead.org>,
	benh@kernel.crashing.org, mpe@ellerman.id.au, paulus@samba.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	hpa@zytor.com, Will Deacon <will.deacon@arm.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	haren@linux.vnet.ibm.com, khandual@linux.vnet.ibm.com,
	npiggin@gmail.com, bsingharora@gmail.com,
	Tim Chen <tim.c.chen@linux.intel.com>,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org
Subject: Re: [PATCH v5 07/22] mm: Protect VMA modifications using VMA sequence count
Date: Thu, 26 Oct 2017 12:18:33 +0200	[thread overview]
Message-ID: <20171026101833.GF563@redhat.com> (raw)
In-Reply-To: <1507729966-10660-8-git-send-email-ldufour@linux.vnet.ibm.com>

Hello Laurent,

Message-ID: <7ca80231-fe02-a3a7-84bc-ce81690ea051@intel.com> shows
significant slowdown even for brk/malloc ops both single and
multi threaded.

The single threaded case I think is the most important because it has
zero chance of getting back any benefit later during page faults.

Could you check if:

1. it's possible change vm_write_begin to be a noop if mm->mm_count is
   <= 1? Hint: clone() will run single threaded so there's no way it can run
   in the middle of a being/end critical section (clone could set an
   MMF flag to possibly keep the sequence counter activated if a child
   thread exits and mm_count drops to 1 while the other cpu is in the
   middle of a critical section in the other thread).

2. Same thing with RCU freeing of vmas. Wouldn't it be nicer if RCU
   freeing happened only once a MMF flag is set? That will at least
   reduce the risk of temporary memory waste until the next RCU grace
   period. The read of the MMF will scale fine. Of course to allow
   point 1 and 2 then the page fault should also take the mmap_sem
   until the MMF flag is set.

Could you also investigate a much bigger change: I wonder if it's
possible to drop the sequence number entirely from the vma and stop
using sequence numbers entirely (which is likely the source of the
single threaded regression in point 1 that may explain the report in
the above message-id), and just call the vma rbtree lookup once again
and check that everything is still the same in the vma and the PT lock
obtained is still a match to finish the anon page fault and fill the
pte?

Then of course we also need to add a method to the read-write
semaphore so it tells us if there's already one user holding the read
mmap_sem and we're the second one.  If we're the second one (or more
than second) only then we should skip taking the down_read mmap_sem.
Even a multithreaded app won't ever skip taking the mmap_sem until
there's sign of runtime contention, and it won't have to run the way
more expensive sequence number-less revalidation during page faults,
unless we get an immediate scalability payoff because we already know
the mmap_sem is already contended and there are multiple nested
threads in the page fault handler of the same mm.

Perhaps we'd need something more advanced than a
down_read_trylock_if_not_hold() (which has to guaranteed not to write
to any cacheline) and we'll have to count the per-thread exponential
backoff of mmap_sem frequency, but starting with
down_read_trylock_if_not_hold() would be good I think.

This is not how the current patch works, the current patch uses a
sequence number because it pretends to go lockless always and in turn
has to slow down all vma updates fast paths or the revalidation
slowsdown performance for page fault too much (as it always
revalidates).

I think it would be much better to go speculative only when there's
"detected" runtime contention on the mmap_sem with
down_read_trylock_if_not_hold() and that will make the revalidation
cost not an issue to worry about because normally we won't have to
revalidate the vma at all during page fault. In turn by making the
revalidation more expensive by starting a vma rbtree lookup from
scratch, we can drop the sequence number entirely and that should
simplify the patch tremendously because all vm_write_begin/end would
disappear from the patch and in turn the mmap/brk slowdown measured by
the message-id above, should disappear as well.
   
Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-10-26 10:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-11 13:52 [PATCH v5 00/22] Speculative page faults Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 01/22] x86/mm: Define CONFIG_SPF Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 02/22] powerpc/mm: " Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 03/22] mm: Dont assume page-table invariance during faults Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 04/22] mm: Prepare for FAULT_FLAG_SPECULATIVE Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 05/22] mm: Introduce pte_spinlock " Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 06/22] mm: VMA sequence count Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 07/22] mm: Protect VMA modifications using " Laurent Dufour
2017-10-26 10:18   ` Andrea Arcangeli [this message]
2017-11-02 15:16     ` Laurent Dufour
2017-11-02 17:25       ` Laurent Dufour
2017-11-02 20:08         ` Andrea Arcangeli
2017-11-06  9:47           ` Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 08/22] mm: RCU free VMAs Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 09/22] mm: Cache some VMA fields in the vm_fault structure Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 10/22] mm: Protect SPF handler against anon_vma changes Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 11/22] mm/migrate: Pass vm_fault pointer to migrate_misplaced_page() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 12/22] mm: Introduce __lru_cache_add_active_or_unevictable Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 13/22] mm: Introduce __maybe_mkwrite() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 14/22] mm: Introduce __vm_normal_page() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 15/22] mm: Introduce __page_add_new_anon_rmap() Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 16/22] mm: Provide speculative fault infrastructure Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 17/22] mm: Try spin lock in speculative path Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 18/22] mm: Adding speculative page fault failure trace events Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 19/22] perf: Add a speculative page fault sw event Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 20/22] perf tools: Add support for the SPF perf event Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 21/22] x86/mm: Add speculative pagefault handling Laurent Dufour
2017-10-11 13:52 ` [PATCH v5 22/22] powerpc/mm: Add speculative page fault Laurent Dufour
2017-10-26  8:14   ` [v5,22/22] " kemi
2017-11-02 14:11     ` Laurent Dufour
2017-11-06 10:27       ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171026101833.GF563@redhat.com \
    --to=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=bsingharora@gmail.com \
    --cc=dave@stgolabs.net \
    --cc=haren@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=jack@suse.cz \
    --cc=khandual@linux.vnet.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=ldufour@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox