From: Nick Piggin <npiggin@suse.de>
To: "Török Edwin" <edwintorok@gmail.com>
Cc: Mike Waychison <mikew@google.com>, Ying Han <yinghan@google.com>,
	Ingo Molnar <mingo@elte.hu>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Rohit Seth <rohitseth@google.com>,
	Hugh Dickins <hugh@veritas.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY
Date: Thu, 27 Nov 2008 13:39:26 +0100
Message-ID: <20081127123926.GN28285@wotan.suse.de>
In-Reply-To: <492E90BC.1090208@gmail.com>

On Thu, Nov 27, 2008 at 02:21:16PM +0200, Torok Edwin wrote:
> On 2008-11-27 14:03, Nick Piggin wrote:
> >> Running my testcase shows no significant performance difference. What am
> >> I doing wrong?
> >>     
> >  
> > Software may just be doing a lot of mmap/munmap activity. threads +
> > mmap is never going to be pretty because it is always going to involve
> > broadcasting tlb flushes to other cores... Software writers shouldn't
> > be scared of using processes (possibly with some shared memory).
> >   
> 
> It would be interesting to compare the performance of a threaded clamd
> with that of a clamd that uses multiple processes.
> Distributing tasks would be a bit trickier, since it would need to use
> IPC instead of mutexes and condition variables.

Yes, although you could use PTHREAD_PROCESS_SHARED pthread mutexes on
the shared memory, I believe (though I've never tried it myself).
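
For instance, an untested sketch of what I mean (shared_mutex_create()
is just an illustrative name, and error handling is mostly trimmed):

#include <pthread.h>
#include <sys/mman.h>

static pthread_mutex_t *shared_mutex_create(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t *mtx;

	/* Put the mutex in a shared anonymous mapping so it is visible
	 * to child processes after fork(). */
	mtx = mmap(NULL, sizeof(*mtx), PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mtx == MAP_FAILED)
		return NULL;

	pthread_mutexattr_init(&attr);
	/* The crucial bit: mark the mutex as usable across processes. */
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	return mtx;
}

Processes that inherit (or map) that memory can then lock and unlock
the mutex exactly as threads would.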

 
> > Actually, a lot of things get faster (like malloc, or file descriptor
> > operations) because locks aren't needed.
> >
> > Despite common perception, processes are actually much *faster* than
> > threads when doing common operations like these. They are slightly slower
> > sometimes with things like creation and exit, or context switching, but
> > if you're doing huge numbers of those operations, then it is unlikely
> > to be a performance critical app... :)
> >   
> 
> How about distributing tasks to a set of workers? Is the overhead of
> using IPC instead of mutexes/cond variables acceptable?

It is really going to depend on a lot of things: what is involved in
distributing the tasks, how many cores there are, the cache/TLB
architecture of the system it is running on, and so on.

You want to distribute as much work as possible while touching as
little memory as possible, in general.

But if you're distributing work across cores, and the shared caches are
physically tagged (which I think is the case on all x86 CPUs), then
multiple processes should be able to operate on shared memory just as
efficiently as multiple threads.

And then you also get the advantages of reduced contention on other
shared locks and resources.
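
To make that concrete, here is an untested sketch of the sort of thing
I mean: a trivial work queue living in shared memory, fed through a
process-shared mutex/condvar pair instead of pipes or sockets (the
struct layout and the fixed-size ring are just for illustration):

#include <pthread.h>
#include <sys/mman.h>

#define RING_SIZE 64

struct queue {
	pthread_mutex_t lock;
	pthread_cond_t more;
	unsigned int head, tail;
	int items[RING_SIZE];
};

static struct queue *queue_create(void)
{
	pthread_mutexattr_t ma;
	pthread_condattr_t ca;
	struct queue *q;

	/* One shared mapping holds the ring and its synchronization. */
	q = mmap(NULL, sizeof(*q), PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (q == MAP_FAILED)
		return NULL;

	pthread_mutexattr_init(&ma);
	pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&q->lock, &ma);

	pthread_condattr_init(&ca);
	pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
	pthread_cond_init(&q->more, &ca);

	q->head = q->tail = 0;
	return q;
}

static void queue_push(struct queue *q, int item)
{
	pthread_mutex_lock(&q->lock);
	q->items[q->tail++ % RING_SIZE] = item; /* no full check, for brevity */
	pthread_cond_signal(&q->more);
	pthread_mutex_unlock(&q->lock);
}

static int queue_pop(struct queue *q) /* called by the forked workers */
{
	int item;

	pthread_mutex_lock(&q->lock);
	while (q->head == q->tail) /* empty */
		pthread_cond_wait(&q->more, &q->lock);
	item = q->items[q->head++ % RING_SIZE];
	pthread_mutex_unlock(&q->lock);
	return item;
}

fork() the workers after queue_create() and they can block directly in
queue_pop(), with no socket or pipe in the path.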

 
> > (end rant; sorry, that may not have been helpful to your immediate problem,
> > but we need to be realistic about what complexity we are going to add where
> > in the kernel in order to speed things up. And we need to steer userspace
> > away from problems that are fundamentally hard and not going to get easier
> > with trends -- like virtual address activity with multiple threads)
> >   
> 
> I understand that mmap() is not scalable; however, look at
> http://lkml.org/lkml/2008/9/12/185: even fopen/fdopen does
> an (anonymous) mmap internally.

Well, I guess that would be all the more reason to avoid threads (and
things like fopen/fdopen fundamentally have to be synchronized between
threads regardless of whether they use mmap() or not, so on any OS you
are going to see a win from avoiding threaded code that uses
fopen/fdopen).


> That does not affect performance that much, since the overhead of a
> file-backed mmap + page faults is higher.
> Rewriting libclamav to not use mmap() would take a significant amount of
> time; however, I will try to avoid using mmap() in new code (and prefer
> pread/read instead).
> 
> Also, clamd is a CPU-bound application [given fast enough disks ;)], and
> having to wait for mmap_sem prevents it from doing "real work".
> Most of the time it reads files from /tmp, which should either be in the
> page cache or (in my case) always in RAM (I use tmpfs).
> 
> So mmap()ing and reading these files does not involve disk I/O, yet
> threads working with /tmp files still end up waiting for another
> thread's disk I/O to complete, because they have to wait on mmap_sem
> (held by that thread).

Yeah, it's costly. Even if it didn't take mmap_sem, it would still need
to broadcast TLB invalidates across the machine, so it would probably go
even faster if it weren't threaded and/or didn't use mmap/munmap so
heavily.
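
For what it's worth, the pread()-based approach Edwin mentions above
could look something like this untested sketch (scan_fd() and the
buffer size are just placeholders). It avoids the mmap/munmap cycle
entirely, so there is no unmap-time TLB shootdown and far less
mmap_sem traffic:

#include <unistd.h>

static int scan_fd(int fd)
{
	char buf[16 * 1024];
	off_t off = 0;
	ssize_t n;

	/* Read the file in chunks instead of mapping it. */
	while ((n = pread(fd, buf, sizeof(buf), off)) > 0) {
		/* ... run the pattern matcher over buf[0..n) ... */
		off += n;
	}
	return n < 0 ? -1 : 0;	/* 0 on clean EOF */
}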


> >> ...............................................................................
> >>
> >> class name      con-bounces contentions wait-min wait-max  wait-total acq-bounces acquisitions hold-min hold-max  hold-total
> >>
> >> &sem->wait_lock:     122700      126641     0.42    77.94   125372.37     1779026      7368894     0.27  1099.42  3085559.16
> >>                      ---------------
> >>                      &sem->wait_lock         5943  [<ffffffff8043a768>] __up_write+0x28/0x170
> >>                      &sem->wait_lock         8615  [<ffffffff805ce3ac>] __down_write_nested+0x1c/0xc0
> >>                      &sem->wait_lock        13568  [<ffffffff8043a5a0>] __down_write_trylock+0x20/0x60
> >>                      &sem->wait_lock        49377  [<ffffffff8043a600>] __down_read_trylock+0x20/0x60
> >>                      ---------------
> >>                      &sem->wait_lock         8097  [<ffffffff8043a5a0>] __down_write_trylock+0x20/0x60
> >>                      &sem->wait_lock        31540  [<ffffffff8043a768>] __up_write+0x28/0x170
> >>                      &sem->wait_lock         5501  [<ffffffff805ce3ac>] __down_write_nested+0x1c/0xc0
> >>                      &sem->wait_lock        33342  [<ffffffff8043a600>] __down_read_trylock+0x20/0x60
> >>
> >
> > Interesting. I have some (ancient) patches to make rwsems more scalable
> > under heavy load by reducing contention on this lock. They should really
> > have been merged... Not sure how much it would help, but if you're
> > interested in testing, I could dust them off.
> 
> Sure, I can test patches (preferably against 2.6.28-rc6-tip).

OK, I'll see if I can find them (am overseas at the moment, and I suspect
they are stranded on some stationary rust back home, but I might be able
to find them on the web).


