Re: [RFC][PATCH 5/7] create __remove_mapping_batch()

From: Dave Hansen <dave@sr71.net>
To: Mel Gorman <mgorman@suse.de>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, tim.c.chen@linux.intel.com
Subject: Re: [RFC][PATCH 5/7] create __remove_mapping_batch()
Date: Thu, 16 May 2013 10:14:43 -0700
Message-ID: <51951403.6030605@sr71.net>
In-Reply-To: <20130514155117.GW11497@suse.de>

On 05/14/2013 08:51 AM, Mel Gorman wrote:
> The same comments I had before about potentially long page lock hold
> times still apply at this point. Andrew's concerns about the worst-case
> scenario where no adjacent page on the LRU has the same mapping also
> still apply. Is there any noticeable overhead with his suggested
> workload of a single-threaded process that opens files, touching one
> page in each file, until reclaim starts?

This is an attempt to address some of Andrew's concerns from here:

 	http://lkml.kernel.org/r/20120912122758.ad15e10f.akpm@linux-foundation.org

The executive summary: this does cause a small increase in CPU time in
__remove_mapping_batch().  But it *is* small, and it comes with a
throughput increase.

Test #1:

1. My goal here was to create an LRU with as few adjacent pages in the
   same file as possible.
2. Using lots of small files turned out to be a pain in the butt just
   because I needed to create tens of thousands of them.
3. I ended up writing a program that does:
	for (offset = 0; offset < somenumber; offset += PAGE_SIZE)
		for_each_file(f)
			read(f, offset)...
4. This was sitting in a loop where the working set of my file reads was
   slightly larger than the total amount of memory, so we were
   effectively evicting page cache with streaming reads.
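
The loop in step 3 could be sketched in userspace C roughly as follows. This is only an illustration, not the actual test program: read_round_robin(), fds, nr_files and max_offset are hypothetical names, and it assumes the files are already open and at least max_offset bytes long.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one page from every file before moving on to the next offset, so
 * that consecutive page cache insertions come from different files.
 * All names here are hypothetical, not taken from the original program.
 * Returns the total number of bytes read, or -1 on error.
 */
static long long read_round_robin(const int *fds, size_t nr_files,
				  off_t max_offset)
{
	long page_size = sysconf(_SC_PAGESIZE);
	long long total = 0;
	char *buf;

	assert(fds != NULL && nr_files > 0);
	assert(page_size > 0);

	buf = malloc((size_t)page_size);
	if (!buf)
		return -1;

	for (off_t offset = 0; offset < max_offset; offset += page_size) {
		for (size_t i = 0; i < nr_files; i++) {
			/* pread() leaves the file position untouched */
			ssize_t n = pread(fds[i], buf, (size_t)page_size,
					  offset);
			if (n < 0) {
				free(buf);
				return -1;
			}
			total += n;
		}
	}

	free(buf);
	return total;
}
```

To get the eviction behavior described in step 4, the caller would pick nr_files and max_offset so that nr_files * max_offset is slightly larger than available memory, and call this in a loop.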

Even running the above loop across ~2k files at once, __remove_mapping()
itself isn't CPU intensive in the single-threaded case.  In my testing,
it only shows up at 0.021% of CPU usage.  That went up to 0.036% (and
shifted to __remove_mapping_batch()) with these patches applied.
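
To make the worst case concrete, here is a small sketch that counts how many batches a run of pages splits into. It assumes (as the review comments suggest, but this is my simplification) that a batch can only cover adjacent LRU pages that share a mapping. count_batches() and mapping_ids[] are hypothetical stand-ins, not code from the patches.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the batches formed when a batch may only hold consecutive pages
 * with the same mapping. mapping_ids[] is a hypothetical stand-in for
 * page->mapping of each page, in LRU order.
 */
static size_t count_batches(const int *mapping_ids, size_t nr_pages)
{
	size_t batches = 0;

	assert(mapping_ids != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		/* A new batch starts whenever the mapping changes. */
		if (i == 0 || mapping_ids[i] != mapping_ids[i - 1])
			batches++;
	}
	return batches;
}
```

With a streaming read of one file, six pages form a single batch. With the round-robin reads used in this test, every page starts a new batch, so the batching code does its extra bookkeeping without saving any lock acquisitions.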

In any case, there are no showstoppers here.  We're looking at
differences on the scale of 0.01% of CPU time.

    sample    %
     delta   change
    ------   ------
       462     2.7% ata_scsi_queuecmd
       194     0.1% default_idle
        59   999.9% __remove_mapping_batch
        54   490.9% prepare_to_wait
        41   585.7% rcu_process_callbacks
       -32   -49.2% blk_queue_bio
       -35  -100.0% __remove_mapping
       -38   -33.6% generic_file_aio_read
       -41   -68.3% mix_pool_bytes.constprop.0
       -48   -11.9% __wake_up
       -53   -66.2% copy_user_generic_string
       -75    -8.4% finish_task_switch
       -79   -53.4% cpu_startup_entry
       -87   -15.9% blk_end_bidi_request
      -109   -14.3% scsi_request_fn
      -172    -3.6% __do_softirq

Test #2:

The second test I did was a single-threaded dd.  I did a 4GB dd over and
over with just barely less than 4GB of memory available.  This was the
test that we would expect to hurt us in the single-threaded case since
we spread out accesses to 'struct page' over time and have less cache
warmth.  The total disk throughput (as reported by vmstat) actually went
_up_ 6% in this case with these patches.

Here are the relevant bits grepped out of 'perf report' during the dd:

> -------- perf.vanilla.data ----------
>      3.75%         swapper  [kernel.kallsyms]     [k] intel_idle                                
>      2.83%              dd  [kernel.kallsyms]     [k] put_page                                  
>      1.30%         kswapd0  [kernel.kallsyms]     [k] __ticket_spin_lock                        
>      1.05%              dd  [kernel.kallsyms]     [k] __ticket_spin_lock                        
>      1.04%         kswapd0  [kernel.kallsyms]     [k] shrink_page_list                          
>      0.38%         kswapd0  [kernel.kallsyms]     [k] __remove_mapping                          
>      0.34%         kswapd0  [kernel.kallsyms]     [k] put_page                                  
> -------- perf.patched.data ----------
>      4.47%          swapper  [kernel.kallsyms]           [k] intel_idle                                               
>      2.02%               dd  [kernel.kallsyms]           [k] put_page                                                 
>      1.55%               dd  [kernel.kallsyms]           [k] __ticket_spin_lock                                       
>      1.21%          kswapd0  [kernel.kallsyms]           [k] shrink_page_list                                         
>      0.97%          kswapd0  [kernel.kallsyms]           [k] __ticket_spin_lock                                       
>      0.43%          kswapd0  [kernel.kallsyms]           [k] put_page                                                 
>      0.36%          kswapd0  [kernel.kallsyms]           [k] __remove_mapping                                         
>      0.28%          kswapd0  [kernel.kallsyms]           [k] __remove_mapping_batch                 

And the same functions from 'perf diff':

>              +4.47%  [kernel.kallsyms]           [k] intel_idle                                               
>      3.22%   -0.77%  [kernel.kallsyms]           [k] put_page                                                 
>              +1.21%  [kernel.kallsyms]           [k] shrink_page_list                                         
>              +0.36%  [kernel.kallsyms]           [k] __remove_mapping                                         
>              +0.28%  [kernel.kallsyms]           [k] __remove_mapping_batch                                   
>      0.39%   -0.39%  [kernel.kallsyms]           [k] __remove_mapping                                         
>      1.04%   -1.04%  [kernel.kallsyms]           [k] shrink_page_list                                         
>      3.68%   -3.68%  [kernel.kallsyms]           [k] intel_idle                                    

1. Idle time goes up by quite a bit, probably because we hold the page
   locks for longer and cause more sleeping on them.
2. put_page() got substantially cheaper, probably because we are now
   doing all the put_page()s closer to each other.
3. __remove_mapping_batch() is definitely costing us CPU, and it is not
   directly saving it anywhere else (shrink_page_list(), for example,
   also gets a bit worse).
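
The per-symbol comparison behind points 2 and 3 can be reproduced by summing the 'perf report' rows quoted above across tasks. The sketch below does that; struct perf_row and symbol_total() are hypothetical helpers, and the totals only cover the rows quoted in this mail, so they can differ slightly from the 'perf diff' baseline column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct perf_row {
	const char *comm;	/* task name, e.g. "dd" or "kswapd0" */
	const char *symbol;
	double percent;
};

/* Rows copied from the perf.vanilla.data excerpt above. */
static const struct perf_row vanilla[] = {
	{ "dd",      "put_page",         2.83 },
	{ "kswapd0", "put_page",         0.34 },
	{ "kswapd0", "__remove_mapping", 0.38 },
};

/* Rows copied from the perf.patched.data excerpt above. */
static const struct perf_row patched[] = {
	{ "dd",      "put_page",               2.02 },
	{ "kswapd0", "put_page",               0.43 },
	{ "kswapd0", "__remove_mapping",       0.36 },
	{ "kswapd0", "__remove_mapping_batch", 0.28 },
};

/* Sum the CPU share of one symbol across all tasks in a report. */
static double symbol_total(const struct perf_row *rows, size_t nr_rows,
			   const char *symbol)
{
	double total = 0.0;

	assert(rows != NULL || nr_rows == 0);

	for (size_t i = 0; i < nr_rows; i++) {
		if (strcmp(rows[i].symbol, symbol) == 0)
			total += rows[i].percent;
	}
	return total;
}
```

Summed this way, put_page() drops from about 3.17% to 2.45%, while __remove_mapping() plus __remove_mapping_batch() grows from 0.38% to about 0.64%.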

