From: Jianguo Wu <wujianguo@huawei.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	linux-mm@kvack.org, qiuxishi <qiuxishi@huawei.com>,
	Hush Bensen <hush.bensen@gmail.com>
Subject: Re: Transparent Hugepage impact on memcpy
Date: Wed, 5 Jun 2013 10:49:15 +0800
Message-ID: <51AEA72B.5070707@huawei.com>
In-Reply-To: <20130604202017.GJ3463@redhat.com>

Hi Andrea,

Thanks for your patient explanation :). Please see below.

On 2013/6/5 4:20, Andrea Arcangeli wrote:

> Hello everyone,
> 
> On Tue, Jun 04, 2013 at 08:30:51PM +0800, Wanpeng Li wrote:
>> On Tue, Jun 04, 2013 at 04:57:57PM +0800, Jianguo Wu wrote:
>>> Hi all,
>>>
>>> I tested memcpy with perf bench and found that, in the prefault case, memcpy has
>>> worse performance when Transparent Hugepage is on.
>>>
>>> With THP on it is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec (with prefault).
>>>
>>
>> I get a similar result to yours against 3.10-rc4, see the attachment. This
>> is due to the characteristic of THP: it takes a single page fault for each
>> 2MB virtual region touched by userland.
> 
> I had a look at what prefault does and page faults should not be
> involved in the measurement of GB/sec. The "stats" also include the
> page faults, but the page fault time is not part of the printed GB/sec
> if "-o" is used.

Agreed.
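
For reference, this is how I understand the two modes (the long option names are
my reading of perf bench's flags around 3.10 and may differ between versions):

  # -o / --only-prefault: touch src/dst once first, so the timed copy
  # excludes the page faults
  ./perf bench mem memcpy -l 1gb -o

  # -n / --no-prefault: no warm-up pass, so the timed copy includes
  # the page faults
  ./perf bench mem memcpy -l 1gb -n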

> 
> If the perf test is correct, it looks more like a hardware issue with
> memcpy and large TLBs than a software one. memset doesn't exhibit it;
> if this were something fundamental, memset should also exhibit it. It

Yes, I tested memset with perf bench; it's a little faster with THP:
THP:    6.458863 GB/Sec (with prefault)
NO-THP: 6.393698 GB/Sec (with prefault)
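
Something like the following should reproduce this, assuming perf bench mem memset
accepts the same -l/-o options as memcpy and THP is toggled globally via sysfs:

  # THP on
  echo always > /sys/kernel/mm/transparent_hugepage/enabled
  ./perf bench mem memset -l 1gb -o

  # THP off
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  ./perf bench mem memset -l 1gb -o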

> shall be possible to reproduce this with hugetlbfs in fact... if you
> want to be 100% sure it's not software, you should try that.
> 

Yes, I got the following result:
hugetlb:    2.518822 GB/Sec	(with prefault)
no-hugetlb: 3.688322 GB/Sec	(with prefault)
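
For the hugetlb case, one way to back the benchmark's buffers with hugepages is
via libhugetlbfs; only a sketch, the mount point and page count here are arbitrary:

  # reserve 2MB hugepages and mount hugetlbfs
  echo 1024 > /proc/sys/vm/nr_hugepages
  mount -t hugetlbfs none /mnt/huge

  # back malloc'd memory with hugepages for the benchmark
  LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes \
      ./perf bench mem memcpy -l 1gb -o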

> Chances are there's enough prefetching going on in the CPU to
> optimize for those 4k TLB loads in streaming copies, and the
> pagetables are also cached very nicely with streaming copies. Maybe
> large TLBs somewhere are less optimized for streaming copies. Only
> something smarter happening in the CPU, optimized for 4k and not yet
> for 2M TLBs, can explain this: if the CPU were equally intelligent it
> should definitely be faster with THP on even with "-o".
> 
> Overall I doubt there's anything in software to fix here.
> 
> Also note, this is not related to the additional cache usage during page
> faults that I mentioned in the pdf. Page faults, and cache effects in
> the page faults, are completely removed from the equation because of
> "-o". The prefault pass eliminates the page faults and trashes
> all the cache (regardless of whether the page fault uses non-temporal
> stores or not) before the "measured" memcpy load starts.
> 

Test results from perf stat show a significant reduction in cache-references and cache-misses
when THP is off. How can this be explained?
	cache-misses	cache-references
THP:	35455940	66267785
NO-THP: 16920763	17200000
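
The numbers above come from perf's generic cache events, with an invocation along
these lines:

  perf stat -e cache-references,cache-misses \
      ./perf bench mem memcpy -l 1gb -o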

> I don't think this is a major concern; as a quick proof you just
> need to prefix the "perf" command with "time" to see it: the THP

I tested with "time ./perf bench mem memcpy -l 1gb -o", and the result is
consistent with your expectation.

THP:
       3.629896 GB/Sec (with prefault)

real	0m0.849s
user	0m0.472s
sys	0m0.372s

NO-THP:
       6.169184 GB/Sec (with prefault)

real	0m1.013s
user	0m0.412s
sys	0m0.596s

> version still completes much faster even though the prefault part of it
> is slightly slower with THP on.
> 

Why is the prefault part slower with THP on?
perf bench shows that without prefault, THP on is much faster:

# ./perf bench mem memcpy -l 1gb -n
THP:    1.759009 GB/Sec
NO-THP: 1.291761 GB/Sec

Thanks again for your explanation.

Jianguo Wu.

> THP pays off the most during computations that access memory randomly,
> not sequentially.
> 
> Thanks,
> Andrea
> 
> 


