From: Jianguo Wu <wujianguo@huawei.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>,
linux-mm@kvack.org, qiuxishi <qiuxishi@huawei.com>,
Hush Bensen <hush.bensen@gmail.com>
Subject: Re: Transparent Hugepage impact on memcpy
Date: Wed, 5 Jun 2013 10:49:15 +0800 [thread overview]
Message-ID: <51AEA72B.5070707@huawei.com> (raw)
In-Reply-To: <20130604202017.GJ3463@redhat.com>
Hi Andrea,
Thanks for your patient explanation:). Please see below.
On 2013/6/5 4:20, Andrea Arcangeli wrote:
> Hello everyone,
>
> On Tue, Jun 04, 2013 at 08:30:51PM +0800, Wanpeng Li wrote:
>> On Tue, Jun 04, 2013 at 04:57:57PM +0800, Jianguo Wu wrote:
>>> Hi all,
>>>
>>> I tested memcpy with perf bench, and found that in the prefault case, when
>>> Transparent Hugepage is on, memcpy has worse performance.
>>>
>>> With THP on it is 3.672879 GB/Sec (with prefault), while with THP off it is 6.190187 GB/Sec (with prefault).
>>>
>>
>> I get a similar result to yours against 3.10-rc4, see the attachment. This
>> is due to the characteristic that THP takes a single page fault for each
>> 2MB virtual region touched by userland.
>
> I had a look at what prefault does, and page faults should not be
> involved in the measurement of GB/sec. The "stats" also include the
> page faults, but the page faults are not part of the printed GB/sec if
> "-o" is used.
Agreed.
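For anyone reading along, my mental model of what "-o" measures is roughly
the pattern below (just a minimal sketch of my own understanding, not the
actual perf bench source): the first copy takes all the page faults, and
only the second copy is timed.

	/* Sketch of the prefault-then-measure pattern I assume "-o" uses;
	 * not the actual perf bench code. Needs ~2GB of free memory. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/time.h>

	#define LEN (1UL << 30)		/* 1 GB */

	int main(void)
	{
		char *src = malloc(LEN);
		char *dst = malloc(LEN);
		struct timeval t0, t1;
		double secs;

		if (!src || !dst)
			return 1;

		memset(src, 1, LEN);	/* populate the source pages */
		memcpy(dst, src, LEN);	/* prefault pass: all faults happen here */

		gettimeofday(&t0, NULL);
		memcpy(dst, src, LEN);	/* measured pass: no faults left */
		gettimeofday(&t1, NULL);

		secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
		printf("%f GB/Sec (with prefault)\n", 1.0 / secs); /* 1 GB copied */
		return 0;
	}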
>
> If the perf test is correct, it looks more like a hardware issue with
> memcpy and large TLBs than a software one. memset doesn't exhibit it;
> if this were something fundamental, memset should also exhibit it. It
Yes, I tested memset with perf bench; it's a little faster with THP:
THP: 6.458863 GB/Sec (with prefault)
NO-THP: 6.393698 GB/Sec (with prefault)
> shall be possible to reproduce this with hugetlbfs in fact... if you
> want to be 100% sure it's not software, you should try that.
>
Yes, I got the following result:
hugetlb: 2.518822 GB/Sec (with prefault)
no-hugetlb: 3.688322 GB/Sec (with prefault)
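In case someone wants to reproduce the hugetlbfs case outside of perf, one
way (only a sketch, and it assumes enough 2MB pages are reserved via
/proc/sys/vm/nr_hugepages) is to reuse the prefault-then-measure loop above
with the buffers backed by MAP_HUGETLB:

	#include <sys/mman.h>

	/* Back src/dst with 2MB pages; plug this into the sketch above
	 * in place of malloc(). Requires reserved huge pages. */
	static void *huge_alloc(size_t len)
	{
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		return p == MAP_FAILED ? NULL : p;
	}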
> Chances are there's enough pre-fetching going on in the CPU to
> optimize for those 4k tlb loads in streaming copies, and the
> pagetables are also cached very nicely with streaming copies. Maybe
> large TLBs somewhere are less optimized for streaming copies. Only
> something smarter happening in the CPU optimized for 4k and not yet
> for 2M TLBs can explain this: if the CPU was equally intelligent it
> should definitely be faster with THP on even with "-o".
>
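By the way, if it helps isolate the TLB-size effect without flipping the
global THP setting between runs, I think the same comparison can be done
per-mapping with madvise() (again only a sketch, assuming the global setting
allows madvise control):

	#include <sys/mman.h>

	/* Allocate anonymous memory and ask for 2MB pages (MADV_HUGEPAGE)
	 * or force 4KB pages (MADV_NOHUGEPAGE) before first touch, so both
	 * variants can be timed in a single run. */
	static void *anon_alloc(size_t len, int advice)
	{
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return NULL;
		madvise(p, len, advice);
		return p;
	}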
> Overall I doubt there's anything in software to fix here.
>
> Also note, this is not related to the additional cache usage during
> page faults that I mentioned in the pdf. Page faults or cache effects
> in the page faults are completely removed from the equation because of
> "-o". The prefault pass eliminates the page faults and trashes away
> all the cache (regardless of whether the page fault uses non-temporal
> stores or not) before the "measured" memcpy load starts.
>
Test results from perf stat show a significant reduction in cache-references
and cache-misses when THP is off. How can this be explained?

          cache-misses    cache-references
THP:      35455940        66267785
NO-THP:   16920763        17200000
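(The counters were gathered with something like
"perf stat -e cache-references,cache-misses ./perf bench mem memcpy -l 1gb -o",
so they cover the whole run, prefault pass included.)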
> I don't think this is a major concern; as a rule of thumb you just
> need to prefix the "perf" command with "time" to see it: the THP
I test with "time ./perf bench mem memcpy -l 1gb -o", and the result is
consistent with your expect.
THP:
3.629896 GB/Sec (with prefault)
real 0m0.849s
user 0m0.472s
sys 0m0.372s
NO-THP:
6.169184 GB/Sec (with prefault)
real 0m1.013s
user 0m0.412s
sys 0m0.596s
> version still completes much faster even though the prefault part of
> it is slightly slower with THP on.
>
Why is the prefault part slower with THP on?
perf bench shows that with no prefault, THP on is much faster:
# ./perf bench mem memcpy -l 1gb -n
THP: 1.759009 GB/Sec
NO-THP: 1.291761 GB/Sec
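My guess for the no-prefault case is simply the fault count: as Wanpeng said,
THP takes one fault per 2MB region, so copying 1GB faults in about
1GB / 2MB = 512 destination regions versus about 1GB / 4KB = 262,144 with 4KB
pages, and without "-o" that fault overhead is part of the measured time.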
Thanks again for your explanation.
Jianguo Wu.
> THP pays off the most during computations that are accessing randomly,
> and not sequentially.
>
> Thanks,
> Andrea
>
>