From: Ning Qu <quning@google.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Hugh Dickins <hughd@google.com>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, Matthew Wilcox <willy@linux.intel.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Hillf Danton <dhillf@gmail.com>, Dave Hansen <dave@sr71.net>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()
Date: Tue, 1 Oct 2013 10:11:00 -0700 [thread overview]
Message-ID: <CACz4_2e+oU-7CDp8mzYOKXCtrA+EKN_bkQUDF4Yb7GyJ_todxQ@mail.gmail.com> (raw)
In-Reply-To: <20131001083828.GA8093@suse.de>
I can throw in some numbers for one of the test cases I am working on.
One of the workloads uses sysv shm to load GB-scale files into memory,
which is then shared with other worker processes for the long term. We
load as many of these files as fit in the available physical memory.
The heap is also large (GB-scale as well) to handle that data.
For the workload just mentioned, thp gives us about an 8% performance
improvement: 5% from thp anonymous memory and 3% from thp page cache.
It might not look like much, but it's pretty good for not changing a
single line of application code, which is the beauty of thp.
Before that we were using hugetlbfs, which meant reserving a huge
amount of memory at boot time, whether or not that memory would ever
be used. It worked, but no other major service could share the
server's resources anymore.
Best wishes,
--
Ning Qu (曲宁) | Software Engineer | quning@google.com | +1-408-418-6066
On Tue, Oct 1, 2013 at 1:38 AM, Mel Gorman <mgorman@suse.de> wrote:
> On Mon, Sep 30, 2013 at 11:51:06AM -0700, Andi Kleen wrote:
>> > AFAIK, this is not a problem in the vast majority of modern CPUs
>>
>> Let's do some simple math: e.g. a Sandy Bridge system has 512 4K iTLB L2 entries.
>> That's around 2MB. There's more and more code whose footprint exceeds
>> that.
>>
>
> With an expectation that it is read-mostly data, replicated between the
> caches accessing it and TLB refills taking very little time. This is not
> universally true and there are exceptions but even recent papers on TLB
> behaviour have tended to dismiss the iTLB refill overhead as a negligible
> portion of the overall workload of interest.
>
>> Besides iTLB is not the only target. It is also useful for
>> data of course.
>>
>
> True, but how useful? I have not seen an example of a workload showing that
> dTLB pressure on file-backed data was a major component of the workload. I
> would expect that sysV shared memory is an exception but does that require
> generic support for all filesystems or can tmpfs be special cased when
> it's used for shared memory?
>
> For normal data, if it's read-only data then there would be some benefit to
> using huge pages once the data is in page cache. How common are workloads
> that mmap() large amounts of read-only data? Possibly some databases
> depending on the workload although there I would expect that the data is
> placed in shared memory.
>
> If the mmap()ed data is being written then the cost of IO is likely to
> dominate, not TLB pressure. For write-mostly workloads there are greater
> concerns because dirty tracking can only be done at the huge page boundary
> potentially leading to greater amounts of IO and degraded performance
> overall.
>
> I could be completely wrong here but these were the concerns I had when
> I first glanced through the patches. The changelogs had no information
> to convince me otherwise so I never dedicated the time to reviewing the
> patches in detail. I raised my concerns and then dropped it.
>
>> > > and I found it very hard to be motivated to review the series as a result.
>> > > I suspected that in many cases that the cost of IO would continue to dominate
>> > > performance instead of TLB pressure
>>
>> The trend is to larger and larger memories, keeping things in memory.
>>
>
> Yes, but using huge pages is not *necessarily* the answer. For fault
> scalability it would probably be a lot easier to batch-handle faults if
> readahead indicates accesses are sequential. Background zeroing of pages
> could be revisited for fault-intensive workloads. A potential alternative
> is to allocate a contiguous page, zero it as one lump, split it into base
> pages and put them on a local per-task list, although the details get
> messy. Reclaim scanning could be heavily modified to use collections of
> pages instead of single pages (although I'm not aware of the proper
> design of such a thing).
>
> Again, this could be completely off the mark but if it was me that was
> working on this problem, I would have some profile data from some workloads
> to make sure the part I'm optimising was a noticeable percentage of the
> workload and included that in the patch leader. I would hope that the data
> was compelling enough to convince reviewers to pay close attention to the
> series as the complexity would then be justified. Based on how complex THP
> was for anonymous pages, I would be tempted to treat THP for file-backed
> data as a last resort.
>
>> In fact there's a good argument that memory sizes are growing faster
>> than TLB capacities. And without large TLBs we're even further off
>> the curve.
>>
>
> I'll admit this is also true. It was considered to be true in the 90's
> when huge pages were first being thrown around as a possible solution to
> the problem. One paper recently suggested using segmentation for large
> memory segments but the workloads they examined looked like they would
> be dominated by anonymous access, not file-backed data with one exception
> where the workload frequently accessed compile-time constants.
>
> --
> Mel Gorman
> SUSE Labs