linux-mm.kvack.org archive mirror
From: "Huang, Ying" <ying.huang@intel.com>
To: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	tim.c.chen@intel.com, dave.hansen@intel.com,
	andi.kleen@intel.com, aaron.lu@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Hugh Dickins <hughd@google.com>,
	Shaohua Li <shli@kernel.org>, Rik van Riel <riel@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Vladimir Davydov <vdavydov@virtuozzo.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH -v3 00/10] THP swap: Delay splitting THP during swapping out
Date: Tue, 13 Sep 2016 16:53:49 +0800
Message-ID: <87vay0ji3m.fsf@yhuang-mobile.sh.intel.com>
In-Reply-To: <20160913070524.GA4973@bbox> (Minchan Kim's message of "Tue, 13 Sep 2016 16:05:24 +0900")

Minchan Kim <minchan@kernel.org> writes:
> On Tue, Sep 13, 2016 at 02:40:00PM +0800, Huang, Ying wrote:
>> Minchan Kim <minchan@kernel.org> writes:
>> 
>> > Hi Huang,
>> >
>> > On Fri, Sep 09, 2016 at 01:35:12PM -0700, Huang, Ying wrote:
>> >
>> > < snip >
>> >
>> >> >> Recently, the performance of storage devices has improved so fast that
>> >> >> we cannot saturate the disk bandwidth when doing page swap out, even on
>> >> >> a high-end server machine, because storage device performance has
>> >> >> improved faster than CPU performance.  It seems that this trend will
>> >> >> not change in the near future.  On the other hand, THP is becoming
>> >> >> more and more popular because of increased memory sizes.  So it
>> >> >> becomes necessary to optimize THP swap performance.
>> >> >> 
>> >> >> The advantages of the THP swap support include:
>> >> >> 
>> >> >> - Batch the swap operations for the THP to reduce lock
>> >> >>   acquiring/releasing, including allocating/freeing the swap space,
>> >> >>   adding/deleting to/from the swap cache, and writing/reading the swap
>> >> >>   space, etc.  This will help improve the performance of the THP swap.
>> >> >> 
>> >> >> - The THP swap space read/write will be 2M sequential IO.  It is
>> >> >>   particularly helpful for swap reads, which are usually 4k random
>> >> >>   IO.  This will improve the performance of the THP swap too.
>> >> >> 
>> >> >> - It will help reduce memory fragmentation, especially when THP is
>> >> >>   heavily used by the applications.  The 2M of contiguous pages will
>> >> >>   be freed up after the THP is swapped out.
>> >> >
>> >> > I have just read the patchset and still doubt why all the changes
>> >> > should be coupled so tightly with THP.  Many parts (e.g., where you
>> >> > introduced new functions or modified existing ones to make them THP
>> >> > specific) could just take a page_list and the number of pages and
>> >> > then handle them without THP awareness.
>> >> 
>> >> I am glad if my change could help normal pages swapping too.  And we can
>> >> change these functions to work for normal pages when necessary.
>> >
>> > Sure, but it would be less painful if THP-aware swapout were
>> > based on multiple normal pages swapout.  For example, we don't
>> > touch the delayed THP split part (i.e., we split a THP into 512 pages
>> > as is done now) and enhance swapout further, like Tim's suggestion
>> > for multiple normal pages swapout.  With that, it might be enough
>> > for fast storage without needing THP awareness.
>> >
>> > My *point* is let's approach step by step.
>> > First of all, go with batching normal pages swapout and if it's
>> > not enough, dive into further optimization like introducing
>> > THP-aware swapout.
>> >
>> > I believe it's a natural development process to evolve things
>> > without over-engineering.
>> 
>> My target is not only the THP swap out acceleration, but also the full
>> THP swap out/in support without splitting THP.  This patchset is just
>> the first step of the full THP swap support.
>> 
>> >> > For example, if nr_pages is larger than SWAPFILE_CLUSTER, we
>> >> > can try to allocate a new cluster.  With that, we could allocate new
>> >> > clusters to meet the nr_pages requested, or bail out if we fail to
>> >> > allocate and fall back to order-0 page swapout.  With that, the swap
>> >> > layer could support multiple order-0 pages in a batch.
>> >> >
>> >> > IMO, I really want to land Tim Chen's batching swapout work first.
>> >> > With Tim Chen's work, I expect we can do a better refactoring
>> >> > for batching swap before adding more confusion to the swap layer.
>> >> > (I expect it would share several pieces of code with, or would be the
>> >> > base for, batch allocation of swap cache and swap slots.)
>> >> 
>> >> I don't think there is hard conflict between normal pages swapping
>> >> optimizing and THP swap optimizing.  Some code may be shared between
>> >> them.  That is good for both sides.
>> >> 
>> >> > After that, we could enhance swap for big contiguous batching
>> >> > like THP, and finally we might make it THP aware to
>> >> > enhance it further.
>> >> >
>> >> > One thing I remember you argued: you want to swap in 512 pages
>> >> > all at once unconditionally.  It's really worth discussing whether
>> >> > your design is going that way.
>> >> > I doubt it's generally a good idea, because currently we try to
>> >> > swap in the swapped-out pages of a THP with a conservative approach,
>> >> > but your direction goes the opposite way.
>> >> >
>> >> > [mm, thp: convert from optimistic swapin collapsing to conservative]
>> >> >
>> >> > I think the general approach (i.e., less effective than an
>> >> > implementation targeted at your own specific goal, but less hacky and
>> >> > better for many cases) is to rely on and improve the swap readahead.
>> >> > If most of the subpages of a THP are really in the working set, swap
>> >> > readahead could work well.
>> >> >
>> >> > Yeah, it's fairly vague feedback so sorry if I miss something clear.
>> >> 
>> >> Yes.  I want to go in the direction of swapping in 512 pages together.
>> >> And I think it is a good opportunity to discuss that now.  The advantages
>> >> of swapping in 512 pages together are:
>> >> 
>> >> - Improve the performance of swap-in IO by turning small reads
>> >>   into big reads of 512 pages.
>> >> 
>> >> - Keep THP across swap out/in.  As memory sizes become larger and
>> >>   larger, 4k pages place more and more burden on memory
>> >>   management.  One solution is to use 2M pages as much as possible, which
>> >>   will reduce the management burden greatly, for example by much reducing
>> >>   the length of the LRU lists.
>> >> 
>> >> The disadvantages are:
>> >> 
>> >> - Increase the memory pressure when swapping in a THP.
>> >> 
>> >> - Some pages swapped in may not be needed in the near future.
>> >> 
>> >> Because of the disadvantages, the 512 pages swapping in should be made
>> >> optional.  But I don't think we should make it impossible.
>> >
>> > Yep. No need to make it impossible, but your design shouldn't be coupled
>> > with a feature that does not exist yet.
>> 
>> Sorry, what is the "non-existing feature"?  The full THP swap out/in
>
> THP swapin.
>
> You said you increased the cluster size to fit the THP size in order to
> record some metadata in there for THP swapin.

It is also used to find the head of the THP, so that the whole THP can be
swapped in when an address in the middle of the THP is accessed.
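
To illustrate the lookup (a minimal sketch only, not code from the
patchset): assuming 4k base pages, 2M THPs, and that a swapped-out THP
occupies one whole, aligned 512-slot swap cluster, the head slot can be
derived from the swap offset of any subpage by masking.  The names
HPAGE_NR_SUBPAGES, thp_head_offset() and thp_subpage_index() are
hypothetical and used here for illustration only.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB THPs, as on x86_64. */
#define HPAGE_NR_SUBPAGES 512u	/* hypothetical name: 2 MiB / 4 KiB */

/*
 * Assumption: a THP occupies one whole swap cluster aligned to
 * HPAGE_NR_SUBPAGES slots.  Round a swap offset down to the first
 * slot of that cluster, i.e. the slot backing the THP head page.
 */
static inline uint64_t thp_head_offset(uint64_t swap_offset)
{
	return swap_offset & ~(uint64_t)(HPAGE_NR_SUBPAGES - 1);
}

/* Index of the subpage within the THP that this swap slot backs. */
static inline unsigned int thp_subpage_index(uint64_t swap_offset)
{
	return (unsigned int)(swap_offset & (HPAGE_NR_SUBPAGES - 1));
}
```

Under these assumptions, swap offset 1000 belongs to the THP whose head
slot is 512, and it backs subpage 488 of that THP.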

> You gave numbers showing how badly the current swapout scales, so you are
> trying to enhance that path.  I agree with that, although I don't like
> your approach as a first step.  However, you didn't give any clue why we
> should swap in a THP: how bad is the current conservative swapin from
> khugepaged really, and why can't that be enhanced?
>
>> support without splitting THP?  If so, this patchset is just the
>> first step of that.  I plan to finish the full THP swap out/in
>> support in 3 steps:
>> 
>> 1. Delay splitting the THP until after adding it into the swap cache
>> 
>> 2. Delay splitting the THP until after swapping out has completed
>> 
>> 3. Avoid splitting the THP during swap out, and swap in the full THP if
>>    possible
>> 
>> I plan to do it step by step to make it easier to review the code.
>
> 1. If we solve batching swapout, then how is THP split for swapout bad?
> 2. Also, how is the current conservative swapin from khugepaged bad?
>
> I think that is one of the decision points for the motivation of your
> work, and for 1, we need the batching swapout feature.
>
> I am saying again that I'm not against your goal; my only concern
> is the approach.  If you don't agree, please ignore me.

I am glad to discuss my final goal, that is, swapping out/in the full
THP without splitting.  The reasons I want to do that are copied below,

>> >> The advantages of swapping in 512 pages together are:
>> >> 
>> >> - Improve the performance of swap-in IO by turning small reads
>> >>   into big reads of 512 pages.
>> >> 
>> >> - Keep THP across swap out/in.  As memory sizes become larger and
>> >>   larger, 4k pages place more and more burden on memory
>> >>   management.  One solution is to use 2M pages as much as possible, which
>> >>   will reduce the management burden greatly, for example by much reducing
>> >>   the length of the LRU lists.

- Avoid spending CPU time on splitting and collapsing the THP across swap
  out/in.

>> >> 
>> >> The disadvantages are:
>> >> 
>> >> - Increase the memory pressure when swapping in a THP.
>> >> 
>> >> - Some pages swapped in may not be needed in the near future.

I think it is important to use 2M pages as much as possible to deal with
the big memory problem.  Do you agree?
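
As a rough illustration of that management burden (simple arithmetic, not
a measurement): the sketch below counts how many pages are needed to cover
a given amount of memory.  pages_needed() is a hypothetical helper, and
1 TiB is only an example size.

```c
#include <stdint.h>

#define SZ_4K_BYTES (4ULL << 10)	/* hypothetical name: 4 KiB base page */
#define SZ_2M_BYTES (2ULL << 20)	/* hypothetical name: 2 MiB huge page */

/*
 * Number of pages of size page_bytes needed to cover mem_bytes.
 * Assumes mem_bytes is a multiple of page_bytes.
 */
static inline uint64_t pages_needed(uint64_t mem_bytes, uint64_t page_bytes)
{
	return mem_bytes / page_bytes;
}
```

For 1 TiB of memory this gives 268435456 4k pages versus 524288 2M pages,
that is, a factor of 512 fewer pages to track.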

Best Regards,
Huang, Ying
