linux-mm.kvack.org archive mirror
From: Yang Shi <shy828301@gmail.com>
To: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	 Yang Shi <yang.shi@linux.alibaba.com>,
	David Rientjes <rientjes@google.com>,
	 Huang Ying <ying.huang@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	 David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH 00/10] [v6] Migrate Pages in lieu of discard
Date: Mon, 8 Mar 2021 16:34:26 -0800
Message-ID: <CAHbLzkofXg0CnCBYdtWf3cE8Do=B35ZsupV01EmR1SX5=7BHjw@mail.gmail.com>
In-Reply-To: <20210304235949.7922C1C3@viggo.jf.intel.com>

On Thu, Mar 4, 2021 at 4:00 PM Dave Hansen <dave.hansen@linux.intel.com> wrote:
>
>
> The full series is also available here:
>
>         https://github.com/hansendc/linux/tree/automigrate-20210304
>
> which also includes some vm.zone_reclaim_mode sysctl ABI fixup
> prerequisites.
>
> The meat of this patch is in:
>
>         [PATCH 05/10] mm/migrate: demote pages during reclaim
>
> Which also has the most changes since the last post.  This version is
> mostly to address review comments from Yang Shi and Oscar Salvador.
> Review comments are documented in the individual patch changelogs.
>
> This also contains a few prerequisite patches that fix up an issue
> with the vm.zone_reclaim_mode sysctl ABI.
>
> Changes since (automigrate-20210122):
>  * move from GFP_HIGHUSER -> GFP_HIGHUSER_MOVABLE since pages *are*
>    movable.
>  * Separate out helpers that check for being able to reclaim anonymous
>    pages versus being able to meaningfully scan the anon LRU.
>
> --
>
> We're starting to see systems with more and more kinds of memory such
> as Intel's implementation of persistent memory.
>
> Let's say you have a system with some DRAM and some persistent memory.
> Today, once DRAM fills up, reclaim will start and some of the DRAM
> contents will be thrown out.  Allocations will, at some point, start
> falling over to the slower persistent memory.
>
> That has two nasty properties.  First, the newer allocations can end
> up in the slower persistent memory.  Second, reclaimed data in DRAM
> are just discarded even if there are gobs of space in persistent
> memory that could be used.
>
> This set implements a solution to these problems.  At the end of the
> reclaim process in shrink_page_list() just before the last page
> refcount is dropped, the page is migrated to persistent memory instead
> of being dropped.
>
> While I've talked about a DRAM/PMEM pairing, this approach would
> function in any environment where memory tiers exist.
>
> This is not perfect.  It "strands" pages in slower memory and never
> brings them back to fast DRAM.  Other things need to be built to
> promote hot pages back to DRAM.
>
> This is also all based on an upstream mechanism that allows
> persistent memory to be onlined and used as if it were volatile:
>
>         http://lkml.kernel.org/r/20190124231441.37A4A305@viggo.jf.intel.com
>
> == Open Issues ==
>
>  * For cpusets and memory policies that restrict allocations
>    to PMEM, is it OK to demote to PMEM?  Do we need a cgroup-
>    level API to opt-in or opt-out of these migrations?

I'm wondering whether such use cases, which don't want memory
allocated on PMEM, allow their memory to be swapped out or reclaimed
at all. If swap is allowed, then I fail to see why migrating to PMEM
should be disallowed. If swap is not allowed, they should call
mlock(), and then the memory won't be migrated to PMEM either.
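
The argument above can be sketched as a toy model. This is only an
illustration, not kernel code: Page, Outcome, reclaim_page() and the
migrate_ok flag are hypothetical names invented here. It assumes that
mlocked pages are unevictable and therefore never reach reclaim, and
that a failed demotion falls back to normal reclaim, as the changelog
for automigrate-20200818 describes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    KEPT = "kept"            # page stays where it is
    DEMOTED = "demoted"      # migrated to the slower tier
    RECLAIMED = "reclaimed"  # swapped out or discarded, as before


@dataclass
class Page:
    mlocked: bool = False


def reclaim_page(page: Page,
                 demotion_target: Optional[int],
                 migrate_ok: bool = True) -> Outcome:
    """Toy decision for one page that reclaim is looking at.

    demotion_target is a hypothetical node id of a slower tier, or
    None when no such node exists. migrate_ok stands in for whether
    the migration attempt succeeds.
    """
    # mlocked pages are unevictable: reclaim never processes them,
    # so they are neither swapped out nor demoted.
    if page.mlocked:
        return Outcome.KEPT
    # Evictable page and a slower node is available: migrate the
    # page there instead of dropping it.
    if demotion_target is not None and migrate_ok:
        return Outcome.DEMOTED
    # No target, or the migration failed: fall back to normal reclaim.
    return Outcome.RECLAIMED


if __name__ == "__main__":
    print(reclaim_page(Page(mlocked=True), demotion_target=1))
    print(reclaim_page(Page(), demotion_target=1))
    print(reclaim_page(Page(), demotion_target=None))
```

In this model any page that may be swapped out may also be demoted,
and the only way to opt out of both is mlock().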

>  * Could be more aggressive about where anon LRU scanning occurs
>    since it no longer necessarily involves I/O.  get_scan_count()
>    for instance says: "If we have no swap space, do not bother
>    scanning anon pages"

Yes, I agree. Johannes's patchset
(https://lore.kernel.org/linux-mm/20200520232525.798933-1-hannes@cmpxchg.org/#r)
has raised the maximum swappiness to 200, so the anonymous LRU can be
scanned more aggressively. We could definitely tweak this if needed.
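
A rough sketch of the idea, again as a toy model rather than kernel
code: can_scan_anon() and scan_weights() are hypothetical helpers, and
the linear split between anon and file weights is a deliberate
simplification of what get_scan_count() really does. It only shows how
"no swap space" stops being a reason to skip the anon LRU once a
demotion target exists, and how a swappiness range of 0..200 lets anon
scanning outweigh file scanning.

```python
MAX_SWAPPINESS = 200  # upper bound after Johannes's series


def can_scan_anon(has_swap: bool, has_demotion_target: bool) -> bool:
    """Anon pages are worth scanning if they can go somewhere.

    Without demotion the only destination is swap. With demotion,
    a slower memory node is a second destination that needs no I/O.
    """
    return has_swap or has_demotion_target


def scan_weights(swappiness: int,
                 has_swap: bool,
                 has_demotion_target: bool) -> tuple:
    """Return (anon_weight, file_weight) for a simplified scan balance."""
    if not 0 <= swappiness <= MAX_SWAPPINESS:
        raise ValueError("swappiness must be within 0..200")
    if not can_scan_anon(has_swap, has_demotion_target):
        # Nowhere to put anon pages: scan the file LRU only.
        return (0, MAX_SWAPPINESS)
    # Simplified split: anon gets swappiness, file gets the remainder.
    return (swappiness, MAX_SWAPPINESS - swappiness)


if __name__ == "__main__":
    # No swap and no slower node: anon scanning is skipped.
    print(scan_weights(60, has_swap=False, has_demotion_target=False))
    # No swap, but a demotion target exists: anon is scanned again.
    print(scan_weights(60, has_swap=False, has_demotion_target=True))
```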

>
> --
>
>  Documentation/admin-guide/sysctl/vm.rst |    9
>  include/linux/migrate.h                 |   20 +
>  include/linux/swap.h                    |    3
>  include/linux/vm_event_item.h           |    2
>  include/trace/events/migrate.h          |    3
>  include/uapi/linux/mempolicy.h          |    1
>  mm/compaction.c                         |    3
>  mm/gup.c                                |    4
>  mm/internal.h                           |    5
>  mm/memory-failure.c                     |    4
>  mm/memory_hotplug.c                     |    4
>  mm/mempolicy.c                          |    8
>  mm/migrate.c                            |  369 +++++++++++++++++++++++++++++---
>  mm/page_alloc.c                         |   13 -
>  mm/vmscan.c                             |  173 +++++++++++++--
>  mm/vmstat.c                             |    2
>  16 files changed, 560 insertions(+), 63 deletions(-)
>
> --
>
> Changes since (automigrate-20200818):
>  * Fall back to normal reclaim when demotion fails
>  * Fix some compile issues, when page migration and NUMA are off
>
> Changes since (automigrate-20201007):
>  * separate out checks for "can scan anon LRU" from "can actually
>    swap anon pages right now".  Previous series conflated them
>    and may have been overly aggressive scanning LRU
>  * add MR_DEMOTION to tracepoint header
>  * remove unnecessary hugetlb page check
>
> Changes since (https://lwn.net/Articles/824830/):
>  * Use higher-level migrate_pages() API approach from Yang Shi's
>    earlier patches.
>  * made sure to actually check node_reclaim_mode's new bit
>  * disabled migration entirely before introducing RECLAIM_MIGRATE
>  * Replace GFP_NOWAIT with explicit __GFP_KSWAPD_RECLAIM and
>    comment why we want that.
>  * Comment on effects of that keep multiple source nodes from
>    sharing target nodes
>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: osalvador <osalvador@suse.de>
>
>

