From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Chen Ridong <chenridong@huaweicloud.com>, <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<chenridong@huawei.com>, <wangweiyang2@huawei.com>,
Michal Hocko <mhocko@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Yosry Ahmed <yosryahmed@google.com>, Yu Zhao <yuzhao@google.com>,
David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <baohua@kernel.org>
Subject: Re: [PATCH v3] mm/vmscan: stop the loop if enough pages have been page_out
Date: Thu, 10 Oct 2024 16:59:10 +0800 [thread overview]
Message-ID: <c3f2c5e2-4804-46e8-86ff-1f6a79ea9a7c@huawei.com> (raw)
In-Reply-To: <20241010081802.290893-1-chenridong@huaweicloud.com>
Hi Ridong,
This seems to be the first version posted upstream, and the issue only
occurs when a large folio is split.
Adding more Ccs to see if there is more feedback.
On 2024/10/10 16:18, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> An issue was found with the following testing steps:
> 1. Compile with CONFIG_TRANSPARENT_HUGEPAGE=y.
> 2. Mount memcg v1, create a memcg named test_memcg, and set
> limit_in_bytes=2.1G and memsw.limit_in_bytes=3G.
> 3. Create a 1G swap file and allocate 2.2G of anon memory in test_memcg.
>
> It was found that:
>
> cat memory.usage_in_bytes
> 2144940032
> cat memory.memsw.usage_in_bytes
> 2255056896
>
> free -h
> total used free
> Mem: 31Gi 2.1Gi 27Gi
> Swap: 1.0Gi 618Mi 405Mi
>
> As shown above, test_memcg charged only about 100M of swap, but 600M+
> of swap space was in use, which means about 500M may be wasted because
> other memcgs cannot use that swap space.
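
Interleaving a quick check of the numbers above (618 MiB from free, the
usage values from the memcg files; the "stranded" figure is just the
difference, not a measured value):

```
memsw.usage_in_bytes - memory.usage_in_bytes
    = 2255056896 - 2144940032
    = 110116864 bytes ~= 105 MiB of swap charged to test_memcg
swap in use system-wide       ~= 618 MiB
swap used but not charged     ~= 618 - 105 ~= 513 MiB
```

which matches the ~500M described as wasted.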
>
> It can be explained as follows:
> 1. When entering shrink_inactive_list, folios are isolated from the lru
> from tail to head. Suppose, for simplicity, that only folioN is taken
> from the lru:
>
> inactive lru: folio1<->folio2<->folio3...<->folioN-1
> isolated list: folioN
>
> 2. In shrink_page_list, if folioN is a THP, it may be split and added
> to the swap cache folio by folio. After a folio is added to the swap
> cache, writeback io to swap is submitted, which is asynchronous.
> When shrink_page_list finishes, the isolated folio list is moved back
> to the head of the inactive lru. The inactive lru may then look like
> this, with 512 folios moved to its head:
>
> folioN512<->folioN511<->...folioN1<->folio1<->folio2...<->folioN-1
>
> 3. When a folio's writeback io completes, the folio may be rotated to
> the tail of the lru. The following lru list is expected, with the
> folios that were added to the swap cache rotated to the tail so that
> they can be reclaimed as soon as possible:
>
> folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512
>
> 4. However, shrink_page_list and folio writeback are asynchronous. If a
> THP is split, shrink_page_list loops at least 512 times, which means
> that writeback for some folios may complete before shrink_page_list
> does, and those folios then fail to rotate to the tail of the lru.
> The lru may look like this:
>
> folioN50<->folioN49<->...folioN1<->folio1<->folio2...<->folioN-1<->
> folioN51<->folioN52<->...folioN511<->folioN512
>
> Although folios N1-N50 have finished writeback, they are still at the
> head of the lru. Since isolation scans the lru from tail to head, it
> is hard to reach those folios again soon.
>
> As a result, a large number of folios may sit in the swap cache without
> being reclaimed in time, which reduces reclaim efficiency and prevents
> other memcgs from using this swap space even if they trigger OOM.
>
> To fix this issue, stop the loop if a THP has been split and
> nr_pageout is greater than nr_to_reclaim.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/vmscan.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749cdc110c74..fd8ad251eda2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1047,7 +1047,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> LIST_HEAD(demote_folios);
> unsigned int nr_reclaimed = 0;
> unsigned int pgactivate = 0;
> - bool do_demote_pass;
> + bool do_demote_pass, splited = false;
> struct swap_iocb *plug = NULL;
>
> folio_batch_init(&free_folios);
> @@ -1065,6 +1065,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>
> cond_resched();
>
> + /*
> + * If a large folio has been split, many folios are added
> + * to folio_list. Looping through the entire list takes
> + * too much time, which may prevent folios that have completed
> + * writeback from rotating to the tail of the lru. Just
> + * stop looping if nr_pageout is greater than nr_to_reclaim.
> + */
> + if (unlikely(splited && stat->nr_pageout > sc->nr_to_reclaim))
> + break;
> +
> folio = lru_to_folio(folio_list);
> list_del(&folio->lru);
>
> @@ -1273,6 +1283,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> if ((nr_pages > 1) && !folio_test_large(folio)) {
> sc->nr_scanned -= (nr_pages - 1);
> nr_pages = 1;
> + splited = true;
> }
>
> /*
> @@ -1375,12 +1386,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> if (nr_pages > 1 && !folio_test_large(folio)) {
> sc->nr_scanned -= (nr_pages - 1);
> nr_pages = 1;
> + splited = true;
> }
> goto activate_locked;
> case PAGE_SUCCESS:
> if (nr_pages > 1 && !folio_test_large(folio)) {
> sc->nr_scanned -= (nr_pages - 1);
> nr_pages = 1;
> + splited = true;
> }
> stat->nr_pageout += nr_pages;
>
> @@ -1491,6 +1504,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> if (nr_pages > 1) {
> sc->nr_scanned -= (nr_pages - 1);
> nr_pages = 1;
> + splited = true;
> }
> activate_locked:
> /* Not a candidate for swapping, so reclaim swap space. */