From: wangjianxing <wangjianxing@loongson.cn>
Subject: Re: [PATCH v2 1/1] mm/mmu_gather: limit free batch count and add schedule point in tlb_batch_pages_flush
To: Andrew Morton
Cc: peterz@infradead.org, will@kernel.org, aneesh.kumar@linux.ibm.com, npiggin@gmail.com, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20220317072857.2635262-1-wangjianxing@loongson.cn> <20220317164011.27d7341715de12d890ca244a@linux-foundation.org>
In-Reply-To: <20220317164011.27d7341715de12d890ca244a@linux-foundation.org>
Date: Sat, 19 Mar 2022 12:07:37 +0800

On 03/18/2022 07:40 AM, Andrew Morton wrote:
> On Thu, 17 Mar 2022 03:28:57 -0400 Jianxing Wang <wangjianxing@loongson.cn> wrote:
>
>> Freeing a large list of pages can cause rcu_sched starvation on
>> non-preemptible kernels. However, free_unref_page_list() can't
>> cond_resched() itself, as it may be called from interrupt or atomic
>> context; in particular, atomic context can't be detected when
>> CONFIG_PREEMPTION=n.
>>
>> The TLB flush batch count depends on PAGE_SIZE, so it becomes too
>> large when PAGE_SIZE > 4K; limit the free batch count to 512 and add
>> a schedule point in tlb_batch_pages_flush().
>>
>> rcu: rcu_sched kthread starved for 5359 jiffies! g454793 f0x0
>> RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=19
>> [...]
>> Call Trace:
>>    free_unref_page_list+0x19c/0x270
>>    release_pages+0x3cc/0x498
>>    tlb_flush_mmu_free+0x44/0x70
>>    zap_pte_range+0x450/0x738
>>    unmap_page_range+0x108/0x240
>>    unmap_vmas+0x74/0xf0
>>    unmap_region+0xb0/0x120
>>    do_munmap+0x264/0x438
>>    vm_munmap+0x58/0xa0
>>    sys_munmap+0x10/0x20
>>    syscall_common+0x24/0x38
> tlb_batch_pages_flush() doesn't appear in this trace.  I assume the call
> sequence is
>
> zap_pte_range
> ->tlb_flush_mmu
>   ->tlb_flush_mmu_free
>
> correct?
Yeah, you are right.
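
For background, the per-batch capacity scales with PAGE_SIZE through
MAX_GATHER_BATCH. Below is a rough userspace sketch of that arithmetic;
the 16-byte struct mmu_gather_batch header is my assumption for the
sketch, the real definition lives in include/asm-generic/tlb.h:

#include <stdio.h>

int main(void)
{
	/*
	 * Assumed header of struct mmu_gather_batch: one next pointer
	 * plus two unsigned ints (nr, max); see include/asm-generic/tlb.h
	 * for the real layout.
	 */
	const unsigned long header = sizeof(void *) + 2 * sizeof(unsigned int);
	const unsigned long page_sizes[] = { 4096UL, 16384UL, 65536UL };

	for (int i = 0; i < 3; i++) {
		unsigned long ps = page_sizes[i];

		/* MAX_GATHER_BATCH = (PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *) */
		printf("PAGE_SIZE=%5lu -> up to %lu pages per batch\n",
		       ps, (ps - header) / sizeof(void *));
	}
	return 0;
}

On 64-bit this gives roughly 510 pages per batch with 4K pages, but
2046 with 16K and 8190 with 64K, which is why the patch caps each
free_pages_and_swap_cache() call at 512.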
>> --- a/mm/mmu_gather.c
>> +++ b/mm/mmu_gather.c
>> @@ -47,8 +47,20 @@ static void tlb_batch_pages_flush(struct mmu_gather *tlb)
>>  	struct mmu_gather_batch *batch;
>>
>>  	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
>> -		free_pages_and_swap_cache(batch->pages, batch->nr);
>> -		batch->nr = 0;
>> +		struct page **pages = batch->pages;
>> +
>> +		do {
>> +			/*
>> +			 * limit free batch count when PAGE_SIZE > 4K
>> +			 */
>> +			unsigned int nr = min(512U, batch->nr);
>> +
>> +			free_pages_and_swap_cache(pages, nr);
>> +			pages += nr;
>> +			batch->nr -= nr;
>> +
>> +			cond_resched();
>> +		} while (batch->nr);
>>  	}
> The patch looks safe enough.  But again, it's unlikely to work if the
> calling task has realtime policy.  The same can be said of the
> cond_resched() in zap_pte_range(), and presumably many others.
Yes, cond_resched() can't help when the calling task has realtime policy; sorry, but I have no good idea for that case right now.
> I'll save this away for now and will revisit after 5.18-rc1.
>
> How serious is this problem?  Under precisely what circumstances were
> you able to trigger this?  In other words, do you believe that a
> backport into -stable kernels is needed and if so, why?
>
> Thanks.

The issue is detected in a guest with KVM CPU 200% overcommit; however, I didn't see the warning on the host running the same application.
I'm sure the patch is needed for the guest kernel, but not sure about the host.

> Under precisely what circumstances were you able to trigger this?
Set up two virtual machines on one host machine; each VM has the same number of CPUs as the host and half of the host's memory.
Then run ltpstress.sh in each VM and the rcu stall warning will appear. The kernel has preemption disabled; append 'preempt=none' to the kernel command line if dynamic preemption is enabled.
It could be detected on a Loongson machine (32 cores, 128G memory) and on a ProLiant DL380 Gen9 (x86 E5-2680, 28 cores, 64G memory).
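
For reference, a minimal userspace loop that exercises the munmap()
path from the stall trace above; this is a toy stand-in for the
ltpstress workload, and the region size and iteration count are
arbitrary choices for the sketch:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Repeatedly map, touch and unmap a large anonymous region so each
 * munmap() walks the zap_pte_range() -> tlb_flush_mmu_free() ->
 * release_pages() path from the stall trace.
 */
int main(void)
{
	const size_t len = 1UL << 30; /* 1 GiB */

	for (int i = 0; i < 64; i++) {
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		memset(p, 0x5a, len);	/* populate every PTE */
		munmap(p, len);		/* free the whole range in one call */
	}
	return 0;
}

Running several instances of this in parallel inside the overcommitted
guest should approximate the load that triggered the warning.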
