linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: kernel test robot <yujie.liu@intel.com>
Cc: Liam Howlett <liam.howlett@oracle.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [linus:master] [maple_tree] 120b116208: INFO:task_blocked_for_more_than#seconds
Date: Tue, 31 Jan 2023 12:26:35 -0800	[thread overview]
Message-ID: <20230131202635.GA3019407@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <202301310940.4a37c7af-yujie.liu@intel.com>

On Tue, Jan 31, 2023 at 03:18:22PM +0800, kernel test robot wrote:
> Hi Liam,
> 
> We caught a "task blocked" dmesg in maple tree test. Not sure if this
> is expected for maple tree test, so we are sending this report for
> your information. Thanks.
> 
> Greeting,
> 
> FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit (built with clang-14):
> 
> commit: 120b116208a0877227fc82e3f0df81e7a3ed4ab1 ("maple_tree: reorganize testing to restore module testing")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> [   17.318428][    T1] calling  maple_tree_seed+0x0/0x15d0 @ 1
> [   17.319219][    T1] 
> [   17.319219][    T1] TEST STARTING
> [   17.319219][    T1] 
> [  999.249871][   T23] INFO: task rcu_scale_shutd:59 blocked for more than 491 seconds.
> [  999.253363][   T23]       Not tainted 6.1.0-rc4-00003-g120b116208a0 #1
> [  999.254249][   T23] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  999.255390][   T23] task:rcu_scale_shutd state:D stack:30968 pid:59    ppid:2      flags:0x00004000
> [  999.256934][   T23] Call Trace:
> [  999.257418][   T23]  <TASK>
> [  999.257900][   T23]  __schedule+0x169b/0x1f90
> [  999.261677][   T23]  schedule+0x151/0x300
> [  999.262281][   T23]  ? compute_real+0xe0/0xe0
> [  999.263364][   T23]  rcu_scale_shutdown+0xdd/0x130
> [  999.264093][   T23]  ? wake_bit_function+0x2c0/0x2c0
> [  999.268985][   T23]  kthread+0x309/0x3a0
> [  999.269958][   T23]  ? compute_real+0xe0/0xe0
> [  999.270552][   T23]  ? kthread_unuse_mm+0x200/0x200
> [  999.271281][   T23]  ret_from_fork+0x1f/0x30
> [  999.272385][   T23]  </TASK>
> [  999.272865][   T23] 
> [  999.272865][   T23] Showing all locks held in the system:
> [  999.273988][   T23] 2 locks held by swapper/0/1:
> [  999.274684][   T23] 1 lock held by khungtaskd/23:
> [  999.275400][   T23]  #0: ffffffff88346e00 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x8/0x30
> [  999.277171][   T23] 
> [  999.277525][   T23] =============================================
> [  999.277525][   T23] 
> [ 1049.050884][    T1] maple_tree: 12610686 of 12610686 tests passed
> 
> 
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <yujie.liu@intel.com>
> | Link: https://lore.kernel.org/oe-lkp/202301310940.4a37c7af-yujie.liu@intel.com

Liam brought this to my attention on IRC, and it looks like the root
cause is that the rcuscale code does not deal gracefully with grace
periods that are in much excess of a second in duration.

Now, it might well be worth looking into why the grace periods were taking
that long, but if you were running Maple Tree stress tests concurrently
with rcuscale, this might well be expected behavior.

So, does the patch below clear this up for you?

							Thanx, Paul

------------------------------------------------------------------------

commit 8e44d51e3411994091f7c7c136286d82c5757a4a
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Tue Jan 31 12:08:54 2023 -0800

    rcuscale: Move shutdown from wait_event() to wait_event_idle()
    
    The rcu_scale_shutdown() and kfree_scale_shutdown() kthreads/functions
    use wait_event() to wait for the rcuscale test to complete.  However,
    each updater thread in such a test waits for at least 100 grace periods.
    If each grace period takes more than 1.2 seconds, which is long, but
    not insanely so, this can trigger the hung-task timeout.
    
    This commit therefore replaces those wait_event() calls with calls to
    wait_event_idle(), which do not trigger the hung-task timeout.
    
    Reported-by: kernel test robot <yujie.liu@intel.com>
    Reported-by: Liam Howlett <liam.howlett@oracle.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 91fb5905a008f..4120f94030c3c 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -631,8 +631,7 @@ static int compute_real(int n)
 static int
 rcu_scale_shutdown(void *arg)
 {
-	wait_event(shutdown_wq,
-		   atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
+	wait_event_idle(shutdown_wq, atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
 	smp_mb(); /* Wake before output. */
 	rcu_scale_cleanup();
 	kernel_power_off();
@@ -771,8 +770,8 @@ kfree_scale_cleanup(void)
 static int
 kfree_scale_shutdown(void *arg)
 {
-	wait_event(shutdown_wq,
-		   atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
+	wait_event_idle(shutdown_wq,
+			atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
 
 	smp_mb(); /* Wake before output. */
 


  reply	other threads:[~2023-01-31 20:26 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31  7:18 kernel test robot
2023-01-31 20:26 ` Paul E. McKenney [this message]
2023-01-31 20:45   ` Liam R. Howlett
2023-01-31 20:52     ` Paul E. McKenney
2023-02-01  8:37       ` Yujie Liu
2023-02-01 14:36         ` Paul E. McKenney
2023-02-01 14:43           ` Liam R. Howlett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230131202635.GA3019407@paulmck-ThinkPad-P17-Gen-1 \
    --to=paulmck@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=liam.howlett@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=oe-lkp@lists.linux.dev \
    --cc=yujie.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox