Message-ID: <41ecdf8d-59be-ded0-1ace-0a7cadabbcc3@roeck-us.net>
Date: Mon, 12 Jun 2023 07:53:30 -0700
From: Guenter Roeck
To: Vincent Whitchurch, Wim Van Sebroeck, Andrew Morton
Cc: linux-watchdog@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel@axis.com
Subject: Re: [PATCH v2] watchdog/mm: Allow dumping memory info in pretimeout
In-Reply-To: <20230608-pretimeout-oom-v2-1-581f0ad0e4f3@axis.com>
References: <20230608-pretimeout-oom-v2-1-581f0ad0e4f3@axis.com>

On 6/12/23 00:26, Vincent Whitchurch wrote:
> On my (embedded) systems, the most common cause of hitting the watchdog
> (pre)timeout is due to thrashing.
> Diagnosing these problems is hard
> without knowing the memory state at the point of the watchdog hit. In
> order to make this information available, add a module parameter to the
> watchdog pretimeout panic governor to ask it to dump memory info and the
> OOM task list (using a new helper in the OOM code) before triggering the
> panic.
>

Personally I don't think this is the right way of approaching this problem.
First, the userspace task controlling the watchdog should run as a realtime
task, forced to be in memory, and not be affected by thrashing. Second,
the problem should be observable well before the watchdog fires. Last but
not least, I don't think it is appropriate to intertwine watchdog code with
oom handling code as suggested here.

Guenter

> Signed-off-by: Vincent Whitchurch
> ---
> Changes in v2:
> - Add missing static to fix warning reported by kernel test robot.
> - Export __show_mem to fix error reported by kernel test robot.
> - Link to v1: https://lore.kernel.org/r/20230608-pretimeout-oom-v1-1-542cc91062d7@axis.com
> ---
>  drivers/watchdog/pretimeout_panic.c | 15 +++++++++++
>  include/linux/oom.h                 |  5 ++++
>  include/linux/sched/task.h          |  5 ++++
>  lib/show_mem.c                      |  1 +
>  mm/oom_kill.c                       | 54 ++++++++++++++++++++++++++++++++++++-
>  5 files changed, 79 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/watchdog/pretimeout_panic.c b/drivers/watchdog/pretimeout_panic.c
> index 2cc3c41d2be5b..52d686fa541c7 100644
> --- a/drivers/watchdog/pretimeout_panic.c
> +++ b/drivers/watchdog/pretimeout_panic.c
> @@ -5,10 +5,15 @@
>
>  #include
>  #include
> +#include
> +#include
>  #include
>
>  #include "watchdog_pretimeout.h"
>
> +static unsigned long dump_min_rss_bytes;
> +module_param(dump_min_rss_bytes, ulong, 0644);
> +
>  /**
>   * pretimeout_panic - Panic on watchdog pretimeout event
>   * @wdd - watchdog_device
> @@ -17,6 +22,16 @@
>   */
>  static void pretimeout_panic(struct watchdog_device *wdd)
>  {
> +	/*
> +	 * Since the root cause is not certain to be low memory, only print
> +	 * tasks with RSS above a configurable limit, to avoid losing
> +	 * potentially more important messages from the log.
> +	 */
> +	if (dump_min_rss_bytes) {
> +		show_mem(SHOW_MEM_FILTER_NODES, NULL);
> +		oom_dump_tasks(DIV_ROUND_UP(dump_min_rss_bytes, PAGE_SIZE));
> +	}
> +
>  	panic("watchdog pretimeout event\n");
>  }
>
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 7d0c9c48a0c54..1451fe2c38d78 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -52,6 +52,9 @@ struct oom_control {
>
>  	/* Used to print the constraint info. */
>  	enum oom_constraint constraint;
> +
> +	bool dump_trylock;
> +	unsigned long dump_min_rss_pages;
>  };
>
>  extern struct mutex oom_lock;
> @@ -102,6 +105,8 @@ long oom_badness(struct task_struct *p,
>
>  extern bool out_of_memory(struct oom_control *oc);
>
> +extern void oom_dump_tasks(unsigned long min_rss_pages);
> +
>  extern void exit_oom_victim(void);
>
>  extern int register_oom_notifier(struct notifier_block *nb);
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index e0f5ac90a228b..e8a68b2a3e829 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -183,6 +183,11 @@ static inline void task_lock(struct task_struct *p)
>  	spin_lock(&p->alloc_lock);
>  }
>
> +static inline int task_trylock(struct task_struct *p)
> +{
> +	return spin_trylock(&p->alloc_lock);
> +}
> +
>  static inline void task_unlock(struct task_struct *p)
>  {
>  	spin_unlock(&p->alloc_lock);
> diff --git a/lib/show_mem.c b/lib/show_mem.c
> index 1485c87be9354..cf90d1c5182b7 100644
> --- a/lib/show_mem.c
> +++ b/lib/show_mem.c
> @@ -35,3 +35,4 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
>  #endif
>  }
> +EXPORT_SYMBOL_GPL(__show_mem);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 044e1eed720ee..0fad1c6d3c90c 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -149,6 +149,30 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>  	return t;
>  }
>
> +/*
> + * Identical to the above, except that we avoid tasks which we can't lock, to
> + * avoid deadlocks when called from an interrupt handler.
> + */
> +static struct task_struct *find_trylock_task_mm(struct task_struct *p)
> +{
> +	struct task_struct *t;
> +
> +	rcu_read_lock();
> +
> +	for_each_thread(p, t) {
> +		if (!task_trylock(t))
> +			continue;
> +		if (likely(t->mm))
> +			goto found;
> +		task_unlock(t);
> +	}
> +	t = NULL;
> +found:
> +	rcu_read_unlock();
> +
> +	return t;
> +}
> +
>  /*
>   * order == -1 means the oom kill is required by sysrq, otherwise only
>   * for display purposes.
> @@ -390,15 +414,26 @@ static int dump_task(struct task_struct *p, void *arg)
>  	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(p, oc))
>  		return 0;
>
> -	task = find_lock_task_mm(p);
> +	task = oc->dump_trylock ? find_trylock_task_mm(p) :
> +				  find_lock_task_mm(p);
>  	if (!task) {
>  		/*
>  		 * All of p's threads have already detached their mm's. There's
>  		 * no need to report them; they can't be oom killed anyway.
> +		 *
> +		 * Or we got here from an interrupt and the task lock is
> +		 * locked, in which case we're forced to ignore this task to
> +		 * avoid deadlocks.
>  		 */
>  		return 0;
>  	}
>
> +	if (oc->dump_min_rss_pages &&
> +	    get_mm_rss(task->mm) < oc->dump_min_rss_pages) {
> +		task_unlock(task);
> +		return 0;
> +	}
> +
>  	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu %5hd %s\n",
>  		task->pid, from_kuid(&init_user_ns, task_uid(task)),
>  		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> @@ -437,6 +472,23 @@ static void dump_tasks(struct oom_control *oc)
>  	}
>  }
>
> +void oom_dump_tasks(unsigned long min_rss_pages)
> +{
> +	const gfp_t gfp_mask = GFP_KERNEL;
> +	struct oom_control oc = {
> +		.zonelist = node_zonelist(first_memory_node, gfp_mask),
> +		.nodemask = NULL,
> +		.memcg = NULL,
> +		.gfp_mask = gfp_mask,
> +		.order = -1,
> +		.dump_min_rss_pages = min_rss_pages,
> +		.dump_trylock = in_interrupt(),
> +	};
> +
> +	dump_tasks(&oc);
> +}
> +EXPORT_SYMBOL_GPL(oom_dump_tasks);
> +
>  static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
>  {
>  	/* one line summary of the oom killer context. */
>
> ---
> base-commit: 9561de3a55bed6bdd44a12820ba81ec416e705a7
> change-id: 20230608-pretimeout-oom-99148438a1df
>
> Best regards,
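
For illustration, the first point in the reply above (a watchdog daemon that
runs as a realtime task and is locked in memory) could look roughly like the
sketch below. It is not part of the patch or of this thread. The helper names
(clamp_rt_priority, become_resident_realtime, wd_feed_interval, wd_keepalive)
and the "feed at half the timeout" policy are assumptions made for this
example; only mlockall(), sched_setscheduler() and the WDIOC_KEEPALIVE ioctl
are existing interfaces.

```c
#include <errno.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/watchdog.h>

/* Hypothetical helper: clamp a priority into the valid SCHED_FIFO range. */
int clamp_rt_priority(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}

/*
 * Hypothetical helper: lock all current and future pages into RAM so the
 * daemon cannot be paged out, then switch the calling process to
 * SCHED_FIFO. Returns 0 on success or a negative errno value.
 */
int become_resident_realtime(int prio)
{
	struct sched_param sp = { .sched_priority = clamp_rt_priority(prio) };

	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
		return -errno;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
		return -errno;
	return 0;
}

/* Assumed policy: feed at half the timeout, but at least once a second. */
unsigned int wd_feed_interval(unsigned int timeout_s)
{
	unsigned int half = timeout_s / 2;

	return half ? half : 1;
}

/*
 * Ping the watchdog. fd must be an open descriptor for the watchdog
 * device. Returns 0 on success or a negative errno value.
 */
int wd_keepalive(int fd)
{
	if (ioctl(fd, WDIOC_KEEPALIVE, 0) < 0)
		return -errno;
	return 0;
}
```

A daemon built on this would call become_resident_realtime() once at startup
(this typically needs privileges such as CAP_SYS_NICE and CAP_IPC_LOCK, or
matching rlimits), open the watchdog device (commonly /dev/watchdog, assumed
here), and then call wd_keepalive() every wd_feed_interval(timeout) seconds.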