From: Hui Wang <hui.wang@canonical.com>
To: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, surenb@google.com,
colin.i.king@gmail.com, shy828301@gmail.com, hannes@cmpxchg.org,
vbabka@suse.cz, hch@infradead.org, mgorman@suse.de,
dan.carpenter@oracle.com,
Phillip Lougher <phillip@squashfs.org.uk>
Subject: Re: [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS
Date: Wed, 26 Apr 2023 19:07:23 +0800
Message-ID: <be75a80b-fe95-e5cd-2049-522cbd95317a@canonical.com>
In-Reply-To: <ZEjhwasBsame8Fbi@dhcp22.suse.cz>
On 4/26/23 16:33, Michal Hocko wrote:
> [CC squashfs maintainer]
>
> On Wed 26-04-23 13:10:30, Hui Wang wrote:
>> If we run the stress-ng in the filesystem of squashfs, the system
>> will be in a state something like hang, the stress-ng couldn't
>> finish running and the console couldn't react to users' input.
>>
>> This issue happens on all arm/arm64 platforms we are working on,
>> through debugging, we found this issue is introduced by oom handling
>> in the kernel.
>>
>> The fs->readahead() is called between memalloc_nofs_save() and
>> memalloc_nofs_restore(), and the squashfs_readahead() calls
>> alloc_page(), in this case, if there is no memory left, the
>> out_of_memory() will be called without __GFP_FS, then the oom killer
>> will not be triggered and this process will loop endlessly and wait
>> for others to trigger oom killer to release some memory. But for a
>> system with the whole root filesystem constructed by squashfs,
>> nearly all userspace processes will call out_of_memory() without
>> __GFP_FS, so we will see that the system enters a state something like
>> hang when running stress-ng.
>>
>> To fix it, we could trigger a kthread to call page_alloc() with
>> __GFP_FS before returning from out_of_memory() due to without
>> __GFP_FS.
> I do not think this is an appropriate way to deal with this issue.
> Does it even make sense to trigger OOM killer for something like
> readahead? Would it be more mindful to fail the allocation instead?
> That being said should allocations from squashfs_readahead use
> __GFP_RETRY_MAYFAIL instead?
Thanks for your comment. This issue can hardly be reproduced on an
ext4 filesystem because ext4->readahead() doesn't call alloc_page().
If ext4->readahead() is changed as below, the issue becomes easy to
reproduce on ext4 (repeatedly run:
$stress-ng --bigheap ${num_of_cpu_threads} --sequential 0 --timeout 30s
--skip-silent --verbose)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ffbbd9626bd8..8b9db0b9d0b8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3114,12 +3114,18 @@ static int ext4_read_folio(struct file *file, struct folio *folio)
 static void ext4_readahead(struct readahead_control *rac)
 {
 	struct inode *inode = rac->mapping->host;
+	struct page *tmp_page;
 	/* If the file has inline data, no need to do readahead. */
 	if (ext4_has_inline_data(inode))
 		return;
+	tmp_page = alloc_page(GFP_KERNEL);
+
 	ext4_mpage_readpages(inode, rac, NULL);
+
+	if (tmp_page)
+		__free_page(tmp_page);
 }
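To spell out why this extra alloc_page(GFP_KERNEL) is enough to
reproduce the problem, here is a rough userspace model (not kernel
code) of the two steps involved: the NOFS scope clearing __GFP_FS from
the mask, and the early return in out_of_memory() that the quoted patch
touches. All MODEL_*/model_* names and the flag bit values are made up
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; the real values are kernel-internal. */
#define MODEL_GFP_IO             0x1u
#define MODEL_GFP_FS             0x2u
#define MODEL_GFP_DIRECT_RECLAIM 0x4u
#define MODEL_GFP_KERNEL \
	(MODEL_GFP_IO | MODEL_GFP_FS | MODEL_GFP_DIRECT_RECLAIM)

/* Stand-in for the per-task state set by memalloc_nofs_save(). */
struct model_task {
	bool nofs_scope;
};

/* Inside a NOFS scope, the FS bit is stripped from every request. */
static unsigned int model_gfp_context(const struct model_task *t,
				      unsigned int gfp)
{
	assert(t != NULL);
	if (t->nofs_scope)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

enum model_oom_result {
	MODEL_OOM_SKIPPED,	/* caller is told to retry, nothing is killed */
	MODEL_OOM_KILLER_RUNS,	/* victim selection would go ahead */
};

/* Mirrors only the gfp_mask check in out_of_memory(), nothing else. */
static enum model_oom_result model_out_of_memory(unsigned int gfp,
						 bool is_memcg_oom)
{
	if (gfp && !(gfp & MODEL_GFP_FS) && !is_memcg_oom)
		return MODEL_OOM_SKIPPED;
	return MODEL_OOM_KILLER_RUNS;
}
```

With nofs_scope set, a MODEL_GFP_KERNEL request always ends up in the
MODEL_OOM_SKIPPED branch, which corresponds to the endless retry
described above when every task allocates from readahead.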
BTW, I applied my patch to linux-next and ran the oom stress-ng
test cases overnight. There was no hang, oops, or crash, so it looks
like there is no big problem with using a kthread to trigger the oom
killer in this case.

And hi squashfs maintainer, I checked the filesystem code, and it looks
like most filesystems do not call alloc_page() in their readahead().
Could you please help take a look at this issue? Thanks.
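For comparison, the alternative raised above (let the readahead
allocation fail instead of waiting for an OOM kill) could be modelled
in userspace roughly as follows. This is only a sketch: the fixed retry
count is an assumption of the model, not how __GFP_RETRY_MAYFAIL is
implemented, and every model_* name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed bound for this model; the kernel uses its own heuristics. */
#define MODEL_MAX_RETRIES 3

struct model_mem {
	size_t free_pages;		/* pages currently available */
	unsigned int failed_attempts;	/* attempts that found no page */
};

/* Try once plus MODEL_MAX_RETRIES retries, then give up with failure. */
static bool model_alloc_retry_mayfail(struct model_mem *m)
{
	for (unsigned int i = 0; i <= MODEL_MAX_RETRIES; i++) {
		if (m->free_pages > 0) {
			m->free_pages--;
			return true;
		}
		m->failed_attempts++;	/* pretend reclaim freed nothing */
	}
	return false;
}

/*
 * Readahead is an optimisation, so a failed allocation simply skips it.
 * Returns the number of pages read ahead (0 when skipped).
 */
static size_t model_readahead(struct model_mem *m, size_t nr_pages)
{
	if (!model_alloc_retry_mayfail(m))
		return 0;
	m->free_pages++;	/* release the temporary page again */
	return nr_pages;
}
```

In this model an empty pool makes model_readahead() return 0 after a
bounded number of attempts, so the caller makes progress without
needing anyone to be killed.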
>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Suren Baghdasaryan <surenb@google.com>
>> Cc: Colin Ian King <colin.i.king@gmail.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>> Signed-off-by: Hui Wang <hui.wang@canonical.com>
>> ---
>> mm/oom_kill.c | 22 +++++++++++++++++++++-
>> 1 file changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 044e1eed720e..c9c38d6b8580 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -1094,6 +1094,24 @@ int unregister_oom_notifier(struct notifier_block *nb)
>> }
>> EXPORT_SYMBOL_GPL(unregister_oom_notifier);
>>
>> +/*
>> + * If an oom occurs without the __GFP_FS flag in the gfp_mask, the oom killer
>> + * will not be triggered. In this case, we could call schedule_work to run
>> + * trigger_oom_killer_work() to trigger an oom forcibly with __GFP_FS flag,
>> + * this could make the oom killer run with a fair chance.
>> + */
>> +static void trigger_oom_killer_work(struct work_struct *work)
>> +{
>> +	struct page *tmp_page;
>> +
>> +	/* This could trigger an oom forcibly with a chance */
>> +	tmp_page = alloc_page(GFP_KERNEL);
>> +	if (tmp_page)
>> +		__free_page(tmp_page);
>> +}
>> +
>> +static DECLARE_WORK(oom_trigger_work, trigger_oom_killer_work);
>> +
>> /**
>> * out_of_memory - kill the "best" process when we run out of memory
>> * @oc: pointer to struct oom_control
>> @@ -1135,8 +1153,10 @@ bool out_of_memory(struct oom_control *oc)
>>  	 * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
>>  	 * invoke the OOM killer even if it is a GFP_NOFS allocation.
>>  	 */
>> -	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
>> +	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) {
>> +		schedule_work(&oom_trigger_work);
>>  		return true;
>> +	}
>>
>>  	/*
>>  	 * Check if there were limitations on the allocation (only relevant for
>> --
>> 2.34.1
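One detail of the quoted patch is that it uses a single statically
declared work item, and schedule_work() returns false without queueing
again when that item is already pending. So many NOFS callers hitting
out_of_memory() at once collapse into one deferred GFP_KERNEL
allocation. A minimal userspace sketch of that coalescing, with
hypothetical model_* names and no real threading:

```c
#include <stdbool.h>

/* Stand-in for one statically declared work item. */
struct model_work {
	bool pending;		/* queued but not yet run */
	unsigned int runs;	/* how many times the handler ran */
};

/* Queue the item unless it is already pending; report whether it was queued. */
static bool model_schedule_work(struct model_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* What the worker would do: run the handler once and clear pending. */
static void model_run_pending(struct model_work *w)
{
	if (!w->pending)
		return;
	w->pending = false;
	w->runs++;	/* here the real handler calls alloc_page(GFP_KERNEL) */
}
```

In this model, any number of model_schedule_work() calls between two
worker runs produce exactly one handler run.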