linux-mm.kvack.org archive mirror
From: Phillip Lougher <phillip@squashfs.org.uk>
To: Hui Wang <hui.wang@canonical.com>, Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, surenb@google.com,
	colin.i.king@gmail.com, shy828301@gmail.com, hannes@cmpxchg.org,
	vbabka@suse.cz, hch@infradead.org, mgorman@suse.de,
	dan.carpenter@oracle.com
Subject: Re: [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS
Date: Wed, 26 Apr 2023 17:44:55 +0100	[thread overview]
Message-ID: <9f827ae2-eaec-8660-35fa-71e218d5a2c5@squashfs.org.uk> (raw)
In-Reply-To: <be75a80b-fe95-e5cd-2049-522cbd95317a@canonical.com>


On 26/04/2023 12:07, Hui Wang wrote:
>
> On 4/26/23 16:33, Michal Hocko wrote:
>> [CC squashfs maintainer]
>>
>> On Wed 26-04-23 13:10:30, Hui Wang wrote:
>>> If we run stress-ng on a squashfs filesystem, the system enters a
>>> hang-like state: stress-ng cannot finish running and the console
>>> stops reacting to user input.
>>>
>>> This issue happens on all the arm/arm64 platforms we are working on.
>>> Through debugging, we found it is introduced by the oom handling in
>>> the kernel.
>>>
>>> The fs->readahead() is called between memalloc_nofs_save() and
>>> memalloc_nofs_restore(), and squashfs_readahead() calls
>>> alloc_page(). In this case, if there is no memory left,
>>> out_of_memory() will be called without __GFP_FS, so the oom killer
>>> will not be triggered and this process will loop endlessly, waiting
>>> for others to trigger the oom killer and release some memory. But on
>>> a system whose whole root filesystem is constructed from squashfs,
>>> nearly all userspace processes call out_of_memory() without
>>> __GFP_FS, so the system enters a hang-like state when running
>>> stress-ng.
>>>
>>> To fix it, we could trigger a kthread to call the page allocator
>>> with __GFP_FS before returning from out_of_memory() due to the
>>> missing __GFP_FS.
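
[Editorial note: the livelock described above can be modeled in a few lines
of userspace C. This is a hedged sketch, not kernel code; the flag values
are illustrative (the real ones live in include/linux/gfp_types.h), and
`oom_may_kill()`/`slowpath_retries()` are hypothetical names standing in for
the gating logic in out_of_memory() and the allocator slow path.]

```c
#include <stdbool.h>

/* Illustrative flag values, NOT the real kernel definitions. */
#define __GFP_FS   (1u << 0)
#define GFP_KERNEL (__GFP_FS /* | __GFP_IO | reclaim bits, etc. */)

/* Models the check in out_of_memory(): allocations that may not enter
 * filesystem reclaim are not allowed to kill tasks, so the call returns
 * without freeing anything. */
static bool oom_may_kill(unsigned int gfp_mask)
{
    return (gfp_mask & __GFP_FS) != 0;
}

/* Models the allocator slow path: returns how many retries it took
 * before the OOM killer could make progress, or -1 meaning "retries
 * forever, waiting for some other task to trigger the OOM killer". */
static int slowpath_retries(unsigned int gfp_mask, int retries_budget)
{
    for (int i = 0; i < retries_budget; i++) {
        if (oom_may_kill(gfp_mask))
            return i; /* killer runs, memory is freed, alloc succeeds */
    }
    return -1; /* nofs allocation: no progress possible on its own */
}
```

memalloc_nofs_save() effectively masks __GFP_FS out of GFP_KERNEL for the
readahead path, which is the `GFP_KERNEL & ~__GFP_FS` case in this model.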
>> I do not think this is an appropriate way to deal with this issue.
>> Does it even make sense to trigger OOM killer for something like
>> readahead? Would it be more mindful to fail the allocation instead?
>> That being said should allocations from squashfs_readahead use
>> __GFP_RETRY_MAYFAIL instead?
>
> Thanks for your comment. This issue can hardly be reproduced on an 
> ext4 filesystem, because ext4->readahead() doesn't call alloc_page(). 
> If ext4->readahead() is changed as below, it becomes easy to reproduce 
> this issue with ext4 (repeatedly run: 
> $stress-ng --bigheap ${num_of_cpu_threads} --sequential 0 --timeout 
> 30s --skip-silent --verbose)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ffbbd9626bd8..8b9db0b9d0b8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3114,12 +3114,18 @@ static int ext4_read_folio(struct file *file, 
> struct folio *folio)
>  static void ext4_readahead(struct readahead_control *rac)
>  {
>         struct inode *inode = rac->mapping->host;
> +       struct page *tmp_page;
>
>         /* If the file has inline data, no need to do readahead. */
>         if (ext4_has_inline_data(inode))
>                 return;
>
> +       tmp_page = alloc_page(GFP_KERNEL);
> +
>         ext4_mpage_readpages(inode, rac, NULL);
> +
> +       if (tmp_page)
> +               __free_page(tmp_page);
>  }
>
>
> BTW, I applied my patch to linux-next and ran the oom stress-ng 
> testcases overnight: there was no hang, oops or crash, so it looks 
> like there is no big problem with using a kthread to trigger the oom 
> killer in this case.
>
> And hi squashfs maintainer, I checked the filesystem code, and it 
> looks like most filesystems do not call alloc_page() in readahead(). 
> Could you please help take a look at this issue? Thanks.


This will be because most filesystems don't need to do so. Squashfs is a 
compressed filesystem with large blocks covering much more than one 
page, and it decompresses these blocks in squashfs_readahead(). If 
__readahead_batch() does not return the full set of pages covering the 
Squashfs block, it allocates a temporary page for the decompressors to 
decompress into, to "fill in the hole".
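
[Editorial note: a minimal sketch of the condition described above. This is
not the actual fs/squashfs/file.c code; `needs_temp_page()` is a
hypothetical helper and the block size is illustrative.]

```c
#include <stdbool.h>

#define PAGE_SIZE 4096u

/* A Squashfs block spans several pages. If __readahead_batch() hands
 * back fewer page-cache pages than the block covers, the decompressor
 * still needs a full set of destination pages, so a temporary page is
 * allocated to "fill in the hole". */
static bool needs_temp_page(unsigned int block_size,
                            unsigned int batched_pages)
{
    unsigned int pages_per_block =
        (block_size + PAGE_SIZE - 1) / PAGE_SIZE;

    return batched_pages < pages_per_block;
}
```

For example, with a 128 KiB block (32 pages), a batch of 32 pages needs no
temporary page, while a batch of 20 does.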

What can be done here as far as Squashfs is concerned... I could move 
the page allocation out of the readahead path (e.g. do it at mount time).
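
[Editorial note: one possible shape for that suggestion, sketched in
userspace C under stated assumptions. The struct, names, and locking are
hypothetical, not the actual Squashfs implementation: the fallback page is
allocated once at mount time and handed out under a lock, so the readahead
path never calls the allocator.]

```c
#include <stdlib.h>
#include <pthread.h>

/* Hypothetical per-mount state holding the preallocated fallback page. */
struct mount_state {
    void *temp_page;           /* allocated once, at mount time */
    pthread_mutex_t temp_lock; /* users of the page serialize on this */
};

static int mount_init(struct mount_state *m, size_t page_size)
{
    m->temp_page = malloc(page_size); /* stands in for alloc_page() */
    if (!m->temp_page)
        return -1; /* mount fails early, not mid-readahead under OOM */
    pthread_mutex_init(&m->temp_lock, NULL);
    return 0;
}

/* Readahead grabs the shared page instead of allocating a fresh one. */
static void *grab_temp_page(struct mount_state *m)
{
    pthread_mutex_lock(&m->temp_lock);
    return m->temp_page;
}

static void release_temp_page(struct mount_state *m)
{
    pthread_mutex_unlock(&m->temp_lock);
}
```

The trade-off is that concurrent readaheads needing the fallback page now
serialize on the lock, in exchange for never allocating in the nofs path.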

Adding __GFP_RETRY_MAYFAIL so the alloc() can fail will mean Squashfs 
returns I/O failures when there is no memory. That will cause a lot of 
applications to crash in a low-memory situation. So a crash rather than 
a hang.

Phillip
Thread overview: 25+ messages
2023-04-26  5:10 [PATCH 0/1] mm/oom_kill: system enters a state something like hang when running stress-ng Hui Wang
2023-04-26  5:10 ` [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS Hui Wang
2023-04-26  8:33   ` Michal Hocko
2023-04-26 11:07     ` Hui Wang
2023-04-26 16:44       ` Phillip Lougher [this message]
2023-04-26 17:38         ` Phillip Lougher
2023-04-26 18:26           ` Yang Shi
2023-04-26 19:06             ` Phillip Lougher
2023-04-26 19:34               ` Phillip Lougher
2023-04-27  0:42                 ` Hui Wang
2023-04-27  1:37                   ` Phillip Lougher
2023-04-27  5:22                     ` Hui Wang
2023-04-27  1:18       ` Gao Xiang
2023-04-27  3:47         ` Hui Wang
2023-04-27  4:17           ` Gao Xiang
2023-04-27  7:03           ` Colin King (gmail)
2023-04-27  7:49             ` Hui Wang
2023-04-28 19:53           ` Michal Hocko
2023-05-03 11:49             ` Hui Wang
2023-05-03 12:20               ` Michal Hocko
2023-05-03 18:41                 ` Phillip Lougher
2023-05-03 19:10               ` Phillip Lougher
2023-05-03 19:38                 ` Hui Wang
2023-05-07 21:07                 ` Phillip Lougher
2023-05-08 10:05                   ` Hui Wang
