From: Yang Shi <shy828301@gmail.com>
Date: Wed, 26 Apr 2023 11:26:07 -0700
Subject: Re: [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS
To: Phillip Lougher
Cc: Hui Wang, Michal Hocko, linux-mm@kvack.org, akpm@linux-foundation.org, surenb@google.com, colin.i.king@gmail.com, hannes@cmpxchg.org, vbabka@suse.cz, hch@infradead.org, mgorman@suse.de, dan.carpenter@oracle.com
In-Reply-To: <72cf1f14-c02b-033e-6fa9-8558e628ffb6@squashfs.org.uk>
References: <20230426051030.112007-1-hui.wang@canonical.com> <20230426051030.112007-2-hui.wang@canonical.com> <9f827ae2-eaec-8660-35fa-71e218d5a2c5@squashfs.org.uk> <72cf1f14-c02b-033e-6fa9-8558e628ffb6@squashfs.org.uk>
On Wed, Apr 26, 2023 at 10:38 AM Phillip Lougher wrote:
>
> On 26/04/2023 17:44, Phillip Lougher wrote:
> >
> > On 26/04/2023 12:07, Hui Wang wrote:
> >>
> >> On 4/26/23 16:33, Michal Hocko wrote:
> >>> [CC squashfs maintainer]
> >>>
> >>> On Wed 26-04-23 13:10:30, Hui Wang wrote:
> >>>> If we run the stress-ng in the filesystem of squashfs, the system
> >>>> will be in a state something like hang, the stress-ng couldn't
> >>>> finish running and the console couldn't react to users' input.
> >>>>
> >>>> This issue happens on all arm/arm64 platforms we are working on,
> >>>> through debugging, we found this issue is introduced by oom handling
> >>>> in the kernel.
> >>>>
> >>>> The fs->readahead() is called between memalloc_nofs_save() and
> >>>> memalloc_nofs_restore(), and the squashfs_readahead() calls
> >>>> alloc_page(), in this case, if there is no memory left, the
> >>>> out_of_memory() will be called without __GFP_FS, then the oom killer
> >>>> will not be triggered and this process will loop endlessly and wait
> >>>> for others to trigger oom killer to release some memory. But for a
> >>>> system with the whole root filesystem constructed by squashfs,
> >>>> nearly all userspace processes will call out_of_memory() without
> >>>> __GFP_FS, so we will see that the system enters a state something like
> >>>> hang when running stress-ng.
> >>>>
> >>>> To fix it, we could trigger a kthread to call page_alloc() with
> >>>> __GFP_FS before returning from out_of_memory() due to without
> >>>> __GFP_FS.
> >>> I do not think this is an appropriate way to deal with this issue.
> >>> Does it even make sense to trigger OOM killer for something like
> >>> readahead? Would it be more mindful to fail the allocation instead?
> >>> That being said should allocations from squashfs_readahead use
> >>> __GFP_RETRY_MAYFAIL instead?
> >>
> >> Thanks for your comment, and this issue could hardly be reproduced on
> >> ext4 filesystem, that is because the ext4->readahead() doesn't call
> >> alloc_page().
> >> If changing the ext4->readahead() as below, it will be
> >> easy to reproduce this issue with the ext4 filesystem (repeatedly
> >> run: $stress-ng --bigheap ${num_of_cpu_threads} --sequential 0
> >> --timeout 30s --skip-silent --verbose)
> >>
> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >> index ffbbd9626bd8..8b9db0b9d0b8 100644
> >> --- a/fs/ext4/inode.c
> >> +++ b/fs/ext4/inode.c
> >> @@ -3114,12 +3114,18 @@ static int ext4_read_folio(struct file *file, struct folio *folio)
> >>  static void ext4_readahead(struct readahead_control *rac)
> >>  {
> >>  	struct inode *inode = rac->mapping->host;
> >> +	struct page *tmp_page;
> >>
> >>  	/* If the file has inline data, no need to do readahead. */
> >>  	if (ext4_has_inline_data(inode))
> >>  		return;
> >>
> >> +	tmp_page = alloc_page(GFP_KERNEL);
> >> +
> >>  	ext4_mpage_readpages(inode, rac, NULL);
> >> +
> >> +	if (tmp_page)
> >> +		__free_page(tmp_page);
> >>  }
> >>
> >> BTW, I applied my patch to the linux-next and ran the oom stress-ng
> >> testcases overnight, there is no hang, oops or crash, looks like
> >> there is no big problem to use a kthread to trigger the oom killer in
> >> this case.
> >>
> >> And Hi squashfs maintainer, I checked the code of filesystem, looks
> >> like most filesystems will not call alloc_page() in the readahead(),
> >> could you please help take a look at this issue, thanks.
> >
> >
> > This will be because most filesystems don't need to do so. Squashfs is
> > a compressed filesystem with large blocks covering much more than one
> > page, and it decompresses these blocks in squashfs_readahead(). If
> > __readahead_batch() does not return the full set of pages covering the
> > Squashfs block, it allocates a temporary page for the decompressors to
> > decompress into to "fill in the hole".
> >
> > What can be done here as far as Squashfs is concerned .... I could
> > move the page allocation out of the readahead path (e.g. do it at
> > mount time).
>
> You could try this patch which does that. Compile tested only.

The kmalloc_array() may call alloc_page() to trigger this problem too
IIUC. It should be pre-allocated as well?

>  fs/squashfs/page_actor.c     | 10 +---------
>  fs/squashfs/page_actor.h     |  1 -
>  fs/squashfs/squashfs_fs_sb.h |  1 +
>  fs/squashfs/super.c          | 10 ++++++++++
>  4 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/fs/squashfs/page_actor.c b/fs/squashfs/page_actor.c
> index 81af6c4ca115..6cce239eca66 100644
> --- a/fs/squashfs/page_actor.c
> +++ b/fs/squashfs/page_actor.c
> @@ -110,15 +110,7 @@ struct squashfs_page_actor *squashfs_page_actor_init_special(struct squashfs_sb_
>  	if (actor == NULL)
>  		return NULL;
>
> -	if (msblk->decompressor->alloc_buffer) {
> -		actor->tmp_buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
> -
> -		if (actor->tmp_buffer == NULL) {
> -			kfree(actor);
> -			return NULL;
> -		}
> -	} else
> -		actor->tmp_buffer = NULL;
> +	actor->tmp_buffer = msblk->actor_page;
>
>  	actor->length = length ? : pages * PAGE_SIZE;
>  	actor->page = page;
> diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
> index 97d4983559b1..df5e999afa42 100644
> --- a/fs/squashfs/page_actor.h
> +++ b/fs/squashfs/page_actor.h
> @@ -34,7 +34,6 @@ static inline struct page *squashfs_page_actor_free(struct squashfs_page_actor *
>  {
>  	struct page *last_page = actor->last_page;
>
> -	kfree(actor->tmp_buffer);
>  	kfree(actor);
>  	return last_page;
>  }
> diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h
> index 72f6f4b37863..8feddc9e6cce 100644
> --- a/fs/squashfs/squashfs_fs_sb.h
> +++ b/fs/squashfs/squashfs_fs_sb.h
> @@ -47,6 +47,7 @@ struct squashfs_sb_info {
>  	struct squashfs_cache	*block_cache;
>  	struct squashfs_cache	*fragment_cache;
>  	struct squashfs_cache	*read_page;
> +	void			*actor_page;
>  	int			next_meta_index;
>  	__le64			*id_table;
>  	__le64			*fragment_index;
> diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
> index e090fae48e68..674dc187d961 100644
> --- a/fs/squashfs/super.c
> +++ b/fs/squashfs/super.c
> @@ -329,6 +329,15 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc)
>  		goto failed_mount;
>  	}
>
> +
> +	/* Allocate page for squashfs_readahead()/squashfs_read_folio() */
> +	if (msblk->decompressor->alloc_buffer) {
> +		msblk->actor_page = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +		if(msblk->actor_page == NULL)
> +			goto failed_mount;
> +	}
> +
>  	msblk->stream = squashfs_decompressor_setup(sb, flags);
>  	if (IS_ERR(msblk->stream)) {
>  		err = PTR_ERR(msblk->stream);
> @@ -454,6 +463,7 @@ static int squashfs_fill_super(struct super_block *sb, struct fs_context *fc)
>  	squashfs_cache_delete(msblk->block_cache);
>  	squashfs_cache_delete(msblk->fragment_cache);
>  	squashfs_cache_delete(msblk->read_page);
> +	kfree(msblk->actor_page);
>  	msblk->thread_ops->destroy(msblk);
>  	kfree(msblk->inode_lookup_table);
>  	kfree(msblk->fragment_index);
> --
> 2.35.1
>
>
> > Adding __GFP_RETRY_MAYFAIL so the alloc() can fail will mean Squashfs
> > returning I/O failures due to no memory. That will cause a lot of
> > applications to crash in a low memory situation. So a crash rather
> > than a hang.
> >
> > Phillip