From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9f827ae2-eaec-8660-35fa-71e218d5a2c5@squashfs.org.uk>
Date: Wed, 26 Apr 2023 17:44:55 +0100
Subject: Re: [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS
To: Hui Wang, Michal Hocko
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, surenb@google.com,
 colin.i.king@gmail.com, shy828301@gmail.com, hannes@cmpxchg.org, vbabka@suse.cz,
 hch@infradead.org, mgorman@suse.de, dan.carpenter@oracle.com
References: <20230426051030.112007-1-hui.wang@canonical.com>
 <20230426051030.112007-2-hui.wang@canonical.com>
From: Phillip Lougher

On 26/04/2023 12:07, Hui Wang wrote:
>
> On 4/26/23 16:33, Michal Hocko wrote:
>> [CC squashfs maintainer]
>>
>> On Wed 26-04-23 13:10:30, Hui Wang wrote:
>>> If we run stress-ng on a squashfs filesystem, the system ends up in
>>> a hang-like state: stress-ng cannot finish running and the console
>>> stops responding to user input.
>>>
>>> This issue happens on all arm/arm64 platforms we are working on.
>>> Through debugging, we found it is introduced by the OOM handling in
>>> the kernel.
>>>
>>> fs->readahead() is called between memalloc_nofs_save() and
>>> memalloc_nofs_restore(), and squashfs_readahead() calls
>>> alloc_page(). If no memory is left, out_of_memory() is entered
>>> without __GFP_FS; the OOM killer is then not triggered, and the
>>> allocating process loops endlessly, waiting for some other process
>>> to trigger the OOM killer and release memory. On a system whose
>>> whole root filesystem is squashfs, nearly all userspace processes
>>> call out_of_memory() without __GFP_FS, so the system enters this
>>> hang-like state when running stress-ng.
>>>
>>> To fix it, we could have a kthread call page_alloc() with __GFP_FS
>>> before out_of_memory() returns in the no-__GFP_FS case.
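
For reference, the behaviour described above comes from two places in
the kernel; both are simplified here, so treat the exact lines as
approximate. The readahead path in mm/readahead.c runs ->readahead()
with fs reclaim disabled:

	/* page_cache_ra_unbounded(), simplified */
	unsigned int nofs = memalloc_nofs_save();
	...
	read_pages(ractl);	/* ends up in ->readahead(), e.g. squashfs_readahead() */
	memalloc_nofs_restore(nofs);

and out_of_memory() in mm/oom_kill.c bails out early when __GFP_FS is
missing, so no task is killed and the allocating task just keeps
retrying:

	/* the OOM killer does not compensate for IO-less reclaim */
	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
		return true;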
>> I do not think this is an appropriate way to deal with this issue.
>> Does it even make sense to trigger the OOM killer for something like
>> readahead? Would it be more mindful to fail the allocation instead?
>> That being said, should allocations from squashfs_readahead() use
>> __GFP_RETRY_MAYFAIL instead?
>
> Thanks for your comment. This issue can hardly be reproduced on an
> ext4 filesystem, because ext4->readahead() does not call alloc_page().
> If ext4->readahead() is changed as below, the issue becomes easy to
> reproduce on ext4 as well (repeatedly run: $ stress-ng --bigheap
> ${num_of_cpu_threads} --sequential 0 --timeout 30s --skip-silent
> --verbose)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ffbbd9626bd8..8b9db0b9d0b8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3114,12 +3114,18 @@ static int ext4_read_folio(struct file *file, struct folio *folio)
>  static void ext4_readahead(struct readahead_control *rac)
>  {
>         struct inode *inode = rac->mapping->host;
> +       struct page *tmp_page;
>
>         /* If the file has inline data, no need to do readahead. */
>         if (ext4_has_inline_data(inode))
>                 return;
>
> +       tmp_page = alloc_page(GFP_KERNEL);
> +
>         ext4_mpage_readpages(inode, rac, NULL);
> +
> +       if (tmp_page)
> +               __free_page(tmp_page);
>  }
>
> BTW, I applied my patch to linux-next and ran the OOM stress-ng test
> cases overnight. There was no hang, oops or crash, so there looks to
> be no big problem with using a kthread to trigger the OOM killer in
> this case.
>
> And hi squashfs maintainer, I checked the filesystem code, and it
> looks like most filesystems do not call alloc_page() in their
> readahead(). Could you please help take a look at this issue? Thanks.

That will be because most filesystems don't need to do so. Squashfs is
a compressed filesystem with large blocks covering much more than one
page, and it decompresses these blocks in squashfs_readahead(). If
__readahead_batch() does not return the full set of pages covering the
Squashfs block, it allocates a temporary page for the decompressors to
decompress into, to "fill in the hole".

What can be done here as far as Squashfs is concerned: I could move
the page allocation out of the readahead path (e.g. do it at mount
time).

Adding __GFP_RETRY_MAYFAIL so the alloc() can fail would mean Squashfs
returning I/O failures when memory runs out. That would cause a lot of
applications to crash in a low-memory situation. So a crash rather
than a hang.

Phillip
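
P.S. To make the temporary-page pattern described above concrete, it
has roughly the following shape. This is an illustrative sketch with
made-up names (readahead_one_block_sketch is not a real function), not
the actual fs/squashfs/file.c code:

	#include <linux/kernel.h>
	#include <linux/mm.h>
	#include <linux/pagemap.h>

	/* Hypothetical sketch of one compressed block's readahead. */
	static int readahead_one_block_sketch(struct readahead_control *ractl)
	{
		struct page *pages[32];	/* e.g. a 128K block of 4K pages */
		struct page *spare = NULL;
		unsigned int nr;

		/* Take whatever pages readahead supplies for this block. */
		nr = __readahead_batch(ractl, pages, ARRAY_SIZE(pages));
		if (nr < ARRAY_SIZE(pages)) {
			/*
			 * The batch has holes, but the decompressor must
			 * write every page of the block somewhere, so
			 * allocate one throwaway destination page.  This
			 * runs under memalloc_nofs_save(), so the
			 * allocation lacks __GFP_FS and an OOM here will
			 * not fire the OOM killer.
			 */
			spare = alloc_page(GFP_KERNEL);
			if (!spare)
				return -ENOMEM;
		}

		/* ... decompress the block into pages[], pointing the
		 * decompressor at spare wherever a page is missing ... */

		if (spare)
			__free_page(spare);
		return 0;
	}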