Message-ID: <3ed037af-33f9-f5cb-3e0a-d1e0d686bd64@canonical.com>
Date: Thu, 4 May 2023 03:38:31 +0800
Subject: Re: [PATCH 1/1] mm/oom_kill: trigger the oom killer if oom occurs without __GFP_FS
From: Hui Wang
To: Phillip Lougher, Michal Hocko
Cc: Gao Xiang, linux-mm@kvack.org, akpm@linux-foundation.org,
 surenb@google.com, colin.i.king@gmail.com, shy828301@gmail.com,
 hannes@cmpxchg.org, vbabka@suse.cz, hch@infradead.org, mgorman@suse.de
In-Reply-To: <70000460-ace2-3965-084d-34be65a6bd6a@squashfs.org.uk>
References: <20230426051030.112007-1-hui.wang@canonical.com>
 <20230426051030.112007-2-hui.wang@canonical.com>
 <68b085fe-3347-507c-d739-0dc9b27ebe05@linux.alibaba.com>
 <4aa48b6a-362d-de1b-f0ff-9bb8dafbdcc7@canonical.com>
 <70000460-ace2-3965-084d-34be65a6bd6a@squashfs.org.uk>

On 5/4/23 03:10, Phillip Lougher wrote:
>
> On 03/05/2023 12:49, Hui Wang wrote:
>>
>> On 4/29/23 03:53, Michal Hocko wrote:
>>> On Thu 27-04-23 11:47:10, Hui Wang wrote:
>>> [...]
>>>> So Michal,
>>>>
>>>> Don't know if you read the "[PATCH 0/1] mm/oom_kill: system enters a state
>>>> something like hang when running stress-ng", do you know why out_of_memory()
>>>> will return immediately if there is no __GFP_FS, could we drop these lines
>>>> directly:
>>>>
>>>>      /*
>>>>       * The OOM killer does not compensate for IO-less reclaim.
>>>>       * pagefault_out_of_memory lost its gfp context so we have to
>>>>       * make sure exclude 0 mask - all other users should have at least
>>>>       * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
>>>>       * invoke the OOM killer even if it is a GFP_NOFS allocation.
>>>>       */
>>>>      if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
>>>>          return true;
>>> The comment is rather hard to grasp without an intimate knowledge of the
>>> memory reclaim. The primary reason is that the allocation context
>>> without __GFP_FS (and also __GFP_IO) cannot perform a full memory
>>> reclaim because fs or the storage subsystem might be holding locks
>>> required for the memory reclaim. This means that a large amount of
>>> reclaimable memory is out of sight of the specific direct reclaim
>>> context. If we allowed oom killer to trigger we could invoke the oom
>>> killer while there is a lot of otherwise reclaimable memory. As you can
>>> imagine not something many users would appreciate as the oom kill is a
>>> very disruptive operation. In this case we rely on kswapd or other
>>> GFP_KERNEL like allocation context to make forward instead. If there is
>>> really nothing reclaimable then the oom killer would eventually hit from
>>> elsewhere.
>>>
>>> HTH
>> Hi Michal,
>>
>> Understand. Thanks for explanation. So we can't remove those 2 lines of
>> code.
>>
>> Here in my patch, letting a kthread allocate a page with GFP_KERNEL,
>> It could possibly trigger the reclaim and if nothing reclaimable,
>> trigger the oom killer. Do you think it is a safe workaround for the
>> issue we are facing currently?
>>
>>
>> And Hi Phillip,
>>
>> What is your opinion on it, do you have a direction to solve this
>> issue from filesystem?
>>
>
> The following patch creates the concept of "squashfs contexts", which
> moves all memory dynamically allocated (in a readahead/read_page path)
> into a single structure which can be allocated and deleted once.  It
> then creates a pool of these at filesystem mount time.  Threads
> entering readahead/read_page will take a context from the pool, and
> will then perform no dynamic memory allocation.
>
> The final patch-series will make this a non-default build option for
> systems that need this.
>
> Phillip
>
>

Hi Phillip,

Got it. I will verify the patch next Monday or Tuesday. I am traveling
now and will be back in the office next Monday.

Thanks,

Hui.
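
For reference, the early-return condition quoted from out_of_memory() can be modelled in plain userspace C, roughly as below. This is only a sketch of the boolean logic: the flag values and the names DEMO_GFP_IO, DEMO_GFP_FS, oom_returns_early() and check_examples() are made up for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real __GFP_* values. */
#define DEMO_GFP_IO 0x1u
#define DEMO_GFP_FS 0x2u

/*
 * Hypothetical model of the quoted check: returns true when the OOM
 * path would return early without picking a victim.
 *  - a zero mask (the page-fault path lost its gfp context) never bails
 *  - a non-zero mask without the FS bit bails, unless it is a memcg OOM
 */
bool oom_returns_early(unsigned int gfp_mask, bool memcg_oom)
{
    return gfp_mask && !(gfp_mask & DEMO_GFP_FS) && !memcg_oom;
}

/* Walks through the four cases discussed in the thread. */
void check_examples(void)
{
    /* NOFS-like allocation: reclaim was limited, so no kill here. */
    assert(oom_returns_early(DEMO_GFP_IO, false));
    /* GFP_KERNEL-like allocation: full reclaim was possible, may kill. */
    assert(!oom_returns_early(DEMO_GFP_IO | DEMO_GFP_FS, false));
    /* Zero mask from the page-fault path: may kill. */
    assert(!oom_returns_early(0u, false));
    /* memcg OOM: may kill even for a NOFS-like allocation. */
    assert(!oom_returns_early(DEMO_GFP_IO, true));
}
```

The first case is the one that matters for this thread: an allocation without the FS bit never reaches the victim selection, so something else with a GFP_KERNEL-like context has to make progress.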
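
To make the "contexts" idea above concrete, here is a minimal userspace sketch of a pool that is fully allocated up front, so that taking and returning a context never allocates. It is not Phillip's patch: struct read_ctx, struct ctx_pool and the ctx_* functions are hypothetical names, malloc() stands in for kernel allocation, and locking is left out (a real pool shared between threads would need it, and would presumably sleep rather than return NULL when empty).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for everything one read request needs. */
struct read_ctx {
    void *buf;                /* scratch buffer, allocated once at init */
    struct read_ctx *next;    /* free-list link while the context is idle */
};

struct ctx_pool {
    struct read_ctx *slots;   /* backing array, kept for teardown */
    struct read_ctx *idle;    /* head of the list of unused contexts */
    size_t count;
};

/* Allocate every context up front ("mount time"). Returns 0 or -1. */
int ctx_pool_init(struct ctx_pool *pool, size_t count, size_t buf_size)
{
    pool->idle = NULL;
    pool->count = 0;
    pool->slots = count ? calloc(count, sizeof(*pool->slots)) : NULL;
    if (!pool->slots)
        return -1;

    for (size_t i = 0; i < count; i++) {
        pool->slots[i].buf = malloc(buf_size);
        if (!pool->slots[i].buf) {
            /* Undo the buffers allocated so far, then give up. */
            while (i--)
                free(pool->slots[i].buf);
            free(pool->slots);
            pool->slots = NULL;
            pool->idle = NULL;
            return -1;
        }
        pool->slots[i].next = pool->idle;
        pool->idle = &pool->slots[i];
    }
    pool->count = count;
    return 0;
}

/* Take an idle context without allocating; NULL means the pool is empty. */
struct read_ctx *ctx_get(struct ctx_pool *pool)
{
    struct read_ctx *ctx = pool->idle;

    if (ctx) {
        pool->idle = ctx->next;
        ctx->next = NULL;
    }
    return ctx;
}

/* Return a context to the pool so another reader can reuse it. */
void ctx_put(struct ctx_pool *pool, struct read_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->next = pool->idle;
    pool->idle = ctx;
}

/* Free everything in one go ("unmount time"). */
void ctx_pool_destroy(struct ctx_pool *pool)
{
    for (size_t i = 0; i < pool->count; i++)
        free(pool->slots[i].buf);
    free(pool->slots);
    pool->slots = NULL;
    pool->idle = NULL;
    pool->count = 0;
}
```

A reader would call ctx_get() on entry, use ctx->buf for its work, and call ctx_put() on exit. The point of the design is that all allocation failures are confined to ctx_pool_init(), so the read path itself cannot fail or stall on memory allocation.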