From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <9572fc2b-12b0-41a3-82dc-bb273bfdd51d@kernel.org>
Date: Thu, 18 Jul 2024 09:17:42 +0200
Subject: Re: [PATCH 0/2] mm: skip memcg for certain address space
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: Qu Wenruo, Michal Hocko
Cc: Qu Wenruo, linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Johannes Weiner, Roman Gushchin,
 Shakeel Butt, Muchun Song, Cgroups, Matthew Wilcox
In-Reply-To: <9c0d7ce7-b17d-4d41-b98a-c50fd0c2c562@gmx.com>
References: <8faa191c-a216-4da0-a92c-2456521dcf08@kernel.org>
 <9c0d7ce7-b17d-4d41-b98a-c50fd0c2c562@gmx.com>
Content-Type: text/plain; charset=UTF-8

On 7/18/24 12:38 AM, Qu
Wenruo wrote:
> On 2024/7/18 01:44, Michal Hocko wrote:
>> On Wed 17-07-24 17:55:23, Vlastimil Babka (SUSE) wrote:
>>> Hi,
>>>
>>> you should have Cc'd people according to the get_maintainers script to
>>> get a reply faster. Let me Cc the MEMCG section.
>>>
>>> On 7/10/24 3:07 AM, Qu Wenruo wrote:
>>>> Recently I'm hitting a soft lockup when adding an order-2 folio to a
>>>> filemap using GFP_NOFS | __GFP_NOFAIL. The soft lockup happens in the
>>>> memcg charge code, and I guess that's exactly what __GFP_NOFAIL is
>>>> expected to do: wait indefinitely until the request can be met.
>>>
>>> Seems like a bug to me, as the charging of __GFP_NOFAIL in
>>> try_charge_memcg() should proceed to the force: part AFAICS and just go
>>> over the limit.
>>>
>>> I was suspecting mem_cgroup_oom() returning true a bit earlier, causing
>>> the retry loop, due to GFP_NOFS. But it seems out_of_memory() should
>>> specifically proceed for GFP_NOFS if it's a memcg OOM. But I might be
>>> missing something else. Anyway, we should know what exactly is going on
>>> first.
>>
>> Correct. The memcg OOM code will invoke the memcg OOM killer for NOFS
>> requests. See out_of_memory():
>>
>>         /*
>>          * The OOM killer does not compensate for IO-less reclaim.
>>          * But mem_cgroup_oom() has to invoke the OOM killer even
>>          * if it is a GFP_NOFS allocation.
>>          */
>>         if (!(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
>>                 return true;
>>
>> That means there will be a victim killed, charges reclaimed, and forward
>> progress made. If there is no victim, then the charging path will bail
>> out and overcharge.
>>
>> Also, the reclaim path should have cond_resched() calls. If that is not
>> sufficient, it should be fixed rather than worked around.
>
> Another question is, I only see this hang with a larger folio (order 2
> vs the old order 0) when adding to the same address space.
>
> Does the folio order have anything to do with the problem, or does a
> higher order just make it more likely?
I didn't spot anything in the memcg charge path that would depend on the
order directly, hm. Also, which kernel version was showing these soft
lockups?

> And finally, even without the hang problem, does it make any sense to
> skip the possible memcg charge completely, either to reduce latency or
> just to reduce GFP_NOFAIL usage, for those user-inaccessible inodes?

Is it common to even use the filemap code for such metadata that can't
really be mapped to userspace? How does it interact with reclaim? Do the
folios become part of the page cache and get scanned by reclaim together
with data that is mapped? How are the LRU decisions handled if there are
no references from PTE access bits? Or can they be reclaimed at all, or
is reclaim impossible because e.g. other open inodes may be pinning this
metadata? (Sorry if the questions seem noob, I'm not that familiar with
the page cache side of mm.)

> Thanks,
> Qu