Date: Mon, 30 Sep 2024 10:23:16 -0700
From: Shakeel Butt
To: Qu Wenruo
Cc: linux-btrfs@vger.kernel.org, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Michal Hocko, "Vlastimil Babka (SUSE)"
Subject: Re: [PATCH] btrfs: root memcgroup for metadata filemap_add_folio()

Hi Qu,

On Sat, Sep 28, 2024 at 02:15:56PM GMT, Qu Wenruo wrote:
> [BACKGROUND]
> The function filemap_add_folio() charges the memory cgroup,
> as we assume all page caches are accessible by user space progresses
> thus needs the cgroup accounting.
>
> However btrfs is a special case, it has a very large metadata thanks to
> its support of data csum (by default it's 4 bytes per 4K data, and can
> be as large as 32 bytes per 4K data).
> This means btrfs has to go page cache for its metadata pages, to take
> advantage of both cache and reclaim ability of filemap.
>
> This has a tiny problem, that all btrfs metadata pages have to go through
> the memcgroup charge, even all those metadata pages are not
> accessible by the user space, and doing the charging can introduce some
> latency if there is a memory limits set.
>
> Btrfs currently uses __GFP_NOFAIL flag as a workaround for this cgroup
> charge situation so that metadata pages won't really be limited by
> memcgroup.
>
> [ENHANCEMENT]
> Instead of relying on __GFP_NOFAIL to avoid charge failure, use root
> memory cgroup to attach metadata pages.
>
> Although this needs to export the symbol mem_root_cgroup for
> CONFIG_MEMCG, or define mem_root_cgroup as NULL for !CONFIG_MEMCG.
>
> With root memory cgroup, we directly skip the charging part, and only
> rely on __GFP_NOFAIL for the real memory allocation part.
>

I have a couple of questions:

1. Were you using __GFP_NOFAIL just to avoid ENOMEMs? Are you ok with
   oom-kills?

2. What is the normal overhead of this metadata in real-world production
   environments? I see 4 to 32 bytes per 4k, but which size is most
   commonly used, and does it depend on the 4k of data or on something
   else?

3. Most probably multiple metadata values are colocated on a single 4k
   page of the btrfs page cache even though the corresponding page cache
   might be charged to different cgroups. Is that correct?

4. What is stopping us from using a reclaimable slab cache for this
   metadata?
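
For reference on question 2, here is a rough back-of-the-envelope sketch of
the csum footprint implied by the 4 and 32 bytes per 4K figures quoted
above. It only counts raw checksum bytes; the helper name is made up for
illustration, and it ignores btree item headers, node overhead and all
other metadata, so real numbers would be higher.

```python
# Raw checksum bytes needed for a given amount of data, assuming one
# checksum per 4 KiB data block (figures taken from the quoted patch text).
BLOCK_SIZE = 4096  # bytes of data covered by one checksum

TIB = 1 << 40
GIB = 1 << 30


def csum_bytes(data_bytes: int, csum_size: int) -> int:
    """Hypothetical helper: raw csum bytes for data_bytes of data."""
    blocks = -(-data_bytes // BLOCK_SIZE)  # ceiling division
    return blocks * csum_size


if __name__ == "__main__":
    for csum_size in (4, 32):
        total = csum_bytes(TIB, csum_size)
        print(f"{csum_size:2d} bytes/4K: {total / GIB:.0f} GiB of csums per TiB "
              f"({csum_size / BLOCK_SIZE:.2%} of data)")
```

Under these assumptions that is about 1 GiB of csums per TiB of data at
4 bytes per 4K and about 8 GiB per TiB at 32 bytes per 4K, i.e. roughly
0.1% to 0.8% of the data size.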

thanks,
Shakeel