Date: Fri, 19 Jul 2024 20:13:36 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Qu Wenruo, linux-btrfs@vger.kernel.org, roman.gushchin@linux.dev,
 shakeel.butt@linux.dev, muchun.song@linux.dev, cgroups@vger.kernel.org,
 linux-mm@kvack.org, Vlastimil Babka
Subject: Re: [PATCH v7 2/3] btrfs: always uses root memcgroup for filemap_add_folio()
References: <6a9ba2c8e70c7b5c4316404612f281a031f847da.1721384771.git.wqu@suse.com> <20240719170206.GA3242034@cmpxchg.org>
In-Reply-To: <20240719170206.GA3242034@cmpxchg.org>

On Fri 19-07-24 13:02:06, Johannes Weiner wrote:
> On Fri, Jul 19, 2024 at 07:58:40PM +0930, Qu Wenruo wrote:
> > [BACKGROUND]
> > The function filemap_add_folio() charges the memory cgroup,
> > as we assume all page caches are accessible by user space progresses
> > thus needs the cgroup accounting.
> >
> > However btrfs is a special case, it has a very large metadata thanks to
> > its support of data csum (by default it's 4 bytes per 4K data, and can
> > be as large as 32 bytes per 4K data).
> > This means btrfs has to go page cache for its metadata pages, to take
> > advantage of both cache and reclaim ability of filemap.
> >
> > This has a tiny problem, that all btrfs metadata pages have to go through
> > the memcgroup charge, even all those metadata pages are not
> > accessible by the user space, and doing the charging can introduce some
> > latency if there is a memory limits set.
> >
> > Btrfs currently uses __GFP_NOFAIL flag as a workaround for this cgroup
> > charge situation so that metadata pages won't really be limited by
> > memcgroup.
> >
> > [ENHANCEMENT]
> > Instead of relying on __GFP_NOFAIL to avoid charge failure, use root
> > memory cgroup to attach metadata pages.
> >
> > With root memory cgroup, we directly skip the charging part, and only
> > rely on __GFP_NOFAIL for the real memory allocation part.
> >
> > Suggested-by: Michal Hocko
> > Suggested-by: Vlastimil Babka (SUSE)
> > Signed-off-by: Qu Wenruo
> > ---
> >  fs/btrfs/extent_io.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index aa7f8148cd0d..cfeed7673009 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2971,6 +2971,7 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
> >
> >  	struct btrfs_fs_info *fs_info = eb->fs_info;
> >  	struct address_space *mapping = fs_info->btree_inode->i_mapping;
> > +	struct mem_cgroup *old_memcg;
> >  	const unsigned long index = eb->start >> PAGE_SHIFT;
> >  	struct folio *existing_folio = NULL;
> >  	int ret;
> > @@ -2981,8 +2982,17 @@ static int attach_eb_folio_to_filemap(struct extent_buffer *eb, int i,
> >  	ASSERT(eb->folios[i]);
> >
> >  retry:
> > +	/*
> > +	 * Btree inode is a btrfs internal inode, and not exposed to any
> > +	 * user.
> > +	 * Furthermore we do not want any cgroup limits on this inode.
> > +	 * So we always use root_mem_cgroup as our active memcg when attaching
> > +	 * the folios.
> > +	 */
> > +	old_memcg = set_active_memcg(root_mem_cgroup);
> >  	ret = filemap_add_folio(mapping, eb->folios[i], index + i,
> >  				GFP_NOFS | __GFP_NOFAIL);

I thought you said that NOFAIL was added to work around memcg charges.
Can you remove it when memcg is out of the picture?

It would be great to add some background about how much memory we are
talking about, because this might require memcg configuration in some
setups.

> > +	set_active_memcg(old_memcg);
>
> It looks correct. But it's going through all dance to set up
> current->active_memcg, then have the charge path look that up,
> css_get(), call try_charge() only to bail immediately, css_put(), then
> update current->active_memcg again. All those branches are necessary
> when we want to charge to a "real" other cgroup. But in this case, we
> always know we're not charging, so it seems uncalled for.
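
The sequence described in the quoted paragraph can be modeled roughly in
user space, to compare it with a call that skips the charge path
altogether. This is only a sketch under stated assumptions: it is not
kernel code, every toy_* name is hypothetical, and the step counter
tallies the bookkeeping operations of this model rather than any
measured cost.

```c
/* User-space toy model; every toy_* name is hypothetical, not a kernel API. */
#include <assert.h>
#include <stddef.h>

struct toy_memcg {
	const char *name;
	int is_root;        /* the root group is never charged */
	long charged_pages; /* pages accounted to this group */
	int refs;           /* stand-in for the css reference count */
};

static struct toy_memcg toy_root = { "root", 1, 0, 1 };
static struct toy_memcg *toy_active; /* models current->active_memcg */
static int toy_steps;                /* bookkeeping steps executed so far */

/* Install a new override and return the previous one so it can be restored. */
static struct toy_memcg *toy_set_active(struct toy_memcg *memcg)
{
	struct toy_memcg *old = toy_active;

	toy_active = memcg;
	toy_steps++;
	return old;
}

/* Charge path: look up the target, pin it, attempt the charge, unpin it. */
static int toy_charge(struct toy_memcg *task_memcg, long nr_pages)
{
	struct toy_memcg *target = toy_active ? toy_active : task_memcg;

	assert(target != NULL);
	toy_steps++;                /* lookup of the charge target */
	target->refs++;             /* models css_get() */
	toy_steps++;
	if (!target->is_root)       /* root bails out without accounting */
		target->charged_pages += nr_pages;
	toy_steps++;                /* models the try_charge() call */
	target->refs--;             /* models css_put() */
	toy_steps++;
	return 0;
}

/* Shape of the patch: override with root, charge (a no-op), then restore. */
static int toy_add_with_root_override(struct toy_memcg *task_memcg, long nr_pages)
{
	struct toy_memcg *old = toy_set_active(&toy_root);
	int ret = toy_charge(task_memcg, nr_pages);

	toy_set_active(old);
	return ret;
}

/* Shape of the alternative: a variant that never enters the charge path. */
static int toy_add_nocharge(long nr_pages)
{
	(void)nr_pages;
	return 0;
}
```

In this model both calls leave every group's charged_pages untouched; the
override variant simply runs six bookkeeping steps to get there, while
the no-charge variant runs none.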
>
> Wouldn't it be a lot simpler (and cheaper) to have a
> filemap_add_folio_nocharge()?

Yes, that would certainly simplify things. From the previous discussion
I understood that there would be broader scopes which would opt out of
charging. If this is really about a single filemap_add_folio call, then
having a variant that doesn't call mem_cgroup_charge sounds like a much
more viable option, and it also doesn't require any memcg-specific
changes.
-- 
Michal Hocko
SUSE Labs