From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6ED7C433DB for ; Mon, 25 Jan 2021 18:52:37 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 442C222B3F for ; Mon, 25 Jan 2021 18:52:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 442C222B3F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=cmpxchg.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 873346B008A; Mon, 25 Jan 2021 13:52:36 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7FC6E6B008C; Mon, 25 Jan 2021 13:52:36 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6C58F8D0001; Mon, 25 Jan 2021 13:52:36 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id 4DE606B008A for ; Mon, 25 Jan 2021 13:52:36 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 09896180ACF1F for ; Mon, 25 Jan 2021 18:52:36 +0000 (UTC) X-FDA: 77745193512.03.ants81_1801ce827588 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id D77E928A4E9 for ; Mon, 25 Jan 2021 18:52:35 +0000 (UTC) X-HE-Tag: ants81_1801ce827588 X-Filterd-Recvd-Size: 11656 Received: from mail-qv1-f45.google.com (mail-qv1-f45.google.com [209.85.219.45]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Mon, 25 Jan 2021 18:52:34 +0000 (UTC) Received: by mail-qv1-f45.google.com with SMTP id s6so6668614qvn.6 for ; Mon, 25 Jan 2021 10:52:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=j3Xw/oUQa+S/aBSbggXK87Od8rLNAanZ/b3oUBXYjEc=; b=e7cR0ZkUc3Squxu4N7tsIG9B42FgOGrFNiImW0dWiyrg4zUJwTidbZpK7W1xVvuWaf 8NKPqVsud++RW0sBib73ddkSGhobD087cDBYGuf5QhJ/xlNqq+rrPkJ0SWngm6FJobkh VeLEzapz9YCa/4UUQkV69cotn7GPF9O3MhrJubQ1M+AqJdFN3jT5Yn5x4hrOXpZcJavJ 8veR72HThrqFdVoZg6jkt7sfbDqtqjczpjkn5MdD6M8AvctDpQ9PYGAKja8s0EuiWQJZ 04ieqY8h9fLKRAZslQd4fBz3oZ4cSu6wnANCxjBvkzpkLJpHFQyw4hZkkjvfUIENMpf9 nvuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=j3Xw/oUQa+S/aBSbggXK87Od8rLNAanZ/b3oUBXYjEc=; b=FbqcPE0brirOC1rS1IwEtGv29CbzkxoBOzd9ZdglpTDPjzzim0JFRnb2J+N9YzT2aL v4aK0o34E+vBrvX/LrOQ//Rq8We8YBGfGz/DRv/EM1yDAcGykc1BKzsBZdreHdeg4Ul7 YRq6nMWInR0Rk+hZ+wBQAuTWH51XH156JreXh2UbAbmeQ9BGr7YsSmMzsPOYxJEYeT9f S+D53m7jsBfDldt9r4yBwKdQieeKnzgNOKkRYQoAn8ZXTBoSQTDgi32eaqtpSlYYplX3 RpOSEMP/chIHqNLd3bIn1cv4p8VBw6+g2vQhbVvSD8KhEuL2hFbo3ZK4BTysmRr+bZIp wzCQ== X-Gm-Message-State: AOAM5306FbTzj6Pa23tHQAp9I7b74QYjKr6lKvEnjxXvTMsEpwNmO5EW NGWdF+ZgJUCelVymYqIljD00dQ== X-Google-Smtp-Source: ABdhPJz61WKW7+3RRnYPymhLMqo8u5vUDYq80GJjjeVR0iW65AN/hvww8VxkZYxc+3aPrzkQemHA+w== X-Received: by 2002:a0c:fa4c:: with SMTP id k12mr2118880qvo.16.1611600754187; Mon, 25 Jan 2021 10:52:34 -0800 (PST) Received: from localhost ([2620:10d:c091:480::1:f735]) by smtp.gmail.com with ESMTPSA id c17sm12772425qkb.13.2021.01.25.10.52.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Jan 2021 10:52:33 -0800 (PST) Date: Mon, 25 Jan 2021 13:52:32 -0500 From: Johannes Weiner To: Waiman Long Cc: Michal Hocko , Matthew Wilcox , Andrew Morton , Alex Shi , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] mm/filemap: Adding missing mem_cgroup_uncharge() to __add_to_page_cache_locked() Message-ID: References: <20210125042441.20030-1-longman@redhat.com> <20210125092815.GB827@dhcp22.suse.cz> <20210125160328.GP827@dhcp22.suse.cz> <20210125162506.GF308988@casper.infradead.org> <20210125164118.GS827@dhcp22.suse.cz> <20210125181436.GV827@dhcp22.suse.cz> <53eb7692-e559-a914-e103-adfe951d7a7c@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <53eb7692-e559-a914-e103-adfe951d7a7c@redhat.com> Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Jan 25, 2021 at 01:23:58PM -0500, Waiman Long wrote: > On 1/25/21 1:14 PM, Michal Hocko wrote: > > On Mon 25-01-21 17:41:19, Michal Hocko wrote: > > > On Mon 25-01-21 16:25:06, Matthew Wilcox wrote: > > > > On Mon, Jan 25, 2021 at 05:03:28PM +0100, Michal Hocko wrote: > > > > > On Mon 25-01-21 10:57:54, Waiman Long wrote: > > > > > > On 1/25/21 4:28 AM, Michal Hocko wrote: > > > > > > > On Sun 24-01-21 23:24:41, Waiman Long wrote: > > > > > > > > The commit 3fea5a499d57 ("mm: memcontrol: convert page > > > > > > > > cache to a new mem_cgroup_charge() API") introduced a bug= in > > > > > > > > __add_to_page_cache_locked() causing the following splat: > > > > > > > >=20 > > > > > > > > [ 1570.068330] page dumped because: VM_BUG_ON_PAGE(pag= e_memcg(page)) > > > > > > > > [ 1570.068333] pages's memcg:ffff8889a4116000 > > > > > > > > [ 1570.068343] ------------[ cut here ]------------ > > > > > > > > [ 1570.068346] kernel BUG at mm/memcontrol.c:2924! > > > > > > > > [ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI > > > > > > > > [ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G= S W I 5.11.0-rc4-debug+ #1 > > > > > > > > [ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/= 81C7, BIOS P60 v01.25 12/06/2017 > > > > > > > > [ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130 > > > > > > > > : > > > > > > > > [ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 0001= 0286 > > > > > > > > [ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260d= dd00 RCX: 0000000000000027 > > > > > > > > [ 1570.068382] RDX: 0000000000000000 RSI: 000000000000= 0004 RDI: ffff88907ebe05a8 > > > > > > > > [ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7= c0b6 R09: ffffed120fd7c0b6 > > > > > > > > [ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7= c0b5 R12: ffffea00260ddd38 > > > > > > > > [ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a411= 6000 R15: 0000000000000001 > > > > > > > > [ 1570.068391] FS: 00007ff039638680(0000) GS:ffff8890= 7ea00000(0000) knlGS:0000000000000000 > > > > > > > > [ 1570.068394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000= 0080050033 > > > > > > > > [ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a012= 6006 CR4: 00000000007706e0 > > > > > > > > [ 1570.068398] DR0: 0000000000000000 DR1: 000000000000= 0000 DR2: 0000000000000000 > > > > > > > > [ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe= 0ff0 DR7: 0000000000000400 > > > > > > > > [ 1570.068402] PKRU: 55555554 > > > > > > > > [ 1570.068404] Call Trace: > > > > > > > > [ 1570.068407] mem_cgroup_charge+0x175/0x770 > > > > > > > > [ 1570.068413] __add_to_page_cache_locked+0x712/0xad0 > > > > > > > > [ 1570.068439] add_to_page_cache_lru+0xc5/0x1f0 > > > > > > > > [ 1570.068461] cachefiles_read_or_alloc_pages+0x895/0= x2e10 [cachefiles] > > > > > > > > [ 1570.068524] __fscache_read_or_alloc_pages+0x6c0/0x= a00 [fscache] > > > > > > > > [ 1570.068540] __nfs_readpages_from_fscache+0x16d/0x6= 30 [nfs] > > > > > > > > [ 1570.068585] nfs_readpages+0x24e/0x540 [nfs] > > > > > > > > [ 1570.068693] read_pages+0x5b1/0xc40 > > > > > > > > [ 1570.068711] page_cache_ra_unbounded+0x460/0x750 > > > > > > > > [ 1570.068729] generic_file_buffered_read_get_pages+0= x290/0x1710 > > > > > > > > [ 1570.068756] generic_file_buffered_read+0x2a9/0xc30 > > > > > > > > [ 1570.068832] nfs_file_read+0x13f/0x230 [nfs] > > > > > > > > [ 1570.068872] new_sync_read+0x3af/0x610 > > > > > > > > [ 1570.068901] vfs_read+0x339/0x4b0 > > > > > > > > [ 1570.068909] ksys_read+0xf1/0x1c0 > > > > > > > > [ 1570.068920] do_syscall_64+0x33/0x40 > > > > > > > > [ 1570.068926] entry_SYSCALL_64_after_hwframe+0x44/0x= a9 > > > > > > > > [ 1570.068930] RIP: 0033:0x7ff039135595 > > > > > > > >=20 > > > > > > > > Before that commit, there was a try_charge() and commit_c= harge() > > > > > > > > in __add_to_page_cache_locked(). These 2 separated charge= functions > > > > > > > > were replaced by a single mem_cgroup_charge(). However, i= t forgot > > > > > > > > to add a matching mem_cgroup_uncharge() when the xarray i= nsertion > > > > > > > > failed with the page released back to the pool. Fix this = by adding a > > > > > > > > mem_cgroup_uncharge() call when insertion error happens. > > > > > > > >=20 > > > > > > > > Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache = to a new mem_cgroup_charge() API") > > > > > > > > Signed-off-by: Waiman Long > > > > > > > OK, this is indeed a subtle bug. The patch aimed at simplif= ying the > > > > > > > charge lifetime so that users do not really have to think a= bout when to > > > > > > > uncharge as that happens when the page is freed. fscache so= mehow breaks > > > > > > > that assumption because it doesn't free up pages but it kee= ps some of > > > > > > > them in the cache. > > > > > > >=20 > > > > > > > I have tried to wrap my head around the cached object life = time in > > > > > > > fscache but failed and got lost in the maze. Is this the on= ly instance > > > > > > > of the problem? Would it make more sense to explicitly hand= le charges in > > > > > > > the fscache code or there are other potential users to fall= into this > > > > > > > trap? > > > > > > There may be other places that have similar problem. I focus = on the > > > > > > filemap.c case as I have a test case that can reliably produc= e the bug > > > > > > splat. This patch does fix it for my test case. > > > > > I believe this needs a more general fix than catching a random = places > > > > > which you can trigger. Would it make more sense to address this= at the > > > > > fscache level and always make sure that a page returned to the = pool is > > > > > always uncharged instead? > > > > I believe you mean "page cache" -- there is a separate thing call= ed > > > > 'fscache' which is used to cache network filesystems. > > > Yes, I really had fscache in mind because it does have an "unusual"= page > > > life time rules. > > >=20 > > > > I don't understand the memcg code at all, so I have no useful fee= dback > > > > on what you're saying other than this. > > > Well the memcg accounting rules after the rework should have simpli= fied > > > the API usage for most users. You will get memory charged when it i= s > > > used and it will go away when the page is freed. If a page is not r= eally > > > freed in some cases and it can be reused then it doesn't really fit= into > > > this scheme automagically. I do undestand that this puts some addit= ional > > > burden on those special cases. I am not really sure what is the rig= ht > > > way here myself but considering there might be other similar cases = like > > > that I would lean towards special casing where the pool is implemen= ted. > > > I would expect there is some state to be maintain for that purpose > > > already. > > After some more thinking I've came to conclusion that the patch as > > proposed is the proper way forward. It is easier to follow if the > > unwinding of state changes are local to the function. > I think so. It is easier to understand if the charge and uncharge funct= ions > are grouped together in the same function. > >=20 > > With the proposed simplification by Willy > > Acked-by: Michal Hocko >=20 > Thank for the ack. However, I am a bit confused about what you mean by > simplification. There is another linux-next patch that changes the cond= ition > for mem_cgroup_charge() to >=20 > -=A0=A0=A0=A0=A0=A0 if (!huge) { > +=A0=A0=A0=A0=A0=A0 if (!huge && !page_is_secretmem(page)) { > =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 error =3D mem_cgroup_char= ge(page, current->mm, gfp); >=20 > That is the main reason why I introduced the boolean variable as I don'= t > want to call the external page_is_secretmem() function twice. The variable works for me. On the other hand, as Michal points out, the uncharge function will be called again on the page when it's being freed (in non-fscache cases), so you're already relying on being able to call it on any page - charged, uncharged, never charged. It would be fine to call it unconditionally in the error path. Aesthetic preference, I guess.