From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 896C3C433F5 for ; Tue, 4 Oct 2022 16:18:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 029096B0071; Tue, 4 Oct 2022 12:18:50 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F1B186B0073; Tue, 4 Oct 2022 12:18:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DE3306B0074; Tue, 4 Oct 2022 12:18:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id CC4236B0071 for ; Tue, 4 Oct 2022 12:18:49 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 9A6341A0842 for ; Tue, 4 Oct 2022 16:18:49 +0000 (UTC) X-FDA: 79983775578.20.21ABDCC Received: from out2.migadu.com (out2.migadu.com [188.165.223.204]) by imf18.hostedemail.com (Postfix) with ESMTP id ABF161C0024 for ; Tue, 4 Oct 2022 16:18:48 +0000 (UTC) Date: Tue, 4 Oct 2022 09:18:26 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1664900326; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=ZUComksE3kcEt46sxXrKqcpfs6gx5hGj/h6rV5lmqcA=; b=sMO7vktZT5MGx84xs2oOUmu8oUi89tdQnMxkIFjywwl4y39OEIc2bizLFqfa+JS6ri6azL Puy0FUqzIvk0/3dyPuJS4UoaxKZk12X/YFishkxYpBUyyi2TlsCqf3i3rXFq8BZhbPfgA+ IgKU+H2s3Qlcq0X/zgTyfbpEgtEQMgw= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Roman Gushchin To: Alexander Fedorov Cc: Michal Hocko , Johannes Weiner , Shakeel Butt , Vladimir Davydov , Muchun Song , Sebastian Andrzej Siewior , cgroups@vger.kernel.org, linux-mm@kvack.org Subject: Re: Possible race in obj_stock_flush_required() vs drain_obj_stock() Message-ID: References: <1664546131660.1777662787.1655319815@gmail.com> <821923d8-17c3-f1c2-4d6a-5653c88db3e8@gmail.com> <2f9bdffd-062e-a364-90c4-da7f09c95619@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2f9bdffd-062e-a364-90c4-da7f09c95619@gmail.com> X-Migadu-Flow: FLOW_OUT ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664900329; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ZUComksE3kcEt46sxXrKqcpfs6gx5hGj/h6rV5lmqcA=; b=QUY+4jd5Ky+nO8OPcACt4aP8uJES3G3Gl+t2+wl29G+T86KE4/T4KxFvR8ngyc9gRQZSAe gtEZWggc1o2ZQAkUI8W3nLr9U3s3V9N6W4W9Bn/LBp//Rh3TyugF0HjCAjRe4F/o0U8H80 9VoPhCTwrYWotMI9OEEAWHY4r65+MEo= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=sMO7vktZ; spf=pass (imf18.hostedemail.com: domain of roman.gushchin@linux.dev designates 188.165.223.204 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664900329; a=rsa-sha256; cv=none; b=AOfR96JydhfkDrr4ATgzevR0v/V4erbIRs80rJhilMuG0hm0yLiCu31ocOEwivvzMrmqex xXjCHvGX5mmsc2LHqggMpFlWez1CeI07jo/b40Pf20FYa6RQLpNzfae4C0bN8dAP3U2HdA EQ5xhuM2FcFZLghIghBY0B6WSL7wJU4= X-Rspamd-Queue-Id: ABF161C0024 Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=sMO7vktZ; spf=pass (imf18.hostedemail.com: domain of roman.gushchin@linux.dev designates 188.165.223.204 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Rspamd-Server: rspam06 X-Rspam-User: X-Stat-Signature: enmc55z9ahmy1k9jj8wzjdnofpk5b3ks X-HE-Tag: 1664900328-54524 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Oct 03, 2022 at 06:01:35PM +0300, Alexander Fedorov wrote: > On 03.10.2022 17:27, Michal Hocko wrote: > > On Mon 03-10-22 17:09:15, Alexander Fedorov wrote: > >> On 03.10.2022 16:32, Michal Hocko wrote: > >>> On Mon 03-10-22 15:47:10, Alexander Fedorov wrote: > >>>> @@ -3197,17 +3197,30 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock) > >>>> stock->nr_bytes = 0; > >>>> } > >>>> > >>>> - obj_cgroup_put(old); > >>>> + /* > >>>> + * Clear pointer before freeing memory so that > >>>> + * drain_all_stock() -> obj_stock_flush_required() > >>>> + * does not see a freed pointer. > >>>> + */ > >>>> stock->cached_objcg = NULL; > >>>> + obj_cgroup_put(old); > >>> > >>> Do we need barrier() or something else to ensure there is no reordering? > >>> I am not reallyu sure what kind of barriers are implied by the pcp ref > >>> counting. > >> > >> obj_cgroup_put() -> kfree_rcu() -> synchronize_rcu() should take care > >> of this: > > > > This is a very subtle guarantee. Also it would only apply if this is the > > last reference, right? > > Hmm, yes, for the last reference only, also not sure about pcp ref > counter ordering rules for previous references. > > > Is there any reason to not use > > WRITE_ONCE(stock->cached_objcg, NULL); > > obj_cgroup_put(old); > > > > IIRC this should prevent any reordering. > > Now that I think about it we actually must use WRITE_ONCE everywhere > when writing cached_objcg because otherwise compiler might split the > pointer-sized store into several smaller-sized ones (store tearing), > and obj_stock_flush_required() would read garbage instead of pointer. > > And thinking about memory barriers, maybe we need them too alongside > WRITE_ONCE when setting pointer to non-null value? Otherwise > drain_all_stock() -> obj_stock_flush_required() might read old data. > Since that's exactly what rcu_assign_pointer() does, it seems > that we are going back to using rcu_*() primitives everywhere? Hm, Idk, I'm still somewhat resistant to the idea of putting rcu primitives, but maybe it's the right thing. Maybe instead we should always schedule draining on all cpus instead and perform a cpu-local check and bail out if a flush is not required? Michal, Johannes, what do you think? Btw the same approach is used for the memcg part (stock->cached), so if we're going to change anything, we need to change it too. Thanks!