Date: Thu, 14 Nov 2019 20:16:57 +0100
From: Michal Hocko
To: Roman Gushchin
Cc: Michal Koutný, "linux-mm@kvack.org", Andrew Morton, Johannes Weiner, "linux-kernel@vger.kernel.org", Kernel Team, "stable@vger.kernel.org", Tejun Heo
Subject: Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
Message-ID: <20191114191657.GN20866@dhcp22.suse.cz>
In-Reply-To: <20191113170823.GA12464@castle.DHCP.thefacebook.com>

On Wed 13-11-19 17:08:29, Roman Gushchin wrote:
> On Wed, Nov 13, 2019 at 05:29:34PM +0100, Michal Koutný wrote:
> > Hi.
> >
> > On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > > Let's fix it by switching from css_tryget_online() to css_tryget().
> > Is this a safe thing to do? The stack captures a kmem charge path, with
> > css_tryget() it may happen it gets an offlined memcg and carry out
> > charge into it. What happens when e.g. memcg_deactivate_kmem_caches is
> > skipped as a consequence?
>
> The thing here is that css_tryget_online() cannot pin the online state,
> so even if returned true, the cgroup can be offline at the return from
> the function. So if we rely somewhere on it, it's already broken.

Then what is the point of this function and what about all other users?

> Generally speaking, it's better to reduce its usage to the bare minimum.
If it doesn't have any sensible semantics then I would argue it should go
altogether, otherwise we are going to chase new users again and again?

> > > The problem is caused by an exiting task which is associated with
> > > an offline memcg. We're iterating over and over in the
> > > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > > become online and the exiting task won't be migrated to a live memcg.
> > As discussed in other replies, the task is not yet exiting. However, the
> > access to memcg isn't through `current` but `mm->owner`, i.e. another
> > task of a threadgroup may have got stuck in an offlined memcg (I don't
> > have a good explanation for that though).

The trace however points to current->mm or current->active_memcg. Is it
possible that we have a stale active_memcg?
-- 
Michal Hocko
SUSE Labs
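For readers following the thread, the two points under discussion (the retry loop that never terminates on an offline memcg, and the fact that a successful css_tryget_online() does not pin the online state) can be sketched in a small userspace model. This is only an illustration under stated assumptions, not kernel code: `struct css_model`, `tryget_model()`, `tryget_online_model()`, `put_model()` and `lookup_attempts()` are hypothetical names, the reference count is a plain int rather than a percpu_ref, and the `max_tries` bound exists only so the model terminates, whereas the loop quoted above has no such bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a css; not the kernel's structure. */
struct css_model {
    int  refcnt;  /* 0 means released: no new references can be taken */
    bool online;  /* cleared once the cgroup has been offlined */
};

/* Models css_tryget(): succeeds as long as the object is not released. */
static bool tryget_model(struct css_model *css)
{
    if (css->refcnt == 0)
        return false;
    css->refcnt++;
    return true;
}

/* Models css_tryget_online(): additionally refuses an offline object. */
static bool tryget_online_model(struct css_model *css)
{
    if (!css->online)
        return false;
    return tryget_model(css);
}

/* Drops a reference taken by one of the tryget helpers. */
static void put_model(struct css_model *css)
{
    assert(css->refcnt > 0);
    css->refcnt--;
}

/*
 * Models the do {} while (!tryget()) lookup loop. Assumption: every retry
 * finds the same css again, as it would for a task that stays associated
 * with an offline memcg. Returns the number of attempts that were needed,
 * or -1 if max_tries attempts all failed.
 */
static int lookup_attempts(struct css_model *css,
                           bool (*tryget)(struct css_model *),
                           int max_tries)
{
    for (int i = 1; i <= max_tries; i++)
        if (tryget(css))
            return i;
    return -1;
}
```

With an offline but not yet released object, `lookup_attempts()` never succeeds with `tryget_online_model` and succeeds on the first attempt with `tryget_model`. Clearing `online` right after a successful `tryget_online_model()` call leaves the caller holding a valid reference to an offline object, which is the window Roman describes.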