Date: Mon, 5 Aug 2024 17:25:25 +0000
From: Roman Gushchin
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Meta kernel team, cgroups@vger.kernel.org
Subject: Re: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
References: <20240802235822.1830976-1-shakeel.butt@linux.dev>
In-Reply-To: <20240802235822.1830976-1-shakeel.butt@linux.dev>

On Fri, Aug 02, 2024 at 04:58:22PM -0700, Shakeel Butt wrote:
> Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
> after many small jobs") decoupled the memcg IDs from the CSS ID space to
> fix the cgroup creation failures. It introduced an IDR to maintain the
> memcg ID space. The IDR depends on external synchronization mechanisms
> for modifications. For the mem_cgroup_idr, idr_alloc() and
> idr_replace() happen within css callbacks and thus are protected by
> cgroup_mutex from concurrent modifications. However, idr_remove() for
> mem_cgroup_idr was not protected against concurrency and can run
> concurrently for different memcgs when their refcnt hits zero.
> Fix that.
>
> We have been seeing list_lru based kernel crashes at a low frequency in
> our fleet for a long time. These crashes were in different parts of the
> list_lru code, including list_lru_add(), list_lru_del() and the
> reparenting code. Upon further inspection, it looked like for a given
> object (dentry and inode), the super_block's list_lru didn't have a
> list_lru_one for the memcg of that object. The initial suspicions were
> that either the object was not allocated through kmem_cache_alloc_lru()
> or memcg_list_lru_alloc() somehow failed to allocate a list_lru_one for
> a memcg but returned success. No evidence was found for these cases.
>
> Looking deeper, we started seeing situations where a valid memcg's ID
> is not present in mem_cgroup_idr, and in some cases multiple valid
> memcgs have the same ID and mem_cgroup_idr points to one of them. So,
> the most reasonable explanation is that these situations can happen due
> to a race between multiple idr_remove() calls or a race between
> idr_alloc()/idr_replace() and idr_remove(). These races cause multiple
> memcgs to acquire the same ID, and then offlining one of them cleans up
> the list_lrus on the system for all of them. Later accesses from the
> other memcgs to the list_lru cause crashes due to the missing
> list_lru_one.

Great catch!

Reviewed-by: Roman Gushchin

Thanks
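
For readers following the thread, the locking rule described in the quoted text (every modification of the ID space, including removal, has to be serialized by the caller) can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions: `struct toy_idr` and the `toy_idr_*` helpers are hypothetical names that do not exist in the kernel, a flat array stands in for the real IDR, and a pthread mutex stands in for whatever lock the patch actually uses. It is not the change under review.

```c
/*
 * Userspace analogy only. struct toy_idr and the toy_idr_* helpers are
 * hypothetical; they are not kernel APIs and not the patch itself.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_IDS 64

struct toy_idr {
    pthread_mutex_t lock;       /* serializes alloc, remove and find */
    void *slots[TOY_MAX_IDS];   /* slots[id] is the object owning id; NULL means free */
};

static void toy_idr_init(struct toy_idr *idr)
{
    pthread_mutex_init(&idr->lock, NULL);
    for (size_t i = 0; i < TOY_MAX_IDS; i++)
        idr->slots[i] = NULL;
}

/* Hand out the lowest free ID starting at 1; return -1 when the table is full. */
static int toy_idr_alloc(struct toy_idr *idr, void *obj)
{
    int id = -1;

    assert(obj != NULL);
    pthread_mutex_lock(&idr->lock);
    for (int i = 1; i < TOY_MAX_IDS; i++) {
        if (idr->slots[i] == NULL) {
            idr->slots[i] = obj;
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&idr->lock);
    return id;
}

/*
 * Release an ID and return the object that owned it. Taking the same lock
 * as toy_idr_alloc() is the point of the sketch: removal is a modification
 * too, so it must not run concurrently with other modifications.
 */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
    void *old;

    assert(id > 0 && id < TOY_MAX_IDS);
    pthread_mutex_lock(&idr->lock);
    old = idr->slots[id];
    idr->slots[id] = NULL;
    pthread_mutex_unlock(&idr->lock);
    return old;
}

/* Look up the object that currently owns an ID, or NULL if the ID is free. */
static void *toy_idr_find(struct toy_idr *idr, int id)
{
    void *obj;

    assert(id > 0 && id < TOY_MAX_IDS);
    pthread_mutex_lock(&idr->lock);
    obj = idr->slots[id];
    pthread_mutex_unlock(&idr->lock);
    return obj;
}
```

In this toy version, dropping the lock from the allocation path would let two concurrent callers claim the same free slot, which is the userspace counterpart of two memcgs ending up with the same ID. The real IDR is a tree rather than an array, so unsynchronized removals can corrupt it in ways this sketch does not model.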