From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BD20C7EE23 for ; Wed, 24 May 2023 04:22:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D51336B0074; Wed, 24 May 2023 00:22:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D0111900003; Wed, 24 May 2023 00:22:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC892900002; Wed, 24 May 2023 00:22:14 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id AA9926B0074 for ; Wed, 24 May 2023 00:22:14 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 7BCA6120724 for ; Wed, 24 May 2023 04:22:14 +0000 (UTC) X-FDA: 80823851388.08.74A94DF Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf01.hostedemail.com (Postfix) with ESMTP id BF65640013 for ; Wed, 24 May 2023 04:22:11 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=cYNb0sAi; spf=pass (imf01.hostedemail.com: domain of ming.lei@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=ming.lei@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684902132; a=rsa-sha256; cv=none; b=55VuvVgjWgU6a/a6xIPjWQ2iahyKx/X7sXAvo2Xb0eTAW1/9I6loE34Q7gsv/axpIWK+Uw DNEGYUAjjT2K0V1LZbA6k3rD1yOWVE/rQL+H1l5QSXpgKXxWzphwYDb1wZ1hIq1Vzkq3Ox v6hSESTw3KeOtVmdlOpK0XQSUQLk0Wc= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=cYNb0sAi; spf=pass (imf01.hostedemail.com: domain of ming.lei@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=ming.lei@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684902132; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8RThbuZCrca9ja7PWjTCEifSQrfusG35C3xzRTql4Rg=; b=Yzr1FTi8h5WhjhoRpGUeG2HVTtcy952dSxxEUTA41t2bXeMmscf6m/AxOxtQz/GMoHO7f3 YS0xsmPOEO2fkrA5Q2HeiO3kpeeBXfopLzxGin+QPCkENabpAufSvFuoT94UmLC0oDBE67 VgtjOoyupGERCNOD/QlOnMrs0eYQB3s= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684902131; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8RThbuZCrca9ja7PWjTCEifSQrfusG35C3xzRTql4Rg=; b=cYNb0sAiu/W6R/ezsJOfoKzDhh0X6MACnGi3jNBPawLquYmEKzuQglKuJLI1ipnDP//yu2 vQ6jZoIftNOKDum5JTEkAvlHuYxy+YHW3jg+L1SSEQUGTx4+yQ5vum5yp1Z6m4ByYdqlaa e3AosYOaBk21sqAe9VIJITh787uhq7g= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-659-F5la8j4wM1GiyPo0V-Mo0A-1; Wed, 24 May 2023 00:22:07 -0400 X-MC-Unique: F5la8j4wM1GiyPo0V-Mo0A-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E5A52811E85; Wed, 24 May 2023 04:22:06 +0000 (UTC) Received: from ovpn-8-17.pek2.redhat.com (ovpn-8-17.pek2.redhat.com [10.72.8.17]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 8B0532166B25; Wed, 24 May 2023 04:21:59 +0000 (UTC) Date: Wed, 24 May 2023 12:21:53 +0800 From: Ming Lei To: Waiman Long Cc: Yosry Ahmed , Linux-MM , Michal Hocko , Shakeel Butt , Johannes Weiner , Roman Gushchin , Muchun Song , Jens Axboe , linux-block@vger.kernel.org, cgroups@vger.kernel.org, Tejun Heo , mkoutny@suse.com, ming.lei@redhat.com Subject: Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq Message-ID: References: <20230524011935.719659-1-ming.lei@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: BF65640013 X-Stat-Signature: jr5kjc719ah9dno4yx6suj5dzn9kfzqn X-Rspam-User: X-HE-Tag: 1684902131-81488 X-HE-Meta: U2FsdGVkX19uKp2E8TaQzq9L1v15Hej+W/7DOvIEMG6F2uGRVcGjH31FwXhKMm3QFxPTMp2XZ+/ZnxnWKgohEXS/yGp5f7/0N7oPs9Pntp8kBrD6+8JTp24oIxnAPBZCPWdKhGLH3YJn30+AYBREzkeiumU2eZPPRW6YeY34l9mZ3lmn9gwTR/cxk5mEV9qP30o/hdvDscNJh8gfFrJelVCHq9ru428hoVoQHC6cnitBC0KeTZ5BMoJCTOoG8cGtbi+c3fKSy6CM4C0c46vLmBxEbpwJIE8KAUlOX0ZerW0cTWuzOELp5TQKyWZX2HA6VbiIIfEpp5lSPGE41d2IxkUYxvCodqiGkGc9Sjfvx6Q6A9yc+h6Q25faQ+5fXhvbzknxYqyShXYYOVHssN8h7I44bMF/KZ89ITVy5k/TVPe2cFhoVpuk0Bu/nt6JVNbevaSjNrXxoMdzUmDAhGsE8AszlYVakDXChdA0ktSqed9QWXVM+9AdanlUaXKghlHIUSjKTQJOrJCD9CdowbLpMTW6X6zOy/jX5OOFrfFf80cUrMrKAdJ3Qug93zS1sLDSGv740OYxdnQ+4hzryBvPZxCi4xNWS9Mm0OTOIesbsIULiMJ4EM9CsACfbPX0LRqnrTJSqgLUlPbYUD7vRpShryf7O6uGu4RIVHWWosfgIb7rYhZSacTyVZx0i8N301oRcpPiNaADKCSsFto1aqMJc41iEgz1R6tMDm9vzV5LlcayMtP8Sxrl3TMWffRwWZpvvtIUJiX9MfHvNzPpDMAAlrQ8xqu371oM2TuY+4jMmNPAOB+os2zUr5Gp7tFFNIoPv5eovK61VEZpfxG0rntVx3f33oBJLSy1cQyyBZu+ScvHkb/bWNUP7YCmeWB7eCGf3Mc0ote9SXto5Ebp0fMz1/W4cHwKkIVROLAMorbQdk8jZLOYm2s0mUe9F7k6cJHshcH2xqSrWnocGhae2A0 FkjHoEj1 e7mkkwkWkZiowmSw1HFLoJAVcZrvoRQBSqXCqA5ZQZHdMyWib47481bLAGTeGzoXrHqTob06Qy55AP228oA9RqRLVm43SnB8jQB3mtw50ZyHBrSXV6S6oR97c7mHF4xbXerXBjAZ19UBKxCObuAitXQTuDSNoCl6oub/oEnWvL0wAFscRxoDRKW7BaO6/YAAWG3Pm63p+PLn/BwLh+adOrl+UfM/0+Hpz8Ble1oKzl1EaW6g= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000008, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, May 24, 2023 at 12:10:55AM -0400, Waiman Long wrote: > > On 5/23/23 22:37, Ming Lei wrote: > > Hi Yosry, > > > > On Tue, May 23, 2023 at 07:06:38PM -0700, Yosry Ahmed wrote: > > > Hi Ming, > > > > > > On Tue, May 23, 2023 at 6:21 PM Ming Lei wrote: > > > > As noted by Michal, the blkg_iostat_set's in the lockless list > > > > hold reference to blkg's to protect against their removal. Those > > > > blkg's hold reference to blkcg. When a cgroup is being destroyed, > > > > cgroup_rstat_flush() is only called at css_release_work_fn() which > > > > is called when the blkcg reference count reaches 0. This circular > > > > dependency will prevent blkcg and some blkgs from being freed after > > > > they are made offline. > > > I am not at all familiar with blkcg, but does calling > > > cgroup_rstat_flush() in offline_css() fix the problem? > > Except for offline, this list needs to be flushed after the associated disk > > is deleted. > > > > > or can items be > > > added to the lockless list(s) after the blkcg is offlined? > > Yeah. > > > > percpu_ref_*get(&blkg->refcnt) still can succeed after the percpu refcnt > > is killed in blkg_destroy() which is called from both offline css and > > removing disk. > > As suggested by Tejun, we can use percpu_ref_tryget(&blkg->refcnt) to make > sure that we can only take a reference when the blkg is online. I think it > is a bit safer to take a percpu refcnt to avoid use after free. My other blkg_release() does guarantee that no new stat associated with this blkg can be added any more, so what is the use-after-free? > concern about your patch is that the per cpu list iterations will be done > multiple times when a blkcg is destroyed if many blkgs are attached to the Yeah, but is it one big deal? The percpu list should be flushed just in one of blkg's release handler. > blkcg. I still prefer to do it once in blkcg_destroy_blkgs(). I am going to Problem is that new stat still may be added to this list after percpu_ref_kill() returns. To be honest, I really hate to grab/put blkg refcnt for adding/consuming state, and this way is too fragile. Thanks, Ming