Message-ID: <7213fc3b-27f5-373a-0786-0ca9441b9e7e@bytedance.com>
Date: Sat, 9 Apr 2022 08:40:00 +0800
Subject: Re: [PATCH] percpu_ref: call wake_up_all() after percpu_ref_put() completes
From: Qi Zheng <zhengqi.arch@bytedance.com>
To: Tejun Heo
Cc: dennis@kernel.org, cl@linux.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, zhouchengming@bytedance.com, songmuchun@bytedance.com
References: <20220407103335.36885-1-zhengqi.arch@bytedance.com>

On 2022/4/9 1:41 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 07, 2022 at 06:33:35PM +0800, Qi Zheng wrote:
>> In percpu_ref_call_confirm_rcu(), we call wake_up_all() before
>> calling percpu_ref_put(), which causes the value of the percpu_ref
>> to be unstable when percpu_ref_switch_to_atomic_sync() returns.
>>
>> CPU0                                CPU1
>>
>> percpu_ref_switch_to_atomic_sync(&ref)
>> --> percpu_ref_switch_to_atomic(&ref)
>>     --> percpu_ref_get(ref); /* put after confirmation */
>>         call_rcu(&ref->data->rcu,
>>                  percpu_ref_switch_to_atomic_rcu);
>>
>>                                     percpu_ref_switch_to_atomic_rcu
>>                                     --> percpu_ref_call_confirm_rcu
>>                                         --> data->confirm_switch = NULL;
>>                                             wake_up_all(&percpu_ref_switch_waitq);
>>
>> /* here waiting to wake up */
>> wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
>>                                     (A) percpu_ref_put(ref);
>> /* The value of &ref is unstable! */
>> percpu_ref_is_zero(&ref)
>>                                     (B) percpu_ref_put(ref);
>>
>> As shown above, assuming that the counts on each cpu add up to 0 before
>> calling percpu_ref_switch_to_atomic_sync(), we expect that after switching
>> to atomic mode, percpu_ref_is_zero() can return true. But it actually
>> returns different values in cases A and B, which is not what we expected.
>>
>> Maybe the original purpose of percpu_ref_switch_to_atomic_sync() is
>> just to ensure that the conversion to atomic mode is completed, but it
>> should not return with an extra reference count held.
>>
>> Calling wake_up_all() after percpu_ref_put() ensures that the value of
>> the percpu_ref is stable after percpu_ref_switch_to_atomic_sync()
>> returns. So just do it.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   lib/percpu-refcount.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
>> index af9302141bcf..b11b4152c8cd 100644
>> --- a/lib/percpu-refcount.c
>> +++ b/lib/percpu-refcount.c
>> @@ -154,13 +154,14 @@ static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
>>
>>  	data->confirm_switch(ref);
>>  	data->confirm_switch = NULL;
>> -	wake_up_all(&percpu_ref_switch_waitq);
>>
>>  	if (!data->allow_reinit)
>>  		__percpu_ref_exit(ref);
>>
>>  	/* drop ref from percpu_ref_switch_to_atomic() */
>>  	percpu_ref_put(ref);
>> +
>> +	wake_up_all(&percpu_ref_switch_waitq);
>
> The interface, at least originally, doesn't give any guarantee over whether
> there's gonna be a residual reference on it or not.
> There's nothing
> necessarily wrong with guaranteeing that but it's rather unusual and given
> that putting the base ref in a percpu_ref is a special "kill" operation and
> a ref in percpu mode always returns %false on is_zero(), I'm not quite sure
> how such semantics would be useful. Do you care to explain the use case with
> concrete examples?

There are currently two users of percpu_ref_switch_to_atomic_sync(), and
both serve as examples: one is mddev->writes_pending in drivers/md/md.c,
and the other is q->q_usage_counter in block/blk-pm.c. The former discards
the initial reference count after percpu_ref_init(), and the latter kills
the initial reference count (by calling percpu_ref_kill() in
blk_freeze_queue_start()) before percpu_ref_switch_to_atomic_sync().
Both seem to expect the percpu_ref to be stable once
percpu_ref_switch_to_atomic_sync() returns.
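To make that expectation concrete, here is a minimal caller-side sketch
(the demo_* names are hypothetical and not taken from either in-tree
user):

#include <linux/percpu-refcount.h>

/* Hypothetical release callback; each real user supplies its own. */
static void demo_release(struct percpu_ref *ref)
{
}

/* One-time setup: the base (initial) reference comes from percpu_ref_init(). */
static int demo_init(struct percpu_ref *ref)
{
	return percpu_ref_init(ref, demo_release, 0, GFP_KERNEL);
}

/*
 * The pattern both users follow: kill (or drop) the base reference,
 * switch to atomic mode synchronously, then check whether any user
 * references remain.
 */
static bool demo_is_idle(struct percpu_ref *ref)
{
	percpu_ref_kill(ref);			/* put the base reference */
	percpu_ref_switch_to_atomic_sync(ref);	/* wait for atomic mode */

	/*
	 * With the current code this read is unstable: the RCU callback
	 * may still hold the reference taken in percpu_ref_switch_to_atomic(),
	 * so it can transiently report non-zero even when no user
	 * references are left.
	 */
	return percpu_ref_is_zero(ref);
}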
> Also, the proposed patch is racy. There's nothing preventing
> percpu_ref_switch_to_atomic_sync() from waking up early between
> confirm_switch clearing and the wake_up_all, so the above change doesn't
> guarantee what it tries to guarantee. For that, you'd have to move
> confirm_switch clearing *after* percpu_ref_put() but then, you'd be
> accessing the ref after its final ref is put which can lead to
> use-after-free.

Oh sorry, that was an oversight on my part.

> In fact, the whole premise seems wrong. The switching needs a reference to
> the percpu_ref because it is accessing it asynchronously. The switching side
> doesn't know when the ref is gonna go away once it puts its reference and
> thus can't signal that they're done after putting their reference.
>
> We *can* make that work by putting the whole thing in its own critical
> section so that we can make confirm_switch clearing atomic with the possibly
> final put, but that's gonna add some complexity and begs the question why
> we'd need such a thing.

How about moving the last percpu_ref_put() outside of
percpu_ref_switch_to_atomic_rcu() in sync mode, as in the diff below?
But this may not be elegant.

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d73a1c08c3e3..07f92e7e3e19 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -98,6 +98,7 @@ struct percpu_ref_data {
 	percpu_ref_func_t *confirm_switch;
 	bool force_atomic:1;
 	bool allow_reinit:1;
+	bool sync;
 	struct rcu_head rcu;
 	struct percpu_ref *ref;
 };
@@ -123,7 +124,8 @@ int __must_check percpu_ref_init(struct percpu_ref *ref,
 				 gfp_t gfp);
 void percpu_ref_exit(struct percpu_ref *ref);
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
-				 percpu_ref_func_t *confirm_switch);
+				 percpu_ref_func_t *confirm_switch,
+				 bool sync);
 void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref);
 void percpu_ref_switch_to_percpu(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index af9302141bcf..2a9d777bcf35 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -99,6 +99,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release,
 	data->release = release;
 	data->confirm_switch = NULL;
 	data->ref = ref;
+	data->sync = false;
 	ref->data = data;
 	return 0;
 }
@@ -146,21 +147,30 @@ void percpu_ref_exit(struct percpu_ref *ref)
 }
 EXPORT_SYMBOL_GPL(percpu_ref_exit);
 
+static inline void percpu_ref_switch_to_atomic_post(struct percpu_ref *ref)
+{
+	struct percpu_ref_data *data = ref->data;
+
+	if (!data->allow_reinit)
+		__percpu_ref_exit(ref);
+
+	/* drop ref from percpu_ref_switch_to_atomic() */
+	percpu_ref_put(ref);
+}
+
 static void percpu_ref_call_confirm_rcu(struct rcu_head *rcu)
 {
 	struct percpu_ref_data *data = container_of(rcu,
 			struct percpu_ref_data, rcu);
 	struct percpu_ref *ref = data->ref;
+	bool need_put = !data->sync;
 
 	data->confirm_switch(ref);
 	data->confirm_switch = NULL;
 	wake_up_all(&percpu_ref_switch_waitq);
 
-	if (!data->allow_reinit)
-		__percpu_ref_exit(ref);
-
-	/* drop ref from percpu_ref_switch_to_atomic() */
-	percpu_ref_put(ref);
+	if (need_put)
+		percpu_ref_switch_to_atomic_post(ref);
 }
 
 static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu)
@@ -302,12 +312,14 @@ static void __percpu_ref_switch_mode(struct percpu_ref *ref,
  * switching to atomic mode, this function can be called from any context.
  */
 void percpu_ref_switch_to_atomic(struct percpu_ref *ref,
-				 percpu_ref_func_t *confirm_switch)
+				 percpu_ref_func_t *confirm_switch,
+				 bool sync)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
 
+	ref->data->sync = sync;
 	ref->data->force_atomic = true;
 	__percpu_ref_switch_mode(ref, confirm_switch);
 
@@ -325,8 +337,9 @@ EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic);
  */
 void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref)
 {
-	percpu_ref_switch_to_atomic(ref, NULL);
+	percpu_ref_switch_to_atomic(ref, NULL, true);
 	wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
+	percpu_ref_switch_to_atomic_post(ref);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_switch_to_atomic_sync);
 

> Andrew, I don't think the patch as proposed makes much sense. Maybe it'd be
> better to keep it out of the tree for the time being?
>
> Thanks.

--
Thanks,
Qi