From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25806C3A5A7 for ; Thu, 8 Dec 2022 08:55:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 939598E0003; Thu, 8 Dec 2022 03:55:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8EAA68E0001; Thu, 8 Dec 2022 03:55:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7B3368E0003; Thu, 8 Dec 2022 03:55:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 6C2C88E0001 for ; Thu, 8 Dec 2022 03:55:43 -0500 (EST) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 2E4EAA0596 for ; Thu, 8 Dec 2022 08:55:43 +0000 (UTC) X-FDA: 80218530966.13.989C15B Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf26.hostedemail.com (Postfix) with ESMTP id 6847C140006 for ; Thu, 8 Dec 2022 08:55:41 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=NwDV3vMQ; spf=pass (imf26.hostedemail.com: domain of minlei@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=minlei@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1670489741; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=94o1vDMU6dLWOzRVJEl23OAXi5JeqCSUmbJc/8D9OYk=; b=aqqni1MacPz3OM43T3EPG4WD4BBhq2vc87eZr25eoE+nkPo0E/t+skHLCh8OIBzMLAtrfD 86x51rT4gLq+q/a90ilE67UTlsloqvgG1Fl3E0s3D2x+6rOwMfMKdf3KpSym1NpTBt6OeQ w7qOo3NT70tkxsy/nvwMrAdNoNCmYKg= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=NwDV3vMQ; spf=pass (imf26.hostedemail.com: domain of minlei@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=minlei@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1670489741; a=rsa-sha256; cv=none; b=uYB3U+kQFtG7fEXzCs5+hr+ljbthKxzN/ikcpI+O+iuWJB5KkL8Hv8ZCXCK8Ay87bJq4Mp bRrzTrtAJqSv6YUsPy2l0a7coVO+AVzd9OtH+ImyXuvz04Y+IwvG5K+B7EoKSqr24cJ3Hw Tw3RPrDqMn7LJFF55J6BM4H3nyl2eew= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670489740; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=94o1vDMU6dLWOzRVJEl23OAXi5JeqCSUmbJc/8D9OYk=; b=NwDV3vMQK95hLzFa0UtwVveOD9uL+jhxZD2iIZ3ye04kCloyOeGJrnV9y12alkpPSRqsZh YBqSCoptnyJnLuVSmC+CwM93Bks4eh+PYGMN/wewoF/AG6jQ08m2HeDjRiaXRgFmKBJ73Y REVDS791cO9CvyJvksEuG4b+QmedEks= Received: from mail-ua1-f70.google.com (mail-ua1-f70.google.com [209.85.222.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-490-z2Ce-QtZNHKEK2dDq0B5ew-1; Thu, 08 Dec 2022 03:55:38 -0500 X-MC-Unique: z2Ce-QtZNHKEK2dDq0B5ew-1 Received: by mail-ua1-f70.google.com with SMTP id h14-20020a9f300e000000b0041878808d9dso421546uab.2 for ; Thu, 08 Dec 2022 00:55:38 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=94o1vDMU6dLWOzRVJEl23OAXi5JeqCSUmbJc/8D9OYk=; b=lfi798FHoxDLMSwteOgdHeEE/G2f1+dJDsWezP+4kCabYJEMW4ddc6W/3ZWjx29B6X qno2Ep7GzJshfNLNG4sD1S/UFglGgjiz78Ew1Mt5h5ADomp0A87hGeZFrrRITIpDdIow FSqejl+WTZ9S3vGUn7najpWnkHoCQeLdD9w/hG0svtQZhubA+e8k16+YUl6tUXZ5XrRU kb52hf5XvEXR0owcFMBpejW8R9C/Clg197CbC+Ky5Oe975QQHdROAkd0awjOA+GyumxC H7cI5ojz8IqxoBh2tnSiDwHhyaL/WoqvgXq2ybz9xldRmn8HVXEJeG+d5R6R38VqWqDJ gX/w== X-Gm-Message-State: ANoB5pmV2IEZIaMbImoUuIecCFgYKj6DmBmEf/TwxpVvmJvtbNDovICC w9QvjcqYsHOPR1IJX7rPwyqrI5PNZSb3ulbuIZlUCmT6crdshg0CU1BXgxP9TzaW5tiRhLpXuqX 4SN2pk6dLhsNW6uy7DP5mjz327+Y= X-Received: by 2002:a05:6102:a4c:b0:3aa:2125:2573 with SMTP id i12-20020a0561020a4c00b003aa21252573mr41688491vss.59.1670489737621; Thu, 08 Dec 2022 00:55:37 -0800 (PST) X-Google-Smtp-Source: AA0mqf5RYrccsx+1CD8iQ1H386933+kgTEz7P8+0xUUtameokiOyPtVElOj0qR1dboms72q9sne52T2ahn4Gs+1Ikl0= X-Received: by 2002:a05:6102:a4c:b0:3aa:2125:2573 with SMTP id i12-20020a0561020a4c00b003aa21252573mr41688483vss.59.1670489737380; Thu, 08 Dec 2022 00:55:37 -0800 (PST) MIME-Version: 1.0 References: <20221206090939.871239-1-zhongjinghua@huawei.com> In-Reply-To: From: Ming Lei Date: Thu, 8 Dec 2022 16:55:26 +0800 Message-ID: Subject: Re: [PATCH-next] block: fix null-deref in percpu_ref_put To: Dennis Zhou Cc: Zhong Jinghua , tj@kernel.org, cl@linux.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, yi.zhang@huawei.com, yukuai3@huawei.com, Ming Lei X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 6847C140006 X-Stat-Signature: poduqiiue8ffxg6u75eeokccbb456txa X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1670489741-742990 X-HE-Meta: U2FsdGVkX18MqXDFyc9LyonhvuajIe60X1Py3VsPvV3fKlEeK1DhsilBGPoiNP4/Yc10auKkLfJatODLyoTAk8xcV4rihrssJr3bREXH3YF3LkFwaqKFrLx7nncfjogPqNtrOMPdOQBJFJl+O8SsIUg0RjB+5e1147t0M3HTPXd2akUCQR9hYpq4n7chg3HxnGLhXqG+H8/Jw9aktezaBxcZrQ6NLQHvnx9ICtT3P1pwjKKG1or3I2v0kEhFFsMgwOogzAhxCQiwUKvDzMFzoHnhj8ixwmQMtwhBwwXbv6139up8xFyzDdXzbo2IoR/RmtGg+pabmishoXA44cZvMy9GNnkrbSDltFw/RVX22HsWDuK3HN9atlEpLeoPJLQ0j8x223NUENrRV3dw/fPLFcILZi/t2uXowxjhv/0fUoSra2V6HrZVT2GcCXy/AUjc13oH6LS8STWLU5KJj3elTldc4855HZvF9VH5RsZ2SW1i3jcCWosAMjkYmwOuix5D1dZ9yhS3Ohr2BhL6A5yV2xKx8BVRAbnPE/ksHB3KrtxtKhqyuaFazNqboNgMzE60Qkt7F3wXK2UFP9gKwBCfXmu94euJRICgZAzA9GkIeaLzRlLx8AlAhVOjVQFvwHzFy/ZZEjyDxi317+blVrUm1ErYSxthPYn4X8ZfyleASKEM2NfN17B615Jlqzni/41B5iM7c/kV6o3ldRbBHltLyoeFxtR54qUkGRsiYRSJeQWLiquVeEebWJHIcVYHP4KzyAIdPyVBnb2S65YTn5dJRemfnssxZCXQN5noBVeybY/IC9ExW+Hf5XQg0Is6yBHEWOhRN8j34WlO1ZCN+heKpvcBRV/7xjRwjn3XkT48UOZi4Vs1j9qSuvpxlyBB7NLZK6V7pihtpdzJigsU2Gx2nDcT9Pci576GTFQCSxw3sE1KNGaknT3cwNmFxkKPrMZPWVppYf0ijPXOAGlQqEF PCoGsRDl /FPBH6hyjc6T2ZfWHp/aWxE39xA8S9G3S0G69oPhrEVEHBPRSUUSSRc1bX277BIwgH6T85cJhlxtnHUTUZE22VePg+Zpu2g/7WCg0nBvPLoNPLeXDg8D69jN3+Q9b9+fjBkDNmrRCxyLt+cGT5KeJK7oslDe1Ep+QBjp5OdQ9et02MbctU8wm7FgQ40Ru3aX99XD/OM6kwLGHhbs= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000018, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Dec 7, 2022 at 9:08 AM Dennis Zhou wrote: > > Hello, > > On Tue, Dec 06, 2022 at 05:09:39PM +0800, Zhong Jinghua wrote: > > A problem was find in stable 5.10 and the root cause of it like below. > > > > In the use of q_usage_counter of request_queue, blk_cleanup_queue using > > "wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))" > > to wait q_usage_counter becoming zero. however, if the q_usage_counter > > becoming zero quickly, and percpu_ref_exit will execute and ref->data > > will be freed, maybe another process will cause a null-defef problem > > like below: > > > > CPU0 CPU1 > > blk_mq_destroy_queue > > blk_freeze_queue > > blk_mq_freeze_queue_wait > > scsi_end_request > > percpu_ref_get > > ... > > percpu_ref_put > > atomic_long_sub_and_test > > blk_put_queue > > kobject_put > > kref_put > > blk_release_queue > > percpu_ref_exit > > ref->data -> NULL > > ref->data->release(ref) -> null-deref > > > > I remember thinking about this a while ago. I don't think this fix works > as nicely as it may seem. Please correct me if I'm wrong. > > q->q_usage_counter has the oddity that the lifetime of the percpu_ref > object isn't managed by the release function. The freeing is handled by > a separate path where it depends on the percpu_ref hitting 0. So here we > have 2 concurrent paths racing to run with 1 destroying the object. We > probably need blk_release_queue() to wait on percpu_ref's release > finishing, not starting. > > I think the above works in this specific case because there is a > call_rcu() in blk_release_queue(). If there wasn't a call_rcu(), > then by the same logic we could delay ref->data->release(ref) further > and that could potentially lead to a use after free. > > Ideally, I think fixing the race in q->q_usage_counter's pattern is > better than masking it here as I think we're being saved by the > call_rcu() call further down the object release path. The problem is actually in percpu_ref_is_zero(), which can return true before ->release() is called. And any pattern of wait_event(percpu_ref_is_zero) may imply such risk. It may be not easy to fix the issue in block layer cleanly, but can be solved in percpu-refcount simply by adding ->release_lock(spin lock) in the counter for draining atomic_long_sub_and_test() & ->release() in percpu_ref_exit(). Or simply use percpu_ref_switch_lock. Thanks, Ming