Date: Tue, 22 Aug 2023 13:08:52 +0100
From: Matthew Wilcox <willy@infradead.org>
To: Tong Tiangen
Cc: Andrew Morton, Naoya Horiguchi, Miaohe Lin, wangkefeng.wang@huawei.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: memory-failure: use rcu lock instead of
 tasklist_lock when collect_procs()
References: <20230821091312.2034844-1-tongtiangen@huawei.com>
 <0bbbb7d8-699b-30ac-9657-840112c41a78@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To: <0bbbb7d8-699b-30ac-9657-840112c41a78@huawei.com>

On Tue, Aug 22, 2023 at 11:41:41AM +0800, Tong Tiangen wrote:
> On 2023/8/22 2:33, Matthew Wilcox wrote:
> > On Mon, Aug 21, 2023 at 05:13:12PM +0800, Tong Tiangen wrote:
> > > We can see that CPU1 is waiting for CPU0 to respond to the IPI,
> > > CPU0 is waiting for CPU2 to unlock tasklist_lock, and CPU2 is
> > > waiting for CPU1 to unlock page->ptl.  As a result, a softlockup
> > > is triggered.
> > >
> > > collect_procs_anon() does not modify the tasklist; it only
> > > performs a read traversal.  Therefore, we can use the RCU read
> > > lock instead of the tasklist_lock spinlock, and thereby break
> > > the softlockup chain above.
> >
> > The only thing that's giving me pause is that there's no discussion
> > about why this is safe.  "We're not modifying it" isn't really enough
> > to justify going from read_lock() to rcu_read_lock().  When you take a
> > normal read_lock(), writers are not permitted and so you see an atomic
> > snapshot of the list.  With rcu_read_lock() you can see inconsistencies.
>
> Hi Matthew:
>
> When rcu_read_lock() is used, the task list can be modified during the
> iteration, but those modifications cannot be seen by the iteration.
> Under the RCU mechanism, the task list is only updated after the
> iteration is complete.  Therefore, the task list used by the iteration
> can also be considered a snapshot.

No, that's not true!  You are not iterating a snapshot of the list,
you're iterating the live list.  It will change under you.  RCU provides
you with some guarantees about that list.  See Documentation/RCU/listRCU.rst

> > For example, new tasks can be added to the tasklist, and they may not
> > be seen by an iteration.  Is this OK?
>
> A newly added task does not access the hwpoisoned page, because that
> page has already been isolated from the buddy allocator
> (memory_failure() -> take_page_off_buddy()).  Therefore, it is safe
> for a newly added task not to be seen by the iteration.
>
> > Tasks may be removed from the tasklist after they have been seen by
> > the iteration.  Is this OK?
>
> A task seen during the iteration may be deleted from the task list
> after the iteration, but its task_struct is not released, because a
> reference is taken in __add_to_kill().  Therefore, the subsequent
> processing in kill_procs() (sending signals to tasks already deleted
> from the task list) is not affected, so I think that is safe too.
I don't know this code, but it seems unsafe to me.  Look:

collect_procs_anon():
	for_each_process(tsk) {
		struct task_struct *t = task_early_kill(tsk, force_early);
		add_to_kill_anon_file(t, page, vma, to_kill);
	}

add_to_kill_anon_file():
	__add_to_kill(tsk, p, vma, to_kill, 0, FSDAX_INVALID_PGOFF);

__add_to_kill():
	get_task_struct(tsk);

static inline struct task_struct *get_task_struct(struct task_struct *t)
{
	refcount_inc(&t->usage);
	return t;
}

/**
 * refcount_inc - increment a refcount
 * @r: the refcount to increment
 *
 * Similar to atomic_inc(), but will saturate at REFCOUNT_SATURATED and WARN.
 *
 * Provides no memory ordering, it is assumed the caller already has a
 * reference on the object.
 *
 * Will WARN if the refcount is 0, as this represents a possible
 * use-after-free condition.
 */

I don't see anything that prevents that refcount_inc() from seeing a
zero refcount.  Usually that would be prevented by tasklist_lock, right?

Andrew, I think this patch is bad and needs to be dropped.