Date: Mon, 20 Jan 2025 10:39:02 +0800
From: Baoquan He
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 07/13] mm, swap: hold a reference during scan and cleanup flag usage
References: <20241230174621.61185-1-ryncsn@gmail.com> <20241230174621.61185-8-ryncsn@gmail.com>

On 01/13/25 at 01:34pm, Kairui Song wrote:
> On Sat, Jan 4, 2025 at 1:46 PM Baoquan He wrote:
> >
> > On 12/31/24 at 01:46am, Kairui Song wrote:
> > > From: Kairui Song
> > >
> > > The flag SWP_SCANNING was used as an indicator of whether a device
> > > is being scanned for allocation, and prevents swapoff.
> > > Combined with SWP_WRITEOK, they work as a set of barriers for a
> > > clean swapoff:
> > >
> > > 1. Swapoff clears SWP_WRITEOK, allocation requests will see
> > >    ~SWP_WRITEOK and abort as it's serialized by si->lock.
> > > 2. Swapoff unuses all allocated entries.
> > > 3. Swapoff waits for SWP_SCANNING flag to be cleared, so ongoing
> > >    allocations will stop, preventing UAF.
> > > 4. Now swapoff can free everything safely.
> > >
> > > This will make the allocation path have a hard dependency on
> > > si->lock. Allocation always have to acquire si->lock first for
> > > setting SWP_SCANNING and checking SWP_WRITEOK.
> > >
> > > This commit removes this flag, and just uses the existing per-CPU
> > > refcount instead to prevent UAF in step 3, which serves well for
> > > such usage without dependency on si->lock, and scales very well too.
> > > Just hold a reference during the whole scan and allocation process.
> > > Swapoff will kill and wait for the counter.
> > >
> > > And for preventing any allocation from happening after step 1 so the
> > > unuse in step 2 can ensure all slots are free, swapoff will acquire
> > > the ci->lock of each cluster one by one to ensure all allocations
> > > see ~SWP_WRITEOK and abort.
> >
> > Changing to use si->users is great, while wondering why we need acquire
> > each ci->lock now. After setup 1, we have cleared SWP_WRITEOK, and take
> > the si off swap_avail_heads list. No matter what, we just need wait for
> > p->comm's completion and continue, why bothering to loop for the
> > ci->lock acquiring?
> >
>
> Hi Baoquan,
>
> Waiting for p->comm's completion must be done after unuse is called
> (unuse will need to take the si->users refcound, so it can't be dead
> yet), but unuse must be called after no one will allocate any new
> entry. That is guaranteed by the loop ci->lock acquiring.

Sorry for late response, Kairui.
I went through the code flow of swap allocation several times, but I
still don't see why the loop acquiring each ci->lock is needed here.

Once si->flags &= ~SWP_WRITEOK is executed in del_from_avail_list()
during swapoff, any allocation that is still in progress will fail in
cluster_alloc_range() at the 'if (!(si->flags & SWP_WRITEOK))' check.
That allocation request then fails and returns, which means no new swap
entry or slot will be allocated, so unuse won't be affected at all. In
that case, why do we care about it?

Please forgive my stupidity, could you elaborate on the case in which
an allocation can still be ongoing while its swap device is being
swapped off? Could you give an example of the concurrent execution
flows?

Thanks
Baoquan
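
For reference, the two flows discussed above can be written down as a
small single-threaded C model. This is only a sketch under stated
assumptions, not kernel code: struct swap_dev, struct cluster,
alloc_begin(), alloc_finish(), swapoff_clear_writeok(),
cluster_lock_is_free() and run_model() are all hypothetical names, the
SWP_WRITEOK value is arbitrary, and a plain bool stands in for ci->lock.
It assumes the flag is checked while ci->lock is held but cleared
without taking that lock, which is how the quoted commit message reads.

```c
#include <assert.h>
#include <stdbool.h>

#define SWP_WRITEOK 0x2u /* name from the thread; value is arbitrary here */

/* Hypothetical, simplified stand-ins for one cluster and its device. */
struct cluster {
	bool locked; /* models ci->lock being held */
	int used;    /* number of slots handed out */
};

struct swap_dev {
	unsigned int flags;
	struct cluster ci;
};

/* Allocation, phase 1: take the cluster lock, then check SWP_WRITEOK. */
static bool alloc_begin(struct swap_dev *si)
{
	assert(!si->ci.locked);
	si->ci.locked = true;
	if (!(si->flags & SWP_WRITEOK)) {
		si->ci.locked = false; /* abort: device is going away */
		return false;
	}
	return true; /* check passed, lock is still held */
}

/* Allocation, phase 2: hand out a slot and drop the cluster lock. */
static void alloc_finish(struct swap_dev *si)
{
	si->ci.used++;
	si->ci.locked = false;
}

/* Swapoff step 1 in this model: clear the flag without the cluster lock. */
static void swapoff_clear_writeok(struct swap_dev *si)
{
	si->flags &= ~SWP_WRITEOK;
}

/* A lock-then-unlock pass can only get through once no allocator holds it. */
static bool cluster_lock_is_free(const struct swap_dev *si)
{
	return !si->ci.locked;
}

struct outcome {
	bool a_passed_check;      /* allocator A checked before the clear */
	bool lock_free_during_a;  /* could swapoff take the lock mid-allocation? */
	int used_after_a;         /* slots allocated after the flag was cleared */
	bool lock_free_after_a;   /* could swapoff take the lock afterwards? */
	bool b_passed_check;      /* allocator B checked after the clear */
};

/* Replays one fixed interleaving of allocator A, swapoff, and allocator B. */
struct outcome run_model(void)
{
	struct swap_dev si = { .flags = SWP_WRITEOK, .ci = { false, 0 } };
	struct outcome o;

	o.a_passed_check = alloc_begin(&si);         /* A: lock + check, passes */
	swapoff_clear_writeok(&si);                  /* swapoff: clears the flag */
	o.lock_free_during_a = cluster_lock_is_free(&si);
	alloc_finish(&si);                           /* A: still allocates a slot */
	o.used_after_a = si.ci.used;
	o.lock_free_after_a = cluster_lock_is_free(&si);
	o.b_passed_check = alloc_begin(&si);         /* B: sees the flag cleared */
	return o;
}
```

In this model, allocator B behaves as described in the mail above: it
sees the cleared flag and aborts. Allocator A is the case that the
per-cluster lock pass would cover: it passed the check before the flag
was cleared and still hands out a slot afterwards, and the lock cannot
be taken until A has finished. Whether the real allocation path permits
this interleaving is exactly the open question in this thread.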