From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Date: Thu, 18 Jul 2024 13:16:35 +0200
Subject: Re: [PATCH RFC] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
To: Li Zhijian, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Yasunori Gotou, David Hildenbrand, Yao Xingtao, Lucas Stach
Message-ID: <6f2151a9-da23-4917-b985-8de6b0852e37@kernel.org>
In-Reply-To: <20240716073929.843277-1-lizhijian@fujitsu.com>
References: <20240716073929.843277-1-lizhijian@fujitsu.com>
Content-Type: text/plain; charset=UTF-8
On 7/16/24 9:39 AM, Li Zhijian wrote:
> It's expected that no page should be left in pcp_list after calling
> zone_pcp_disable() in offline_pages(). Previously, it's observed that
> offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
>
> Cause:
> There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
> involving the pcp->count variable. See below scenario:
>
>          CPU0                              CPU1
>     ----------------                    ---------------
>                                       spin_lock(&pcp->lock);
>                                       __rmqueue_pcplist() {
> zone_pcp_disable() {
>                                         /* list is empty */
>                                         if (list_empty(list)) {
>                                           /* add pages to pcp_list */
>                                           alloced = rmqueue_bulk()
>   mutex_lock(&pcp_batch_high_lock)
>   ...
>   __drain_all_pages() {
>     drain_pages_zone() {
>       /* read pcp->count, it's 0 here */
>       count = READ_ONCE(pcp->count)
>       /* 0 means nothing to drain */
>                                         /* update pcp->count */
>                                         pcp->count += alloced << order;
>                                         ...
>                                         ...
>                                       spin_unlock(&pcp->lock);
>
> In this case, after calling zone_pcp_disable() though, there are still some
> pages in pcp_list. And these pages in pcp_list are neither movable nor
> isolated, offline_pages() gets stuck as a result.
>
> Solution:
> Expand the scope of the pcp->lock to also protect pcp->count in
> drain_pages_zone(), ensuring no pages are left in the pcp list.
>
> [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/
>
> Cc: David Hildenbrand
> Reported-by: Yao Xingtao
> Signed-off-by: Li Zhijian
> ---
>  mm/page_alloc.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9ecf99190ea2..1780df31d5f5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2323,16 +2323,17 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
>  static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>  {
>  	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
> -	int count = READ_ONCE(pcp->count);
> +	int count;
>
> +	spin_lock(&pcp->lock);
> +	count = pcp->count;
>  	while (count) {
>  		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>  		count -= to_drain;
>
> -		spin_lock(&pcp->lock);
>  		free_pcppages_bulk(zone, to_drain, pcp, 0);
> -		spin_unlock(&pcp->lock);
>  	}
> +	spin_unlock(&pcp->lock);

This way seems to partially defeat the purpose of 55f77df7d715 ("mm:
page_alloc: control latency caused by zone PCP draining") - the zone
lock hold time is still bounded by the batch size, but the pcp lock
hold time no longer is. It should still be possible to relock between
the iterations? To prevent the race, I think the main part is
determining pcp->count under the lock; releasing and retaking the lock
between batches should still be fine if pcp->count is re-read after
relocking.

> }
>
> /*