From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9132BCFA451 for ; Thu, 24 Oct 2024 10:20:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 06BBD6B0082; Thu, 24 Oct 2024 06:20:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 01BBB6B0083; Thu, 24 Oct 2024 06:20:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DFE316B0085; Thu, 24 Oct 2024 06:20:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id C301E6B0082 for ; Thu, 24 Oct 2024 06:20:19 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 16439ACD23 for ; Thu, 24 Oct 2024 10:19:43 +0000 (UTC) X-FDA: 82708100412.24.3028055 Received: from mail-ed1-f53.google.com (mail-ed1-f53.google.com [209.85.208.53]) by imf25.hostedemail.com (Postfix) with ESMTP id 6A9D8A000A for ; Thu, 24 Oct 2024 10:20:04 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=i8DdPCxd; spf=pass (imf25.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.208.53 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729765140; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=iA8XsUv9ZiRb6klI05HB7Oq4fmXPGR/w8amvD8JEpMA=; b=5FLSI9bIiIb7mViR2LhrhUBNCkSZzwIaqmqgfwHTgP0+YRBpL86T+dlQoLOojvw3Z/O1sx 8kvoJd/oxi3X57Hqs6HKeSZEyofof+ElGbm6BGOLbMVOQJn0H5pLb3NfDxWsdeqtVrUZrt +owG0rvqJjg88MdX0icefQfaxTrTniE= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=i8DdPCxd; spf=pass (imf25.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.208.53 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729765140; a=rsa-sha256; cv=none; b=MacQXDeegfH9B0cf+Mf4zYsxyyTYhiBXHCQRIBWa41bSIJamqs5t7PoDYNOmkShMBm97tK YnbN7Axcd4d6zajOmUEkVCXBAwQu3/muM+ymeuE2juJ0q/DKFHODHB+2RHqTMNDAve8caH bUvs4zyNTmMLaPVUeVLm30kVkmmGAOM= Received: by mail-ed1-f53.google.com with SMTP id 4fb4d7f45d1cf-5c937b5169cso1185026a12.1 for ; Thu, 24 Oct 2024 03:20:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729765216; x=1730370016; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=iA8XsUv9ZiRb6klI05HB7Oq4fmXPGR/w8amvD8JEpMA=; b=i8DdPCxdZD/EHm6uUDA1i1qBWWuTsafMt0xZHW5v2G1+MM15M8yJsbwDDqFqsNAT0V kDErSoH227cSKRs7tbf25AoomXspjnPW0qiDwiMNnej7Plwlsm42Xa5U4uIipMgobLh+ gZFShd4B3EWRiZsG0ef9IBvfoPZJxS+DAWFnQfI5bLonWQRIBI/qIRmI0pbjqoMJ1i+t LE/WPlbSbVM+UXjHSmBq7iJgTU8cbJVpRa8sb53jbGLF1memvzHDeabsHlmawzbMb0Iz 1V8J7yImQMg6QNDyCiPy41YRPmnqtY5xtKQ02EuVse56zkChfb+4i1PCTXnlZ1whWIo6 i0sA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729765216; x=1730370016; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=iA8XsUv9ZiRb6klI05HB7Oq4fmXPGR/w8amvD8JEpMA=; b=L5RzOvnOboEUmTMFiPpmqyxotnYPFKJB+1X9zKlQoGtySciYq6cGMrD8uynNLkvvOb Hsdp8YH04Cs9PiMdfUKqa8coQc44EiQ0/noFUZYNOkNt2wi2/AFW2xOpByBTmYvbMZYt XyqKhpxUkSK5Pe2KHTopRs7aEzLR/iT27eV2m6SvJIYE57deQxVh/v9Ddf7KbzqxK6Dg tEyJ9rQMsaHYDPTfobDYG4PZdhLtXhUCHpfjv/m8qDMY0WliynuOpxJSM7oJ7S3A6VbQ vQdwh6pRf6XfOP+g6K9a5hzZIIAtG7/QlRcyDeLYbp8RVVCwuf88CB8kqxE4+uO7+/K9 JAOQ== X-Forwarded-Encrypted: i=1; AJvYcCVqu9N8HTqopHd12Bfq/zkoGvipXRZxpM7kZCUfWBvkG18DsfxCZR6Bt7en+PXW0RTbOSTYUYJy/A==@kvack.org X-Gm-Message-State: AOJu0YxC61WpsgG3FvXoSQ1XQxPSmS+eqeYvY3y49XbAl2HL7UE7mxtn T9X01W2BIrbh+AcPx8genyIOyfEAp8ldZsU58IRZyjiiIjqK4jap/hAEhuIY X-Google-Smtp-Source: AGHT+IGUsj/cRGtaDvz+F0J9nNmkVXRZfqrOpZd7s6gYfKwD8/kx33LqFr/jokfnX0ukDrqnm18jLg== X-Received: by 2002:a05:6402:234b:b0:5ca:18c9:f382 with SMTP id 4fb4d7f45d1cf-5cba1fdb9f2mr1766787a12.3.1729765215369; Thu, 24 Oct 2024 03:20:15 -0700 (PDT) Received: from ?IPV6:2a03:83e0:1126:4:eb:d0d0:c7fd:c82c? ([2620:10d:c092:500::6:222b]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5cb66c6b9d1sm5741285a12.70.2024.10.24.03.20.14 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 24 Oct 2024 03:20:14 -0700 (PDT) Message-ID: <82d32616-9135-4b30-9e91-b190a03bcb54@gmail.com> Date: Thu, 24 Oct 2024 11:20:14 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH hotfix 1/2] mm/thp: fix deferred split queue not partially_mapped To: Hugh Dickins , Andrew Morton Cc: Yang Shi , Wei Yang , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Johannes Weiner , Baolin Wang , Barry Song , Kefeng Wang , Ryan Roberts , Nhat Pham , Zi Yan , Chris Li , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <760237a3-69d6-9197-432d-0306d52c048a@google.com> Content-Language: en-US From: Usama Arif In-Reply-To: <760237a3-69d6-9197-432d-0306d52c048a@google.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: 6A9D8A000A X-Stat-Signature: yebgdne3coi3cbytbi5surwfhg9pzs7j X-HE-Tag: 1729765204-202196 X-HE-Meta: U2FsdGVkX1+Naqb8BvfZeq76lcZe4bSq/CitvNlpRnTYYtKTKuTp64cNRH+W06CVNMFNsZqNMXBSiVjk/gEI6s3RPGiS1I6fyy/s48va0rAnbkU8ep9XWqN2uhi1mNdI/67oKgtiShHeJEUO4g+af79s4qR0RkyKd789hSO6md6j9X8XOryLUEeBub6U4a3X59xups13Vrzpf4JgI/Y5KYQ5ULa6glC1JnKzJBfML//OIW+znp0JZrrKv+jiXjVxElR2vkcvtxdkqTdS6VoJJHZG7B99i0U/+JlkXNBIVivCWLYP2CpPxEBatj6g4lVUGtf+hFl4NTrqmhGxKYxb+76lhfjWMchSIvjzPNa9RrF8Q1jYk6LfPvmvb3+9PAgGANja9xU0qf2rq+KsKGWfwmhU/kQVywHRQXJHrnEEOoimxMiQsb5e5ULQ8JT16P8dQ9Us6RmyvFWPS2qOHIMF84FjoIjnJG4Lh3KC0mR9DZkR9vrV67wR/qLeQRqjCztNExWmfbPZyW/BZdnnR9L3agGi/+CvL8vowDBhgXEVOnVygAIjlwbYPTfSwKnIdkU/jgiz1gQDojgAjLhgWJR/IehIB07KrtH06tOhKx2fhNefHSs2272Vea9XAfKo2ej00TBSNbS3nfRLOZlFuku5+o23ODu284AK89Mf8fNuSZvbOvJibZ7c4ME8+1PbA6eADah31DbrJ5VUfAU43HZaeKyjJIaR/YIHlaukDCHyQPOFs8+8ZsYuqIg3FP7DbwYmIUB/Uz7haWCwNW3D81Tu9ZNuf5jA5ph9nGLzTrdS//uO/JSnwhpsLXtVRVGJxk0GqL90pn9UPdi8Z6lFYAjA18FQyq/WD0mPCt/p1Pd8uuuSjzAr7Zn6PMlGU0Mv/1rXUfFCLQAfqwgJS8fZQI6yydQ3dEAqlPa3S2yS+rS6sMSuTneMrudSWwDhF5h8/Pjgg3Inco1JmBKqXql0aP3 cM1mUWCr yYcUABDwOpXYPOAmpYnE8bYhqz5bWNPRutWCIkP6pVGZ1031URok2j9Pt77/MrusQlqTDgULTMyrTp8HNQ/jUY1IVk4KyEm0lh3h/RVlObHkzJ9yfWIUqrnDUsYPqMAuMS/jRFqVAgwjx+MwO3KTzIMlOpfW6TbIBaShKHrYzeNezmMD1qt4Ic62BUk4/iSvPQfhlL0F6LZIZHt9fRCwRqWhUP89NK3tcO0WqzKsISla8hruBKXbe3VHHtxaiZURoB1rfx3h+dVnGD19He0jeZFjE2d/DDDEbKBJofjFxLfGid3evCwgLf5IQEzUkMsXLTTWRMFzXOjqdsTuiHagZ+1NJU1+NPJg5xHv6uDTWe6zVeL2TsDSYFGIS5JRUMlblWQVgit1/plUmcBlCqfH0KzKVSGNi6JuAbkjrmhHp7zZV342qmhnXjWrOMDPeJdZG/MRxVZ3nCxBHQTcpo08xJCLhPyjmsDplimyIFkhCrITM+uRPsC198dBqISsy63UFx8WMYJJHNQ4fyHP9l6vu1kSOlkdXzNCWZwUO6+LONAyyG1zYTVTWe8/10QC+BPkp02Lep77acU6arMr+n1NW/HNSrONtaXV2ovoy X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 24/10/2024 05:10, Hugh Dickins wrote: > Recent changes are putting more pressure on THP deferred split queues: > under load revealing long-standing races, causing list_del corruptions, > "Bad page state"s and worse (I keep BUGs in both of those, so usually > don't get to see how badly they end up without). The relevant recent > changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, > improved swap allocation, and underused THP splitting. > > The new unlocked list_del_init() in deferred_split_scan() is buggy. > I gave bad advice, it looks plausible since that's a local on-stack > list, but the fact is that it can race with a third party freeing or > migrating the preceding folio (properly unqueueing it with refcount 0 > while holding split_queue_lock), thereby corrupting the list linkage. > > The obvious answer would be to take split_queue_lock there: but it has > a long history of contention, so I'm reluctant to add to that. Instead, > make sure that there is always one safe (raised refcount) folio before, > by delaying its folio_put(). (And of course I was wrong to suggest > updating split_queue_len without the lock: leave that until the splice.) > Thanks for this, I imagine this was quite tricky to debug. Avoiding taking the split_queue_lock by keeping reference of the preceding folio is really nice. And I should have spotted in the original patch that split_queue_len shouldn't be changed without holding split_queue_lock. Acked-by: Usama Arif > And remove two over-eager partially_mapped checks, restoring those tests > to how they were before: if uncharge_folio() or free_tail_page_prepare() > finds _deferred_list non-empty, it's in trouble whether or not that folio > is partially_mapped (and the flag was already cleared in the latter case). > > Fixes: dafff3f4c850 ("mm: split underused THPs") > Signed-off-by: Hugh Dickins > --- > mm/huge_memory.c | 21 +++++++++++++++++---- > mm/memcontrol.c | 3 +-- > mm/page_alloc.c | 5 ++--- > 3 files changed, 20 insertions(+), 9 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 2fb328880b50..a1d345f1680c 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -3718,8 +3718,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, > struct deferred_split *ds_queue = &pgdata->deferred_split_queue; > unsigned long flags; > LIST_HEAD(list); > - struct folio *folio, *next; > - int split = 0; > + struct folio *folio, *next, *prev = NULL; > + int split = 0, removed = 0; > > #ifdef CONFIG_MEMCG > if (sc->memcg) > @@ -3775,15 +3775,28 @@ static unsigned long deferred_split_scan(struct shrinker *shrink, > */ > if (!did_split && !folio_test_partially_mapped(folio)) { > list_del_init(&folio->_deferred_list); > - ds_queue->split_queue_len--; > + removed++; > + } else { > + /* > + * That unlocked list_del_init() above would be unsafe, > + * unless its folio is separated from any earlier folios > + * left on the list (which may be concurrently unqueued) > + * by one safe folio with refcount still raised. > + */ > + swap(folio, prev); > } > - folio_put(folio); > + if (folio) > + folio_put(folio); > } > > spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > list_splice_tail(&list, &ds_queue->split_queue); > + ds_queue->split_queue_len -= removed; > spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > > + if (prev) > + folio_put(prev); > + > /* > * Stop shrinker if we didn't split any page, but the queue is empty. > * This can happen if pages were freed under us. > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 7845c64a2c57..2703227cce88 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -4631,8 +4631,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug) > VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); > VM_BUG_ON_FOLIO(folio_order(folio) > 1 && > !folio_test_hugetlb(folio) && > - !list_empty(&folio->_deferred_list) && > - folio_test_partially_mapped(folio), folio); > + !list_empty(&folio->_deferred_list), folio); > > /* > * Nobody should be changing or seriously looking at > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8afab64814dc..4b21a368b4e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -961,9 +961,8 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page) > break; > case 2: > /* the second tail page: deferred_list overlaps ->mapping */ > - if (unlikely(!list_empty(&folio->_deferred_list) && > - folio_test_partially_mapped(folio))) { > - bad_page(page, "partially mapped folio on deferred list"); > + if (unlikely(!list_empty(&folio->_deferred_list))) { > + bad_page(page, "on deferred list"); > goto out; > } > break;