From: Yang Shi
Date: Fri, 25 Oct 2024 09:34:29 -0700
Message-ID: (unknown)
Subject: Re: [PATCH hotfix 2/2] mm/thp: fix deferred split unqueue naming and locking
To: Hugh Dickins
Cc: Andrew Morton, Usama Arif, Wei Yang, "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand, Johannes Weiner, Baolin Wang, Barry Song, Kefeng Wang, Ryan Roberts, Nhat Pham, Zi Yan, Chris Li, linux-kernel@vger.kernel.org, linux-mm@kvack.org
On Thu, Oct 24, 2024 at 11:57 PM Hugh Dickins wrote:
>
> On Thu, 24 Oct 2024, Yang Shi wrote:
> > On Wed, Oct 23, 2024 at 9:13 PM Hugh Dickins wrote:
> > >
> > > That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split
> > > shrinker memcg aware"): which included a check on swapcache before adding
> > > to deferred queue (which can now be removed), but no check on deferred
> > > queue before adding THP to swapcache (maybe other circumstances prevented
> > > it at that time, but not now).
> >
> > If I remember correctly, before mTHP swapout a THP could only be added
> > to the deferred list when it had no PMD mapping, so shrink_page_list()
> > checked the THP's compound_mapcount (now called _entire_mapcount) before
> > adding it to the swap cache.
> >
> > Now the code just checks whether the large folio is on the deferred list or not.
>
> I've continued to find it hard to think about, hard to be convinced by
> that sequence of checks, without an actual explicit _deferred_list check.

You mean the swap cache check? I was trying to recall the reason. If I
remember correctly (sorry, my memory is still vague), if the THP was
PMD-mapped and PTE-mapped by two processes, it could be added to the swap
cache since only compound_mapcount was checked. Then try_to_unmap() in
shrink_page_list() could add it to the deferred list if the PMD mapping
was unmapped first, and the potential list corruption you are fixing now
could be triggered.

But this was based on the assumption that a PMD-mapped THP can't be on
the deferred list. If that does happen (as in David's failed-migration
example), the swap cache check is not enough. This case was overlooked.

> David has brilliantly come up with the failed THP migration example;
> and I think now perhaps 5.8's 5503fbf2b0b8 ("khugepaged: allow to
> collapse PTE-mapped compound pages") introduced another way?
>
> But I certainly need to reword that wagging finger pointing to your
> commit: these are much more exceptional cases than I was thinking there.
>
> I have this evening tried running swapping load on 5.10 and 6.6 and 6.11,
> each with just a BUG_ON(!list_empty(the deferred list)) before resetting
> memcg in mem_cgroup_swapout() - it would of course be much easier to hit
> such a BUG_ON() than for the consequent wrong locking to be so unlucky
> as to actually result in list corruption.
>
> None of those BUG_ONs hit; though I was only running each for 1.5 hours,
> and looking at vmstats at the end, I see they were really not exercising
> deferred split very much at all. I'd been hoping for an immediate hit
> (as on 6.12-rc) to confirm my doubt, but no. That doesn't *prove* you're
> right, but (excepting David's and my weird cases) I bet you are right.

Maybe it just happened rarely and was hard to hit. But it is still
theoretically possible. Your fix is more reliable.

> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 4b21a368b4e2..57f64b5d0004 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2681,7 +2681,9 @@ void free_unref_folios(struct folio_batch *folios)
> > >                 unsigned long pfn = folio_pfn(folio);
> > >                 unsigned int order = folio_order(folio);
> > >
> > > -               folio_undo_large_rmappable(folio);
> > > +               if (mem_cgroup_disabled())
> > > +                       folio_unqueue_deferred_split(folio);
> >
> > This looks confusing. It looks like all callsites of free_unref_folios()
> > have folio_unqueue_deferred_split() and memcg uncharge called before
> > it. If there is any problem, memcg uncharge should catch it. Did I
> > miss something?
>
> I don't understand what you're suggesting there. But David remarked
> on it too, so it seems that I do need at least to add some comment.
>
> I'd better re-examine the memcg/non-memcg forking paths again: but by
> strange coincidence (or suggestion?), I'm suddenly now too tired here,
> precisely where David stopped too.
> I'll have to come back to this tomorrow, sorry.

I perhaps misunderstood this code. Please feel free to correct me if it
doesn't make sense to you. But yes, some comments are definitely welcome
and would help with understanding and reviewing the code.

>
> Hugh