From: Yang Shi <shy828301@gmail.com>
Date: Fri, 12 Apr 2024 13:35:56 -0700
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list
To: David Hildenbrand
Cc: Zi Yan, linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Ryan Roberts, Barry Song <21cnbao@gmail.com>, linux-kernel@vger.kernel.org
References: <20240411153232.169560-1-zi.yan@sent.com> <2C698A64-268C-4E43-9EDE-6238B656A391@nvidia.com>

On Fri, Apr 12, 2024 at 12:33 PM David Hildenbrand wrote:
>
> On 12.04.24 16:35, Zi Yan wrote:
> > On 11 Apr 2024, at 11:46, David Hildenbrand wrote:
> >
> >> On 11.04.24 17:32, Zi Yan wrote:
> >>> From: Zi Yan
> >>>
> >>> In __folio_remove_rmap(), a large folio is added to deferred split list
> >>> if any page in a folio loses its final mapping. It is possible that
> >>> the folio is unmapped fully, but it is unnecessary to add the folio
> >>> to deferred split list at all. Fix it by checking folio mapcount before
> >>> adding a folio to deferred split list.
> >>>
> >>> Signed-off-by: Zi Yan
> >>> ---
> >>>   mm/rmap.c | 9 ++++++---
> >>>   1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index 2608c40dffad..d599a772e282 100644
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -1494,7 +1494,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>  			enum rmap_level level)
> >>>  {
> >>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
> >>> -	int last, nr = 0, nr_pmdmapped = 0;
> >>> +	int last, nr = 0, nr_pmdmapped = 0, mapcount = 0;
> >>>  	enum node_stat_item idx;
> >>>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
> >>> @@ -1506,7 +1506,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>  			break;
> >>>  		}
> >>> -		atomic_sub(nr_pages, &folio->_large_mapcount);
> >>> +		mapcount = atomic_sub_return(nr_pages,
> >>> +					     &folio->_large_mapcount) + 1;
> >>
> >> That becomes a new memory barrier on some archs. Rather just re-read it
> >> below. Re-reading should be fine here.
> >
> > Would atomic_sub_return_relaxed() work? Originally I was using atomic_read(mapped)
> > below, but to save an atomic op, I chose to read mapcount here.
>
> Some points:
>
> (1) I suggest reading about atomic get/set vs. atomic RMW vs. atomic
> RMW that return a value -- and how they interact with memory barriers.
> Further, how relaxed variants are only optimized on some architectures.
>
> atomic_read() is usually READ_ONCE(), which is just an "ordinary" memory
> access that should not be refetched. Usually cheaper than most other stuff
> that involves atomics.
>
> (2) We can either use folio_large_mapcount() == 0 or !atomic_read(mapped)
> to figure out if the folio is now completely unmapped.
>
> (3) There is one fundamental issue: if we are not batch-unmapping the whole
> thing, we will still add the folios to the deferred split queue. Migration
> would still do that, or if there are multiple VMAs covering a folio.

Maybe we can let rmap remove code not touch deferred split queue in
migration path at all because we know all the PTEs will be converted to
migration entries. And all the migration entries will be converted back
to PTEs regardless of whether try_to_migrate succeeded or not.

> (4) We should really avoid making common operations slower only to make
> some unreliable stats less unreliable.
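
To make points (1)-(3) above concrete, here is a minimal, hypothetical sketch
(not the real __folio_remove_rmap(); the helper name and the stripped-down
condition are assumptions for illustration only) of keeping the plain
atomic_sub() and re-reading the mapcount with atomic_read() only where the
deferred-split decision is made:

#include <linux/atomic.h>
#include <linux/mm.h>
#include <linux/huge_mm.h>

/* Illustrative stand-in for the PTE-unmap path, heavily simplified. */
static void sketch_remove_pte_mappings(struct folio *folio, int nr_pages)
{
	atomic_t *mapped = &folio->_nr_pages_mapped;

	/* Plain RMW with no return value: no implied full memory barrier. */
	atomic_sub(nr_pages, &folio->_large_mapcount);

	/* ... per-page mapcount and statistics updates elided ... */

	/*
	 * Re-read only here. atomic_read() is a READ_ONCE()-style load, so it
	 * is cheaper than atomic_sub_return() on architectures where
	 * value-returning RMWs imply a full barrier. Reading zero means the
	 * folio is now fully unmapped, so queueing it for deferred split
	 * would be pointless.
	 */
	if (folio_test_large(folio) && folio_test_anon(folio) &&
	    atomic_read(mapped))
		deferred_split_folio(folio);
}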
>
>
> We should likely do something like the following, which might even be a bit
> faster in some cases because we avoid a function call in case we unmap
> individual PTEs by checking _deferred_list ahead of time
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2608c40dffad..356598b3dc3c 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1553,9 +1553,11 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>                   * page of the folio is unmapped and at least one page
>                   * is still mapped.
>                   */
> -                if (folio_test_large(folio) && folio_test_anon(folio))
> -                        if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> -                                deferred_split_folio(folio);
> +                if (folio_test_large(folio) && folio_test_anon(folio) &&
> +                    (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped) &&
> +                    atomic_read(mapped) &&
> +                    data_race(list_empty(&folio->_deferred_list)))
> +                        deferred_split_folio(folio);
>  }
>
>
> I also thought about handling the scenario where we unmap the whole
> thing in smaller chunks. We could detect "!atomic_read(mapped)" and
> detect that it is on the deferred split list, and simply remove it
> from that list, incrementing a THP_UNDO_DEFERRED_SPLIT_PAGE event.
>
> But it would be racy with concurrent remapping of the folio (might happen with
> anon folios in corner cases I guess).
>
> What we can do is the following, though:
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dc30139590e6..f05cba1807f2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3133,6 +3133,8 @@ void folio_undo_large_rmappable(struct folio *folio)
>          ds_queue = get_deferred_split_queue(folio);
>          spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>          if (!list_empty(&folio->_deferred_list)) {
> +                if (folio_test_pmd_mappable(folio))
> +                        count_vm_event(THP_UNDO_DEFERRED_SPLIT_PAGE);
>                  ds_queue->split_queue_len--;
>                  list_del_init(&folio->_deferred_list);
>          }
>
> Adding the right event of course.
>
>
> Then it's easy to filter out these "temporarily added to the list, but never split
> before the folio was freed" cases.
>
>
> --
> Cheers,
>
> David / dhildenb
>
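
As a footnote to "Adding the right event of course": wiring up a new vm_event
normally means adding the enum entry plus the matching /proc/vmstat string. A
rough sketch, assuming the THP_UNDO_DEFERRED_SPLIT_PAGE name from the quoted
diff (the exact placement is illustrative, not an actual upstream change):

/* include/linux/vm_event_item.h: add the counter next to the existing one */
enum vm_event_item {
	/* ... */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	THP_DEFERRED_SPLIT_PAGE,
	THP_UNDO_DEFERRED_SPLIT_PAGE,	/* removed from the queue without splitting */
#endif
	/* ... */
};

/* mm/vmstat.c: matching name so the counter shows up in /proc/vmstat */
const char * const vmstat_text[] = {
	/* ... */
	"thp_deferred_split_page",
	"thp_undo_deferred_split_page",
	/* ... */
};

With that in place, comparing thp_undo_deferred_split_page against
thp_deferred_split_page in /proc/vmstat would filter out the folios that were
only queued temporarily and freed without ever being split.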