From: Kairui Song <ryncsn@gmail.com>
Date: Sun, 20 Apr 2025 00:33:34 +0800
Subject: Re: [PATCH v3 13/20] mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
 Andrew Morton, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Johannes Weiner,
 Michal Koutný, Jonathan Corbet, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen, Muchun Song, "Liam R. Howlett", Lorenzo Stoakes,
 Vlastimil Babka, Jann Horn
References: <20250303163014.1128035-1-david@redhat.com> <20250303163014.1128035-14-david@redhat.com>

On Sun, Apr 20, 2025 at 12:26 AM David Hildenbrand wrote:
>
> On 19.04.25 18:02, Kairui Song wrote:
> > On Tue, Mar 4, 2025 at 12:46 AM David Hildenbrand wrote:
> >>
> >> Currently, we never end up reusing PTE-mapped THPs after fork. This
> >> wasn't really a problem with PMD-sized THPs, because they would have to
> >> be PTE-mapped first, but it's getting a problem with smaller THP
> >> sizes that are effectively always PTE-mapped.
> >>
> >> With our new "mapped exclusively" vs "maybe mapped shared" logic for
> >> large folios, implementing CoW reuse for PTE-mapped THPs is straight
> >> forward: if exclusively mapped, make sure that all references are
> >> from these (our) mappings. Add some helpful comments to explain the
> >> details.
> >>
> >> CONFIG_TRANSPARENT_HUGEPAGE selects CONFIG_MM_ID. If we spot an anon
> >> large folio without CONFIG_TRANSPARENT_HUGEPAGE in that code, something
> >> is seriously messed up.
> >>
> >> There are plenty of things we can optimize in the future: For example, we
> >> could remember that the folio is fully exclusive so we could speedup
> >> the next fault further. Also, we could try "faulting around", turning
> >> surrounding PTEs that map the same folio writable. But especially the
> >> latter might increase COW latency, so it would need further
> >> investigation.
> >>
> >> Signed-off-by: David Hildenbrand
> >> ---
> >>   mm/memory.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++------
> >>   1 file changed, 75 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 73b783c7d7d51..bb245a8fe04bc 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -3729,19 +3729,86 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
> >>          return ret;
> >>   }
> >>
> >> -static bool wp_can_reuse_anon_folio(struct folio *folio,
> >> -                                    struct vm_area_struct *vma)
> >> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >> +static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
> >> +               struct vm_area_struct *vma)
> >>  {
> >> +       bool exclusive = false;
> >> +
> >> +       /* Let's just free up a large folio if only a single page is mapped. */
> >> +       if (folio_large_mapcount(folio) <= 1)
> >> +               return false;
> >> +
> >>         /*
> >> -        * We could currently only reuse a subpage of a large folio if no
> >> -        * other subpages of the large folios are still mapped. However,
> >> -        * let's just consistently not reuse subpages even if we could
> >> -        * reuse in that scenario, and give back a large folio a bit
> >> -        * sooner.
> >> +        * The assumption for anonymous folios is that each page can only get
> >> +        * mapped once into each MM. The only exception are KSM folios, which
> >> +        * are always small.
> >> +        *
> >> +        * Each taken mapcount must be paired with exactly one taken reference,
> >> +        * whereby the refcount must be incremented before the mapcount when
> >> +        * mapping a page, and the refcount must be decremented after the
> >> +        * mapcount when unmapping a page.
> >> +        *
> >> +        * If all folio references are from mappings, and all mappings are in
> >> +        * the page tables of this MM, then this folio is exclusive to this MM.
> >>          */
> >> -       if (folio_test_large(folio))
> >> +       if (folio_test_large_maybe_mapped_shared(folio))
> >> +               return false;
> >> +
> >> +       VM_WARN_ON_ONCE(folio_test_ksm(folio));
> >> +       VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
> >> +       VM_WARN_ON_ONCE(folio_entire_mapcount(folio));
> >> +
> >> +       if (unlikely(folio_test_swapcache(folio))) {
> >> +               /*
> >> +                * Note: freeing up the swapcache will fail if some PTEs are
> >> +                * still swap entries.
> >> +                */
> >> +               if (!folio_trylock(folio))
> >> +                       return false;
> >> +               folio_free_swap(folio);
> >> +               folio_unlock(folio);
> >> +       }
> >> +
> >> +       if (folio_large_mapcount(folio) != folio_ref_count(folio))
> >>                 return false;
> >>
> >> +       /* Stabilize the mapcount vs. refcount and recheck. */
> >> +       folio_lock_large_mapcount(folio);
> >> +       VM_WARN_ON_ONCE(folio_large_mapcount(folio) < folio_ref_count(folio));
> >
> > Hi David, I'm seeing this WARN_ON being triggered on my test machine:
>
> Hi!
>
> So I assume the following will not sort out the issue for you, correct?
>
> https://lore.kernel.org/all/20250415095007.569836-1-david@redhat.com/T/#u

Yes, double checked: the commit I'm testing,
dc683247117ee018e5da6b04f1c499acdc2a1418 (akpm/mm-unstable), includes
this fix.

> >
> > I'm currently working on my swap table series and testing heavily with
> > swap related workloads. I thought my patch may break the kernel, but
> > after more investigation and reverting to current mm-unstable, it
> > still occurs (with a much lower chance though, I think my series
> > changed the timing so it's more frequent in my case).
> >
> > The test is simple: I just enable all mTHP sizes and repeatedly build
> > the Linux kernel in a 1G memcg using tmpfs.
> >
> > The WARN is reproducible with current mm-unstable
> > (dc683247117ee018e5da6b04f1c499acdc2a1418):
> >
> > [ 5268.100379] ------------[ cut here ]------------
> > [ 5268.105925] WARNING: CPU: 2 PID: 700274 at mm/memory.c:3792 do_wp_page+0xfc5/0x1080
> > [ 5268.112437] Modules linked in: zram virtiofs
> > [ 5268.115507] CPU: 2 UID: 0 PID: 700274 Comm: cc1 Kdump: loaded Not tainted 6.15.0-rc2.ptch-gdc683247117e #1434 PREEMPT(voluntary)
> > [ 5268.120562] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
> > [ 5268.123025] RIP: 0010:do_wp_page+0xfc5/0x1080
> > [ 5268.124807] Code: 0d 80 77 32 02 0f 85 3e f1 ff ff 0f 1f 44 00 00 e9 34 f1 ff ff 48 0f ba 75 00 1f 65 ff 0d 63 77 32 02 0f 85 21 f1 ff ff eb e1 <0f> 0b e9 10 fd ff ff 65 ff 00 f0 48 0f ba 6d 00 1f 0f 83 ec fc ff
> > [ 5268.132034] RSP: 0000:ffffc900234efd48 EFLAGS: 00010297
> > [ 5268.134002] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 000fffffffe00000
> > [ 5268.136609] RDX: 0000000000000081 RSI: 00007f009cbad000 RDI: ffffea0012da0000
> > [ 5268.139371] RBP: ffffea0012da0068 R08: 80000004b682d025 R09: 00007f009c7c0000
> > [ 5268.142183] R10: ffff88839c48b8c0 R11: 0000000000000000 R12: ffff88839c48b8c0
> > [ 5268.144738] R13: ffffea0012da0000 R14: 00007f009cbadf10 R15: ffffc900234efdd8
> > [ 5268.147540] FS:  00007f009d1fdac0(0000) GS:ffff88a07ae14000(0000) knlGS:0000000000000000
> > [ 5268.150715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5268.153270] CR2: 00007f009cbadf10 CR3: 000000016c7c0001 CR4: 0000000000770eb0
> > [ 5268.155674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 5268.158100] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 5268.160613] PKRU: 55555554
> > [ 5268.161662] Call Trace:
> > [ 5268.162609]  <TASK>
> > [ 5268.163438]  ? ___pte_offset_map+0x1b/0x110
> > [ 5268.165309]  __handle_mm_fault+0xa51/0xf00
> > [ 5268.166848]  ? update_load_avg+0x80/0x760
> > [ 5268.168376]  handle_mm_fault+0x13d/0x360
> > [ 5268.169930]  do_user_addr_fault+0x2f2/0x7f0
> > [ 5268.171630]  exc_page_fault+0x6a/0x140
> > [ 5268.173278]  asm_exc_page_fault+0x26/0x30
> > [ 5268.174866] RIP: 0033:0x120e8e4
> > [ 5268.176272] Code: 84 a9 00 00 00 48 39 c3 0f 85 ae 00 00 00 48 8b 43 20 48 89 45 38 48 85 c0 0f 85 b7 00 00 00 48 8b 43 18 48 8b 15 6c 08 42 01 <0f> 11 43 10 48 89 1d 61 08 42 01 48 89 53 18 0f 11 03 0f 11 43 20
> > [ 5268.184121] RSP: 002b:00007fff8a855160 EFLAGS: 00010246
> > [ 5268.186343] RAX: 00007f009cbadbd0 RBX: 00007f009cbadf00 RCX: 0000000000000000
> > [ 5268.189209] RDX: 00007f009cbba030 RSI: 00000000000006f4 RDI: 0000000000000000
> > [ 5268.192145] RBP: 00007f009cbb6460 R08: 00007f009d10f000 R09: 000000000000016c
> > [ 5268.194687] R10: 0000000000000000 R11: 0000000000000010 R12: 00007f009cf97660
> > [ 5268.197172] R13: 00007f009756ede0 R14: 00007f0097582348 R15: 0000000000000002
> > [ 5268.199419]  </TASK>
> > [ 5268.200227] ---[ end trace 0000000000000000 ]---
> >
> > I also once changed the WARN_ON to WARN_ON_FOLIO and got more info here:
> >
> > [ 3994.907255] page: refcount:9 mapcount:1 mapping:0000000000000000 index:0x7f90b3e98 pfn:0x615028
> > [ 3994.914449] head: order:3 mapcount:8 entire_mapcount:0 nr_pages_mapped:8 pincount:0
> > [ 3994.924534] memcg:ffff888106746000
> > [ 3994.927868] anon flags: 0x17ffffc002084c(referenced|uptodate|owner_2|head|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
> > [ 3994.933479] raw: 0017ffffc002084c ffff88816edd9128 ffffea000beac108 ffff8882e8ba6bc9
> > [ 3994.936251] raw: 00000007f90b3e98 0000000000000000 0000000900000000 ffff888106746000
> > [ 3994.939466] head: 0017ffffc002084c ffff88816edd9128 ffffea000beac108 ffff8882e8ba6bc9
> > [ 3994.943355] head: 00000007f90b3e98 0000000000000000 0000000900000000 ffff888106746000
> > [ 3994.946988] head: 0017ffffc0000203 ffffea0018540a01 0000000800000007 00000000ffffffff
> > [ 3994.950328] head: ffffffff00000007 00000000800000a3 0000000000000000 0000000000000008
> > [ 3994.953684] page dumped because: VM_WARN_ON_FOLIO(folio_large_mapcount(folio) < folio_ref_count(folio))
> > [ 3994.957534] ------------[ cut here ]------------
> > [ 3994.959917] WARNING: CPU: 16 PID: 555282 at mm/memory.c:3794 do_wp_page+0x10c0/0x1110
> > [ 3994.963069] Modules linked in: zram virtiofs
> > [ 3994.964726] CPU: 16 UID: 0 PID: 555282 Comm: sh Kdump: loaded Not tainted 6.15.0-rc1.ptch-ge39aef85f4c0-dirty #1431 PREEMPT(voluntary)
> > [ 3994.969985] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
> > [ 3994.972905] RIP: 0010:do_wp_page+0x10c0/0x1110
> > [ 3994.974477] Code: fe ff 0f 0b bd f5 ff ff ff e9 16 fb ff ff 41 83 a9 bc 12 00 00 01 e9 2f fb ff ff 48 c7 c6 90 c2 49 82 4c 89 ef e8 40 fd fe ff <0f> 0b e9 6a fc ff ff 65 ff 00 f0 48 0f ba 6d 00 1f 0f 83 46 fc ff
> > [ 3994.981033] RSP: 0000:ffffc9002b3c7d40 EFLAGS: 00010246
> > [ 3994.982636] RAX: 000000000000005b RBX: 0000000000000000 RCX: 0000000000000000
> > [ 3994.984778] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff889ffea16a80
> > [ 3994.986865] RBP: ffffea0018540a68 R08: 0000000000000000 R09: c0000000ffff7fff
> > [ 3994.989316] R10: 0000000000000001 R11: ffffc9002b3c7b80 R12: ffff88810cfd7d40
> > [ 3994.991654] R13: ffffea0018540a00 R14: 00007f90b3e9d620 R15: ffffc9002b3c7dd8
> > [ 3994.994076] FS:  00007f90b3caa740(0000) GS:ffff88a07b194000(0000) knlGS:0000000000000000
> > [ 3994.996939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3994.998902] CR2: 00007f90b3e9d620 CR3: 0000000104088004 CR4: 0000000000770eb0
> > [ 3995.001314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 3995.003746] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 3995.006173] PKRU: 55555554
> > [ 3995.007117] Call Trace:
> > [ 3995.007988]  <TASK>
> > [ 3995.008755]  ? __pfx_default_wake_function+0x10/0x10
> > [ 3995.010490]  ? ___pte_offset_map+0x1b/0x110
> > [ 3995.011929]  __handle_mm_fault+0xa51/0xf00
> > [ 3995.013346]  handle_mm_fault+0x13d/0x360
> > [ 3995.014796]  do_user_addr_fault+0x2f2/0x7f0
> > [ 3995.016331]  ? sigprocmask+0x77/0xa0
> > [ 3995.017656]  exc_page_fault+0x6a/0x140
> > [ 3995.018978]  asm_exc_page_fault+0x26/0x30
> > [ 3995.020309] RIP: 0033:0x7f90b3d881a7
> > [ 3995.021461] Code: e8 4e b1 f8 ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 55 31 c0 ba 01 00 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 0f b1 15 71 54 11 00 0f 85 3b 01 00 00 48 8b 35 84 54 11 00 48
> > [ 3995.028091] RSP: 002b:00007ffc33632c90 EFLAGS: 00010206
> > [ 3995.029992] RAX: 0000000000000000 RBX: 0000560cfbfc0a40 RCX: 0000000000000000
> > [ 3995.032456] RDX: 0000000000000001 RSI: 0000000000000005 RDI: 0000560cfbfc0a40
> > [ 3995.034794] RBP: 00007ffc33632ca0 R08: 00007ffc33632d50 R09: 00007ffc33632cff
> > [ 3995.037534] R10: 00007ffc33632c70 R11: 00007ffc33632d00 R12: 0000560cfbfc0a40
> > [ 3995.041063] R13: 00007f90b3e97fd0 R14: 00007f90b3e97fa8 R15: 0000000000000000
> > [ 3995.044390]  </TASK>
> > [ 3995.045510] ---[ end trace 0000000000000000 ]---
> >
> > My guess is folio_ref_count is not a reliable thing to check here:
> > anything can increase the folio's refcount even without locking it,
> > for example a swap cache lookup, or maybe anything iterating the LRU.
>
> It is reliable, we are holding the mapcount lock, so for each mapcount
> we must have a corresponding refcount. If that is not the case, we have
> an issue elsewhere.

Yes, each mapcount should have a refcount, but could something else
also be raising the refcount? The check is
folio_large_mapcount(folio) < folio_ref_count(folio), so maybe
something just called folio_try_get and then put the folio quickly.

>
> Other references may only increase the refcount, but not violate the
> mapcount vs. refcount condition.
>
> Can you reproduce it with swap disabled as well?

I'm not sure how else to reproduce it; I have only seen it during the
swap stress test.

> --
> Cheers,
>
> David / dhildenb
>
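
To make the interleaving I have in mind a bit more concrete, below is a
tiny userspace model. It is purely illustrative and only a sketch of the
*suspected* race, not the kernel code path: the two plain atomics merely
stand in for folio_ref_count() and folio_large_mapcount(), and the
"speculative" helpers stand in for a transient folio_try_get() /
folio_put() from some other context (e.g. a swap cache lookup):

/*
 * Illustrative userspace model only -- NOT kernel code.  The counters
 * stand in for folio_ref_count() and folio_large_mapcount(); the
 * "speculative" helpers stand in for a transient folio_try_get() /
 * folio_put() from another context.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int refcount = 8;   /* one reference per mapping ...           */
static atomic_int mapcount = 8;   /* ... of a fully PTE-mapped order-3 folio */

static void speculative_get(void) { atomic_fetch_add(&refcount, 1); }
static void speculative_put(void) { atomic_fetch_sub(&refcount, 1); }

int main(void)
{
	/* Unlocked pre-check (mapcount == refcount): passes. */
	if (atomic_load(&mapcount) != atomic_load(&refcount))
		return 0;

	/* Another context briefly takes a reference right here. */
	speculative_get();

	/*
	 * Recheck under the (modeled) mapcount lock: the lock pins the
	 * mapcount, but the refcount is now 9 while the mapcount is
	 * still 8, which is the condition the WARN fires on.
	 */
	if (atomic_load(&mapcount) < atomic_load(&refcount))
		printf("WARN condition: mapcount=%d < refcount=%d\n",
		       atomic_load(&mapcount), atomic_load(&refcount));

	speculative_put();
	return 0;
}

Whether such a short-lived reference can really slip in between the
unlocked equality check and the locked recheck is exactly the open
question above; the model only shows why the folio dump (mapcount:8,
refcount:9) matches that pattern.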