References: <20240813120328.1275952-1-usamaarif642@gmail.com> <20240813120328.1275952-2-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-2-usamaarif642@gmail.com>
From: Kairui Song
Date: Fri, 16 Aug 2024 02:47:29 +0800
Subject: Re: [PATCH v3 1/6] mm: free zapped tail pages when splitting isolated thp
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org,
 riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev,
 yuzhao@google.com, david@redhat.com, baohua@kernel.org,
 ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org,
 cerasuolodomenico@gmail.com, corbet@lwn.net,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 kernel-team@meta.com, Shuang Zhai

On Tue, Aug 13, 2024 at 8:03 PM Usama Arif wrote:
>
> From: Yu Zhao
>
> If a tail page has only two references left, one inherited from the
> isolation of its head and the other from lru_add_page_tail() which we
> are about to drop, it means this tail page was concurrently zapped.
> Then we can safely free it and save page reclaim or migration the
> trouble of trying it.
>
> Signed-off-by: Yu Zhao
> Tested-by: Shuang Zhai
> Signed-off-by: Usama Arif
> Acked-by: Johannes Weiner
> ---
>  mm/huge_memory.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)

Hi, Usama, Yu

This commit causes the kernel to panic very quickly when running a
kernel build test on top of tmpfs with all mTHP enabled. The panic
comes after:

[  207.147705] BUG: Bad page state in process tar  pfn:14ae70
[  207.149376] page: refcount:3 mapcount:2 mapping:0000000000000000 index:0x562d23b70 pfn:0x14ae70
[  207.151750] flags: 0x17ffffc0020019(locked|uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
[  207.154325] raw: 0017ffffc0020019 dead000000000100 dead000000000122 0000000000000000
[  207.156442] raw: 0000000562d23b70 0000000000000000 0000000300000001 0000000000000000
[  207.158561] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[  207.160325] Modules linked in:
[  207.161194] CPU: 22 UID: 0 PID: 2650 Comm: tar Not tainted 6.11.0-rc3.ptch+ #136
[  207.163198] Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
[  207.164946] Call Trace:
[  207.165636]
[  207.166226]  dump_stack_lvl+0x53/0x70
[  207.167241]  bad_page+0x70/0x120
[  207.168131]  free_page_is_bad+0x5f/0x70
[  207.169193]  free_unref_folios+0x3a5/0x620
[  207.170320]  ? __mem_cgroup_uncharge_folios+0x7e/0xa0
[  207.171705]  __split_huge_page+0xb02/0xcf0
[  207.172839]  ? smp_call_function_many_cond+0x105/0x4b0
[  207.174250]  ? __pfx_flush_tlb_func+0x10/0x10
[  207.175410]  ? on_each_cpu_cond_mask+0x29/0x50
[  207.176603]  split_huge_page_to_list_to_order+0x857/0x9b0
[  207.178052]  shrink_folio_list+0x4e1/0x1200
[  207.179198]  evict_folios+0x468/0xab0
[  207.180202]  try_to_shrink_lruvec+0x1f3/0x280
[  207.181394]  shrink_lruvec+0x89/0x780
[  207.182398]  ? mem_cgroup_iter+0x66/0x290
[  207.183488]  shrink_node+0x243/0xb00
[  207.184474]  do_try_to_free_pages+0xbd/0x4e0
[  207.185621]  try_to_free_mem_cgroup_pages+0x107/0x230
[  207.186994]  try_charge_memcg+0x184/0x5d0
[  207.188092]  charge_memcg+0x3a/0x60
[  207.189046]  __mem_cgroup_charge+0x2c/0x80
[  207.190162]  shmem_alloc_and_add_folio+0x1a3/0x470
[  207.191469]  shmem_get_folio_gfp+0x24a/0x670
[  207.192635]  shmem_write_begin+0x56/0xd0
[  207.193703]  generic_perform_write+0x140/0x330
[  207.194919]  shmem_file_write_iter+0x89/0x90
[  207.196082]  vfs_write+0x2f3/0x420
[  207.197019]  ksys_write+0x5d/0xd0
[  207.197914]  do_syscall_64+0x47/0x110
[  207.198915]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  207.200293] RIP: 0033:0x7f2e6099c784
[  207.201278] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 08 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 $ 8 89 e5 48 83 ec 20 48 89
[  207.206280] RSP: 002b:00007ffdb1a0e7d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  207.208312] RAX: ffffffffffffffda RBX: 00000000000005e7 RCX: 00007f2e6099c784
[  207.210225] RDX: 00000000000005e7 RSI: 0000562d23b77000 RDI: 0000000000000004
[  207.212145] RBP: 00007ffdb1a0e820 R08: 00000000000005e7 R09: 0000000000000007
[  207.214064] R10: 0000000000000180 R11: 0000000000000202 R12: 0000562d23b77000
[  207.215974] R13: 0000000000000004 R14: 00000000000005e7 R15: 0000000000000000
[  207.217888]

The test uses ZRAM as swap and a 1G memcg, and runs:

cd /mnt/tmpfs
time tar zxf "$linux_src"
make -j64 clean
make defconfig
/usr/bin/time make -j64

>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 04ee8abd6475..85a424e954be 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3059,7 +3059,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  	unsigned int new_nr = 1 << new_order;
>  	int order = folio_order(folio);
>  	unsigned int nr = 1 << order;
> +	struct folio_batch free_folios;
>
> +	folio_batch_init(&free_folios);
>  	/* complete memcg works before add pages to LRU */
>  	split_page_memcg(head, order, new_order);
>
> @@ -3143,6 +3145,26 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  		if (subpage == page)
>  			continue;
>  		folio_unlock(new_folio);
> +		/*
> +		 * If a folio has only two references left, one inherited
> +		 * from the isolation of its head and the other from
> +		 * lru_add_page_tail() which we are about to drop, it means this
> +		 * folio was concurrently zapped. Then we can safely free it
> +		 * and save page reclaim or migration the trouble of trying it.
> +		 */
> +		if (list && folio_ref_freeze(new_folio, 2)) {
> +			VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
> +			VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
> +			VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
> +
> +			folio_clear_active(new_folio);
> +			folio_clear_unevictable(new_folio);
> +			if (!folio_batch_add(&free_folios, folio)) {
> +				mem_cgroup_uncharge_folios(&free_folios);
> +				free_unref_folios(&free_folios);
> +			}
> +			continue;
> +		}
>
>  		/*
>  		 * Subpages may be freed if there wasn't any mapping
> @@ -3153,6 +3175,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  		 */
>  		free_page_and_swap_cache(subpage);
>  	}
> +
> +	if (free_folios.nr) {
> +		mem_cgroup_uncharge_folios(&free_folios);
> +		free_unref_folios(&free_folios);
> +	}
>  }
>
>  /* Racy check whether the huge page can be split */
> --
> 2.43.5
>
>
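
As a reading aid for the comment in the quoted hunk, here is a minimal
userspace sketch of the "exactly two references left" check. It assumes
the freeze behaves as an atomic compare-and-exchange of the refcount
from the expected value to zero. struct model_folio, model_folio_init()
and model_ref_freeze() are hypothetical names made up for illustration,
not kernel APIs, and nothing here touches real pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; only the refcount is modeled. */
struct model_folio {
	atomic_int refcount;
};

/* Set the starting reference count of a model folio. */
static void model_folio_init(struct model_folio *f, int refs)
{
	assert(refs >= 0);
	atomic_init(&f->refcount, refs);
}

/*
 * Model of the freeze check: succeed only if the refcount is exactly
 * `expected`, and in that case set it to 0 in the same atomic step.
 * On failure the refcount is left unchanged.
 */
static bool model_ref_freeze(struct model_folio *f, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&f->refcount, &old, 0);
}
```

With a model folio initialized to 2 references, model_ref_freeze(&f, 2)
succeeds and leaves the count at 0. With 3 references the same call
fails and the count stays at 3, so in this model only a folio whose
count matches the expected value would take the free path.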