From: Yang Shi <shy828301@gmail.com>
Date: Mon, 13 May 2024 09:34:36 -0600
Subject: Re: [PATCH] mm/huge_memory: mark huge_zero_folio reserved
To: Miaohe Lin
Cc: akpm@linux-foundation.org, nao.horiguchi@gmail.com, xuyu@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <20240511032801.1295023-1-linmiaohe@huawei.com>

On Fri, May 10, 2024 at 9:31 PM Miaohe Lin wrote:
>
> When I did memory failure tests recently, below panic occurs:
>
> kernel BUG at include/linux/mm.h:1135!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
> RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
> RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
> RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
> RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
> R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
> FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
> Call Trace:
>
> do_shrink_slab+0x14f/0x6a0
> shrink_slab+0xca/0x8c0
> shrink_node+0x2d0/0x7d0
> balance_pgdat+0x33a/0x720
> kswapd+0x1f3/0x410
> kthread+0xd5/0x100
> ret_from_fork+0x2f/0x50
> ret_from_fork_asm+0x1a/0x30
>
> Modules linked in: mce_inject hwpoison_inject
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
> RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
> RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
> RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
> RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
> R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
> FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
>
> The root cause is that HWPoison flag will be set for huge_zero_folio
> without increasing the folio refcnt.
> But then unpoison_memory() will
> decrease the folio refcnt unexpectedly as it appears like a successfully
> hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> when releasing huge_zero_folio.
>
> Fix this issue by marking huge_zero_folio reserved. So unpoison_memory()
> will skip this page. This will make it consistent with ZERO_PAGE case too.

If I read the code correctly, unpoison_memory() should not decrement the
huge zero page's refcount by calling put_page_testzero(). The huge zero
page's real refcount is maintained separately in huge_zero_refcount, which
is different from the regular refcount in struct folio; see
get_huge_zero_page().

>
> Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
> Signed-off-by: Miaohe Lin
> Cc:
> ---
>  mm/huge_memory.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 317de2afd371..d508ff793145 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -212,6 +212,7 @@ static bool get_huge_zero_page(void)
>                 folio_put(zero_folio);
>                 goto retry;
>         }
> +       __folio_set_reserved(zero_folio);
>         WRITE_ONCE(huge_zero_pfn, folio_pfn(zero_folio));
>
>         /* We take additional reference here. It will be put back by shrinker */
> @@ -264,6 +265,7 @@ static unsigned long shrink_huge_zero_page_scan(struct shrinker *shrink,
>                 struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
>                 BUG_ON(zero_folio == NULL);
>                 WRITE_ONCE(huge_zero_pfn, ~0UL);
> +               __folio_clear_reserved(zero_folio);
>                 folio_put(zero_folio);
>                 return HPAGE_PMD_NR;
>         }
> --
> 2.33.0
>
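
To make the point about the two counters concrete, here is a rough userspace sketch of how I understand the accounting. It is not the kernel code: every toy_* name is made up for illustration, the "allocation" is a static object, and the locking and retry logic of get_huge_zero_page() are left out. It only assumes what is said in this thread, namely that users are counted in huge_zero_refcount (with one extra count held for the shrinker) while the folio's own refcount stays at the single reference taken at allocation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for struct folio: only the generic reference count. */
struct toy_folio {
	atomic_int refcount;
};

static struct toy_folio toy_storage;        /* pretend allocation */
static struct toy_folio *toy_zero_folio;    /* NULL while not allocated */
static atomic_int toy_huge_zero_refcount;   /* users, plus one shrinker pin */

/* Take a user reference; "allocate" the folio on first use. */
static void toy_get_huge_zero(void)
{
	if (toy_zero_folio == NULL) {
		atomic_store(&toy_storage.refcount, 1); /* allocator's reference */
		toy_zero_folio = &toy_storage;
		/* One count for the caller, one pin released by the shrinker. */
		atomic_store(&toy_huge_zero_refcount, 2);
		return;
	}
	atomic_fetch_add(&toy_huge_zero_refcount, 1);
}

/* Drop a user reference; the folio's own refcount is not touched. */
static void toy_put_huge_zero(void)
{
	atomic_fetch_sub(&toy_huge_zero_refcount, 1);
}

/*
 * Shrinker model. Returns -1 if users remain, 0 if the folio was released
 * cleanly, and 1 if the folio refcount was already zero (the condition the
 * quoted VM_BUG_ON_PAGE() checks for).
 */
static int toy_shrink(void)
{
	int expected = 1;

	/* Only proceed when the shrinker pin is the last count left. */
	if (!atomic_compare_exchange_strong(&toy_huge_zero_refcount, &expected, 0))
		return -1;

	struct toy_folio *folio = toy_zero_folio;
	toy_zero_folio = NULL;

	if (atomic_load(&folio->refcount) == 0)
		return 1;
	atomic_fetch_sub(&folio->refcount, 1);
	return 0;
}

/* A stray put on the folio refcount, bypassing huge_zero_refcount. */
static void toy_stray_folio_put(void)
{
	atomic_fetch_sub(&toy_zero_folio->refcount, 1);
}
```

In this model, get/put pairs only move toy_huge_zero_refcount, so toy_shrink() releases the folio cleanly once the last user is gone. If anything calls toy_stray_folio_put() in between, the separate counter does not notice, and the shrinker later finds a folio whose refcount is already zero.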