From: Jiaqi Yan
Date: Thu, 11 Jan 2024 09:34:28 -0800
Subject: Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
To: Muhammad Usama Anjum
Cc: Sidhartha Kumar, linmiaohe@huawei.com, mike.kravetz@oracle.com,
    naoya.horiguchi@nec.com, akpm@linux-foundation.org,
    songmuchun@bytedance.com, shy828301@gmail.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, jthoughton@google.com,
    "kernel@collabora.com", "Matthew Wilcox (Oracle)"
References: <20230713001833.3778937-1-jiaqiyan@google.com>
    <20230713001833.3778937-5-jiaqiyan@google.com>
    <079335ab-190f-41f7-b832-6ffe7528fd8b@collabora.com>

On Thu, Jan 11, 2024 at 12:48 AM Muhammad Usama Anjum wrote:
>
> On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> > On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
> >> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
> >>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
> >>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
> >>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I'm trying to convert this test to TAP as I think the failures
> >>>>> sometimes go unnoticed on CI systems if we only depend on the
> >>>>> return value of the application.
> >>>>> I've enabled the following configurations which aren't already
> >>>>> present in tools/testing/selftests/mm/config:
> >>>>> CONFIG_MEMORY_FAILURE=y
> >>>>> CONFIG_HWPOISON_INJECT=m
> >>>>>
> >>>>> I'll send a patch to add these configs later. Right now I'm trying to
> >>>>> investigate the failure when we are trying to inject the poison page by
> >>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
> >>>>> test fails as it doesn't expect any business for the hugetlb memory.
> >>>>> I'm not sure if the poison handling code has issues or test isn't
> >>>>> robust enough.
> >>>>>
> >>>>> ./hugetlb-read-hwpoison
> >>>>> Write/read chunk size=0x800
> >>>>> ... HugeTLB read regression test...
> >>>>> ... ... expect to read 0x200000 bytes of data in total
> >>>>> ... ... actually read 0x200000 bytes of data in total
> >>>>> ... HugeTLB read regression test...TEST_PASSED
> >>>>> ... HugeTLB read HWPOISON test...
> >>>>> [ 9.280854] Injecting memory failure for pfn 0x102f01 at process virtual address 0x7f28ec101000
> >>>>> [ 9.282029] Memory failure: 0x102f01: huge page still referenced by 511 users
> >>>>> [ 9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
> >>>>> ... !!! MADV_HWPOISON failed: Device or resource busy
> >>>>> ... HugeTLB read HWPOISON test...TEST_FAILED
> >>>>>
> >>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
> >>>>
> >>>> Thanks for reporting this, Usama!
> >>>>
> >>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
> >>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
> >>>> writeback disabling."
> >>>>
> >>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
> >>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
> >>>> MADV_HWPOISON injection works and the test passes:
> >>>>
> >>>> ... HugeTLB read HWPOISON test...
> >>>> ... ... expect to read 0x101000 bytes of data in total
> >>>> ... !!! read failed: Input/output error
> >>>> ... ... actually read 0x101000 bytes of data in total
> >>>> ... HugeTLB read HWPOISON test...TEST_PASSED
> >>>> ... HugeTLB seek then read HWPOISON test...
> >>>> ... ... init val=4 with offset=0x102000
> >>>> ... ... expect to read 0xfe000 bytes of data in total
> >>>> ... ... actually read 0xfe000 bytes of data in total
> >>>> ... HugeTLB seek then read HWPOISON test...TEST_PASSED
> >>>> ...
> >>>>
> >>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process virtual address 0x7f75e3101000
> >>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge page: Recovered
> >>>> ...
> >>>>
> >>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
> >>>> should be able to figure it out via bisection (and of course by
> >>>> reading delta commits between them, probably related to page
> >>>> refcount).
> >>> Thank you for this information.
> >>>
> >>>>
> >>>> That being said, I will be on vacation from tomorrow until the end of
> >>>> next week. So I will get back to this after next weekend. Meanwhile if
> >>>> you want to go ahead and bisect the problematic commit, that will be
> >>>> very much appreciated.
> >>> I'll try to bisect and post here if I find something.
> >> Found the culprit commit by bisection:
> >>
> >> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> >> mm/filemap: remove hugetlb special casing in filemap.c

Thanks Usama!

> >>
> >> hugetlb-read-hwpoison started failing from this patch. I've added the
> >> author of this patch to this bug report.
> >>
> > Hi Usama,
> >
> > Thanks for pointing this out. After debugging, the below diff seems to fix
> > the issue and allows the tests to pass again. Could you test it on your
> > configuration as well just to confirm?
> >
> > Thanks,
> > Sidhartha
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 36132c9125f9..3a248e4f7e93 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >                 } else {
> >                         folio_unlock(folio);
> >
> > -                       if (!folio_test_has_hwpoisoned(folio))
> > +                       if (!folio_test_hwpoison(folio))

Sidhartha, just curious why this change is needed? Does
PageHasHWPoisoned change after commit
"a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?

> >                                 want = nr;
> >                         else {
> >                                 /*
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index d8c853b35dbb..87f6bf7d8bc1 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -973,7 +973,7 @@ struct page_state {
> >  static bool has_extra_refcount(struct page_state *ps, struct page *p,
> >                                 bool extra_pins)
> >  {
> > -       int count = page_count(p) - 1;
> > +       int count = page_count(p) - folio_nr_pages(page_folio(p));
> >
> >         if (extra_pins)
> >                 count -= 1;
> >
> Tested the patch, it fixes the test. Please send this patch.
>
> Tested-by: Muhammad Usama Anjum
>
> --
> BR,
> Muhammad Usama Anjum
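
For reference, here is a rough userspace sketch of the refcount arithmetic in
the mm/memory-failure.c hunk quoted above. It is not kernel code. The helper
names (extra_refs_old, extra_refs_new) are made up for illustration, and the
inputs are assumptions: a 2 MB hugetlb folio made of 512 base pages,
extra_pins set, and a page_count() of 513 at the time of the check. That last
value is inferred from the "still referenced by 511 users" log line, not
measured.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the has_extra_refcount() arithmetic from the quoted
 * diff. page_count and nr_pages are plain integers supplied by the caller;
 * both helper names are hypothetical.
 */

/* Old formula: a single expected reference, plus one more if extra_pins. */
static int extra_refs_old(int page_count, bool extra_pins)
{
	int count = page_count - 1;

	if (extra_pins)
		count -= 1;
	return count;
}

/* Proposed formula: one expected reference per base page in the folio. */
static int extra_refs_new(int page_count, int nr_pages, bool extra_pins)
{
	int count;

	assert(nr_pages > 0);
	count = page_count - nr_pages;
	if (extra_pins)
		count -= 1;
	return count;
}
```

Under those assumptions, extra_refs_old(513, true) gives 511, which matches
the number in the failure log, while extra_refs_new(513, 512, true) gives 0,
so the page would no longer be treated as having extra references. For a
single base page (nr_pages of 1) both formulas agree.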