From: Muhammad Usama Anjum
To: Sidhartha Kumar, Jiaqi Yan
Cc: Muhammad Usama Anjum, linmiaohe@huawei.com, mike.kravetz@oracle.com, naoya.horiguchi@nec.com, akpm@linux-foundation.org, songmuchun@bytedance.com, shy828301@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, jthoughton@google.com, "kernel@collabora.com", "Matthew Wilcox (Oracle)"
Subject: Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
Date: Thu, 11 Jan 2024 13:48:38 +0500
References: <20230713001833.3778937-1-jiaqiyan@google.com> <20230713001833.3778937-5-jiaqiyan@google.com> <079335ab-190f-41f7-b832-6ffe7528fd8b@collabora.com>

On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to convert this test to TAP as I think the failures sometimes
>>>>> go unnoticed on CI systems if we only depend on the return value of the
>>>>> application. I've enabled the following configurations which aren't
>>>>> already present in tools/testing/selftests/mm/config:
>>>>> CONFIG_MEMORY_FAILURE=y
>>>>> CONFIG_HWPOISON_INJECT=m
>>>>>
>>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>>> investigate the failure when we are trying to inject the poison page by
>>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
>>>>> test fails as it doesn't expect the hugetlb memory to be busy. I'm not
>>>>> sure if the poison handling code has issues or the test isn't robust
>>>>> enough.
>>>>>
>>>>> ./hugetlb-read-hwpoison
>>>>> Write/read chunk size=0x800
>>>>>   ... HugeTLB read regression test...
>>>>>   ...  ... expect to read 0x200000 bytes of data in total
>>>>>   ...  ... actually read 0x200000 bytes of data in total
>>>>>   ... HugeTLB read regression test...TEST_PASSED
>>>>>   ... HugeTLB read HWPOISON test...
>>>>> [    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual address 0x7f28ec101000
>>>>> [    9.282029] Memory failure: 0x102f01: huge page still referenced by 511 users
>>>>> [    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>>>   ...  !!! MADV_HWPOISON failed: Device or resource busy
>>>>>   ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>>
>>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>>
>>>> Thanks for reporting this, Usama!
>>>>
>>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>>> writeback disabling."
>>>>
>>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>>> MADV_HWPOISON injection works and the test passes:
>>>>
>>>>   ... HugeTLB read HWPOISON test...
>>>>   ...  ... expect to read 0x101000 bytes of data in total
>>>>   ...  !!! read failed: Input/output error
>>>>   ...  ... actually read 0x101000 bytes of data in total
>>>>   ... HugeTLB read HWPOISON test...TEST_PASSED
>>>>   ... HugeTLB seek then read HWPOISON test...
>>>>   ...  ... init val=4 with offset=0x102000
>>>>   ...  ... expect to read 0xfe000 bytes of data in total
>>>>   ...  ... actually read 0xfe000 bytes of data in total
>>>>   ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>>   ...
>>>>
>>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process virtual address 0x7f75e3101000
>>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge page: Recovered
>>>> ...
>>>>
>>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>>> should be able to figure it out via bisection (and of course by
>>>> reading delta commits between them, probably related to page
>>>> refcount).
>>> Thank you for this information.
>>>
>>>>
>>>> That being said, I will be on vacation from tomorrow until the end of
>>>> next week. So I will get back to this after next weekend. Meanwhile if
>>>> you want to go ahead and bisect the problematic commit, that will be
>>>> very much appreciated.
>>> I'll try to bisect and post here if I find something.
>> Found the culprit commit by bisection:
>>
>> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
>> mm/filemap: remove hugetlb special casing in filemap.c
>>
>> hugetlb-read-hwpoison started failing from this patch. I've added the
>> author of this patch to this bug report.
>>
> Hi Usama,
>
> Thanks for pointing this out. After debugging, the below diff seems to fix
> the issue and allows the tests to pass again. Could you test it on your
> configuration as well just to confirm.
>
> Thanks,
> Sidhartha
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 36132c9125f9..3a248e4f7e93 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>                 } else {
>                         folio_unlock(folio);
>
> -                       if (!folio_test_has_hwpoisoned(folio))
> +                       if (!folio_test_hwpoison(folio))
>                                 want = nr;
>                         else {
>                                 /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d8c853b35dbb..87f6bf7d8bc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -973,7 +973,7 @@ struct page_state {
>  static bool has_extra_refcount(struct page_state *ps, struct page *p,
>                                bool extra_pins)
>  {
> -       int count = page_count(p) - 1;
> +       int count = page_count(p) - folio_nr_pages(page_folio(p));
>
>         if (extra_pins)
>                 count -= 1;
>
Tested the patch, it fixes the test. Please send this patch.
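
For anyone following the refcount part of the diff, below is a minimal userspace sketch of the arithmetic in has_extra_refcount() before and after the change. It is not kernel code. The helper names extra_refs_old() and extra_refs_new() are hypothetical, and the inputs are assumptions: a folio of 512 base pages (a 2 MiB huge page with 4 KiB pages), extra_pins set, and a page count of 513. The real page count was not reported in this thread; 513 is simply one value that reproduces the "511 users" in the failing log above.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the counting done in
 * has_extra_refcount(); it only mirrors the arithmetic from the diff.
 */

/* Old formula: discount a single reference from the page count. */
static int extra_refs_old(int page_count, bool extra_pins)
{
	int count = page_count - 1;

	if (extra_pins)
		count -= 1;
	return count;
}

/* Formula from the diff: discount one reference per page in the folio. */
static int extra_refs_new(int page_count, int folio_nr_pages, bool extra_pins)
{
	int count = page_count - folio_nr_pages;

	if (extra_pins)
		count -= 1;
	return count;
}
```

Under those assumed inputs, extra_refs_old(513, true) gives 511, while extra_refs_new(513, 512, true) gives 0, i.e. no unexpected references are left once every page of the folio is discounted.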

Tested-by: Muhammad Usama Anjum

--
BR,
Muhammad Usama Anjum