From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: Sidhartha Kumar <sidhartha.kumar@oracle.com>,
Jiaqi Yan <jiaqiyan@google.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>,
linmiaohe@huawei.com, mike.kravetz@oracle.com,
naoya.horiguchi@nec.com, akpm@linux-foundation.org,
songmuchun@bytedance.com, shy828301@gmail.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
jthoughton@google.com,
"kernel@collabora.com" <kernel@collabora.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: Re: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
Date: Thu, 11 Jan 2024 13:48:38 +0500
Message-ID: <dd96e476-e1ad-4cb5-b5d1-556f720acd17@collabora.com>
In-Reply-To: <a20e7bdb-7344-306d-e8f5-5ee69af7d5ea@oracle.com>
On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>>> <usama.anjum@collabora.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to convert this test to TAP, as I think failures sometimes go
>>>>> unnoticed on CI systems if we depend only on the application's return
>>>>> value. I've enabled the following configurations, which aren't already
>>>>> present in tools/testing/selftests/mm/config:
>>>>> CONFIG_MEMORY_FAILURE=y
>>>>> CONFIG_HWPOISON_INJECT=m
>>>>>
>>>>> I'll send a patch to add these configs later. Right now I'm
>>>>> investigating the failure when the test tries to inject a poisoned page
>>>>> with madvise(MADV_HWPOISON). I get "Device or resource busy" every single
>>>>> time. The test fails because it doesn't expect the hugetlb memory to be
>>>>> busy. I'm not sure whether the poison-handling code has issues or the
>>>>> test isn't robust enough.
>>>>>
>>>>> ./hugetlb-read-hwpoison
>>>>> Write/read chunk size=0x800
>>>>> ... HugeTLB read regression test...
>>>>> ... ... expect to read 0x200000 bytes of data in total
>>>>> ... ... actually read 0x200000 bytes of data in total
>>>>> ... HugeTLB read regression test...TEST_PASSED
>>>>> ... HugeTLB read HWPOISON test...
>>>>> [ 9.280854] Injecting memory failure for pfn 0x102f01 at process virtual address 0x7f28ec101000
>>>>> [ 9.282029] Memory failure: 0x102f01: huge page still referenced by 511 users
>>>>> [ 9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>>> ... !!! MADV_HWPOISON failed: Device or resource busy
>>>>> ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>>
>>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>>
>>>> Thanks for reporting this, Usama!
>>>>
>>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>>> writeback disabling."
>>>>
>>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>>> MADV_HWPOISON injection works and the test passes:
>>>>
>>>> ... HugeTLB read HWPOISON test...
>>>> ... ... expect to read 0x101000 bytes of data in total
>>>> ... !!! read failed: Input/output error
>>>> ... ... actually read 0x101000 bytes of data in total
>>>> ... HugeTLB read HWPOISON test...TEST_PASSED
>>>> ... HugeTLB seek then read HWPOISON test...
>>>> ... ... init val=4 with offset=0x102000
>>>> ... ... expect to read 0xfe000 bytes of data in total
>>>> ... ... actually read 0xfe000 bytes of data in total
>>>> ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>> ...
>>>>
>>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process virtual address 0x7f75e3101000
>>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge page: Recovered
>>>> ...
>>>>
>>>> I think something in between broke MADV_HWPOISON on hugetlbfs. We
>>>> should be able to find it by bisection (and, of course, by reading the
>>>> commits in between; it is probably related to page refcounting).
>>> Thank you for this information.
>>>
>>>>
>>>> That being said, I will be on vacation from tomorrow until the end of
>>>> next week. So I will get back to this after next weekend. Meanwhile if
>>>> you want to go ahead and bisect the problematic commit, that will be
>>>> very much appreciated.
>>> I'll try to bisect and post here if I find something.
>> Found the culprit commit by bisection:
>>
>> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
>> mm/filemap: remove hugetlb special casing in filemap.c
>>
>> hugetlb-read-hwpoison started failing at this commit. I've added the
>> author of this patch to this bug report.
>>
> Hi Usama,
>
> Thanks for pointing this out. After debugging, the below diff seems to fix
> the issue and allows the tests to pass again. Could you also test it on
> your configuration to confirm?
>
> Thanks,
> Sidhartha
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 36132c9125f9..3a248e4f7e93 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
> } else {
> folio_unlock(folio);
>
> - if (!folio_test_has_hwpoisoned(folio))
> + if (!folio_test_hwpoison(folio))
> want = nr;
> else {
> /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d8c853b35dbb..87f6bf7d8bc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -973,7 +973,7 @@ struct page_state {
> static bool has_extra_refcount(struct page_state *ps, struct page *p,
> bool extra_pins)
> {
> - int count = page_count(p) - 1;
> + int count = page_count(p) - folio_nr_pages(page_folio(p));
>
> if (extra_pins)
> count -= 1;
>
I tested the patch; it fixes the test. Please send it as a formal patch.
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
--
BR,
Muhammad Usama Anjum
Thread overview: 19+ messages
2023-07-13 0:18 [PATCH v4 0/4] Improve hugetlbfs read on HWPOISON hugepages Jiaqi Yan
2023-07-13 0:18 ` [PATCH v4 1/4] mm/hwpoison: delete all entries before traversal in __folio_free_raw_hwp Jiaqi Yan
2023-07-13 0:18 ` [PATCH v4 2/4] mm/hwpoison: check if a raw page in a hugetlb folio is raw HWPOISON Jiaqi Yan
2023-07-13 0:18 ` [PATCH v4 3/4] hugetlbfs: improve read HWPOISON hugepage Jiaqi Yan
2023-07-13 0:18 ` [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read Jiaqi Yan
2024-01-05 6:27 ` Muhammad Usama Anjum
2024-01-05 21:13 ` Jiaqi Yan
2024-01-10 6:49 ` Muhammad Usama Anjum
2024-01-10 10:15 ` Muhammad Usama Anjum
2024-01-11 2:32 ` Sidhartha Kumar
2024-01-11 8:48 ` Muhammad Usama Anjum [this message]
2024-01-11 17:34 ` Jiaqi Yan
2024-01-11 17:51 ` Sidhartha Kumar
2024-01-11 18:03 ` Matthew Wilcox
2024-01-11 18:11 ` Sidhartha Kumar
2024-01-11 18:30 ` Jiaqi Yan
2024-01-11 18:36 ` Sidhartha Kumar
2024-01-12 6:16 ` Muhammad Usama Anjum
2024-01-19 10:10 ` Linux regression tracking #update (Thorsten Leemhuis)