From: Jiaqi Yan <jiaqiyan@google.com>
Date: Thu, 18 May 2023 09:02:31 -0700
Subject: Re: [PATCH v1 0/3] Improve hugetlbfs read on HWPOISON hugepages
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: songmuchun@bytedance.com, naoya.horiguchi@nec.com, shy828301@gmail.com,
    linmiaohe@huawei.com, akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, duenwen@google.com,
    axelrasmussen@google.com, jthoughton@google.com
References: <20230517160948.811355-1-jiaqiyan@google.com> <20230517233020.GA10757@monkey>
In-Reply-To: <20230517233020.GA10757@monkey>

On Wed, May 17, 2023 at 4:30 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:

> On 05/17/23 16:09, Jiaqi Yan wrote:
> > Today when hardware memory is corrupted in a hugetlb hugepage,
> > kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> > read will be subject to silent data corruption. This is implemented by
> > returning -EIO from hugetlbfs_read_iter immediately if the hugepage has
> > HWPOISON flag set.
> >
> > Since memory_failure already tracks the raw HWPOISON subpages in a
> > hugepage, a natural improvement is possible: if userspace only asks for
> > healthy subpages in the pagecache, kernel can return these data.
>
> Thanks for putting this together.
>
> I recall discussing this some time back, and deciding to wait and see
> how HGM would progress.  Since it may be some time before HGM goes
> upstream, it would be reasonable to consider this again.

This improvement actually does NOT depend on HGM at all. Nothing
page-table related is involved here. The other RFC [2] I sent earlier
DOES require HGM. This improvement was brought up by James when we were
working on [2]. In the "Future Work" section of the cover letter, I
thought HGM was needed, but I soon found I was wrong.

> One quick question.
> Do you have an actual use case for this?  It certainly is an improvement
> over existing functionality.  However, I am not aware of too many (?any?)
> users actually doing read() calls on hugetlb files.

I don't have a use case. I searched GitHub for around half an hour, and
all the hugetlb usages I found are done via mmap.

> --
> Mike Kravetz
>
> > This patchset implements this improvement. It consists of three parts.
> > The 1st commit exports the functionality to tell if a subpage inside a
> > hugetlb hugepage is a raw HWPOISON page. The 2nd commit teaches
> > hugetlbfs_read_iter to return as many healthy bytes as possible.
> > The 3rd commit properly tests this new feature.
> >
> > [1] commit 8625147cafaa ("hugetlbfs: don't delete error page from
> >     pagecache")
> >
> > Jiaqi Yan (3):
> >   mm/hwpoison: find subpage in hugetlb HWPOISON list
> >   hugetlbfs: improve read HWPOISON hugepage
> >   selftests/mm: add tests for HWPOISON hugetlbfs read
> >
> >  fs/hugetlbfs/inode.c                          |  62 +++-
> >  include/linux/mm.h                            |  23 ++
> >  mm/memory-failure.c                           |  26 +-
> >  tools/testing/selftests/mm/.gitignore         |   1 +
> >  tools/testing/selftests/mm/Makefile           |   1 +
> >  .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
> >  6 files changed, 419 insertions(+), 16 deletions(-)
> >  create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> >
> > --
> > 2.40.1.606.ga4b1b128d6-goog

[2] https://lore.kernel.org/linux-mm/20230428004139.2899856-6-jiaqiyan@google.com/T/#m97c6edef8ad0cc9b064e1fd9369b8521dcfa43de
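
To make "return as many healthy bytes as possible" concrete, here is a rough userspace sketch in C of how a read range could be clipped at the first raw HWPOISON subpage. This is not the code from the patches. The names healthy_prefix_len() and subpage_is_poisoned(), and the plain array of poisoned subpage indices, are hypothetical and only stand in for whatever memory_failure actually tracks. The assumed semantics are: bytes before the first poisoned subpage are returned, and a read that starts on a poisoned subpage gets nothing (where the kernel would presumably report -EIO).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is subpage index `idx` in the poisoned list? */
static bool subpage_is_poisoned(const size_t *poisoned, size_t n, size_t idx)
{
	for (size_t i = 0; i < n; i++)
		if (poisoned[i] == idx)
			return true;
	return false;
}

/*
 * Hypothetical helper: how many bytes of [offset, offset + len) within one
 * hugepage can be returned before the first poisoned subpage is reached.
 * A result of 0 means the range starts on a poisoned subpage; under the
 * assumed semantics the caller would fail that read with -EIO.
 */
static size_t healthy_prefix_len(const size_t *poisoned, size_t n,
				 size_t subpage_size, size_t offset,
				 size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pos = offset + done;
		size_t room;

		if (subpage_is_poisoned(poisoned, n, pos / subpage_size))
			break;

		/* Bytes left in this subpage, capped by what is still wanted. */
		room = subpage_size - (pos % subpage_size);
		if (room > len - done)
			room = len - done;
		done += room;
	}

	return done;
}
```

For example, with 4 KiB subpages and only subpage 2 poisoned, a 16 KiB read from offset 0 would be clipped to 8 KiB in this sketch, while a read starting at offset 8192 would get 0 bytes. A linear scan is used only to keep the sketch short.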