From: Jiaqi Yan <jiaqiyan@google.com>
Date: Fri, 28 Apr 2023 00:41:37 +0000
Subject: [RFC PATCH v1 5/7] hugetlb: only VM_FAULT_HWPOISON_LARGE raw page
Message-ID: <20230428004139.2899856-6-jiaqiyan@google.com>
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
    jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
    shy828301@gmail.com, baolin.wang@linux.alibaba.com,
    wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Jiaqi Yan <jiaqiyan@google.com>
Memory raw pages can become HWPOISON between the time userspace maps a
hugepage and the time userspace faults it in. Today, when a hugetlb fault
lands anywhere in a hugepage that contains HWPOISON raw pages, the result
is VM_FAULT_HWPOISON_LARGE for the entire hugepage. This commit teaches
the hugetlb page fault handler to return VM_FAULT_HWPOISON_LARGE only if
the faulting address falls within a HWPOISON raw page; otherwise the
fault handler can continue to fault in the healthy raw pages.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
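For reviewers, below is a minimal userspace sketch (not part of this
patch) of the behavior the change aims for. It assumes 2MiB hugepages,
a reserved hugetlb pool, CAP_SYS_ADMIN (MADV_HWPOISON is a software
injection interface), and HGM enabled on the VMA; the offsets are
illustrative.

#include <stdio.h>
#include <sys/mman.h>

#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100	/* from <asm-generic/mman-common.h> */
#endif

#define HPAGE_SIZE (2UL << 20)	/* assumes a 2MiB hugepage size */
#define RAW_PAGE_SIZE 4096UL

int main(void)
{
	char *map = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (map == MAP_FAILED) {
		perror("mmap");	/* needs a reserved hugetlb pool */
		return 1;
	}

	/* Software-inject poison into one raw page of the hugepage. */
	if (madvise(map + HPAGE_SIZE / 2, RAW_PAGE_SIZE, MADV_HWPOISON)) {
		perror("madvise(MADV_HWPOISON)");
		return 1;
	}

	/*
	 * Without this patch, re-faulting anywhere in the hugepage raises
	 * SIGBUS (VM_FAULT_HWPOISON_LARGE). With it, this access to a
	 * healthy raw page is expected to succeed; only an access to
	 * map[HPAGE_SIZE / 2] would still SIGBUS.
	 */
	map[0] = 'x';
	printf("healthy raw page faulted in: %c\n", map[0]);
	return 0;
}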
 include/linux/mm.h  |   2 +
 mm/hugetlb.c        | 129 ++++++++++++++++++++++++++++++++++++++++++--
 mm/memory-failure.c |   1 +
 3 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc192f98cb1d..7caa4530953f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3531,6 +3531,7 @@ extern const struct attribute_group memory_failure_attr_group;
  * @nr_expected_unmaps: if a VMA that maps @page when detected is eligible
  *	for high granularity mapping, @page is expected to be unmapped.
  * @nr_actual_unmaps: how many times the raw page is actually unmapped.
+ * @index: index of the poisoned subpage in the folio.
  */
 struct raw_hwp_page {
 	struct llist_node node;
@@ -3538,6 +3539,7 @@ struct raw_hwp_page {
 	int nr_vmas_mapped;
 	int nr_expected_unmaps;
 	int nr_actual_unmaps;
+	unsigned long index;
 };
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1419176b7e51..f8ddf04ae0c4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6158,6 +6158,30 @@ static struct folio *hugetlb_try_find_lock_folio(struct address_space *mapping,
 	return folio;
 }
 
+static vm_fault_t hugetlb_no_page_hwpoison(struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   struct folio *folio,
+					   unsigned long address,
+					   struct hugetlb_pte *hpte,
+					   unsigned int flags);
+
+#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+static vm_fault_t hugetlb_no_page_hwpoison(struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   struct folio *folio,
+					   unsigned long address,
+					   struct hugetlb_pte *hpte,
+					   unsigned int flags)
+{
+	if (unlikely(folio_test_hwpoison(folio))) {
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(hstate_vma(vma)));
+	}
+
+	return 0;
+}
+#endif
+
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 				  struct vm_area_struct *vma,
 				  struct address_space *mapping, pgoff_t idx,
@@ -6287,13 +6311,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	/*
 	 * If memory error occurs between mmap() and fault, some process
 	 * don't have hwpoisoned swap entry for errored virtual address.
-	 * So we need to block hugepage fault by PG_hwpoison bit check.
+	 * So we need to block hugepage fault by hwpoison check:
+	 * - without HGM, the check is based on PG_hwpoison
+	 * - with HGM, check if the raw page for address is poisoned
 	 */
-	if (unlikely(folio_test_hwpoison(folio))) {
-		ret = VM_FAULT_HWPOISON_LARGE |
-		      VM_FAULT_SET_HINDEX(hstate_index(h));
+	ret = hugetlb_no_page_hwpoison(mm, vma, folio, address, hpte, flags);
+	if (unlikely(ret))
 		goto backout_unlocked;
-	}
 
 	/* Check for page in userfault range. */
 	if (userfaultfd_minor(vma)) {
@@ -8426,6 +8450,11 @@ int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * the allocated PTEs created before splitting fails.
 	 */
 
+	/*
+	 * For none and UFFD_WP marker PTEs, given that try_to_unmap_one
+	 * doesn't unmap them, delay the splitting until page fault happens.
+	 * See the hugetlb_no_page_hwpoison check in hugetlb_no_page.
+	 */
 	if (unlikely(huge_pte_none_mostly(old_entry))) {
 		ret = -EAGAIN;
 		goto skip;
@@ -8479,6 +8508,96 @@ int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
 	return ret;
 }
 
+/*
+ * Given a hugetlb PTE, if we want to split it into its next smaller level
+ * PTE, return what size we should use to do the HGM walk with allocations.
+ * If the given hugetlb PTE is already at the smallest PAGE_SIZE, return
+ * -EINVAL.
+ */
+static int hgm_next_size(struct vm_area_struct *vma, struct hugetlb_pte *hpte)
+{
+	struct hstate *h = hstate_vma(vma), *tmp_h;
+	unsigned int shift;
+	unsigned long curr_size = hugetlb_pte_size(hpte);
+	unsigned long next_size;
+
+	for_each_hgm_shift(h, tmp_h, shift) {
+		next_size = 1UL << shift;
+		if (next_size < curr_size)
+			return next_size;
+	}
+
+	return -EINVAL;
+}
+
+/*
+ * Check if address is in the range of a HWPOISON raw page.
+ * While checking, the hugetlb PTE may be split into smaller hugetlb PTEs.
+ */
+static vm_fault_t hugetlb_no_page_hwpoison(struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   struct folio *folio,
+					   unsigned long address,
+					   struct hugetlb_pte *hpte,
+					   unsigned int flags)
+{
+	unsigned long range_start, range_end;
+	unsigned long start_index, end_index;
+	unsigned long folio_start = vma_address(folio_page(folio, 0), vma);
+	struct llist_node *t, *tnode;
+	struct llist_head *raw_hwp_head = raw_hwp_list_head(folio);
+	struct raw_hwp_page *p = NULL;
+	bool contain_hwpoison = false;
+	int hgm_size;
+	int hgm_ret = 0;
+
+	if (likely(!folio_test_hwpoison(folio)))
+		return 0;
+
+	if (!hugetlb_enable_hgm_vma(vma))
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(hstate_vma(vma)));
+
+recheck:
+	range_start = address & hugetlb_pte_mask(hpte);
+	range_end = range_start + hugetlb_pte_size(hpte);
+	start_index = (range_start - folio_start) / PAGE_SIZE;
+	end_index = start_index + hugetlb_pte_size(hpte) / PAGE_SIZE;
+
+	contain_hwpoison = false;
+	llist_for_each_safe(tnode, t, raw_hwp_head->first) {
+		p = container_of(tnode, struct raw_hwp_page, node);
+		if (start_index <= p->index && p->index < end_index) {
+			contain_hwpoison = true;
+			break;
+		}
+	}
+
+	if (!contain_hwpoison)
+		return 0;
+
+	if (hugetlb_pte_size(hpte) == PAGE_SIZE)
+		return VM_FAULT_HWPOISON;
+
+	/*
+	 * hugetlb_fault already ensured hugetlb_vma_lock_read.
+	 * We also checked hugetlb_pte_size(hpte) != PAGE_SIZE,
+	 * so hgm_size must be something meaningful to HGM.
+	 */
+	hgm_size = hgm_next_size(vma, hpte);
+	VM_BUG_ON(hgm_size == -EINVAL);
+	hgm_ret = hugetlb_full_walk_alloc(hpte, vma, address, hgm_size);
+	if (hgm_ret) {
+		WARN_ON_ONCE(hgm_ret);
+		/*
+		 * When splitting using HGM fails, return as if
+		 * HGM is not eligible or enabled.
+		 */
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(hstate_vma(vma)));
+	}
+	goto recheck;
+}
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 47b935918ceb..9093ba53feed 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1957,6 +1957,7 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
 	raw_hwp->nr_vmas_mapped = 0;
 	raw_hwp->nr_expected_unmaps = 0;
 	raw_hwp->nr_actual_unmaps = 0;
+	raw_hwp->index = folio_page_idx(folio, page);
 	llist_add(&raw_hwp->node, head);
 	if (hgm_enabled)
 		/*
-- 
2.40.1.495.gc816e09b53d-goog
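As a side note for reviewers: the index check in hugetlb_no_page_hwpoison
boils down to simple interval arithmetic. The kernel-independent sketch
below mirrors it; window_contains_poison is a hypothetical stand-in for
the raw_hwp_page llist walk, and the addresses and indices are made up.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/*
 * Mirror of the hugetlb_no_page_hwpoison range check: translate the
 * PTE-sized window covering the faulting address into raw-page indices
 * within the folio, then test each recorded HWPOISON index against it.
 */
static bool window_contains_poison(unsigned long addr,
				   unsigned long folio_start,
				   unsigned long pte_size,
				   const unsigned long *poison_idx,
				   int nr_poison)
{
	/* Same as address & hugetlb_pte_mask(hpte) for power-of-two sizes. */
	unsigned long range_start = addr & ~(pte_size - 1);
	unsigned long start_index = (range_start - folio_start) / PAGE_SIZE;
	unsigned long end_index = start_index + pte_size / PAGE_SIZE;

	for (int i = 0; i < nr_poison; i++)
		if (start_index <= poison_idx[i] && poison_idx[i] < end_index)
			return true;
	return false;
}

int main(void)
{
	/* One poisoned raw page at index 256 of a 2MiB folio (512 pages). */
	unsigned long poison[] = { 256 };
	unsigned long folio_start = 0x40000000UL;

	/* A 2MiB PTE window still covers the poisoned index: prints 1. */
	printf("%d\n", window_contains_poison(folio_start, folio_start,
					      2UL << 20, poison, 1));
	/*
	 * After splitting down to 4KiB, the window around a fault at
	 * offset 0 no longer overlaps the poison: prints 0.
	 */
	printf("%d\n", window_contains_poison(folio_start, folio_start,
					      PAGE_SIZE, poison, 1));
	return 0;
}

This is why the function loops back to recheck after each split: every
split shrinks the window around the faulting address until it either
excludes all poisoned indices or reaches a single raw page.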