From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A9A25EB64DD for ; Tue, 11 Jul 2023 17:06:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 11F466B0072; Tue, 11 Jul 2023 13:06:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A8586B0074; Tue, 11 Jul 2023 13:06:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E63F86B0075; Tue, 11 Jul 2023 13:06:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id D16B76B0072 for ; Tue, 11 Jul 2023 13:06:33 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id A133E80326 for ; Tue, 11 Jul 2023 17:06:33 +0000 (UTC) X-FDA: 80999959866.02.30E6842 Received: from mail-yw1-f174.google.com (mail-yw1-f174.google.com [209.85.128.174]) by imf10.hostedemail.com (Postfix) with ESMTP id 04309C00FD for ; Tue, 11 Jul 2023 17:05:36 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=ecuxrJuc; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf10.hostedemail.com: domain of jiaqiyan@google.com designates 209.85.128.174 as permitted sender) smtp.mailfrom=jiaqiyan@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1689095137; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=egt2ow1fWfYS8Gw95Yl0UW2LZk+PQZpFZ+iHkx63nEU=; b=yJODxXsZW7PSuTR50lp8IFwFmdszArftPBJe/6Peal/dXxb5C2Ad4oajCFvpkJ0yo3PTVf PAuq2X9dFZHYpzImxGwNZ0PFH/yXpDJCRHvRcRh6XlnoBKq0+9zbCAaNSfk23yHQQ1G2Xx KwcuxgWVXJcP2jNYBlyygzkt3k4EA+4= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=ecuxrJuc; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf10.hostedemail.com: domain of jiaqiyan@google.com designates 209.85.128.174 as permitted sender) smtp.mailfrom=jiaqiyan@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1689095137; a=rsa-sha256; cv=none; b=kj36eRg1HaOOV9fR0zzut6R3kY5vZO+2WSJJxE2feA0N1HUQdqHj8HPp3bqWKft1jQRW3p iYAgwAInuRvrjYsTq5qkpe3IqnD/gfJjFCelvCYuEckor8cUnu4J8yuGQbJSgRnaZOhHkV jOonKC9H03EMfR5JVwfACnVddEov7V8= Received: by mail-yw1-f174.google.com with SMTP id 00721157ae682-57d24970042so6317897b3.2 for ; Tue, 11 Jul 2023 10:05:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689095135; x=1691687135; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=egt2ow1fWfYS8Gw95Yl0UW2LZk+PQZpFZ+iHkx63nEU=; b=ecuxrJucUhDGppYS/qb0NC3IHv/38+Hvtpuhc9jQb3nZB3pq0gP7fi6JvuTEalDYTt rSIk5nkzXuyFzlPXFlekWycYD0Z+BzSgopxWgQo9E/6lG7VkxAIz6xUr8Kk2zc7wIkrp N1zxq9PI6ERSfBZFkSFm70vq50/bDRsuV6rmTHg0Nqdm4divdscl+t+FACP5CuAk/DOq irtWCuMXtFjCUX+TLeOVjiNawUzNd1RF41x0S7QtJmM+bUagol2k7o5bUrJDohv2INmp 5rHbDonugyo6XIyYRNeZx86NAJ7YurTV9jgFQ/YmGJB7RgPFimMNfSQFJ6ODczlwBiTJ xPjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689095135; x=1691687135; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=egt2ow1fWfYS8Gw95Yl0UW2LZk+PQZpFZ+iHkx63nEU=; b=ZcE9Uzyy7+Q+63BfyWk+Sfspw7PZy/u8rjuidqAM04bHzT6FjvPLAlgj6nxlWoi5Hh TVL1QIUq2CkjF4/nPCftqE15mUaa5/xrt29gefuNg1oIX3eXk/+idoLEV7hIrUgOpSvt stIeq0AkRXZ++iifilOc39ae+Pgcjefjp8aRFU10sCYLsfQ4rW2LR5KWfIF3bNm0lzIT FI21+9b45lADEVMSUKZwAvME7XYKeBc2flNnsSy1rIQQRDPY+KLwK1YSITOigBxPifL5 SbnqTIWZDgsx0YL4s+CMtbXvYDHIKFuxHZh6LRK4zL3uXcXY2IKpWBe9qWkgSEs18eca MYxQ== X-Gm-Message-State: ABy/qLZuY0WxmzcgNLNetsifrc0xh/F/jDag6FWWqiDsMkjz71dANNEB J+d6ujMxbbiqMyq+JwRpH8YtqSNDqBc/twPdrunGcQ== X-Google-Smtp-Source: APBJJlGwX8m5ob2cXU4K9uII5megnsCGiF69LmucGrIx/ZEsclmQh1ho0vxb69t4Dee/wTcKhM3uTHiF2TUVEIRDfoc= X-Received: by 2002:a81:730b:0:b0:56c:fd16:330f with SMTP id o11-20020a81730b000000b0056cfd16330fmr14511958ywc.12.1689095135312; Tue, 11 Jul 2023 10:05:35 -0700 (PDT) MIME-Version: 1.0 References: <20230707201904.953262-1-jiaqiyan@google.com> <20230707201904.953262-3-jiaqiyan@google.com> <6682284d-7ad3-9b59-687d-899f4d08d911@huawei.com> In-Reply-To: From: Jiaqi Yan Date: Tue, 11 Jul 2023 10:05:21 -0700 Message-ID: Subject: Re: [PATCH v3 2/4] mm/hwpoison: check if a subpage of a hugetlb folio is raw HWPOISON To: Miaohe Lin Cc: akpm@linux-foundation.org, mike.kravetz@oracle.com, naoya.horiguchi@nec.com, songmuchun@bytedance.com, shy828301@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 04309C00FD X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: dahy8zxxzkzjct4m7h1os5uctq5drm7k X-HE-Tag: 1689095136-868164 X-HE-Meta: U2FsdGVkX1/PKK/ruO5m7YomNtv/Qf281z+rM/YPRp0z4zEKOVU+Z6muXEmlbzmO4gjy2PsPvEwXPb8SDc5nrp7R70Fm4RGEdc+6X8wXJjF0sooMFgzzlI1icwHfjKuqhd/7d3PkDPMKWpewEOl6eL0igUrr+/isOOwWoa+2kkyVz7LHbimLF25pBIxMmX39cbodGk7kAJPR291vUntjUXXHCrqsHisVKAfCHovjm7cHS7z0Mdx9RLHYDVQKFb1fKHi2yHXp5t+JpX9INnNu1tzUumU9MFYfwvtjL8JON67c49VYau7qzNTqRQoOpHMi9V35TwlPyIq+dSTsw2v+W+LpnU8tTl+euttT5aYbvVvURsQbeWp9bDLU0VuH14fVQ5U4kRV9/b7LiG/mskFaOw4ts2lTDA1IND+GzD+gF1h9wlbsuKmLDanCu4gaZep1z8CDJgpPAtS+WQx9zdiJwkc3J4nNrYuEYucgBxqB+wVB+F5djEp6RbFXFjArHyJAH6pxmAC41ahxBoZvJi6+72mY5NVfbQjtp+CX0nudXkIfVwkPYr+DWPzMB10Hy7uxj098gNz1WX4hS4kMbjfTgbwHTZedhTcYuL88pg9/fVzxB+iyTH6ZfiE0Lr0TPkX4sdvALtliSBI31dXSgpOsgTTQeNkMWfWf3DQpB/BCrz5EOjr72hkqU7XdmuQAiemf6jYT+J8SToU+0uH6Fch/sGFK6dILjf7du6Pf434G+kXwfECosqXSuQi2Q8DZ1CyP1qVWW+fecqoJC0aNeoQzSsvbHwp3sS+7F9EmTP9LSOFN5eN0i6pTzPHsp5H/etsk0aDAAED++Q/pqeSozhIMZkPeBLPY3PQDVRhxg+3mLc6WVNZamF3pKiqs0BnG427m2oTLcuFBISrQL29oUV3fwNwVle7FvE6SLpXeFKyYPeGPRko1ZavChb0a/FG4MuOir5jrt7UaF0CLziaOJbT MPcMuGQY xIMim+9qWKENx0b2eMGLk0jaaZh1SG0xGZwpyoHlf9k6VF5b7kkD5lsAx8FUh43OSw2qVdqSzfqyxyPBB7n8WueDMNWBx9T2GRksC6nFzt1hBQNz8p7JYWzHElbmcnWzZIPmo9upDk9ycG4KIbcOxsCu1jtSVatPaQLrp2Alg/H0MC1N3eyVfhASLpVgIvsXBimMXruRHFtcDw2tzrZC2pBGKzWQMJ4GcM9tHoSFB9tZlKfH0sXuNxZghK9Fkj60QdiLlWRLp3yzZBjliRJdosyQdvJnI6L9s5ri0mqQZsCf9klVRXJ33IWdqPzxIdsJsFYFH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Jul 10, 2023 at 8:16=E2=80=AFAM Jiaqi Yan wro= te: > > On Fri, Jul 7, 2023 at 7:57=E2=80=AFPM Miaohe Lin = wrote: > > > > On 2023/7/8 4:19, Jiaqi Yan wrote: > > > Add the functionality, is_raw_hwp_subpage, to tell if a subpage of a > > > hugetlb folio is a raw HWPOISON page. This functionality relies on > > > RawHwpUnreliable to be not set; otherwise hugepage's raw HWPOISON lis= t > > > becomes meaningless. > > > > > > is_raw_hwp_subpage needs to hold hugetlb_lock in order to synchronize > > > with __get_huge_page_for_hwpoison, who iterates and inserts an entry = to > > > raw_hwp_list. llist itself doesn't ensure insertion is synchornized w= ith > > > the iterating used by __is_raw_hwp_list. Caller can minimize the > > > overhead of lock cycles by first checking if folio / head page's > > > HWPOISON flag is set. > > > > > > Exports this functionality to be immediately used in the read operati= on > > > for hugetlbfs. > > > > > > Reviewed-by: Mike Kravetz > > > Reviewed-by: Naoya Horiguchi > > > Signed-off-by: Jiaqi Yan > > > --- > > > include/linux/hugetlb.h | 19 +++++++++++++++++++ > > > include/linux/mm.h | 7 +++++++ > > > mm/hugetlb.c | 10 ++++++++++ > > > mm/memory-failure.c | 34 ++++++++++++++++++++++++---------- > > > 4 files changed, 60 insertions(+), 10 deletions(-) > > > ... > > > -static inline struct llist_head *raw_hwp_list_head(struct folio *fol= io) > > > +bool __is_raw_hwp_subpage(struct folio *folio, struct page *subpage) > > > { > > > - return (struct llist_head *)&folio->_hugetlb_hwpoison; > > > + struct llist_head *raw_hwp_head; > > > + struct raw_hwp_page *p, *tmp; > > > + bool ret =3D false; > > > + > > > + if (!folio_test_hwpoison(folio)) > > > + return false; > > > + > > > + /* > > > + * When RawHwpUnreliable is set, kernel lost track of which sub= pages > > > + * are HWPOISON. So return as if ALL subpages are HWPOISONed. > > > + */ > > > + if (folio_test_hugetlb_raw_hwp_unreliable(folio)) > > > + return true; > > > + > > > + raw_hwp_head =3D raw_hwp_list_head(folio); > > > + llist_for_each_entry_safe(p, tmp, raw_hwp_head->first, node) { > > > > Since we don't free the raw_hwp_list, does llist_for_each_entry works s= ame as llist_for_each_entry_safe? Sorry I missed this comment. Yes they are the same but llist_for_each_entry doesn't need "tmp". I will switch to llist_for_each_entry in v4. > > > > > > + if (subpage =3D=3D p->page) { > > > + ret =3D true; > > > + break; > > > + } > > > + } > > > + > > > + return ret; > > > } > > > > It seems there's a race between __is_raw_hwp_subpage and unpoison_memor= y: > > unpoison_memory __is_raw_hwp_subpage > > if (!folio_test_hwpoison(folio)) -- h= wpoison is set > > folio_free_raw_hwp llist_for_each_entry_safe raw_hwp_lis= t > > llist_del_all .. > > folio_test_clear_hwpoison > > > > Thanks Miaohe for raising this concern. > > > But __is_raw_hwp_subpage is used in hugetlbfs, unpoison_memory couldn't= reach here because there's a > > folio_mapping =3D=3D NULL check before folio_free_raw_hwp. > > I agree. But in near future I do want to make __is_raw_hwp_subpage > work for shared-mapping hugetlb, so it would be nice to work with > unpoison_memory. It doesn't seem to me that holding mf_mutex in > __is_raw_hwp_subpage is nice or even absolutely correct. Let me think > if I can come up with something in v4. At my 2nd thought, if __is_raw_hwp_subpage simply takes mf_mutex before llist_for_each_entry, it will introduce a deadlock: unpoison_memory __is_raw_hwp_subpage held mf_mutex held hugetlb_lock get_hwpoison_hugetlb_folio attempts mf_mutex attempts hugetlb lock Not for this patch series, but for future, is it a good idea to make mf_mutex available to hugetlb code? Then enforce the order of locking to be mf_mutex first, hugetlb_lock second? I believe this is the current locking pattern / order for try_memory_failure_hugetlb. > > > > > Anyway, this patch looks good to me. > > > > Reviewed-by: Miaohe Lin > > Thanks. > >