Date: Mon, 19 Jun 2023 17:23:30 +0900
From: Naoya Horiguchi
To: Mike Kravetz
Cc: Jiaqi Yan, HORIGUCHI NAOYA(堀口 直也), songmuchun@bytedance.com,
	shy828301@gmail.com, linmiaohe@huawei.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, duenwen@google.com,
	axelrasmussen@google.com, jthoughton@google.com
Subject: Re: [PATCH v1 1/3] mm/hwpoison: find subpage in hugetlb HWPOISON list
Message-ID: <20230619082330.GA1612447@ik1-406-35019.vs.sakura.ne.jp>
References: <20230522044557.GA845371@hori.linux.bs1.fc.nec.co.jp>
	<20230523024305.GA920098@hori.linux.bs1.fc.nec.co.jp>
	<20230612041901.GA3083591@ik1-406-35019.vs.sakura.ne.jp>
	<20230616233447.GB7371@monkey>
	<20230617225927.GA3540@monkey>
In-Reply-To: <20230617225927.GA3540@monkey>

On Sat, Jun 17, 2023 at 03:59:27PM -0700, Mike Kravetz wrote:
> On 06/16/23 19:18, Jiaqi Yan wrote:
> > On Fri, Jun 16, 2023 at 4:35 PM Mike Kravetz wrote:
> > > On 06/16/23 14:19, Jiaqi Yan wrote:
> > > >
> > > > Now looking at this again, I think concurrent adding and deleting
> > > > are fine with each other and with themselves, because raw_hwp_list
> > > > is a lock-less llist.
> > >
> > > Correct.
> > >
> > > > As for synchronizing traversal with adding and deleting, I wonder
> > > > if it is a good idea to make __update_and_free_hugetlb_folio hold
> > > > hugetlb_lock before it calls folio_clear_hugetlb_hwpoison (which
> > > > traverses and deletes entries on raw_hwp_list)? In hugetlb,
> > > > get_huge_page_for_hwpoison already takes hugetlb_lock; it seems
> > > > to me that __update_and_free_hugetlb_folio is missing the lock.
> > >
> > > I do not think the lock is needed. However, while looking more
> > > closely at this I think I discovered another issue. This is VERY
> > > subtle. Perhaps Naoya can help verify if my reasoning below is
> > > correct.
> > >
> > > In __update_and_free_hugetlb_folio we are not operating on a
> > > hugetlb page. Why is this?
> > > Before calling update_and_free_hugetlb_folio we call
> > > remove_hugetlb_folio. The purpose of remove_hugetlb_folio is to
> > > remove the huge page from the list AND to change the compound page
> > > destructor that indicates this is a hugetlb page. This is all done
> > > while holding the hugetlb lock. So, the test
> > > folio_test_hugetlb(folio) is false afterwards.
> > >
> > > Technically, we have a compound non-hugetlb page with a non-null
> > > raw_hwp_list.
> > >
> > > Important note: at this time we have not reallocated vmemmap pages
> > > if the hugetlb page was vmemmap optimized. That is done later in
> > > __update_and_free_hugetlb_folio.
> > >
> > > The 'good news' is that after this point get_huge_page_for_hwpoison
> > > will not recognize this as a hugetlb page, so nothing will be added
> > > to the list. There is no need to worry about entries being added to
> > > the list during traversal.
> > >
> > > The 'bad news' is that if we get a memory error at this time we
> > > will treat it as a memory error on a regular compound page. So,
> > > TestSetPageHWPoison(p) in memory_failure() may try to write a
> > > read-only struct page. :(
> >
> > At least I think this is an issue.
> >
> > Would it help if dissolve_free_huge_page doesn't unlock hugetlb_lock
> > until update_and_free_hugetlb_folio is done, or basically until
> > dissolve_free_huge_page is done?
>
> Unfortunately, update_and_free_hugetlb_folio is designed to be called
> without locks held. This is because we cannot hold any locks while
> allocating vmemmap pages.
>
> I'll try to think of some way to restructure the code. IIUC, this is a
> potential general issue, not just isolated to memory error handling.

Treating this issue as one specific to memory error handling, checking
HPG_vmemmap_optimized in __get_huge_page_for_hwpoison() might be helpful
to detect the race. Then, an idea like the diff below (not tested) can
make try_memory_failure_hugetlb() retry (retaking hugetlb_lock) to wait
for the allocation of vmemmap pages to complete.

@@ -1938,8 +1938,11 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	int ret = 2;	/* fallback to normal page handling */
 	bool count_increased = false;
 
-	if (!folio_test_hugetlb(folio))
+	if (!folio_test_hugetlb(folio)) {
+		if (folio_test_hugetlb_vmemmap_optimized(folio))
+			ret = -EBUSY;
 		goto out;
+	}
 
 	if (flags & MF_COUNT_INCREASED) {
 		ret = 1;

Thanks,
Naoya Horiguchi

> --
> Mike Kravetz
>
> > TestSetPageHWPoison in memory_failure is called after
> > try_memory_failure_hugetlb, and folio_test_hugetlb is tested within
> > __get_huge_page_for_hwpoison, which is wrapped by the hugetlb_lock.
> > So by the time dissolve_free_huge_page returns, subpages have already
> > gone through hugetlb_vmemmap_restore and
> > __destroy_compound_gigantic_folio and become non-compound raw pages
> > (folios). Now folio_test_hugetlb(p)=false will be correct for
> > memory_failure, and it can recover p as a dissolved non-hugetlb page.
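
For the raw_hwp_list point in the thread above, here is a minimal sketch
of why lock-less llist adds need no lock and why traversal is done on a
detached list. The struct mirrors the kernel's raw_hwp_page in
mm/memory-failure.c, but the two helpers (record_raw_hwp, free_raw_hwp)
are hypothetical names for illustration, not the kernel's functions.

#include <linux/llist.h>
#include <linux/slab.h>

/* Mirrors the kernel's raw_hwp_page (mm/memory-failure.c). */
struct raw_hwp_page {
	struct llist_node node;
	struct page *page;	/* the poisoned subpage */
};

/* Producer side: lock-free; llist_add() is a single cmpxchg on the head. */
static int record_raw_hwp(struct llist_head *head, struct page *p)
{
	struct raw_hwp_page *raw = kmalloc(sizeof(*raw), GFP_ATOMIC);

	if (!raw)
		return -ENOMEM;
	raw->page = p;
	llist_add(&raw->node, head);
	return 0;
}

/*
 * Consumer side: llist_del_all() atomically takes ownership of the whole
 * list, so the walk happens on a private copy and cannot race with
 * concurrent llist_add() callers.
 */
static void free_raw_hwp(struct llist_head *head)
{
	struct llist_node *t, *tnode;

	llist_for_each_safe(tnode, t, llist_del_all(head))
		kfree(container_of(tnode, struct raw_hwp_page, node));
}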
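
The window Mike describes can be condensed into a timeline (a paraphrase
of the thread, simplified to two CPUs and a vmemmap-optimized page being
dissolved):

  CPU 0 (dissolve path)                   CPU 1 (memory error)
  ---------------------                   --------------------
  spin_lock_irq(&hugetlb_lock)
  remove_hugetlb_folio(folio)
    folio_test_hugetlb() becomes false
  spin_unlock_irq(&hugetlb_lock)
                                          memory_failure() sees
                                          folio_test_hugetlb() == false,
                                          handles p as a regular compound
                                          page; TestSetPageHWPoison(p)
                                          writes a struct page whose
                                          vmemmap may still be read-only
  __update_and_free_hugetlb_folio(folio)
    vmemmap pages reallocated here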
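
On the caller side of Naoya's diff, try_memory_failure_hugetlb() already
has an -EBUSY retry path guarded by MF_NO_RETRY, which is what lets the
new return value wait out the window. A condensed sketch of that shape
(paraphrased, not verbatim kernel code; error handling omitted):

/* Condensed from the shape of try_memory_failure_hugetlb(). */
static int hwpoison_hugetlb_retry_sketch(unsigned long pfn, int flags)
{
	bool migratable_cleared = false;
	int res;

retry:
	res = get_huge_page_for_hwpoison(pfn, flags, &migratable_cleared);
	if (res == -EBUSY && !(flags & MF_NO_RETRY)) {
		/*
		 * The folio lost its hugetlb destructor but its vmemmap is
		 * not reallocated yet: retry once, which retakes
		 * hugetlb_lock inside get_huge_page_for_hwpoison().
		 */
		flags |= MF_NO_RETRY;
		goto retry;
	}
	return res;
}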