Subject: Re: [PATCH v2 1/3] mm: memory-failure: update ttu flag inside unmap_poisoned_folio
To: David Hildenbrand
CC: , , , , , , Wupeng Ma
References: <20250116061657.227027-1-mawupeng1@huawei.com>
 <20250116061657.227027-2-mawupeng1@huawei.com>
 <21674fcc-bd5d-3e32-6e45-f0a16ab93202@huawei.com>
 <34ccd133-7623-4cd8-aad7-08526a97c472@redhat.com>
 <80984553-a2b9-46b4-acdc-f7abba3c755f@redhat.com>
 <2fc40750-2ff7-4e73-ba52-c4d9caaa4f4f@redhat.com>
From: Miaohe Lin
Date: Wed, 22 Jan 2025 15:38:56 +0800
In-Reply-To: <2fc40750-2ff7-4e73-ba52-c4d9caaa4f4f@redhat.com>

On 2025/1/21 15:58, David Hildenbrand wrote:
> On 21.01.25 04:20, Miaohe Lin wrote:
>> On 2025/1/20 16:46, David Hildenbrand wrote:
>>> On 20.01.25 08:49, David Hildenbrand wrote:
>>>>
>>>>>>         if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
>>>>>>             struct address_space *mapping;
>>>>>> @@ -1572,7 +1598,7 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>>>             if (!mapping) {
>>>>>>                 pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
>>>>>>                     folio_pfn(folio));
>>>>>> -            return;
>>>>>> +            return -EBUSY;
>>>>>>             }
>>>>>>
>>>>>>             try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
>>>>>> @@ -1580,6 +1606,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
>>>>>>         } else {
>>>>>>             try_to_unmap(folio, ttu);
>>>>>>         }
>>>>>> +
>>>>>> +    return folio_mapped(folio) ? -EBUSY : 0;
>>>>>
>>>>> Do we really need this return value? It's unused in do_migrate_range().
>>>>
>>>> I suggested it, because the folio_mapped() is nowadays extremely cheap.
>>>> It cleans up hwpoison_user_mappings() quite nicely.
>>>
>>> I'm also wondering if, in do_migrate_range(), we want to
>>> pr_warn_ratelimit() in case the folio is still mapped after the call.
>>> IIUC, we don't really expect this to happen with SYNC set.
>>
>> Do you mean TTU_SYNC? It seems it's not set.
>
> With your patch it will be now, which is the right thing to do I think.
>
>>
>> There might be a race that will hit the proposed pr_warn_ratelimit():
>>
>> /* Assume folio is isolated for reclaim, so memory_failure failed to
>>    handle it at first time. Then it's put back to LRU. */
>> do_migrate_range
>>   folio_test_hwpoison
>>    folio_mapped
>>
>>     unmap_poisoned_folio
>>
>>      pr_warn_ratelimit(folio_mapped)
>>
>> But I might be missing something. And even if this race is possible,
>> it should be really hard to hit.
>
> Does try_to_unmap() care about isolation? Skimming over the code, I
> don't think so. I assume once we take the folio lock, races with
> reclaim are impossible.

I think you're right. I missed the folio lock in the above race.

>
> In any case, the race is unexpected, so pr_warn_() would be helpful and
> not harmful.
>
> Memory offlining code will later simply skip all PageHWPoison() pages,
> independent of the refcount as it seems. Failing to unmap might not be
> handled correctly at all ... I think this might be problematic in other
> regard (e.g., GUP references), but failing to unmap is "obviously" bad
> I think :)

Agree with you.

Thanks.
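
For reference, the pattern discussed above (unmap_poisoned_folio() returning -EBUSY when the folio is still mapped, and the caller warning in that case) could be sketched roughly as below. This is a userspace mock for illustration only, not the kernel code: struct folio, the mapcount and unmappable fields, the one-argument try_to_unmap() and the handle_poisoned_folio() caller are all simplified, hypothetical stand-ins, and the plain fprintf() takes the place of the rate-limited warning suggested in the thread.

```c
/*
 * Userspace mock, NOT kernel code. struct folio and every helper here are
 * simplified stand-ins so that the control flow can be compiled on its own.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct folio {
	unsigned long pfn;
	int mapcount;		/* mock: number of remaining mappings */
	bool unmappable;	/* mock: whether try_to_unmap() will succeed */
};

/* Mock of folio_mapped(): true while any mapping remains. */
static bool folio_mapped(const struct folio *folio)
{
	return folio->mapcount > 0;
}

/* Mock of try_to_unmap(): drops all mappings only if allowed to. */
static void try_to_unmap(struct folio *folio)
{
	if (folio->unmappable)
		folio->mapcount = 0;
}

/* Mirrors the quoted diff: report -EBUSY if the folio is still mapped. */
static int unmap_poisoned_folio(struct folio *folio)
{
	try_to_unmap(folio);
	return folio_mapped(folio) ? -EBUSY : 0;
}

/*
 * Hypothetical caller modelled on the do_migrate_range() idea: unmap a
 * mapped poisoned folio and warn if it is unexpectedly still mapped.
 */
static int handle_poisoned_folio(struct folio *folio)
{
	int ret = 0;

	if (folio_mapped(folio)) {
		ret = unmap_poisoned_folio(folio);
		if (ret)
			fprintf(stderr, "%#lx: folio still mapped after unmap\n",
				folio->pfn);
	}
	return ret;
}
```

Calling handle_poisoned_folio() on a mock folio with unmappable set leaves it unmapped and returns 0; with unmappable cleared it prints the warning and returns -EBUSY.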