Subject: Re: [PATCH -next] mm: hwpoison: support recovery from HugePage copy-on-write faults
To: Mike Kravetz, Andrew Morton
CC: Naoya Horiguchi, Tony Luck, Miaohe Lin, Muchun Song
From: Liu Shixin
Date: Thu, 13 Apr 2023 09:51:44 +0800
Message-ID: <6a5f3acb-bbc5-9e36-e194-84ec15b059b5@huawei.com>
In-Reply-To: <20230412222138.GB4759@monkey>
References: <20230411092741.780679-1-liushixin2@huawei.com> <20230412181350.GA22818@monkey> <20230412145718.0bcb7dd98112a3010711ad0b@linux-foundation.org> <20230412222138.GB4759@monkey>

On 2023/4/13 6:21, Mike Kravetz wrote:
> On 04/12/23 14:57, Andrew Morton wrote:
>> On Wed, 12 Apr 2023 11:13:50 -0700 Mike Kravetz wrote:
>>
>>> On 04/11/23 17:27, Liu Shixin wrote:
>>>> Patch a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults")
>>>> introduced a new copy_user_highpage_mc() function, and fix the kernel crash
>>>> when the kernel is copying a normal page as the result of a copy-on-write
>>>> fault and runs into an uncorrectable error.
>>>> But it doesn't work for HugeTLB.
>>> Andrew asked about user-visible effects. Perhaps, a better way of
>>> stating this in the commit message might be:
>>>
>>> Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write
>>> faults") introduced the routine copy_user_highpage_mc() to gracefully
>>> handle copying of user pages with uncorrectable errors. Previously,
>>> such copies would result in a kernel crash. hugetlb has separate code
>>> paths for copy-on-write and does not benefit from the changes made in
>>> commit a873dfe1032a.
> I was just going to suggest adding the line,
>
> Hence, copy-on-write of hugetlb user pages with uncorrectable errors
> will result in a kernel crash as was the case with 'normal' pages before
> commit a873dfe1032a.
>
> However, I'm guessing it might be more clear if we start with the
> runtime effects. Something like:
>
> copy-on-write of hugetlb user pages with uncorrectable errors will result
> in a kernel crash. This is because the copy is performed in kernel mode
> and in general we can not handle accessing memory with such errors while
> in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from
> copy-on write faults") introduced the routine copy_user_highpage_mc() to
> gracefully handle copying of user pages with uncorrectable errors. However,
> the separate hugetlb copy-on-write code paths were not modified as part
> of commit a873dfe1032a.

Thanks for your advice, I will add this explanation.

>
>>> Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage()
>>> so that they can also gracefully handle uncorrectable errors in user
>>> pages. This involves changing the hugetlb specific routine
>>> copy_user_folio() from type void to int so that it can return an error.
>>> Modify the hugetlb userfaultfd code in the same way so that it can return
>>> -EHWPOISON if it encounters an uncorrectable error.
>> Thanks, but... what are the runtime effects?
>> What does hugetlb
>> presently do when encountering these uncorrectable error?
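
For illustration only, the void-to-int change described in the quoted
text can be modelled in user space roughly as below. This is a sketch
under stated assumptions, not the kernel code: struct toy_page,
toy_copy_mc_page() and toy_copy_huge_page() are hypothetical names, the
"poisoned" flag merely stands in for an uncorrectable memory error, and
NR_SUBPAGES is a toy value. The point is only that once the per-page
copy can fail, the huge-page copy loop has to return an int so the
caller can see -EHWPOISON instead of the failure being fatal.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value; fallback for platforms lacking it */
#endif

#define SUBPAGE_SIZE 4096
#define NR_SUBPAGES 8 /* toy value chosen for the sketch */

/* Hypothetical page model: payload plus a flag that stands in for an
 * uncorrectable memory error in that page. */
struct toy_page {
	unsigned char data[SUBPAGE_SIZE];
	bool poisoned;
};

/* Stand-in for a machine-check-safe copy of one page.
 * Returns 0 on success, nonzero if the source could not be read. */
static int toy_copy_mc_page(struct toy_page *dst, const struct toy_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst->data, src->data, SUBPAGE_SIZE);
	return 0;
}

/* Copy every subpage of a "huge page". With a void return type there
 * would be no way to report a failed subpage copy; returning int lets
 * the caller receive -EHWPOISON and back out of the fault. */
static int toy_copy_huge_page(struct toy_page *dst, const struct toy_page *src)
{
	for (size_t i = 0; i < NR_SUBPAGES; i++) {
		if (toy_copy_mc_page(&dst[i], &src[i]))
			return -EHWPOISON;
	}
	return 0;
}
```

A caller in this model would check the return value and abandon the
copy-on-write on a nonzero result rather than continuing with a
partially copied destination.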