Date: Thu, 16 May 2024 14:47:01 +0200
From: Oscar Salvador
To: Jane Chu
Cc: linmiaohe@huawei.com, nao.horiguchi@gmail.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail
References: <20240510062602.901510-1-jane.chu@oracle.com> <20240510062602.901510-6-jane.chu@oracle.com>
In-Reply-To: <20240510062602.901510-6-jane.chu@oracle.com>

On Fri, May 10, 2024 at 12:26:02AM -0600, Jane Chu wrote:
> When handle hwpoison in a RDMA longterm pinned thp page,
> try_to_split_thp_page() will fail. And at this point, there is
> little else the kernel could do except sending a SIGBUS to
> the user process, thus give it a chance to recover.

Well, it does not need to be an RDMA longterm pin, right?
Anything holding an extra refcount can already make us bite the dust, so I
would not make it that specific.

> Signed-off-by: Jane Chu
> ---
>  mm/memory-failure.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2fa884d8b5a3..15bb1c0c42e8 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  	return page_action(ps, p, pfn);
>  }
>
> -static int try_to_split_thp_page(struct page *page)
> +static int try_to_split_thp_page(struct page *page, bool release)
>  {
>  	int ret;
>
> @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page)
>  	ret = split_huge_page(page);
>  	unlock_page(page);
>
> -	if (unlikely(ret))
> +	if (ret && release)
>  		put_page(page);

I would document when we can and when we cannot release the page.
E.g.: we cannot release it if there are still processes mapping the thp.

> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
> +				struct folio *folio)
> +{
> +	LIST_HEAD(tokill);
> +
> +	collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> +	kill_procs(&tokill, true, pfn, flags);
> +
> +	return -EHWPOISON;

You are returning -EHWPOISON here,

> +}
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags)
>  		 * page is a valid handlable page.
>  		 */
>  		folio_set_has_hwpoisoned(folio);
> -		if (try_to_split_thp_page(p) < 0) {
> -			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> +		if (try_to_split_thp_page(p, false) < 0) {
> +			pr_err("%#lx: thp split failed\n", pfn);
> +			res = kill_procs_now(p, pfn, flags, folio);
> +			put_page(p);
> +			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);

just to overwrite it here with action_result(). Which one do we need?
I think we would need -EBUSY here, right? So I would drop the retcode from
kill_procs_now.

Also, do we want the extra pr_err() here?
action_result() will already provide us the pfn and the
action_page_types, which will be "unsplit thp". Isn't that clear enough?

I would drop that.

-- 
Oscar Salvador
SUSE Labs
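
For illustration only, the two suggestions above (drop the return code from
kill_procs_now(), drop the extra pr_err()) could look roughly like the
userspace sketch below. Everything in it is a stand-in and not the kernel
code: the enum, the simplified signatures, the procs_killed counter and
handle_split_failure() are hypothetical, and the action_result() stub
assumes that any result other than MF_RECOVERED or MF_DELAYED maps to
-EBUSY. The page reference handling (put_page()) is left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Userspace stand-ins, NOT the kernel definitions. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Hypothetical counter so the sketch can show the kill path was taken. */
static int procs_killed;

/* Hypothetical void variant: no return code, so nothing gets overwritten. */
static void kill_procs_now(unsigned long pfn)
{
	(void)pfn;	/* the real helper would collect and SIGBUS the mappers */
	procs_killed++;
}

/*
 * Stub of action_result(): logs the pfn and the page type once, and is
 * assumed here to return -EBUSY unless the result is RECOVERED or DELAYED.
 */
static int action_result(unsigned long pfn, const char *type,
			 enum mf_result result)
{
	assert(type != NULL);
	printf("%#lx: recovery action for %s: result %d\n", pfn, type, result);
	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}

/* Shape of the split-failure branch with both suggestions applied. */
static int handle_split_failure(unsigned long pfn)
{
	kill_procs_now(pfn);	/* no retcode to clobber */
	return action_result(pfn, "unsplit thp", MF_FAILED);
}
```

With this shape the only return value comes from action_result(), and the
pfn plus the "unsplit thp" type are reported once.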