Date: Tue, 23 Jun 2020 15:26:58 -0700
From: "Luck, Tony"
To: Matthew Wilcox
Cc: Borislav Petkov, Naoya Horiguchi, linux-edac@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, "Darrick J. Wong", Jane Chu
Subject: Re: [RFC] Make the memory failure blast radius more precise
Message-ID: <20200623222658.GA21817@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20200623221741.GH21350@casper.infradead.org>

On Tue, Jun 23, 2020 at 11:17:41PM +0100, Matthew Wilcox wrote:
> It might also be nice to have an madvise() MADV_ZERO option so the
> application doesn't have to look up the fd associated with that memory
> range, but we haven't floated that idea with the customer yet; I just
> thought of it now.

So the conversation between the application and the kernel goes like this?

1) Machine check.
2) Kernel unmaps the 4K page surrounding the poison and sends SIGBUS to
   the application to say that one cache line is gone.
3) App says madvise(MADV_ZERO, that cache line).
4) Kernel says... "oh, you know how to deal with this" and allocates a
   new page, copying the 63 good cache lines from the old page and
   zeroing the missing one. The new page is mapped to the user.

Do you have folks lined up to use that? I don't know that many folks
are even catching the SIGBUS :-(

-Tony
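Step 4 of the sequence above can be sketched as a user-space simulation of the page rebuild. This is a minimal sketch, assuming 4 KiB pages made of 64-byte cache lines (so 64 lines per page, 63 of them good); `rebuild_page()` and the `SKETCH_*` constants are hypothetical names for illustration, not kernel code, and MADV_ZERO was only a proposal in this thread, not an existing madvise() flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed geometry: 4 KiB pages made of 64-byte cache lines. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_LINE_SIZE 64u
#define SKETCH_LINES     (SKETCH_PAGE_SIZE / SKETCH_LINE_SIZE) /* 64 lines */

/*
 * Hypothetical helper (not a kernel API): fill 'new_page' from 'old_page',
 * copying every cache line except the one containing 'bad_offset', which
 * is zero-filled instead of being read. Returns the index of that line.
 */
size_t rebuild_page(unsigned char *new_page, const unsigned char *old_page,
                    size_t bad_offset)
{
	size_t bad_line = bad_offset / SKETCH_LINE_SIZE;

	assert(bad_offset < SKETCH_PAGE_SIZE);
	for (size_t i = 0; i < SKETCH_LINES; i++) {
		unsigned char *dst = new_page + i * SKETCH_LINE_SIZE;

		if (i == bad_line)
			/* Never read the poisoned line; hand back zeroes. */
			memset(dst, 0, SKETCH_LINE_SIZE);
		else
			/* Good lines are carried over unchanged. */
			memcpy(dst, old_page + i * SKETCH_LINE_SIZE,
			       SKETCH_LINE_SIZE);
	}
	return bad_line;
}
```

For example, poison reported at byte offset 130 falls in line 2 (130 / 64), so only bytes 128..191 of the new page come back as zeroes. A real in-kernel version would also have to copy the remaining lines with a machine-check-safe routine rather than plain memcpy(), in case more than one line is bad; the sketch ignores that.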