Date: Tue, 23 Jun 2020 15:04:12 -0700
From: "Luck, Tony"
To: Matthew Wilcox
Cc: Borislav Petkov, Naoya Horiguchi, linux-edac@vger.kernel.org, linux-mm@kvack.org, linux-nvdimm@lists.01.org, "Darrick J. Wong", Jane Chu
Subject: Re: [RFC] Make the memory failure blast radius more precise
Message-ID: <20200623220412.GA21232@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20200623201745.GG21350@casper.infradead.org>

On Tue, Jun 23, 2020 at 09:17:45PM +0100, Matthew Wilcox wrote:
> Hardware actually tells us the blast radius of the error, but we ignore
> it and take out the entire page. We've had a customer request to know
> exactly how much of the page is damaged so they can avoid reconstructing
> an entire 2MB page if only a single cacheline is damaged.
>
> This is only a strawman that I did in an hour or two; I'd appreciate
> architectural-level feedback. Should I just convert memory_failure() to
> always take an address & granularity? Should I create a struct to pass
> around (page, phys, granularity) instead of reconstructing the missing
> pieces in half a dozen functions? Is this functionality welcome at all,
> or is the risk of upsetting applications which expect at least a page
> of granularity too high?

What is the interface to these applications that want finer granularity?

Current code does very poorly with hugetlbfs pages ... the user loses the
whole 2MB or 1GB. That's just silly (though I've been told that it is hard
to fix, because allowing a hugetlbfs page to be broken up at an arbitrary
time as the result of a machine check means that the kernel needs locking
around a bunch of fast paths that currently assume that a huge page will
stay a huge page).

For sub-4K page usage, there are different problems. We can't leave the
original page with the poisoned cacheline mapped to the user, as they may
just access the poisoned data and trigger another machine check. But if we
map in a different page with all the good bits copied over, the user needs
to be aware which parts of the page no longer hold their data.

-Tony
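[The (page, phys, granularity) struct that the RFC floats could be sketched roughly as below. This is a hypothetical userspace illustration, not the actual kernel API: the names `mf_extent`, `mf_offset_in_page` and `mf_length` are invented, and the log2-granularity field mirrors how x86 MCA hardware reports the least significant valid bit of the recorded address.]

```c
#include <stdint.h>

/*
 * Hypothetical sketch only -- one way to carry the error extent around
 * instead of reconstructing (page, phys, granularity) in each function.
 */
struct mf_extent {
	uint64_t phys;      /* physical address reported by hardware */
	unsigned int lsb;   /* log2 of blast radius, e.g. 6 = one 64B cacheline */
};

/* Offset of the damaged region within its 4K page. */
static inline uint64_t mf_offset_in_page(const struct mf_extent *e)
{
	return (e->phys & 0xfffULL) & ~((1ULL << e->lsb) - 1);
}

/* Number of bytes taken out, rather than assuming the whole page. */
static inline uint64_t mf_length(const struct mf_extent *e)
{
	return 1ULL << e->lsb;
}
```

[With lsb = 6 a single cacheline is lost; lsb = 12 degenerates to today's whole-page behaviour, so existing callers could keep working unchanged.]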
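[The sub-4K remapping idea in the last paragraph — map in a replacement page with the good bits copied, and tell the user which parts are gone — could be sketched as follows. This is an illustrative userspace sketch, not kernel code: `copy_good_lines` and the bitmap format are invented, the plain `memcpy` stands in for a machine-check-safe copy (the kernel of this era used `memcpy_mcsafe()` for that), and zero-filling lost lines is just one possible policy.]

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define CL_SIZE     64u
#define CL_PER_PAGE (PAGE_SIZE / CL_SIZE)   /* 64 lines -> fits one u64 mask */

/*
 * Copy the salvageable cachelines of a poisoned page into a replacement
 * page, zero-filling the lost lines, and return a bitmap of which lines
 * were lost -- the piece of information the application would need.
 */
static uint64_t copy_good_lines(void *dst, const void *src,
				uint64_t poison_mask)
{
	for (unsigned int i = 0; i < CL_PER_PAGE; i++) {
		if (poison_mask & (1ULL << i))
			memset((char *)dst + i * CL_SIZE, 0, CL_SIZE);
		else
			memcpy((char *)dst + i * CL_SIZE,
			       (const char *)src + i * CL_SIZE, CL_SIZE);
	}
	return poison_mask;
}
```

[The open question the email raises is exactly how that returned mask reaches the application — via a signal payload, an fs/DAX callback, or something else entirely.]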