Date: Tue, 26 Apr 2022 08:40:44 -0700
Subject: Re: [RFC] Expose a memory poison detector ioctl to user space.
From: Dave Hansen
To: Jue Wang, Naoya Horiguchi, Tony Luck, Dave Hansen
Cc: Jiaqi Yan, Greg Thelen, Mina Almasry, linux-mm@kvack.org
References: <20220425163451.3818838-1-juew@google.com>
In-Reply-To: <20220425163451.3818838-1-juew@google.com>

From your description, you have me mostly convinced that this is
something that needs to get fixed. The hardware patrol scrubber(s)
address the same basic problem, but don't seem to be flexible enough
for your specific needs.

But, have hardware vendors been receptive at all to making the patrol
scrubbers more tunable?

On 4/25/22 09:34, Jue Wang wrote:
> /* Could stop and return after the 1st poison is detected */
> #define MCESCAN_IOCTL_SCAN 0
>
> struct SysramRegion {
>   /* input */
>   uint64_t first_byte;   /* first page-aligned physical address to scan */
>   uint64_t length;       /* page-aligned length of memory region to scan */
>   /* output */
>   uint32_t poisoned;     /* 1 - a poisoned page is found, 0 - otherwise */
>   uint32_t poisoned_pfn; /* PFN of the 1st detected poisoned page */
> }

So, the ioctl() caller has to know the physical address layout of the
system?

While this is a good start at a conversation, I think you might want to
back up a bit. You alluded to a few requirements that you have, like:

 * Adjustable detector resource use based on system utilization
 * Adjustable scan rate to ensure issues are found at a deterministic
   rate
 * Detector must be able to find errors in allocated, in-use memory

What about SEV-SNP or TDX private memory? It might be unmapped *and*
limited in how it can be accessed. For instance, TDX hosts can't
practically read guest memory. SEV-SNP hosts have special page mapping
requirements; the host can't create arbitrary mappings with arbitrary
mapping sizes.

What would this ioctl() do if asked to scan a TDX guest private page?

Is doing it from userspace a strict requirement?

Would the detector just read memory?

Are there any other physical addresses which are RAM but should not
have the detector used on them?
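
To make the physical-address question concrete, here is a rough sketch
of what a caller would have to do just to fill in the struct quoted
above. Everything other than the struct fields is an assumption for
illustration: the 4 KiB page size, the SKETCH_* macros and the
sysram_region_*() helpers are hypothetical and not part of the RFC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. The RFC excerpt does not state a size. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Same fields, in the same order, as the struct quoted from the RFC. */
struct SysramRegion {
	/* input */
	uint64_t first_byte;   /* first page-aligned physical address to scan */
	uint64_t length;       /* page-aligned length of memory region to scan */
	/* output */
	uint32_t poisoned;     /* 1 - a poisoned page is found, 0 - otherwise */
	uint32_t poisoned_pfn; /* PFN of the 1st detected poisoned page */
};

/* An ioctl argument should have a fixed layout: 2 x 8 + 2 x 4 bytes. */
static_assert(sizeof(struct SysramRegion) == 24, "unexpected padding");

/*
 * Hypothetical helper: prepare a scan request for the physical range
 * [phys_start, phys_start + len). Returns false for a NULL request, an
 * empty range, a start or length that is not page-aligned, or a range
 * whose end does not fit in 64 bits.
 */
static bool sysram_region_init(struct SysramRegion *r,
			       uint64_t phys_start, uint64_t len)
{
	if (r == NULL || len == 0)
		return false;
	if ((phys_start | len) & (SKETCH_PAGE_SIZE - 1))
		return false;
	if (phys_start + len < phys_start) /* unsigned wrap-around */
		return false;

	r->first_byte = phys_start;
	r->length = len;
	r->poisoned = 0;
	r->poisoned_pfn = 0;
	return true;
}

/* Hypothetical helper: number of pages covered by a prepared request. */
static uint64_t sysram_region_pages(const struct SysramRegion *r)
{
	return r->length >> SKETCH_PAGE_SHIFT;
}
```

The prepared struct would then presumably be handed to something like
ioctl(fd, MCESCAN_IOCTL_SCAN, &region) on whatever device node the RFC
defines. Note that phys_start has to come from somewhere: the caller
needs the system's physical memory map before it can build even one
valid request.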