From: Jiaqi Yan
Date: Tue, 13 Dec 2022 11:03:52 -0800
Subject: Re: [RFC] Kernel Support of Memory Error Detection.
To: "Luck, Tony", "Ghannam, Yazen", "HORIGUCHI NAOYA(堀口 直也)", "Vilas.Sridharan@amd.com"
Cc: David Rientjes, "dave.hansen@linux.intel.com", "david@redhat.com", "Aktas, Erdem", "pgonda@google.com", "Hsiao, Duen-wen", "Malvestuto, Mike", "gthelen@google.com", "linux-mm@kvack.org", "jthoughton@google.com"
References: <20221103155029.2451105-1-jiaqiyan@google.com> <6bb93638-5702-076c-b72a-f33b39f35842@google.com> <20221213092743.GA1977915@hori.linux.bs1.fc.nec.co.jp>

On Tue, Dec 13, 2022 at 10:10 AM Luck, Tony wrote:
>
> > I think that one point not mentioned yet is how the in-kernel scanner finds
> > a broken page before the page is marked by PG_hwpoison. Some mechanism
> > similar to mcsafe-memcpy could be used, but maybe memcpy is not necessary
> > because we just want to check the healthiness of pages. So a core routine
> > like mcsafe-read would be introduced in the first patchset (or we already
> > have it)?
>
> I don’t think that there is an existing routine to do the mcsafe-read. But it should
> be easy enough to write one. If an architecture supports a way to do this without
> evicting other data from caches, that would be a bonus. X86 has a non-temporal
> read that could be interesting ... but I'm not sure that it would detect poison
> synchronously. I could be wrong, but I expect that you won’t see a machine check,
> but you should see the memory controller log a UCNA error reported by a CMCI.
>
> -Tony

To Naoya: yes, we will introduce a new scanning routine. It "touches" a
page cacheline by cacheline to detect memory errors. Each "touch" is
essentially an ANDQ of the loaded data with 0, which avoids leaking
user data into a register.

To Tony: thanks. I think you are referring to PREFETCHNTA before ANDQ,
which we use in our scanning routine to minimize cache pollution?
We tested the scanning draft below on Intel Skylake, Cascadelake, and
Icelake CPUs, and the ANDQ instruction does raise an MC synchronously
when it encounters an injected memory error.

To Yazen and Vilas: we haven't tested on any AMD hardware. Do you have
any thoughts on PREFETCHNTA + MC?

/**
 * Detect memory errors within a range of memory.
 *
 * Input:
 * rdi: starting address of the range.
 * rsi: exclusive ending address of the range; rsi - rdi must be a
 *      nonzero multiple of 64, otherwise the jne loop below does not
 *      terminate at rsi.
 *
 * Output:
 * eax: X86_TRAP_MC if poisoned memory is encountered,
 *      X86_TRAP_PF if the direct kernel mapping is not established,
 *      0 on success (assume this routine never hits X86_TRAP_DE).
 */
ENTRY(kmcescand_safe_read)
	/* Zero %rax. */
	xor %rax, %rax
1:
	/* Prevent LLC pollution with non-temporal prefetch hint. */
	prefetchnta (%rdi)
2:
	/**
	 * This andq with constant rax=0 prevents leaking memory
	 * content (especially userspace memory content like credentials)
	 * into a register.
	 */
	andq (%rdi), %rax
	/**
	 * x86-64 CPUs read memory one cacheline (64 bytes) at a time,
	 * so there is no need to andq every 64-bit word; advance
	 * directly to the next 64-byte address instead.
	 */
	add $64, %rdi
	cmp %rdi, %rsi
	jne 1b
3:
	ret
	/**
	 * The exception handler ex_handler_fault fills eax with
	 * the exception vector (e.g. #MC or #PF).
	 */
	_ASM_EXTABLE_FAULT(2b, 3b)
ENDPROC(kmcescand_safe_read)
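
For readers who want to follow the loop without the kernel macros, here is a minimal userspace sketch of the same access pattern in C. It is only a model and makes several assumptions: the name scan_range_model and the CACHELINE_SIZE constant are hypothetical, __builtin_prefetch is a GCC/Clang builtin (with locality 0 it is a non-temporal hint, which on x86 is typically emitted as prefetchnta), and there is no exception-table fixup, so it cannot report #MC or #PF the way the draft above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per cacheline; matches the `add $64, %rdi` stride in the draft. */
#define CACHELINE_SIZE 64
#define WORDS_PER_LINE (CACHELINE_SIZE / sizeof(uint64_t))

/*
 * Hypothetical userspace model of the scan loop (not kernel code).
 * Touches one 64-bit word per cacheline in [start, end) and ANDs it
 * into an accumulator that starts at 0, so no loaded data survives.
 * Returns the accumulator (always 0) and stores the number of
 * cachelines touched in *touched.
 */
static uint64_t scan_range_model(const volatile uint64_t *start,
				 const volatile uint64_t *end,
				 size_t *touched)
{
	const volatile uint64_t *p = start;
	uint64_t acc = 0;
	size_t n = 0;

	/* Same precondition as the jne loop: whole cachelines only. */
	assert((size_t)(end - start) % WORDS_PER_LINE == 0);

	/* Unlike the assembly, an empty range is handled (zero touches). */
	while (p != end) {
		/* Non-temporal prefetch hint (read access, locality 0). */
		__builtin_prefetch((const void *)p, 0, 0);
		/* volatile forces the load even though acc stays 0. */
		acc &= *p;
		p += WORDS_PER_LINE;
		n++;
	}

	*touched = n;
	return acc;
}
```

Calling scan_range_model(buf, buf + 32, &touched) on a 256-byte uint64_t buffer touches four cachelines and returns 0 regardless of the buffer contents, which is the property the andq-with-zero trick relies on.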