From: "Luck, Tony"
To: Oscar Salvador
CC: David Hildenbrand, Borislav Petkov, Yazen Ghannam, Miaohe Lin, Naoya Horiguchi, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: RE: Machine check recovery broken in v6.9-rc1
Date: Sun, 7 Apr 2024 00:08:30 +0000

> This one is against 6.1 (previous one was against v6.9-rc2):
> Again, compile tested only

Oscar.

Both the 6.1 and 6.9-rc2 patches make the BUG (and subsequent issues) go away.

Here's what's happening.

When the machine check occurs there's a scramble from various subsystems
to report the memory error.

ghes_do_memory_failure() calls memory_failure_queue() which later
calls memory_failure() from a kernel thread. Side note: this happens TWICE
for each error. Not sure yet if this is a BIOS issue logging more than once,
or some Linux issue in the acpi/apei/ghes.c code.

uc_decode_notifier() [called from a different kernel thread] also calls
memory_failure().

Finally kill_me_maybe() [called from task_work on return to the application
when returning from the machine check handler] also calls memory_failure().

memory_failure() is somewhat prepared for multiple reports of the same
error. It uses an atomic test and set operation to mark the page as poisoned.

The first caller to report the error does all the real work. Late arrivals
take a shorter path, but may still take some action(s) depending on the
"flags" passed in:

        if (TestSetPageHWPoison(p)) {
                pr_err("%#lx: already hardware poisoned\n", pfn);
                res = -EHWPOISON;
                if (flags & MF_ACTION_REQUIRED)
                        res = kill_accessing_process(current, pfn, flags);
                if (flags & MF_COUNT_INCREASED)
                        put_page(p);
                goto unlock_mutex;
        }

In this case the last to arrive has MF_ACTION_REQUIRED set, so it calls
kill_accessing_process() ... which is in the stack trace that led to the:

kernel BUG at include/linux/swapops.h:88!

I'm not sure that I fully understand your patch. I guess that it is making
sure to handle the case that the page has already been marked as poisoned?

Anyway ... thanks for the quick fix. I hope the above helps write a good
commit message to get this applied and backported to stable.

Tested-by: Tony Luck

-Tony
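
For readers following along, the "first reporter does the work, late
arrivals take a shorter path" behaviour described above can be sketched in
user space with C11 atomics. This is only an illustration under stated
assumptions, not the kernel code: every name beginning with sketch_ or
SKETCH_ is hypothetical, the return value is a stand-in for -EHWPOISON, and
the counters merely record which path was taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's flag bit and error code. */
#define SKETCH_ACTION_REQUIRED  0x1
#define SKETCH_EALREADY         1

/* Hypothetical model of one page of memory. */
struct sketch_page {
	atomic_flag poisoned;       /* stands in for the HWPoison page flag */
	atomic_int full_handlings;  /* how often the full recovery path ran */
	atomic_int kill_attempts;   /* how often a late arrival took action */
};

#define SKETCH_PAGE_INIT { ATOMIC_FLAG_INIT, 0, 0 }

/*
 * Report an error on page p. The first caller wins the atomic
 * test-and-set and does the real work; every later caller sees the
 * flag already set and takes the short path, acting only when its
 * flags ask for it.
 */
int sketch_memory_failure(struct sketch_page *p, int flags)
{
	assert(p != NULL);

	/* atomic_flag_test_and_set() returns the previous value. */
	if (atomic_flag_test_and_set(&p->poisoned)) {
		if (flags & SKETCH_ACTION_REQUIRED)
			atomic_fetch_add(&p->kill_attempts, 1);
		return -SKETCH_EALREADY;
	}

	atomic_fetch_add(&p->full_handlings, 1);
	return 0;
}
```

With three reports against the same page, as in the sequence described
above, only the first call returns 0. The last one, carrying the
action-required flag, is the only one that takes the extra step on the
short path.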