From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 8 Jun 2023 20:36:07 -0700
From: Dan Williams
To: Mike Kravetz, Dan Williams
CC: David Hildenbrand, Miaohe Lin, James Houghton, Naoya Horiguchi,
 Peter Xu, Yosry Ahmed, Michal Hocko, Matthew Wilcox, David Rientjes,
 Axel Rasmussen, Jiaqi Yan
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] HGM for hugetlbfs
Message-ID: <64829e26edbc6_1433ac29475@dwillia2-xfh.jf.intel.com.notmuch>
References: <20230602172723.GA3941@monkey>
 <7e0ce268-f374-8e83-2b32-7c53f025fec5@google.com>
 <7c42a738-d082-3338-dfb5-fd28f75edc58@redhat.com>
 <75d5662a-a901-1e02-4706-66545ad53c5c@redhat.com>
 <20230607220651.GC4122@monkey>
 <64824e07ba371_142af829493@dwillia2-xfh.jf.intel.com.notmuch>
 <20230608223543.GB88798@monkey>
In-Reply-To: <20230608223543.GB88798@monkey>
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org
Precedence:
 bulk
X-Loop: owner-majordomo@kvack.org

[ add Jane ]

Mike Kravetz wrote:
> On 06/08/23 14:54, Dan Williams wrote:
> > Mike Kravetz wrote:
> > > On 06/07/23 10:13, David Hildenbrand wrote:
> > [..]
> > > I am struggling with how to support existing hugetlb users that are running
> > > into issues like memory errors on hugetlb pages today. And, yes that is a
> > > source of real customer issues. They are not really happy with the current
> > > design that a single error will take out a 1G page, and their VM or
> > > application. Moving to THP is not likely as they really want a pre-allocated
> > > pool of 1G pages. I just don't have a good answer for them.
> >
> > Is it the reporting interface, or the fact that the page gets offlined
> > too quickly?
>
> Somewhat both.
>
> Reporting says the error starts at the beginning of the huge page with
> length of huge page size. So, actual error is not really isolated. In
> a way, this is 'desired' since hugetlb pages are treated as a single page.

On x86 the error reporting is always by cacheline, but it's the
memory-failure code that turns that into a SIGBUS with the sigaction
info indicating failure relative to the page size. That interface has
been awkward for PMEM as well, as Jane can attest.

> Once a page is marked with poison, we prevent subsequent faults of the page.

That makes sense.

> Since a hugetlb page is treated as a single page, the 'good data' can
> not be accessed as there is no way to fault in smaller pieces (4K pages)
> of the page. Jiaqi Yan actually put together patches to 'read' the good
> 4K pages within the hugetlb page [1], but we will not always have a file
> handle.

That mitigation is also a problem for device-dax, which makes hard
guarantees that mappings will always be aligned, mainly to keep the
driver simple.

>
> [1] https://lore.kernel.org/linux-mm/20230517160948.811355-1-jiaqiyan@google.com/
>
> > I.e. if the 1GB page was unmapped from userspace per usual
> > memory-failure, but the application had an opportunity to record what
> > got clobbered on a smaller granularity and then ask the kernel to repair
> > the page, would that relieve some pain?
>
> Sounds interesting.
>
> > Where repair is atomically
> > writing a full cacheline of zeroes,
>
> Excuse my hardware ignorance ... In this case, I assume writing zeroes
> will repair the error on the original memory? This would then result
> in data loss/zeroed, BUT the memory could be accessed without error.
> So, the original 1G page could be used by the application (with data
> missing of course).

Yes, but it depends. Sometimes poison is a permanent error and no
amount of writing to it can correct the error; sometimes it is
transient, like a high-energy particle flipping a bit in the cell; and
sometimes it is deposited from outside the memory controller, as when a
poisoned dirty cacheline gets written back. The majority of the time,
outside catastrophic loss of a whole rank, it's only 64 bytes at a time
that have gone bad.

> > or copying around the poison to a
> > new page and returning the old one to broken down and only have the
> > single 4K page with error quarantined.
>
> I suppose we could do that within the kernel, however user space would
> have the ability to do this IF it could access the good 4K pages. That
> is essentially what we do with THP pages by splitting and just marking a
> single 4K page with poison. That is the functionality proposed by HGM.
>
> It seems like asking the kernel to 'repair the page' would be a new
> hugetlb specific interface. Or, could there be other users?

I think there are other users for this. Jane worked on
DAX_RECOVERY_WRITE support, which is a way for a DIRECT_IO write on a
DAX file (guaranteed to be page aligned) to plumb an operation to the
pmem driver to repair a location that is not mmap'able due to hardware
poison. However, that's fsdax specific. It would be nice to be able to
have SIGBUS handlers that can ask the kernel to overwrite the cacheline
and restore access to the rest of the page. It seems unfortunate to
live with throwing away 1GB minus 64 bytes of capacity at the first
sign of trouble.

The nice thing about hugetlb compared to pmem is that you do not need
to repair in place, in case the error is permanent. Conceivably the
kernel could allocate a new page, perform the copy of the good bits on
behalf of the application, and let the page be mapped again. If that
copy encounters poison, rinse and repeat until it succeeds or the
application says, "you know what, I think it's dead, thanks anyway".

It's something that has been on the "when there is time" pile, but
maybe instead of making hugetlb more complicated this effort goes to
making memory-failure more capable.
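
To make the granularity complaint concrete, here is a rough userspace
sketch, not taken from any posted patch. It assumes the usual Linux
SIGBUS contract for memory failure: siginfo_t carries si_addr and
si_addr_lsb, where si_addr_lsb is log2 of the extent being reported.
The struct and both helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the span that one SIGBUS report writes off. */
struct poison_span {
	uintptr_t start;	/* first byte reported as lost */
	size_t len;		/* number of bytes reported as lost */
};

/*
 * addr: value of siginfo_t.si_addr from a memory-failure SIGBUS
 * lsb:  value of siginfo_t.si_addr_lsb, i.e. log2 of the reported extent
 *       (30 for a 1G hugetlb page, 12 for a 4K page, 6 for a cacheline)
 */
static struct poison_span poison_span_from_siginfo(uintptr_t addr, unsigned lsb)
{
	struct poison_span span;

	/* A shift by the full word width or more would be undefined. */
	assert(lsb < sizeof(uintptr_t) * 8);

	span.len = (size_t)1 << lsb;
	/* Round the faulting address down to the reported granularity. */
	span.start = addr & ~((uintptr_t)span.len - 1);
	return span;
}

/*
 * Bytes that are reported lost although the hardware only flagged
 * bad_bytes of them (typically one 64-byte cacheline).
 */
static size_t collateral_bytes(unsigned lsb, size_t bad_bytes)
{
	return ((size_t)1 << lsb) - bad_bytes;
}
```

A handler installed with SA_SIGINFO would pass (uintptr_t)info->si_addr
and info->si_addr_lsb into poison_span_from_siginfo(). For a 1G hugetlb
page, collateral_bytes(30, 64) is the "1GB minus 64 bytes" mentioned
above; with cacheline-granular reporting (lsb of 6) it would be zero.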