Date: Tue, 6 Jun 2023 15:40:10 -0700 (PDT)
From: David Rientjes
To: Mike Kravetz
cc: James Houghton, Naoya Horiguchi, Miaohe Lin, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Peter Xu, Michal Hocko, Matthew Wilcox, David Hildenbrand, Axel Rasmussen, Jiaqi Yan
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
In-Reply-To: <20230602172723.GA3941@monkey>
Message-ID: <7e0ce268-f374-8e83-2b32-7c53f025fec5@google.com>
References: <20230306191944.GA15773@monkey> <20230602172723.GA3941@monkey>

On Fri, 2 Jun 2023, Mike Kravetz wrote:

> The benefit of HGM in the case of memory errors is fairly obvious. As
> mentioned above, when a memory error is encountered on a hugetlb page,
> that entire hugetlb page becomes inaccessible to the application. Losing
> 1G or even 2M of data is often catastrophic for an application. There
> is often no way to recover.
> It just makes sense that recovering from the loss of 4K of data would
> generally be easier and more likely to be possible. Today, when Oracle
> DB encounters a hard memory error on a hugetlb page, it will shut down.
> Plans are currently in place to repair and recover from such errors if
> possible. Isolating the area of data loss to a single 4K page
> significantly increases the likelihood of repair and recovery.
>
> Today, when a memory error is encountered on a hugetlb page, an
> application is 'notified' of the error by a SIGBUS that carries the
> virtual address of the hugetlb page and its size. This makes sense as
> hugetlb pages are accessed by a single page table entry, so you get all
> or nothing. As mentioned by James above, this is catastrophic for VMs
> as the hypervisor has just been told that 2M or 1G is now inaccessible.
> With HGM, we can isolate such errors to 4K.
>
> Backing VMs with hugetlb pages is a real use case today. We are seeing
> memory errors on such hugetlb pages, with the result being VM failures.
> One of the advantages of backing VMs with THPs is that they are split in
> the case of memory errors. HGM would allow similar functionality.

Thanks for this context, Mike; it's very useful.

I think everybody is aligned on the desire to map memory at smaller
granularities for multiple use cases, and it's fairly clear that these
use cases are critically important to multiple stakeholders.

I think the open question is whether this functionality is supported in
hugetlbfs (as with HGM) or whether there is a hard requirement to use THP
for this support.

I don't think that hugetlbfs is feature frozen, but if there's a strong
bias toward not merging additional complexity into the subsystem, that
would be useful to know.

I personally think the critical use cases described above justify the
added complexity of HGM in hugetlb, and that we shouldn't be blocked by
the long-standing (15+ year) desire to integrate hugetlb into the core MM
subsystem before we can stop the pain associated with memory poisoning
and live migration.

Are there strong objections to extending hugetlb for this support?
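
For readers less familiar with the SIGBUS notification mentioned in the quoted text: on Linux, a SIGBUS raised for a hardware memory error has si_code BUS_MCEERR_AR (action required) or BUS_MCEERR_AO (action optional), si_addr points into the affected memory, and si_addr_lsb gives log2 of the size of the lost region (see sigaction(2)). Below is a minimal sketch, assuming Linux with glibc, of how an application might turn those fields into the lost address range. The helper and variable names (poisoned_start, poisoned_len, last_err_start, ...) are hypothetical. For a hugetlb page today the reported size would be expected to be the whole huge page (si_addr_lsb of 21 for 2M or 30 for 1G on x86-64); isolating errors to 4K, as HGM proposes, would presumably correspond to a value of 12.

```c
#define _GNU_SOURCE /* exposes BUS_MCEERR_AR / BUS_MCEERR_AO in glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: first byte of the lost region, i.e. the reported
 * address rounded down to a multiple of (1 << lsb). */
static uintptr_t poisoned_start(uintptr_t addr, unsigned int lsb)
{
    assert(lsb < sizeof(uintptr_t) * 8);
    return addr & ~(((uintptr_t)1 << lsb) - 1);
}

/* Hypothetical helper: size in bytes of the lost region. */
static size_t poisoned_len(unsigned int lsb)
{
    assert(lsb < sizeof(size_t) * 8);
    return (size_t)1 << lsb;
}

/* Most recently reported lost range (hypothetical; a sketch-level
 * record, not a queue of all errors). */
static volatile uintptr_t last_err_start;
static volatile size_t last_err_len;

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) {
        /* si_addr_lsb: 21 for a 2M hugetlb page, 30 for 1G, 12 for 4K
         * (x86-64). */
        unsigned int lsb = (unsigned int)si->si_addr_lsb;
        last_err_start = poisoned_start((uintptr_t)si->si_addr, lsb);
        last_err_len = poisoned_len(lsb);
        /* Action optional: the bad data has not been consumed yet, so
         * the process can keep running and deal with the range later. */
        if (si->si_code == BUS_MCEERR_AO)
            return;
    }
    /* Action required (or an unrelated SIGBUS): returning would retry
     * the faulting access, so this sketch simply exits. */
    _exit(EXIT_FAILURE);
}

/* Install the handler with SA_SIGINFO so siginfo_t is delivered. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The gap between a 2M or 1G lost range and a 4K one is exactly the difference between poisoned_len(21) or poisoned_len(30) and poisoned_len(12), which is the amount of data a hypervisor or database would have to treat as gone.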