From: James Houghton
Date: Thu, 19 Jan 2023 14:45:10 -0800
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
To: Peter Xu
Cc: Mike Kravetz, David Hildenbrand, Muchun Song, David Rientjes,
    Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra,
    Naoya Horiguchi, "Dr . David Alan Gilbert", "Matthew Wilcox (Oracle)",
    Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <06423461-c543-56fe-cc63-cabda6871104@redhat.com>
    <6548b3b3-30c9-8f64-7d28-8a434e0a0b80@redhat.com>

On Thu, Jan 19, 2023 at 12:53 PM Peter Xu wrote:
>
> On Thu, Jan 19, 2023 at 11:42:26AM -0800, James Houghton wrote:
> > - We avoid problems related
> > to compound PTEs (the problem being: two
> > threads racing to populate a contiguous and non-contiguous PTE that
> > take up the same space could lead to user-detectable incorrect
> > behavior. This isn't hard to fix; it will be when I send the arm64
> > patches up.)
>
> Could you elaborate this one a bit more?

In hugetlb_mcopy_atomic_pte(), we check that the PTE we're about to
overwrite is pte_none() before overwriting it. For contiguous PTEs,
this only checks the first PTE in the bunch.

If someone came around and populated one of the PTEs that lay in the
middle of a potentially contiguous group of PTEs, we could end up
overwriting that PTE if we later UFFDIO_CONTINUEd in such a way as to
create a contiguous PTE. We would expect to get EEXIST here, but in
this case the operation would succeed. To fix this, we can just check
that ALL the PTEs in the contiguous bunch have the value that we're
expecting, not just the first one.

hugetlb_no_page() has the same problem, but it's not immediately clear
to me how it would result in incorrect behavior.

>
> > This might seem kind of contrived, but let's say you have a VM with 1T
> > of memory, and you find 100 memory errors all in different 1G pages
> > over the life of this VM (years, potentially). Having 10% of your
> > memory be 4K-mapped is definitely worse than having 10% be 2M-mapped
> > (lost performance and increased memory overhead). There might be other
> > cases in the future where being able to have intermediate mapping
> > sizes could be helpful.
>
> This is not the norm, or is it? How the possibility of bad pages can
> distribute over hosts over years? This can definitely affect how we should
> target the intermediate level mappings.

I can't really speak for norms generally, but I can try to speak for
Google Cloud. Google Cloud hasn't had memory error virtualization for
very long (only about a year), but we've seen cases where VMs can pick
up several memory errors over a few days/weeks.
IMO, 100 errors in separate 1G pages over a few years isn't completely
nonsensical, especially if the memory that you're using isn't so
reliable or was damaged in shipping (like if it was flown over the
poles or something!).

Now there is the concern about how an application would handle it. In a
VMM's case, we can virtualize the error for the guest. In the guest,
it's possible that a good chunk of the errors lie in unused pages and
so can be easily marked as poisoned. It's possible that recovery is
much more difficult. It's not unreasonable for an application to
recover from a lot of memory errors.

- James
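
For illustration only, here is a rough userspace sketch of the
contiguous-PTE check discussed above (first entry only vs. ALL
entries). This is not kernel code and not the actual fix: toy_pte_t,
toy_pte_none(), first_pte_none(), all_ptes_none() and
toy_install_contig() are made-up names, a PTE is modelled as a plain
integer where 0 means "none", and the group size is whatever the caller
passes in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page table entry; 0 means nothing is mapped. */
typedef uint64_t toy_pte_t;

static bool toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

/* Racy check: only looks at the first entry of the contiguous group. */
static bool first_pte_none(const toy_pte_t *ptes, size_t nr)
{
	assert(nr > 0);
	return toy_pte_none(ptes[0]);
}

/* Stricter check: every entry in the group must still be none. */
static bool all_ptes_none(const toy_pte_t *ptes, size_t nr)
{
	assert(nr > 0);
	for (size_t i = 0; i < nr; i++)
		if (!toy_pte_none(ptes[i]))
			return false;
	return true;
}

/*
 * Install one contiguous mapping over ptes[0..nr). Returns 0 on success,
 * or -EEXIST if the selected check finds an entry that is already mapped.
 */
static int toy_install_contig(toy_pte_t *ptes, size_t nr, toy_pte_t val,
			      bool check_all)
{
	bool ok = check_all ? all_ptes_none(ptes, nr)
			    : first_pte_none(ptes, nr);

	if (!ok)
		return -EEXIST;
	for (size_t i = 0; i < nr; i++)
		ptes[i] = val;
	return 0;
}
```

In this model, if an entry in the middle of the group is already
populated, toy_install_contig() with check_all == false succeeds and
overwrites it, while check_all == true returns -EEXIST and leaves the
entry alone.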