From: Frank van der Linden
Date: Mon, 2 Dec 2024 14:50:49 -0800
Subject: Re: [PATCH] mm/hugetlb: optionally pre-zero hugetlb pages
To: Mateusz Guzik
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Muchun Song, Miaohe Lin, Oscar Salvador, David Hildenbrand, Peter Xu, linux-kernel@vger.kernel.org

On Mon, Dec 2, 2024 at 1:58 PM Mateusz Guzik wrote:
>
> On Mon, Dec 02, 2024 at 08:20:58PM +0000, Frank van der Linden wrote:
> > Fresh hugetlb pages are zeroed out when they are faulted in,
> > just like with all other page types. This can take up a good
> > amount of time for larger page sizes (e.g. around 40
> > milliseconds for a 1G page on a recent AMD-based system).
> >
> > This normally isn't a problem, since hugetlb pages are typically
> > mapped by the application for a long time, and the initial
> > delay when touching them isn't much of an issue.
> >
> > However, there are some use cases where a large number of hugetlb
> > pages are touched when an application (such as a VM backed by these
> > pages) starts. For 256 1G pages and 40ms per page, this would take
> > 10 seconds, a noticeable delay.
>
> The current huge page zeroing code is not that great to begin with.
>
> There was a patchset posted some time ago to remedy at least some of it:
> https://lore.kernel.org/all/20230830184958.2333078-1-ankur.a.arora@oracle.com/
>
> but it apparently fell through the cracks.

Hi Mateusz, thanks for your reply.

I am aware of that patch set, yes. The discussion around it evolved into one about kernel preemption and the evils of cond_resched().

You can certainly reduce the time it takes to zero out a 1G page by optimizing the code that does it. See also, for example,
https://lore.kernel.org/all/20180725023728.44630-1-cannonmatthews@google.com/

However, while an improvement of, say, 50% in zeroing time (at most) would be nice, it still leaves the faulting process spending considerable time on zeroing. Like you say, that cost needs to be paid, but it would be good to avoid paying it inline. This patch avoids doing that altogether, removing essentially all of the zeroing time from the fault path under reasonably good circumstances.

>
> Any games with "background zeroing" are notoriously crappy and I would
> argue one should exhaust other avenues before going there -- at the end
> of the day the cost of zeroing will have to get paid.

I understand that the concept of background pre-zeroing has been, and will be, met with some resistance. But do you have any specific concerns with the patch I posted? It's pretty well isolated from the rest of the code, and optional.
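To make the idea concrete, here is a minimal userspace model of background pre-zeroing with an inline fallback. It is a hypothetical sketch, not the code in the posted patch: the pool, its tiny page size, and the names pool_init(), prezero_step(), pool_alloc() and pool_free() are made up for illustration. Locking and the background worker thread itself are also omitted. A worker would call prezero_step() when the system is idle, and the fault path calls pool_alloc(). That function takes a pre-zeroed page when one is available and pays the zeroing cost inline only when none is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a hugetlb free pool; sizes are tiny stand-ins. */
#define POOL_PAGES 4
#define PAGE_BYTES 4096

enum page_state { FREE_DIRTY, FREE_ZEROED, IN_USE };

struct page_slot {
	enum page_state state;
	unsigned char *mem;
};

static struct page_slot pool[POOL_PAGES];

/* Allocate backing memory; every page starts out free but dirty. */
static bool pool_init(void)
{
	for (size_t i = 0; i < POOL_PAGES; i++) {
		pool[i].mem = malloc(PAGE_BYTES);
		if (!pool[i].mem)
			return false;
		memset(pool[i].mem, 0xAA, PAGE_BYTES); /* simulate stale data */
		pool[i].state = FREE_DIRTY;
	}
	return true;
}

/* Background step: zero one dirty free page, if any. Returns true if a
 * page was zeroed. A real worker would loop on this while idle. */
static bool prezero_step(void)
{
	for (size_t i = 0; i < POOL_PAGES; i++) {
		if (pool[i].state == FREE_DIRTY) {
			memset(pool[i].mem, 0, PAGE_BYTES);
			pool[i].state = FREE_ZEROED;
			return true;
		}
	}
	return false;
}

/* Fault path: prefer a pre-zeroed page, else zero a dirty one inline.
 * *zeroed_inline reports which path was taken. Returns the slot index,
 * or -1 if the pool is exhausted. */
static int pool_alloc(bool *zeroed_inline)
{
	int dirty = -1;

	for (size_t i = 0; i < POOL_PAGES; i++) {
		if (pool[i].state == FREE_ZEROED) {
			pool[i].state = IN_USE;
			*zeroed_inline = false;
			return (int)i;
		}
		if (pool[i].state == FREE_DIRTY && dirty < 0)
			dirty = (int)i;
	}
	if (dirty < 0)
		return -1;
	memset(pool[dirty].mem, 0, PAGE_BYTES); /* the cost paid inline */
	pool[dirty].state = IN_USE;
	*zeroed_inline = true;
	return dirty;
}

/* Return a page to the pool; its contents are now considered stale. */
static void pool_free(int idx)
{
	assert(pool[idx].state == IN_USE);
	pool[idx].state = FREE_DIRTY;
}
```

The model does not reduce the total zeroing work. It only moves that work from pool_alloc() (the fault path) to prezero_step(), which runs when there is idle time to spare. That is exactly the trade-off being debated in this thread.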
>
> To that end I would suggest picking up the patchset and experimenting
> with more variants of the zeroing code (for example for 1G it may be it
> is faster to employ SIMD usage in the routine).

See above: I'm happy to pick up the older patch(es) as a separate effort, but they won't fully solve the issue for the scenario I'm describing.

>
> If this is really such a problem I wonder if this could start as a
> series of 2MB pages instead faulted as needed, eventually promoted to
> 1G after passing some threshold?

This idea sounds similar to HGM (high-granularity mapping), which was originally proposed for live migration of VMs but never made it in. It's not trivial, and seems like overkill. Again, my patch is non-invasive and optional, so I think it's better in that regard.

- Frank
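For comparison, the alternative suggested above (fault in 2MB pieces, promote to 1G past a threshold) might be modeled roughly as follows. This is also a hypothetical sketch. struct region, region_init(), region_fault() and the threshold parameter are invented here. The model is not HGM and not an existing kernel interface, and it counts zeroing work in bytes rather than touching real memory. Each 2MB chunk is zeroed on first touch. Once threshold chunks have been touched, the remaining chunks are zeroed and the region is treated as a single 1G mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_BYTES (2UL << 20)                  /* 2MB */
#define REGION_BYTES (1UL << 30)                 /* 1G */
#define NCHUNKS (REGION_BYTES / CHUNK_BYTES)     /* 512 */

struct region {
	bool touched[NCHUNKS];      /* chunk has been backed and zeroed */
	size_t ntouched;
	size_t threshold;           /* hypothetical promotion tunable */
	bool promoted;              /* now mapped as one 1G page */
	unsigned long zeroed_bytes; /* stands in for the memset work */
};

static void region_init(struct region *r, size_t threshold)
{
	for (size_t i = 0; i < NCHUNKS; i++)
		r->touched[i] = false;
	r->ntouched = 0;
	r->threshold = threshold;
	r->promoted = false;
	r->zeroed_bytes = 0;
}

/* Handle a fault at byte offset off within the region. Returns true if
 * this fault triggered promotion to a single 1G mapping. */
static bool region_fault(struct region *r, unsigned long off)
{
	size_t c = off / CHUNK_BYTES;

	assert(off < REGION_BYTES);
	if (r->promoted || r->touched[c])
		return false;                 /* already backed and zeroed */

	r->touched[c] = true;
	r->ntouched++;
	r->zeroed_bytes += CHUNK_BYTES;   /* zero just this 2MB chunk */
	if (r->ntouched < r->threshold)
		return false;

	/* Promote: zero every chunk not yet touched, then map all 1G. */
	for (size_t i = 0; i < NCHUNKS; i++) {
		if (!r->touched[i]) {
			r->touched[i] = true;
			r->zeroed_bytes += CHUNK_BYTES;
		}
	}
	r->ntouched = NCHUNKS;
	r->promoted = true;
	return true;
}
```

In this model the latency is spread out rather than removed. By the time the region is promoted, the full 1G has been zeroed, and all of it was zeroed inside region_fault(), that is, on fault paths.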