Date: Wed, 30 Apr 2025 13:45:21 -0400
From: Johannes Weiner
To: Yafang Shao
Cc: "Liam R. Howlett", Zi Yan, akpm@linux-foundation.org, ast@kernel.org,
    daniel@iogearbox.net, andrii@kernel.org, David Hildenbrand, Baolin Wang,
    Lorenzo Stoakes, Nico Pache, Ryan Roberts, Dev Jain, bpf@vger.kernel.org,
    linux-mm@kvack.org, Michal Hocko
Subject: Re: [RFC PATCH 0/4] mm, bpf: BPF based THP adjustment
Message-ID: <20250430174521.GC2020@cmpxchg.org>
References: <20250429024139.34365-1-laoar.shao@gmail.com>
    <42ECBC51-E695-4480-A055-36D08FE61C12@nvidia.com>
    <8F000270-A724-4536-B69E-C22701522B89@nvidia.com>

On Thu, May 01, 2025 at 12:06:31AM +0800, Yafang Shao wrote:
> > > > If it isn't, can you state why?
> > > >
> > > > The main difference is that you are saying it's in a container that you
> > > > don't control. Your plan is to violate the control the internal
> > > > applications have over THP because you know better. I'm not sure how
> > > > people might feel about you messing with workloads,
> > >
> > > It’s not a mess. They have the option to deploy their services on
> > > dedicated servers, but they would need to pay more for that choice.
> > > This is a two-way decision.
> >
> > This implies you want a container-level way of controlling the setting
> > and not a system service-level?
>
> Right. We want to control the THP per container.

This does strike me as a reasonable usecase.

I think there is consensus that in the long-term we want this stuff to
just work and truly be transparent to userspace.

In the short-to-medium term, however, there are still quite a few
caveats. thp=always can significantly increase the memory footprint of
sparse virtual regions.
Huge allocations are not as cheap and reliable as we would like them
to be, which for real production systems means having to make
workload-specific choices and tradeoffs.

There is ongoing work in these areas, but we do have a bit of a
chicken-and-egg problem: on the one hand, huge page adoption is slow
due to limitations in how they can be deployed. For example, we can't
do thp=always on a DC node that runs arbitrary combinations of jobs
from a wide array of services. Some might benefit, some might hurt.

Yet, it's much easier to improve the kernel based on exactly such
production experience and data from real-world usecases. We can't
improve the THP shrinker if we can't run THP.

So I don't see it as overriding whoever wrote the software running
inside the container. They don't know, and they shouldn't have to care
about page sizes. It's about letting admins and kernel teams get
started on using and experimenting with this stuff, given the very
real constraints right now, so we can get the feedback necessary to
improve the situation.
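
As a rough illustration of the sparse-region footprint point above, here
is a back-of-the-envelope sketch. It is not kernel code and it does not
model real THP behavior such as alignment, eligibility checks, or
khugepaged. It assumes 4 KiB base pages and 2 MiB huge pages (typical
for x86-64), and it assumes every touched page is fully allocated on
first access. The function names and the access pattern are
hypothetical, chosen only for the example.

```python
# Assumed sizes: 4 KiB base pages, 2 MiB PMD-sized huge pages (typical x86-64).
BASE_PAGE = 4 * 1024
HUGE_PAGE = 2 * 1024 * 1024


def resident_bytes(touched_offsets, page_size):
    """Bytes backing the touched offsets, assuming each touched page is
    allocated in full on first access (a simplifying assumption)."""
    pages = {offset // page_size for offset in touched_offsets}
    return len(pages) * page_size


def sparse_touches(region_size, stride):
    """Hypothetical access pattern: one byte touched every `stride` bytes."""
    return range(0, region_size, stride)


if __name__ == "__main__":
    region = 1024 * 1024 * 1024                   # 1 GiB virtual region
    touches = sparse_touches(region, HUGE_PAGE)   # one touch per 2 MiB
    small = resident_bytes(touches, BASE_PAGE)
    huge = resident_bytes(touches, HUGE_PAGE)
    print(f"base pages: {small // 1024} KiB")
    print(f"huge pages: {huge // (1024 * 1024)} MiB")
    print(f"ratio: {huge // small}x")
```

Under these assumptions the sparsely touched region is backed by 512
times more memory with huge pages than with base pages, which is the
worst case for this page-size pair. Denser access patterns narrow the
gap.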