Message-ID: <8e0882d6-2c1b-4097-a7da-471c77a759a7@gmail.com>
Date: Tue, 10 Jun 2025 18:02:50 +0100
From: Usama Arif
To: Matthew Wilcox
Cc: Lorenzo Stoakes, David Hildenbrand, Andrew Morton, Shakeel Butt,
 "Liam R . Howlett", Vlastimil Babka, Jann Horn, Arnd Bergmann,
 Christian Brauner, SeongJae Park, Mike Rapoport, Johannes Weiner,
 Barry Song <21cnbao@gmail.com>, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org, Pedro Falcato
Subject: Re: [DISCUSSION] proposed mctl() API
References: <85778a76-7dc8-4ea8-8827-acb45f74ee05@lucifer.local>
 <2fd7f80c-2b13-4478-900a-d65547586db3@gmail.com>
 <8c762435-f5d8-4366-84de-308c8280ff3d@gmail.com>

On 10/06/2025 17:26, Matthew Wilcox wrote:
> On Tue, Jun 10, 2025 at 05:00:47PM +0100, Usama Arif wrote:
>> On 10/06/2025 16:46, Matthew Wilcox wrote:
>>> On Tue, Jun 10, 2025 at 04:30:43PM +0100, Usama Arif wrote:
>>>> If we have 2 workloads on the same server, e.g. one is a database where
>>>> THPs just don't do well, but the other one is AI where THPs do really
>>>> well, how will the kernel monitor that the database workload is
>>>> performing worse and the AI one isn't?
>>>
>>> It can monitor the allocation/access patterns and see who's getting
>>> the benefit.  The two workloads are in competition for memory, and
>>> we can tell which pages are hot and which cold.
>>>
>>> And I don't believe it's a binary anyway.  I bet there are some
>>> allocations where the database benefits from having THPs (I mean, I know
>>> a database which invented the entire hugetlbfs subsystem so it could
>>> use PMD entries and avoid one layer of TLB misses!)
>>>
>>
>> Sure, but this is just an example. Workload owners are not going to spend
>> time trying to see how each allocation works and, if it's hot, put it in
>> hugetlbfs.
>
> No, they're not.  It should be automatic.  There are many deficiencies
> in the kernel; this is one of them.
>
>> Of course hugetlbfs has its own drawbacks of reserving pages.
>
> Drawback or advantage?  It's a feature.  You're being very strange about
> this.  First you want to reserve THPs for some workloads only, then when
> given a way to do that you complain that ... you have to reserve hugetlb
> pages.  You can't possibly mean both of these things sincerely.
>

Let me try to explain my view better. hugetlb requires two things: reserving
hugepages, and passing MAP_HUGETLB at mmap time, i.e. it is not "transparent".
(I know the meaning of "transparent" even in THP is a bit messed up :))

Some workload owners will happily test (and have the resources to do so) to
find where hugetlb helps most. They can go into their code, change the mmap
calls, and make the necessary (and disruptive) changes to workload
orchestration so that hugetlb pages are reserved. This is a small minority.

The vast majority of workload owners will not be willing to do this (and
don't have the resources to do so either). For them, we have THPs to do it
"transparently". If you just give them a knob to switch THP=always on/off for
*just their workload*, without affecting others on the same server, they will
be happy to try it, and the other workloads running on the same server in
controlled cgroups won't care and won't be affected, i.e.:

- if the machine policy (/sys/kernel/mm/transparent_hugepage/enabled) is
  madvise, workloads can opt in to getting THPs just by having this call (the
  PR_DEFAULT_MADV_HUGEPAGE version) in systemd.
- if the machine policy is always and they don't benefit, they can opt out of
  getting THPs by having this call (the PR_DEFAULT_MADV_NOHUGEPAGE version) in
  systemd, *without* disrupting the other workloads on the same server that do
  benefit.

Doing the above is very simple. This is how KSM is done as well. It doesn't
require any changes to mmap, i.e. it is "transparent" (after the prctl/mctl
call :)), and it doesn't require reserving anything for hugetlb before the
application starts.
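The hugetlb-versus-THP contrast in the reply (hugetlb needs pages reserved in advance plus MAP_HUGETLB at mmap time) can be made concrete with a minimal, Linux-only C sketch. The helper name alloc_huge_or_thp() and the 2 MiB huge page size are assumptions for illustration. The MAP_HUGETLB mapping only succeeds if huge pages were reserved beforehand; the fallback needs no reservation and merely hints THP with madvise(MADV_HUGEPAGE):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed huge page size: 2 MiB, the common x86-64 default. */
#define HUGE_SZ (2UL * 1024 * 1024)

/*
 * Hypothetical helper. Try hugetlb first: this only works if huge pages were
 * reserved beforehand (e.g. via /proc/sys/vm/nr_hugepages) AND the caller
 * explicitly passes MAP_HUGETLB. Otherwise fall back to an ordinary anonymous
 * mapping and only hint THP with madvise(MADV_HUGEPAGE).
 * Returns NULL on failure; *used_hugetlb reports which path was taken.
 */
void *alloc_huge_or_thp(size_t len, int *used_hugetlb)
{
	void *p;

	assert(used_hugetlb != NULL);
	assert(len > 0 && len % HUGE_SZ == 0);
	*used_hugetlb = 0;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p != MAP_FAILED) {
		*used_hugetlb = 1;
		return p;
	}

	/* No (or not enough) reserved hugetlb pages: use a normal mapping. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Advisory only; fails harmlessly (EINVAL) if THP is not built in. */
	(void)madvise(p, len, MADV_HUGEPAGE);
	return p;
}
```

Note that even the fallback path needs a per-range madvise() inside the application, which is the kind of code change the per-process call under discussion is meant to avoid.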
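The two cases in the list above (machine policy madvise vs always) can be sketched as a small decision helper. It parses the bracketed active mode from /sys/kernel/mm/transparent_hugepage/enabled (e.g. "always [madvise] never") and maps it, together with whether the workload benefits from THP, to an opt-in or opt-out decision. All enum and function names here are hypothetical, and the sketch does not assume the proposed per-process call exists; it only decides which call would be made:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Machine-wide THP modes as shown in
 * /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never". */
enum thp_mode { THP_MODE_UNKNOWN, THP_MODE_ALWAYS, THP_MODE_MADVISE, THP_MODE_NEVER };

/* Hypothetical per-workload decision; applying it would need the
 * per-process call under discussion (not assumed to exist here). */
enum thp_action { THP_ACTION_NONE, THP_ACTION_OPT_IN, THP_ACTION_OPT_OUT };

/* True if the len-byte token at p is exactly word. */
static int token_eq(const char *p, size_t len, const char *word)
{
	return strlen(word) == len && strncmp(p, word, len) == 0;
}

/* Return the mode whose name is enclosed in [brackets]. */
enum thp_mode thp_mode_parse(const char *s)
{
	const char *open, *close;
	size_t len;

	assert(s != NULL);
	open = strchr(s, '[');
	close = open ? strchr(open, ']') : NULL;
	if (!open || !close)
		return THP_MODE_UNKNOWN;

	len = (size_t)(close - open - 1);
	if (token_eq(open + 1, len, "always"))
		return THP_MODE_ALWAYS;
	if (token_eq(open + 1, len, "madvise"))
		return THP_MODE_MADVISE;
	if (token_eq(open + 1, len, "never"))
		return THP_MODE_NEVER;
	return THP_MODE_UNKNOWN;
}

/* Read and parse the machine-wide policy; THP_MODE_UNKNOWN if unavailable. */
enum thp_mode thp_mode_read(void)
{
	char buf[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");

	if (!f)
		return THP_MODE_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return THP_MODE_UNKNOWN;
	}
	fclose(f);
	return thp_mode_parse(buf);
}

/* madvise + benefits -> opt in; always + no benefit -> opt out. */
enum thp_action thp_choose_action(enum thp_mode mode, int benefits)
{
	if (mode == THP_MODE_MADVISE && benefits)
		return THP_ACTION_OPT_IN;
	if (mode == THP_MODE_ALWAYS && !benefits)
		return THP_ACTION_OPT_OUT;
	return THP_ACTION_NONE;
}
```

Keeping the decision separate from the sysfs read makes the policy logic testable without touching the machine's configuration.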
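For comparison with the KSM model mentioned at the end of the reply, mainline already has process-wide prctl() options of this shape: PR_SET_MEMORY_MERGE (Linux 6.4+) enables KSM for the whole process, and PR_SET_THP_DISABLE (Linux 3.15+) is an existing all-or-nothing THP opt-out whose setting is inherited by fork() children and preserved across execve(). The sketch below is a hypothetical launcher, roughly what a service manager could do before starting a workload. It uses only those existing options, not the proposed PR_DEFAULT_MADV_* flags, and all function names are illustrative:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Existing all-or-nothing THP opt-out (Linux 3.15+). The flag is inherited
 * by fork() children and preserved across execve(). */
int thp_opt_out_self(void)
{
	return prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL);
}

/* 1 if THP is disabled for this process, 0 if not, -1 on error. */
int thp_is_disabled_self(void)
{
	return prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
}

/* Process-wide KSM opt-in (Linux 6.4+); compiled in only if the installed
 * headers define PR_SET_MEMORY_MERGE, otherwise fails with ENOSYS. */
int ksm_opt_in_self(void)
{
#ifdef PR_SET_MEMORY_MERGE
	return prctl(PR_SET_MEMORY_MERGE, 1UL, 0UL, 0UL, 0UL);
#else
	errno = ENOSYS;
	return -1;
#endif
}

/* Hypothetical launcher: set the per-process flag, then exec the workload.
 * Returns only on failure. */
int exec_without_thp(char *const argv[])
{
	assert(argv != NULL && argv[0] != NULL);
	if (thp_opt_out_self() != 0) {
		perror("prctl(PR_SET_THP_DISABLE)");
		return -1;
	}
	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

Because the flag survives execve(), calling exec_without_thp() with argv = {"mydb", NULL} would start a (hypothetical) mydb binary with THP disabled, while other processes on the same machine keep the machine-wide policy.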