From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <162c14e6-0b16-4698-bd76-735037ea0d73@gmail.com>
Date: Thu, 29 May 2025 19:32:10 +0100
Subject: Re: [DISCUSSION] proposed mctl() API
To: Matthew Wilcox, Shakeel Butt
Cc: Lorenzo Stoakes, Andrew Morton, "Liam R . Howlett", David Hildenbrand,
 Vlastimil Babka, Jann Horn, Arnd Bergmann, Christian Brauner,
 SeongJae Park, Mike Rapoport, Johannes Weiner,
 Barry Song <21cnbao@gmail.com>, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org, Pedro Falcato
References: <85778a76-7dc8-4ea8-8827-acb45f74ee05@lucifer.local>
From: Usama Arif

On 29/05/2025 19:13, Matthew Wilcox wrote:
> On Thu, May 29, 2025 at 10:54:34AM -0700, Shakeel Butt wrote:
>> On Thu, May 29, 2025 at 04:28:46PM +0100, Matthew Wilcox wrote:
>>> People should put more effort into allocating THPs automatically and
>>> monitoring where they're helping performance and where they're hurting
>>> performance,
>>
>> Can you please expand on people putting more effort? Is it about
>> workloads or maybe malloc implementations (tcmalloc, jemalloc) being
>> more intelligent in managing their allocations/frees to keep more used
>> memory in hugepage aligned regions? And conveying to kernel which
>> regions they prefer hugepage backed and which they do not? Or something
>> else?
>
> We need infrastructure inside the kernel to monitor whether a task is
> making effective use of the THPs that it has, and if it's not then move
> those THPs over to where they will be better used.
>

I think this is the really difficult part. If we have 2 workloads on the
same server, e.g. one is a database where THPs just don't do well, but the
other one is AI where THPs do really well, how will the kernel monitor
that the database workload is performing worse and the AI one isn't?

I added the THP shrinker to hopefully try and do this automatically, and it
does really help.
But unfortunately it is not a complete solution. There are severely
memory-bound workloads where even a tiny increase in memory will lead to
an OOM. And if you colocate the container that's running that workload
with one that benefits from THPs, we unfortunately can't just rely on the
system doing the right thing.

It would be awesome if THPs were truly transparent and didn't require any
input, but unfortunately I don't think that there is a solution for this
with just kernel monitoring.

This is just a big hint from the user. If the global system policy is
madvise and the workload owner has done their own benchmarks and sees
benefits with always, they set DEFAULT_MADV_HUGEPAGE for the process to
opt in as "always". If the global system policy is always and the workload
owner has done their own benchmarks and sees worse results with always,
they set DEFAULT_MADV_NOHUGEPAGE for the process to opt in as "madvise".

> I don't necessarily object to userspace giving hints like "I think I'm
> going to use all of this 20MB region quite heavily", but the kernel should
> treat those hints with the appropriate skepticism, otherwise it's just
> a turbo button that nobody would ever _not_ press.
>
>>> instead of coming up with these baroque reasons to blame
>>> the sysadmin for not having tweaked some magic knob.
>>
>> To me this is not about blaming sysadmin but more about sysadmin wanting
>> more fine grained control on THP allocation policies for different
>> workloads running in a multi-tenant environment.
>
> That's the same thing. Linux should be auto-tuning, not relying on some
> omniscient sysadmin to fix it up.
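
To make the opt-in/opt-out semantics described above concrete, here is a
rough userspace model of how the per-process hint would combine with the
global policy. It is only a sketch of the behaviour described in this mail,
not kernel code and not an existing API: DEFAULT_MADV_HUGEPAGE and
DEFAULT_MADV_NOHUGEPAGE are the proposed names from this thread, the
helper names are made up for illustration, and leaving a global "never"
untouched is my assumption, since that case is not discussed here. The
parser only relies on the "always [madvise] never" format of
/sys/kernel/mm/transparent_hugepage/enabled.

```python
from enum import Enum


class ThpPolicy(Enum):
    """Values of /sys/kernel/mm/transparent_hugepage/enabled."""
    ALWAYS = "always"
    MADVISE = "madvise"
    NEVER = "never"


class ProcessDefault(Enum):
    """Proposed per-process hints from this thread (hypothetical, not merged)."""
    NONE = "none"
    MADV_HUGEPAGE = "DEFAULT_MADV_HUGEPAGE"
    MADV_NOHUGEPAGE = "DEFAULT_MADV_NOHUGEPAGE"


def parse_enabled(text: str) -> ThpPolicy:
    """Return the active policy, i.e. the bracketed token in the sysfs line."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return ThpPolicy(token[1:-1])
    raise ValueError(f"no bracketed policy in {text!r}")


def effective_policy(global_policy: ThpPolicy, hint: ProcessDefault) -> ThpPolicy:
    """Model how a process behaves once the hint is applied."""
    if global_policy is ThpPolicy.NEVER:
        # Assumption: the hint does not override a global "never".
        return ThpPolicy.NEVER
    if hint is ProcessDefault.MADV_HUGEPAGE:
        # Workload owner benchmarked and wants THPs: behaves as "always".
        return ThpPolicy.ALWAYS
    if hint is ProcessDefault.MADV_NOHUGEPAGE:
        # Workload owner saw regressions: behaves as "madvise".
        return ThpPolicy.MADVISE
    return global_policy


if __name__ == "__main__":
    system = parse_enabled("always [madvise] never")
    for hint in ProcessDefault:
        print(f"{system.value} + {hint.value} -> "
              f"{effective_policy(system, hint).value}")
```

The point of the model is that the hint only moves a single process
between "always" and "madvise"; everything else on the box keeps following
the global setting.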