From: Shakeel Butt
To: Lorenzo Stoakes
Cc: Andrew Morton, "Liam R. Howlett", David Hildenbrand, Vlastimil Babka, Jann Horn, Arnd Bergmann, Christian Brauner, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, SeongJae Park, Usama Arif
Subject: Re: [RFC PATCH 0/5] add process_madvise() flags to modify behaviour
Date: Tue, 20 May 2025 11:25:05 -0700
Message-ID: <7tzfy4mmbo2utodqr5clk24mcawef5l2gwrgmnp5jmqxmhkpav@jpzaaoys6jro>

On Mon, May 19, 2025 at 09:52:37PM +0100, Lorenzo Stoakes wrote:
> REVIEWERS NOTES:
> ================
>
> This is a VERY EARLY version of the idea, it's relatively untested, and I'm
> 'putting it out there' for feedback. Any serious version of this will add a
> bunch of self-tests to assert correct behaviour and I will more carefully
> confirm everything's working.
>
> This is based on discussion arising from Usama's series [0], SJ's input on
> the thread around process_madvise() behaviour [1] (and a subsequent
> response by me [2]) and prior discussion about a new madvise() interface
> [3].
>
> [0]: https://lore.kernel.org/linux-mm/20250515133519.2779639-1-usamaarif642@gmail.com/
> [1]: https://lore.kernel.org/linux-mm/20250517162048.36347-1-sj@kernel.org/
> [2]: https://lore.kernel.org/linux-mm/e3ba284c-3cb1-42c1-a0ba-9c59374d0541@lucifer.local/
> [3]: https://lore.kernel.org/linux-mm/c390dd7e-0770-4d29-bb0e-f410ff6678e3@lucifer.local/
>
> ================
>
> Currently, we are rather restricted in how madvise() operations
> proceed. While effort has been put in to expanding what process_madvise()
> can do (that is - unrestricted application of advice to the local process
> alongside recent improvements on the efficiency of TLB operations over
> these batches), we are still constrained by existing madvise() limitations
> and default behaviours.
>
> This series makes use of the currently unused flags field in
> process_madvise() to provide more flexibility.
>
> It introduces four flags:
>
> 1. PMADV_SKIP_ERRORS
>
> Currently, when an error arises applying advice in any individual VMA
> (keeping in mind that a range specified to madvise() or as part of the
> iovec passed to process_madvise()), the operation stops where it is and
> returns an error.
>
> This might not be the desired behaviour of the user, who may wish instead
> for the operation to be 'best effort'. By setting this flag, that behaviour
> is obtained.
>
> Since process_madvise() would trivially, if skipping errors, simply return
> the input vector size, we instead return the number of entries in the
> vector which completed successfully without error.
>
> The PMADV_SKIP_ERRORS flag implicitly implies PMADV_NO_ERROR_ON_UNMAPPED.
>
> 2. PMADV_NO_ERROR_ON_UNMAPPED
>
> Currently madvise() has the peculiar behaviour of, if the range specified
> to it contains unmapped range(s), completing the full operation, but
> ultimately returning -ENOMEM.
>
> In the case of process_madvise(), this is fatal, as the operation will stop
> immediately upon this occurring.
>
> By setting PMADV_NO_ERROR_ON_UNMAPPED, the user can indicate that it wishes
> unmapped areas to simply be entirely ignored.

Why do we need PMADV_NO_ERROR_ON_UNMAPPED explicitly, and why is
PMADV_SKIP_ERRORS not enough? I don't see a need for
PMADV_NO_ERROR_ON_UNMAPPED. Do you envision a use-case where
PMADV_NO_ERROR_ON_UNMAPPED makes more sense than PMADV_SKIP_ERRORS?

>
> 3. PMADV_SET_FORK_EXEC_DEFAULT
>
> It may be desirable for a user to specify that all VMAs mapped in a process
> address space default to having an madvise() behaviour established by
> default, in such a fashion as that this persists across fork/exec.
>
> Since this is a very powerful option that would make no sense for many
> advice modes, we explicitly only permit known-safe flags here (currently
> MADV_HUGEPAGE and MADV_NOHUGEPAGE only).

The other flags seem general enough, but this one is just weird. This is
exactly the scenario for a prctl()-like interface. You are trying to make
process_madvise() like prctl(), and I can see process_madvise() being
included in all the hate that prctl() receives.

Let me ask in a different way. Eventually we want to be in a state where
hugepages work out of the box for all workloads.
In that state, what would be the need for this flag, unless you have
use-cases other than hugepages? To me, there is a general consensus that
prctl is a hacky interface, so having some intermediate solution through
prctl until hugepages are good out of the box seems more reasonable.

>
> 4. PMADV_ENTIRE_ADDRESS_SPACE
>
> It can be annoying, should a user wish to apply madvise() to all VMAs in an
> address space, to have to add a singular large entry to the input iovec.
>
> So provide sugar to permit this - PMADV_ENTIRE_ADDRESS_SPACE. If specified,
> we expect the user to pass NULL and -1 to the vec and vlen parameters
> respectively so they explicitly acknowledge that these will be ignored,
> e.g.:
>
>	process_madvise(PIDFD_SELF, NULL, -1, MADV_HUGEPAGE,
>			PMADV_ENTIRE_ADDRESS_SPACE | PMADV_SKIP_ERRORS);
>

I still don't see a need for this flag. Why not the following?

	process_madvise(PIDFD_SELF, NULL, -1, advice, PMADV_SKIP_ERRORS);
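
For anyone following the return-value discussion for PMADV_SKIP_ERRORS, here
is a rough userspace model of the semantics as the quoted cover letter
describes them. It is only a sketch and not kernel code: model_entry,
model_process_madvise() and the MODEL_PMADV_SKIP_ERRORS value are made up for
illustration, and each entry simply carries the error it would hit. Note that
the real process_madvise() reports bytes advised on success; the model counts
entries in both modes to keep the comparison simple.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical flag value for this model only; the RFC's value may differ. */
#define MODEL_PMADV_SKIP_ERRORS 0x1u

/* One modelled iovec entry: err == 0 means the advice would apply cleanly,
 * any other value stands in for the errno that entry would produce. */
struct model_entry {
	int err;
};

/*
 * Userspace model of the quoted semantics (an assumption, not the patch):
 *  - without the flag, stop at the first failing entry and return -1;
 *  - with the flag, skip failing entries and return how many succeeded.
 */
ssize_t model_process_madvise(const struct model_entry *vec, size_t vlen,
			      unsigned int flags)
{
	size_t ok = 0;

	/* A NULL vector is only acceptable when there is nothing to walk. */
	assert(vec != NULL || vlen == 0);

	for (size_t i = 0; i < vlen; i++) {
		if (vec[i].err == 0) {
			ok++;
			continue;
		}
		if (!(flags & MODEL_PMADV_SKIP_ERRORS))
			return -1;	/* strict mode: first error is fatal */
		/* best-effort mode: ignore this entry and carry on */
	}
	return (ssize_t)ok;
}
```

With four entries of which the second fails, the model returns 3 when the
skip flag is set and -1 when it is not, which is the distinction the cover
letter draws between 'best effort' and the stop-on-error behaviour.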