Date: Thu, 22 May 2025 15:12:05 +0300
From: Mike Rapoport
To: Lorenzo Stoakes
Cc: Andrew Morton, "Liam R. Howlett", David Hildenbrand, Vlastimil Babka, Jann Horn, Arnd Bergmann, Christian Brauner, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, SeongJae Park, Usama Arif, linux-api@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] add process_madvise() flags to modify behaviour

(cc'ing linux-api)

On Mon, May 19, 2025 at 09:52:37PM +0100, Lorenzo Stoakes wrote:
> REVIEWERS NOTES:
> ================
>
> This is a VERY EARLY version of the idea, it's relatively untested, and I'm
> 'putting it out there' for feedback. Any serious version of this will add a
> bunch of self-tests to assert correct behaviour and I will more carefully
> confirm everything's working.
>
> This is based on discussion arising from Usama's series [0], SJ's input on
> the thread around process_madvise() behaviour [1] (and a subsequent
> response by me [2]) and prior discussion about a new madvise() interface
> [3].
>
> [0]: https://lore.kernel.org/linux-mm/20250515133519.2779639-1-usamaarif642@gmail.com/
> [1]: https://lore.kernel.org/linux-mm/20250517162048.36347-1-sj@kernel.org/
> [2]: https://lore.kernel.org/linux-mm/e3ba284c-3cb1-42c1-a0ba-9c59374d0541@lucifer.local/
> [3]: https://lore.kernel.org/linux-mm/c390dd7e-0770-4d29-bb0e-f410ff6678e3@lucifer.local/
>
> ================
>
> Currently, we are rather restricted in how madvise() operations
> proceed. While effort has been put into expanding what process_madvise()
> can do (that is - unrestricted application of advice to the local process
> alongside recent improvements on the efficiency of TLB operations over
> these batches), we are still constrained by existing madvise() limitations
> and default behaviours.
>
> This series makes use of the currently unused flags field in
> process_madvise() to provide more flexibility.
>
> It introduces four flags:
>
> 1. PMADV_SKIP_ERRORS
>
> Currently, when an error arises applying advice in any individual VMA
> (keeping in mind that a range specified to madvise() or as part of the
> iovec passed to process_madvise() may span multiple VMAs), the operation
> stops where it is and returns an error.
>
> This might not be the desired behaviour of the user, who may wish instead
> for the operation to be 'best effort'. By setting this flag, that behaviour
> is obtained.
>
> Since process_madvise() would trivially, if skipping errors, simply return
> the input vector size, we instead return the number of entries in the
> vector which completed successfully without error.
>
> The PMADV_SKIP_ERRORS flag implies PMADV_NO_ERROR_ON_UNMAPPED.
>
> 2. PMADV_NO_ERROR_ON_UNMAPPED
>
> Currently madvise() has the peculiar behaviour of, if the range specified
> to it contains unmapped range(s), completing the full operation, but
> ultimately returning -ENOMEM.
>
> In the case of process_madvise(), this is fatal, as the operation will stop
> immediately upon this occurring.
>
> By setting PMADV_NO_ERROR_ON_UNMAPPED, the user can indicate that they wish
> unmapped areas to simply be entirely ignored.
>
> 3. PMADV_SET_FORK_EXEC_DEFAULT
>
> It may be desirable for a user to specify that all VMAs mapped in a process
> address space have an madvise() behaviour established by default, in such a
> way that this persists across fork/exec.
>
> Since this is a very powerful option that would make no sense for many
> advice modes, we explicitly only permit known-safe flags here (currently
> MADV_HUGEPAGE and MADV_NOHUGEPAGE only).
>
> 4. PMADV_ENTIRE_ADDRESS_SPACE
>
> It can be annoying, should a user wish to apply madvise() to all VMAs in an
> address space, to have to add a single large entry to the input iovec.
>
> So provide sugar to permit this - PMADV_ENTIRE_ADDRESS_SPACE.
> If specified, we expect the user to pass NULL and -1 to the vec and vlen
> parameters respectively so they explicitly acknowledge that these will be
> ignored, e.g.:
>
>     process_madvise(PIDFD_SELF, NULL, -1, MADV_HUGEPAGE,
>                     PMADV_ENTIRE_ADDRESS_SPACE | PMADV_SKIP_ERRORS);
>
> Usually a user ought to prefer setting PMADV_SKIP_ERRORS here as it may
> well be the case that incompatible VMAs will be encountered that ought to
> be skipped.
>
> If this is not set, PMADV_NO_ERROR_ON_UNMAPPED (which is otherwise implied
> by PMADV_SKIP_ERRORS) ought to be set as, of course, the entire address
> space spans at least some gaps.
>
> Lorenzo Stoakes (5):
>   mm: madvise: refactor madvise_populate()
>   mm/madvise: add PMADV_SKIP_ERRORS process_madvise() flag
>   mm/madvise: add PMADV_NO_ERROR_ON_UNMAPPED process_madvise() flag
>   mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT process_madvise() flag
>   mm/madvise: add PMADV_ENTIRE_ADDRESS_SPACE process_madvise() flag
>
>  include/uapi/asm-generic/mman-common.h |   6 +
>  mm/madvise.c                           | 206 +++++++++++++++++++------
>  2 files changed, 168 insertions(+), 44 deletions(-)
>
> --
> 2.49.0
>

-- 
Sincerely yours,
Mike.
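
For readers skimming the quoted cover letter, the error handling it describes for PMADV_SKIP_ERRORS and PMADV_NO_ERROR_ON_UNMAPPED can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the patch's code: the numeric flag values, the helper name model_process_madvise() and the per-entry result array are all hypothetical, and the model reports an entry count in every success case, which simplifies whatever the real syscall returns when no flag is set. It also assumes that an entry whose only problem is an unmapped gap counts as completed once gaps are ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical values: the cover letter names these flags, but their
 * numeric values are not shown in this thread. */
#define PMADV_SKIP_ERRORS          (1U << 0)
#define PMADV_NO_ERROR_ON_UNMAPPED (1U << 1)

/*
 * Toy model of the described semantics (not kernel code).
 * results[i] is 0 if advice on iovec entry i succeeded, or a negative
 * errno if it failed (-ENOMEM standing in for "range had unmapped gaps").
 * Returns the first fatal error, or the number of entries that completed.
 */
static long model_process_madvise(const int *results, size_t n,
                                  unsigned int flags)
{
    long completed = 0;

    assert(results != NULL || n == 0);

    /* The cover letter states that SKIP_ERRORS implies NO_ERROR_ON_UNMAPPED. */
    if (flags & PMADV_SKIP_ERRORS)
        flags |= PMADV_NO_ERROR_ON_UNMAPPED;

    for (size_t i = 0; i < n; i++) {
        int err = results[i];

        /* Assumption: an ignored gap leaves the entry counted as completed. */
        if (err == -ENOMEM && (flags & PMADV_NO_ERROR_ON_UNMAPPED))
            err = 0;

        if (err == 0) {
            completed++;
            continue;
        }

        /* Default behaviour: stop at the first error and report it. */
        if (!(flags & PMADV_SKIP_ERRORS))
            return err;

        /* Best effort: skip the failed entry and carry on. */
    }

    return completed;
}
```

With per-entry results of { 0, -EINVAL, -ENOMEM, 0 }, the model returns -EINVAL when no flag is set and 3 when PMADV_SKIP_ERRORS is set, since the failed second entry is skipped and the gap in the third is ignored.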