From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25824CE7D0B for ; Tue, 1 Oct 2024 10:21:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9C4EE280069; Tue, 1 Oct 2024 06:21:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 974D5280068; Tue, 1 Oct 2024 06:21:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 815B3280069; Tue, 1 Oct 2024 06:21:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 6226F280068 for ; Tue, 1 Oct 2024 06:21:41 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id F24131A0583 for ; Tue, 1 Oct 2024 10:21:40 +0000 (UTC) X-FDA: 82624641960.26.008E565 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf07.hostedemail.com (Postfix) with ESMTP id 4F0DC4000D for ; Tue, 1 Oct 2024 10:21:39 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="QM9h8C/G"; spf=pass (imf07.hostedemail.com: domain of brauner@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=brauner@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1727777931; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=/JDXXzENg3/WL8/Eno7mNwE9UIWH+D76i5Wwu4cdhBs=; b=C/kZVYD4Bf+5cZbhw8lkAQF9ukB66rthUfXrwzwge9841EHEwWu8GGHG+FdLYyuzSOsX6x 0JoIUv9MJuVvhAuMTLTTq6xPvIQTxnKKupz8Vraw8tOEq56VmyJY7Vbywc1ZjqjQ8K3NbE n1z3KSCIvVnbzWIdx4/3EUHF895T94o= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b="QM9h8C/G"; spf=pass (imf07.hostedemail.com: domain of brauner@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=brauner@kernel.org; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1727777931; a=rsa-sha256; cv=none; b=hIpywfjobo6O4J3P8+dMXOE2FFDUDFZnqrq+9HUH6cODWc9jHqv+0MZjt8cUpUzOAGSx0Y qlVYkv2cIO2CoN7fkzbNP+PCYUbxgPF0M+7bIeLu+KK62qhL6XmBSJo3QBI7685mwljDAw fDY1F0mPnIvBrSMy0eQbSPV3AHvQJRI= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 1BC825C05CF; Tue, 1 Oct 2024 10:21:34 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 97CD6C4CEC6; Tue, 1 Oct 2024 10:21:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1727778097; bh=wluuHAc01SS7mkJT/FzCZ0xt1SHsHqpNeSN0BbjLliM=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=QM9h8C/GJ9l7//lxMgAEa90ssI+3WGddnIEmrHtARmkQEtO56Uxd/uLcbM4n5/EP+ AYhGc4Uf2yuqbeDqUsMbushkHpWyS4KRnmevv0iE8zk4zXVZtDSq6qu+6obXHdnCG3 P2Bji66wBL1ymFhNLHlP6Y1aeFWNseR88fv7dSLzChRJUhOgnIirqqZoUnlWozHkTy ZBgjJvOCjLWlhLJgj0EYIcF5+eajsgMUqC6Km3HpYTrTUTuWsgII5SCdX261eRHiVN 8CnoT2556iVH08CoKr5RvwyWiHdBw6DNU+qGEfKLGVZWCcbRBzWPAXvTgK3PEqB7Wt 98SQtQRTWEKMA== Date: Tue, 1 Oct 2024 12:21:32 +0200 From: Christian Brauner To: Lorenzo Stoakes Cc: Oleg Nesterov , Aleksa Sarai , Florian Weimer , Christian Brauner , Shuah Khan , "Liam R . Howlett" , Suren Baghdasaryan , Vlastimil Babka , pedro.falcato@gmail.com, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 0/3] introduce PIDFD_SELF Message-ID: <20241001-stapfen-darfst-5fe2a8b2c6ec@brauner> References: <87ttdxl9ch.fsf@oldenburg.str.redhat.com> <42df57ac-d89c-4111-a04d-290dd2197573@lucifer.local> <20240930-verbiegen-zinspolitik-cafb730c3c84@brauner> <20240930.141721-salted.birth.growing.forges-5Z29YNO700C@cyphar.com> <8b1b376b-3c4f-409a-b8e4-8faf3efecdc8@lucifer.local> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <8b1b376b-3c4f-409a-b8e4-8faf3efecdc8@lucifer.local> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 4F0DC4000D X-Stat-Signature: w8ijx85kfnpiz8muis1xhbaq8f85gmwi X-Rspam-User: X-HE-Tag: 1727778099-412864 X-HE-Meta: U2FsdGVkX1/hERqFDJceBzWQAT3a1LnXPsBkGHl3R0bwcNvWqMmO+nFdZoRAshnrYyuXYEFDBHvNi8J8o63WDC9XnyHlDvRgMgxsrzBFFXtAsBRvvv5xu9Rrhr+ckVfwRNI7AejUgMuRsRnmAfPe63U4TtCVbvmNUkHuETV8k+ZXvw1LDK29Zy7usZ3PficqIX+lyFQgQsyN6cvS6Bvecbhf0Orn3KvWq6lpWVNfW2zWzT066KRx8LNlWrPqmeI+tvV9Pzor31FNyyln65bbFvm0yGYqHDWM4WEEGHB54RW+LUIT3oxnY1avfIS+sfZ5LsuTiPfTDgPVZ0HVD3tWBCznRHuMKw7Ov7MdtsJDOrb8i+Ej8SqXuKpS5GsgjXejkwhVZiaCGpGG40Kzo/N8GZ9SalUTINyS5WtjK3be1hhBF2vdfBXuUoleUhX/Ciy/c3CTEnYjKYMP+nmqqzgAiG0oO5KmKPdydgP5KH2ALsMGoI+zbyoUDDBuHfQW3KFW7U3m7/Soqlu6GWBHNqGMEmcpgt+sQPmGXvJrwrQppi2DaZkCajnpPK+55Dwg8WDD/TiMO26hq+0m0xgMSGS5nU+6dZ6XrbdSv4QCyFmzB1FhB/TTzAFI9ok7hSuTcZMc4QDXeEdKeMZg/oIN6oE9lXj54BifeR1NQGu2oFDNKzaomAqUBHQN6fq9Bp2DA8dw4PwOZnHNIXJIe09jDLjjGjeOYkOxEY9Hu/zbwQVEnw/06Vke7Ll0OLM0ikbMtwovG2l2dgpWh5XrSYEYx/5gs736etbcOgJPLu22boU8ga9+I+FsJh7tCDVqldLiSJr+orEbjNkkZhWHPdvNbZ1RE3pWfjWaYFPHjBOZkz6zpsCpIdal5DqNVjvvgao+GEEKZNsbCDao8tVHgsXomOuLnny5t/T0csU/K4lohi2w1W90cHDlpaGJS9v2qqwyT6QZCwfqAjtAAUqn8MH4kr6 jmOD6YDP 2sB5qToFSHv91HgyD8jhi5jJqPXp5TNtHtwGoQiefAuKNUFz3k8l0vXSbN5jD9VrtokBJEF95VLgvI24MpmbAFiE6/nFQ4JxQ7EW5rNPokq0O7xa+e96BN1/9CMfPTFJ3IP58f8wNY3fUGINvsgf0IjtA9AFey8hmmiQrRtApEOyCc/uI1Ms2rwjQQ68QGlP5JE9ZPIJkxSL6ZMPJsdYdjr3zYgr26XiweqGDdZ046AiyEt8gGyxPEvcaSDx+Rr8XFl+9nyAErwNiz0ZDUsD9mdO2WY66NPjUg2XD+DEP2f8sWuuHZW6quyhjeeHhCyZ4t3/Kt1AZQOvyUjf/bBSiCXvvJqa3p0O+URdPMjCBI1wtrf3lKUsKqNncGqgYcjUzXUneOkoYHlMDri7LO2tPrObqgO/GWw5mMyftnlan9YViNTbBqGpwQ8wjysodmsDIXxzR X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Sep 30, 2024 at 03:32:25PM GMT, Lorenzo Stoakes wrote: > On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote: > > On 2024-09-30, Lorenzo Stoakes wrote: > > > On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote: > > > > On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote: > > > > > On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote: > > > > > > * Lorenzo Stoakes: > > > > > > > > > > > > > If you wish to utilise a pidfd interface to refer to the current process > > > > > > > (from the point of view of userland - from the kernel point of view - the > > > > > > > thread group leader), it is rather cumbersome, requiring something like: > > > > > > > > > > > > > > int pidfd = pidfd_open(getpid(), 0); > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > close(pidfd); > > > > > > > > > > > > > > Or the equivalent call opening /proc/self. It is more convenient to use a > > > > > > > sentinel value to indicate to an interface that accepts a pidfd that we > > > > > > > simply wish to refer to the current process. > > > > > > > > > > > > The descriptor will refer to the current thread, not process, right? > > > > > > > > > > No it refers to the current process (i.e. thread group leader from kernel > > > > > perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above. > > > > > > > > > > > > > > > > > The distinction matters for pidfd_getfd if a process contains multiple > > > > > > threads with different file descriptor tables, and probably for > > > > > > pidfd_send_signal as well. > > > > > > > > > > You mean if you did a strange set of flags to clone()? Otherwise these are > > > > > shared right? > > > > > > > > > > Again, we are explicitly looking at process not thread from userland > > > > > perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try > > > > > to implement that. > > > > > > > > Florian raises a good point. Currently we have: > > > > > > > > (1) int pidfd_tgid = pidfd_open(getpid(), 0); > > > > (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD); > > > > > > > > and this instructs: > > > > > > > > pidfd_send_signal() > > > > pidfd_getfd() > > > > > > > > to do different things. For pidfd_send_signal() it's whether the > > > > operation has thread-group scope or thread-scope for pidfd_send_signal() > > > > and for pidfd_getfd() it determines the fdtable to use. > > > > > > > > The thing is that if you pass: > > > > > > > > pidfd_getfd(PDIFD_SELF) > > > > > > > > and you have: > > > > > > > > TGID > > > > > > > > T1 { > > > > clone(CLONE_THREAD) > > > > unshare(CLONE_FILES) > > > > } > > > > > > > > T2 { > > > > clone(CLONE_THREAD) > > > > unshare(CLONE_FILES) > > > > } > > > > > > > > You have 3 threads in the same thread-group that all have distinct file > > > > descriptor tables from each other. > > > > > > > > So if T1 did: > > > > > > > > pidfd_getfd(PIDFD_SELF, ...) > > > > > > > > and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect > > > > to get the fd from its file descriptor table. IOW, its reasonable to > > > > expect that T1 is interested in their very own resource, not someone > > > > else's even if it is the thread-group leader. > > > > > > > > But what T1 will get in reality is an fd from TGID's file descriptor > > > > table (and similar for T2). > > > > > > > > Iirc, yes that confusion exists already with /proc/self. But the > > > > question is whether we should add the same confusion to the pidfd api or > > > > whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual > > > > calling thread. > > > > > > > > My thinking is that if you have the reasonable suspicion that you're > > > > multi-threaded and that you're interested in the thread-group resource > > > > then you should be using: > > > > > > > > int pidfd = pidfd_open(getpid(), 0) > > > > > > > > and hand that thread-group leader pidfd around since you're interested > > > > in another thread. But if you're really just interested in your own > > > > resource then pidfd_open(getpid(), 0) makes no sense and you would want > > > > PIDFD_SELF. > > > > > > > > Thoughts? > > > > > > I mean from my perspective, my aim is to get current->mm for > > > process_madvise() so both work for me :) however you both raise a very good > > > point here (sorry Florian, perhaps I was a little too dismissive as to your > > > point, you're absolutely right). > > > > > > My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0) > > > behaviour, but you and Florian make a strong case that you'd _probably_ > > > find this very confusing had you unshared in this fashion. > > > > > > I mean in general this confusion already exists, and is for what > > > PIDFD_THREAD was created, but I suspect ideally if you could go back you > > > might actually do this by default Christian + let the TGL behaviour be the > > > optional thing? > > > > > > For most users this will not be an issue, but for those they'd get the same > > > result whichever they used, but yes actually I think you're both right - > > > PIDFD_SELF should in effect imply PIDFD_THREAD. > > > > Funnily enough we ran into issues with this when running Go code in runc > > that did precisely this -- /proc/self gave you the wrong fd table in > > very specific circumstances that were annoying to debug. For languages > > with green-threading you can't turn off (like Go) these kinds of issues > > pop up surprisingly often. > > Yeah, damn, useful insight that such things do happen in the wild. > > > > > > We can adjust the pidfd_send_signal() call to infer the correct scope > > > (actually nicely we can do that without any change there, by having > > > __pidfd_get_pid() set f_flags accordingly). > > > > > > So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread. > > > > > > My question in return here then is - should we introduce PIDFD_SELF_PROCESS > > > also (do advise if you feel this naming isn't quite right) - to provide > > > thread group leader behaviour? > > > > Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe > > they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for > > current's tid)? In principle I guess users might use PIDFD_SELF by > > accident but if we mirror the naming with /proc/{,thread-}self that > > might not be that big of an issue? > > Lol, you know I wasn't even aware /proc/thread-self existed... Wait until you learn that /proc/$TID thread entries exist but aren't shown when you do ls -al /proc, only when you explicitly access them. > > Yeah, that actually makes sense and is consistent, though obviously the > concern is people will reflexively use PIDFD_SELF and end up with > potentially confusing results. > > I will obviously be doing a manpage patch for this so we can spell it out > there very clearly and also in the header - so I'd actually lean towards > doing this myself. > > Christian, Florian? Thoughts? I think adding both would be potentially useful. How about: #define PIDFD_SELF -100 #define PIDFD_THREAD_GROUP -200 This will make PIDFD_SELF mean current and PIDFD_THREAD_GROUP mean current->pid_links[PIDTYPE_TGID]. I don't think we need to or should mirror procfs in any way. pidfds are intended to be usable without procfs at all. I want to leave one comment on a bit of quirkiness in the api when we add this. I don't consider it a big deal but it should be pointed out. It can be useful to allow PIDFD_SELF or PIDFD_THREAD_GROUP to refer to the calling thread even for pidfd_open() to avoid an additional getpid() system call: (1) pidfd_open(PIDFD_SELF, PIDFD_THREAD) (2) pidfd_open(PIDFD_SELF, 0) So if we allow this (Should we allow it?) then (1) is just redundant but whathever. But (2) is at least worth discussing: Should we reject (2) on the grounds of contradictory requests or allow it and document that it's equivalent to pidfd_open(getpid(), PIDFD_THREAD)? It feels like the latter would be ok. Similar for pidfd_send_signal(): // redundant but ok: (1) pidfd_send_signal(PIDFD_SELF, PIDFD_SIGNAL_THREAD) // redundant but ok: (2) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD_GROUP) // weird way to spell pidfd_send_signal(PIDFD_THREAD_GROUP, 0) (3) pidfd_send_signal(PIDFD_SELF, PIDFD_SIGNAL_THREAD_GROUP) // weird way to spell pidfd_send_signal(PIDFD_SELF, 0) (4) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD) I think all of this is ok but does anyone else have a strong opinion?