From: Pratyush Yadav
To: Jason Gunthorpe
CC: Christian Brauner, Linus Torvalds, Jonathan Corbet, Eric Biederman,
    Arnd Bergmann, Greg Kroah-Hartman, Alexander Viro, Jan Kara,
    Hugh Dickins, Alexander Graf, Benjamin Herrenschmidt,
    David Woodhouse, James Gowans, Mike Rapoport, Paolo Bonzini,
    Pasha Tatashin, Anthony Yznaga, Dave Hansen, David Hildenbrand,
    Matthew Wilcox, Wei Yang, Andrew Morton
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
In-Reply-To: <20250320121459.GS9311@nvidia.com>
References: <20250307005830.65293-2-ptyadav@amazon.de>
    <20250307-sachte-stolz-18d43ffea782@brauner>
    <20250309-unerwartet-alufolie-96aae4d20e38@brauner>
    <20250317165905.GN9311@nvidia.com>
    <20250318-toppen-elfmal-968565e93e69@brauner>
    <20250318145707.GX9311@nvidia.com>
    <20250318232727.GF9311@nvidia.com>
    <20250320121459.GS9311@nvidia.com>
Date: Wed, 26 Mar 2025 22:40:29 +0000

On Thu, Mar 20 2025, Jason Gunthorpe wrote:

> On Wed, Mar 19, 2025 at 01:35:31PM +0000, Pratyush Yadav wrote:
>> On Tue, Mar 18 2025, Jason Gunthorpe wrote:
>>
>> > On Tue, Mar 18, 2025 at 11:02:31PM +0000, Pratyush Yadav wrote:
>> >
>> >> I suppose we can serialize all FDs when the box is sealed and get rid of
>> >> the struct file. If kexec fails, userspace can unseal the box, and FDs
>> >> will be deserialized into a new struct file. This way, the behaviour
>> >> from userspace perspective also stays the same regardless of whether
>> >> kexec went through or not. This also helps tie FDBox closer to KHO.
>> >
>> > I don't think we can do a proper de-serialization without going
>> > through kexec. The new stuff Mike is posting for preserving memory
>> > will not work like that.
>>
>> Why not? If the next kernel can restore the file from the serialized
>> content, so can the current kernel. What stops this from working with
>> the new memory preservation scheme (which I assume is the idea you
>> proposed in [0])?
>
> It is because the current kernel does not destroy the struct page
> before the kexec and the new kernel assumes a zero'd fresh struct page
> at restore.
>
> So it would be very easy to corrupt the struct page information if you
> attempt to deserialize without going through the kexec step.
>
> There would be a big risk of getting things like refcounts out of
> sync.

Ideally, kho_preserve_folio() should be similar to freeing the folio,
except that it doesn't go to buddy for re-allocation. In that case,
re-using those pages should not be a problem as long as the driver made
sure the page was properly "freed", and there are no stale references to
it. They should be doing that anyway since they should make sure the
file doesn't change after it has been serialized.

Doing that might be easier said than done though. On a quick look, most
of the clearing of struct page seems to be happening in
free_pages_prepare(). This is usually followed by free_one_page(), which
gives the page back to buddy.
Though I am not sure how much sense it would make to use
free_pages_prepare() outside of the page free path. I need to look
deeper...

>
> Then you have the issue that I don't actually imagine shutting down
> something like iommufd, I was intending to leave it frozen in place
> with all its allocations and so on. If you try to de-serialize you
> can't de-serialize into the thing that is frozen, you'd create a new
> one from empty. Now you have two things pointing at the same stuff,
> what a mess.

What do you mean by "frozen in place"? Isn't that the same as being
serialized? Considering that we want to make sure a file is not opened
by any process before we serialize it, what do we get by keeping the
struct file around (assuming we can safely deserialize it without going
through kexec)?

>
>> The seal operation does bulk serialize/deserialize for _one_ box. You
>> can have multiple boxes and distribute your FDs in the boxes based on
>> the serialize or deserialize order you want. Userspace decides when to
>> seal or unseal a particular box, which gives it full control over the
>> order in which things happen.
>
> Why have more than one box? What is the point? I've been thinking we
> should just have a KHO control char dev FD for serializing and you can
> do all the operations people have been talking about in sysfs, as well
> as record FDs for serializing.

The main idea is logical grouping and dependency management. If some FDs
have a dependency between them, grouping them in different boxes makes
it easy to let userspace choose the order of operations, but still have
a way to make sure all dependencies are met when the FDs are serialized.
Similarly, on the deserialize side, this ensures that all dependent FDs
are deserialized together.

[...]

-- 
Regards,
Pratyush Yadav