Date: Sat, 26 Aug 2023 03:28:52 +0100
From: Al Viro
To: Jan Kara
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	Christoph Hellwig, Alasdair Kergon, Andrew Morton, Anna Schumaker,
	Chao Yu, Christian Borntraeger, "Darrick J. Wong", Dave Kleikamp,
	David Sterba, dm-devel@redhat.com, drbd-dev@lists.linbit.com,
	Gao Xiang, Jack Wang, Jaegeuk Kim,
	jfs-discussion@lists.sourceforge.net, Joern Engel, Joseph Qi,
	Kent Overstreet, linux-bcache@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-erofs@lists.ozlabs.org,
	linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net,
	linux-mm@kvack.org, linux-mtd@lists.infradead.org,
	linux-nfs@vger.kernel.org, linux-nilfs@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-pm@vger.kernel.org,
	linux-raid@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-xfs@vger.kernel.org,
	"Md. Haris Iqbal", Mike Snitzer, Minchan Kim,
	ocfs2-devel@oss.oracle.com, reiserfs-devel@vger.kernel.org,
	Sergey Senozhatsky, Song Liu, Sven Schnelle,
	target-devel@vger.kernel.org, Ted Tso, Trond Myklebust,
	xen-devel@lists.xenproject.org, Jens Axboe, Christian Brauner
Subject: Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
Message-ID: <20230826022852.GO3390869@ZenIV>
References: <20230810171429.31759-1-jack@suse.cz>
	<20230825015843.GB95084@ZenIV>
	<20230825134756.o3wpq6bogndukn53@quack3>
In-Reply-To: <20230825134756.o3wpq6bogndukn53@quack3>

On Fri, Aug 25, 2023 at 03:47:56PM +0200, Jan Kara wrote:

> I can see the appeal of not having to introduce the new bdev_handle type
> and just using struct file which unifies in-kernel and userspace block
> device opens.
> But I can see downsides too - the last fput() happening from
> task work makes me a bit nervous whether it will not break something
> somewhere with exclusive bdev opens. Getting from struct file to bdev is
> somewhat harder but I guess a helper like F_BDEV() would solve that just
> fine.
>
> So besides my last fput() worry about I think this could work and would be
> probably a bit nicer than what I have. But before going and redoing the whole
> series let me gather some more feedback so that we don't go back and forth.
> Christoph, Christian, Jens, any opinion?

Redoing is not an issue - it can be done on top of your series just
as well.  Async behaviour of fput() might be, but...  need to look
through the actual users; for a lot of them it's perfectly fine.

FWIW, from a cursory look there appears to be a missing primitive: take
an opened bdev (or bdev_handle, with your variant, or opened file if we
go that way eventually) and claim it.

I mean, look at claim_swapfile() for example:

	p->bdev = blkdev_get_by_dev(inode->i_rdev,
			   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
	if (IS_ERR(p->bdev)) {
		error = PTR_ERR(p->bdev);
		p->bdev = NULL;
		return error;
	}
	p->old_block_size = block_size(p->bdev);
	error = set_blocksize(p->bdev, PAGE_SIZE);
	if (error < 0)
		return error;

we already have the file opened, and we keep it opened all the way until
the swapoff(2); here we have noticed that it's a block device and we
	* open the fucker again (by device number), this time claiming
	  it with our swap_info_struct as holder, to be closed at
	  swapoff(2) time (just before we close the file)
	* flip the block size to PAGE_SIZE, to be reverted at swapoff(2)
	  time
That really looks like it ought to be
	* take the opened file, see that it's a block device
	* try to claim it with that holder
	* on success, flip the block size
with filp_close() in the swapoff(2) (or failure exit path in swapon(2))
doing what it would've done for an O_EXCL opened block device.
The only difference from O_EXCL userland open is that here we would
end up with holder pointing not to struct file in question, but to our
swap_info_struct.  It will do the right thing.

This extra open is entirely due to "well, we need to claim it and the
primitive that does that happens to be tied to opening"; feels rather
counter-intuitive.

For that matter, we could add an explicit "unclaim" primitive - might
be easier to follow.  That would add another example where that could
be used - in blkdev_bszset() we have an opened block device (it's an
ioctl, after all), we want to change block size and we *really* don't
want to have that happen under a mounted filesystem.  So if it's not
opened exclusive, we do a temporary exclusive open of our own and act
on that instead.  Might as well go for a temporary claim...

BTW, what happens if two threads call ioctl(fd, BLKBSZSET, &n)
for the same descriptor that happens to have been opened O_EXCL?
Without O_EXCL they would've been unable to claim the sucker at the same
time - the holder we are using is the address of a function argument,
i.e. something that points to kernel stack of the caller.  Those would
conflict and we either get set_blocksize() calls fully serialized, or
one of the callers would eat -EBUSY.  Not so in "opened with O_EXCL"
case - they can very well overlap and IIRC set_blocksize() does *not*
expect that kind of crap...  It's all under CAP_SYS_ADMIN, so it's not
as if it was a meaningful security hole anyway, but it does look fishy.
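
To make the claim/unclaim idea above concrete, here is a minimal
userspace toy model, not kernel code.  Every name in it (toy_bdev,
toy_swap_info, toy_claim(), toy_unclaim(), toy_claim_swapfile(),
toy_release_swapfile()) is hypothetical and made up for illustration,
and the "same holder may claim again, a different holder gets -EBUSY"
rule is an assumption about how such a primitive might behave; locking
is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a block device; only the fields this model needs. */
struct toy_bdev {
	void *holder;			/* exclusive holder, NULL if unclaimed */
	int holders;			/* number of claims by that holder */
	unsigned int block_size;
};

/* Toy stand-in for swap_info_struct. */
struct toy_swap_info {
	struct toy_bdev *bdev;
	unsigned int old_block_size;
};

/*
 * Claim an already opened device without opening it again.
 * Assumption: a repeated claim by the same holder succeeds,
 * a claim by a different holder fails with -EBUSY.
 */
static int toy_claim(struct toy_bdev *bdev, void *holder)
{
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->holders++;
	return 0;
}

/* Explicit "unclaim": drop one claim, clear the holder on the last one. */
static void toy_unclaim(struct toy_bdev *bdev, void *holder)
{
	assert(bdev->holder == holder && bdev->holders > 0);
	if (--bdev->holders == 0)
		bdev->holder = NULL;
}

/* swapon-like path: claim the opened device, then flip the block size. */
static int toy_claim_swapfile(struct toy_swap_info *p, struct toy_bdev *bdev,
			      unsigned int page_size)
{
	int error = toy_claim(bdev, p);

	if (error)
		return error;
	p->bdev = bdev;
	p->old_block_size = bdev->block_size;
	bdev->block_size = page_size;
	return 0;
}

/* swapoff-like path: revert the block size, then drop the claim. */
static void toy_release_swapfile(struct toy_swap_info *p)
{
	p->bdev->block_size = p->old_block_size;
	toy_unclaim(p->bdev, p);
	p->bdev = NULL;
}
```

In this model two callers that use distinct holders (say, the addresses
of their own stack variables) cannot both hold the device, so one of
them sees -EBUSY.  Two callers sharing one holder both succeed, so the
claim alone does nothing to serialize them; that is the shape of the
BLKBSZSET question above.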