From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BC63C63777 for ; Tue, 24 Nov 2020 17:00:23 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 673B5206F9 for ; Tue, 24 Nov 2020 17:00:22 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="pkS1IsRK" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 673B5206F9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7E7C46B0070; Tue, 24 Nov 2020 12:00:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7975F6B0071; Tue, 24 Nov 2020 12:00:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6AD4B6B0073; Tue, 24 Nov 2020 12:00:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0167.hostedemail.com [216.40.44.167]) by kanga.kvack.org (Postfix) with ESMTP id 460DE6B0070 for ; Tue, 24 Nov 2020 12:00:21 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 48F73362B for ; Tue, 24 Nov 2020 17:00:18 +0000 (UTC) X-FDA: 77519924916.07.lace41_520c5a92736f Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin07.hostedemail.com (Postfix) with ESMTP id 0B39E1803F9A5 for ; Tue, 24 Nov 2020 17:00:18 +0000 (UTC) X-HE-Tag: lace41_520c5a92736f X-Filterd-Recvd-Size: 7125 Received: from mail-qt1-f194.google.com (mail-qt1-f194.google.com [209.85.160.194]) by imf24.hostedemail.com (Postfix) with ESMTP for ; Tue, 24 Nov 2020 17:00:17 +0000 (UTC) Received: by mail-qt1-f194.google.com with SMTP id v11so16515361qtq.12 for ; Tue, 24 Nov 2020 09:00:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=fWmahOo+BxCVV9NEre8zWFBYKORZfhSvfmLWizAk7Sk=; b=pkS1IsRKMFFQKIiQOFY8WLSirKTKy5vlf5vBnIHJQYY8QvEC/15BTkG00rjnGztDPS MPnwXECAwJubj26/AKIoLsOqJXTzH20co1dwm6n/ihZPA8R/GncgR5Nn7BAW2WaJHH5L tpbCDHrNvMRBQDAisHHKNo7eGDqcuHeaFIgkH8ZC9CSiQc7xH6mkLcfA+jtEEIACMHfa u3OVfgqj6hqatE6LBJWSWupUvafe+983boP6OtbOOXJy3CJYXaRV2xsgTTuY8gHl3o7O iFEoPfxU2lYaOEolDJw8+Asj3AIZ8QN2eniPlD554d4692r1CYqrwtcGwNWS1nLFzZNO sY1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=fWmahOo+BxCVV9NEre8zWFBYKORZfhSvfmLWizAk7Sk=; b=kwGa6wmuQW54Dfp8US9qP1345rQ/FRbyGXUFIjaZcMAm+KjO0u5x4TiewJYG5hr3qg 7EcgJBQdZALBssEXY3GLAlnWNwv5xLLjECy5N41wM/k+0Ifg50BeZFbHVoRAK/CYTwpQ qwS14Qa4BSY5q5L7ZkV8CvE9hU8ip5Mz1vnZJYzB/sxtbJv/+etmNwfzqeaQZB8iz9kT XakkqW+L+LBa31TMij7qLUfF/p6QRUe1zYuolH7MLu+yb0ZU/pKxQ3McM4But+hfdTrR Rb21AeQsBKMtazn/adiPN2VZU18eKIMeZA/U8DmVLd/jHo8+4kkdbP48Zmhaus/sllfj xl8g== X-Gm-Message-State: AOAM530tr4ZCSDbVnWyeOOA4SPwN/Qsizgsgh70/Wjpat73FyNPk9rFI e8YcqDFGv/w+/1D4cEPjzsM= X-Google-Smtp-Source: ABdhPJw9lXoUdb+9nSlmHwoziLxai+Rt3I7zgIJ1oTBqcvjWZC4/0fG7FN8qIJjp6NzaYQdsfu0bWw== X-Received: by 2002:a05:622a:18d:: with SMTP id s13mr5279508qtw.306.1606237211875; Tue, 24 Nov 2020 09:00:11 -0800 (PST) Received: from localhost (dhcp-6c-ae-f6-dc-d8-61.cpe.echoes.net. [72.28.8.195]) by smtp.gmail.com with ESMTPSA id q20sm13543500qke.0.2020.11.24.09.00.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Nov 2020 09:00:11 -0800 (PST) Date: Tue, 24 Nov 2020 11:59:49 -0500 From: Tejun Heo To: Christoph Hellwig Cc: Jens Axboe , Josef Bacik , Konrad Rzeszutek Wilk , Coly Li , Mike Snitzer , dm-devel@redhat.com, Richard Weinberger , Jan Kara , linux-block@vger.kernel.org, xen-devel@lists.xenproject.org, linux-bcache@vger.kernel.org, linux-mtd@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 11/20] block: reference struct block_device from struct hd_struct Message-ID: References: <20201118084800.2339180-1-hch@lst.de> <20201118084800.2339180-12-hch@lst.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201118084800.2339180-12-hch@lst.de> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hello, This is great. So much simpler & better. Some nits below. > diff --git a/block/partitions/core.c b/block/partitions/core.c > index a02e224115943d..0ba0bf44b88af3 100644 > --- a/block/partitions/core.c > +++ b/block/partitions/core.c > @@ -340,12 +340,11 @@ void delete_partition(struct hd_struct *part) > device_del(part_to_dev(part)); > > /* > - * Remove gendisk pointer from idr so that it cannot be looked up > - * while RCU period before freeing gendisk is running to prevent > - * use-after-free issues. Note that the device number stays > - * "in-use" until we really free the gendisk. > + * Remove the block device from the inode hash, so that it cannot be > + * looked up while waiting for the RCU grace period. > */ > - blk_invalidate_devt(part_devt(part)); > + remove_inode_hash(part->bdev->bd_inode); I don't think this is necessary now that the bdev and inode lifetimes are one. Before, punching out the association early was necessary because we could be in a situation where we can successfully look up a part from idr and then try to pin the associated disk which may already be freed. With the new code, the lookup is through the inode whose lifetime is one and the same with gendisk, so use-after-free isn't possible and __blkdev_get() will reliably reject such open attempts. ... > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 4c4d6c30382c06..e94633dc6ad93b 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -870,34 +870,50 @@ void __init bdev_cache_init(void) > blockdev_superblock = bd_mnt->mnt_sb; /* For writeback */ > } > > -static struct block_device *bdget(dev_t dev) > +struct block_device *bdev_alloc(struct gendisk *disk, u8 partno) > { > struct block_device *bdev; > struct inode *inode; > > - inode = iget_locked(blockdev_superblock, dev); > + inode = new_inode(blockdev_superblock); > if (!inode) > return NULL; > > - bdev = &BDEV_I(inode)->bdev; > + bdev = I_BDEV(inode); > + spin_lock_init(&bdev->bd_size_lock); > + bdev->bd_disk = disk; > + bdev->bd_partno = partno; > + bdev->bd_contains = NULL; > + bdev->bd_super = NULL; > + bdev->bd_inode = inode; > + bdev->bd_part_count = 0; > + > + inode->i_mode = S_IFBLK; > + inode->i_rdev = 0; > + inode->i_bdev = bdev; > + inode->i_data.a_ops = &def_blk_aops; Missing the call to mapping_set_gfp_mask(). > > - if (inode->i_state & I_NEW) { > - spin_lock_init(&bdev->bd_size_lock); > - bdev->bd_contains = NULL; > - bdev->bd_super = NULL; > - bdev->bd_inode = inode; > - bdev->bd_part_count = 0; > - bdev->bd_dev = dev; > - inode->i_mode = S_IFBLK; > - inode->i_rdev = dev; > - inode->i_bdev = bdev; > - inode->i_data.a_ops = &def_blk_aops; > - mapping_set_gfp_mask(&inode->i_data, GFP_USER); > - unlock_new_inode(inode); > - } > return bdev; > } ... > /** > * bdgrab -- Grab a reference to an already referenced block device > * @bdev: Block device to grab a reference to. > @@ -957,6 +973,10 @@ static struct block_device *bd_acquire(struct inode *inode) > bd_forget(inode); > > bdev = bdget(inode->i_rdev); > + if (!bdev) { > + blk_request_module(inode->i_rdev); > + bdev = bdget(inode->i_rdev); > + } One side effect here is that, before, a device which uses the traditional consecutive devt range would reserve all minors for the partitions whether they exist or not and fail open requests without requesting the matching module. After the change, trying to open an non-existent partition would trigger module probe. I don't think the behavior change is consequential in any sane not-crazily-arcane setup but it might be worth mentioning in the commit log. Thank you. -- tejun