From: Jens Axboe
Date: Mon, 19 Aug 2024 14:01:21 -0600
Subject: Re: [RFC 5/5] block: implement io_uring discard cmd
To: Ming Lei
Cc: Pavel Begunkov, io-uring@vger.kernel.org, Conrad Meyer,
 linux-block@vger.kernel.org, linux-mm@kvack.org
References: <6ecd7ab3386f63f1656dc766c1b5b038ff5353c2.1723601134.git.asml.silence@gmail.com>
 <4d016a30-d258-4d0e-b3bc-18bf0bd48e32@kernel.dk>

On 8/15/24 7:45 PM, Ming Lei wrote:
> On Thu, Aug 15, 2024 at 07:24:16PM -0600, Jens Axboe wrote:
>> On 8/15/24 5:44 PM, Ming Lei wrote:
>>> On Thu, Aug 15, 2024 at 06:11:13PM +0100, Pavel Begunkov wrote:
>>>> On 8/15/24 15:33, Jens Axboe wrote:
>>>>> On 8/14/24 7:42 PM, Ming Lei wrote:
>>>>>> On Wed, Aug 14, 2024 at 6:46 PM Pavel Begunkov wrote:
>>>>>>>
>>>>>>> Add ->uring_cmd callback for block device files and use it to implement
>>>>>>> asynchronous discard. Normally, it first tries to execute the command
>>>>>>> from non-blocking context, which we limit to a single bio because
>>>>>>> otherwise one of sub-bios may need to wait for other bios, and we don't
>>>>>>> want to deal with partial IO. If non-blocking attempt fails, we'll retry
>>>>>>> it in a blocking context.
>>>>>>>
>>>>>>> Suggested-by: Conrad Meyer
>>>>>>> Signed-off-by: Pavel Begunkov
>>>>>>> ---
>>>>>>>  block/blk.h             |  1 +
>>>>>>>  block/fops.c            |  2 +
>>>>>>>  block/ioctl.c           | 94 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>  include/uapi/linux/fs.h |  2 +
>>>>>>>  4 files changed, 99 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/blk.h b/block/blk.h
>>>>>>> index e180863f918b..5178c5ba6852 100644
>>>>>>> --- a/block/blk.h
>>>>>>> +++ b/block/blk.h
>>>>>>> @@ -571,6 +571,7 @@ blk_mode_t file_to_blk_mode(struct file *file);
>>>>>>>  int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
>>>>>>>  		loff_t lstart, loff_t lend);
>>>>>>>  long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
>>>>>>> +int blkdev_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
>>>>>>>  long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
>>>>>>>
>>>>>>>  extern const struct address_space_operations def_blk_aops;
>>>>>>> diff --git a/block/fops.c b/block/fops.c
>>>>>>> index 9825c1713a49..8154b10b5abf 100644
>>>>>>> --- a/block/fops.c
>>>>>>> +++ b/block/fops.c
>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>  #include
>>>>>>>  #include
>>>>>>>  #include
>>>>>>> +#include
>>>>>>>  #include "blk.h"
>>>>>>>
>>>>>>>  static inline struct inode *bdev_file_inode(struct file *file)
>>>>>>> @@ -873,6 +874,7 @@ const struct file_operations def_blk_fops = {
>>>>>>>  	.splice_read = filemap_splice_read,
>>>>>>>  	.splice_write = iter_file_splice_write,
>>>>>>>  	.fallocate = blkdev_fallocate,
>>>>>>> +	.uring_cmd = blkdev_uring_cmd,
>>>>>>
>>>>>> Just be curious, we have IORING_OP_FALLOCATE already for sending
>>>>>> discard to block device, why is .uring_cmd added for this purpose?
>>>>
>>>> Which is a good question, I haven't thought about it, but I tend to
>>>> agree with Jens. Because vfs_fallocate is created synchronous
>>>> IORING_OP_FALLOCATE is slow for anything but pretty large requests.
>>>> Probably can be patched up, which would involve changing the
>>>> fops->fallocate protot, but I'm not sure async there makes sense
>>>> outside of bdev (?), and cmd approach is simpler, can be made
>>>> somewhat more efficient (1 less layer in the way), and it's not
>>>> really something completely new since we have it in ioctl.
>>>
>>> Yeah, we have ioctl(DISCARD), which acquires filemap_invalidate_lock,
>>> same with blkdev_fallocate().
>>>
>>> But this patch drops this exclusive lock, so it becomes async friendly,
>>> but may cause stale page cache. However, if the lock is required, it can't
>>> be efficient anymore and io-wq may be inevitable, :-)
>>
>> If you want to grab the lock, you can still opportunistically grab it.
>> For (by far) the common case, you'll get it, and you can still do it
>> inline.
>
> If the lock is grabbed in the whole cmd lifetime, it is basically one sync
> interface cause there is at most one async discard cmd in-flight for each
> device.

Oh for sure, you could not do that anyway as you'd be holding a lock
across the syscall boundary, which isn't allowed.

> Meantime the handling has to move to io-wq for avoiding to block current
> context, the interface becomes same with IORING_OP_FALLOCATE?

I think the current truncate is overkill, we should be able to get by
without. And no, I will not entertain an option that's "oh just punt it
to io-wq".

-- 
Jens Axboe
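
For anyone following the thread, the "opportunistically grab it" idea
discussed above can be sketched in user space as a trylock fast path with
a blocking retry. This is only an illustration under stated assumptions,
not code from the patch: dev_lock, discard_try_inline(),
discard_blocking() and discard_issue() are hypothetical names, a pthread
mutex stands in for the kernel's invalidate lock, and a counter stands in
for the real discard work. The lock is always dropped before each
function returns, so it is never held across a call boundary.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a per-device invalidate lock. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the real work: counts completed "discards". */
static int discards_done;

/*
 * Non-blocking attempt: take the lock only if it is free right now.
 * Returns 0 on success, -EAGAIN if the lock was contended.
 */
static int discard_try_inline(void)
{
	if (pthread_mutex_trylock(&dev_lock) != 0)
		return -EAGAIN;		/* contended: never sleep here */

	discards_done++;		/* short critical section */
	pthread_mutex_unlock(&dev_lock);
	return 0;
}

/* Blocking retry: allowed to sleep while waiting for the lock. */
static int discard_blocking(void)
{
	int rc = pthread_mutex_lock(&dev_lock);

	assert(rc == 0);
	discards_done++;
	pthread_mutex_unlock(&dev_lock);
	return 0;
}

/* Try inline first; only fall back to the blocking path on contention. */
static int discard_issue(void)
{
	int ret = discard_try_inline();

	if (ret == -EAGAIN)
		ret = discard_blocking();
	return ret;
}
```

Build with -pthread. In the uncontended case discard_issue() completes
entirely on the inline path; the blocking path only runs when some other
holder has dev_lock at that moment.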