From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3A2DC5B543 for ; Fri, 30 May 2025 13:43:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6A7F66B0123; Fri, 30 May 2025 09:43:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 657C16B0124; Fri, 30 May 2025 09:43:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 546896B0125; Fri, 30 May 2025 09:43:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 361036B0123 for ; Fri, 30 May 2025 09:43:58 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id BE57FC08DE for ; Fri, 30 May 2025 13:43:57 +0000 (UTC) X-FDA: 83499692514.12.5BA2D88 Received: from mail-ej1-f47.google.com (mail-ej1-f47.google.com [209.85.218.47]) by imf08.hostedemail.com (Postfix) with ESMTP id D131B16000A for ; Fri, 30 May 2025 13:43:55 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=EaZN1YJA; spf=pass (imf08.hostedemail.com: domain of amir73il@gmail.com designates 209.85.218.47 as permitted sender) smtp.mailfrom=amir73il@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1748612635; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cg27jRbqgT6Vx96cOZmKzT7mRTBBHTh/rcABmd3nHl0=; b=TKxjuPY/liRCP33uu2rzV6et0iPnp2UrcEdNGHIuk4LMQLQbXInb0VPrsRbg98Eyyh8pwq Bw1mgKFo9tzCphisv0OqrP+afgkbpREJp1Vm5Yi2NbMH7KYyVC5d+9nK200DdG6D6J5cnh CkdEvqhrmOWsqsAbPqJCUl8GcDTr0/k= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=EaZN1YJA; spf=pass (imf08.hostedemail.com: domain of amir73il@gmail.com designates 209.85.218.47 as permitted sender) smtp.mailfrom=amir73il@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1748612635; a=rsa-sha256; cv=none; b=L6J2V6suf19wEu8SlBDECKk1psYcOA2hvohtEAb+lohIaAl2hfLuRez8zVKQ1WgPUs35Lr RBoLWvebosfjZkWqfLwBLDaYC22RL17guTS3a1W8aRvnaMa+9527HRzjQiU9U96T+lJ0Gs 1CdfxRqWSzXt2ehKm4j+t0+TaqXPP8A= Received: by mail-ej1-f47.google.com with SMTP id a640c23a62f3a-acae7e7587dso303553566b.2 for ; Fri, 30 May 2025 06:43:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748612634; x=1749217434; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=cg27jRbqgT6Vx96cOZmKzT7mRTBBHTh/rcABmd3nHl0=; b=EaZN1YJAqeCCTJJHfXN2iMRl+1lrrKNFFdEORE+yksdpz04jlVycTCHXakMd3o8AM5 os4igqt9rKA6vfO84Lztc7sQLKXKHqa6/5EPSuuiEn5LXmkoECEA3myNUCslHwWlPNSj KFLsOyT3upRg3h3vme/WgTfpDyFbQ1zATCptWSNoI1zjW9EUNiI/mCLZwZkEQhBlXkki tD4IMUJf+sLeYt3OmcMMAs7IFKFxeE8IgA4VzomrtZN1gn7uMqmGDHuvYiC1OvHvYyi0 mfDB/hfXblHSrlg8RFGs7xLXbZyE4uNFaVgzukr7BHT0G4dQH5X2AjSLBJ+pbiPW5O5U 45HA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748612634; x=1749217434; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cg27jRbqgT6Vx96cOZmKzT7mRTBBHTh/rcABmd3nHl0=; b=B3VNBK7TOzzwp08Mn2qY5XETpQQCsWrgKJE0A0B6pHSKtsloD/wefl41dO+cMp6tZP OCTXuNgf6bKl2Zcx3xq8IH+WnIaE8ZgJN4e9VJnI/nt9nP7FMY66SblZ59dLjvl2cujY pQ5bgBharN5QPwsyAZF0EbbkwM1JzqsfNDwzioPR0+VNE2BWoJBgIxjZP2wGvbxHatQf mi3yA3Zl+QMh5lCf2oc9WcMstl/ZVkID8ZI1yrJi2bKR+a09w3288EHwbN8nf9ZVtTFt 08z3Ha5XNdMZVmhi3KfhonlmHbPJOn83ffLu2u+IZmVKNCpuyoVZAMxeRly1dOWTAeIP +0eA== X-Forwarded-Encrypted: i=1; AJvYcCWhNKZjoDFGSIcHU4otiI7MMn2aFIeiMewm5C0DjuHiih+UrCu2FtjaBeACwG3HPzPxCxMkQ8lpAg==@kvack.org X-Gm-Message-State: AOJu0YyyWgEdi30muGm0auQOi00PEtUaKKWyvNmWbrCgndgFT50qqAhn 1EsApoMDIK9bYf8p5RO17RweNz+lPPOC/AdKaWDrj0jxdu/I32agA/crIKkITIKS3UBOUI1t4mA 7z3yfJtFk92E1x6XiHtfcSniXfVBTFc/G4S/NBi56Kw== X-Gm-Gg: ASbGncvEgHLQSJUSp2nqGfxL9T6f8EplOltZVgtfXY74CSWE7eEWlWbtSlXiF+r6tL9 pqIFRHrLLXm5ptlfWACfUKE32e/q+2OkLz9iONrSYeS8/v7lXn2ZDrogGs1Rlkjm6v1iDZWMX3c b+49pdy1nCNoi/6CgqbvjvpdDSnksjKEZu X-Google-Smtp-Source: AGHT+IHvBpjR/endSRQUt5FTF0eDLYH+Kdk5Qc8W51rWZR6eg4HtuFzDot8VeE8MsEjFbuxlldLzfxh7KxymP/Pqh8M= X-Received: by 2002:a17:907:9282:b0:ad8:959c:c55d with SMTP id a640c23a62f3a-adb322457f2mr313580966b.2.1748612634069; Fri, 30 May 2025 06:43:54 -0700 (PDT) MIME-Version: 1.0 References: <20250530103941.11092-1-tao.wangtao@honor.com> <20250530103941.11092-2-tao.wangtao@honor.com> In-Reply-To: <20250530103941.11092-2-tao.wangtao@honor.com> From: Amir Goldstein Date: Fri, 30 May 2025 15:43:43 +0200 X-Gm-Features: AX0GCFshwnygoYp9lqAc5QflEKgeVEHCIAtDwsKnm5KS2y3iB6_sX14JgNpIns4 Message-ID: Subject: Re: [PATCH v3 1/4] fs: allow cross-FS copy_file_range for memory-backed files To: wangtao Cc: sumit.semwal@linaro.org, christian.koenig@amd.com, kraxel@redhat.com, vivek.kasireddy@intel.com, viro@zeniv.linux.org.uk, brauner@kernel.org, hughd@google.com, akpm@linux-foundation.org, benjamin.gaignard@collabora.com, Brian.Starkey@arm.com, jstultz@google.com, tjmercier@google.com, jack@suse.cz, baolin.wang@linux.alibaba.com, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, bintian.wang@honor.com, yipengxiang@honor.com, liulu.liu@honor.com, feng.han@honor.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: D131B16000A X-Stat-Signature: rtfiba8m4zq6fpdbtmbmmuw5iuy1rwb9 X-Rspam-User: X-Rspamd-Server: rspam07 X-HE-Tag: 1748612635-874014 X-HE-Meta: U2FsdGVkX1/I6D1MhVTh7QOhXT94gPQn5Omq0OsoFeze11uEKNHJTFjP0JJ9+xSoexrTes5K2ZJOpVYexJTB7qQVGgybZtTzA5OdYVj4Zx6xcnsg/wZ/xSDbn637wciIYtR3AoJx2nAJMbpjM0dInrwked5kUtZCb789kUod5Qa1ebf3CmVVYi5YuarCygZ1OuyEGu08UPG8cnTmAFzmps3IrxkgJcgTvx/RB/4ClVnqRNw/rI9nEUbDx5uCxzg6Lk02rHgWwEMMG01/17Lm/ZIM0BL8I73tCGo677mPF08FiK3kReLfxBnPXbrEMpRfLHqi7ypAv4HAQN3WWIUKSQ3Sw4bN8g8cpQ6l3SdH2KMv/qeeGFEctmig7FByf9AldwpV1G1QPX5TG2uC8zJZ//EKpl47NUFJXCD9TGUEybFGb7WW/BGgt+NqKeUu8TJf5+/1LbyvBTI2X9qExGlzVsvOqJ1i085/xOrWNmP7q95PtDvsEzLqoirWUKD3qhbVXbsqg+0tNZmmDxKVkOeBQALWqfzSFC2J+2OQ6DgnZMozqJfZ60Ex+R5ba1HcpkxBPm4uFi7pPg+oKRXhBlOZHDC2O0n8f4Xui/7YiuMvc7f6b3VoMkWJQWUONebs1aKw/sk+RQJ/PZYXRtCX8fsULYHy1pEaXEq0Zewlj3YQh5vvi1Ab++S6n6g7vYrrU9bbggHrcij/J2Xhr9IzUk5iT8QdL8VOj7E1JYK+1UVLCtm9ttUh0tVTGWLOVW7UkishZWnxdM2sugJvVuuIHkjykb1vTi1iBpJcjJu2eBIWFt85G9QWzoVCjhhHAkHplHjIhtDcZOUqQswLYojoTksrvS0Kmnl2Zq2HrU/tHfG7jaQRX7l2Pq3H/Rpe2xDCt4WIo/l/IE4bZBYObPC3MepZA+jrj8c9SfjUdkvwynAW2i4wlhtwptu7JoxRfkr/nT/Au2xIgBYPnSB9yGsHzzw MvxouKcR 4wU8awBfL4NAKjkW2vO56um6InvoPlKCkkGF2LHxF//ruKRvvPD6m+qMkUzaVjd0E/ekZmDhxRSGO+pSrDGfz3KMdt0e8n3y8TJW4hZF6arQ9ZRbbr7Ul3GjVcuCnc2b0tGyZy5+2Tgd7f0zkFM01eaQA6BdkwgwSB/r87WAEl2ow7abmtVulYMcKgeyR113hDSXV4g1KYUaTDhMF8n0IpuptIe6kmW28smuM0OR2LtIGnde5MSuDtlm2Euo92auT9847mQCLwMUeKrVqR2YiEy1k4VPp/8m7gxrFQvVrkyd5xgvPtf1gHIVLO3Nbg1inni1kSCfI2NE61GzXSEOPZutEBDaT9ISo1YJXDeiiLIVe77UEpo0U37uQpI5Hv2CMvcHTPIIdZiV2NcO1ADVpt3ykAbZFnW/ux/5cjj+qTuY+37g= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, May 30, 2025 at 12:40=E2=80=AFPM wangtao wr= ote: > > Memory-backed files can optimize copy performance via > copy_file_range callbacks. Compared to mmap&read: reduces > GUP (get_user_pages) overhead; vs sendfile/splice: eliminates > one memory copy; supports dmabuf zero-copy implementation. > > Signed-off-by: wangtao > --- Hi wangtao, Let me rephrase my reaction gently: Regardless of the proposed API extension, and referring to the implementation itself - I wrote the current code and I can barely understand the logic every time I come back to read it. Your changes make it too messy to be reviewed/maintained. I have a few suggestions for simplifications (see below). The complication arise from the fact that normally the destination fops are used to call the copy_range() method and this case is a deviation from this standard, we need to make the cope simpler using a helper to choose the fops to use. > fs/read_write.c | 71 +++++++++++++++++++++++++++++++++------------- > include/linux/fs.h | 2 ++ > 2 files changed, 54 insertions(+), 19 deletions(-) > > diff --git a/fs/read_write.c b/fs/read_write.c > index bb0ed26a0b3a..591c6db7b785 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1469,6 +1469,20 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, in= t, in_fd, > } > #endif > > +static inline bool is_copy_memory_file_to_file(struct file *file_in, > + struct file *file_out) > +{ > + return (file_in->f_op->fop_flags & FOP_MEMORY_FILE) && > + file_in->f_op->copy_file_range && file_out->f_op->write_i= ter; > +} > + > +static inline bool is_copy_file_to_memory_file(struct file *file_in, > + struct file *file_out) > +{ > + return (file_out->f_op->fop_flags & FOP_MEMORY_FILE) && > + file_in->f_op->read_iter && file_out->f_op->copy_file_ran= ge; > +} > + Please combine that to a helper: const struct file_operations *memory_copy_file_ops(struct file *file_in, struct file *file_out) which returns either file_in->f_op, file_out->f_op or NULL. > /* > * Performs necessary checks before doing a file copy > * > @@ -1484,11 +1498,23 @@ static int generic_copy_file_checks(struct file *= file_in, loff_t pos_in, > struct inode *inode_out =3D file_inode(file_out); > uint64_t count =3D *req_count; > loff_t size_in; > + bool splice =3D flags & COPY_FILE_SPLICE; > + bool has_memory_file; > int ret; > > - ret =3D generic_file_rw_checks(file_in, file_out); > - if (ret) > - return ret; > + /* Skip generic checks, allow cross-sb copies for dma-buf/tmpfs *= / > + has_memory_file =3D is_copy_memory_file_to_file(file_in, file_out= ) || > + is_copy_file_to_memory_file(file_in, file_out); > + if (!splice && has_memory_file) { > + if (!(file_in->f_mode & FMODE_READ) || > + !(file_out->f_mode & FMODE_WRITE) || > + (file_out->f_flags & O_APPEND)) > + return -EBADF; I don't like that this both duplicates code and does not check all the chec= ks in generic_file_rw_checks() for the non-memory file side. I do not currently have a suggestion how to make this less messy, more human readable and correct. > + } else { > + ret =3D generic_file_rw_checks(file_in, file_out); > + if (ret) > + return ret; > + } > > /* > * We allow some filesystems to handle cross sb copy, but passing > @@ -1500,7 +1526,7 @@ static int generic_copy_file_checks(struct file *fi= le_in, loff_t pos_in, > * and several different sets of file_operations, but they all en= d up > * using the same ->copy_file_range() function pointer. > */ > - if (flags & COPY_FILE_SPLICE) { > + if (splice || has_memory_file) { > /* cross sb splice is allowed */ This comment is not accurate - it should say "cross fs-type", because it sk= ips the test that compares the ->copy_file_range pointers, not the sb. If you are making changes to this function, this should be clarified. > } else if (file_out->f_op->copy_file_range) { > if (file_in->f_op->copy_file_range !=3D > @@ -1581,23 +1607,30 @@ ssize_t vfs_copy_file_range(struct file *file_in,= loff_t pos_in, > * same sb using clone, but for filesystems where both clone and = copy > * are supported (e.g. nfs,cifs), we only call the copy method. > */ > - if (!splice && file_out->f_op->copy_file_range) { > - ret =3D file_out->f_op->copy_file_range(file_in, pos_in, > - file_out, pos_out, > - len, flags); > - } else if (!splice && file_in->f_op->remap_file_range && samesb) = { > - ret =3D file_in->f_op->remap_file_range(file_in, pos_in, > - file_out, pos_out, > - min_t(loff_t, MAX_RW_COUNT, len), > - REMAP_FILE_CAN_SHORTEN); > - /* fallback to splice */ > - if (ret <=3D 0) > + if (!splice) { > + if (is_copy_memory_file_to_file(file_in, file_out)) { > + ret =3D file_in->f_op->copy_file_range(file_in, p= os_in, > + file_out, pos_out, len, flags); > + } else if (is_copy_file_to_memory_file(file_in, file_out)= ) { > + ret =3D file_out->f_op->copy_file_range(file_in, = pos_in, > + file_out, pos_out, len, flags); > + } else if (file_out->f_op->copy_file_range) { > + ret =3D file_out->f_op->copy_file_range(file_in, = pos_in, > + file_out, pos_out= , > + len, flags); > + } else if (file_in->f_op->remap_file_range && samesb) { > + ret =3D file_in->f_op->remap_file_range(file_in, = pos_in, > + file_out, pos_out, > + min_t(loff_t, MAX_RW_COUNT, len), > + REMAP_FILE_CAN_SHORTEN); > + /* fallback to splice */ > + if (ret <=3D 0) > + splice =3D true; > + } else if (samesb) { > + /* Fallback to splice for same sb copy for backwa= rd compat */ > splice =3D true; > - } else if (samesb) { > - /* Fallback to splice for same sb copy for backward compa= t */ > - splice =3D true; > + } This is the way-too-ugly-to-live part. Was pretty bad before and now it is unacceptable. way too much unneeded nesting and too hard to follow. First of all, please use splice label and call goto splice (before the comm= ent) instead of adding another nesting level. I would do that even if not adding memory fd copy support. Second, please store result of mem_fops =3D memory_copy_file_ops() and start with: + if (mem_fops) { + ret =3D mem_fops->copy_file_range(file_in, pos_in, + file_out, pos_out, + len, flags)= ; + } else if (file_out->f_op->copy_file_range) { ... and update the comment above to say that: + * For copy to/from memory fd, we alway call the copy method of the memory= fd. */ I think that makes the code not more ugly than it already is. Thanks, Amir.