From: Ilya Dryomov
Date: Sun, 25 Jun 2023 14:20:26 +0200
Subject: Re: [PATCH net-next v5 03/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage
To: David Howells
Cc: netdev@vger.kernel.org, Alexander Duyck, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiubo Li, Jeff Layton, ceph-devel@vger.kernel.org
References: <20230623225513.2732256-1-dhowells@redhat.com> <20230623225513.2732256-4-dhowells@redhat.com>
In-Reply-To: <20230623225513.2732256-4-dhowells@redhat.com>

On Sat, Jun 24, 2023 at 12:55 AM David Howells wrote:
>
> Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when
> transmitting data.  For the moment, this can only transmit one page at a
> time because of the architecture of net/ceph/, but if
> write_partial_message_data() can be given a bvec[] at a time by the
> iteration code, this would allow pages to be sent in a batch.
>
> Signed-off-by: David Howells
> cc: Ilya Dryomov
> cc: Xiubo Li
> cc: Jeff Layton
> cc: "David S. Miller"
> cc: Eric Dumazet
> cc: Jakub Kicinski
> cc: Paolo Abeni
> cc: Jens Axboe
> cc: Matthew Wilcox
> cc: ceph-devel@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
>
> Notes:
>     ver #5)
>      - Switch condition for setting MSG_MORE in write_partial_message_data()
>
>  net/ceph/messenger_v1.c | 60 ++++++++++++++---------------------------
>  1 file changed, 20 insertions(+), 40 deletions(-)
>
> diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
> index d664cb1593a7..814579f27f04 100644
> --- a/net/ceph/messenger_v1.c
> +++ b/net/ceph/messenger_v1.c
> @@ -74,37 +74,6 @@ static int ceph_tcp_sendmsg(struct socket *sock, struct kvec *iov,
>          return r;
>  }
>
> -/*
> - * @more: either or both of MSG_MORE and MSG_SENDPAGE_NOTLAST
> - */
> -static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
> -                             int offset, size_t size, int more)
> -{
> -        ssize_t (*sendpage)(struct socket *sock, struct page *page,
> -                            int offset, size_t size, int flags);
> -        int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
> -        int ret;
> -
> -        /*
> -         * sendpage cannot properly handle pages with page_count == 0,
> -         * we need to fall back to sendmsg if that's the case.
> -         *
> -         * Same goes for slab pages: skb_can_coalesce() allows
> -         * coalescing neighboring slab objects into a single frag which
> -         * triggers one of hardened usercopy checks.
> -         */
> -        if (sendpage_ok(page))
> -                sendpage = sock->ops->sendpage;
> -        else
> -                sendpage = sock_no_sendpage;
> -
> -        ret = sendpage(sock, page, offset, size, flags);
> -        if (ret == -EAGAIN)
> -                ret = 0;
> -
> -        return ret;
> -}
> -
>  static void con_out_kvec_reset(struct ceph_connection *con)
>  {
>          BUG_ON(con->v1.out_skip);
> @@ -464,7 +433,6 @@ static int write_partial_message_data(struct ceph_connection *con)
>          struct ceph_msg *msg = con->out_msg;
>          struct ceph_msg_data_cursor *cursor = &msg->cursor;
>          bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
> -        int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
>          u32 crc;
>
>          dout("%s %p msg %p\n", __func__, con, msg);
> @@ -482,6 +450,10 @@ static int write_partial_message_data(struct ceph_connection *con)
>           */
>          crc = do_datacrc ? le32_to_cpu(msg->footer.data_crc) : 0;
>          while (cursor->total_resid) {
> +                struct bio_vec bvec;
> +                struct msghdr msghdr = {
> +                        .msg_flags = MSG_SPLICE_PAGES,

Hi David,

This appears to be losing the MSG_DONTWAIT | MSG_NOSIGNAL flags which
were set previously?

> +                };
>                  struct page *page;
>                  size_t page_offset;
>                  size_t length;
> @@ -493,10 +465,13 @@ static int write_partial_message_data(struct ceph_connection *con)
>                  }
>
>                  page = ceph_msg_data_next(cursor, &page_offset, &length);
> -                if (length == cursor->total_resid)
> -                        more = MSG_MORE;
> -                ret = ceph_tcp_sendpage(con->sock, page, page_offset, length,
> -                                        more);
> +                if (length != cursor->total_resid)
> +                        msghdr.msg_flags |= MSG_MORE;
> +
> +                bvec_set_page(&bvec, page, length, page_offset);
> +                iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, length);
> +
> +                ret = sock_sendmsg(con->sock, &msghdr);
>                  if (ret <= 0) {

And this is losing the munging of -EAGAIN -> 0?

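To make the two points above concrete, here is a rough userspace analogue
of what the removed helper did with its flags and return value.  This is
only a sketch, not net/ceph code: send_nowait() is a made-up name, it uses
plain send(2) on a file descriptor instead of sock_sendmsg(), and
MSG_SPLICE_PAGES has no userspace equivalent so it is left out.  The idea
it illustrates is that the send never blocks and never raises SIGPIPE, and
that "would block" is reported as 0 (no progress yet) rather than as an
error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical userspace analogue of the removed ceph_tcp_sendpage()
 * wrapper (illustration only, not kernel code).
 *
 * Returns the number of bytes queued, 0 if the socket would block,
 * or a negative errno value on a real error.
 */
static ssize_t send_nowait(int fd, const void *buf, size_t len, int more)
{
	/* Never block, never raise SIGPIPE; optionally hint more data. */
	int flags = MSG_DONTWAIT | MSG_NOSIGNAL | (more ? MSG_MORE : 0);
	ssize_t ret;

	assert(buf != NULL);

	ret = send(fd, buf, len, flags);
	if (ret < 0) {
		/* A full send buffer is "no progress", not a failure. */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0;
		return -errno;
	}
	return ret;
}
```

With this shape a caller can keep a single "ret <= 0" check: 0 means try
again once the socket is writable, and a negative value is a genuine
failure.  As far as I can tell that is the contract the two callers above
rely on, which is why dropping the -EAGAIN mapping looks like a behaviour
change.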
>                          if (do_datacrc)
>                                  msg->footer.data_crc = cpu_to_le32(crc);
> @@ -526,7 +501,10 @@ static int write_partial_message_data(struct ceph_connection *con)
>   */
>  static int write_partial_skip(struct ceph_connection *con)
>  {
> -        int more = MSG_MORE | MSG_SENDPAGE_NOTLAST;
> +        struct bio_vec bvec;
> +        struct msghdr msghdr = {
> +                .msg_flags = MSG_SPLICE_PAGES | MSG_MORE,
> +        };
>          int ret;
>
>          dout("%s %p %d left\n", __func__, con, con->v1.out_skip);
> @@ -534,9 +512,11 @@ static int write_partial_skip(struct ceph_connection *con)
>                  size_t size = min(con->v1.out_skip, (int)PAGE_SIZE);
>
>                  if (size == con->v1.out_skip)
> -                        more = MSG_MORE;
> -                ret = ceph_tcp_sendpage(con->sock, ceph_zero_page, 0, size,
> -                                        more);
> +                        msghdr.msg_flags &= ~MSG_MORE;
> +                bvec_set_page(&bvec, ZERO_PAGE(0), size, 0);
> +                iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
> +
> +                ret = sock_sendmsg(con->sock, &msghdr);
>                  if (ret <= 0)

Same here...  I would suggest that you keep the ceph_tcp_sendpage()
function and make only minimal modifications to avoid regressions.

Thanks,

                Ilya