From: Ilya Dryomov <idryomov@gmail.com>
Date: Sun, 25 Jun 2023 14:34:46 +0200
Subject: Re: [PATCH net-next v5 04/16] ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
To: David Howells
Cc: netdev@vger.kernel.org, Alexander Duyck, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn, David Ahern, Matthew Wilcox, Jens Axboe, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiubo Li, Jeff Layton, ceph-devel@vger.kernel.org
In-Reply-To: <20230623225513.2732256-5-dhowells@redhat.com>
References: <20230623225513.2732256-1-dhowells@redhat.com> <20230623225513.2732256-5-dhowells@redhat.com>
On Sat, Jun 24, 2023 at 12:55 AM David Howells wrote:
>
> Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when
> transmitting data.  For the moment, this can only transmit one page at a
> time because of the architecture of net/ceph/, but if
> write_partial_message_data() can be given a bvec[] at a time by the

Hi David,

write_partial_message_data() is net/ceph/messenger_v1.c specific, so it
doesn't apply here.  I would suggest squashing the two net/ceph patches
into one since even the titles are the same.  Also, we tend to use
"libceph: " prefix for net/ceph changes.

> iteration code, this would allow pages to be sent in a batch.
>
> Signed-off-by: David Howells
> cc: Ilya Dryomov
> cc: Xiubo Li
> cc: Jeff Layton
> cc: "David S. Miller"
> cc: Eric Dumazet
> cc: Jakub Kicinski
> cc: Paolo Abeni
> cc: Jens Axboe
> cc: Matthew Wilcox
> cc: ceph-devel@vger.kernel.org
> cc: netdev@vger.kernel.org
> ---
>  net/ceph/messenger_v2.c | 91 +++++++++--------------------------------
>  1 file changed, 19 insertions(+), 72 deletions(-)
>
> diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
> index 301a991dc6a6..87ac97073e75 100644
> --- a/net/ceph/messenger_v2.c
> +++ b/net/ceph/messenger_v2.c
> @@ -117,91 +117,38 @@ static int ceph_tcp_recv(struct ceph_connection *con)
>                 return ret;
>  }
>
> -static int do_sendmsg(struct socket *sock, struct iov_iter *it)
> -{
> -       struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS };
> -       int ret;
> -
> -       msg.msg_iter = *it;
> -       while (iov_iter_count(it)) {
> -               ret = sock_sendmsg(sock, &msg);
> -               if (ret <= 0) {
> -                       if (ret == -EAGAIN)
> -                               ret = 0;
> -                       return ret;
> -               }
> -
> -               iov_iter_advance(it, ret);
> -       }
> -
> -       WARN_ON(msg_data_left(&msg));
> -       return 1;
> -}
> -
> -static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
> -{
> -       struct msghdr msg = { .msg_flags = CEPH_MSG_FLAGS };
> -       struct bio_vec bv;
> -       int ret;
> -
> -       if (WARN_ON(!iov_iter_is_bvec(it)))
> -               return -EINVAL;
> -
> -       while (iov_iter_count(it)) {
> -               /* iov_iter_iovec() for ITER_BVEC */
> -               bvec_set_page(&bv, it->bvec->bv_page,
> -                             min(iov_iter_count(it),
> -                                 it->bvec->bv_len - it->iov_offset),
> -                             it->bvec->bv_offset + it->iov_offset);
> -
> -               /*
> -                * sendpage cannot properly handle pages with
> -                * page_count == 0, we need to fall back to sendmsg if
> -                * that's the case.
> -                *
> -                * Same goes for slab pages: skb_can_coalesce() allows
> -                * coalescing neighboring slab objects into a single frag
> -                * which triggers one of hardened usercopy checks.
> -                */
> -               if (sendpage_ok(bv.bv_page)) {
> -                       ret = sock->ops->sendpage(sock, bv.bv_page,
> -                                                 bv.bv_offset, bv.bv_len,
> -                                                 CEPH_MSG_FLAGS);
> -               } else {
> -                       iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bv, 1, bv.bv_len);
> -                       ret = sock_sendmsg(sock, &msg);
> -               }
> -               if (ret <= 0) {
> -                       if (ret == -EAGAIN)
> -                               ret = 0;
> -                       return ret;
> -               }
> -
> -               iov_iter_advance(it, ret);
> -       }
> -
> -       return 1;
> -}
> -
>  /*
>   * Write as much as possible.  The socket is expected to be corked,
> - * so we don't bother with MSG_MORE/MSG_SENDPAGE_NOTLAST here.
> + * so we don't bother with MSG_MORE here.
>   *
>   * Return:
> - *   1 - done, nothing (else) to write
> + *  >0 - done, nothing (else) to write

It would be nice to avoid making tweaks like this to the outer
interface as part of switching to a new internal API.

>   *   0 - socket is full, need to wait
>   *  <0 - error
>   */
>  static int ceph_tcp_send(struct ceph_connection *con)
>  {
> +       struct msghdr msg = {
> +               .msg_iter = con->v2.out_iter,
> +               .msg_flags = CEPH_MSG_FLAGS,
> +       };
>         int ret;
>
> +       if (WARN_ON(!iov_iter_is_bvec(&con->v2.out_iter)))
> +               return -EINVAL;

Previously, this WARN_ON + error applied only to the "try sendpage"
path.  There is a ton of kvec usage in net/ceph/messenger_v2.c, so I'm
pretty sure that placing it here breaks everything.

> +
> +       if (con->v2.out_iter_sendpage)
> +               msg.msg_flags |= MSG_SPLICE_PAGES;
> +
>         dout("%s con %p have %zu try_sendpage %d\n", __func__, con,
>              iov_iter_count(&con->v2.out_iter), con->v2.out_iter_sendpage);
> -       if (con->v2.out_iter_sendpage)
> -               ret = do_try_sendpage(con->sock, &con->v2.out_iter);
> -       else
> -               ret = do_sendmsg(con->sock, &con->v2.out_iter);
> +
> +       ret = sock_sendmsg(con->sock, &msg);
> +       if (ret > 0)
> +               iov_iter_advance(&con->v2.out_iter, ret);
> +       else if (ret == -EAGAIN)
> +               ret = 0;

Hrm, is sock_sendmsg() now guaranteed to exhaust the iterator (i.e.
a "short write" is no longer possible)?  Unless that is the case, this
is not an equivalent transformation.  This is actually the reason for

> *   1 - done, nothing (else) to write

specification which you also tweaked.  It doesn't make sense for
ceph_tcp_send() to return the number of bytes sent because the caller
expects everything to be sent when a positive number is returned.

Thanks,

                Ilya