Message-ID: <246ba813-698b-8696-7f4d-400034a3380b@redhat.com>
Date: Mon, 23 Jan 2023 12:28:40 +0100
Subject: Re: [PATCH v7 2/8] iov_iter: Add a function to extract a page list from an iterator
To: David Howells, Al Viro, Christoph Hellwig
Cc: Matthew Wilcox, Jens Axboe, Jan Kara, Jeff Layton, Logan Gunthorpe,
 linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Christoph Hellwig, John Hubbard,
 linux-mm@kvack.org
References: <20230120175556.3556978-1-dhowells@redhat.com> <20230120175556.3556978-3-dhowells@redhat.com>
From: David Hildenbrand
Organization: Red Hat
In-Reply-To: <20230120175556.3556978-3-dhowells@redhat.com>

On 20.01.23 18:55, David Howells wrote:
> Add a function, iov_iter_extract_pages(), to extract a list of pages from
> an iterator.  The pages may be returned with a reference added or a pin
> added or neither, depending on the type of iterator and the direction of
> transfer.  The caller must pass FOLL_READ_FROM_MEM or FOLL_WRITE_TO_MEM
> as part of gup_flags to indicate how the iterator contents are to be used.
>
> Add a second function, iov_iter_extract_mode(), to determine how the
> cleanup should be done.
>
> There are three cases:
>
>  (1) Transfer *into* an ITER_IOVEC or ITER_UBUF iterator.
>
>      Extracted pages will have pins obtained on them (but not references)
>      so that fork() doesn't CoW the pages incorrectly whilst the I/O is in
>      progress.
>
>      iov_iter_extract_mode() will return FOLL_PIN for this case.  The
>      caller should use something like unpin_user_page() to dispose of the
>      page.
>
>  (2) Transfer is *out of* an ITER_IOVEC or ITER_UBUF iterator.
>
>      Extracted pages will have references obtained on them, but not pins.
>
>      iov_iter_extract_mode() will return FOLL_GET.  The caller should use
>      something like put_page() for page disposal.
>
>  (3) Any other sort of iterator.
>
>      No refs or pins are obtained on the page, the assumption is made that
>      the caller will manage page retention.  ITER_ALLOW_P2PDMA is not
>      permitted.
>
>      iov_iter_extract_mode() will return 0.  The pages don't need
>      additional disposal.
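
For illustration only (this is not part of the patch): a minimal userspace
sketch of the cleanup dispatch that the three cases above imply for a caller.
Everything prefixed mock_/MOCK_ is hypothetical, including the flag values and
the counters that stand in for put_page() and unpin_user_page(); it only models
the decision made by iov_iter_extract_mode() as described in the commit
message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for this mock; NOT taken from the kernel headers. */
#define MOCK_FOLL_GET 0x1u
#define MOCK_FOLL_PIN 0x2u

/* Hypothetical direction tag mirroring ITER_DEST / ITER_SOURCE. */
enum mock_dir { MOCK_ITER_DEST = 0, MOCK_ITER_SOURCE = 1 };

/* Hypothetical, heavily reduced stand-in for struct iov_iter. */
struct mock_iter {
	bool user_backed;          /* true for ITER_IOVEC / ITER_UBUF */
	enum mock_dir data_source; /* SOURCE: data is read out of the buffer */
};

/* Hypothetical page with counters standing in for refcount and pincount. */
struct mock_page {
	int refs;
	int pins;
};

/* Same decision as the quoted macro: 0 for non-user-backed iterators,
 * "get" when the buffer is the data source, "pin" when it is the
 * destination. */
static unsigned int mock_extract_mode(const struct mock_iter *iter)
{
	if (!iter->user_backed)
		return 0;
	return iter->data_source == MOCK_ITER_SOURCE ?
		MOCK_FOLL_GET : MOCK_FOLL_PIN;
}

/* Cleanup a caller would perform once the I/O has completed. */
static void mock_release_pages(struct mock_page *pages, size_t n,
			       unsigned int mode)
{
	for (size_t i = 0; i < n; i++) {
		if (mode & MOCK_FOLL_PIN)
			pages[i].pins--;  /* stands in for unpin_user_page() */
		else if (mode & MOCK_FOLL_GET)
			pages[i].refs--;  /* stands in for put_page() */
		/* mode == 0: nothing was taken, nothing to drop */
	}
}
```

A caller would query the mode once per iterator, keep it alongside the
extracted page array, and hand both to the release helper on completion.
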
>
> Signed-off-by: David Howells
> cc: Al Viro
> cc: Christoph Hellwig
> cc: John Hubbard
> cc: Matthew Wilcox
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> Link: https://lore.kernel.org/r/166920903885.1461876.692029808682876184.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/166997421646.9475.14837976344157464997.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/167305163883.1521586.10777155475378874823.stgit@warthog.procyon.org.uk/ # v4
> Link: https://lore.kernel.org/r/167344728530.2425628.9613910866466387722.stgit@warthog.procyon.org.uk/ # v5
> Link: https://lore.kernel.org/r/167391053207.2311931.16398133457201442907.stgit@warthog.procyon.org.uk/ # v6
> ---
>
> Notes:
>     ver #7)
>      - Switch to passing in iter-specific flags rather than FOLL_* flags.
>      - Drop the direction flags for now.
>      - Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA.
>      - Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter.
>      - Add support for extraction from KVEC-type iters.
>      - Use iov_iter_advance() rather than open-coding it.
>      - Make BVEC- and KVEC-type skip over initial empty vectors.
>
>     ver #6)
>      - Add back the function to indicate the cleanup mode.
>      - Drop the cleanup_mode return arg to iov_iter_extract_pages().
>      - Pass FOLL_SOURCE/DEST_BUF in gup_flags.  Check this against the iter
>        data_source.
>
>     ver #4)
>      - Use ITER_SOURCE/DEST instead of WRITE/READ.
>      - Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in.
>
>     ver #3)
>      - Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
>        to get/pin_user_pages_fast()[1].
>
>  include/linux/uio.h |  28 +++
>  lib/iov_iter.c      | 424 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 452 insertions(+)
>
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 46d5080314c6..a4233049ab7a 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -363,4 +363,32 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
>  /* Flags for iov_iter_get/extract_pages*() */
>  #define ITER_ALLOW_P2PDMA	0x01	/* Allow P2PDMA on the extracted pages */
>
> +ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
> +			       size_t maxsize, unsigned int maxpages,
> +			       unsigned int extract_flags, size_t *offset0);
> +
> +/**
> + * iov_iter_extract_mode - Indicate how pages from the iterator will be retained
> + * @iter: The iterator
> + * @extract_flags: How the iterator is to be used
> + *
> + * Examine the iterator and @extract_flags and indicate by returning FOLL_PIN,
> + * FOLL_GET or 0 as to how, if at all, pages extracted from the iterator will
> + * be retained by the extraction function.
> + *
> + * FOLL_GET indicates that the pages will have a reference taken on them that
> + * the caller must put.  This can be done for DMA/async DIO write from a page.
> + *
> + * FOLL_PIN indicates that the pages will have a pin placed in them that the
> + * caller must unpin.  This must be done for DMA/async DIO read to a page to
> + * avoid CoW problems in fork.
> + *
> + * 0 indicates that no measures are taken and that it's up to the caller to
> + * retain the pages.
> + */
> +#define iov_iter_extract_mode(iter, extract_flags) \
> +	(user_backed_iter(iter) ?		     \
> +	 (iter->data_source == ITER_SOURCE) ?	     \
> +	 FOLL_GET : FOLL_PIN : 0)
> +
>

How does this work align with the goal of no longer using FOLL_GET for
O_DIRECT? We should get rid of any FOLL_GET usage for accessing page
content.

@John, any comments?

-- 
Thanks,

David / dhildenb