From: David Howells
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
To: Christoph Hellwig
Cc: dhowells@redhat.com, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
    Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
    Hillf Danton, Christian Brauner, Linus Torvalds,
    linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Extending page pinning into fs/direct-io.c
References: <20230522205744.2825689-1-dhowells@redhat.com> <3068545.1684872971@warthog.procyon.org.uk>
Date: Wed, 24 May 2023 09:47:10 +0100
Message-ID: <3215177.1684918030@warthog.procyon.org.uk>

Christoph Hellwig wrote:

> > What I'd like to do is to make the GUP code not take a ref on the zero_page
> > if, say, FOLL_DONT_PIN_ZEROPAGE is passed in, and then make the bio cleanup
> > code always ignore the zero_page.
>
> I don't think that'll work, as we can't mix different pin vs get types
> in a bio. And that's really a good thing.

True - but I was thinking of just treating the zero_page specially and never
holding a pin or a ref on it.
It can be checked by address, e.g.:

    static inline void bio_release_page(struct bio *bio, struct page *page)
    {
	    if (page == ZERO_PAGE(0))
		    return;
	    if (bio_flagged(bio, BIO_PAGE_PINNED))
		    unpin_user_page(page);
	    else if (bio_flagged(bio, BIO_PAGE_REFFED))
		    put_page(page);
    }

I'm slightly concerned about the possibility of overflowing the refcount. The
problem is that it only takes about 2 million pins to do that (because the
zero_page isn't a large folio) - which is within reach of userspace. Create
an 8GiB anon mmap and do a bunch of async DIO writes from it. You won't hit
ENOMEM because it will stick ~2 million pointers to zero_page into the page
tables.

> > Something that I noticed is that the dio code seems to wangle to page bits on
> > the target pages for a DIO-read, which seems odd, but I'm not sure I fully
> > understand the code yet.
>
> I don't understand this sentence.

I was looking at this:

    static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
    {
	    ...
	    if (dio->is_async && dio_op == REQ_OP_READ && dio->should_dirty)
		    bio_set_pages_dirty(bio);
	    ...
    }

but looking again, the lock is taken briefly and the dirty bit is set - which
is reasonable. However, should we be doing it before starting the I/O?

David
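
For reference, the "about 2 million pins" and "8GiB" figures in the message
can be reproduced with a little arithmetic. The following is only a userspace
sketch, not kernel code. It assumes that each pin on a small page adds a bias
of 1024 to the page refcount (as the kernel's GUP_PIN_COUNTING_BIAS does), that
the refcount is a signed 32-bit counter, and that base pages are 4KiB. The
names PIN_BIAS, REFCOUNT_LIMIT, BASE_PAGE_SIZE, pins_to_overflow() and
pages_in_mapping() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each pin on a small page adds this bias to the page refcount,
 * mirroring the kernel's GUP_PIN_COUNTING_BIAS (1U << 10). */
#define PIN_BIAS		(UINT64_C(1) << 10)

/* Assumption: the refcount is a signed 32-bit counter, so it turns negative
 * once it reaches 2^31. */
#define REFCOUNT_LIMIT		(UINT64_C(1) << 31)

/* Assumption: 4KiB base pages. */
#define BASE_PAGE_SIZE		UINT64_C(4096)

/* Number of pins on one small page that brings its refcount to the limit. */
static uint64_t pins_to_overflow(void)
{
	return REFCOUNT_LIMIT / PIN_BIAS;
}

/* Number of base pages covered by a page-aligned mapping of the given size. */
static uint64_t pages_in_mapping(uint64_t bytes)
{
	assert(bytes % BASE_PAGE_SIZE == 0);
	return bytes / BASE_PAGE_SIZE;
}
```

Under those assumptions, pins_to_overflow() and
pages_in_mapping(UINT64_C(8) << 30) both come to 2^21, so an 8GiB mapping
whose pages all point at zero_page has just enough entries to reach the limit
if every one of them is pinned at once.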