Date: Mon, 10 Jan 2022 15:52:51 +0000
From: Matthew Wilcox
To: Jan Kara
Cc: John Hubbard, linux-mm@kvack.org, Andrew Morton, Christoph Hellwig
Subject: Re: [PATCH 14/17] gup: Convert for_each_compound_head() to gup_for_each_folio()

On Mon, Jan 10, 2022 at 04:22:08PM +0100, Jan Kara wrote:
> On Sun 09-01-22 00:01:49, John Hubbard wrote:
> > On 1/8/22 20:39, Matthew Wilcox wrote:
> > > On Wed, Jan 05, 2022 at 12:17:46AM -0800, John Hubbard wrote:
> > > > > +               if (!folio_test_dirty(folio)) {
> > > > > +                       folio_lock(folio);
> > > > > +                       folio_mark_dirty(folio);
> > > > > +                       folio_unlock(folio);
> > > >
> > > > At some point, maybe even here, I suspect that creating the folio
> > > > version of set_page_dirty_lock() would help. I'm sure you have
> > > > a better feel for whether it helps, after doing all of this conversion
> > > > work, but it just sort of jumped out at me as surprising to see it
> > > > in this form.
> > >
> > > I really hate set_page_dirty_lock(). It smacks of "there is a locking
> > > rule here which we're violating, so we'll just take the lock to fix it"
> > > without understanding why there's a locking problem here.
> > >
> > > As far as I can tell, originally, the intent was that you would lock
> > > the page before modifying any of the data in the page. ie you would
> > > do:
> > >
> > >         gup()
> > >         lock_page()
> > >         addr = kmap_page()
> > >         *addr = 1;
> > >         kunmap_page()
> > >         set_page_dirty()
> > >         unlock_page()
> > >         put_page()
> > >
> > > and that would prevent races between modifying the page and (starting)
> > > writeback, not to mention truncate() and various other operations.
> > >
> > > Clearly we can't do that for DMA-pinned pages. There's only one lock
> > > bit. But do we even need to take the lock if we have the page pinned?
> > > What are we protecting against?
> >
> > This is a fun question, because you're asking it at a point when the
> > overall problem remains unsolved. That is, the interaction between
> > file-backed pages and gup/pup is still completely broken.
> >
> > And I don't have an answer for you: it does seem like lock_page() is
> > completely pointless here. Looking back, there are some 25 callers of
> > unpin_user_pages_dirty_lock(), and during all those patch reviews, no
> > one noticed this point!
>
> I'd say it is underdocumented but not obviously pointless :) AFAIR (and
> Christoph or Andrew may well correct me) the page lock in
> set_page_dirty_lock() is there to protect metadata associated with the page
> through page->private. Otherwise truncate could free these (e.g.
> block_invalidatepage()) while ->set_page_dirty() callback (e.g.
> __set_page_dirty_buffers()) works on this metadata.

Yes, but ... we have an inconsistency between DMA writes to the page and
CPU writes to the page.

        fd = open(file)
        write(fd, 1024 * 1024)
        mmap(NULL, 1024 * 1024, PROT_RW, MAP_SHARED, fd, 0)
        register-memory-with-RDMA
        ftruncate(fd, 0);       // page is removed from page cache
        ftruncate(fd, 1024 * 1024)

Now if we do a store from the CPU, we instantiate a new page in the
page cache and the store will be written back to the file. If we do
an RDMA-write, the write goes to the old page and will be lost. Indeed,
it's no longer visible to the CPU (but is visible to other RDMA reads!)

Which is fine if the program did it itself because it's doing something
clearly bonkers, but another program might be the one doing the
two truncate() steps, and this would surprise an innocent program.

I still favour blocking the truncate-down (or holepunch) until there
are no pinned pages in the inode. But I know this is a change in
behaviour since for some reason, truncate() gets to override mmap().
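
The CPU-store half of the sequence above can be sketched from userspace. This is only an illustration under stated assumptions, not code from the thread: `truncate_demo()`, the `/tmp` scratch path and the 0xAA/0x55 fill bytes are all hypothetical, and the RDMA registration step is left out because it needs real hardware. The sketch writes 1 MiB, maps it shared, truncates the file down and back up, then stores through the mapping and reads the byte back through the file descriptor.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define LEN (1024 * 1024)

/*
 * Hypothetical helper (not from the thread): replays the CPU-only part of
 * the sequence. Returns 0 on success, -1 if any system call fails.
 */
static int truncate_demo(unsigned char *after_truncate,
                         unsigned char *after_store)
{
    char path[] = "/tmp/truncate-demo-XXXXXX"; /* assumed scratch location */
    unsigned char *buf, *map;
    size_t done = 0;
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */

    /* write(fd, 1024 * 1024): fill the file with a recognisable pattern */
    buf = malloc(LEN);
    if (!buf)
        goto out_close;
    memset(buf, 0xAA, LEN);
    while (done < LEN) {
        ssize_t n = write(fd, buf + done, LEN - done);
        if (n <= 0)
            goto out_free;
        done += (size_t)n;
    }

    map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_free;

    /* An RDMA registration would pin the pages here; omitted in this sketch. */

    /* Truncate down (drops the pages from the page cache), then back up. */
    if (ftruncate(fd, 0) != 0 || ftruncate(fd, LEN) != 0)
        goto out_unmap;

    *after_truncate = map[0]; /* this access faults in a new page */
    map[0] = 0x55;            /* CPU store lands in the new page */
    if (msync(map, LEN, MS_SYNC) != 0)
        goto out_unmap;
    if (pread(fd, after_store, 1, 0) != 1)
        goto out_unmap;
    ret = 0;

out_unmap:
    munmap(map, LEN);
out_free:
    free(buf);
out_close:
    close(fd);
    return ret;
}
```

Called with two `unsigned char` out-parameters, this should on Linux report that the old 0xAA data is gone after the truncate pair and that the CPU store is visible through the file. A device still holding a pin on the old page would be writing somewhere this program can no longer see, which is the lost-write case described above.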