Date: Tue, 4 Jul 2023 20:35:59 +0100
From: Matthew Wilcox
To: Sidhartha Kumar
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: [PATCH 1/4] mm/memory: convert do_page_mkwrite() to use folios
References: <20230703055850.227169-1-sidhartha.kumar@oracle.com>
In-Reply-To: <20230703055850.227169-1-sidhartha.kumar@oracle.com>

On Sun, Jul 02, 2023 at 10:58:47PM -0700, Sidhartha Kumar wrote:
> @@ -2947,14 +2947,14 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf)
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
>  		return ret;
>  	if (unlikely(!(ret & VM_FAULT_LOCKED))) {
> -		lock_page(page);
> -		if (!page->mapping) {
> -			unlock_page(page);
> +		folio_lock(folio);
> +		if (!folio_mapping(folio)) {
> +			folio_unlock(folio);

I promised to explain this
better once I had time, and I have time now.

folio->mapping is used for a multitude of purposes, unfortunately.
Maybe some future work will reduce that, but for now, These Are The Rules.

If the folio is marked as being Slab, it's used for something else.
The folio does not belong to an address space (nor can it be mapped,
so we're not going to see it here, but sometimes we see it in other
contexts where we call folio_mapping()).

The bottom two bits are used as PAGE_MAPPING_FLAGS. If they're both 0,
this folio belongs to a file, and the rest of folio->mapping is a pointer
to a struct address_space. Or they're both 0 because the whole thing is
NULL. More on that below.

If the bottom two bits are 01b, this is an anonymous folio, and
folio->mapping is actually a pointer to an anon_vma (which is not the
same thing as an anon vma).

If the bottom two bits are 10b, this is a Movable page (anon & file
memory is also movable, but this is different). The folio->mapping
points to a struct movable_operations.

If the bottom two bits are 11b, this is a KSM allocation, and
folio->mapping points to a struct ksm_stable_node.

When we remove a folio from the page cache, we reset folio->mapping
to NULL. We often remove folios from the page cache before their refcount
drops to zero (the common case is to look up the folio in the page cache,
which grabs a reference, remove the folio from the page cache, which
decrements the refcount, then put the folio, which might be the last
refcount). So it's entirely possible to see a folio in this function
with a NULL mapping; that means it's been removed from the file through
a truncate or similar, and we need to fail the mkwrite. Userspace is
about to get a segfault.

If you find all of that confusing, well, I agree, and I'm trying to
simplify it.

So, with all that background, what's going on here?

Part of the "modern" protocol for handling page faults is to lock the
folio in vm_ops->page_mkwrite. But we still support (... why?)
drivers that haven't been updated. They return 0 on success instead of
VM_FAULT_LOCKED. So we take the lock for them, then check that the
folio wasn't truncated, and bail out if it looks like it was.

If we have a really old-school driver that has allocated a page, mapped
it to userspace, and set page->mapping to be, e.g., Movable, then by
calling folio_mapping() instead of reading folio->mapping we'll end up
seeing NULL instead of a non-NULL value, mistakenly believe the folio
to have been truncated, and enter an endless loop.

Am I being paranoid here? Maybe! Drivers should have been updated by
now. The "modern" way was introduced in 2007 (commit d0217ac04ca6), so
it'd be nice to turn this into a WARN_ON_ONCE so drivers fix their code.
There are only ~30 implementations of page_mkwrite in the kernel, so it
might not take too long to check everything's OK.
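
To make the tag-bit rules above concrete, here is a minimal userspace sketch in C. It is not kernel code: every name in it (mapping_kind, classify_mapping, tag_mapping, mapping_pointer, file_mapping_or_null) is hypothetical and invented for illustration, the Slab case is left out, and the real folio_mapping() handles more cases than this. It only models a pointer-sized word whose bottom two bits select the interpretation, plus a helper that, like the folio_mapping() behaviour described above, yields NULL for anything that is not a file mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; the kernel's definitions differ. */
enum mapping_kind {
	MAPPING_NONE,    /* whole word is NULL, e.g. after truncation */
	MAPPING_FILE,    /* low bits 00b: pointer to an address_space */
	MAPPING_ANON,    /* low bits 01b: pointer to an anon_vma */
	MAPPING_MOVABLE, /* low bits 10b: pointer to movable_operations */
	MAPPING_KSM,     /* low bits 11b: pointer to a KSM stable node */
};

#define MAPPING_FLAG_MASK ((uintptr_t)0x3)

/* Combine an aligned pointer with a two-bit tag into one word. */
static uintptr_t tag_mapping(const void *ptr, uintptr_t tag)
{
	uintptr_t word = (uintptr_t)ptr;

	assert((word & MAPPING_FLAG_MASK) == 0); /* low bits must be free */
	assert(tag <= MAPPING_FLAG_MASK);
	return word | tag;
}

/* Decide how the word should be interpreted. */
static enum mapping_kind classify_mapping(uintptr_t word)
{
	if (word == 0)
		return MAPPING_NONE;

	switch (word & MAPPING_FLAG_MASK) {
	case 0x0:
		return MAPPING_FILE;
	case 0x1:
		return MAPPING_ANON;
	case 0x2:
		return MAPPING_MOVABLE;
	default:
		return MAPPING_KSM;
	}
}

/* Strip the tag bits to recover the underlying pointer. */
static void *mapping_pointer(uintptr_t word)
{
	return (void *)(word & ~MAPPING_FLAG_MASK);
}

/*
 * Assumption for this sketch: a helper that only reports file mappings
 * and returns NULL for everything else, including Movable.
 */
static void *file_mapping_or_null(uintptr_t word)
{
	if (classify_mapping(word) != MAPPING_FILE)
		return NULL;
	return mapping_pointer(word);
}
```

With this model, a word tagged 10b (Movable) is non-zero, so a raw "is the word NULL?" test treats the page as still attached, while file_mapping_or_null() returns NULL for the same word. That difference is the trap for an old-school driver: a check built on the helper would read a Movable page as truncated.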