Date: Fri, 6 Dec 2024 16:28:17 +0000
From: Matthew Wilcox
To: linux-mm@kvack.org
Cc: Uladzislau Rezki, David Hildenbrand, Christoph Hellwig
Subject: Mapping vmalloc pages to userspace

Today we have a very useful helper, remap_vmalloc_range() (and
remap_vmalloc_range_partial()), which lets drivers call vmalloc(), then
map that memory to userspace.  It does so using vm_insert_page(), which
ends up calling folio_get() and folio_add_file_rmap_pte(), so jiggling
both the refcount and the mapcount.

As you all know by now, we're looking to eliminate both mapcount and
refcount from struct page.  I have four options for consideration, some
of which I like more than others.

1. We could introduce a vmalloc memdesc that has a per-page mapcount
   and refcount.  This seems like unnecessarily high overhead for a
   precision of tracking that is, perhaps, not warranted.

2. We could do no tracking at all of vmalloc pages.  Insert the PFNs
   of the allocated pages and rely on the driver to track everything
   correctly, not freeing the vmalloc allocation until the mmap has
   been torn down.  This implies not supporting GUP.  This option feels
   risky to me; we're depending on device driver writers to get this
   right, and if they get it wrong, it's quite the UAF hole, letting an
   attacker get access to pages which could be allocated to any purpose.

3. Embed a refcount into struct vm_struct.  We can support GUP if we
   want.  Calling GUP bumps the refcount on the entire struct.  When
   the refcount hits zero, we free the entire allocation.  There's no
   need for a mapcount or pincount because we don't need to distinguish
   between temporary and longterm GUPs.

4. Introduce an indirection structure between the page and vm_struct
   which contains the refcount.

I'm most in favour of #3, but there are probably ramifications I
haven't considered.
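
To make option 3 a little more concrete, here is a rough userspace model
of the lifetime rule it describes: one refcount for the whole allocation,
bumped by each user (the driver, a userspace mapping, a GUP caller), with
the memory freed only when the last reference is dropped.  This is a
sketch under assumptions, not kernel code.  The names vm_area_model,
area_alloc(), area_get(), area_put() and areas_freed are all hypothetical,
malloc()/calloc() stand in for vmalloc(), and the 4096-byte page size is
assumed.

```c
/*
 * Userspace model of option 3 (refcount embedded in the allocation).
 * NOT kernel code; every name below is hypothetical.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size */

struct vm_area_model {
	atomic_int refcount;	/* one count for the entire allocation */
	size_t nr_pages;
	void *pages;		/* stands in for the vmalloc'ed memory */
};

static int areas_freed;		/* lets a caller observe the final free */

/* Allocate the area; the caller (the "driver") holds the first reference. */
static struct vm_area_model *area_alloc(size_t nr_pages)
{
	struct vm_area_model *area = malloc(sizeof(*area));

	if (!area)
		return NULL;
	area->pages = calloc(nr_pages, MODEL_PAGE_SIZE);
	if (!area->pages) {
		free(area);
		return NULL;
	}
	area->nr_pages = nr_pages;
	atomic_init(&area->refcount, 1);
	return area;
}

/* Take a reference, e.g. when mapping to userspace or on a GUP. */
static void area_get(struct vm_area_model *area)
{
	int old = atomic_fetch_add(&area->refcount, 1);

	assert(old > 0);	/* must not revive a freed area */
}

/* Drop a reference; returns true if this call freed the allocation. */
static bool area_put(struct vm_area_model *area)
{
	int old = atomic_fetch_sub(&area->refcount, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	free(area->pages);
	free(area);
	areas_freed++;
	return true;
}
```

In this model a driver would call area_alloc(), the mmap path and any
GUP would each call area_get(), and the driver's free, the munmap and
the GUP release would each call area_put().  Whichever put comes last
frees everything, so a driver that frees early no longer opens a UAF
hole.  The model does not tell short-term from long-term references,
which matches the point above that no separate pincount is needed.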