Date: Fri, 6 Dec 2024 21:01:44 +0000
From: Matthew Wilcox
To: linux-mm@kvack.org
Cc: Uladzislau Rezki, David Hildenbrand, Christoph Hellwig
Subject: Re: Mapping vmalloc pages to userspace

On Fri, Dec 06, 2024 at 04:28:17PM +0000, Matthew Wilcox wrote:
> 4. Introduce an indirection structure between the page and vm_struct which
> contains the refcount.

I'm starting to really warm up to this one.  There are a number of places
that we allocate "some pages", but want to treat them as a single object,
not just vmalloc.  Let's call this a 'scamem', short for "scattered memory".

But this is going to be challenging.  Assuming we want to support GUP,
we need to be able to go from page->scamem [1].
In the skinniest version of shrinking struct page, we have just 8 bytes
per page, and we need to both store a pointer to the scamem and store
information like node, zone, section for _each_ page.

We don't need to worry about this for folios/slabs/... because all pages
in the folio have the same node/zone/section, so we can store this
information once in the folio and then copy it back to the page on free.
We can't do that for scamem without a (potentially large) allocation.
And even if we do something like:

	struct scamem {
		unsigned int nr;
		refcount_t refcount;
		unsigned long flags[];
	};

to be able to implement page_to_nid() on a page, we'd have to figure out
which page within the scamem this was.  So either we have to give up on
our dream of an 8 byte memdesc, or figure out some other way to do this.

So what if we store the scamem pointer in vma->vm_file->private_data,
or vma->vm_private_data?  That would let us keep the node/section/zone
in the struct page.  GUP has the VMA, so this can work.

Yet another possibility would be if we can look up the page's pfn in
some data structure and reconstruct the zone/section/node information
at freeing time.  I don't fully understand the meaning of this
information, so I have no idea if this is possible.

My current thought is:

	struct scamem {
		unsigned int nr;
		refcount_t refcount;
		struct page *pages[];
	};

and changing vm_struct:

-	struct page **pages;
+	struct scamem *scamem;

(I don't think we want to embed it in vm_struct, since we want vm_struct
to have one refcount on scamem, and for the scamem to be freed once its
refcount reaches zero rather than freed as part of vm_struct)
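
To make the lifetime rule in that last parenthetical concrete, here is a
rough userspace sketch of it, not kernel code.  Only the struct scamem
layout comes from the proposal above; everything else is an assumption
made up for illustration: the scamem_alloc(), scamem_get() and
scamem_put() helpers, the stand-in struct page with a nid field, the
vm_struct_sketch type, and the use of C11 atomic_uint in place of
refcount_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct page; hypothetical, for this sketch only. */
struct page {
	int nid;	/* per-page node id stays in the page itself */
};

/* Mirrors the proposed layout; atomic_uint stands in for refcount_t. */
struct scamem {
	unsigned int nr;
	atomic_uint refcount;
	struct page *pages[];
};

/* Hypothetical helper: allocate a scamem of nr pages with refcount 1. */
static struct scamem *scamem_alloc(unsigned int nr)
{
	struct scamem *s = malloc(sizeof(*s) + nr * sizeof(s->pages[0]));
	unsigned int i;

	if (!s)
		return NULL;
	s->nr = nr;
	atomic_init(&s->refcount, 1);
	for (i = 0; i < nr; i++) {
		s->pages[i] = calloc(1, sizeof(struct page));
		if (!s->pages[i]) {
			/* Unwind the pages allocated so far. */
			while (i--)
				free(s->pages[i]);
			free(s);
			return NULL;
		}
	}
	return s;
}

/* Hypothetical helper: take an additional reference. */
static void scamem_get(struct scamem *s)
{
	atomic_fetch_add(&s->refcount, 1);
}

/* Drop a reference; returns 1 if this call freed the scamem, else 0. */
static int scamem_put(struct scamem *s)
{
	unsigned int i;

	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&s->refcount, 1) != 1)
		return 0;
	for (i = 0; i < s->nr; i++)
		free(s->pages[i]);
	free(s);
	return 1;
}

/* Stand-in for vm_struct: holds a pointer to, and one reference on, a scamem. */
struct vm_struct_sketch {
	struct scamem *scamem;
};

/* Walk through the lifetime; returns 0 if every step behaved as expected. */
static int scamem_demo(void)
{
	struct vm_struct_sketch vm;
	struct scamem *mapping;

	vm.scamem = scamem_alloc(4);
	if (!vm.scamem)
		return -1;

	/* A second user (say, a userspace mapping) takes its own reference. */
	mapping = vm.scamem;
	scamem_get(mapping);

	/* The vm_struct goes away first; the scamem must survive that. */
	if (scamem_put(vm.scamem) != 0)
		return -1;
	vm.scamem = NULL;
	if (mapping->nr != 4)
		return -1;

	/* The last reference is dropped; only now are the pages freed. */
	if (scamem_put(mapping) != 1)
		return -1;
	return 0;
}
```

The point of the sketch is only that the vm_struct holds a pointer and a
single reference, so tearing it down does not free the pages while some
other user still holds a reference.  It does not address the harder
question above of getting from a page back to its scamem.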