Date: Mon, 27 Jan 2025 14:33:39 -0800
From: Andrew Morton
To: Anthony Yznaga
Cc: willy@infradead.org, markhemm@googlemail.com, viro@zeniv.linux.org.uk,
 david@redhat.com, khalid@kernel.org, jthoughton@google.com, corbet@lwn.net,
 dave.hansen@intel.com, kirill@shutemov.name, luto@kernel.org,
 brauner@kernel.org, arnd@arndb.de, ebiederm@xmission.com,
 catalin.marinas@arm.com, mingo@redhat.com, peterz@infradead.org,
 liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz,
 jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev,
 tglx@linutronix.de, cgroups@vger.kernel.org, x86@kernel.org,
 linux-doc@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhiramat@kernel.org,
 rostedt@goodmis.org, vasily.averin@linux.dev, xhao@linux.alibaba.com,
 pcc@google.com, neilb@suse.de, maz@kernel.org
Subject: Re: [PATCH 00/20] Add support for shared PTEs across processes
Message-Id: <20250127143339.b1f6b6d5586f319762c5e516@linux-foundation.org>
In-Reply-To: <20250124235454.84587-1-anthony.yznaga@oracle.com>
References: <20250124235454.84587-1-anthony.yznaga@oracle.com>

On Fri, 24 Jan 2025 15:54:34 -0800 Anthony Yznaga wrote:

> Memory pages shared between processes require page table entries
> (PTEs) for each process. Each of these PTEs consumes some memory,
> and as long as the number of mappings being maintained is small
> enough, the space consumed by page tables is not objectionable.
> When very few memory pages are shared between processes, the number
> of PTEs to maintain is mostly constrained by the number of pages of
> memory on the system. As the number of shared pages and the number
> of times pages are shared go up, the amount of memory consumed by
> page tables starts to become significant. This issue does not apply
> to threads: any number of threads can share the same pages inside a
> process while sharing the same PTEs. Extending this same model to
> sharing pages across processes can eliminate this issue for sharing
> across processes as well.
>
> ...
>
> API
> ===
>
> mshare does not introduce a new API. It instead uses existing APIs
> to implement page table sharing. The steps to use this feature are:
>
> 1. Mount msharefs on /sys/fs/mshare -
>         mount -t msharefs msharefs /sys/fs/mshare
>
> 2. mshare regions have alignment and size requirements: the start
>    address of a region must be aligned to a fixed boundary, and its
>    size must be a multiple of that same value. The value can be
>    obtained by reading the file /sys/fs/mshare/mshare_info, which
>    returns a number in text format.
>
> 3. For the process creating an mshare region:
>         a. Create a file on /sys/fs/mshare, for example -
>                 fd = open("/sys/fs/mshare/shareme",
>                                 O_RDWR|O_CREAT|O_EXCL, 0600);
>
>         b. Establish the starting address and size of the region
>                 struct mshare_info minfo;
>
>                 minfo.start = TB(2);
>                 minfo.size = BUFFER_SIZE;
>                 ioctl(fd, MSHAREFS_SET_SIZE, &minfo);
>
>         c. Map some memory in the region
>                 struct mshare_create mcreate;
>
>                 mcreate.addr = TB(2);
>                 mcreate.size = BUFFER_SIZE;
>                 mcreate.offset = 0;
>                 mcreate.prot = PROT_READ | PROT_WRITE;
>                 mcreate.flags = MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED;
>                 mcreate.fd = -1;
>
>                 ioctl(fd, MSHAREFS_CREATE_MAPPING, &mcreate);

I'm not really understanding why step c exists. It's basically an
mmap(), so why can't this be done within step d?

>         d. Map the mshare region into the process
>                 mmap((void *)TB(2), BUFFER_SIZE, PROT_READ | PROT_WRITE,
>                                 MAP_SHARED, fd, 0);
>
>         e. Read from and write to the mshare region normally.
>
> 4. For processes attaching an mshare region:
>         a. Open the file on msharefs, for example -
>                 fd = open("/sys/fs/mshare/shareme", O_RDWR);
>
>         b. Get information about the mshare'd region from the file:
>                 struct mshare_info minfo;
>
>                 ioctl(fd, MSHAREFS_GET_SIZE, &minfo);
>
>         c. Map the mshare'd region into the process
>                 mmap((void *)minfo.start, minfo.size,
>                         PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>
> 5. To delete the mshare region -
>                 unlink("/sys/fs/mshare/shareme");
>

The userspace interface is the thing we should initially consider.

I'm having ancient memories of hugetlbfs. Over time it was seen that
hugetlbfs was too standalone, and huge pages became more (and more (and
more (and more))) integrated into regular MM code. Can we expect a
similar evolution with pte-shared memory, and if so, is this the
correct interface to be starting out with?
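A rough way to quantify the page-table overhead described in the quoted cover letter is sketched below in C. It is a back-of-the-envelope model, not something taken from the series: it assumes x86-64 with 4 KiB base pages and 8-byte PTEs, counts only last-level page tables, and the helper names (`pte_bytes()`, `private_pte_bytes()`, `shared_pte_bytes()`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Back-of-the-envelope page-table cost (hypothetical helpers).
 * Assumptions, not from the series: x86-64 with 4 KiB base pages and
 * 8-byte PTEs; only last-level page tables are counted, upper levels
 * (PMD/PUD/PGD) are ignored.
 */
#define BASE_PAGE_SIZE 4096ULL
#define PTE_BYTES      8ULL

/* Bytes of last-level PTEs one process needs to map `region` bytes. */
uint64_t pte_bytes(uint64_t region)
{
    uint64_t pages = (region + BASE_PAGE_SIZE - 1) / BASE_PAGE_SIZE;

    return pages * PTE_BYTES;
}

/* Without sharing, each of `nproc` processes builds its own PTEs. */
uint64_t private_pte_bytes(uint64_t region, uint64_t nproc)
{
    uint64_t per_proc = pte_bytes(region);

    /* Guard against overflow for absurdly large inputs. */
    assert(nproc == 0 || per_proc <= UINT64_MAX / nproc);
    return per_proc * nproc;
}

/* With shared PTEs, one set of page tables serves every process. */
uint64_t shared_pte_bytes(uint64_t region)
{
    return pte_bytes(region);
}
```

For example, 1,000 processes each mapping the same 1 GiB region need 2 MiB of PTEs apiece under these assumptions, so `private_pte_bytes(1ULL << 30, 1000)` comes to 2,097,152,000 bytes (2000 MiB), while `shared_pte_bytes(1ULL << 30)` stays at 2 MiB.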
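Step 2 of the quoted API description asks userspace to respect an alignment value read from /sys/fs/mshare/mshare_info. One way to check a proposed region before calling MSHAREFS_SET_SIZE is sketched below in C; it is not part of the series. The helpers are hypothetical, the string passed to `parse_mshare_align()` stands in for the file's contents (so the sketch runs without a patched kernel), and the `TB()` definition is an assumption because the cover letter uses `TB()` without defining it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: x tebibytes. The cover letter uses TB() undefined. */
#define TB(x) ((uint64_t)(x) << 40)

/*
 * Parse the decimal number read from /sys/fs/mshare/mshare_info.
 * Returns 0 if the text is not a positive decimal number.
 */
uint64_t parse_mshare_align(const char *text)
{
    char *end;
    unsigned long long v;

    /* strtoull() silently accepts a leading '-', so reject it here. */
    if (strchr(text, '-') != NULL)
        return 0;

    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno != 0 || end == text || v == 0)
        return 0;

    /* Allow only trailing whitespace, such as the usual newline. */
    while (*end == '\n' || *end == ' ' || *end == '\t')
        end++;
    if (*end != '\0')
        return 0;
    return (uint64_t)v;
}

/* True if start is aligned to `align` and size is a nonzero multiple of it. */
bool mshare_region_ok(uint64_t start, uint64_t size, uint64_t align)
{
    if (align == 0 || size == 0)
        return false;
    return start % align == 0 && size % align == 0;
}

/* Round a requested size up to a multiple of `align` (assumes no overflow). */
uint64_t mshare_round_size(uint64_t size, uint64_t align)
{
    assert(align != 0);
    return (size + align - 1) / align * align;
}
```

For example, with a purely illustrative alignment of 1 GiB (`1073741824`), `mshare_region_ok(TB(2), mshare_round_size(size, align), align)` holds for any nonzero `size`, whereas a start address of `TB(2) + 4096` is rejected.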