Date: Thu, 29 Feb 2024 14:12:03 +0000
From: Matthew Wilcox
To: David Hildenbrand
Cc: Khalid Aziz, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Sharing page tables across processes (mshare)

On Thu, Feb 29, 2024 at 10:21:26AM +0100, David Hildenbrand wrote:
> On 28.02.24 23:56, Khalid Aziz wrote:
> > Threads of a process share address space and page tables that allows for
> > two key advantages:
> >
> > 1. Amount of memory required for PTEs to map physical pages stays low
> > even when large number of threads share the same pages since PTEs are
> > shared across threads.
> >
> > 2. Page protection attributes are shared across threads and a change
> > of attributes applies immediately to every thread without any overhead
> > of coordinating protection bit changes across threads.
> >
> > These advantages no longer apply when unrelated processes share pages.
> > Large database applications can easily comprise of 1000s of processes
> > that share 100s of GB of pages. In cases like this, amount of memory
> > consumed by page tables can exceed the size of actual shared data.
> > On a database server with 300GB SGA, a system crash was seen with
> > out-of-memory condition when 1500+ clients tried to share this SGA even
> > though the system had 512GB of memory. On this server, in the worst case
> > scenario of all 1500 processes mapping every page from SGA would have
> > required 878GB+ for just the PTEs.
> >
> > I have sent proposals and patches to solve this problem by adding a
> > mechanism to the kernel for processes to use to opt into sharing
> > page tables with other processes. We have had discussions on original
> > proposal and subsequent refinements but we have not converged on a
> > solution. As systems with multi-TB memory and in-memory databases
> > are becoming more and more common, this is becoming a significant issue.
> > An interactive discussion can help us reach a consensus on how to
> > solve this.
>
> Hi,
>
> I was hoping for a follow-up to my previous comments from ~4 months ago [1],
> so one problem of "not converging" might be "no follow-up discussion".
>
> Ideally, this session would not focus on mshare as previously discussed at
> LSF/MM, but take a step back and discuss requirements and possible
> adjustments to the original concept to get something possibly cleaner.

I think the concept is clean.  Your concept doesn't fit our use case!
So essentially what you're asking for is for us to do a lot of work
which doesn't solve our problem.  You can imagine our lack of enthusiasm
for this.
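
For reference, the "878GB+" worst-case figure in the quoted proposal can be reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration: it assumes 4 KiB base pages and 8-byte PTEs (as on x86-64), counts only the last page-table level, and treats "GB" as GiB. None of those parameters are stated in the thread, and the function names are made up for this example.

```python
# Rough estimate of leaf page-table memory when every process maps the
# same shared region through its own private PTEs.
# Assumptions (not stated in the thread): 4 KiB base pages, 8-byte PTEs
# as on x86-64, only the last page-table level is counted, GB == GiB.

GIB = 1 << 30
MIB = 1 << 20
PAGE_SIZE = 4096   # assumed base page size in bytes
PTE_SIZE = 8       # assumed size of one PTE in bytes


def pte_bytes_per_process(region_bytes, page_size=PAGE_SIZE, pte_size=PTE_SIZE):
    """Bytes of leaf PTEs one process needs to map the whole region."""
    pages = -(-region_bytes // page_size)  # ceiling division
    return pages * pte_size


def total_pte_bytes(region_bytes, nprocs, page_size=PAGE_SIZE, pte_size=PTE_SIZE):
    """Bytes of leaf PTEs when nprocs processes each map the whole region."""
    return nprocs * pte_bytes_per_process(region_bytes, page_size, pte_size)


if __name__ == "__main__":
    sga = 300 * GIB
    per_proc = pte_bytes_per_process(sga)
    total = total_pte_bytes(sga, 1500)
    print(f"per process:    {per_proc / MIB:.0f} MiB")   # 600 MiB
    print(f"1500 processes: {total / GIB:.1f} GiB")      # 878.9 GiB
```

Under these assumptions each process needs about 600 MiB of PTEs to map the full 300GB SGA, so 1500 processes need roughly 878.9 GiB, more than the 512GB of memory in the machine. If the page tables were shared, the same mapping would cost on the order of 600 MiB once.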