Date: Mon, 7 Mar 2022 14:37:48 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Dave Hansen
Cc: Andrew Morton, Jarkko Sakkinen, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, linux-sgx@vger.kernel.org, jaharkes@cs.cmu.edu,
	linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	codalist@telemann.coda.cs.cmu.edu, linux-unionfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH RFC v2] mm: Add f_ops->populate()
References: <20220306032655.97863-1-jarkko@kernel.org>
	<20220306152456.2649b1c56da2a4ce4f487be4@linux-foundation.org>

On Sun, Mar 06, 2022 at 03:41:54PM -0800, Dave Hansen wrote:
> In short: page faults stink. The core kernel has lots of ways of
> avoiding page faults like madvise(MADV_WILLNEED) or mmap(MAP_POPULATE).
> But, those only work on normal RAM that the core mm manages.
>
> SGX is weird. SGX memory is managed outside the core mm. It doesn't
> have a 'struct page' and get_user_pages() doesn't work on it. Its VMAs
> are marked with VM_IO. So, none of the existing methods for avoiding
> page faults work on SGX memory.
>
> This essentially helps extend existing "normal RAM" kernel ABIs to work
> for avoiding faults for SGX too. SGX users want to enjoy all of the
> benefits of a delayed allocation policy (better resource use,
> overcommit, NUMA affinity) but without the cost of millions of faults.
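(For concreteness, the existing "normal RAM" ABIs referenced above look
like this from userspace. A minimal sketch only: "data.bin", the mapping
size, and the abbreviated error handling are illustrative stand-ins, not
anything from the patch under discussion.)

#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1 << 20;			/* 1 MiB, for illustration */
	int fd = open("data.bin", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Prefault every page at mmap() time instead of on demand. */
	char *p = mmap(NULL, len, PROT_READ,
		       MAP_PRIVATE | MAP_POPULATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Alternatively, map lazily and hint that a subrange of an
	 * existing mapping will be needed soon. */
	if (madvise(p, len / 2, MADV_WILLNEED))
		perror("madvise");

	munmap(p, len);
	close(fd);
	return 0;
}
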
We have a mechanism for dynamically reducing the number of page faults
already; it's just buried in the page cache code. You have
vma->vm_file, which contains a file_ra_state. You can use this to
track where recent faults have been and grow the size of the region
you fault in per page fault. You don't have to (indeed probably don't
want to) use the same algorithm as the page cache, but the _principle_
is the same: were recent speculative faults actually used? Should we
grow the number of pages faulted in, or is this a random sparse
workload where we want to allocate individual pages? Don't rely on the
user to ask. They don't know.
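To illustrate the principle (and only the principle -- this is not the
page cache's actual readahead algorithm, and struct toy_ra_state,
fault_window(), and the constants below are invented for this sketch),
here is a self-contained toy model of an adaptive fault-in window:

#include <stdio.h>

#define PAGE_SHIFT	12
#define RA_MAX_PAGES	32		/* arbitrary cap for the sketch */

struct toy_ra_state {
	unsigned long prev_index;	/* page index of the last fault */
	unsigned int  window;		/* pages to fault in next time */
};

/* Decide how many pages to fault in around a faulting address. */
static unsigned int fault_window(struct toy_ra_state *ra, unsigned long addr)
{
	unsigned long index = addr >> PAGE_SHIFT;

	if (index == ra->prev_index + 1) {
		/* Sequential: the speculation paid off, widen the window. */
		if (ra->window < RA_MAX_PAGES)
			ra->window *= 2;
	} else if (index != ra->prev_index) {
		/* Random/sparse: back off to faulting single pages. */
		ra->window = 1;
	}
	ra->prev_index = index;
	return ra->window;
}

int main(void)
{
	struct toy_ra_state ra = { .prev_index = -1UL, .window = 1 };
	/* A sequential run of faults, then a random jump. */
	unsigned long faults[] = { 0x1000, 0x2000, 0x3000, 0x4000, 0xbeef000 };

	for (unsigned i = 0; i < sizeof(faults) / sizeof(faults[0]); i++)
		printf("fault at %#lx -> fault in %u page(s)\n",
		       faults[i], fault_window(&ra, faults[i]));
	return 0;
}

Running it, the window doubles across the sequential run and collapses
back to a single page on the random jump -- the workload tells you,
without the user ever asking.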