Date: Tue, 17 Jan 2023 14:46:43 +0000
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand
Cc: "Yin, Fengwei", linux-mm@kvack.org, dave.hansen@intel.com,
	tim.c.chen@intel.com, ying.huang@intel.com
Subject: Re: [RFC PATCH] mm: populate multiple PTEs if file page is large folio
References: <20230113163538.23412-1-fengwei.yin@intel.com>
	<2100e7c9-2a5c-4c1a-6e22-9710a2b0c66f@intel.com>

On Tue, Jan 17, 2023 at 11:37:05AM +0100, David Hildenbrand wrote:
> On 17.01.23 10:19, Yin, Fengwei wrote:
> >
> > On 1/14/2023 2:13 AM, Matthew Wilcox wrote:
> > > On Sat, Jan 14, 2023 at 12:35:38AM +0800, Yin Fengwei wrote:
> > > > The page fault count can be reduced by populating PTEs in batches.
> > > > The batch size of the PTE population is not allowed to cross:
> > > >   - page table boundaries
> > > >   - vma range
> > > >   - large folio size
> > > >   - fault_around_bytes
> > >
> > > I find this patch very interesting.  But is it really worth it?  Most
> > > file-backed page faults are resolved through the ->map_pages() path,
> > > which is almost always filemap_map_pages(), which does something
> > > fairly similar to this already.  Do you have any performance numbers?
> > >
> > I tried the will-it-scale page_fault3 test:
> > https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault3.c
> > with 96 processes on a test box with 48C/86T.
> >
> > The test result got about 3.75X better, with 4.1X fewer page faults,
> > with this patch.
> >
> > But it's a micro-benchmark that shows an extremely friendly case for
> > this patch.
> >
> > I didn't see an observable performance gain with other workloads.  I
> > suppose shared file write operations may not be common operations?  Thanks.
>
> One question I have after reading "which does something fairly similar to
> this already" is whether both paths could be unified.

I've been thinking about this already; not so much in terms of "unifying
these two implementations" but rather "What's the right API for mapping
the parts of a folio that fit into userspace in response to a fault?"
I haven't got quite as far as drafting code, but I'm thinking there
should be an API where we pass in the vmf, a folio and some other
information, and that function takes care of mapping however many pages
from this folio it can.
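To make that concrete, a minimal sketch of the shape such an API might
take; the name filemap_map_folio_range() and its exact parameters are
purely illustrative here, not an existing kernel function:

/*
 * Illustrative prototype only: map up to 'nr_pages' pages of 'folio',
 * starting at page index 'start' within the folio, at userspace address
 * 'addr' described by 'vmf'.  Returns how many pages were actually
 * mapped so the caller can advance its cursor and account references.
 */
static unsigned int filemap_map_folio_range(struct vm_fault *vmf,
					    struct folio *folio,
					    unsigned long start,
					    unsigned long addr,
					    unsigned int nr_pages);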
And the reason to split it up like that is to batch as many per-page
operations on this folio as possible.  E.g. filemap_map_pages() was
written "to get it working", not "to be efficient", so it does stupid
things like calling folio_ref_inc() every time it maps a page instead of
counting how many pages it maps and calling folio_ref_add() at the end.
Similar optimisations should be done for the mapcount, which implies some
kind of batched equivalent of page_add_*_rmap().
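As a rough sketch of that batching (not the actual patch; the helper name
do_set_pte_range() and its signature are assumptions, and the per-page
rmap call marks exactly where a batched page_add_*_rmap() equivalent
would go):

/*
 * Rough sketch: map 'nr' consecutive pages of 'folio' starting at page
 * index 'start', batching the reference count update instead of taking
 * one reference per page.  Assumes the caller holds the page table lock
 * and a reference on the folio (as filemap_map_pages() does), so
 * deferring the per-page references to one folio_ref_add() is safe.
 */
static void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
			     unsigned long start, unsigned long addr,
			     pte_t *pte, unsigned int nr)
{
	struct vm_area_struct *vma = vmf->vma;
	struct page *page = folio_page(folio, start);
	unsigned int i;

	for (i = 0; i < nr; i++, page++, addr += PAGE_SIZE, pte++) {
		pte_t entry = mk_pte(page, vma->vm_page_prot);

		if (vmf->flags & FAULT_FLAG_WRITE)
			entry = maybe_mkwrite(pte_mkdirty(entry), vma);

		/*
		 * Still one rmap call per page; this is the spot where a
		 * batched page_add_*_rmap() equivalent would replace the
		 * per-page call.
		 */
		page_add_file_rmap(page, vma, false);
		set_pte_at(vma->vm_mm, addr, pte, entry);
	}

	/* One folio_ref_add() instead of nr folio_ref_inc() calls. */
	folio_ref_add(folio, nr);
}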