From: Yin Fengwei <fengwei.yin@intel.com>
To: willy@infradead.org, linux-mm@kvack.org
Cc: dave.hansen@intel.com, tim.c.chen@intel.com, ying.huang@intel.com,
	fengwei.yin@intel.com
Subject: [RFC PATCH] mm: populate multiple PTEs if file page is large folio
Date: Sat, 14 Jan 2023 00:35:38 +0800
Message-Id: <20230113163538.23412-1-fengwei.yin@intel.com>

The number of page faults can be reduced by populating PTEs in
batches. The batch of populated PTEs is not allowed to cross:
  - page table boundaries
  - the vma range
  - the large folio size
  - fault_around_bytes

fault_around_bytes allows the user to control the batch size if they
intend to do so.
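
As an illustrative sketch (not part of the patch), the batch window is
the intersection of the four ranges above; with 4K pages and the
default fault_around_bytes of 65536, at most 16 PTEs are populated
around the faulting address:

	/* illustration only; max3()/min3() are the helpers the patch uses */
	unsigned long size = READ_ONCE(fault_around_bytes);
	unsigned long mask = ~(size - 1) & PAGE_MASK;

	start = max3(ALIGN_DOWN(addr, PMD_SIZE),	/* page table   */
		     addr & mask,			/* fault-around */
		     vma->vm_start);			/* vma          */
	end = min3(ALIGN(addr, PMD_SIZE),
		   (addr & mask) + size,
		   vma->vm_end);
	/* ...then further clamped to the folio: [folio_start, folio_end) */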
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
* based on next-20230112

 mm/memory.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 56b571c83a0e..755e6e590481 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -104,6 +104,10 @@ EXPORT_SYMBOL(mem_map);
 #endif
 
 static vm_fault_t do_fault(struct vm_fault *vmf);
+static inline bool allowed_batched_set_ptes(struct vm_fault *vmf,
+		struct page *page);
+static void do_set_multi_ptes(struct vm_fault *vmf, struct page *page,
+		unsigned long addr);
 
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
@@ -4359,10 +4363,16 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 
 	/* Re-check under ptl */
 	if (likely(!vmf_pte_changed(vmf))) {
-		do_set_pte(vmf, page, vmf->address);
+		if (allowed_batched_set_ptes(vmf, page)) {
+			do_set_multi_ptes(vmf, page, vmf->address);
+		} else {
+			do_set_pte(vmf, page, vmf->address);
 
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, vmf->address, vmf->pte);
+			/* no need to invalidate: a not-present page
+			 * won't be cached
+			 */
+			update_mmu_cache(vma, vmf->address, vmf->pte);
+		}
 
 		ret = 0;
 	} else {
@@ -4476,6 +4486,92 @@ static inline bool should_fault_around(struct vm_fault *vmf)
 	return fault_around_bytes >> PAGE_SHIFT > 1;
 }
 
+/* Return true if batched PTE population is allowed, false otherwise */
+static inline bool allowed_batched_set_ptes(struct vm_fault *vmf,
+		struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	if (uffd_disable_fault_around(vmf->vma))
+		return false;
+
+	if (!folio_test_large(folio))
+		return false;
+
+	/* TODO: Will revise after anon mapping supports folios */
+	if ((vmf->flags & FAULT_FLAG_WRITE) &&
+			!(vmf->vma->vm_flags & VM_SHARED))
+		return false;
+
+	return fault_around_bytes >> PAGE_SHIFT > 1;
+}
+
+static void do_set_multi_ptes(struct vm_fault *vmf, struct page *pg,
+		unsigned long addr)
+{
+	struct folio *folio = page_folio(pg);
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long size, mask, start, end, folio_start, folio_end;
+	int dist, first_idx, i = 0;
+	pte_t *pte;
+
+	/* in page table range */
+	start = ALIGN_DOWN(addr, PMD_SIZE);
+	end = ALIGN(addr, PMD_SIZE);
+
+	/* in fault_around_bytes range */
+	size = READ_ONCE(fault_around_bytes);
+	mask = ~(size - 1) & PAGE_MASK;
+
+	/* in vma range */
+	start = max3(start, (addr & mask), vma->vm_start);
+	end = min3(end, (addr & mask) + size, vma->vm_end);
+
+	/* folio is locked and referenced. It will not be split or
+	 * removed from page cache in this function.
+	 */
+	folio_start = addr - (folio_page_idx(folio, pg) << PAGE_SHIFT);
+	folio_end = folio_start + (folio_nr_pages(folio) << PAGE_SHIFT);
+
+	/* in folio size range */
+	start = max(start, folio_start);
+	end = min(end, folio_end);
+
+	dist = (addr - start) >> PAGE_SHIFT;
+	first_idx = folio_page_idx(folio, pg) - dist;
+	pte = vmf->pte - dist;
+
+	do {
+		struct page *page = folio_page(folio, first_idx + i);
+		bool write = vmf->flags & FAULT_FLAG_WRITE;
+		bool prefault = page != pg;
+		pte_t entry;
+
+		if (!pte_none(*pte))
+			continue;
+
+		flush_icache_page(vma, page);
+		entry = mk_pte(page, vma->vm_page_prot);
+
+		if (prefault)
+			folio_get(folio);
+
+		if (prefault && arch_wants_old_prefaulted_pte())
+			entry = pte_mkold(entry);
+		else
+			entry = pte_sw_mkyoung(entry);
+
+		if (write)
+			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+
+		inc_mm_counter(vma->vm_mm, mm_counter_file(&folio->page));
+		page_add_file_rmap(page, vma, false);
+
+		set_pte_at(vma->vm_mm, start, pte, entry);
+		update_mmu_cache(vma, start, pte);
+	} while (pte++, start += PAGE_SIZE, i++, start < end);
+}
+
 static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
 	vm_fault_t ret = 0;
-- 
2.30.2
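
For reference, a hypothetical user-space check (not part of the patch;
./testfile is a placeholder path): map a file backed by large folios,
touch one byte per page, and compare the minor-fault delta reported by
getrusage(). With the patch the delta should shrink by up to the
batching factor (fault_around_bytes / page size); fault_around_bytes
can be tuned via /sys/kernel/debug/fault_around_bytes.

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <sys/resource.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		struct rusage before, after;
		struct stat st;
		volatile char sum = 0;
		char *map;
		off_t i;
		int fd = open("./testfile", O_RDONLY);	/* placeholder */

		if (fd < 0 || fstat(fd, &st) < 0)
			return 1;
		map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
		if (map == MAP_FAILED)
			return 1;

		getrusage(RUSAGE_SELF, &before);
		for (i = 0; i < st.st_size; i += 4096)
			sum += map[i];		/* one touch per base page */
		getrusage(RUSAGE_SELF, &after);

		printf("minor faults: %ld\n",
		       after.ru_minflt - before.ru_minflt);
		munmap(map, st.st_size);
		close(fd);
		return 0;
	}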