Message-ID: <7779878b-3ff5-4b4e-881a-f66426dceb6f@huawei.com>
Date: Wed, 8 May 2024 21:56:19 +0800
Subject: Re: [PATCH rfc 3/4] mm: filemap: move __lruvec_stat_mod_folio() out of filemap_set_pte_range()
To: David Hildenbrand ,
 Andrew Morton
CC: "Matthew Wilcox (Oracle)" , ,
References: <20240429072417.2146732-1-wangkefeng.wang@huawei.com>
 <20240429072417.2146732-4-wangkefeng.wang@huawei.com>
 <0bf097d2-6d2a-498b-a266-303f168b6221@redhat.com>
 <609abbe8-cf88-4145-b1d0-397c980aff28@huawei.com>
From: Kefeng Wang

On 2024/5/8 19:27, David Hildenbrand wrote:
> On 08.05.24 13:15, Kefeng Wang wrote:
>>
>>
>> On 2024/5/8 17:33, David Hildenbrand wrote:
>>> On 07.05.24 15:12, Kefeng Wang wrote:
>>>>
>>>>
>>>> On 2024/5/7 19:11, David Hildenbrand wrote:
>>>>> On 29.04.24 09:24, Kefeng Wang wrote:
>>>>>> Add __folio_add_file_rmap_ptes(), which doesn't update the lruvec
>>>>>> stat; it is used in filemap_set_pte_range(). With it, the lruvec
>>>>>> stat update is moved into the caller; no functional change.
>>>>>>
>>>>>> Signed-off-by: Kefeng Wang
>>>>>> ---
>>>>>>     include/linux/rmap.h |  2 ++
>>>>>>     mm/filemap.c         | 27 ++++++++++++++++++---------
>>>>>>     mm/rmap.c            | 16 ++++++++++++++++
>>>>>>     3 files changed, 36 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>>> index 7229b9baf20d..43014ddd06f9 100644
>>>>>> --- a/include/linux/rmap.h
>>>>>> +++ b/include/linux/rmap.h
>>>>>> @@ -242,6 +242,8 @@ void folio_add_anon_rmap_pmd(struct folio *, struct page *,
>>>>>>             struct vm_area_struct *, unsigned long address, rmap_t flags);
>>>>>>     void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>>>>>>             unsigned long address);
>>>>>> +int __folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
>>>>>> +        struct vm_area_struct *);
>>>>>>     void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
>>>>>>             struct vm_area_struct *);
>>>>>>     #define folio_add_file_rmap_pte(folio, page, vma) \
>>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>>> index 7019692daddd..3966b6616d02 100644
>>>>>> --- a/mm/filemap.c
>>>>>> +++ b/mm/filemap.c
>>>>>> @@ -3501,14 +3501,15 @@ static struct folio *next_uptodate_folio(struct xa_state *xas,
>>>>>>     static void filemap_set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>>>>                 struct page *page, unsigned int nr, unsigned long addr,
>>>>>> -            unsigned long *rss)
>>>>>> +            unsigned long *rss, int *nr_mapped)
>>>>>>     {
>>>>>>         struct vm_area_struct *vma = vmf->vma;
>>>>>>         pte_t entry;
>>>>>>         entry = prepare_range_pte_entry(vmf, false, folio, page, nr, addr);
>>>>>> -    folio_add_file_rmap_ptes(folio, page, nr, vma);
>>>>>> +    *nr_mapped += __folio_add_file_rmap_ptes(folio, page, nr, vma);
>>>>>> +
>>>>>>         set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>>>>>>         /* no need to invalidate: a not-present page won't be cached */
>>>>>> @@ -3525,7 +3526,8 @@ static void filemap_set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>>>>     static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>>>                 struct folio *folio, unsigned long start,
>>>>>>                 unsigned long addr, unsigned int nr_pages,
>>>>>> -            unsigned long *rss, unsigned int *mmap_miss)
>>>>>> +            unsigned long *rss, int *nr_mapped,
>>>>>> +            unsigned int *mmap_miss)
>>>>>>     {
>>>>>>         vm_fault_t ret = 0;
>>>>>>         struct page *page = folio_page(folio, start);
>>>>>> @@ -3558,7 +3560,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>>>             continue;
>>>>>>     skip:
>>>>>>             if (count) {
>>>>>> -            filemap_set_pte_range(vmf, folio, page, count, addr, rss);
>>>>>> +            filemap_set_pte_range(vmf, folio, page, count, addr,
>>>>>> +                          rss, nr_mapped);
>>>>>>                 if (in_range(vmf->address, addr, count * PAGE_SIZE))
>>>>>>                     ret = VM_FAULT_NOPAGE;
>>>>>>             }
>>>>>> @@ -3571,7 +3574,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>>>         } while (--nr_pages > 0);
>>>>>>         if (count) {
>>>>>> -        filemap_set_pte_range(vmf, folio, page, count, addr, rss);
>>>>>> +        filemap_set_pte_range(vmf, folio, page, count, addr, rss,
>>>>>> +                      nr_mapped);
>>>>>>             if (in_range(vmf->address, addr, count * PAGE_SIZE))
>>>>>>                 ret = VM_FAULT_NOPAGE;
>>>>>>         }
>>>>>> @@ -3583,7 +3587,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>>>     static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
>>>>>>             struct folio *folio, unsigned long addr,
>>>>>> -        unsigned long *rss, unsigned int *mmap_miss)
>>>>>> +        unsigned long *rss, int *nr_mapped, unsigned int *mmap_miss)
>>>>>>     {
>>>>>>         vm_fault_t ret = 0;
>>>>>>         struct page *page = &folio->page;
>>>>>> @@ -3606,7 +3610,7 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
>>>>>>         if (vmf->address == addr)
>>>>>>             ret = VM_FAULT_NOPAGE;
>>>>>> -    filemap_set_pte_range(vmf, folio, page, 1, addr, rss);
>>>>>> +    filemap_set_pte_range(vmf, folio, page, 1, addr, rss, nr_mapped);
>>>>>>         return ret;
>>>>>>     }
>>>>>> @@ -3646,6 +3650,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>>>>>         folio_type = mm_counter_file(folio);
>>>>>>         do {
>>>>>>             unsigned long end;
>>>>>> +        int nr_mapped = 0;
>>>>>>             addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
>>>>>>             vmf->pte += xas.xa_index - last_pgoff;
>>>>>> @@ -3655,11 +3660,15 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>>>>>             if (!folio_test_large(folio))
>>>>>>                 ret |= filemap_map_order0_folio(vmf,
>>>>>> -                    folio, addr, &rss, &mmap_miss);
>>>>>> +                    folio, addr, &rss, &nr_mapped,
>>>>>> +                    &mmap_miss);
>>>>>>             else
>>>>>>                 ret |= filemap_map_folio_range(vmf, folio,
>>>>>>                         xas.xa_index - folio->index, addr,
>>>>>> -                    nr_pages, &rss, &mmap_miss);
>>>>>> +                    nr_pages, &rss, &nr_mapped,
>>>>>> +                    &mmap_miss);
>>>>>> +
>>>>>> +        __lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr_mapped);
>>>>>>             folio_unlock(folio);
>>>>>>             folio_put(folio);
>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>> index 2608c40dffad..55face4024f2 100644
>>>>>> --- a/mm/rmap.c
>>>>>> +++ b/mm/rmap.c
>>>>>> @@ -1452,6 +1452,22 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
>>>>>>             mlock_vma_folio(folio, vma);
>>>>>>     }
>>>>>> +int __folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
>>>>>> +        int nr_pages, struct vm_area_struct *vma)
>>>>>> +{
>>>>>> +    int nr, nr_pmdmapped = 0;
>>>>>> +
>>>>>> +    VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
>>>>>> +
>>>>>> +    nr = __folio_add_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE,
>>>>>> +                  &nr_pmdmapped);
>>>>>> +
>>>>>> +    /* See comments in folio_add_anon_rmap_*() */
>>>>>> +    if (!folio_test_large(folio))
>>>>>> +        mlock_vma_folio(folio, vma);
>>>>>> +
>>>>>> +    return nr;
>>>>>> +}
>>>>>
>>>>> I'm not really a fan :/ It does make the code more complicated, and it
>>>>> will be harder to extend if we decide to ever account differently (e.g.,
>>>>> NR_SHMEM_MAPPED, additional tracking for mTHP etc).
>>>>
>>>> If there are more kinds of accounting, this may lead to bad scalability.
>>>
>>> We already do it for PMD mappings.
>>>
>>>>>
>>>>> With large folios we'll be naturally batching already here, and I do
>>>>
>>>> Yes, it is batched with large folios, but our filesystems are ext4/tmpfs,
>>>> which either don't support large folios or whose support is still being
>>>> upstreamed.
>>>
>>> Okay, so that will be sorted out sooner or later.
>>>
>>>>
>>>>> wonder if this is really worth it for performance, or if we could find
>>>>> another way of batching (let the caller activate batching and drain
>>>>> afterwards) without exposing these details to the caller.
>>>>
>>>> It does reduce latency when batching the lruvec stat updates without
>>>> large folios, but I can't find a better way; or let's wait for large
>>>> folio support on ext4/tmpfs. I also CCed the memcg maintainers on
>>>> patch 4 to see if there are any other ideas.
>>>
>>> I'm not convinced the benefit here is worth making the code more
>>> complicated.
>>>
>>> Maybe we can find another way to optimize this batching in the rmap code
>>> without having to leak these details to the callers.
>>>
>>> For example, we could pass an optional batching structure to all rmap
>>> add/rel functions that would collect these stat updates. Then we could
>>> have one function to flush it and update the counters combined.
>>>
>>> Such batching could also be beneficial for page unmapping/zapping, where
>>> we might unmap various different folios in one go.
>>
>> That sounds better and clearer; I will try it and see the results, thanks
>> for your advice!
>
> To batch across multiple folios, it might be sufficient to remember in
> the batching structure for now:
>
> * folio_memcg(folio)
> * folio_pgdat(folio)
> * NR_ANON_MAPPED diff
> * NR_FILE_MAPPED diff
>
> If the memcg or pgdat would change, we simply flush. Otherwise we batch
> and the caller flushes.

For memcg == NULL, we need to flush when the pgdat changes.

>
> Likely, when we map/unmap multiple folios, they belong to the same
> pgdat+memcg.

Yes, such batching should be suitable for both mapping and unmapping.

>
> The only tricky bit is the rcu_read_lock() around folio_memcg(); that
> might require some thought.

Indeed, we need a stable folio/memcg binding.

Thanks.

>
> Hm ....
>
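
As a concrete illustration of the batching idea sketched above, a minimal,
compilable userspace model could look like the following. All names here
(rmap_stat_batch, rmap_batch_add(), rmap_batch_flush()) are hypothetical and
only model the flush-on-memcg/pgdat-change rule being discussed; they are not
existing kernel APIs, and an in-kernel version would apply the accumulated
diffs via the real lruvec stat helpers on flush.

/* Hypothetical userspace sketch of the proposed rmap stat batching. */
#include <stdio.h>

enum counter { NR_ANON_MAPPED_DIFF, NR_FILE_MAPPED_DIFF, NR_COUNTERS };

struct rmap_stat_batch {
    const void *memcg;          /* stand-in for folio_memcg(folio) */
    const void *pgdat;          /* stand-in for folio_pgdat(folio) */
    long diff[NR_COUNTERS];     /* accumulated NR_*_MAPPED diffs */
};

/* Apply accumulated diffs; the kernel version would update lruvec stats. */
static void rmap_batch_flush(struct rmap_stat_batch *b)
{
    if (b->diff[NR_ANON_MAPPED_DIFF] || b->diff[NR_FILE_MAPPED_DIFF])
        printf("flush: anon %+ld, file %+ld\n",
               b->diff[NR_ANON_MAPPED_DIFF], b->diff[NR_FILE_MAPPED_DIFF]);
    b->diff[NR_ANON_MAPPED_DIFF] = 0;
    b->diff[NR_FILE_MAPPED_DIFF] = 0;
}

/* Accumulate a diff; flush first if the (memcg, pgdat) pair changes. */
static void rmap_batch_add(struct rmap_stat_batch *b, const void *memcg,
                           const void *pgdat, enum counter c, long nr)
{
    if (b->memcg != memcg || b->pgdat != pgdat) {
        rmap_batch_flush(b);
        b->memcg = memcg;
        b->pgdat = pgdat;
    }
    b->diff[c] += nr;
}

int main(void)
{
    struct rmap_stat_batch batch = { 0 };
    int memcg_a, memcg_b, pgdat_0;  /* addresses stand in for real objects */

    /* Folios from the same memcg/pgdat accumulate into a single flush. */
    rmap_batch_add(&batch, &memcg_a, &pgdat_0, NR_FILE_MAPPED_DIFF, 16);
    rmap_batch_add(&batch, &memcg_a, &pgdat_0, NR_FILE_MAPPED_DIFF, 8);
    /* A folio from a different memcg forces an intermediate flush. */
    rmap_batch_add(&batch, &memcg_b, &pgdat_0, NR_FILE_MAPPED_DIFF, 4);
    /* The caller flushes once at the end, as described above. */
    rmap_batch_flush(&batch);
    return 0;
}

The final flush is left to the caller, matching the "caller flushes"
behaviour described in the discussion; the memcg == NULL case would simply
make the pgdat comparison the deciding factor.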