From: Kefeng Wang
Date: Wed, 8 May 2024 19:15:36 +0800
Message-ID: <609abbe8-cf88-4145-b1d0-397c980aff28@huawei.com>
Subject: Re: [PATCH rfc 3/4] mm: filemap: move __lruvec_stat_mod_folio() out of filemap_set_pte_range()
To: David Hildenbrand,
    Andrew Morton
Cc: "Matthew Wilcox (Oracle)", linux-mm@kvack.org
References: <20240429072417.2146732-1-wangkefeng.wang@huawei.com>
 <20240429072417.2146732-4-wangkefeng.wang@huawei.com>
 <0bf097d2-6d2a-498b-a266-303f168b6221@redhat.com>

On 2024/5/8 17:33, David Hildenbrand wrote:
> On 07.05.24 15:12, Kefeng Wang wrote:
>>
>>
>> On 2024/5/7 19:11, David Hildenbrand wrote:
>>> On 29.04.24 09:24, Kefeng Wang wrote:
>>>> Add __folio_add_file_rmap_ptes(), which doesn't update the lruvec
>>>> stat; it is used in filemap_set_pte_range(). With that, the lruvec
>>>> stat update is moved into the caller. No functional changes.
>>>>
>>>> Signed-off-by: Kefeng Wang
>>>> ---
>>>>    include/linux/rmap.h |  2 ++
>>>>    mm/filemap.c         | 27 ++++++++++++++++++---------
>>>>    mm/rmap.c            | 16 ++++++++++++++++
>>>>    3 files changed, 36 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 7229b9baf20d..43014ddd06f9 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -242,6 +242,8 @@ void folio_add_anon_rmap_pmd(struct folio *, struct page *,
>>>>            struct vm_area_struct *, unsigned long address, rmap_t flags);
>>>>    void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>>>>            unsigned long address);
>>>> +int __folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
>>>> +        struct vm_area_struct *);
>>>>    void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
>>>>            struct vm_area_struct *);
>>>>    #define folio_add_file_rmap_pte(folio, page, vma) \
>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>> index 7019692daddd..3966b6616d02 100644
>>>> --- a/mm/filemap.c
>>>> +++ b/mm/filemap.c
>>>> @@ -3501,14 +3501,15 @@ static struct folio *next_uptodate_folio(struct xa_state *xas,
>>>>    static void filemap_set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>>                struct page *page, unsigned int nr, unsigned long addr,
>>>> -            unsigned long *rss)
>>>> +            unsigned long *rss, int *nr_mapped)
>>>>    {
>>>>        struct vm_area_struct *vma = vmf->vma;
>>>>        pte_t entry;
>>>>        entry = prepare_range_pte_entry(vmf, false, folio, page, nr, addr);
>>>> -    folio_add_file_rmap_ptes(folio, page, nr, vma);
>>>> +    *nr_mapped += __folio_add_file_rmap_ptes(folio, page, nr, vma);
>>>> +
>>>>        set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>>>>        /* no need to invalidate: a not-present page won't be cached */
>>>> @@ -3525,7 +3526,8 @@ static void filemap_set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>>    static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>                struct folio *folio, unsigned long start,
>>>>                unsigned long addr, unsigned int nr_pages,
>>>> -            unsigned long *rss, unsigned int *mmap_miss)
>>>> +            unsigned long *rss, int *nr_mapped,
>>>> +            unsigned int *mmap_miss)
>>>>    {
>>>>        vm_fault_t ret = 0;
>>>>        struct page *page = folio_page(folio, start);
>>>> @@ -3558,7 +3560,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>            continue;
>>>>    skip:
>>>>            if (count) {
>>>> -            filemap_set_pte_range(vmf, folio, page, count, addr, rss);
>>>> +            filemap_set_pte_range(vmf, folio, page, count, addr,
>>>> +                          rss, nr_mapped);
>>>>                if (in_range(vmf->address, addr, count * PAGE_SIZE))
>>>>                    ret = VM_FAULT_NOPAGE;
>>>>            }
>>>> @@ -3571,7 +3574,8 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>        } while (--nr_pages > 0);
>>>>        if (count) {
>>>> -        filemap_set_pte_range(vmf, folio, page, count, addr, rss);
>>>> +        filemap_set_pte_range(vmf, folio, page, count, addr, rss,
>>>> +                      nr_mapped);
>>>>            if (in_range(vmf->address, addr, count * PAGE_SIZE))
>>>>                ret = VM_FAULT_NOPAGE;
>>>>        }
>>>> @@ -3583,7 +3587,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>    static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
>>>>            struct folio *folio, unsigned long addr,
>>>> -        unsigned long *rss, unsigned int *mmap_miss)
>>>> +        unsigned long *rss, int *nr_mapped, unsigned int *mmap_miss)
>>>>    {
>>>>        vm_fault_t ret = 0;
>>>>        struct page *page = &folio->page;
>>>> @@ -3606,7 +3610,7 @@ static vm_fault_t filemap_map_order0_folio(struct vm_fault *vmf,
>>>>        if (vmf->address == addr)
>>>>            ret = VM_FAULT_NOPAGE;
>>>> -    filemap_set_pte_range(vmf, folio, page, 1, addr, rss);
>>>> +    filemap_set_pte_range(vmf, folio, page, 1, addr, rss, nr_mapped);
>>>>        return ret;
>>>>    }
>>>> @@ -3646,6 +3650,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>>>        folio_type = mm_counter_file(folio);
>>>>        do {
>>>>            unsigned long end;
>>>> +        int nr_mapped = 0;
>>>>            addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
>>>>            vmf->pte += xas.xa_index - last_pgoff;
>>>> @@ -3655,11 +3660,15 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>>>            if (!folio_test_large(folio))
>>>>                ret |= filemap_map_order0_folio(vmf,
>>>> -                    folio, addr, &rss, &mmap_miss);
>>>> +                    folio, addr, &rss, &nr_mapped,
>>>> +                    &mmap_miss);
>>>>            else
>>>>                ret |= filemap_map_folio_range(vmf, folio,
>>>>                        xas.xa_index - folio->index, addr,
>>>> -                    nr_pages, &rss, &mmap_miss);
>>>> +                    nr_pages, &rss, &nr_mapped,
>>>> +                    &mmap_miss);
>>>> +
>>>> +        __lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr_mapped);
>>>>            folio_unlock(folio);
>>>>            folio_put(folio);
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 2608c40dffad..55face4024f2 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1452,6 +1452,22 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
>>>>            mlock_vma_folio(folio, vma);
>>>>    }
>>>> +int __folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
>>>> +        int nr_pages, struct vm_area_struct *vma)
>>>> +{
>>>> +    int nr, nr_pmdmapped = 0;
>>>> +
>>>> +    VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
>>>> +
>>>> +    nr = __folio_add_rmap(folio, page, nr_pages, RMAP_LEVEL_PTE,
>>>> +                  &nr_pmdmapped);
>>>> +
>>>> +    /* See comments in folio_add_anon_rmap_*() */
>>>> +    if (!folio_test_large(folio))
>>>> +        mlock_vma_folio(folio, vma);
>>>> +
>>>> +    return nr;
>>>> +}
>>>
>>> I'm not really a fan :/ It does make the code more complicated, and it
>>> will be harder to extend if we decide to ever account differently (e.g.,
>>> NR_SHMEM_MAPPED, additional tracking for mTHP etc).
>>
>> If there were more different kinds of accounting, this may lead to bad
>> scalability.
>
> We already do it for PMD mappings.
>
>>>
>>> With large folios we'll be naturally batching already here, and I do
>>
>> Yes, it is batched with large folios, but our filesystems are ext4/tmpfs,
>> which either don't support large folios yet or whose support is still
>> being upstreamed.
>
> Okay, so that will be sorted out sooner or later.
>
>>
>>> wonder, if this is really worth it for performance, or if we could find
>>> another way of batching (let the caller activate batching and drain
>>> afterwards) without exposing these details to the caller.
>>
>> It does reduce latency when the lruvec stat updates are batched without
>> large folios, but I can't find a better way; or let's wait for large
>> folio support on ext4/tmpfs. I also Cc'ed the memcg maintainers on
>> patch 4 to see if there are any other ideas.
>
> I'm not convinced the benefit here is worth making the code more
> complicated.
>
> Maybe we can find another way to optimize this batching in the rmap code
> without having to leak these details to the callers.
>
> For example, we could pass an optional batching structure to all rmap
> add/rel functions that would collect these stat updates. Then we could
> have one function to flush it and update the counters combined.
>
> Such batching could also be beneficial for page unmapping/zapping, where
> we might unmap various different folios in one go.

That sounds better and clearer. I will try it and see the results, thanks
for your advice!
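
To make the suggestion above concrete, here is a rough, self-contained C
sketch of such a batching structure. All names (rmap_stat_batch,
rmap_stat_mod, rmap_stat_batch_flush) and the stat indices are hypothetical
stand-ins invented for illustration only; they are not existing kernel APIs,
and a real version would fold the accumulated deltas into the existing
lruvec/memcg counters (e.g. via __lruvec_stat_mod_folio()) instead of
printing them.

/*
 * Hypothetical sketch of the "optional batching structure" idea from the
 * discussion above. Compiles as plain userspace C; none of these names
 * exist in the kernel.
 */
#include <stdio.h>

/* Counters a batch might collect; stand-ins for NR_FILE_MAPPED etc. */
enum batch_stat {
	BATCH_FILE_MAPPED,
	BATCH_SHMEM_MAPPED,
	NR_BATCH_STATS,
};

struct rmap_stat_batch {
	long delta[NR_BATCH_STATS];
};

static void rmap_stat_batch_init(struct rmap_stat_batch *b)
{
	for (int i = 0; i < NR_BATCH_STATS; i++)
		b->delta[i] = 0;
}

/*
 * What an rmap add/remove helper would do instead of updating the counters
 * immediately: if a batch was passed in, just accumulate the delta.
 */
static void rmap_stat_mod(struct rmap_stat_batch *b, enum batch_stat item,
			  long nr)
{
	if (b)
		b->delta[item] += nr;
	else
		printf("immediate update: stat %d += %ld\n", item, nr);
}

/* One combined flush, e.g. after mapping a whole range of folios. */
static void rmap_stat_batch_flush(struct rmap_stat_batch *b)
{
	for (int i = 0; i < NR_BATCH_STATS; i++)
		if (b->delta[i])
			printf("flush: stat %d += %ld\n", i, b->delta[i]);
	rmap_stat_batch_init(b);
}

int main(void)
{
	struct rmap_stat_batch batch;

	rmap_stat_batch_init(&batch);
	/* e.g. filemap_map_pages() mapping 16 + 8 ptes of file folios */
	rmap_stat_mod(&batch, BATCH_FILE_MAPPED, 16);
	rmap_stat_mod(&batch, BATCH_FILE_MAPPED, 8);
	rmap_stat_batch_flush(&batch); /* one counter update instead of two */
	return 0;
}

In the kernel, the flush would presumably happen once per mapped folio (as
this RFC already does for NR_FILE_MAPPED) or once per fault/zap batch; that
placement is an assumption, not something settled in the thread.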