From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Matthew Wilcox, Liu Zixian
Subject: Re: [PATCH] shmem: support huge_fault to avoid pmd split
Date: Tue, 26 Jul 2022 22:31:37 +0800
References: <20220726124315.1606-1-liuzixian4@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Language: en-US

On 2022/7/26 21:09, Matthew Wilcox wrote:
> On Tue, Jul 26, 2022 at 08:43:15PM +0800, Liu Zixian wrote:
>> Transparent hugepage of tmpfs is useful to improve TLB miss, but
>> it will be split during cow memory fault.
> That's intentional. Possibly misguided, but there's a tradeoff to
> be made between memory consumption and using large pages.
>
>> This will happen if we mprotect and rewrite code segment (which is
>> private file map) to hotpatch a running process.
>>
>> We can avoid the splitting by adding a huge_fault function.
>>
>> Signed-off-by: Liu Zixian
>> ---
>>  mm/shmem.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 46 insertions(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index a6f565308..12b2b5140 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2120,6 +2120,51 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
>>  	return ret;
>>  }
>>
>> +static vm_fault_t shmem_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size)
>> +{
>> +	vm_fault_t ret = VM_FAULT_FALLBACK;
>> +	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>> +	struct page *old_page, *new_page;
>> +	int gfp_flags = GFP_HIGHUSER_MOVABLE | __GFP_COMP;

There are many uses of vmf->vma here, so it would be better to add
'struct vm_area_struct *vma = vmf->vma;' and use vma directly.

>> +
>> +	/* read or shared fault will not split huge pmd */
>> +	if (!(vmf->flags & FAULT_FLAG_WRITE)
>> +			|| (vmf->vma->vm_flags & VM_SHARED))
>> +		return VM_FAULT_FALLBACK;
>> +	if (pe_size != PE_SIZE_PMD)
>> +		return VM_FAULT_FALLBACK;

ret is already initialized to VM_FAULT_FALLBACK, so this could simply be
"return ret;".

>> +
>> +	if (pmd_none(*vmf->pmd)) {
>> +		if (shmem_fault(vmf) & VM_FAULT_ERROR)
>> +			goto out;
>> +		if (!PageTransHuge(vmf->page))
>> +			goto out;
>> +		old_page = vmf->page;
>> +	} else {
>> +		old_page = pmd_page(*vmf->pmd);
>> +		page_remove_rmap(old_page, vmf->vma, true);
>> +		pmdp_huge_clear_flush(vmf->vma, haddr, vmf->pmd);
>> +		add_mm_counter(vmf->vma->vm_mm, MM_SHMEMPAGES, -HPAGE_PMD_NR);

MM_SHMEMPAGES -> mm_counter_file(page), i.e. use the helper instead of
hard-coding the counter type.

>> +	}
>> +

Could this directly use GFP_TRANSHUGE_LIGHT instead of open-coding the
gfp flags?

>> +	new_page = &vma_alloc_folio(gfp_flags, HPAGE_PMD_ORDER,
>> +			vmf->vma, haddr, true)->page;
>> +	if (!new_page)

Please add count_vm_event(THP_FAULT_FALLBACK); here so the fallback is
accounted in the THP stats.

>> +		goto out;
>> +	prep_transhuge_page(new_page);
> vma_alloc_folio() does the prep_transhuge_page() for you.
>
>> +	copy_user_huge_page(new_page, old_page, haddr, vmf->vma, HPAGE_PMD_NR);
>> +	__SetPageUptodate(new_page);
>> +
>> +	ret = do_set_pmd(vmf, new_page);
>> +
>> +out:
>> +	if (vmf->page) {
>> +		unlock_page(vmf->page);
>> +		put_page(vmf->page);
>> +	}
>> +	return ret;
>> +}
>> +
>>  unsigned long shmem_get_unmapped_area(struct file *file,
>>  				      unsigned long uaddr, unsigned long len,
>>  				      unsigned long pgoff, unsigned long flags)
>> @@ -3884,6 +3929,7 @@ static const struct super_operations shmem_ops = {
>>
>>  static const struct vm_operations_struct shmem_vm_ops = {
>>  	.fault		= shmem_fault,
>> +	.huge_fault	= shmem_huge_fault,
>>  	.map_pages	= filemap_map_pages,
>>  #ifdef CONFIG_NUMA
>>  	.set_policy	= shmem_set_policy,
>> --
>> 2.33.0
>>
>>
> .
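
To make the suggestions above concrete, below is a rough, untested sketch of
how they might be folded into the function (local vma variable, "return ret;",
mm_counter_file(), GFP_TRANSHUGE_LIGHT, THP_FAULT_FALLBACK accounting, and no
prep_transhuge_page()). It only illustrates the review points against the
quoted patch; whether GFP_TRANSHUGE_LIGHT and mm_counter_file() are the right
choices here is for the author to confirm:

static vm_fault_t shmem_huge_fault(struct vm_fault *vmf, enum page_entry_size pe_size)
{
	/* avoid repeating vmf->vma everywhere */
	struct vm_area_struct *vma = vmf->vma;
	vm_fault_t ret = VM_FAULT_FALLBACK;
	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
	struct page *old_page, *new_page;

	/* read or shared fault will not split huge pmd */
	if (!(vmf->flags & FAULT_FLAG_WRITE) || (vma->vm_flags & VM_SHARED))
		return ret;
	if (pe_size != PE_SIZE_PMD)
		return ret;

	if (pmd_none(*vmf->pmd)) {
		if (shmem_fault(vmf) & VM_FAULT_ERROR)
			goto out;
		if (!PageTransHuge(vmf->page))
			goto out;
		old_page = vmf->page;
	} else {
		old_page = pmd_page(*vmf->pmd);
		page_remove_rmap(old_page, vma, true);
		pmdp_huge_clear_flush(vma, haddr, vmf->pmd);
		/* mm_counter_file() instead of hard-coded MM_SHMEMPAGES */
		add_mm_counter(vma->vm_mm, mm_counter_file(old_page), -HPAGE_PMD_NR);
	}

	/* GFP_TRANSHUGE_LIGHT instead of open-coded gfp flags (?) */
	new_page = &vma_alloc_folio(GFP_TRANSHUGE_LIGHT, HPAGE_PMD_ORDER,
			vma, haddr, true)->page;
	if (!new_page) {
		count_vm_event(THP_FAULT_FALLBACK);
		goto out;
	}
	/* no prep_transhuge_page(): vma_alloc_folio() already does it */

	copy_user_huge_page(new_page, old_page, haddr, vma, HPAGE_PMD_NR);
	__SetPageUptodate(new_page);

	ret = do_set_pmd(vmf, new_page);

out:
	if (vmf->page) {
		unlock_page(vmf->page);
		put_page(vmf->page);
	}
	return ret;
}

Again, this is only a sketch on top of the quoted patch, not a tested
replacement; the error handling and page locking still need to be checked.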