Subject: Re: [PATCH v5 1/6] mm: memory: extend finish_fault() to support large folio
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Kefeng Wang, akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, ying.huang@intel.com, 21cnbao@gmail.com, ryan.roberts@arm.com, shy828301@gmail.com, ziy@nvidia.com, ioworker0@gmail.com, da.gomez@samsung.com, p.raghav@samsung.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Thu, 13 Jun 2024 08:51:43 +0800
Message-ID: <86e8c75b-1717-4335-9a1f-c663192dbf84@linux.alibaba.com>
References: <3a190892355989d42f59cf9f2f98b94694b0d24d.1718090413.git.baolin.wang@linux.alibaba.com>

On 2024/6/12 21:40, Kefeng Wang wrote:
>
>
> On 2024/6/11 18:11, Baolin Wang wrote:
>> Add large folio mapping establishment support for finish_fault() as a
>> preparation, to support multi-size THP allocation of anonymous shmem
>> pages in the following patches.
>>
>> Keep the same behavior (per-page fault) for non-anon shmem to avoid
>> inflating the RSS unintentionally, and we can discuss what size of
>> mapping to build when extending mTHP to control non-anon shmem in
>> the future.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/memory.c | 57 +++++++++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 47 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index eef4e482c0c2..72775ee99ff3 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>   {
>>       struct vm_area_struct *vma = vmf->vma;
>>       struct page *page;
>> +    struct folio *folio;
>>       vm_fault_t ret;
>>       bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>                 !(vma->vm_flags & VM_SHARED);
>> +    int type, nr_pages;
>> +    unsigned long addr = vmf->address;
>>
>>       /* Did we COW the page? */
>>       if (is_cow)
>> @@ -4864,24 +4867,58 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>               return VM_FAULT_OOM;
>>       }
>>
>> +    folio = page_folio(page);
>> +    nr_pages = folio_nr_pages(folio);
>> +
>> +    /*
>> +     * Using per-page fault to maintain the uffd semantics, and same
>> +     * approach also applies to non-anonymous-shmem faults to avoid
>> +     * inflating the RSS of the process.
>> +     */
>> +    if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
>> +        nr_pages = 1;
>> +    } else if (nr_pages > 1) {
>> +        pgoff_t idx = folio_page_idx(folio, page);
>> +        /* The page offset of vmf->address within the VMA. */
>> +        pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>> +
>
>                         vma->vm_pgoff
>
>> +        /*
>> +         * Fallback to per-page fault in case the folio size in page
>> +         * cache beyond the VMA limits.
>> +         */
>> +        if (unlikely(vma_off < idx ||
>> +                 vma_off + (nr_pages - idx) > vma_pages(vma))) {
>> +            nr_pages = 1;
>> +        } else {
>> +            /* Now we can set mappings for the whole large folio. */
>> +            addr = vmf->address - idx * PAGE_SIZE;
>
>             addr -= idx * PAGE_SIZE;
>
>> +            page = &folio->page;
>> +        }
>> +    }
>> +
>>       vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>> -                      vmf->address, &vmf->ptl);
>> +                       addr, &vmf->ptl);
>
> no newline now,
>
>>       if (!vmf->pte)
>>           return VM_FAULT_NOPAGE;
>>
>>       /* Re-check under ptl */
>> -    if (likely(!vmf_pte_changed(vmf))) {
>> -        struct folio *folio = page_folio(page);
>> -        int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> -
>> -        set_pte_range(vmf, folio, page, 1, vmf->address);
>> -        add_mm_counter(vma->vm_mm, type, 1);
>> -        ret = 0;
>> -    } else {
>> -        update_mmu_tlb(vma, vmf->address, vmf->pte);
>> +    if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>> +        update_mmu_tlb(vma, addr, vmf->pte);
>> +        ret = VM_FAULT_NOPAGE;
>> +        goto unlock;
>> +    } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>> +        update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
>>           ret = VM_FAULT_NOPAGE;
>> +        goto unlock;
>>       }
>
> We may add a vmf_pte_range_changed(), but separate it.
>
> Some very small nits, up to you,
>
> Reviewed-by: Kefeng Wang

Thanks for reviewing. If a new version is needed, then I will clean up these coding style issues.
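For reference, the vmf_pte_range_changed() helper mentioned above could look
roughly like the sketch below. This is only an illustration of the idea, not
part of this patch: it assumes the existing vmf_pte_changed() and
pte_range_none() helpers in mm/memory.c, and as noted it would belong in a
separate cleanup.

/* Sketch only: re-check the faulting PTE range under the ptl. */
static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
{
	/* Single-page fault: keep the current vmf_pte_changed() semantics. */
	if (nr_pages == 1)
		return vmf_pte_changed(vmf);

	/* Multi-page fault: the whole range must still be pte_none. */
	return !pte_range_none(vmf->pte, nr_pages);
}

With such a helper, the re-check in finish_fault() could collapse into a
single "if (vmf_pte_range_changed(vmf, nr_pages))" branch, with only the
update_mmu_tlb() vs update_mmu_tlb_range() call still depending on nr_pages.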