Message-ID: <043a4f2d-e08d-4cea-a73d-819586509b12@linux.alibaba.com>
Date: Wed, 8 May 2024 17:06:28 +0800
Subject: Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large folio
To: David Hildenbrand, Ryan Roberts, akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ying.huang@intel.com, 21cnbao@gmail.com, shy828301@gmail.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <13939ade-a99a-4075-8a26-9be7576b7e03@arm.com> <5564e708-4a9f-4010-806e-4c5a7a5d2ebe@redhat.com>
From: Baolin Wang
In-Reply-To: <5564e708-4a9f-4010-806e-4c5a7a5d2ebe@redhat.com>

On 2024/5/8 15:15, David Hildenbrand wrote:
> On 08.05.24 05:44, Baolin Wang wrote:
>>
>> On 2024/5/7 18:37, Ryan Roberts wrote:
>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>> Add large folio mapping establishment support for finish_fault() as
>>>> a preparation to support multi-size THP allocation of anonymous shmem
>>>> pages in the following patches.
>>>>
>>>> Signed-off-by: Baolin Wang
>>>> ---
>>>>  mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>>  1 file changed, 33 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index eea6e4984eae..936377220b77 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>  {
>>>>      struct vm_area_struct *vma = vmf->vma;
>>>>      struct page *page;
>>>> +    struct folio *folio;
>>>>      vm_fault_t ret;
>>>>      bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>>                        !(vma->vm_flags & VM_SHARED);
>>>> +    int type, nr_pages, i;
>>>> +    unsigned long addr = vmf->address;
>>>>
>>>>      /* Did we COW the page? */
>>>>      if (is_cow)
>>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>              return VM_FAULT_OOM;
>>>>      }
>>>> +    folio = page_folio(page);
>>>> +    nr_pages = folio_nr_pages(folio);
>>>> +
>>>> +    if (unlikely(userfaultfd_armed(vma))) {
>>>> +        nr_pages = 1;
>>>> +    } else if (nr_pages > 1) {
>>>> +        unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>>> +        unsigned long end = start + nr_pages * PAGE_SIZE;
>>>> +
>>>> +        /* In case the folio size in page cache beyond the VMA limits. */
>>>> +        addr = max(start, vma->vm_start);
>>>> +        nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>>> +
>>>> +        page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>>
>>> I still don't really follow the logic in this else if block. Isn't it possible
>>> that finish_fault() gets called with a page from a folio that isn't aligned with
>>> vmf->address?
>>>
>>> For example, let's say we have a file whose size is 64K and which is cached in a
>>> single large folio in the page cache. But the file is mapped into a process at
>>> VA 16K to 80K. Let's say we fault on the first page (VA=16K). You will calculate
>>
>> For shmem, this doesn't happen because the VA is aligned with the
>> hugepage size in the shmem_get_unmapped_area() function. See patch 7.
>
> Does that cover mremap() and MAP_FIXED as well?

Good point. Thanks for pointing this out.

> We should try doing this as cleanly as possible, to prepare for the
> future / corner cases.

Sure. Let me rethink the algorithm.
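Working the quoted arithmetic through Ryan's example shows where it goes wrong. Assume 4K pages, a 16-page (64K) folio holding the whole file, and a VMA covering 0x4000-0x14000 that maps file offset 0 at 0x4000. A fault at 0x4000 gives start = ALIGN_DOWN(0x4000, 0x10000) = 0, so addr = 0x4000, nr_pages = 12, and the quoted code selects folio_page(folio, 4) as the first page for 0x4000, even though that VA should map file page 0 (folio page 0). One way to stop depending on VA alignment, which mremap() or MAP_FIXED may not preserve, is to start from the faulting page's index inside the folio and count how many neighbouring folio pages still land inside the VMA. The sketch below is a hypothetical user-space model of both calculations, assuming page-aligned addresses and a linear file-to-VA mapping within the VMA; `struct vma_model`, `struct map_range`, `range_from_align()`, `range_from_index()` and `page_at_fault()` are illustrative names, not kernel APIs, and this is not necessarily how the series ends up reworking finish_fault().

```c
#include <assert.h>

/* Hypothetical user-space model, assuming 4 KiB base pages. */
#define PAGE_SHIFT	12UL
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
/* Power-of-two alignment, matching how the patch uses ALIGN_DOWN(). */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

/* Stand-in for the two vm_area_struct fields the calculation reads. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Result: map nr_pages folio pages, starting with first_idx, at addr. */
struct map_range {
	unsigned long addr;
	unsigned long nr_pages;
	unsigned long first_idx;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Mirrors the quoted patch: implicitly assumes the folio is VA-aligned. */
struct map_range range_from_align(unsigned long address, unsigned long nr,
				  const struct vma_model *vma)
{
	unsigned long start = ALIGN_DOWN(address, nr * PAGE_SIZE);
	unsigned long end = start + nr * PAGE_SIZE;
	struct map_range r;

	assert((nr & (nr - 1)) == 0);	/* folio sizes are powers of two */
	assert(vma->vm_start <= address && address < vma->vm_end);
	r.addr = max_ul(start, vma->vm_start);
	r.nr_pages = (min_ul(end, vma->vm_end) - r.addr) >> PAGE_SHIFT;
	r.first_idx = (r.addr - start) >> PAGE_SHIFT;
	return r;
}

/*
 * Alternative: anchor on idx, the faulting page's index inside the folio.
 * Folio page j then sits at address + (j - idx) * PAGE_SIZE, whatever the
 * VMA's alignment; keep only the pages that land inside the VMA.
 */
struct map_range range_from_index(unsigned long address, unsigned long idx,
				  unsigned long nr, const struct vma_model *vma)
{
	unsigned long before, after;
	struct map_range r;

	assert(idx < nr);
	assert(vma->vm_start <= address && address < vma->vm_end);
	/* Folio pages below the fault page that are still inside the VMA. */
	before = min_ul(idx, (address - vma->vm_start) >> PAGE_SHIFT);
	/* The fault page plus folio pages above it that are inside the VMA. */
	after = min_ul(nr - idx, (vma->vm_end - address) >> PAGE_SHIFT);
	r.addr = address - before * PAGE_SIZE;
	r.nr_pages = before + after;
	r.first_idx = idx - before;
	return r;
}

/* Index of the folio page that a range places at the faulting address. */
unsigned long page_at_fault(const struct map_range *r, unsigned long address)
{
	return r->first_idx + ((address - r->addr) >> PAGE_SHIFT);
}
```

For the layout above, range_from_align(0x4000, 16, &vma) reproduces the patch's result (addr 0x4000, 12 pages, first_idx 4), while range_from_index(0x4000, 0, 16, &vma) returns addr 0x4000, 16 pages and first_idx 0. Counting pages before and after the fault page, instead of computing the folio's would-be start address directly, keeps the unsigned arithmetic from underflowing when the folio would begin below the VMA. The model ignores other limits a real implementation must also respect, such as keeping the range within a single PTE page table.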