From: David Hildenbrand <david@redhat.com>
To: "Yin, Fengwei", willy@infradead.org, linux-mm@kvack.org
Cc: dave.hansen@intel.com, tim.c.chen@intel.com, ying.huang@intel.com
Subject: Re: [RFC PATCH v3 3/4] mm: add do_set_pte_range()
Date: Fri, 3 Feb 2023 14:34:07 +0100
Message-ID: <6ccb2392-70fb-5135-d61d-c79aa82b9276@redhat.com>
References: <20230203131636.1648662-1-fengwei.yin@intel.com>
 <20230203131636.1648662-4-fengwei.yin@intel.com>
 <5c85d6bb-f4bf-1969-8ec3-c16399e5d6f2@redhat.com>

On 03.02.23 14:30, Yin, Fengwei wrote:
>
>
> On 2/3/2023 9:25 PM, David Hildenbrand wrote:
>> On 03.02.23 14:16, Yin Fengwei wrote:
>>> do_set_pte_range() allows setting up page table entries for a
>>> specific range. It calls folio_add_file_rmap_range() to take
>>> advantage of batched rmap updates for large folios.
>>>
>>> Signed-off-by: Yin Fengwei
>>> ---
>>>   include/linux/mm.h |  3 +++
>>>   mm/filemap.c       |  1 -
>>>   mm/memory.c        | 59 ++++++++++++++++++++++++++++++----------------
>>>   3 files changed, 42 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index d6f8f41514cc..93192f04b276 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1162,6 +1162,9 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>>
>>>   vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>>>   void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
>>> +void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>> +        unsigned long addr, pte_t *pte,
>>> +        unsigned long start, unsigned int nr);
>>>
>>>   vm_fault_t finish_fault(struct vm_fault *vmf);
>>>   vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index f444684db9f2..74046a3a0ff5 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -3386,7 +3386,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>
>>>           ref_count++;
>>>           do_set_pte(vmf, page, addr);
>>> -        update_mmu_cache(vma, addr, vmf->pte);
>>>       } while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>>>
>>>       /* Restore the vmf->pte */
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 7a04a1130ec1..3754b2ef166a 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4257,36 +4257,58 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>>   }
>>>   #endif
>>>
>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>> +void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>> +        unsigned long addr, pte_t *pte,
>>> +        unsigned long start, unsigned int nr)
>>>   {
>>>       struct vm_area_struct *vma = vmf->vma;
>>>       bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
>>>       bool write = vmf->flags & FAULT_FLAG_WRITE;
>>> +    bool cow = write && !(vma->vm_flags & VM_SHARED);
>>>       bool prefault = vmf->address != addr;
>>> +    struct page *page = folio_page(folio, start);
>>>       pte_t entry;
>>>
>>> -    flush_icache_page(vma, page);
>>> -    entry = mk_pte(page, vma->vm_page_prot);
>>> +    if (!cow) {
>>> +        folio_add_file_rmap_range(folio, start, nr, vma, false);
>>> +        add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
>>> +    }
>>>
>>> -    if (prefault && arch_wants_old_prefaulted_pte())
>>> -        entry = pte_mkold(entry);
>>> -    else
>>> -        entry = pte_sw_mkyoung(entry);
>>> +    do {
>>> +        flush_icache_page(vma, page);
>>> +        entry = mk_pte(page, vma->vm_page_prot);
>>>
>>> -    if (write)
>>> -        entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>>> -    if (unlikely(uffd_wp))
>>> -        entry = pte_mkuffd_wp(entry);
>>> -    /* copy-on-write page */
>>> -    if (write && !(vma->vm_flags & VM_SHARED)) {
>>> +        if (prefault && arch_wants_old_prefaulted_pte())
>>> +            entry = pte_mkold(entry);
>>> +        else
>>> +            entry = pte_sw_mkyoung(entry);
>>> +
>>> +        if (write)
>>> +            entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>>> +        if (unlikely(uffd_wp))
>>> +            entry = pte_mkuffd_wp(entry);
>>> +        set_pte_at(vma->vm_mm, addr, pte, entry);
>>> +
>>> +        /* no need to invalidate: a not-present page won't be cached */
>>> +        update_mmu_cache(vma, addr, pte);
>>> +    } while (pte++, page++, addr += PAGE_SIZE, --nr > 0);
>>> +}
>>> +
>>> +void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>> +{
>>> +    struct folio *folio = page_folio(page);
>>> +    struct vm_area_struct *vma = vmf->vma;
>>> +    bool cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>> +            !(vma->vm_flags & VM_SHARED);
>>> +
>>> +    if (cow) {
>>>           inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>>>           page_add_new_anon_rmap(page, vma, addr);
>>
>> As raised, we cannot PTE-map a multi-page folio that way.
>>
>> This function only supports single-page anon folios.
>>
>> page_add_new_anon_rmap() -> folio_add_new_anon_rmap(). As that
>> documents:
>>
>> "If the folio is large, it is accounted as a THP" -- for example, we
>> would only increment the "entire mapcount" and set the
>> PageAnonExclusive bit only on the head page.
>>
>> So this really doesn't work for multi-page folios, and if the function
>> were used for that, we'd be in trouble.
>>
>> We'd want some fence here to detect that and bail out if we were
>> instructed to do that. At least a WARN_ON_ONCE(), I guess.
>> update_mmu_tlb(vma, vmf->address, vmf->pte);
>>
>> Right now the function looks like it might just handle that.

> You are right. I thought moving the cow case out of it would make that
> explicit. But it looks like it doesn't. I will add WARN_ON_ONCE(). Thanks.

I guess I would move the cow check into do_set_pte_range() as well, and
verify in there that we are really only dealing with a single-page folio,
commenting that the rmap code would need serious adjustment to make it
work and that the current code never passes a multi-page folio.

Thanks!

--
Thanks,

David / dhildenb
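
For illustration, a minimal sketch of the fence discussed above, assuming
the cow check moves into do_set_pte_range() as suggested. The
WARN_ON_ONCE() placement and the early return are assumptions made for
this sketch, not the code that was eventually merged:

	/*
	 * Sketch only: fold the cow check into do_set_pte_range() and
	 * fence off multi-page anon folios, which page_add_new_anon_rmap()
	 * cannot PTE-map page-by-page (a large folio would be accounted
	 * as a THP, with PageAnonExclusive set only on the head page).
	 */
	void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
			unsigned long addr, pte_t *pte,
			unsigned long start, unsigned int nr)
	{
		struct vm_area_struct *vma = vmf->vma;
		bool write = vmf->flags & FAULT_FLAG_WRITE;
		bool cow = write && !(vma->vm_flags & VM_SHARED);
		struct page *page = folio_page(folio, start);

		if (cow) {
			/*
			 * Current rmap code only supports single-page anon
			 * folios; current callers never pass more here.
			 */
			if (WARN_ON_ONCE(nr != 1 || folio_test_large(folio)))
				return;
			inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
			page_add_new_anon_rmap(page, vma, addr);
		} else {
			folio_add_file_rmap_range(folio, start, nr, vma, false);
			add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
		}

		/* ... set up the nr PTEs exactly as in the loop above ... */
	}

One possible bail-out policy, assumed here, is to leave the PTEs
unpopulated so the access simply faults again; other policies are
conceivable.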