From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5c85d6bb-f4bf-1969-8ec3-c16399e5d6f2@redhat.com>
Date: Fri, 3 Feb 2023 14:25:04 +0100
To: Yin Fengwei, willy@infradead.org, linux-mm@kvack.org
Cc: dave.hansen@intel.com, tim.c.chen@intel.com, ying.huang@intel.com
References: <20230203131636.1648662-1-fengwei.yin@intel.com> <20230203131636.1648662-4-fengwei.yin@intel.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [RFC PATCH v3 3/4] mm: add do_set_pte_range()
In-Reply-To: <20230203131636.1648662-4-fengwei.yin@intel.com>

On 03.02.23 14:16, Yin Fengwei wrote:
> do_set_pte_range() allows to setup page table entries for a
> specific range. It calls folio_add_file_rmap_range() to take
> advantage of batched rmap update for large folio.
>
> Signed-off-by: Yin Fengwei
> ---
>  include/linux/mm.h |  3 +++
>  mm/filemap.c       |  1 -
>  mm/memory.c        | 59 ++++++++++++++++++++++++++++++----------------
>  3 files changed, 42 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d6f8f41514cc..93192f04b276 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1162,6 +1162,9 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>
>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>  void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
> +void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		unsigned long addr, pte_t *pte,
> +		unsigned long start, unsigned int nr);
>
>  vm_fault_t finish_fault(struct vm_fault *vmf);
>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index f444684db9f2..74046a3a0ff5 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3386,7 +3386,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>
>  		ref_count++;
>  		do_set_pte(vmf, page, addr);
> -		update_mmu_cache(vma, addr, vmf->pte);
>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>
>  	/* Restore the vmf->pte */
> diff --git a/mm/memory.c b/mm/memory.c
> index 7a04a1130ec1..3754b2ef166a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4257,36 +4257,58 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  }
>  #endif
>
> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
> +void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		unsigned long addr, pte_t *pte,
> +		unsigned long start, unsigned int nr)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
> +	bool cow = write && !(vma->vm_flags & VM_SHARED);
>  	bool prefault = vmf->address != addr;
> +	struct page *page = folio_page(folio, start);
>  	pte_t entry;
>
> -	flush_icache_page(vma, page);
> -	entry = mk_pte(page, vma->vm_page_prot);
> +	if (!cow) {
> +		folio_add_file_rmap_range(folio, start, nr, vma, false);
> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> +	}
>
> -	if (prefault && arch_wants_old_prefaulted_pte())
> -		entry = pte_mkold(entry);
> -	else
> -		entry = pte_sw_mkyoung(entry);
> +	do {
> +		flush_icache_page(vma, page);
> +		entry = mk_pte(page, vma->vm_page_prot);
>
> -	if (write)
> -		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> -	if (unlikely(uffd_wp))
> -		entry = pte_mkuffd_wp(entry);
> -	/* copy-on-write page */
> -	if (write && !(vma->vm_flags & VM_SHARED)) {
> +		if (prefault && arch_wants_old_prefaulted_pte())
> +			entry = pte_mkold(entry);
> +		else
> +			entry = pte_sw_mkyoung(entry);
> +
> +		if (write)
> +			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		if (unlikely(uffd_wp))
> +			entry = pte_mkuffd_wp(entry);
> +		set_pte_at(vma->vm_mm, addr, pte, entry);
> +
> +		/* no need to invalidate: a not-present page won't be cached */
> +		update_mmu_cache(vma, addr, pte);
> +	} while (pte++, page++, addr += PAGE_SIZE, --nr > 0);
> +}
> +
> +void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
> +{
> +	struct folio *folio = page_folio(page);
> +	struct vm_area_struct *vma = vmf->vma;
> +	bool cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> +			!(vma->vm_flags & VM_SHARED);
> +
> +	if (cow) {
>  		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>  		page_add_new_anon_rmap(page, vma, addr);

As raised, we cannot PTE-map a multi-page folio that way.

This function only supports single-page anon folios.

page_add_new_anon_rmap() -> folio_add_new_anon_rmap(). As that documents:

"If the folio is large, it is accounted as a THP" -- for example, we
would only increment the "entire mapcount" and set the PageAnonExclusive
bit only on the head page.

So this really doesn't work for multi-page folios and if the function
would be used for that, we'd be in trouble.

We'd want some fence here to detect that and bail out if we'd be
instructed to do that. At least a WARN_ON_ONCE() I guess.
update_mmu_tlb(vma, vmf->address, vmf->pte);

Right now the function looks like it might just handle that.

-- 
Thanks,

David / dhildenb
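
To make the suggested fence concrete, here is a minimal userspace sketch of the idea: warn once and bail out when the COW path is asked to handle a multi-page folio. This is not kernel code and not the patch author's implementation. The names model_folio, model_warn_once() and model_cow_path_allowed() are hypothetical stand-ins invented for illustration. In the kernel, the check would presumably be built from WARN_ON_ONCE() and folio_test_large(), but where exactly it would sit in do_set_pte() is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a folio; only the page count matters here. */
struct model_folio {
	unsigned int nr_pages;	/* 1 for a single-page folio */
};

/*
 * Rough imitation of WARN_ON_ONCE(): print a warning the first time the
 * condition is true and always return the condition to the caller.
 */
bool model_warn_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Stand-in for a "is this folio large?" test. */
bool model_folio_is_large(const struct model_folio *folio)
{
	return folio->nr_pages > 1;
}

/*
 * Model of the fence in the COW branch: returns true if the single-page
 * anon rmap path may proceed, false if the caller has to bail out.
 */
bool model_cow_path_allowed(const struct model_folio *folio, bool cow)
{
	assert(folio != NULL && folio->nr_pages >= 1);

	if (!cow)
		return true;	/* file-backed path, not covered by this fence */

	/* The anon rmap helper in question only supports single-page folios. */
	if (model_warn_once(model_folio_is_large(folio),
			    "multi-page folio in COW path"))
		return false;

	return true;
}
```

With this model, a single-page folio passes the COW check, a four-page folio is rejected with a warning printed only on the first offence, and the non-COW path is unaffected.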