MIME-Version: 1.0
References: <20231115163018.1303287-15-ryan.roberts@arm.com>
 <20231128081742.39204-1-v-songbaohua@oppo.com>
 <207de995-6d48-41ea-8373-2f9caad9b9c3@arm.com>
 <34da1e06-74da-4e45-b0b5-9c93d64eb64e@arm.com>
 <1d2f8e43-447e-4af4-96ac-1eefea7d6747@arm.com>
In-Reply-To: <1d2f8e43-447e-4af4-96ac-1eefea7d6747@arm.com>
From: Barry Song <21cnbao@gmail.com>
Date: Mon, 4 Dec 2023 05:41:27 +0800
Subject: Re: [PATCH v2 14/14] arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown
To: Ryan Roberts
Cc: akpm@linux-foundation.org, andreyknvl@gmail.com, anshuman.khandual@arm.com,
 ardb@kernel.org, catalin.marinas@arm.com, david@redhat.com, dvyukov@google.com,
 glider@google.com, james.morse@arm.com, jhubbard@nvidia.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, mark.rutland@arm.com, maz@kernel.org,
 oliver.upton@linux.dev, ryabinin.a.a@gmail.com, suzuki.poulose@arm.com,
 vincenzo.frascino@arm.com, wangkefeng.wang@huawei.com, will@kernel.org,
 willy@infradead.org, yuzenghui@huawei.com, yuzhao@google.com, ziy@nvidia.com
Content-Type: text/plain; charset="UTF-8"

On Thu, Nov 30, 2023 at 8:00 PM Ryan Roberts wrote:
>
> >> Just because you found a pte that maps a page from a large folio, that doesn't
> >> mean that all pages from the folio are mapped, and it doesn't mean they are
> >> mapped contiguously. We have to deal with partial munmap(), partial mremap()
> >> etc. We could split in these cases (and in future it might be sensible to try),
> >> but that can fail (due to GUP). So we still have to handle the corner case.
> >>
> >> But I can imagine doing a batched version of ptep_get_and_clear(), like I did
> >> for ptep_set_wrprotects(). And I think this would be an improvement.
> >>
> >> The reason I haven't done that so far is because ptep_get_and_clear() returns
> >> the pte value when it was cleared, and that's hard to do if batching due to the
> >> storage requirement. But perhaps you could just return the logical OR of the
> >> dirty and young bits across all ptes in the batch. The caller should be able to
> >> reconstitute the rest if it needs it?
> >>
> >> What do you think?
> >
> > I really don't know why we care about the return value of ptep_get_and_clear()
> > as zap_pte_range() doesn't ask for any return value at all. So why not totally
> > give up this kind of complex logical OR of dirty and young, as they are useless
> > in this case?
>
> That's not the case in v6.7-rc1:
>
>
> static unsigned long zap_pte_range(struct mmu_gather *tlb,
>                                 struct vm_area_struct *vma, pmd_t *pmd,
>                                 unsigned long addr, unsigned long end,
>                                 struct zap_details *details)
> {
>         ...
>
>         do {
>                 pte_t ptent = ptep_get(pte);
>
>                 ...
>
>                 if (pte_present(ptent)) {
>                         ...
>
>                         ptent = ptep_get_and_clear_full(mm, addr, pte,
>                                                         tlb->fullmm);
>                         arch_check_zapped_pte(vma, ptent);
>                         tlb_remove_tlb_entry(tlb, pte, addr);
>                         zap_install_uffd_wp_if_needed(vma, addr, pte, details,
>                                                       ptent);
>                         if (unlikely(!page)) {
>                                 ksm_might_unmap_zero_page(mm, ptent);
>                                 continue;
>                         }
>
>                         delay_rmap = 0;
>                         if (!PageAnon(page)) {
>                                 if (pte_dirty(ptent)) {
>                                         set_page_dirty(page);
>                                         if (tlb_delay_rmap(tlb)) {
>                                                 delay_rmap = 1;
>                                                 force_flush = 1;
>                                         }
>                                 }
>                                 if (pte_young(ptent) && likely(vma_has_recency(vma)))
>                                         mark_page_accessed(page);
>                         }
>
>                         ...
>                 }
>
>                 ...
>         } while (pte++, addr += PAGE_SIZE, addr != end);
>
>         ...
> }
>
> Most importantly, file-backed mappings need the access/dirty bits to propagate
> that information back to the folio, so it will be written back to disk. x86 is
> also looking at the dirty bit in arch_check_zapped_pte(), and ksm is using it
> in ksm_might_unmap_zero_page().
>
> Probably for your use case of anon memory on arm64 on a phone, you don't need
> the return value. But my solution is also setting contpte for file-backed
> memory, and there are performance wins to be had there, especially for
> executable mappings where contpte reduces iTLB pressure. (I have other work
> which ensures these file-backed mappings are in correctly-sized large folios.)
>
> So I don't think we can just do a clear without the get part. But I think I
> have a solution in the shape of clear_ptes(), as described in the other
> thread, which gives the characteristics you suggest.

Understood.
I realized OPPO's code actually also returns the logical OR of the dirty and
access bits while it exposes CONTPTE to mm-core [1]:

static pte_t get_clear_flush(struct mm_struct *mm,
			     unsigned long addr,
			     pte_t *ptep,
			     unsigned long pgsize,
			     unsigned long ncontig,
			     bool flush)
{
	pte_t orig_pte = ptep_get(ptep);
	bool valid = pte_valid(orig_pte);
	unsigned long i, saddr = addr;

	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
		pte_t pte = ptep_get_and_clear(mm, addr, ptep);

		if (pte_dirty(pte))
			orig_pte = pte_mkdirty(orig_pte);

		if (pte_young(pte))
			orig_pte = pte_mkyoung(orig_pte);
	}

	if (valid && flush) {
		struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);

		flush_tlb_range(&vma, saddr, addr);
	}

	return orig_pte;
}

static inline pte_t __cont_pte_huge_ptep_get_and_clear_flush(struct mm_struct *mm,
							     unsigned long addr,
							     pte_t *ptep,
							     bool flush)
{
	pte_t orig_pte = ptep_get(ptep);

	CHP_BUG_ON(!pte_cont(orig_pte));
	CHP_BUG_ON(!IS_ALIGNED(addr, HPAGE_CONT_PTE_SIZE));
	CHP_BUG_ON(!IS_ALIGNED(pte_pfn(orig_pte), HPAGE_CONT_PTE_NR));

	return get_clear_flush(mm, addr, ptep, PAGE_SIZE, CONT_PTES, flush);
}

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/blob/oneplus/sm8550_u_14.0.0_oneplus11/mm/cont_pte_hugepage.c#L1421

> >
> >
> > Is it possible for us to introduce a new api like?
> >
> > bool clear_folio_ptes(folio, ptep)
> > {
> >         if (ptes are contiguously mapped) {
> >                 clear all ptes all together // this also clears all CONTPTE
> >                 return true;
> >         }
> >         return false;
> > }
> >
> > in zap_pte_range():
> >
> >         if (large_folio(folio) && clear_folio_ptes(folio, ptep)) {
> >                 addr += nr - 1;
> >                 pte += nr - 1;
> >         } else
> >                 old path.
> >
> >
> >>
> >>>
> >>> zap_pte_range is the most frequent behaviour from userspace libc heap
> >>> as I explained
> >>> before. libc can call madvise(DONTNEED) the most often. It is crucial
> >>> to performance.
> >>>
> >>> and this way can also help drop your full version by moving to full
> >>> flushing of the whole
> >>> large folios? and we don't need to depend on fullmm any more?
> >>>
> >>>>
> >>>> I don't think there is any correctness issue here. But there is a problem with
> >>>> fragility, as raised by Alistair. I have some ideas on potentially how to solve
> >>>> that. I'm going to try to work on it this afternoon and will post if I get some
> >>>> confidence that it is a real solution.
> >>>>
> >>>> Thanks,
> >>>> Ryan
> >>>>
> >>>>>
> >>>>> static inline pte_t __cont_pte_huge_ptep_get_and_clear_flush(struct mm_struct *mm,
> >>>>>                                                              unsigned long addr,
> >>>>>                                                              pte_t *ptep,
> >>>>>                                                              bool flush)
> >>>>> {
> >>>>>         pte_t orig_pte = ptep_get(ptep);
> >>>>>
> >>>>>         CHP_BUG_ON(!pte_cont(orig_pte));
> >>>>>         CHP_BUG_ON(!IS_ALIGNED(addr, HPAGE_CONT_PTE_SIZE));
> >>>>>         CHP_BUG_ON(!IS_ALIGNED(pte_pfn(orig_pte), HPAGE_CONT_PTE_NR));
> >>>>>
> >>>>>         return get_clear_flush(mm, addr, ptep, PAGE_SIZE, CONT_PTES, flush);
> >>>>> }
> >>>>>
> >>>>> [1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8550/blob/oneplus/sm8550_u_14.0.0_oneplus11/mm/memory.c#L1539
> >>>>>
> >>>>>> + */
> >>>>>> +
> >>>>>> +	return __ptep_get_and_clear(mm, addr, ptep);
> >>>>>> +}
> >>>>>> +EXPORT_SYMBOL(contpte_ptep_get_and_clear_full);
> >>>>>> +
> >>>>>
> >>>
> > Thanks
> > Barry
>