From: Ryan Roberts <ryan.roberts@arm.com>
To: Barry Song <21cnbao@gmail.com>, david@redhat.com
Cc: akpm@linux-foundation.org, andreyknvl@gmail.com,
anshuman.khandual@arm.com, ardb@kernel.org,
catalin.marinas@arm.com, dvyukov@google.com, glider@google.com,
james.morse@arm.com, jhubbard@nvidia.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mark.rutland@arm.com, maz@kernel.org, oliver.upton@linux.dev,
ryabinin.a.a@gmail.com, suzuki.poulose@arm.com,
vincenzo.frascino@arm.com, wangkefeng.wang@huawei.com,
will@kernel.org, willy@infradead.org, yuzenghui@huawei.com,
yuzhao@google.com, ziy@nvidia.com
Subject: Re: [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork()
Date: Mon, 27 Nov 2023 09:35:49 +0000 [thread overview]
Message-ID: <bfebd80b-b60d-48e2-b350-7c0ac0299cda@arm.com> (raw)
In-Reply-To: <20231127084217.13110-1-v-songbaohua@oppo.com>
On 27/11/2023 08:42, Barry Song wrote:
>>> + for (i = 0; i < nr; i++, page++) {
>>> + if (anon) {
>>> + /*
>>> + * If this page may have been pinned by the
>>> + * parent process, copy the page immediately for
>>> + * the child so that we'll always guarantee the
>>> + * pinned page won't be randomly replaced in the
>>> + * future.
>>> + */
>>> + if (unlikely(page_try_dup_anon_rmap(
>>> + page, false, src_vma))) {
>>> + if (i != 0)
>>> + break;
>>> + /* Page may be pinned, we have to copy. */
>>> + return copy_present_page(
>>> + dst_vma, src_vma, dst_pte,
>>> + src_pte, addr, rss, prealloc,
>>> + page);
>>> + }
>>> + rss[MM_ANONPAGES]++;
>>> + VM_BUG_ON(PageAnonExclusive(page));
>>> + } else {
>>> + page_dup_file_rmap(page, false);
>>> + rss[mm_counter_file(page)]++;
>>> + }
>>> }
>>> - rss[MM_ANONPAGES]++;
>>> - } else if (page) {
>>> - folio_get(folio);
>>> - page_dup_file_rmap(page, false);
>>> - rss[mm_counter_file(page)]++;
>>> +
>>> + nr = i;
>>> + folio_ref_add(folio, nr);
>>
>> You're changing the order of mapcount vs. refcount increment. Don't.
>> Make sure your refcount >= mapcount.
>>
>> You can do that easily by doing the folio_ref_add(folio, nr) first and
>> then decrementing in case of error accordingly. Errors due to pinned
>> pages are the corner case.
>>
>> I'll note that it will make a lot of sense to have batch variants of
>> page_try_dup_anon_rmap() and page_dup_file_rmap().
>>
>
> i still don't understand why it is not a entire map+1, but an increment
> in each basepage.
Because we are PTE-mapping the folio, we have to account each individual page.
If we accounted the entire folio, where would we unaccount it? Each page can be
unmapped individually (e.g. munmap() part of the folio) so need to account each
page. When PMD mapping, the whole thing is either mapped or unmapped, and its
atomic, so we can account the entire thing.
>
> as long as it is a CONTPTE large folio, there is no much difference with
> PMD-mapped large folio. it has all the chance to be DoubleMap and need
> split.
>
> When A and B share a CONTPTE large folio, we do madvise(DONTNEED) or any
> similar things on a part of the large folio in process A,
>
> this large folio will have partially mapped subpage in A (all CONTPE bits
> in all subpages need to be removed though we only unmap a part of the
> large folioas HW requires consistent CONTPTEs); and it has entire map in
> process B(all PTEs are still CONPTES in process B).
>
> isn't it more sensible for this large folios to have entire_map = 0(for
> process B), and subpages which are still mapped in process A has map_count
> =0? (start from -1).
>
>> Especially, the batch variant of page_try_dup_anon_rmap() would only
>> check once if the folio maybe pinned, and in that case, you can simply
>> drop all references again. So you either have all or no ptes to process,
>> which makes that code easier.
I'm afraid this doesn't make sense to me. Perhaps I've misunderstood. But
fundamentally you can only use entire_mapcount if its only possible to map and
unmap the whole folio atomically.
>>
>> But that can be added on top, and I'll happily do that.
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>
> Thanks
> Barry
>
next prev parent reply other threads:[~2023-11-27 9:35 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-15 16:30 [PATCH v2 00/14] Transparent Contiguous PTEs for User Mappings Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 01/14] mm: Batch-copy PTE ranges during fork() Ryan Roberts
2023-11-15 21:26 ` kernel test robot
2023-11-16 10:07 ` Ryan Roberts
2023-11-16 10:12 ` David Hildenbrand
2023-11-16 10:36 ` Ryan Roberts
2023-11-16 11:01 ` David Hildenbrand
2023-11-16 11:13 ` Ryan Roberts
2023-11-15 21:37 ` Andrew Morton
2023-11-16 9:34 ` Ryan Roberts
2023-12-04 11:01 ` Christophe Leroy
2023-11-15 22:40 ` kernel test robot
2023-11-16 10:03 ` David Hildenbrand
2023-11-16 10:26 ` Ryan Roberts
2023-11-27 8:42 ` Barry Song
2023-11-27 9:35 ` Ryan Roberts [this message]
2023-11-27 9:59 ` Barry Song
2023-11-27 10:10 ` Ryan Roberts
2023-11-27 10:28 ` Barry Song
2023-11-27 11:07 ` Ryan Roberts
2023-11-27 20:34 ` Barry Song
2023-11-28 9:14 ` Ryan Roberts
2023-11-28 9:49 ` Barry Song
2023-11-28 10:49 ` Ryan Roberts
2023-11-28 21:06 ` Barry Song
2023-11-29 12:21 ` Ryan Roberts
2023-11-30 0:51 ` Barry Song
2023-11-16 11:03 ` David Hildenbrand
2023-11-16 11:20 ` Ryan Roberts
2023-11-16 13:20 ` David Hildenbrand
2023-11-16 13:49 ` Ryan Roberts
2023-11-16 14:13 ` David Hildenbrand
2023-11-16 14:15 ` David Hildenbrand
2023-11-16 17:58 ` Ryan Roberts
2023-11-23 10:26 ` Ryan Roberts
2023-11-23 12:12 ` David Hildenbrand
2023-11-23 12:28 ` Ryan Roberts
2023-11-24 8:53 ` David Hildenbrand
2023-11-23 4:26 ` Alistair Popple
2023-11-23 14:43 ` Ryan Roberts
2023-11-23 23:50 ` Alistair Popple
2023-11-27 5:54 ` Barry Song
2023-11-27 9:24 ` Ryan Roberts
2023-11-28 0:11 ` Barry Song
2023-11-28 11:00 ` Ryan Roberts
2023-11-28 19:00 ` Barry Song
2023-11-29 12:29 ` Ryan Roberts
2023-11-29 13:09 ` Barry Song
2023-11-29 14:07 ` Ryan Roberts
2023-11-30 0:34 ` Barry Song
2023-11-15 16:30 ` [PATCH v2 02/14] arm64/mm: set_pte(): New layer to manage contig bit Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 03/14] arm64/mm: set_ptes()/set_pte_at(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 04/14] arm64/mm: pte_clear(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 05/14] arm64/mm: ptep_get_and_clear(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 06/14] arm64/mm: ptep_test_and_clear_young(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 07/14] arm64/mm: ptep_clear_flush_young(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 08/14] arm64/mm: ptep_set_wrprotect(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 09/14] arm64/mm: ptep_set_access_flags(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 10/14] arm64/mm: ptep_get(): " Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 11/14] arm64/mm: Split __flush_tlb_range() to elide trailing DSB Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 12/14] arm64/mm: Wire up PTE_CONT for user mappings Ryan Roberts
2023-11-21 11:22 ` Alistair Popple
2023-11-21 15:14 ` Ryan Roberts
2023-11-22 6:01 ` Alistair Popple
2023-11-22 8:35 ` Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 13/14] arm64/mm: Implement ptep_set_wrprotects() to optimize fork() Ryan Roberts
2023-11-15 16:30 ` [PATCH v2 14/14] arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown Ryan Roberts
2023-11-23 5:13 ` Alistair Popple
2023-11-23 16:01 ` Ryan Roberts
2023-11-24 1:35 ` Alistair Popple
2023-11-24 8:54 ` Ryan Roberts
2023-11-27 7:34 ` Alistair Popple
2023-11-27 8:53 ` Ryan Roberts
2023-11-28 6:54 ` Alistair Popple
2023-11-28 12:45 ` Ryan Roberts
2023-11-28 16:55 ` Ryan Roberts
2023-11-30 5:07 ` Alistair Popple
2023-11-30 5:57 ` Barry Song
2023-11-30 11:47 ` Ryan Roberts
2023-12-03 23:20 ` Alistair Popple
2023-12-04 9:39 ` Ryan Roberts
2023-11-28 7:32 ` Barry Song
2023-11-28 11:15 ` Ryan Roberts
2023-11-28 8:17 ` Barry Song
2023-11-28 11:49 ` Ryan Roberts
2023-11-28 20:23 ` Barry Song
2023-11-29 12:43 ` Ryan Roberts
2023-11-29 13:00 ` Barry Song
2023-11-30 5:35 ` Barry Song
2023-11-30 12:00 ` Ryan Roberts
2023-12-03 21:41 ` Barry Song
2023-11-27 3:18 ` [PATCH v2 00/14] Transparent Contiguous PTEs for User Mappings Barry Song
2023-11-27 9:15 ` Ryan Roberts
2023-11-27 10:35 ` Barry Song
2023-11-27 11:11 ` Ryan Roberts
2023-11-27 22:53 ` Barry Song
2023-11-28 11:52 ` Ryan Roberts
2023-11-28 3:13 ` Yang Shi
2023-11-28 11:58 ` Ryan Roberts
2023-11-28 5:49 ` Barry Song
2023-11-28 12:08 ` Ryan Roberts
2023-11-28 19:37 ` Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bfebd80b-b60d-48e2-b350-7c0ac0299cda@arm.com \
--to=ryan.roberts@arm.com \
--cc=21cnbao@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@gmail.com \
--cc=anshuman.khandual@arm.com \
--cc=ardb@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=david@redhat.com \
--cc=dvyukov@google.com \
--cc=glider@google.com \
--cc=james.morse@arm.com \
--cc=jhubbard@nvidia.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=ryabinin.a.a@gmail.com \
--cc=suzuki.poulose@arm.com \
--cc=vincenzo.frascino@arm.com \
--cc=wangkefeng.wang@huawei.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=yuzenghui@huawei.com \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox