Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Matthew Wilcox <willy@infradead.org>,
	Hugh Dickins <hughd@google.com>,
	Yin Fengwei <fengwei.yin@intel.com>,
	Yang Shi <shy828301@gmail.com>, Ying Huang <ying.huang@intel.com>,
	Zi Yan <ziy@nvidia.com>, Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio
Date: Mon, 18 Dec 2023 15:04:43 +0100	[thread overview]
Message-ID: <944d990f-3c98-4ade-8176-4e4b25eae0b8@redhat.com> (raw)
In-Reply-To: <F5EF2C8E-2902-447C-BD97-05AF6FF8832D@gmail.com>

On 17.12.23 20:13, Nadav Amit wrote:
> 
>> On Nov 24, 2023, at 3:26 PM, David Hildenbrand <david@redhat.com> wrote:
>>
>> 5. sub-IDs
>> ==========
>>
>> To achieve (2), we generate sub-IDs that have the following property,
>> assuming that our folio has P=folio_nr_pages() pages.
>>   "2 * sub-ID" cannot be represented by the sum of any other *2* sub-IDs
>>   "3 * sub-ID" cannot be represented by the sum of any other *3* sub-IDs
>>   "4 * sub-ID" cannot be represented by the sum of any other *4* sub-IDs
>>   ...
>>   "P * sub-ID" cannot be represented by the sum of any other *P* sub-IDs
>>
>> The sub-IDs are generated in generations, whereby
>> (1) Generation #0 is the number 0
>> (2) Generation #N takes all numbers from generations #0..#N-1 and adds
>>     (P + 1)^(N - 1), effectively doubling the number of sub-IDs
>>
>> Consequently, the smallest number S in gen #N is:
>>   S[#N] = (P + 1)^(N - 1)
>>
>> The largest number L in gen #N is:
>>   L[#N] = (P + 1)^(N - 1) + (P + 1)^(N - 2) + ... (P + 1)^0 + 0.
>>   -> [geometric sum with "P + 1 != 1"]
>>         = (1 - (P + 1)^N) / (1 - (P + 1))
>>         = (1 - (P + 1)^N) / (-P)
>>         = ((P + 1)^N - 1) / P
> 
> David, as you know it took me a while to understand your impressive work.
> 

Hi Nadav,

thanks a bunch for digging into the details of the solution and trying 
to verify the correctness!

And thanks a lot for highlighting the special case below that I 
intuitively used for the "intuition" :)


> I think that part of what made it hard for me is the presentation and the
> formulation of sub-IDs in arithmetic means instead of bit manipulations.

Yes, that needs more work to make it all easier to understand.

> 
> Basically, IIUC, you want for order-K pages to add K-1 “0” bits between
> each rmap-id bits.
> 
> In this case, in x86-64 (with BMI2) there is the PDEP instruction that can
> generate these values rather easily with little overhead.

Partially, yes.

What you describe here is a special case that is easier to "grasp", 
likely a bit faster to calculate, but ends up being less-optimal in 
space consumption for some cases (especially, order-2 and order-9 folios).

You are describing the special case where "P+1" is a power of two.

Let me explain using the example (P = 3) from the "intuition" further 
down in this patch description, highlighting the mapping:

RMAP-ID |       |Subid |
-----------------------------------
  0       | 0000 | 0    | 0000 0000
  1       | 0001 | 1    | 0000 0001
  2       | 0010 | 4    | 0000 0100
  3       | 0011 | 5    | 0000 0101
  4       | 0100 | 16   | 0001 0000
  5       | 0101 | 17   | 0001 0001
  6       | 0110 | 20   | 0001 0100
  7       | 0111 | 21   | 0001 0101
  8       | 1000 | 64   | 0100 0000
  9       | 1001 | 65   | 0100 0001
  10      | 1010 | 68   | 0100 0100
  11      | 1011 | 69   | 0100 0101
  12      | 1100 | 80   | 0101 0100
  13      | 1101 | 81   | 0101 0001
  14      | 1110 | 84   | 0101 0100
  15      | 1111 | 85   | 0101 0101

So going from e.g., 11 -> 69 means adding one zero for each bit, just 
like you said.

But adding 1 "0" bit is not sufficient for handling order-2 folios (P = 
4), only for handling order-1 folios. So what the current approach does 
is the following (P = 4):

RMAP-ID |       | Subid |
-----------------------------------
  0       | 0000 | 0     | 0000 0000
  1       | 0001 | 1     | 0000 0001
  2       | 0010 | 5     | 0000 0101
  3       | 0011 | 6     | 0000 0110
  4       | 0100 | 25    | 0001 1001
  5       | 0101 | 26    | 0001 1010
  6       | 0110 | 30    | 0001 1110
  7       | 0111 | 31    | 0001 1111
  8       | 1000 | 125   | 0111 1101
  9       | 1001 | 126   | 0111 1110
  10      | 1010 | 130   | 1000 0010
  11      | 1011 | 131   | 1000 0011
  12      | 1100 | 150   | 1001 0110
  13      | 1101 | 151   | 1001 0111
  14      | 1110 | 155   | 1001 1011
  15      | 1111 | 156   | 1001 1100

Which is not just adding "0"s.

To handle it using your simplification we'd have to add one more "0" bit 
to be able to use it for order-2 folios. So I'll refine your statement to:
	for order-K pages to add K “0” bits between each rmap-id bits.

Then, it's easy to see how going from 24 RMAP-ID bits for an order-2 
page would require with that simplification 3*24 = 72bit and would no 
longer fit into a single 64bit value.

To summarize, with the optimized (currently implemented) scheme, we achieve:
  * == order-2:  1x 64bit rmap values
  * <= order-5:  2x 64bit rmap values
  * <= order-9:  3x 64bit rmap values
  * <= order-10: 4x 64bit rmap values
  * <= order-12: 5x 64bit rmap values
  * <= order-15: 6x 64bit rmap values
  * ...

I think, going with the simplification above (to be double-checked), 
we'd achieve [essentially, subtracting 1 from all orders]:
  * == order-1:  1x 64bit rmap values
  * <= order-4:  2x 64bit rmap values
  * <= order-8:  3x 64bit rmap values
  * <= order-9:  4x 64bit rmap values
  * <= order-11: 5x 64bit rmap values
  * <= order-14: 6x 64bit rmap values
  * ...

So especially for order-9 folios we would require 4 instead of 3  64bit 
values.


Now, going with this simplification as first step makes absolute sense, 
because it's much more intuitive and eventually a bit easier to 
implement (although really not significantly IMHO).

One can then easily add the optimization on top later.

> 
> I think that besides the easier generation of sub-ids values in this
> manner, discussing the matter without the “generations" also makes it
> easier to understand the correctness (at least for me).

Yes, that presentation certainly needs improvement. Generating the magic 
numbers recursively makes it easier to proof the correctness using 
induction, that's how I started to use that whole "generation" terminology.

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2023-12-18 14:04 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24 13:26 [PATCH WIP v1 00/20] mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / partially-mappable folios David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 01/20] mm/rmap: factor out adding folio range into __folio_add_rmap_range() David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 02/20] mm: add a total mapcount for large folios David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 03/20] mm: convert folio_estimated_sharers() to folio_mapped_shared() and improve it David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 04/20] mm/rmap: pass dst_vma to page_try_dup_anon_rmap() and page_dup_file_rmap() David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 05/20] mm/rmap: abstract total mapcount operations for partially-mappable folios David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 06/20] atomic_seqcount: new (raw) seqcount variant to support concurrent writers David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 07/20] mm/rmap_id: track if one ore multiple MMs map a partially-mappable folio David Hildenbrand
2023-12-17 19:13   ` Nadav Amit
2023-12-18 14:04     ` David Hildenbrand [this message]
2023-12-18 14:34       ` Nadav Amit
2023-11-24 13:26 ` [PATCH WIP v1 08/20] mm: pass MM to folio_mapped_shared() David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 09/20] mm: improve folio_mapped_shared() for partially-mappable folios using rmap IDs David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 10/20] mm/memory: COW reuse support for PTE-mapped THP with " David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 11/20] mm/rmap_id: support for 1, 2 and 3 values by manual calculation David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 12/20] mm/rmap: introduce folio_add_anon_rmap_range() David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 13/20] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked() David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 14/20] mm/huge_memory: avoid folio_refcount() < folio_mapcount() " David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 15/20] mm/rmap_id: verify precalculated subids with CONFIG_DEBUG_VM David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 16/20] atomic_seqcount: support a single exclusive writer in the absence of other writers David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 17/20] mm/rmap_id: reduce atomic RMW operations when we are the exclusive writer David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 18/20] atomic_seqcount: use atomic add-return instead of atomic cmpxchg on 64bit David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 19/20] mm/rmap: factor out removing folio range into __folio_remove_rmap_range() David Hildenbrand
2023-11-24 13:26 ` [PATCH WIP v1 20/20] mm/rmap: perform all mapcount operations of large folios under the rmap seqcount David Hildenbrand
2023-11-24 20:55 ` [PATCH WIP v1 00/20] mm: precise "mapped shared" vs. "mapped exclusively" detection for PTE-mapped THP / partially-mappable folios Linus Torvalds
2023-11-25 17:02   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=944d990f-3c98-4ade-8176-4e4b25eae0b8@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=fengwei.yin@intel.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=nadav.amit@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox