From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
Mark Rutland <mark.rutland@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Muchun Song <muchun.song@linux.dev>
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64
Date: Wed, 27 Mar 2024 10:34:30 +0100 [thread overview]
Message-ID: <35236bbf-3d9a-40e9-84b5-e10e10295c0c@redhat.com> (raw)
In-Reply-To: <86680856-2532-495b-951a-ea7b2b93872f@arm.com>
On 26.03.24 18:51, Ryan Roberts wrote:
> On 26/03/2024 17:39, David Hildenbrand wrote:
>> On 26.03.24 18:32, Ryan Roberts wrote:
>>> On 26/03/2024 17:04, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>> Likely, we just want to read "the real deal" on both sides of the pte_same()
>>>>>>>> handling.
>>>>>>>
>>>>>>> Sorry I'm not sure I understand? You mean read the full pte including
>>>>>>> access/dirty? That's the same as dropping the patch, right? Of course if
>>>>>>> we do
>>>>>>> that, we still have to keep pte_get_lockless() around for this case. In an
>>>>>>> ideal
>>>>>>> world we would convert everything over to ptep_get_lockless_norecency() and
>>>>>>> delete ptep_get_lockless() to remove the ugliness from arm64.
>>>>>>
>>>>>> Yes, agreed. Patch #3 does not look too crazy and it wouldn't really affect
>>>>>> any
>>>>>> architecture.
>>>>>>
>>>>>> I do wonder if pte_same_norecency() should be defined per architecture and the
>>>>>> default would be pte_same(). So we could avoid the mkold etc on all other
>>>>>> architectures.
>>>>>
>>>>> Wouldn't that break it's semantics? The "norecency" of
>>>>> ptep_get_lockless_norecency() means "recency information in the returned pte
>>>>> may
>>>>> be incorrect". But the "norecency" of pte_same_norecency() means "ignore the
>>>>> access and dirty bits when you do the comparison".
>>>>
>>>> My idea was that ptep_get_lockless_norecency() would return the actual result on
>>>> these architectures. So e.g., on x86, there would be no actual change in
>>>> generated code.
>>>
>>> I think this is a bad plan... You'll end up with subtle differences between
>>> architectures.
>>>
>>>>
>>>> But yes, the documentation of these functions would have to be improved.
>>>>
>>>> Now I wonder if ptep_get_lockless_norecency() should actively clear
>>>> dirty/accessed bits to more easily find any actual issues where the bits still
>>>> matter ...
>>>
>>> I did a version that took that approach. Decided it was not as good as this way
>>> though. Now for the life of me, I can't remember my reasoning.
>>
>> Maybe because there are some code paths that check accessed/dirty without
>> "correctness" implications? For example, if the PTE is already dirty, no need to
>> set it dirty etc?
>
> I think I decided I was penalizing the architectures that don't care because all
> their ptep_get_norecency() and ptep_get_lockless_norecency() need to explicitly
> clear access/dirty. And I would have needed ptep_get_norecency() from day 1 so
> that I could feed its result into pte_same().
True. With ptep_get_norecency() you're also penalizing other
architectures. Therefore my original thought about making the behavior
arch-specific, but the arch has to make sure to get the combination of
ptep_get_lockless_norecency()+ptep_same_norecency() is right.
So if an arch decide to ignore bits in ptep_get_lockless_norecency(), it
must make sure to also ignore them in ptep_same_norecency(), and must be
able to handle access/dirty bit changes differently.
Maybe one could have one variant for "hw-managed access/dirty" vs. "sw
managed accessed or dirty". Only the former would end up ignoring stuff
here, the latter would not.
But again, just some random thoughts how this affects other
architectures and how we could avoid it. The issue I describe in patch
#3 would be gone if ptep_same_norecency() would just do a ptep_same()
check on other architectures -- and would make it easier to sell :)
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-03-27 9:34 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-15 12:17 Ryan Roberts
2024-02-15 12:17 ` [RFC PATCH v1 1/4] mm: Introduce ptep_get_lockless_norecency() Ryan Roberts
[not found] ` <7aefa967-43aa-490b-ae0d-7d1455402e89@redhat.com>
2024-03-26 16:39 ` Ryan Roberts
2024-03-27 9:28 ` David Hildenbrand
2024-03-27 9:57 ` Ryan Roberts
2024-03-27 17:02 ` David Hildenbrand
2024-02-15 12:17 ` [RFC PATCH v1 2/4] mm/gup: Use ptep_get_lockless_norecency() Ryan Roberts
2024-03-26 16:30 ` David Hildenbrand
2024-03-26 16:48 ` Ryan Roberts
2024-02-15 12:17 ` [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte Ryan Roberts
2024-03-26 17:02 ` David Hildenbrand
2024-03-26 17:27 ` Ryan Roberts
2024-03-26 17:38 ` David Hildenbrand
2024-03-26 17:48 ` Ryan Roberts
2024-03-26 17:58 ` David Hildenbrand
2024-03-27 9:51 ` Ryan Roberts
2024-03-27 17:05 ` David Hildenbrand
2024-02-15 12:17 ` [RFC PATCH v1 4/4] arm64/mm: Override ptep_get_lockless_norecency() Ryan Roberts
2024-03-26 16:35 ` David Hildenbrand
2024-03-26 16:17 ` [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64 David Hildenbrand
2024-03-26 16:31 ` Ryan Roberts
[not found] ` <de143212-49ce-4c30-8bfa-4c0ff613f107@redhat.com>
2024-03-26 16:53 ` Ryan Roberts
2024-03-26 17:04 ` David Hildenbrand
2024-03-26 17:32 ` Ryan Roberts
2024-03-26 17:39 ` David Hildenbrand
2024-03-26 17:51 ` Ryan Roberts
2024-03-27 9:34 ` David Hildenbrand [this message]
2024-03-27 10:01 ` Ryan Roberts
2024-04-03 12:59 ` Ryan Roberts
2024-04-08 8:36 ` David Hildenbrand
2024-04-09 16:35 ` Ryan Roberts
2024-04-10 20:09 ` David Hildenbrand
2024-04-11 9:45 ` Ryan Roberts
[not found] ` <70a36403-aefd-4311-b612-84e602465689@redhat.com>
2024-04-15 9:28 ` Ryan Roberts
[not found] ` <3e50030d-2289-4470-a727-a293baa21618@redhat.com>
2024-04-15 13:30 ` Ryan Roberts
[not found] ` <969dc6c3-2764-4a35-9fa6-7596832fb2a3@redhat.com>
2024-04-15 14:34 ` Ryan Roberts
[not found] ` <11b1c25b-3e20-4acf-9be5-57b508266c5b@redhat.com>
2024-04-15 15:17 ` Ryan Roberts
2024-04-15 15:22 ` David Hildenbrand
2024-04-15 15:53 ` Ryan Roberts
2024-04-15 16:02 ` David Hildenbrand
2024-04-23 10:15 ` Ryan Roberts
2024-04-23 10:18 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=35236bbf-3d9a-40e9-84b5-e10e10295c0c@redhat.com \
--to=david@redhat.com \
--cc=adrian.hunter@intel.com \
--cc=akpm@linux-foundation.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=catalin.marinas@arm.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mark.rutland@arm.com \
--cc=muchun.song@linux.dev \
--cc=ryan.roberts@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox