Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page


From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Michal Hocko <mhocko@suse.com>,
	Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	kirill@shutemov.name, kirill.shutemov@linux.intel.com,
	vbabka@suse.cz, will.deacon@arm.com, dave.hansen@intel.com
Subject: Re: [RFC 0/4] mm: Introduce lazy exec permission setting on a page
Date: Fri, 15 Feb 2019 14:15:58 +0530
Message-ID: <d2646840-f2f0-3618-889a-54cfef6cb455@arm.com>
In-Reply-To: <20190214122816.GD4525@dhcp22.suse.cz>


On 02/14/2019 05:58 PM, Michal Hocko wrote:
> On Thu 14-02-19 10:19:37, Catalin Marinas wrote:
>> On Thu, Feb 14, 2019 at 09:38:44AM +0100, Michal Hocko wrote:
>>> On Thu 14-02-19 11:34:09, Anshuman Khandual wrote:
>>>> On 02/13/2019 09:08 PM, Michal Hocko wrote:
>>>>> Are there any numbers to show the optimization impact?
>>>>
>>>> This series transfers execution cost linearly with nr_pages from migration path
>>>> to subsequent exec access path for normal, THP and HugeTLB pages. The experiment
>>>> is on mainline kernel (1f947a7a011fcceb14cb912f548) along with some patches for
>>>> HugeTLB and THP migration enablement on arm64 platform.
>>>
>>> Please make sure that these numbers are in the changelog. I am also
>>> missing an explanation why this is an overall win. Why should we pay
>>> on the later access rather than the migration which is arguably a slower
>>> path? What is the usecase that benefits from the cost shift?
>>
>> Originally the investigation started because of a regression we had
>> sending IPIs on each set_pte_at(PROT_EXEC). This has been fixed
>> separately, so the original value of this patchset has been diminished.
>>
>> Trying to frame the problem, let's analyse the overall cost of migration
>> + execute. Removing other invariants like cost of the initial mapping of
>> the pages or the mapping of new pages after migration, we have:
>>
>> M - number of mapped executable pages just before migration
>> N - number of previously mapped pages that will be executed after
>>     migration (N <= M)
>> D - cost of migrating page data
>> I - cost of I-cache maintenance for a page
>> F - cost of an instruction fault (handle_mm_fault() + set_pte_at()
>>     without the actual I-cache maintenance)
>>
>> Tc - total migration cost current kernel (including executing)
>> Tp - total migration cost patched kernel (including executing)
>>
>>   Tc = M * (D + I)
>>   Tp = M * D + N * (F + I)
>>
>> To be useful, we want this patchset to lead to:
>>
>>   Tp < Tc
>>
>> Simplifying:
>>
>>   M * D + N * (F + I) < M * (D + I)
>>   ...
>>   F < I * (M - N) / N
>>
>> So the question is, in a *real-world* scenario, what proportion of the
>> mapped executable pages would still be executed after migration.
>> I'd leave this as a task for Anshuman to investigate and come up with
>> some numbers (and it's fine if it's just in the noise, we won't need
>> this patchset).
> 
> Yeah, betting on accessing only a smaller subset of the migrated memory
> is something I figured out. But I am really missing a usecase or a
> larger set of them to actually benefit from it. We have different
> triggers for a migration. E.g. numa balancing. I would expect that
> migrated pages are likely to be accessed after migration because
> the primary reason to migrate them is that they are accessed from a
> remote node. Then we have compaction which is a completely different story.

That access might not have been an exec fault; it could have been a bunch of
write faults that triggered the NUMA migration. So a NUMA-triggered migration
does not necessarily imply exec faults both before and after migration.

Compaction might move mapped pages with exec permission that have no recent
history of exec accesses before compaction and might never see an exec
access afterwards either.

> It is hard to assume any further access for migrated pages here. Then we
> have an explicit move_pages syscall and I would expect this to be
> somewhere in the middle. One would expect that the caller knows why the
> memory is migrated and it will be used but again, we cannot really
> assume anything.

What if the caller knows that the memory won't be used again, ever or in the
near future, and is therefore migrating it to a different node which has less
expensive and slower memory? The kernel should not assume either way, but it
can decide to be conservative about spending time preparing for future exec
faults.

But being conservative during migration risks additional exec faults, which
would have been avoided if the exec permission had stayed on, followed by an
I-cache invalidation. Deferring the I-cache invalidation requires removing
the exec permission completely (unless there is some magic which I am not
aware of), i.e. unmapping the page for exec and risking an exec fault the
next time around.
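
To make the trade-off concrete, here is a toy user-space model in Python. It
is only a sketch under stated assumptions, not kernel code: the Page class,
its counters and the three helper functions are hypothetical names made up
for illustration. Eager migration does the I-cache maintenance up front and
keeps exec permission; lazy migration drops exec permission and pays one
fault plus one I-cache maintenance on the first exec access, if there is one.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for one mapped executable page (hypothetical model)."""
    exec_mapped: bool = True   # does the mapping currently grant exec?
    icache_syncs: int = 0      # number of I-cache maintenance operations
    exec_faults: int = 0       # number of instruction faults taken


def migrate_eager(page: Page) -> None:
    # Current behaviour in this model: sync the I-cache at migration time
    # and leave exec permission in place.
    page.icache_syncs += 1


def migrate_lazy(page: Page) -> None:
    # Lazy behaviour in this model: skip the I-cache work and drop exec
    # permission so that the next exec access faults.
    page.exec_mapped = False


def exec_access(page: Page) -> None:
    # The first exec access after a lazy migration takes a fault, does the
    # deferred I-cache maintenance and restores exec permission.
    if not page.exec_mapped:
        page.exec_faults += 1
        page.icache_syncs += 1
        page.exec_mapped = True
```

In this model a lazily migrated page that is never executed again costs
nothing extra, while one that is executed costs one fault on top of the same
I-cache maintenance the eager path would have done.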

This problem gets particularly amplified for mixed permission (WRITE | EXEC)
user space mappings, where things like NUMA migration and compaction probably
get triggered by write faults and the additional exec permission never
really gets used.

> 
> This would suggest that this depends on the migration reason quite a
> lot. So I would really like to see a more comprehensive analysis of
> different workloads to see whether this is really worth it.

Sure. Could you please give some more details on how to go about this and
what specifically you are looking for? User-initiated migration through
system calls seems a bit tricky, as an application can be written primarily
to benefit from this series. If real-world applications can give better
insights, I wonder which ones. Or do we need to understand more about
compaction and NUMA-triggered migration, which are kernel driven?
Statistics from compaction/NUMA migration can reveal what ratio of the
exec-enabled mappings gets exec faulted again after kernel-driven
migrations (compaction/NUMA), which are more or less random and do not
depend much on application behavior.
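
Whatever ratio such statistics show can be checked against Catalin's model
quoted above. Rearranging Tp < Tc gives N * (F + I) < M * I, i.e. the series
only wins when N / M < I / (F + I). A minimal sketch in Python follows; the
function names are made up for illustration, and any values plugged in for
D, F and I are placeholders rather than measurements.

```python
def cost_current(m, d, i):
    """Tc = M * (D + I): every migrated exec page pays I-cache maintenance."""
    return m * (d + i)


def cost_patched(m, n, d, f, i):
    """Tp = M * D + N * (F + I): only re-executed pages pay fault + I-cache."""
    return m * d + n * (f + i)


def break_even_ratio(f, i):
    """Largest N / M for which Tp < Tc can hold, assuming F + I > 0."""
    return i / (f + i)


def patched_wins(m, n, d, f, i):
    """True when the patched kernel is cheaper under this model."""
    return cost_patched(m, n, d, f, i) < cost_current(m, d, i)


if __name__ == "__main__":
    # Placeholder costs in arbitrary units, NOT measured numbers.
    m, d, f, i = 100, 10.0, 5.0, 2.0
    print("break-even N/M:", break_even_ratio(f, i))
    for n in (0, 20, 40, 100):
        print(n, cost_current(m, d, i), cost_patched(m, n, d, f, i),
              patched_wins(m, n, d, f, i))
```

Note that D drops out of the comparison, so only the measured F, I and the
observed N / M from compaction/NUMA statistics matter for the decision.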

- Anshuman
