Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Will Deacon" <will.deacon@arm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
Date: Wed, 9 Oct 2019 18:20:21 +0200	[thread overview]
Message-ID: <467a4a34-27be-8f46-2c9a-c5b335d11438@shipmail.org> (raw)
In-Reply-To: <20191009152737.p42w7w456zklxz72@box>

Hi, Kirill.

Thanks for reviewing.

On 10/9/19 5:27 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 08, 2019 at 11:15:02AM +0200, Thomas Hellström (VMware) wrote:
>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>
>> The pagewalk code was unconditionally splitting transhuge pmds when a
>> pte_entry was present. However ideally we'd want to handle transhuge pmds
>> in the pmd_entry function and ptes in pte_entry function. So don't split
>> huge pmds when there is a pmd_entry function present, but let the callback
>> take care of it if necessary.
> Do we have any current user that expect split_huge_pmd() in this scenario.

No. All current users either have pmd_entry (no splitting) or pte_entry 
(unconditional splitting)

>
>> In order to make sure a virtual address range is handled by one and only
>> one callback, and since pmd entries may be unstable, we introduce a
>> pmd_entry return code that tells the walk code to continue processing this
>> pmd entry rather than to move on. Since caller-defined positive return
>> codes (up to 2) are used by current callers, use a high value that allows a
>> large range of positive caller-defined return codes for future users.
>>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Rik van Riel <riel@surriel.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: Jérôme Glisse <jglisse@redhat.com>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>> ---
>>   include/linux/pagewalk.h |  8 ++++++++
>>   mm/pagewalk.c            | 28 +++++++++++++++++++++-------
>>   2 files changed, 29 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>> index bddd9759bab9..c4a013eb445d 100644
>> --- a/include/linux/pagewalk.h
>> +++ b/include/linux/pagewalk.h
>> @@ -4,6 +4,11 @@
>>   
>>   #include <linux/mm.h>
>>   
>> +/* Highest positive pmd_entry caller-specific return value */
>> +#define PAGE_WALK_CALLER_MAX     (INT_MAX / 2)
>> +/* The handler did not handle the entry. Fall back to the next level */
>> +#define PAGE_WALK_FALLBACK       (PAGE_WALK_CALLER_MAX + 1)
>> +
> That's hacky.
>
> Maybe just use an error code for this? -EAGAIN?

I agree this is hacky. But IMO it's a reasonably safe option. My 
thinking was that in the long run we'd move the positive return codes to 
the mm_walk private and introduce a PAGE_WALK_TERMINATE code as well.

Perhaps a completely clean and safe way would be to add an "int 
walk_control" in the struct mm_walk?

I'm pretty sure using an error code will come back and bite us at some 
point, if someone just blindly forwards error messages. But if you 
insist, I'll use -EAGAIN.

Please let me know what you think.

Thanks,

Thomas

WARNING: multiple messages have this Message-ID

From: "Thomas Hellström (VMware)        " <thomas_os@shipmail.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Will Deacon" <will.deacon@arm.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Minchan Kim" <minchan@kernel.org>,
	"Michal Hocko" <mhocko@suse.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>
Subject: Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
Date: Wed, 9 Oct 2019 18:20:21 +0200	[thread overview]
Message-ID: <467a4a34-27be-8f46-2c9a-c5b335d11438@shipmail.org> (raw)
Message-ID: <20191009162021.wWrVgq4fQPT_f0apCC78hmx509rM_YymkaqwmPQD4VY@z> (raw)
In-Reply-To: <20191009152737.p42w7w456zklxz72@box>

Hi, Kirill.

Thanks for reviewing.

On 10/9/19 5:27 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 08, 2019 at 11:15:02AM +0200, Thomas Hellström (VMware) wrote:
>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>
>> The pagewalk code was unconditionally splitting transhuge pmds when a
>> pte_entry was present. However ideally we'd want to handle transhuge pmds
>> in the pmd_entry function and ptes in pte_entry function. So don't split
>> huge pmds when there is a pmd_entry function present, but let the callback
>> take care of it if necessary.
> Do we have any current user that expect split_huge_pmd() in this scenario.

No. All current users either have pmd_entry (no splitting) or pte_entry
(unconditional splitting)

>
>> In order to make sure a virtual address range is handled by one and only
>> one callback, and since pmd entries may be unstable, we introduce a
>> pmd_entry return code that tells the walk code to continue processing this
>> pmd entry rather than to move on. Since caller-defined positive return
>> codes (up to 2) are used by current callers, use a high value that allows a
>> large range of positive caller-defined return codes for future users.
>>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Rik van Riel <riel@surriel.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Huang Ying <ying.huang@intel.com>
>> Cc: Jérôme Glisse <jglisse@redhat.com>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>> ---
>>   include/linux/pagewalk.h |  8 ++++++++
>>   mm/pagewalk.c            | 28 +++++++++++++++++++++-------
>>   2 files changed, 29 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>> index bddd9759bab9..c4a013eb445d 100644
>> --- a/include/linux/pagewalk.h
>> +++ b/include/linux/pagewalk.h
>> @@ -4,6 +4,11 @@
>>   
>>   #include <linux/mm.h>
>>   
>> +/* Highest positive pmd_entry caller-specific return value */
>> +#define PAGE_WALK_CALLER_MAX     (INT_MAX / 2)
>> +/* The handler did not handle the entry. Fall back to the next level */
>> +#define PAGE_WALK_FALLBACK       (PAGE_WALK_CALLER_MAX + 1)
>> +
> That's hacky.
>
> Maybe just use an error code for this? -EAGAIN?

I agree this is hacky. But IMO it's a reasonably safe option. My 
thinking was that in the long run we'd move the positive return codes to 
the mm_walk private and introduce a PAGE_WALK_TERMINATE code as well.

Perhaps a completely clean and safe way would be to add an "int 
walk_control" in the struct mm_walk?

I'm pretty sure using an error code will come back and bite us at some 
point, if someone just blindly forwards error messages. But if you 
insist, I'll use -EAGAIN.

Please let me know what you think.

Thanks,

Thomas

next prev parent reply	other threads:[~2019-10-09 16:20 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-08  9:14 [PATCH v4 0/9] Emulated coherent graphics memory take 2 Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 1/9] mm: Remove BUG_ON mmap_sem not held from xxx_trans_huge_lock() Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 2/9] mm: pagewalk: Take the pagetable lock in walk_pte_range() Thomas Hellström (VMware)
2019-10-09 15:14   ` Kirill A. Shutemov
2019-10-09 16:07     ` Linus Torvalds
2019-10-08  9:15 ` [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present Thomas Hellström (VMware)
2019-10-09 15:27   ` Kirill A. Shutemov
2019-10-09 15:27     ` Kirill A. Shutemov
2019-10-09 16:20     ` Thomas Hellström (VMware) [this message]
2019-10-09 16:20       ` Thomas Hellström (VMware)        
2019-10-09 16:21     ` Linus Torvalds
2019-10-09 17:03       ` Thomas Hellström (VMware)
2019-10-09 17:16         ` Linus Torvalds
2019-10-09 18:52           ` Thomas Hellstrom
2019-10-09 19:20             ` Linus Torvalds
2019-10-09 20:06               ` Thomas Hellström (VMware)
2019-10-09 20:20                 ` Linus Torvalds
2019-10-09 22:30                   ` Thomas Hellström (VMware)
2019-10-09 23:50                     ` Thomas Hellström (VMware)
2019-10-09 23:51                     ` Linus Torvalds
2019-10-10  0:18                       ` Linus Torvalds
2019-10-10  1:09                       ` Thomas Hellström (VMware)
2019-10-10  2:07                         ` Linus Torvalds
2019-10-10  6:15                           ` Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 4/9] mm: Add a walk_page_mapping() function to the pagewalk code Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges Thomas Hellström (VMware)
2019-10-08 17:06   ` Linus Torvalds
2019-10-08 18:25     ` Thomas Hellstrom
2019-10-08  9:15 ` [PATCH v4 6/9] drm/vmwgfx: Implement an infrastructure for write-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 7/9] drm/vmwgfx: Use an RBtree instead of linked list for MOB resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 8/9] drm/vmwgfx: Implement an infrastructure for read-coherent resources Thomas Hellström (VMware)
2019-10-08  9:15 ` [PATCH v4 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks Thomas Hellström (VMware)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=467a4a34-27be-8f46-2c9a-c5b335d11438@shipmail.org \
    --to=thomas_os@shipmail.org \
    --cc=jglisse@redhat.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=thellstrom@vmware.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox