From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: mhocko@suse.com, vbabka@suse.cz, mgorman@suse.de,
minchan@kernel.org, aneesh.kumar@linux.vnet.ibm.com,
bsingharora@gmail.com, srikar@linux.vnet.ibm.com,
haren@linux.vnet.ibm.com, jglisse@redhat.com,
dan.j.williams@intel.com
Subject: Re: [RFC V2 11/12] mm: Tag VMA with VM_CDM flag during page fault
Date: Tue, 31 Jan 2017 10:40:07 +0530
Message-ID: <01ed36eb-bb1d-bb75-57f9-90159985e75e@linux.vnet.ibm.com>
In-Reply-To: <5f1ec7f6-16d3-8653-4494-50e124916a9e@intel.com>
On 01/30/2017 11:21 PM, Dave Hansen wrote:
> Here's the flag definition:
>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +#define VM_CDM 0x00800000 /* Contains coherent device memory */
>> +#endif
>
> But it doesn't match the implementation:
>
>> +#ifdef CONFIG_COHERENT_DEVICE
>> +static void mark_vma_cdm(nodemask_t *nmask,
>> + struct page *page, struct vm_area_struct *vma)
>> +{
>> + if (!page)
>> + return;
>> +
>> + if (vma->vm_flags & VM_CDM)
>> + return;
>> +
>> + if (nmask && !nodemask_has_cdm(*nmask))
>> + return;
>> +
>> + if (is_cdm_node(page_to_nid(page)))
>> + vma->vm_flags |= VM_CDM;
>> +}
>
> That flag is a one-way trip. Any VMA with that flag set on it will keep
> it for the life of the VMA, despite whether it has CDM pages in it now
> or not. Even if you changed the policy back to one that doesn't allow
> CDM and forced all the pages to be migrated out.
Right, we have this limitation right now. But as I mentioned in my
reply on the other thread, I will work towards both static and runtime
re-evaluation of the VMA flag in the next version.
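
To make the "re-evaluation" idea concrete, here is a minimal userspace
sketch, not kernel code. It recomputes the flag from what is currently
mapped, so the flag can be cleared as well as set. The struct model_vma
type, model_is_cdm_node() and the first_cdm_nid parameter are hypothetical
stand-ins invented for illustration; only the VM_CDM value comes from the
quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_CDM 0x00800000UL /* value taken from the quoted patch */

/* Hypothetical userspace stand-in for a VMA: flags plus the node id
 * of every page currently mapped in it. */
struct model_vma {
	unsigned long vm_flags;
	const int *page_nids;
	size_t nr_pages;
};

/* Hypothetical predicate: nodes at or above first_cdm_nid are CDM.
 * The real is_cdm_node() belongs to the patch series. */
static bool model_is_cdm_node(int nid, int first_cdm_nid)
{
	return nid >= first_cdm_nid;
}

/* Recompute VM_CDM from the mapped pages instead of only ever setting it. */
static void model_reevaluate_cdm(struct model_vma *vma, int first_cdm_nid)
{
	bool has_cdm = false;

	assert(vma != NULL);
	for (size_t i = 0; i < vma->nr_pages; i++) {
		if (model_is_cdm_node(vma->page_nids[i], first_cdm_nid)) {
			has_cdm = true;
			break;
		}
	}

	if (has_cdm)
		vma->vm_flags |= VM_CDM;
	else
		vma->vm_flags &= ~VM_CDM;
}
```

In this model, calling model_reevaluate_cdm() after all CDM pages have
been migrated out clears the flag again. Where such a re-scan would be
triggered in the kernel, and what it would cost, is left open here.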
>
> This also assumes that the only way to get a page mapped into a VMA is
> via alloc_pages_vma(). Do the NUMA migration APIs use this path?
Right now I have only taken care of these two paths:
* Page fault path
* mbind() path
Agreed, I will work on the NUMA migration API paths next. I am wondering
whether I also need to update the migrate_pages() kernel API, since it
will be used by the driver, or whether the driver should tag the VMA
explicitly, knowing what has just happened. I had also mentioned this in
the cover letter :) But as you have pointed out, I will move the
documentation into the patches.
"
VM_CDM tagged VMA:
There are two parts to this problem.
* How to mark a VMA with VM_CDM ?
- During page fault path
- During mbind(MPOL_BIND) call
- Any other paths ?
- Should a driver mark a VMA with VM_CDM explicitly ?
* How VM_CDM marked VMA gets treated ?
- Disabled from auto NUMA migrations
- Disabled from KSM merging
- Anything else ?
"
>
> When you *set* this flag, you don't go and turn off KSM merging, for
> instance. You keep it from being turned on from this point forward, but
> you don't turn it off.
I was under the impression that KSM merging does not start unless we
call madvise(MADV_MERGEABLE) on the VMA (where it is blocked now). I
might be missing something here if it can start beforehand.
>
> This is happening with mmap_sem held for read. Correct? Is it OK that
> you're modifying the VMA? That vm_flags manipulation is non-atomic, so
> how can that even be safe?
Hmm. Should it be done with mmap_sem held for write? I will look
into this further. But is intercepting the page faults inside
alloc_pages_vma() to tag the VMA okay from an overall design perspective?
Or should this be moved up or down the call chain in the page fault path?
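
One way to picture the locking rule in question is the small userspace
model below. It is a sketch under stated assumptions, not the kernel
implementation: struct model_locked_vma, enum model_lock_mode and
model_try_tag_cdm() are hypothetical names, and the lock is represented
only by a field recording how the caller holds it. The model refuses to
touch vm_flags unless the lock is held for write, which is the situation
the fault path (holding it for read) cannot satisfy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_CDM 0x00800000UL /* value taken from the quoted patch */

/* Hypothetical record of how the caller currently holds mmap_sem. */
enum model_lock_mode {
	MODEL_UNLOCKED,
	MODEL_READ_LOCKED,
	MODEL_WRITE_LOCKED,
};

/* Hypothetical stand-in for a VMA together with its mm's lock state. */
struct model_locked_vma {
	enum model_lock_mode mmap_sem_mode;
	unsigned long vm_flags;
};

/* Set VM_CDM only when the lock is held for write. Returns false otherwise,
 * so a read-side caller knows the tagging has to be deferred. */
static bool model_try_tag_cdm(struct model_locked_vma *vma)
{
	assert(vma != NULL);
	if (vma->mmap_sem_mode != MODEL_WRITE_LOCKED)
		return false;

	vma->vm_flags |= VM_CDM; /* plain read-modify-write, safe only under write lock */
	return true;
}
```

A caller modelled as MODEL_READ_LOCKED gets false back and leaves the
flags untouched. How the fault path would defer or upgrade in practice is
exactly the open design question.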
>
> If you're going to go down this route, I think you need to be very
> careful. We need to ensure that when this flag gets set, it's never set
> on VMAs that are "normal" and will only be set on VMAs that were
> *explicitly* set up for accessing CDM. That means that you'll need to
> make sure that there's no possible way to get a CDM page faulted into a
> VMA unless it's via an explicitly assigned policy that would have caused
> the VMA to be split from any "normal" one in the system.
>
> This all makes me really nervous.
Got it, will work towards this.