* [PATCH] - GRU virtual -> physical translation
@ 2008-07-09 19:14 Jack Steiner
2008-07-11 19:17 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Jack Steiner @ 2008-07-09 19:14 UTC (permalink / raw)
To: akpm, linux-kernel; +Cc: linux-mm
Open code the equivalent to follow_page(). This eliminates the
requirement for an EXPORT of follow_page(). In addition, the code
is optimized for the specific case that is needed by the GRU and only
supports architectures supported by the GRU (ia64 & x86_64).
Signed-off-by: Jack Steiner <steiner@sgi.com>
---
Andrew - do you want incremental patches or should I repost a V4 of
the original GRU driver.
After the GRU code is stabilized, I'll look into moving the open codes
PT lookup code into the kernel.
drivers/misc/sgi-gru/grufault.c | 54 +++++++++++++++++++++++++++++++++-------
1 file changed, 45 insertions(+), 9 deletions(-)
Index: linux/drivers/misc/sgi-gru/grufault.c
===================================================================
--- linux.orig/drivers/misc/sgi-gru/grufault.c 2008-07-09 13:54:16.786142769 -0500
+++ linux/drivers/misc/sgi-gru/grufault.c 2008-07-09 13:56:58.762219264 -0500
@@ -213,20 +213,56 @@ static int non_atomic_pte_lookup(struct
return 0;
}
+/*
+ *
+ * atomic_pte_lookup
+ *
+ * Convert a user virtual address to a physical address
+ * Only supports Intel large pages (2MB only) on x86_64.
+ * ZZZ - hugepage support is incomplete
+ */
static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
- int write, unsigned long *paddr, int *pageshift)
+ int write, unsigned long *paddr, int *pageshift)
{
- struct page *page;
+ pgd_t *pgdp;
+ pmd_t *pmdp;
+ pud_t *pudp;
+ pte_t pte;
+
+ WARN_ON(irqs_disabled()); /* ZZZ debug */
+
+ local_irq_disable();
+ pgdp = pgd_offset(vma->vm_mm, vaddr);
+ if (unlikely(pgd_none(*pgdp)))
+ goto err;
+
+ pudp = pud_offset(pgdp, vaddr);
+ if (unlikely(pud_none(*pudp)))
+ goto err;
+
+ pmdp = pmd_offset(pudp, vaddr);
+ if (unlikely(pmd_none(*pmdp)))
+ goto err;
+#ifdef CONFIG_X86_64
+ if (unlikely(pmd_large(*pmdp)))
+ pte = *(pte_t *) pmdp;
+ else
+#endif
+ pte = *pte_offset_kernel(pmdp, vaddr);
- /* ZZZ Need to handle HUGE pages */
- if (is_vm_hugetlb_page(vma))
- return -EFAULT;
- *pageshift = PAGE_SHIFT;
- page = follow_page(vma, vaddr, (write ? FOLL_WRITE : 0));
- if (!page)
+ local_irq_enable();
+
+ if (unlikely(!pte_present(pte) ||
+ (write && (!pte_write(pte) || !pte_dirty(pte)))))
return 1;
- *paddr = page_to_phys(page);
+
+ *paddr = pte_pfn(pte) << PAGE_SHIFT;
+ *pageshift = is_vm_hugetlb_page(vma) ? HPAGE_SHIFT : PAGE_SHIFT;
return 0;
+
+err:
+ local_irq_enable();
+ return 1;
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] - GRU virtual -> physical translation 2008-07-09 19:14 [PATCH] - GRU virtual -> physical translation Jack Steiner @ 2008-07-11 19:17 ` Andrew Morton 2008-07-14 14:52 ` Jack Steiner 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2008-07-11 19:17 UTC (permalink / raw) To: Jack Steiner; +Cc: linux-kernel, linux-mm On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > Open code the equivalent to follow_page(). This eliminates the > requirement for an EXPORT of follow_page(). I'd prefer to export follow_page() - copying-n-pasting just to avoid exporting the darn thing is silly. > In addition, the code > is optimized for the specific case that is needed by the GRU and only > supports architectures supported by the GRU (ia64 & x86_64). Unless you think that this alone justifies the patch? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-11 19:17 ` Andrew Morton @ 2008-07-14 14:52 ` Jack Steiner 2008-07-14 16:24 ` Andrew Morton 0 siblings, 1 reply; 9+ messages in thread From: Jack Steiner @ 2008-07-14 14:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, linux-mm On Fri, Jul 11, 2008 at 12:17:36PM -0700, Andrew Morton wrote: > On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > Open code the equivalent to follow_page(). This eliminates the > > requirement for an EXPORT of follow_page(). > > I'd prefer to export follow_page() - copying-n-pasting just to avoid > exporting the darn thing is silly. If follow_page() can be EXPORTed, I think that may make the most sense for now. > > > In addition, the code > > is optimized for the specific case that is needed by the GRU and only > > supports architectures supported by the GRU (ia64 & x86_64). > > Unless you think that this alone justifies the patch? No, at least not now. We don't have enough data yet to know if the additional performance is worth having an optimized lookup routine. Currently the focus is in making the driver functionally correct. Performance optimization will be done later when we have a better understanding of the user apps that will use the GRU. _IF_ we agree to the export, what is the best way to send you the patches. Incremental or an entirely new GRU V3 patch with all issues resolved? --- jack -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-14 14:52 ` Jack Steiner @ 2008-07-14 16:24 ` Andrew Morton 2008-07-14 16:31 ` Jack Steiner 0 siblings, 1 reply; 9+ messages in thread From: Andrew Morton @ 2008-07-14 16:24 UTC (permalink / raw) To: Jack Steiner; +Cc: linux-kernel, linux-mm, Christoph Hellwig On Mon, 14 Jul 2008 09:52:55 -0500 Jack Steiner <steiner@sgi.com> wrote: > On Fri, Jul 11, 2008 at 12:17:36PM -0700, Andrew Morton wrote: > > On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > Open code the equivalent to follow_page(). This eliminates the > > > requirement for an EXPORT of follow_page(). > > > > I'd prefer to export follow_page() - copying-n-pasting just to avoid > > exporting the darn thing is silly. > > If follow_page() can be EXPORTed, I think that may make the most sense for > now. What was Christoph's reason for objecting to the export? > > > > > In addition, the code > > > is optimized for the specific case that is needed by the GRU and only > > > supports architectures supported by the GRU (ia64 & x86_64). > > > > Unless you think that this alone justifies the patch? > > No, at least not now. We don't have enough data yet to know if the additional > performance is worth having an optimized lookup routine. Currently the > focus is in making the driver functionally correct. Performance optimization > will be done later when we have a better understanding of the user apps that > will use the GRU. > > _IF_ we agree to the export, what is the best way to send you the patches. > Incremental or an entirely new GRU V3 patch with all issues resolved? Incremental would be preferred, please. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-14 16:24 ` Andrew Morton @ 2008-07-14 16:31 ` Jack Steiner 2008-07-14 19:50 ` Robin Holt 0 siblings, 1 reply; 9+ messages in thread From: Jack Steiner @ 2008-07-14 16:31 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, linux-mm, Christoph Hellwig On Mon, Jul 14, 2008 at 09:24:51AM -0700, Andrew Morton wrote: > On Mon, 14 Jul 2008 09:52:55 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > On Fri, Jul 11, 2008 at 12:17:36PM -0700, Andrew Morton wrote: > > > On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > > > Open code the equivalent to follow_page(). This eliminates the > > > > requirement for an EXPORT of follow_page(). > > > > > > I'd prefer to export follow_page() - copying-n-pasting just to avoid > > > exporting the darn thing is silly. > > > > If follow_page() can be EXPORTed, I think that may make the most sense for > > now. > > What was Christoph's reason for objecting to the export? No clue. Just a NACK. Christoph??? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-14 16:31 ` Jack Steiner @ 2008-07-14 19:50 ` Robin Holt 2008-07-14 20:01 ` Jack Steiner 2008-07-14 20:37 ` Hugh Dickins 0 siblings, 2 replies; 9+ messages in thread From: Robin Holt @ 2008-07-14 19:50 UTC (permalink / raw) To: Jack Steiner; +Cc: Andrew Morton, linux-kernel, linux-mm, Christoph Hellwig On Mon, Jul 14, 2008 at 11:31:07AM -0500, Jack Steiner wrote: > On Mon, Jul 14, 2008 at 09:24:51AM -0700, Andrew Morton wrote: > > On Mon, 14 Jul 2008 09:52:55 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > On Fri, Jul 11, 2008 at 12:17:36PM -0700, Andrew Morton wrote: > > > > On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > > > > > Open code the equivalent to follow_page(). This eliminates the > > > > > requirement for an EXPORT of follow_page(). > > > > > > > > I'd prefer to export follow_page() - copying-n-pasting just to avoid > > > > exporting the darn thing is silly. > > > > > > If follow_page() can be EXPORTed, I think that may make the most sense for > > > now. > > > > What was Christoph's reason for objecting to the export? > > No clue. Just a NACK. > > Christoph??? Maybe I missed part of the discussion, but I thought follow_page() would not work because you need this to function in the interrupt context and locks would then need to be made irqsave/irqrestore. This, of course does not in any way answer the question about why follow_page() can not be exported. Thanks, Robin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-14 19:50 ` Robin Holt @ 2008-07-14 20:01 ` Jack Steiner 2008-07-14 20:37 ` Hugh Dickins 1 sibling, 0 replies; 9+ messages in thread From: Jack Steiner @ 2008-07-14 20:01 UTC (permalink / raw) To: Robin Holt; +Cc: Andrew Morton, linux-kernel, linux-mm, Christoph Hellwig On Mon, Jul 14, 2008 at 02:50:18PM -0500, Robin Holt wrote: > On Mon, Jul 14, 2008 at 11:31:07AM -0500, Jack Steiner wrote: > > On Mon, Jul 14, 2008 at 09:24:51AM -0700, Andrew Morton wrote: > > > On Mon, 14 Jul 2008 09:52:55 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > > > On Fri, Jul 11, 2008 at 12:17:36PM -0700, Andrew Morton wrote: > > > > > On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > > > > > > > Open code the equivalent to follow_page(). This eliminates the > > > > > > requirement for an EXPORT of follow_page(). > > > > > > > > > > I'd prefer to export follow_page() - copying-n-pasting just to avoid > > > > > exporting the darn thing is silly. > > > > > > > > If follow_page() can be EXPORTed, I think that may make the most sense for > > > > now. > > > > > > What was Christoph's reason for objecting to the export? > > > > No clue. Just a NACK. > > > > Christoph??? > > Maybe I missed part of the discussion, but I thought follow_page() would > not work because you need this to function in the interrupt context and > locks would then need to be made irqsave/irqrestore. Arggg. You are right. I forgot the issue with the pte locks. Looks like I am back to the plan suggested by Nick. I'll add a get_user_pte_fast() function to gup.c. The pte lookup function is very similar to the get_user_pages_fast() function. Ignore the previous noise about follow_page(). --- jack -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-14 19:50 ` Robin Holt 2008-07-14 20:01 ` Jack Steiner @ 2008-07-14 20:37 ` Hugh Dickins 2008-07-14 21:52 ` Jack Steiner 1 sibling, 1 reply; 9+ messages in thread From: Hugh Dickins @ 2008-07-14 20:37 UTC (permalink / raw) To: Robin Holt Cc: Jack Steiner, Andrew Morton, linux-kernel, linux-mm, Christoph Hellwig, Nick Piggin On Mon, 14 Jul 2008, Robin Holt wrote: > On Mon, Jul 14, 2008 at 11:31:07AM -0500, Jack Steiner wrote: > > On Mon, Jul 14, 2008 at 09:24:51AM -0700, Andrew Morton wrote: > > > On Mon, 14 Jul 2008 09:52:55 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > On Fri, Jul 11, 2008 at 12:17:36PM -0700, Andrew Morton wrote: > > > > > On Wed, 9 Jul 2008 14:14:39 -0500 Jack Steiner <steiner@sgi.com> wrote: > > > > > > > > > > > Open code the equivalent to follow_page(). This eliminates the > > > > > > requirement for an EXPORT of follow_page(). > > > > > > > > > > I'd prefer to export follow_page() - copying-n-pasting just to avoid > > > > > exporting the darn thing is silly. > > > > > > > > If follow_page() can be EXPORTed, I think that may make the most sense for > > > > now. > > > > > > What was Christoph's reason for objecting to the export? > > > > No clue. Just a NACK. > > > > Christoph??? > > Maybe I missed part of the discussion, but I thought follow_page() would > not work because you need this to function in the interrupt context and > locks would then need to be made irqsave/irqrestore. Exactly, that seems to have gone missing from the patch explanation. > This, of course does not in any way answer the question about why > follow_page() can not be exported. I can't answer that either, and like Andrew I would have preferred to export follow_page, but for this locking issue: on the face of it, you cannot safely pte_offset_map_lock from interrupt context. But I do now wonder whether it was just my kneejerk reaction on seeing Jack's comment that it was needed in interrupt context. After glancing through the GRU patches, I'm left wondering whether gru_intr is an asynchronous interrupt - in which case follow_page cannot be used, and it's not obvious to me that the version in this patch is safe - or whether it's more akin to a trap, coming in the context of the current mm, in which case using follow_page may be okay. I say it's not obvious to me that the version in this patch is safe, because I was wondering about the local_irq_disable there. That's copied from Nick's fast GUP patches, but its function here is less obvious. gru_intr already did down_read_trylock(>s->ts->mmap_sem), which seems to give more guarantees than Nick relies upon; but is gts->ts_mm necessarily the same as current->mm or not? If not, then you don't have quite the TLB flushing guarantees that Nick relies upon: you won't be sent an IPI to flush TLB if swapping or truncation suddenly frees the page, which that local_irq_disable is designed to keep at bay (if I understand fast GUP correctly) while we get a reference to the page to rescue it from freeing. In Jack's patch I see no get_page within the irq_disabled area, so it's not clear to me what the local_irq_disable achieves. But perhaps the MMU notification mechanism, or the GRU's "magic hardware", keeps it all safe. (Personally, I dislike the other patch, adding zap_vma_ptes: to me it's just minor bloat and obscurity on top of the familiar zap_page_range; but I may be in a minority of one on that.) Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] - GRU virtual -> physical translation 2008-07-14 20:37 ` Hugh Dickins @ 2008-07-14 21:52 ` Jack Steiner 0 siblings, 0 replies; 9+ messages in thread From: Jack Steiner @ 2008-07-14 21:52 UTC (permalink / raw) To: Hugh Dickins, nickpiggin Cc: Robin Holt, Andrew Morton, linux-kernel, linux-mm, Christoph Hellwig > > Maybe I missed part of the discussion, but I thought follow_page() would > > not work because you need this to function in the interrupt context and > > locks would then need to be made irqsave/irqrestore. > > Exactly, that seems to have gone missing from the patch explanation. > > > This, of course does not in any way answer the question about why > > follow_page() can not be exported. > > I can't answer that either, and like Andrew I would have preferred > to export follow_page, but for this locking issue: on the face of it, > you cannot safely pte_offset_map_lock from interrupt context. Agree. But there were flaws that you pointed out, I forgot, & Robin reminded me. > > But I do now wonder whether it was just my kneejerk reaction on seeing > Jack's comment that it was needed in interrupt context. After glancing > through the GRU patches, I'm left wondering whether gru_intr is an > asynchronous interrupt - in which case follow_page cannot be used, > and it's not obvious to me that the version in this patch is safe - > or whether it's more akin to a trap, coming in the context of the > current mm, in which case using follow_page may be okay. gru_intr() is truly an asynchronous interrupt. The function that actually does the TLB dropin can be called thru two different paths. One is from user context & should be safe for using follow_page(). However, the other path really is via an async interrupt. The GRU needs to do a lookup using something similar to the get_user_pages_fast() function from Nick. > I say it's not obvious to me that the version in this patch is safe, > because I was wondering about the local_irq_disable there. That's > copied from Nick's fast GUP patches, but its function here is less > obvious. gru_intr already did down_read_trylock(>s->ts->mmap_sem), > which seems to give more guarantees than Nick relies upon; but is > gts->ts_mm necessarily the same as current->mm or not? > > If not, then you don't have quite the TLB flushing guarantees that > Nick relies upon: you won't be sent an IPI to flush TLB if swapping > or truncation suddenly frees the page, which that local_irq_disable > is designed to keep at bay (if I understand fast GUP correctly) > while we get a reference to the page to rescue it from freeing. I think the GRU is ok but the reason is very specific to the GRU (but please check me on this). In fact, maybe for this reason alone, I should open-code the lookup. Otherwise the new lookup routine potentially could be misused by other drivers. The control structions used to manage the GRU & GRU TLB reside in normal cacheable writeback memory. The GRU memory resides in the GRU chip - not normal RAM. Aside from that, the GRU space appears to the cpu as RAM but with a few special properties & side-effects. The GRU is a home agent for all of the GRU space & tracks ownership of cache lines that belong to the GRU. At all times, the GRU is aware of whether a cache line belonging to the GRU is unowned OR if a copy of the cacheline is potentially held by a cpu. A GRU TLB dropin consists of the following steps by a cpu: - read the TLB-related cacheline (referred to as a TFH) from the GRU. This cacheline contains the vaddr that caused the miss. - convert the vaddr into a physical address - store the paddr into the TFH cacheline that contains the vaddr - issue a flush-cache to write the TFH cacheline back to the GRU. If another cpu is flushing the same entry, there is a race that must be resolved. For example cpu1 could be in the middle of a TLB dropin while another cpu is purging the entry that is about to be dropped in. This is the race that you refer to above. The timing could be something like: cpu 1 cpu 2 ----- ----- read TFH look up pte 0 -> pte flush TLB flush GRU TLB via MMUOPS callback extract paddr from pte store paddr -> TFH writeback TFH to GRU The GRU has special hardware that eliminates the need for locks to resolve this race. If a TLB flush is issued AND the GRU does not own the TFH, the subsequent TLB dropin is ignored. I think this solves the race shown above. > But perhaps the MMU notification mechanism, or the GRU's > "magic hardware", keeps it all safe. The above is a very brief description of the "magic" hardware. So.... How should I proceed? It looks like 2 approaches: 1) add a get_user_pte_fast() as discussed in mail with Nick. This function would resides in gup.c and be exported to drivers can can handle the necessary races dealing with TLB flushes/dropin 2) Open code a fast pte walker in the GRU. This has the advantage that it does not add a difficult-to-use-correctly API to the kernel. The downside is that pte lookup functions are questionable in drivers. (Not sure if it is possible to implement the API on all arches. Might be restricted to arches that implement CONFIG_HAVE_GET_USER_PAGES_FAST. Other arches would always return "fail" but this is permitted by the API. ---- > (Personally, I dislike the other patch, adding zap_vma_ptes: > to me it's just minor bloat and obscurity on top of the familiar > zap_page_range; but I may be in a minority of one on that.) I can live with either. The original zap_page_range() was NACKed with no explanation. I looked at possible objections to exporting zap_page_range() & the main one that I could find was that it far more powerful than what a normal driver would need. I can understand this objection - it makes sense to me. I floated the idea of a restricted form of a zap & got no objections. But like I said - either works for me & I don't see an obvious reason to chose one over the other. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-07-14 21:52 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-07-09 19:14 [PATCH] - GRU virtual -> physical translation Jack Steiner 2008-07-11 19:17 ` Andrew Morton 2008-07-14 14:52 ` Jack Steiner 2008-07-14 16:24 ` Andrew Morton 2008-07-14 16:31 ` Jack Steiner 2008-07-14 19:50 ` Robin Holt 2008-07-14 20:01 ` Jack Steiner 2008-07-14 20:37 ` Hugh Dickins 2008-07-14 21:52 ` Jack Steiner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox