linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC] page fault retry with NOPAGE_RETRY
@ 2006-09-14 22:55 Benjamin Herrenschmidt
  2006-09-15  0:19 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2006-09-14 22:55 UTC (permalink / raw)
  To: linux-mm; +Cc: Linux Kernel list, Linus Torvalds

Hi !

While tracking some issues with mapping of SPEs on Cell into userland
processes, I figured out that we have a problem that can be solved with
a small generic change (which might be useful for other situations as
well).

Basically, when a user tries to access some mmap'ed SPE registers, the
no_page() function for those tries to schedule in a physial SPE the
virtual SPE context of that user. For that, it needs to wait for an SPE
to become available, which can take some time, and with the current SPE
scheduler, can take a long time indeed.

This wait is interruptible. However, we have no way to fail "gracefully"
from no_page() if the routine we use underneath returns a failure due to
a signal (we use, logically, -EINTR). It's a generic issue with no_page
handlers. They can either wait non-interruptibly, or fail with a sigbus
or oom result.

However, it would be trivial to add the ability for no_page() to fail in
such a way that execution just comes back all the way to userland (and
thus pending signals are delivered), and the faulting instruction is
simply taken again. All we need is something like:

in mm.h:

 #define NOPAGE_SIGBUS   (NULL)
 #define NOPAGE_OOM      ((struct page *) (-1))
+#define NOPAGE_RETRY	((struct page *) (-2))

and in memory.c, in do_no_page():


        /* no page was available -- either SIGBUS or OOM */
        if (new_page == NOPAGE_SIGBUS)
                return VM_FAULT_SIGBUS;
        if (new_page == NOPAGE_OOM)
                return VM_FAULT_OOM;
+       if (new_page == NOPAGE_RETRY)
+               return VM_FAULT_MINOR;

In fact, it would be nicer to turn the whole serie into -one- if
(unlikely(something wrong)) and inside that if, a switch case of fault
codes. That would probably make the code better than what it is today
and allow for adding error cases without performance harm to the fast
path. I'll provide a patch doing that if there is no strong disagreement
with the approach.

With the above change, a no_page() handler can just return NOPAGE_RETRY
(for example if it was waiting on availability of a hardware resource
and a signal got detected) and this will simply be counted as a minor
fault. I've looked at the callers of handle_mm_fault() on a couple of
archs and it seems that the approach will work though I haven't looked
at all of them yet (though at this point, I only plan to use that for
PowerPC SPE code anyway).

Somebody pointed to me that this might also be used to shoot another
bird, though I have not really though about it and wether it's good or
bad, which is the old problem of needing struct page for things that can
be mmap'ed. Using that trick, a driver could do the set_pte() itself in
the no_page handler and return NOPAGE_RETRY. I'm not sure about
advertising that feature though as I like all  callers of things like
set_pte() to be in well known locations, as there are various issues
related to manipulating the page tables that driver writers might not
get right. Though I suppose that if we consider the approach good, we
can provide a helper that "does the right thing" as well (like calling
update_mmu_cache(), flush_tlb_whatever(), etc...).

Any comment ? If it's ok, I'll cook a patch candidate for 2.6.19 along
with the matching change to spufs to make use of it.

Ben.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2006-09-23 22:35 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-14 22:55 [RFC] page fault retry with NOPAGE_RETRY Benjamin Herrenschmidt
2006-09-15  0:19 ` Linus Torvalds
2006-09-15  7:11 ` Andrew Morton
2006-09-15  7:35   ` Andrew Morton
2006-09-15 13:30     ` Hugh Dickins
2006-09-16  1:03       ` Benjamin Herrenschmidt
2006-09-19 23:35   ` Mike Waychison
2006-09-19 23:50     ` Benjamin Herrenschmidt
2006-09-19 23:59       ` Andrew Morton
2006-09-20  0:06         ` Benjamin Herrenschmidt
2006-09-20  0:05       ` Benjamin Herrenschmidt
2006-09-20  0:21         ` Andrew Morton
2006-09-20  1:57           ` Benjamin Herrenschmidt
2006-09-20  3:05             ` Andrew Morton
2006-09-20  5:04               ` Benjamin Herrenschmidt
2006-09-20  5:26                 ` Andrew Morton
2006-09-20  6:54                   ` Benjamin Herrenschmidt
2006-09-20 17:53                     ` Andrew Morton
2006-09-21 22:05                       ` Benjamin Herrenschmidt
2006-09-21 22:41                         ` Andrew Morton
2006-09-21 23:09                           ` Benjamin Herrenschmidt
2006-09-23 14:21                       ` Hugh Dickins
2006-09-23 19:46                         ` Andrew Morton
2006-09-23 22:35                           ` Benjamin Herrenschmidt
2006-09-20  5:06               ` Benjamin Herrenschmidt
2006-09-20  1:14       ` Mike Waychison
2006-09-20  2:02         ` Benjamin Herrenschmidt
2006-09-15 21:35 ` Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox