Philipp Rumpf wrote:
> 
> On Sun, Jul 09, 2000 at 11:54:54PM +0000, Andrew Morton wrote:
> > Philipp Rumpf wrote:
> > Hi, Philipp.
> >
> > > Here's a simple way:
> >
> > Already done it :)  It's apparent that not _all_ callers of z_p_r need
> > this treatment, so I've added an extra 'do_reschedule' flag.  I've also
> > moved the TLB flushing into this function.
> 
> It is?  I must be missing something, but it looks to me like all calls
> to z_p_r can be done out of syscalls, with pretty much any size the user
> wants.

Possibly - but I don't want to put reschedules into places unless
they're demonstrated to cause scheduling stalls.  I probably just
haven't run the right tests :(
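
For illustration only, here is a rough user-space model of the
"zap in chunks, optionally reschedule between chunks" idea.  It is not
the attached patch: model_zap_chunked(), model_resched() and
ZAP_CHUNK_PAGES are made-up names, and the chunk size of 1000 pages is
simply the figure quoted further down in this thread.

```c
#include <stddef.h>

/* Hypothetical chunk size: the thread quotes ~1000 pages per millisecond. */
#define ZAP_CHUNK_PAGES 1000

/* Counts how often the model would have offered to reschedule. */
static size_t resched_points;

/* Stand-in for a real reschedule point; here it only bumps a counter. */
static void model_resched(void)
{
    resched_points++;
}

/*
 * Walk npages in chunks of at most ZAP_CHUNK_PAGES.  When do_reschedule
 * is non-zero, offer a reschedule between chunks (but not after the
 * last one).  Returns the number of pages processed.
 */
static size_t model_zap_chunked(size_t npages, int do_reschedule)
{
    size_t done = 0;

    while (done < npages) {
        size_t chunk = npages - done;

        if (chunk > ZAP_CHUNK_PAGES)
            chunk = ZAP_CHUNK_PAGES;

        /* The real page-table work for this chunk would happen here. */
        done += chunk;

        if (do_reschedule && done < npages)
            model_resched();
    }
    return done;
}
```

Callers that have not been shown to cause stalls would pass
do_reschedule == 0 and get the old single-pass behaviour.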


> > It strikes me that the TLB flush race can be avoided by simply deferring
> > the actual free_page until _after_ the flush.  So
> > free_page_and_swap_cache simply appends them to a passed-in list rather
> > than returning them to the buddy allocator.  zap_page_range can then
> > free the pages after the flush.
> 
> In fact, both the tlb flushing and the cache invalidating/flushing (we don't
> really need to flush the cache if we're zapping the last mapping) belong in
> zap_page_range.

I did that.

>  Right now three callers don't do the tlb/cache flushes:
>  exit_mmap and move_page_tables should be fine with doing the cache/tlb
> invalidates;  read_zero_pagealigned doesn't want to have intermediate invalid
> ptes, so I would say it's buggy now.

Not hard to change.

> > > [PAGE_SIZE*4 is low, I suspect.]
> >
> > zap_page_range zaps 1000 pages per millisecond, so I'm doing 1000 at a
> > time.
> 
> I think we should be able to live with that for 2.4, unless the tlb flushing
> race is really bad.  It looks like a rather theoretical possibility limited
> to SMP systems to me.

hmm..

Anyway, I have the perfect reimplementation which fixes the race and the
damn thing crashes after 5-10 minutes of load and I _cannot_ see what
I've done wrong.  I basically implemented Manfred's initial suggestion
of deferring the page freeing until after the TLB flush.
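
To make the ordering concrete, here is a small user-space model of
"queue the pages, flush, then free".  It is a sketch under stated
assumptions, not the attached patch: struct page_model, struct
defer_list and every model_* function are invented for illustration
and are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity of the deferred-free list. */
#define MAX_DEFERRED 1024

struct page_model {
    int mapped;  /* 1 while a translation for the page may still exist */
    int freed;   /* 1 once the page has gone back to the allocator */
};

struct defer_list {
    struct page_model *pages[MAX_DEFERRED];
    size_t count;
};

static int tlb_flushed;  /* set by model_flush_tlb() */

/* Unmap a page but only queue it for freeing.  Returns 0 if the list is full. */
static int model_zap_page(struct defer_list *dl, struct page_model *pg)
{
    if (dl->count == MAX_DEFERRED)
        return 0;
    pg->mapped = 0;
    dl->pages[dl->count++] = pg;
    return 1;
}

/* Stand-in for the real TLB flush. */
static void model_flush_tlb(void)
{
    tlb_flushed = 1;
}

/* Free everything queued so far; only legal once the flush has happened. */
static size_t model_free_deferred(struct defer_list *dl)
{
    size_t i, n = dl->count;

    assert(tlb_flushed);
    for (i = 0; i < n; i++)
        dl->pages[i]->freed = 1;
    dl->count = 0;
    return n;
}

/* Zap n pages, never freeing a page before a flush.  Returns pages freed. */
static size_t model_zap_range(struct page_model *pages, size_t n)
{
    struct defer_list dl = { .count = 0 };
    size_t i, freed = 0;

    tlb_flushed = 0;
    for (i = 0; i < n; i++) {
        if (!model_zap_page(&dl, &pages[i])) {
            /* List full: flush, drain, then queue this page again. */
            model_flush_tlb();
            freed += model_free_deferred(&dl);
            tlb_flushed = 0;
            model_zap_page(&dl, &pages[i]);
        }
    }
    model_flush_tlb();
    freed += model_free_deferred(&dl);
    return freed;
}
```

The assert() in model_free_deferred() is the whole point of the model:
no page reaches the allocator while a stale translation could still
refer to it.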

Can you please cast an eye over the attached patch and pick out why it
would die?  The only sensible diagnostic I got out of it was from one
crash where this test in __free_pages_ok() fired:

        if (page->mapping)
                BUG();

It is solid if you disable ZPR_DEFER_FREE_PAGE.  This is on a
uniprocessor.  I thought there might be a race between an interrupt
routine's kmalloc(GFP_ATOMIC) and the local_tlb_flush, so I put a big
local_irq_disable() around the whole thing and it _still_ died.

Need sleep....