linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Con Kolivas <kernel@kolivas.org>
To: Andrew Morton <akpm@osdl.org>
Cc: nickpiggin@yahoo.com.au, linux-mm@kvack.org, ck@vds.kolivas.org,
	pj@sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Implement Swap Prefetching v23
Date: Fri, 10 Feb 2006 14:49:58 +1100	[thread overview]
Message-ID: <200602101449.59486.kernel@kolivas.org> (raw)
In-Reply-To: <20060209192556.2629e36b.akpm@osdl.org>

On Friday 10 February 2006 14:25, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > Here's a respin with Nick's suggestions and a modification to not cost us
> > extra slab on non-numa.
>
> v23?  I'm sure we can do better than that.

:D

> > This patch implements swap prefetching when the vm is relatively idle and
> > there is free ram available.
>
> I think "free ram available" is the critical thing here.  If it doesn't
> evict anyhing else then OK, it basically uses unutilised disk bandwidth for
> free.
>
> But where does it put the pages?  If it was really "free", they'd go onto
> the tail of the inactive list.

It puts them in swapcache. This seems to work nicely as a nowhere-land place 
where they don't have much affect on anything until we need them or need more 
ram. This has worked well, but I'm open to other suggestions.

> And what about all those unpaged text pages which the app will need to
> fault back in?

Tracking these would make the perceived time to wakeup much much faster but 
also make the list a heck of a lot larger.. but more to the point I have no 
idea how to do it and get what we really want on the list.

> > Once pages have been added to the swapped list, a timer is started,
> > testing for conditions suitable to prefetch swap pages every 5 seconds.
> > Suitable conditions are defined as lack of swapping out or in any pages,
> > and no watermark tests failing. Significant amounts of dirtied ram and
> > changes in free ram representing disk writes or reads also prevent
> > prefetching.
>
> OK.   The has-the-disk-been-used-recently test still isn't there?

No, but you'd have to free up +/- SWAP_SCAN_MAX as many pages as you fill when 
reading from disk for this indirect method to not pick it up. It's easy 
enough to see that it's effective: You can sit and watch vmstat for many 
minutes after say an oom-kill in the hope you see it quiet enough before 
prefetch does anything. After allowing 'tail -f /dev/zero' to be oomkilled in 
a full gui environment it usually takes >5 mins before swap prefetch starts 
prefetching.

> > @@ -844,6 +844,7 @@ int zone_watermark_ok(struct zone *z, in
> >  		if (free_pages <= min)
> >  			return 0;
> >  	}
> > +
> >  	return 1;
> >  }
>
> ?

Legacy of v1->v22..

>
> > +	read_lock(&swapper_space.tree_lock);
>
> That's interesting.  We conventionally do read_lock_irq() on an
> address_space.tree_lock.  Because
> end_page_writeback()->rotate_reclaimable_page()->test_clear_page_writeback(
>) needs to take a write_lock from interrupt context.  end_swap_bio_write()
> calls end_page_writeback() so I don't immediately see why we don't have a
> deadlock here.

Wasn't sure of the semantics. I was lucky I guess.

> Ordinarily we'd just use find_get_page(), but
>
> > +	/* Entry may already exist */
> > +	page = radix_tree_lookup(&swapper_space.page_tree, entry.val);
> > +	read_unlock(&swapper_space.tree_lock);
> > +	if (page) {
> > +		remove_from_swapped_list(entry.val);
> > +		goto out;
> > +	}
>
> you don't take a ref on the page.
>
> Which makes one wonder what happens here if `page' got whipped out of
> swapcache between the lookup and the remove_from_swapped_list().  Probably
> nothing much.
>
> Did you consider borrowing swapper_space.tree_lock to provide the list and
> radix-tree locking throughout here?

I did. I was concerned that since most of the functions occur as we swap in or 
out that we'd end up with more contention of that lock.

> > +out:
> > +	if (mru) {
> > +		spin_lock(&swapped.lock);
> > +		swapped.list.next = mru;
> > +		spin_unlock(&swapped.lock);
> > +	}
>
> That looks strange.  What happens to swapped.list.prev?

Left dangling for future null dereferencing... Only would have been hit on 
numa.

> > +			schedule_timeout_interruptible(MAX_SCHEDULE_TIMEOUT);
>
> 	set_current_state(TASK_INTERRUPTIBLE);
> 	schedule();

Ack.

Thanks! I guess I will be working on v24 in the near future...

Cheers,
Con

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-02-10  3:49 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-10  2:55 Con Kolivas
2006-02-10  3:25 ` Andrew Morton
2006-02-10  3:49   ` Con Kolivas [this message]
2006-02-10  4:07     ` Nick Piggin
2006-02-10  4:14       ` Con Kolivas
2006-02-10  4:17         ` Nick Piggin
2006-02-10  4:25         ` Andrew Morton
2006-02-10  4:45           ` Nick Piggin
2006-02-10  4:55             ` Andrew Morton
2006-02-10  5:01               ` Nick Piggin
2006-02-10  5:26                 ` Con Kolivas
2006-02-10  5:32                   ` Nick Piggin
2006-02-10  5:37                     ` Con Kolivas
2006-02-10  5:43                       ` Nick Piggin
2006-02-10  5:48                         ` Con Kolivas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200602101449.59486.kernel@kolivas.org \
    --to=kernel@kolivas.org \
    --cc=akpm@osdl.org \
    --cc=ck@vds.kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pj@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox