linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Stephen C. Tweedie" <sct@redhat.com>
To: "Eric W. Biederman" <ebiederm+eric@ccr.net>
Cc: linux-mm@kvack.org, Stephen Tweedie <sct@redhat.com>
Subject: Re: Alpha quality write out daemon
Date: Tue, 19 Jan 1999 15:15:33 GMT	[thread overview]
Message-ID: <199901191515.PAA05462@dax.scot.redhat.com> (raw)
In-Reply-To: <m1g19ep3p9.fsf@flinx.ccr.net>

Hi,

On 14 Jan 1999 04:08:02 -0600, ebiederm+eric@ccr.net (Eric W. Biederman)
said:

> This patch is agains 2.2.0-pre5.
> I have been working and have implemented a daemon that does
> all swaping out except from shm areas (todo).

> What it does is add an extra kernel daemon that does nothing but
> walking through the page tables start I/O on dirty pages and mark them
> clean and write protected.  Sleep 30 seconds and do it again.

This feels like the wrong way of doing it.  We _already_ have a page
walking algorithm.  Why do we need another one?

My own feeling is that a laundry list (a list of pages needing written
out, either to swap or via filemap) is a better way of running the
background swapout thread.  That way we can cluster the IO appropriately
(using the ordering of the laundry list) while still making the IO
completely asynchronous.

> Unfortunantely this extra aggressiveness seems to be turning up lurking 
> bugs in other parts of the kernel.  I keep getting:

> Kernel Panic: Freeing swap cahce page
> or
> swap entry mismatch<7>clean_mm:0 found a page
> swap_cache: replacing non-empty entry 00076300 on page c18fe000
>    Which I have tracked down to finding dirty ptes that point at swap cache pages!

> Since I have only added one signficant function, and it only runs in a single
> thread I am 95% sure it's not my new code.

I'd suspect any new code first, and old code second: if my own experience
with the VM is anything to go by, seemingly trivial changes can have
unexpected and subtle side effects...

Looking at your code, it seems to be absolutely full of diabolical
races.  I'm sure that your problems are there!  To take the first two I
came across:

> +static int try_to_clean(struct vm_area_struct* vma, unsigned long address, pte_t * page_table)

> +	if (vma->vm_ops && vma->vm_ops->writeout) {
> +		if (vma->vm_ops->writeout(vma, address - vma->vm_start + vma->vm_offset, page)) {
> +			/* Find some appropriate process to tell,
> +			 * anyone with the same mm_struct is fine
> +			 */
...
> +			goto out;
> +		}
> +		result = 1;
> +		pte = pte_mkclean(pte);
> +		set_pte(page_table, pte);
> +		goto out;
> +	}

Ouch.  You are setting the pte stuff after calling writeout(), and the
page tables can have changed drastically during the sleep (indeed, the
process may even have died while you were asleep in the writout). 

> +	/*
> +	 * This is a dirty, swappable page.  First of all,
> +	 * get a suitable swap entry for it, and make sure
> +	 * we have the swap cache set up to associate the
> +	 * page with that swap entry.
> +	 */
> ...
> +	/* OK, do a physical asynchronous write to swap.  */
> +	rw_swap_page(WRITE, entry, (char *) page, 0);
> +
> +	result = 1; /* Could we have slept? Play it safe */
> +	/* Note:  We make the page read only here to maintain the invariant
> +	 * that swap cache pages are always read only.
> +	 * Once we have PG_dirty or a similar mechanism implemented we
> +	 * can relax this.
> +	 */
> +	pte = pte_wrprotect(pte_mkclean(pte));
> +	set_pte(page_table, pte);

Again, mucking about with ptes after the potential sleep in
rw_swap_page() is completely forbidden.  vmscan.c:try_to_swap_out() is
*really* careful to avoid this.  Have a good look at the order in which
we do things there, and also while you are at it look at the tlb and
cache flush logic we need to employ to keep things sane.  As it stands,
your daemon is a complete VM death trap!

--Stephen


--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

  parent reply	other threads:[~1999-01-19 15:16 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1999-01-14 10:08 Eric W. Biederman
     [not found] ` <369E0501.987D2B3B@xinit.se>
1999-01-14 16:49   ` Eric W. Biederman
1999-01-15  7:31 ` Eric W. Biederman
1998-01-14 18:44   ` Hans Eric Sandström
1999-01-15  9:02     ` Eric W. Biederman
1999-01-17  6:12       ` Beta " Eric W. Biederman
1999-01-18  5:05         ` Eric W. Biederman
1999-01-19 15:15 ` Stephen C. Tweedie [this message]
1999-01-20 14:52   ` Alpha " Eric W. Biederman
1999-01-20 18:46     ` Zlatko Calusic
1999-01-25  1:40       ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=199901191515.PAA05462@dax.scot.redhat.com \
    --to=sct@redhat.com \
    --cc=ebiederm+eric@ccr.net \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox