linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Badari Pulavarty <pbadari@us.ibm.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: Jeff Dike <jdike@addtoit.com>, linux-mm <linux-mm@kvack.org>,
	dvhltc@us.ibm.com
Subject: Re: [RFC][PATCH] OVERCOMMIT_ALWAYS extension
Date: Mon, 24 Oct 2005 13:24:08 -0700	[thread overview]
Message-ID: <1130185448.6831.20.camel@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.61.0510242027001.6509@goblin.wat.veritas.com>

On Mon, 2005-10-24 at 21:04 +0100, Hugh Dickins wrote:
> On Thu, 20 Oct 2005, Badari Pulavarty wrote:
> > 
> > Changes from previous:
> > 
> > 1) madvise(DISCARD) - zaps the range and discards the pages. So, no
> > need to call madvise(DONTNEED) before.
> > 
> > 2) I added truncate_inode_pages2_range() to just discard only the
> > range of pages - not the whole file.
> > 
> > Hugh, when you get a chance could you review this instead ?
> 
> I haven't had time to go through it thoroughly, and will have no time
> the next couple of days, but here are some remarks.
> 
> --- linux-2.6.14-rc3/include/asm-alpha/mman.h	2005-09-30 14:17:35.000000000 -0700
> +++ linux-2.6.14-rc3.db2/include/asm-alpha/mman.h	2005-10-20 10:52:37.000000000 -0700
> @@ -42,6 +42,7 @@
>  #define MADV_WILLNEED	3		/* will need these pages */
>  #define	MADV_SPACEAVAIL	5		/* ensure resources are available */
>  #define MADV_DONTNEED	6		/* don't need these pages */
> +#define MADV_DISCARD    7               /* discard pages right now */
> 
> Throughout the patch there's lots of spaces where there should be tabs.
> But I'm glad you've put a space after the "#define" here, unlike in that
> MADV_SPACEAVAIL higher up!  Not so glad at your spaces to the right of it.


Sorry about that. I was working with Darren's old patch and didn't
bother cleaning up the white spaces. 

> 
> Are we free to define MADV_DISCARD, coming after the others, in each of
> the architectures?  In general, I think mman.h reflects definitions made
> by native Operating Systems of the architectures in question, and they
> might have added a few since.

I looked at all architectures. No matter what their header file says,
none of them actually implemented anything other than the standard ones
(documented in the manpages).

> --- linux-2.6.14-rc3/include/linux/mm.h	2005-09-30 14:17:35.000000000 -0700
> +++ linux-2.6.14-rc3.db2/include/linux/mm.h	2005-10-20 13:41:57.000000000 -0700
> @@ -865,6 +865,7 @@ extern unsigned long do_brk(unsigned lon
>  /* filemap.c */
>  extern unsigned long page_unuse(struct page *);
>  extern void truncate_inode_pages(struct address_space *, loff_t);
> +extern void truncate_inode_pages2_range(struct address_space *, loff_t, loff_t);
> 
> Personally, I have an aversion to sticking a "2" in there.  I know you're
> just following the convention established by invalidate_inode_pages2, but..
> 
> Hold on, -mm already contains reiser4-truncate_inode_pages_range.patch,
> you should be working with that.  Doesn't it do just what you need,
> even without a "2" :-?

Yes. Thats exactly what I did also. One less thing to worry for me :)

>  
> --- linux-2.6.14-rc3/mm/madvise.c	2005-09-30 14:17:35.000000000 -0700
> +++ linux-2.6.14-rc3.db2/mm/madvise.c	2005-10-20 13:37:41.000000000 -0700
> @@ -137,6 +137,40 @@ static long madvise_dontneed(struct vm_a
>  	return 0;
>  }
>  
> +static long madvise_discard(struct vm_area_struct * vma,
> +			     struct vm_area_struct ** prev,
> +			     unsigned long start, unsigned long end)
> +{
> ....
> +	error = madvise_dontneed(vma, prev, start, end);
> +	if (error)
> +		return error;
> +
> +	/* looks good, try and rip it out of page cache */
> +	printk("%s: trying to rip shm vma (%p) inode from page cache\n", __FUNCTION__, vma);
> +	offset = (loff_t)(start - vma->vm_start);
> +	endoff = (loff_t)(end - vma->vm_start);
> +	printk("call truncate_inode_pages(%p, %x %x)\n", mapping, 
> +			(unsigned int)offset, (unsigned int)endoff);
> +	down(&mapping->host->i_sem);
> +	truncate_inode_pages2_range(mapping, offset, endoff);
> +	up(&mapping->host->i_sem);
> +	return 0;
> +}
> 
> Hmm.  I don't think it's consistent to zap the pages from a single mm,
> then remove them from the page cache, while leaving the pages mapped into
> other mms.  Just what would those pages then be?  they're not file pages,
> they're not anonymous pages, such pages have given trouble in the past.
> 
> I think you'll need to follow vmtruncate much more closely - and the
> unmap_mapping_range code already allows for a range, shouldn't need
> much change - going through all the vmas before truncating the range.
> 
> Which makes it feel more like sys_fpunch than an madvise.
> 
> You of course need write access to the underlying file, is that checked?
> 
> What should it be doing to anonymous COWed pages?  Not clear whether
> it should be following truncate in discarding those too, or not.

You are right. What we have here is a kludge - pointed out by Andrea
also (in a private e-mail). He recommended that I should look at
doing "real" MADV_TRUNCATE and add filesystem hooks to make it
sure its not limited to only "shmfs". 

I am re-doing it again. I am scared to touch this part of VM code,
thats why I was trying to get away with smallest possible thing.
I guess its time to sit and do it for real.

Thanks,
Badari

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

      parent reply	other threads:[~2005-10-24 20:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-10-17 17:30 [RFC] " Badari Pulavarty
2005-10-17 18:13 ` Hugh Dickins
2005-10-17 18:25   ` Hugh Dickins
2005-10-17 23:14     ` Badari Pulavarty
2005-10-18 16:05     ` [RFC][PATCH] " Badari Pulavarty
2005-10-19 17:56       ` Hugh Dickins
2005-10-19 18:32         ` Jeff Dike
2005-10-19 21:21           ` Badari Pulavarty
2005-10-19 22:38             ` Jeff Dike
2005-10-19 18:50         ` Badari Pulavarty
2005-10-19 19:12           ` Darren Hart
2005-10-19 20:10           ` Hugh Dickins
2005-10-19 20:47           ` Jeff Dike
2005-10-20 15:11             ` Badari Pulavarty
2005-10-20 17:27               ` Jeff Dike
2005-10-20 22:37                 ` Badari Pulavarty
2005-10-24 20:04                   ` Hugh Dickins
2005-10-24 20:22                     ` Darren Hart
2005-10-24 20:24                     ` Badari Pulavarty [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1130185448.6831.20.camel@localhost.localdomain \
    --to=pbadari@us.ibm.com \
    --cc=dvhltc@us.ibm.com \
    --cc=hugh@veritas.com \
    --cc=jdike@addtoit.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox