linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] hugetlbfs read() support
@ 2007-07-14  1:23 Badari Pulavarty
  2007-07-19  5:19 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Badari Pulavarty @ 2007-07-14  1:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bill Irwin, nacc, lkml, linux-mm

Hi Andrew,

Here is the patch to support read() for hugetlbfs, needed to get
oprofile working on executables backed by largepages. 

If you plan to consider Christoph Lameter's pagecache cleanup patches,
I will re-write this. Otherwise, please consider this for -mm.

Thanks,
Badari

Support for reading from hugetlbfs files. libhugetlbfs lets application
text/data to be placed in large pages. When we do that, oprofile doesn't
work - since libbfd tries to read from it.

This code is very similar to what do_generic_mapping_read() does, but
I can't use it since it has PAGE_CACHE_SIZE assumptions.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: William Irwin <bill.irwin@oracle.com>
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>

 fs/hugetlbfs/inode.c |  113 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

Index: linux-2.6.22/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.000000000 -0700
+++ linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-13 19:24:36.000000000 -0700
@@ -156,6 +156,118 @@ full_search:
 }
 #endif
 
+static int
+hugetlbfs_read_actor(struct page *page, unsigned long offset,
+			char __user *buf, unsigned long count,
+			unsigned long size)
+{
+	char *kaddr;
+	unsigned long left, copied = 0;
+	int i, chunksize;
+
+	if (size > count)
+		size = count;
+
+	/* Find which 4k chunk and offset with in that chunk */
+	i = offset >> PAGE_CACHE_SHIFT;
+	offset = offset & ~PAGE_CACHE_MASK;
+
+	while (size) {
+		chunksize = PAGE_CACHE_SIZE;
+		if (offset)
+			chunksize -= offset;
+		if (chunksize > size)
+			chunksize = size;
+		kaddr = kmap(&page[i]);
+		left = __copy_to_user(buf, kaddr + offset, chunksize);
+		kunmap(&page[i]);
+		if (left) {
+			copied += (chunksize - left);
+			break;
+		}
+		offset = 0;
+		size -= chunksize;
+		buf += chunksize;
+		copied += chunksize;
+		i++;
+	}
+	return copied ? copied : -EFAULT;
+}
+
+/*
+ * Support for read() - Find the page attached to f_mapping and copy out the
+ * data. Its *very* similar to do_generic_mapping_read(), we can't use that
+ * since it has PAGE_CACHE_SIZE assumptions.
+ */
+ssize_t
+hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned long index = *ppos >> HPAGE_SHIFT;
+	unsigned long end_index;
+	loff_t isize;
+	unsigned long offset;
+	ssize_t retval = 0;
+
+	/* validate length */
+	if (len == 0)
+		goto out;
+
+	isize = i_size_read(inode);
+	if (!isize)
+		goto out;
+
+	offset = *ppos & ~HPAGE_MASK;
+	end_index = (isize - 1) >> HPAGE_SHIFT;
+	for (;;) {
+		struct page *page;
+		int nr, ret;
+
+		/* nr is the maximum number of bytes to copy from this page */
+		nr = HPAGE_SIZE;
+		if (index >= end_index) {
+			if (index > end_index)
+				goto out;
+			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
+			if (nr <= offset) {
+				goto out;
+			}
+		}
+		nr = nr - offset;
+
+		/* Find the page */
+		page = find_get_page(mapping, index);
+		if (unlikely(page == NULL)) {
+			/*
+			 * We can't find the page in the cache - bail out ?
+			 */
+			goto out;
+		}
+		/*
+		 * Ok, we have the page, copy it to user space buffer.
+		 */
+		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
+		if (ret < 0) {
+			retval = retval ? : ret;
+			goto out;
+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}
+
 /*
  * Read a page. Again trivial. If it didn't already exist
  * in the page cache, it is zero-filled.
@@ -560,6 +672,7 @@ static void init_once(void *foo, struct 
 }
 
 const struct file_operations hugetlbfs_file_operations = {
+	.read			= hugetlbfs_read,
 	.mmap			= hugetlbfs_file_mmap,
 	.fsync			= simple_sync_file,
 	.get_unmapped_area	= hugetlb_get_unmapped_area,


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-14  1:23 [PATCH] hugetlbfs read() support Badari Pulavarty
@ 2007-07-19  5:19 ` Andrew Morton
  2007-07-19 15:51   ` Badari Pulavarty
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2007-07-19  5:19 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Bill Irwin, nacc, lkml, linux-mm

On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:

> Hi Andrew,
> 
> Here is the patch to support read() for hugetlbfs, needed to get
> oprofile working on executables backed by largepages. 
> 
> If you plan to consider Christoph Lameter's pagecache cleanup patches,
> I will re-write this. Otherwise, please consider this for -mm.
> 
> Thanks,
> Badari
> 
> Support for reading from hugetlbfs files. libhugetlbfs lets application
> text/data to be placed in large pages. When we do that, oprofile doesn't
> work - since libbfd tries to read from it.
> 
> This code is very similar to what do_generic_mapping_read() does, but
> I can't use it since it has PAGE_CACHE_SIZE assumptions.
> 
> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> Acked-by: William Irwin <bill.irwin@oracle.com>
> Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
> 
>  fs/hugetlbfs/inode.c |  113 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 113 insertions(+)
> 
> Index: linux-2.6.22/fs/hugetlbfs/inode.c
> ===================================================================
> --- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.000000000 -0700
> +++ linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-13 19:24:36.000000000 -0700
> @@ -156,6 +156,118 @@ full_search:
>  }
>  #endif
>  
> +static int
> +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> +			char __user *buf, unsigned long count,
> +			unsigned long size)
> +{
> +	char *kaddr;
> +	unsigned long left, copied = 0;
> +	int i, chunksize;
> +
> +	if (size > count)
> +		size = count;
> +
> +	/* Find which 4k chunk and offset with in that chunk */
> +	i = offset >> PAGE_CACHE_SHIFT;
> +	offset = offset & ~PAGE_CACHE_MASK;
> +
> +	while (size) {
> +		chunksize = PAGE_CACHE_SIZE;
> +		if (offset)
> +			chunksize -= offset;
> +		if (chunksize > size)
> +			chunksize = size;
> +		kaddr = kmap(&page[i]);
> +		left = __copy_to_user(buf, kaddr + offset, chunksize);
> +		kunmap(&page[i]);
> +		if (left) {
> +			copied += (chunksize - left);
> +			break;
> +		}
> +		offset = 0;
> +		size -= chunksize;
> +		buf += chunksize;
> +		copied += chunksize;
> +		i++;
> +	}
> +	return copied ? copied : -EFAULT;
> +}

This returns -EFAULT when asked to read zero bytes.  The caller prevents
that, but it's a little bit ugly.  Livable with.

> +/*
> + * Support for read() - Find the page attached to f_mapping and copy out the
> + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> + * since it has PAGE_CACHE_SIZE assumptions.
> + */
> +ssize_t
> +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
> +{
> +	struct address_space *mapping = filp->f_mapping;
> +	struct inode *inode = mapping->host;
> +	unsigned long index = *ppos >> HPAGE_SHIFT;
> +	unsigned long end_index;
> +	loff_t isize;
> +	unsigned long offset;
> +	ssize_t retval = 0;
> +
> +	/* validate length */
> +	if (len == 0)
> +		goto out;
> +
> +	isize = i_size_read(inode);
> +	if (!isize)
> +		goto out;
> +
> +	offset = *ppos & ~HPAGE_MASK;
> +	end_index = (isize - 1) >> HPAGE_SHIFT;
> +	for (;;) {
> +		struct page *page;
> +		int nr, ret;
> +
> +		/* nr is the maximum number of bytes to copy from this page */
> +		nr = HPAGE_SIZE;
> +		if (index >= end_index) {
> +			if (index > end_index)
> +				goto out;
> +			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> +			if (nr <= offset) {
> +				goto out;
> +			}
> +		}
> +		nr = nr - offset;
> +
> +		/* Find the page */
> +		page = find_get_page(mapping, index);
> +		if (unlikely(page == NULL)) {
> +			/*
> +			 * We can't find the page in the cache - bail out ?
> +			 */
> +			goto out;
> +		}
> +		/*
> +		 * Ok, we have the page, copy it to user space buffer.
> +		 */
> +		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
> +		if (ret < 0) {
> +			retval = retval ? : ret;
> +			goto out;

Missing put_page().

> +		}
> +
> +		offset += ret;
> +		retval += ret;
> +		len -= ret;
> +		index += offset >> HPAGE_SHIFT;
> +		offset &= ~HPAGE_MASK;
> +
> +		page_cache_release(page);
> +		if (ret == nr && len)
> +			continue;
> +		goto out;
> +	}
> +out:
> +	return retval;
> +}

This code doesn't have all the ghastly tricks which we deploy to handle
concurrent truncate.

>  /*
>   * Read a page. Again trivial. If it didn't already exist
>   * in the page cache, it is zero-filled.
> @@ -560,6 +672,7 @@ static void init_once(void *foo, struct 
>  }
>  
>  const struct file_operations hugetlbfs_file_operations = {
> +	.read			= hugetlbfs_read,
>  	.mmap			= hugetlbfs_file_mmap,
>  	.fsync			= simple_sync_file,
>  	.get_unmapped_area	= hugetlb_get_unmapped_area,
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19  5:19 ` Andrew Morton
@ 2007-07-19 15:51   ` Badari Pulavarty
  2007-07-19 16:58     ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Badari Pulavarty @ 2007-07-19 15:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bill Irwin, nacc, lkml, linux-mm

On Wed, 2007-07-18 at 22:19 -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2007 18:23:33 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
> > Hi Andrew,
> > 
> > Here is the patch to support read() for hugetlbfs, needed to get
> > oprofile working on executables backed by largepages. 
> > 
> > If you plan to consider Christoph Lameter's pagecache cleanup patches,
> > I will re-write this. Otherwise, please consider this for -mm.
> > 
> > Thanks,
> > Badari
> > 
> > Support for reading from hugetlbfs files. libhugetlbfs lets application
> > text/data to be placed in large pages. When we do that, oprofile doesn't
> > work - since libbfd tries to read from it.
> > 
> > This code is very similar to what do_generic_mapping_read() does, but
> > I can't use it since it has PAGE_CACHE_SIZE assumptions.
> > 
> > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
> > Acked-by: William Irwin <bill.irwin@oracle.com>
> > Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
> > 
> >  fs/hugetlbfs/inode.c |  113 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 113 insertions(+)
> > 
> > Index: linux-2.6.22/fs/hugetlbfs/inode.c
> > ===================================================================
> > --- linux-2.6.22.orig/fs/hugetlbfs/inode.c	2007-07-08 16:32:17.000000000 -0700
> > +++ linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-13 19:24:36.000000000 -0700
> > @@ -156,6 +156,118 @@ full_search:
> >  }
> >  #endif
> >  
> > +static int
> > +hugetlbfs_read_actor(struct page *page, unsigned long offset,
> > +			char __user *buf, unsigned long count,
> > +			unsigned long size)
> > +{
> > +	char *kaddr;
> > +	unsigned long left, copied = 0;
> > +	int i, chunksize;
> > +
> > +	if (size > count)
> > +		size = count;
> > +
> > +	/* Find which 4k chunk and offset with in that chunk */
> > +	i = offset >> PAGE_CACHE_SHIFT;
> > +	offset = offset & ~PAGE_CACHE_MASK;
> > +
> > +	while (size) {
> > +		chunksize = PAGE_CACHE_SIZE;
> > +		if (offset)
> > +			chunksize -= offset;
> > +		if (chunksize > size)
> > +			chunksize = size;
> > +		kaddr = kmap(&page[i]);
> > +		left = __copy_to_user(buf, kaddr + offset, chunksize);
> > +		kunmap(&page[i]);
> > +		if (left) {
> > +			copied += (chunksize - left);
> > +			break;
> > +		}
> > +		offset = 0;
> > +		size -= chunksize;
> > +		buf += chunksize;
> > +		copied += chunksize;
> > +		i++;
> > +	}
> > +	return copied ? copied : -EFAULT;
> > +}
> 
> This returns -EFAULT when asked to read zero bytes.  The caller prevents
> that, but it's a little bit ugly.  Livable with.

I can fix that, but I didn't want to come here if length == 0 - so
took a shortcut.

> 
> > +/*
> > + * Support for read() - Find the page attached to f_mapping and copy out the
> > + * data. Its *very* similar to do_generic_mapping_read(), we can't use that
> > + * since it has PAGE_CACHE_SIZE assumptions.
> > + */
> > +ssize_t
> > +hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
> > +{
> > +	struct address_space *mapping = filp->f_mapping;
> > +	struct inode *inode = mapping->host;
> > +	unsigned long index = *ppos >> HPAGE_SHIFT;
> > +	unsigned long end_index;
> > +	loff_t isize;
> > +	unsigned long offset;
> > +	ssize_t retval = 0;
> > +
> > +	/* validate length */
> > +	if (len == 0)
> > +		goto out;
> > +
> > +	isize = i_size_read(inode);
> > +	if (!isize)
> > +		goto out;
> > +
> > +	offset = *ppos & ~HPAGE_MASK;
> > +	end_index = (isize - 1) >> HPAGE_SHIFT;
> > +	for (;;) {
> > +		struct page *page;
> > +		int nr, ret;
> > +
> > +		/* nr is the maximum number of bytes to copy from this page */
> > +		nr = HPAGE_SIZE;
> > +		if (index >= end_index) {
> > +			if (index > end_index)
> > +				goto out;
> > +			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
> > +			if (nr <= offset) {
> > +				goto out;
> > +			}
> > +		}
> > +		nr = nr - offset;
> > +
> > +		/* Find the page */
> > +		page = find_get_page(mapping, index);
> > +		if (unlikely(page == NULL)) {
> > +			/*
> > +			 * We can't find the page in the cache - bail out ?
> > +			 */
> > +			goto out;
> > +		}
> > +		/*
> > +		 * Ok, we have the page, copy it to user space buffer.
> > +		 */
> > +		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
> > +		if (ret < 0) {
> > +			retval = retval ? : ret;
> > +			goto out;
> 
> Missing put_page().

Yes. Thanks for catching it.

> 
> > +		}
> > +
> > +		offset += ret;
> > +		retval += ret;
> > +		len -= ret;
> > +		index += offset >> HPAGE_SHIFT;
> > +		offset &= ~HPAGE_MASK;
> > +
> > +		page_cache_release(page);
> > +		if (ret == nr && len)
> > +			continue;
> > +		goto out;
> > +	}
> > +out:
> > +	return retval;
> > +}
> 
> This code doesn't have all the ghastly tricks which we deploy to handle
> concurrent truncate.

Do I need to ? Baaahh!!  I don't want to deal with them. 
All I want is a simple read() to get my oprofile working.
Please advise.

Thanks,
Badari

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 15:51   ` Badari Pulavarty
@ 2007-07-19 16:58     ` Andrew Morton
  2007-07-19 17:07       ` Nishanth Aravamudan
                         ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Andrew Morton @ 2007-07-19 16:58 UTC (permalink / raw)
  To: Badari Pulavarty; +Cc: Bill Irwin, nacc, lkml, linux-mm, Nick Piggin

On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:

> > > +		}
> > > +
> > > +		offset += ret;
> > > +		retval += ret;
> > > +		len -= ret;
> > > +		index += offset >> HPAGE_SHIFT;
> > > +		offset &= ~HPAGE_MASK;
> > > +
> > > +		page_cache_release(page);
> > > +		if (ret == nr && len)
> > > +			continue;
> > > +		goto out;
> > > +	}
> > > +out:
> > > +	return retval;
> > > +}
> > 
> > This code doesn't have all the ghastly tricks which we deploy to handle
> > concurrent truncate.
> 
> Do I need to ? Baaahh!!  I don't want to deal with them. 

Nick, can you think of any serious consequences of a read/truncate race in
there?  I can't..

> All I want is a simple read() to get my oprofile working.
> Please advise.

Did you consider changing oprofile userspace to read the executable with
mmap?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 16:58     ` Andrew Morton
@ 2007-07-19 17:07       ` Nishanth Aravamudan
  2007-07-19 17:52         ` Bill Irwin
  2007-07-20  4:47         ` Nick Piggin
  2007-07-20  4:29       ` Nick Piggin
                         ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2007-07-19 17:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Badari Pulavarty, Bill Irwin, lkml, linux-mm, Nick Piggin

On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
> > > > +		}
> > > > +
> > > > +		offset += ret;
> > > > +		retval += ret;
> > > > +		len -= ret;
> > > > +		index += offset >> HPAGE_SHIFT;
> > > > +		offset &= ~HPAGE_MASK;
> > > > +
> > > > +		page_cache_release(page);
> > > > +		if (ret == nr && len)
> > > > +			continue;
> > > > +		goto out;
> > > > +	}
> > > > +out:
> > > > +	return retval;
> > > > +}
> > > 
> > > This code doesn't have all the ghastly tricks which we deploy to
> > > handle concurrent truncate.
> > 
> > Do I need to ? Baaahh!!  I don't want to deal with them. 
> 
> Nick, can you think of any serious consequences of a read/truncate
> race in there?  I can't..
> 
> > All I want is a simple read() to get my oprofile working.  Please
> > advise.
> 
> Did you consider changing oprofile userspace to read the executable
> with mmap?

It's not actually oprofile's code, though, it's libbfd (used by
oprofile). And it works fine (presumably) for other binaries. Just not
for libhugetlbfs-relinked binaries because hugetlbfs doesn't behave like
a normal ramfs (perhaps it shouldn't, but that's a different argument).

But I do think a second reason to do this is to make hugetlbfs behave
like a normal fs -- that is read(), write(), etc. work on files in the
mountpoint. But that is simply my opinion.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 17:07       ` Nishanth Aravamudan
@ 2007-07-19 17:52         ` Bill Irwin
  2007-07-31  5:57           ` dean gaudet
  2007-07-20  4:47         ` Nick Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: Bill Irwin @ 2007-07-19 17:52 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Andrew Morton, Badari Pulavarty, Bill Irwin, lkml, linux-mm, Nick Piggin

On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> But I do think a second reason to do this is to make hugetlbfs behave
> like a normal fs -- that is read(), write(), etc. work on files in the
> mountpoint. But that is simply my opinion.

Mine as well.


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 16:58     ` Andrew Morton
  2007-07-19 17:07       ` Nishanth Aravamudan
@ 2007-07-20  4:29       ` Nick Piggin
  2007-07-20 21:15         ` Badari Pulavarty
  2007-07-20  4:39       ` Nick Piggin
  2007-07-20  6:13       ` Nick Piggin
  3 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2007-07-20  4:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Badari Pulavarty, Bill Irwin, nacc, lkml, linux-mm

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:
> 
> 
>>>>+		}
>>>>+
>>>>+		offset += ret;
>>>>+		retval += ret;
>>>>+		len -= ret;
>>>>+		index += offset >> HPAGE_SHIFT;
>>>>+		offset &= ~HPAGE_MASK;
>>>>+
>>>>+		page_cache_release(page);
>>>>+		if (ret == nr && len)
>>>>+			continue;
>>>>+		goto out;
>>>>+	}
>>>>+out:
>>>>+	return retval;
>>>>+}
>>>
>>>This code doesn't have all the ghastly tricks which we deploy to handle
>>>concurrent truncate.
>>
>>Do I need to ? Baaahh!!  I don't want to deal with them. 
> 
> 
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..

As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But why not just hold i_mutex around the whole thing just to be safe?

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 16:58     ` Andrew Morton
  2007-07-19 17:07       ` Nishanth Aravamudan
  2007-07-20  4:29       ` Nick Piggin
@ 2007-07-20  4:39       ` Nick Piggin
  2007-07-20  6:13       ` Nick Piggin
  3 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2007-07-20  4:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Badari Pulavarty, Bill Irwin, nacc, lkml, linux-mm

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:

>>>This code doesn't have all the ghastly tricks which we deploy to handle
>>>concurrent truncate.
>>
>>Do I need to ? Baaahh!!  I don't want to deal with them. 
> 
> 
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..

As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But I'm not completely sure.. we've had a lot of (and still have
some known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 17:07       ` Nishanth Aravamudan
  2007-07-19 17:52         ` Bill Irwin
@ 2007-07-20  4:47         ` Nick Piggin
  2007-07-23 14:02           ` Nishanth Aravamudan
  1 sibling, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2007-07-20  4:47 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Andrew Morton, Badari Pulavarty, Bill Irwin, lkml, linux-mm

Nishanth Aravamudan wrote:
> On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> 
>>On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:
>>
>>
>>>>>+		}
>>>>>+
>>>>>+		offset += ret;
>>>>>+		retval += ret;
>>>>>+		len -= ret;
>>>>>+		index += offset >> HPAGE_SHIFT;
>>>>>+		offset &= ~HPAGE_MASK;
>>>>>+
>>>>>+		page_cache_release(page);
>>>>>+		if (ret == nr && len)
>>>>>+			continue;
>>>>>+		goto out;
>>>>>+	}
>>>>>+out:
>>>>>+	return retval;
>>>>>+}
>>>>
>>>>This code doesn't have all the ghastly tricks which we deploy to
>>>>handle concurrent truncate.
>>>
>>>Do I need to ? Baaahh!!  I don't want to deal with them. 
>>
>>Nick, can you think of any serious consequences of a read/truncate
>>race in there?  I can't..
>>
>>
>>>All I want is a simple read() to get my oprofile working.  Please
>>>advise.
>>
>>Did you consider changing oprofile userspace to read the executable
>>with mmap?
> 
> 
> It's not actually oprofile's code, though, it's libbfd (used by
> oprofile). And it works fine (presumably) for other binaries.

So... what's the problem with changing it? The fact that it is a
library doesn't really make a difference except that you'll also
help everyone else who links with it.

It won't break backwards compatibility, and it will work on older
kernels...

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 16:58     ` Andrew Morton
                         ` (2 preceding siblings ...)
  2007-07-20  4:39       ` Nick Piggin
@ 2007-07-20  6:13       ` Nick Piggin
  3 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2007-07-20  6:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Badari Pulavarty, Bill Irwin, nacc, lkml, linux-mm

(sorry if this is a resend... something bad seems to have happened to me)

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:

>>>This code doesn't have all the ghastly tricks which we deploy to handle
>>>concurrent truncate.
>>
>>Do I need to ? Baaahh!!  I don't want to deal with them. 
> 
> 
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..

As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But I'm not completely sure.. we've had a lot of (and still have
some known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?

-- 
SUSE Labs, Novell Inc.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-20  4:29       ` Nick Piggin
@ 2007-07-20 21:15         ` Badari Pulavarty
  0 siblings, 0 replies; 13+ messages in thread
From: Badari Pulavarty @ 2007-07-20 21:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Bill Irwin, nacc, lkml, linux-mm

On Fri, 2007-07-20 at 14:29 +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> > On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote:
> > 
> > 
> >>>>+		}
> >>>>+
> >>>>+		offset += ret;
> >>>>+		retval += ret;
> >>>>+		len -= ret;
> >>>>+		index += offset >> HPAGE_SHIFT;
> >>>>+		offset &= ~HPAGE_MASK;
> >>>>+
> >>>>+		page_cache_release(page);
> >>>>+		if (ret == nr && len)
> >>>>+			continue;
> >>>>+		goto out;
> >>>>+	}
> >>>>+out:
> >>>>+	return retval;
> >>>>+}
> >>>
> >>>This code doesn't have all the ghastly tricks which we deploy to handle
> >>>concurrent truncate.
> >>
> >>Do I need to ? Baaahh!!  I don't want to deal with them. 
> > 
> > 
> > Nick, can you think of any serious consequences of a read/truncate race in
> > there?  I can't..
> 
> As it doesn't allow writes, then I _think_ it should be OK. If you
> ever did want to add write(2) support, then you would have transient
> zeroes problems.

I have no plans to add write() support - unless there is real reason
for doing so.

> 
> But why not just hold i_mutex around the whole thing just to be safe?

Yeah. I can do that, just to be safe for future..

Thanks,
Badari

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-20  4:47         ` Nick Piggin
@ 2007-07-23 14:02           ` Nishanth Aravamudan
  0 siblings, 0 replies; 13+ messages in thread
From: Nishanth Aravamudan @ 2007-07-23 14:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Badari Pulavarty, Bill Irwin, lkml, linux-mm

On 20.07.2007 [14:47:31 +1000], Nick Piggin wrote:
> Nishanth Aravamudan wrote:
> >On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
> >
> >>On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty <pbadari@us.ibm.com> 
> >>wrote:
> >>
> >>
> >>>>>+		}
> >>>>>+
> >>>>>+		offset += ret;
> >>>>>+		retval += ret;
> >>>>>+		len -= ret;
> >>>>>+		index += offset >> HPAGE_SHIFT;
> >>>>>+		offset &= ~HPAGE_MASK;
> >>>>>+
> >>>>>+		page_cache_release(page);
> >>>>>+		if (ret == nr && len)
> >>>>>+			continue;
> >>>>>+		goto out;
> >>>>>+	}
> >>>>>+out:
> >>>>>+	return retval;
> >>>>>+}
> >>>>
> >>>>This code doesn't have all the ghastly tricks which we deploy to
> >>>>handle concurrent truncate.
> >>>
> >>>Do I need to ? Baaahh!!  I don't want to deal with them. 
> >>
> >>Nick, can you think of any serious consequences of a read/truncate
> >>race in there?  I can't..
> >>
> >>
> >>>All I want is a simple read() to get my oprofile working.  Please
> >>>advise.
> >>
> >>Did you consider changing oprofile userspace to read the executable
> >>with mmap?
> >
> >
> >It's not actually oprofile's code, though, it's libbfd (used by
> >oprofile). And it works fine (presumably) for other binaries.
> 
> So... what's the problem with changing it? The fact that it is a
> library doesn't really make a difference except that you'll also help
> everyone else who links with it.

Well, I'm more concerned about testing that change libbfd is rather core
code and used in a number of places. Also, libbfd's current code 'just
works' for every other filesystem concerned, or I'd expect it would have
been changed to mmap() before. I'm also terrified of binutils code :)
I'm also not sure who I'm 'helping', exactly by changing it, beyond
users of libhugetlbfs and OProfile, who are equally helped by this
kernel patch (which, again, also has the added benefit of making
hugetlbfs appear to be more like a normal filesystem).

> It won't break backwards compatibility, and it will work on older
> kernels...

Fair enough. I'm looking into it, but I can't make any promises on
timelines.

Thanks,
Nish

-- 
Nishanth Aravamudan <nacc@us.ibm.com>
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] hugetlbfs read() support
  2007-07-19 17:52         ` Bill Irwin
@ 2007-07-31  5:57           ` dean gaudet
  0 siblings, 0 replies; 13+ messages in thread
From: dean gaudet @ 2007-07-31  5:57 UTC (permalink / raw)
  To: Bill Irwin
  Cc: Nishanth Aravamudan, Andrew Morton, Badari Pulavarty, lkml,
	linux-mm, Nick Piggin

On Thu, 19 Jul 2007, Bill Irwin wrote:

> On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> > But I do think a second reason to do this is to make hugetlbfs behave
> > like a normal fs -- that is read(), write(), etc. work on files in the
> > mountpoint. But that is simply my opinion.
> 
> Mine as well.

ditto.  here's a few other things i've run into recently:

it should be possible to use cp(1) to load large datasets into a 
hugetlbfs.

it should be possible to use ftruncate() on hugetlbfs files.  (on a tmpfs 
it's req'd to extend the file before mmaping... on hugetlbfs it returns 
EINVAL or somesuch and mmap just magically extends files.)

it should be possible to statfs() and get usage info... this works only if 
you mount with size=N.

-dean




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2007-07-31  5:57 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-14  1:23 [PATCH] hugetlbfs read() support Badari Pulavarty
2007-07-19  5:19 ` Andrew Morton
2007-07-19 15:51   ` Badari Pulavarty
2007-07-19 16:58     ` Andrew Morton
2007-07-19 17:07       ` Nishanth Aravamudan
2007-07-19 17:52         ` Bill Irwin
2007-07-31  5:57           ` dean gaudet
2007-07-20  4:47         ` Nick Piggin
2007-07-23 14:02           ` Nishanth Aravamudan
2007-07-20  4:29       ` Nick Piggin
2007-07-20 21:15         ` Badari Pulavarty
2007-07-20  4:39       ` Nick Piggin
2007-07-20  6:13       ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox