Re: Re: the max size of block device on 32bit os,when using do_generic_file_read() proceed.

From: "majianpeng" <majianpeng@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Re: the max size of block device on 32bit os,when using do_generic_file_read() proceed.
Date: Mon, 28 May 2012 14:26:27 +0800
Message-ID: <201205281426238284699@gmail.com>
In-Reply-To: <201205242138175936268@gmail.com>

Sorry for the late reply. I reviewed the code again and found some problems.
I created a software RAID array whose size is larger than 16T.
The OS is Ubuntu 12.04, 32-bit x86.
udev creates the block device node in the /dev directory (which is a tmpfs).
So I read the tmpfs code:
in mm/shmem.c:shmem_fill_super()
>sb->s_maxbytes = MAX_LFS_FILESIZE;
On my machine, MAX_LFS_FILESIZE is equal to 8T - 1.
But the read path is:
generic_file_aio_read --> do_generic_file_read [when the direct I/O flag is not used]
In do_generic_file_read():
>index = *ppos >> PAGE_CACHE_SHIFT;
index is of type pgoff_t.
So if *ppos is larger than 16T, index overflows. As you said, the read then returns data from a low position.

I also tested the write operation:
blkdev_aio_write --> __generic_file_aio_write
__generic_file_aio_write() does its checks in generic_write_checks().
But in that function:
>if (likely(!isblk)) {
>		if (unlikely(*pos >= inode->i_sb->s_maxbytes)) {
>			if (*count || *pos > inode->i_sb->s_maxbytes) {
>				return -EFBIG;
>			}
>			/* zero-length writes at ->s_maxbytes are OK */
>		}

>		if (unlikely(*pos + *count > inode->i_sb->s_maxbytes))
>			*count = inode->i_sb->s_maxbytes - *pos;
>	} else {
>#ifdef CONFIG_BLOCK
>		loff_t isize;
>		if (bdev_read_only(I_BDEV(inode)))
>			return -EPERM;
>		isize = i_size_read(inode);
>		if (*pos >= isize) {
>			if (*count || *pos > isize)
>				return -ENOSPC;
>		}

>		if (*pos + *count > isize)
>			*count = isize - *pos;
>#else
>		return -EPERM;
>#endif
It does check against s_maxbytes (MAX_LFS_FILESIZE), but if the file is a block device it skips that check and only checks the real device size.
So there is a bug here too: if the block device is larger than 16T, no error is returned and execution continues.
Then generic_file_buffered_write() [no O_DIRECT] --> generic_perform_write --> write_begin [blkdev_write_begin]
--> block_write_begin
In block_write_begin():
>pgoff_t index = pos >> PAGE_CACHE_SHIFT;
index will overflow.

I thought about patching these bugs (I might become well-known, haha). But I can't, because of this comment in generic_write_checks():
>/*
>	 * Are we about to exceed the fs block limit ?
>	 *
>	 * If we have written data it becomes a short write.  If we have
>	 * exceeded without writing data we send a signal and return EFBIG.
>	 * Linus frestrict idea will clean these up nicely..
>	 */
>	if (likely(!isblk)) {
How should a block device be handled here? As a regular file or not?

------------------
majianpeng
2012-05-28

-------------------------------------------------------------
From: Hugh Dickins
Sent: 2012-05-27 05:24:13
To: majianpeng
Cc: Al Viro; Andrew Morton; linux-mm; linux-fsdevel
Subject: Re: the max size of block device on 32bit os,when using do_generic_file_read() proceed.

On Thu, 24 May 2012, majianpeng wrote:
> Hi all,
> I read from a RAID5 array whose size is 30T. The OS is RHEL6 32-bit.
> I read the RAID5 device as a whole (not partitioned) and found that the data came from an address I did not ask for.
> So I tested the newest kernel code, and the problem is still there.
> I reviewed the code in do_generic_file_read():
>
> 		index = *ppos >> PAGE_CACHE_SHIFT;
>
> index is u32 and *ppos is long long.
> So when *ppos is larger than 0xFFFFFFFF * PAGE_CACHE_SIZE (16T bytes), index is wrong.
>
> So I wonder: on a 32-bit OS, can a block device not be larger than 16T? In other words, must a block device larger than 16T be partitioned?

I am not surprised that the page cache limitation prevents you from
reading the whole device with a 32-bit kernel.  See MAX_LFS_FILESIZE in
include/linux/fs.h.  Our answer to that is just to use a 64-bit kernel.

#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) 
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE 0x7fffffffffffffffUL
#endif

But I am a little surprised that you get as far as 16TiB (with 4k page):
I would have expected you to be stopped just before 8TiB (although I
suspect that the limitation to 8TiB rather than 16TiB is unnecessary).

And if I understand you correctly, read() or pread() gave you no error
at those large offsets, but supplied data from the low offset instead?

That does surprise me - have we missed a check there?

Hugh

Thread overview: 3+ messages
2012-05-24 13:38 the max size of block device on 32bit os,when using do_generic_file_read() proceed majianpeng
2012-05-26 21:23 ` Hugh Dickins
2012-05-28  6:26 ` majianpeng [this message]
