From: "majianpeng" <majianpeng@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Re: the max size of block device on 32bit os,when using do_generic_file_read() proceed.
Date: Mon, 28 May 2012 14:26:27 +0800
Message-ID: <201205281426238284699@gmail.com>
In-Reply-To: <201205242138175936268@gmail.com>
Sorry for the late reply. I reviewed the code again and found some problems.
I created a software RAID larger than 16T.
The OS is Ubuntu 12.04, 32-bit x86.
udev creates the block device node in /dev, which is a tmpfs.
So I read the tmpfs code.
In mm/shmem.c:shmem_fill_super():
>sb->s_maxbytes = MAX_LFS_FILESIZE;
On my machine, MAX_LFS_FILESIZE equals 8T - 1.
But look at the read path:
generic_file_aio_read() --> do_generic_file_read() (O_DIRECT not set)
In do_generic_file_read():
>index = *ppos >> PAGE_CACHE_SHIFT;
index is of type pgoff_t, which is 32 bits here.
So if *ppos is 16T or larger, index overflows. As you said, the read then returns data from a low offset.
I also tested the write path:
blkdev_aio_write() --> __generic_file_aio_write().
__generic_file_aio_write() checks the request with generic_write_checks(),
which contains:
>if (likely(!isblk)) {
> if (unlikely(*pos >= inode->i_sb->s_maxbytes)) {
> if (*count || *pos > inode->i_sb->s_maxbytes) {
> return -EFBIG;
> }
> /* zero-length writes at ->s_maxbytes are OK */
> }
> if (unlikely(*pos + *count > inode->i_sb->s_maxbytes))
> *count = inode->i_sb->s_maxbytes - *pos;
> } else {
>#ifdef CONFIG_BLOCK
> loff_t isize;
> if (bdev_read_only(I_BDEV(inode)))
> return -EPERM;
> isize = i_size_read(inode);
> if (*pos >= isize) {
> if (*count || *pos > isize)
> return -ENOSPC;
> }
> if (*pos + *count > isize)
> *count = isize - *pos;
>#else
> return -EPERM;
>#endif
For a regular file it checks against s_maxbytes (MAX_LFS_FILESIZE), but for a block device it does not; it only checks against the real device size.
That is also a bug: if the block device is larger than 16T, no error is returned and execution continues.
The buffered write (no O_DIRECT) then goes generic_file_buffered_write() --> generic_perform_write() --> ->write_begin (blkdev_write_begin())
--> block_write_begin().
In block_write_begin():
>pgoff_t index = pos >> PAGE_CACHE_SHIFT;
index overflows here too.
I thought about patching these bugs myself (I might even become well-known, haha), but I could not, because of this in generic_write_checks():
>/*
> * Are we about to exceed the fs block limit ?
> *
> * If we have written data it becomes a short write. If we have
> * exceeded without writing data we send a signal and return EFBIG.
> * Linus frestrict idea will clean these up nicely..
> */
> if (likely(!isblk)) {
How should block devices be handled here? Like regular files or not?
------------------
majianpeng
2012-05-28
-------------------------------------------------------------
From: Hugh Dickins
Sent: 2012-05-27 05:24:13
To: majianpeng
Cc: Al Viro; Andrew Morton; linux-mm; linux-fsdevel
Subject: Re: the max size of block device on 32bit os,when using do_generic_file_read() proceed.
On Thu, 24 May 2012, majianpeng wrote:
> Hi all:
> I read from a 30T RAID5 array. The OS is 32-bit RHEL6.
> Reading the RAID5 device as a whole (not partitioned), I got data from an offset I had not asked for.
> So I tested the latest kernel code, and the problem is still there.
> Reviewing the code, in do_generic_file_read():
>
> index = *ppos >> PAGE_CACHE_SHIFT;
> index is 32 bits (u32) and *ppos is long long.
> So when *ppos reaches 2^32 * PAGE_CACHE_SIZE (16TiB with 4k pages), index overflows.
>
> So I wonder: on a 32-bit OS, must block devices be no larger than 16T? In other words, must a block device larger than 16T be partitioned?
I am not surprised that the page cache limitation prevents you from
reading the whole device with a 32-bit kernel. See MAX_LFS_FILESIZE in
include/linux/fs.h. Our answer to that is just to use a 64-bit kernel.
#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE 0x7fffffffffffffffUL
#endif
But I am a little surprised that you get as far as 16TiB (with 4k page):
I would have expected you to be stopped just before 8TiB (although I
suspect that the limitation to 8TiB rather than 16TiB is unnecessary).
And if I understand you correctly, read() or pread() gave you no error
at those large offsets, but supplied data from the low offset instead?
That does surprise me - have we missed a check there?
Hugh
Thread overview: 3 messages
2012-05-24 13:38 the max size of block device on 32bit os,when using do_generic_file_read() proceed majianpeng
2012-05-26 21:23 ` Hugh Dickins
2012-05-28 6:26 ` majianpeng [this message]