linux-mm.kvack.org archive mirror
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Cong Wang <amwang@redhat.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	Pekka Enberg <penberg@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Hugh Dickins <hughd@google.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Lennart Poettering <lennart@poettering.net>,
	Kay Sievers <kay.sievers@vrfy.org>,
	linux-mm@kvack.org
Subject: Re: [V3 PATCH 1/2] tmpfs: add fallocate support
Date: Wed, 23 Nov 2011 14:59:01 -0500
Message-ID: <CAHGf_=rOYkEGHakyHpihopMg2VtVfDV7XvC_QGs_kj6HgDmBRA@mail.gmail.com>
In-Reply-To: <1322038412-29013-1-git-send-email-amwang@redhat.com>

> Systemd needs tmpfs to support fallocate [1], to be able
> to safely use mmap(), regarding SIGBUS, on files on the
> /dev/shm filesystem. The glibc fallback loop for -ENOSYS
> on fallocate is just ugly.

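Do you mean the fallback for EOPNOTSUPP? ENOSYS means the syscall itself
is missing; a filesystem that does not implement fallocate returns
EOPNOTSUPP. Both cases end up in glibc's fallback: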

glibc/sysdeps/unix/sysv/linux/i386/posix_fallocate.c
----------------
int
posix_fallocate (int fd, __off_t offset, __off_t len)
{
#ifdef __NR_fallocate
# ifndef __ASSUME_FALLOCATE
  if (__builtin_expect (__have_fallocate >= 0, 1))
# endif
    {
      int res = __call_fallocate (fd, 0, offset, len);
      if (! res)
        return 0;

# ifndef __ASSUME_FALLOCATE
      if (__builtin_expect (res == ENOSYS, 0))
        __have_fallocate = -1;
      else
# endif
        if (res != EOPNOTSUPP)
          return res;
    }
#endif

  return internal_fallocate (fd, offset, len);
}
--------------------------


But OK, I'm now convinced this is needed. People strongly dislike
receiving SIGBUS.
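
For reference, a minimal userspace sketch of the pattern that motivates
this: reserve the backing store first, then mmap(), so that a later store
into the mapping cannot hit a missing page and raise SIGBUS.
map_preallocated() is a hypothetical helper name, not something from
systemd or glibc, and a real caller would want finer error handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open (or create) path, reserve len bytes of
 * backing store, then map the file shared and writable.
 * Returns MAP_FAILED with errno set on any failure.
 */
void *map_preallocated(const char *path, size_t len)
{
	void *addr;
	int err;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return MAP_FAILED;

	/* posix_fallocate() returns the error number; it does not set errno. */
	err = posix_fallocate(fd, 0, (off_t)len);
	if (err) {
		close(fd);
		errno = err;
		return MAP_FAILED;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	err = errno;	/* keep mmap's errno across close() */
	close(fd);	/* the mapping stays valid after the fd is closed */
	errno = err;
	return addr;
}
```

Without native fallocate support in the filesystem, the posix_fallocate()
call above goes through the glibc fallback shown earlier.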


> This patch adds fallocate support to tmpfs, and as we
> already have shmem_truncate_range(), it is also easy to
> add FALLOC_FL_PUNCH_HOLE support too.
>
> 1. http://lkml.org/lkml/2011/10/20/275
>
> V2->V3:
> a) Read i_size directly after holding i_mutex;
> b) Call page_cache_release() too after shmem_getpage();
> c) Undo previous changes when -ENOSPC.
>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> Cc: Lennart Poettering <lennart@poettering.net>
> Cc: Kay Sievers <kay.sievers@vrfy.org>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: WANG Cong <amwang@redhat.com>
>
> ---
>  mm/shmem.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 65 insertions(+), 0 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d672250..65f7a27 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -30,6 +30,7 @@
>  #include <linux/mm.h>
>  #include <linux/export.h>
>  #include <linux/swap.h>
> +#include <linux/falloc.h>
>
>  static struct vfsmount *shm_mnt;
>
> @@ -1431,6 +1432,69 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
>        return error;
>  }
>
> +static void shmem_truncate_page(struct inode *inode, pgoff_t index)
> +{
> +       loff_t start = index << PAGE_CACHE_SHIFT;
> +       loff_t end = ((index + 1) << PAGE_CACHE_SHIFT) - 1;
> +       shmem_truncate_range(inode, start, end);
> +}
> +
> +static long shmem_fallocate(struct file *file, int mode,
> +                               loff_t offset, loff_t len)
> +{
> +       struct inode *inode = file->f_path.dentry->d_inode;
> +       pgoff_t start = offset >> PAGE_CACHE_SHIFT;
> +       pgoff_t end = DIV_ROUND_UP((offset + len), PAGE_CACHE_SIZE);
> +       pgoff_t index = start;
> +       loff_t i_size;
> +       struct page *page = NULL;
> +       int ret = 0;

do_fallocate() has the following file type check:

        if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
                return -ENODEV;

However, this implementation doesn't support allocation or hole punching
on directories. ext4's ext4_punch_hole() has the following additional
check. Maybe we need a similar one:

        if (!S_ISREG(inode->i_mode))
                return -ENOTSUPP;


> +       mutex_lock(&inode->i_mutex);
> +       i_size = inode->i_size;
> +       if (mode & FALLOC_FL_PUNCH_HOLE) {
> +               if (!(offset > i_size || (end << PAGE_CACHE_SHIFT) > i_size))

This seems incorrect.
fallocate(PUNCH, 0, very_big_number) should still punch the range [0, end),
but with this check a range that extends past i_size punches nothing.
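
One way to express this is to clamp the requested range to i_size instead
of rejecting it. The sketch below is plain userspace arithmetic, not
kernel code; struct byte_range and punch_range() are hypothetical names,
and the result uses an inclusive last byte because that is how the patch
calls shmem_truncate_range().

```c
#include <stdbool.h>
#include <stdint.h>

struct byte_range {
	int64_t start;	/* first byte to punch */
	int64_t last;	/* last byte to punch, inclusive */
};

/*
 * Clamp [offset, offset + len) to the file size.
 * Returns false when nothing inside the file is covered.
 */
bool punch_range(int64_t offset, int64_t len, int64_t i_size,
		 struct byte_range *out)
{
	int64_t end;	/* exclusive */

	if (offset < 0 || len <= 0 || offset >= i_size)
		return false;

	/* Compare against the remaining bytes so offset + len cannot overflow. */
	if (len > i_size - offset)
		end = i_size;
	else
		end = offset + len;

	out->start = offset;
	out->last = end - 1;
	return true;
}
```

With this, a request of (0, INT64_MAX) on a 10000-byte file yields the
range 0..9999 rather than being skipped.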


> +                       shmem_truncate_range(inode, offset,
> +                                            (end << PAGE_CACHE_SHIFT) - 1);
> +               goto unlock;
> +       }
> +
> +       if (!(mode & FALLOC_FL_KEEP_SIZE)) {
> +               ret = inode_newsize_ok(inode, (offset + len));
> +               if (ret)
> +                       goto unlock;
> +       }
> +       while (index < end) {
> +               ret = shmem_getpage(inode, index, &page, SGP_WRITE, NULL);
> +               if (ret) {
> +                       if (ret == -ENOSPC)
> +                               goto undo;
> +                       else
> +                               goto unlock;
> +               }
> +               if (page) {
> +                       unlock_page(page);
> +                       page_cache_release(page);
> +               }
> +               index++;
> +       }
> +       if (!(mode & FALLOC_FL_KEEP_SIZE) && (index << PAGE_CACHE_SHIFT) > i_size)
> +               i_size_write(inode, index << PAGE_CACHE_SHIFT);

This seems incorrect.
The new i_size should be offset + len. Our round-up to a page boundary is
an implementation detail and should not be exposed to userland.
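
A small userspace sketch of the difference, assuming a 4096-byte page.
TOY_PAGE_SIZE and both function names are hypothetical; the page index is
rounded up for allocation, but the size reported to userland is not.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096	/* assumption for this sketch */

/* Exclusive end page index: the allocation loop must cover whole pages. */
int64_t alloc_end_index(int64_t offset, int64_t len)
{
	return (offset + len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Size visible to userland afterwards: byte-exact, never rounded up. */
int64_t new_i_size(int64_t i_size, int64_t offset, int64_t len,
		   bool keep_size)
{
	if (keep_size)
		return i_size;
	return (offset + len > i_size) ? offset + len : i_size;
}
```

For an empty file and a request of (0, 5000), two pages are allocated but
the file size becomes 5000, not 8192.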

> +
> +       goto unlock;
> +
> +undo:
> +       while (index > start) {
> +               shmem_truncate_page(inode, index);
> +               index--;

Hmmm...
This truncate seems too aggressive if the file already had pages in this
range before the fallocate started, but I have no idea how to make a
better undo. ;)
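
To illustrate the concern only, here is a userspace toy model in which
the undo path removes just the pages that the current call added and
leaves pre-existing pages alone. Everything in it (struct toy_file,
toy_fallocate(), the fixed NPAGES) is hypothetical; how the kernel would
cheaply learn that a page was newly allocated is exactly the open
question.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16	/* toy file size in pages */

struct toy_file {
	bool present[NPAGES];	/* which pages already have backing store */
	size_t free_pages;	/* remaining capacity, models -ENOSPC */
};

/* Allocate pages [start, end); on -ENOSPC roll back only what we added. */
int toy_fallocate(struct toy_file *f, size_t start, size_t end)
{
	bool added[NPAGES] = { false };
	size_t i;

	if (start > end || end > NPAGES)
		return -EINVAL;

	for (i = start; i < end; i++) {
		if (f->present[i])
			continue;	/* existed before this call */
		if (f->free_pages == 0)
			goto undo;
		f->present[i] = true;
		f->free_pages--;
		added[i] = true;
	}
	return 0;

undo:
	/* Walk back over [start, i) and drop only the pages marked above. */
	while (i > start) {
		i--;
		if (added[i]) {
			f->present[i] = false;
			f->free_pages++;
		}
	}
	return -ENOSPC;
}
```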


> +       }
> +
> +unlock:
> +       mutex_unlock(&inode->i_mutex);
> +       return ret;
> +}
> +
>  static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>        struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb);
> @@ -2286,6 +2350,7 @@ static const struct file_operations shmem_file_operations = {
>        .fsync          = noop_fsync,
>        .splice_read    = shmem_file_splice_read,
>        .splice_write   = generic_file_splice_write,
> +       .fallocate      = shmem_fallocate,
>  #endif
>  };

