linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	stable@vger.kernel.org
Subject: Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache
Date: Tue, 23 Oct 2018 19:41:48 +0200	[thread overview]
Message-ID: <20181023174148.GX18839@dhcp22.suse.cz> (raw)
In-Reply-To: <b5be45b8-5afe-56cd-9482-28384699a049@oracle.com>

On Tue 23-10-18 10:30:44, Mike Kravetz wrote:
> On 10/23/18 12:43 AM, Michal Hocko wrote:
> > On Wed 17-10-18 21:10:22, Mike Kravetz wrote:
> >> Some test systems were experiencing negative huge page reserve
> >> counts and incorrect file block counts.  This was traced to
> >> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> >> file pagecaches.  When non-hugetlbfs explicit code removes the
> >> pages, the appropriate accounting is not performed.
> >>
> >> This can be recreated as follows:
> >>  fallocate -l 2M /dev/hugepages/foo
> >>  echo 1 > /proc/sys/vm/drop_caches
> >>  fallocate -l 2M /dev/hugepages/foo
> >>  grep -i huge /proc/meminfo
> >>    AnonHugePages:         0 kB
> >>    ShmemHugePages:        0 kB
> >>    HugePages_Total:    2048
> >>    HugePages_Free:     2047
> >>    HugePages_Rsvd:    18446744073709551615
> >>    HugePages_Surp:        0
> >>    Hugepagesize:       2048 kB
> >>    Hugetlb:         4194304 kB
> >>  ls -lsh /dev/hugepages/foo
> >>    4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
> >>
> >> To address this issue, dirty pages as they are added to pagecache.
> >> This can easily be reproduced with fallocate as shown above. Read
> >> faulted pages will eventually end up being marked dirty.  But there
> >> is a window where they are clean and could be impacted by code such
> >> as drop_caches.  So, just dirty them all as they are added to the
> >> pagecache.
> >>
> >> In addition, it makes little sense to even try to drop hugetlbfs
> >> pagecache pages, so disable calls to these filesystems in drop_caches
> >> code.
> >>
> >> Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > 
> > I do agree with others that HUGETLBFS_MAGIC check in drop_pagecache_sb
> > is wrong in principal. I am not even sure we want to special case memory
> > backed filesystems. What if we ever implement MADV_FREE on fs? Should
> > those pages be dropped? My first idea take would be yes.
> 
> Ok, I have removed that hard coded check.  Implementing MADV_FREE on
> hugetlbfs would take some work, but it could be done.
> 
> > Acked-by: Michal Hocko <mhocko@suse.com> to the set_page_dirty dirty
> > part.
> > 
> > Although I am wondering why you haven't covered only the fallocate path
> > wrt Fixes tag. In other words, do we need the same treatment for the
> > page fault path? We do not set dirty bit on page there as well. We rely
> > on the dirty bit in pte and only for writable mappings. I have hard time
> > to see why we have been safe there as well. So maybe it is your Fixes:
> > tag which is not entirely correct, or I am simply missing the fault
> > path.
> 
> No, you are not missing anything.  In the commit log I mentioned that this
> also does apply to the fault path.  The change takes care of them both.
> 
> I was struggling with what to put in the fixes tag.  As mentioned, this
> problem also exists in the fault path.  Since 3.16 is the oldest stable
> release, I went back and used the commit next to the add_to_page_cache code
> there.  However, that seems kind of random.  Is there a better way to say
> the patch applies to all stable releases?

OK, good, I was afraid I was missing something, well except for not
reading the changelog properly. I would go with

Cc: stable # all kernels with hugetlb

> Here is updated patch without the drop_caches change and updated fixes tag.
> 
> From: Mike Kravetz <mike.kravetz@oracle.com>
> 
> hugetlbfs: dirty pages as they are added to pagecache
> 
> Some test systems were experiencing negative huge page reserve
> counts and incorrect file block counts.  This was traced to
> /proc/sys/vm/drop_caches removing clean pages from hugetlbfs
> file pagecaches.  When non-hugetlbfs explicit code removes the
> pages, the appropriate accounting is not performed.
> 
> This can be recreated as follows:
>  fallocate -l 2M /dev/hugepages/foo
>  echo 1 > /proc/sys/vm/drop_caches
>  fallocate -l 2M /dev/hugepages/foo
>  grep -i huge /proc/meminfo
>    AnonHugePages:         0 kB
>    ShmemHugePages:        0 kB
>    HugePages_Total:    2048
>    HugePages_Free:     2047
>    HugePages_Rsvd:    18446744073709551615
>    HugePages_Surp:        0
>    Hugepagesize:       2048 kB
>    Hugetlb:         4194304 kB
>  ls -lsh /dev/hugepages/foo
>    4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
> 
> To address this issue, dirty pages as they are added to pagecache.
> This can easily be reproduced with fallocate as shown above. Read
> faulted pages will eventually end up being marked dirty.  But there
> is a window where they are clean and could be impacted by code such
> as drop_caches.  So, just dirty them all as they are added to the
> pagecache.
> 
> Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into huge_no_page()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Mihcla Hocko <mhocko@suse.com>

> ---
>  mm/hugetlb.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5c390f5a5207..7b5c0ad9a6bd 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3690,6 +3690,12 @@ int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
>  		return err;
>  	ClearPagePrivate(page);
>  
> +	/*
> +	 * set page dirty so that it will not be removed from cache/file
> +	 * by non-hugetlbfs specific code paths.
> +	 */
> +	set_page_dirty(page);
> +
>  	spin_lock(&inode->i_lock);
>  	inode->i_blocks += blocks_per_huge_page(h);
>  	spin_unlock(&inode->i_lock);
> -- 
> 2.17.2

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2018-10-23 17:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-18  4:10 Mike Kravetz
2018-10-18 23:08 ` Andrew Morton
2018-10-18 23:16   ` Mike Kravetz
2018-10-19  0:46     ` Andrea Arcangeli
2018-10-19  1:47       ` Andrew Morton
2018-10-19  4:50         ` Mike Kravetz
2018-10-23  7:43 ` Michal Hocko
2018-10-23 17:30   ` Mike Kravetz
2018-10-23 17:41     ` Michal Hocko [this message]
2018-10-24  5:00     ` Khalid Aziz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181023174148.GX18839@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=dave@stgolabs.net \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=stable@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox