From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting
Date: Mon, 23 Oct 2017 07:32:59 +0000
Message-ID: <20171023073258.GA5115@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <5016e528-8ea9-7597-3420-086ae57f3d9d@oracle.com>

On Fri, Oct 20, 2017 at 10:49:46AM -0700, Mike Kravetz wrote:
> On 10/19/2017 07:30 PM, Naoya Horiguchi wrote:
> > On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote:
> >
> > Thank you for addressing this. The patch itself looks good to me, but
> > the reported issue (negative reserve count) doesn't reproduce in my trial
> > with v4.14-rc5, so could you share the exact procedure for this issue?
>
> Sure, but first one question on your test scenario below.
>
> >
> > When error handler runs over a huge page, the reserve count is incremented
> > so I'm not sure why the reserve count goes negative.
>
> I'm not sure I follow. What specific code is incrementing the reserve
> count?

The call path is as follows:

  hugetlbfs_error_remove_page
    hugetlb_fix_reserve_counts
      hugepage_subpool_get_pages(spool, 1)
      hugetlb_acct_memory(h, 1);
        gather_surplus_pages
          h->resv_huge_pages += delta;

>
> > My operation is like below:
> >
> > $ sysctl vm.nr_hugepages=10
> > $ grep HugePages_ /proc/meminfo
> > HugePages_Total: 10
> > HugePages_Free: 10
> > HugePages_Rsvd: 0
> > HugePages_Surp: 0
> > $ ./test_alloc_generic -B hugetlb_file -N1 -L "mmap access memory_error_injection:error_type=madv_hard" // allocate a 2MB file on hugetlbfs, then madvise(MADV_HWPOISON) on it.
> > $ grep HugePages_ /proc/meminfo
> > HugePages_Total: 10
> > HugePages_Free: 9
> > HugePages_Rsvd: 1 // reserve count is incremented
> > HugePages_Surp: 0
>
> This is confusing to me. I can not create a test where there is a reserve
> count after poisoning page.
>
> I tried to recreate your test. Running unmodified 4.14.0-rc5.
>
> Before test
> -----------
> HugePages_Total: 1
> HugePages_Free: 1
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
>
> After open(creat) and mmap of 2MB hugetlbfs file
> ------------------------------------------------
> HugePages_Total: 1
> HugePages_Free: 1
> HugePages_Rsvd: 1
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
>
> Reserve count is 1 as expected/normal
>
> After madvise(MADV_HWPOISON) of the single huge page in mapping/file
> --------------------------------------------------------------------
> HugePages_Total: 1
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
>
> In this case, the reserve (and free) count were decremented. Note that
> before the poison operation the page was not associated with the mapping/
> file. I did not look closely at the code, but assume the madvise may
> cause the page to be 'faulted in'.
>
> The counts remain the same when the program exits
> -------------------------------------------------
> HugePages_Total: 1
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
>
> Remove the file (rm /var/opt/oracle/hugepool/foo)
> -------------------------------------------------
> HugePages_Total: 1
> HugePages_Free: 0
> HugePages_Rsvd: 18446744073709551615
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
>
> I am still confused about how your test maintains a reserve count after
> poisoning. It may be a good idea for you to test my patch with your
> test scenario as I can not recreate here.

Interestingly, I found that this reproduces if all hugetlb pages are
reserved when poisoning. Your test meets that condition, and mine doesn't.

In gather_surplus_pages() we decide whether to extend the hugetlb pool
with surplus pages as follows:

    needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
    if (needed <= 0) {
            h->resv_huge_pages += delta;
            return 0;
    }
    ...

With delta == 1, needed is 1 if h->resv_huge_pages == h->free_huge_pages,
so this branch is not taken and the reserve count gets inconsistent.
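
To make the arithmetic concrete, here is a minimal user-space sketch of just that check. It is not the kernel code: struct hstate_model and reserve_from_free() are hypothetical names made up for this illustration, the counters are reduced to the two fields discussed here, and the surplus allocation path elided by "..." above is not modelled at all.

```c
/* Hypothetical stand-in for the two struct hstate counters discussed above. */
struct hstate_model {
    unsigned long free_huge_pages;
    unsigned long resv_huge_pages;
};

/*
 * Models only the early-return check quoted above.
 * Returns 0 when the reservation is covered by free pages (and bumps the
 * reserve count), or the positive shortfall when the surplus path, which
 * this sketch does not model, would have to be entered.
 */
long reserve_from_free(struct hstate_model *h, long delta)
{
    long needed = (long)h->resv_huge_pages + delta - (long)h->free_huge_pages;

    if (needed <= 0) {
        h->resv_huge_pages += delta;
        return 0;
    }
    return needed; /* reserve count left untouched in this sketch */
}
```

Assuming the counters were free = 9, resv = 0 when the error handler ran in my test, reserve_from_free(&h, 1) returns 0 and leaves resv at 1, which matches the HugePages_Rsvd: 1 I saw. Assuming free = 0, resv = 0 in your test, it returns 1 and resv stays at 0. A later decrement of an unsigned 64-bit counter that is already 0 wraps to 18446744073709551615, the value in your output.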

I confirmed that your patch fixes the issue, so I'm OK with it.

Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks,
Naoya Horiguchi