From: Yang Shi <shy828301@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Li Xinhai <lixinhai.lxh@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
akpm <akpm@linux-foundation.org>,
"kirill.shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [PATCH v4] mm/page_vma_mapped.c: Explicitly compare pfn for normal, hugetlbfs and THP page
Date: Thu, 3 Sep 2020 18:22:34 -0700 [thread overview]
Message-ID: <CAHbLzkrJcmfgtTS_qNtRuTcotdqZb7sQ2VmvOM90yV_K1mhceQ@mail.gmail.com> (raw)
In-Reply-To: <CAHbLzkoarMEH303LtNZ2VrbYwcdEjZRczYth0=BO2bj6TyKA2A@mail.gmail.com>
On Thu, Sep 3, 2020 at 9:08 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Sep 3, 2020 at 1:02 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 02-09-20 15:26:39, Yang Shi wrote:
> > > On Thu, Jan 16, 2020 at 2:29 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Wed 15-01-20 16:40:26, Mike Kravetz wrote:
> > > > > On 1/15/20 1:31 AM, Michal Hocko wrote:
> > > > > > On Tue 14-01-20 21:47:51, Li Xinhai wrote:
> > > > > >> On 2020-01-14 at 17:25 Michal Hocko wrote:
> > > > > >>> On Sat 11-01-20 10:18:05, Li Xinhai wrote:
> > > > > >>>> When check_pte, pfn of normal, hugetlbfs and THP page need be compared.
> > > > > >>>> The current implementation apply comparison as
> > > > > >>>> - normal 4K page: page_pfn <= pfn < page_pfn + 1
> > > > > >>>> - hugetlbfs page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR
> > > > > >>>> - THP page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR
> > > > > >>>> in pfn_in_hpage. For hugetlbfs page, it should be
> > > > > >>>> page_pfn == pfn
> > > > > >>>>
> > > > > >>>> Now, change pfn_in_hpage to pfn_is_match to highlight that comparison
> > > > > >>>> is not only for THP and explicitly compare for these cases.
> > > > > >>>
> > > > > >>> Why is this important to do. I have asked and Mike had the same feeling
> > > > > >>> that the patch is missing any real justification. Why do we need this
> > > > > >>> change? It is great that you have dropped VM_BUG_ON btw.
> > > > > >>>
> > > > > >> I think it is important to make the code clear, as said, comparing hugetlbfs page
> > > > > >> in range page_pfn <= pfn < page_pfn + HPAGE_PMD_NR is confusion.
> > > > > >
> > > > > > I am sorry but I do not really see a big confusion here. It is probably
> > > > > > a matter of taste. If others consider this an improvement then I will
> > > > > > not stand in the way but I consider the justification insuficient for
> > > > > > merging.
> > > > >
> > > > > Perhaps I am confused, but the patch does update the check done for
> > > > > hugetlb pages. IIUC, previously there was no distinction between THP
> > > > > pages and hugetlb pages in the check. It is valid to pass in a sub-
> > > > > page of a THP page, but not that of a hugetlb page.
> > > > >
> > > > > I do not believe it is possible for existing code to pass in a sub-page
> > > > > of a hugetlb page. And, no one has ever seen this as an issue. This
> > > > > is why I was questioning the need for the patch.
> > > >
> > > > Exactly and that is the reason I fail the see a point.
> > > >
> > > > > With all that said, the new code/check is better (more complete) than
> > > > > the original. It may not do anything for the current code base, but
> > > > > it 'could' catch potential errors in future code. Because of this, I
> > > > > do consider the code an improvement.
> > > >
> > > > It adds a branch for something that doesn't happen and also to a code
> > > > path which is quite internal to the MM AFAICS. That being said, if you
> > > > believe this is an improvement then I will not stand in the way. But
> > > > there are so many other places which could add checks that are not
> > > > exercised...
> > >
> > > I just saw a bad page bug on one our host with 4.14 kernel:
> > >
> > > backtrace:
> > > :BUG: Bad page map in process python3.6 pte:1036e24025 pmd:1dd743d067
> > > :page:fffff62840db8900 count:105 mapcount:-35 mapping:ffff97d432e97ad0 index:0x1
> > > :flags: 0xbfffe000001006c(referenced|uptodate|lru|active|mappedtodisk)
> > > :raw: 0bfffe000001006c ffff97d432e97ad0 0000000000000001 00000069ffffffdc
> > > :raw: dead000000000100 dead000000000200 0000000000000000 ffff97c58fc0e000
> > > :page dumped because: bad pte
> > > :page->mem_cgroup:ffff97c58fc0e000
> > > :addr:00007f2ddffcc000 vm_flags:00000075 anon_vma: (null)
> > > mapping:ffff97d432e97ad0 index:7f
> > > :file:libc-2.17.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap
> > > [xfs] readpage:xfs_vm_readpage [xfs]
> > > :do_exit+0x563/0xbb0
> > > :? vfs_write+0x162/0x1a0
> > > :do_group_exit+0x3a/0xa0
> > > :file:libc-2.17.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap
> > > [xfs] readpage:xfs_vm_readpage [xfs]
> > > :SyS_exit_group+0x10/0x10
> > > :do_syscall_64+0x60/0x110
> > > :entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > > :RIP: 0033:0x7fb2504091d9
> > > :RSP: 002b:00007ffcf035db88 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> > > :RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb2504091d9
> > > :RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > :RBP: 00007fb250706838 R08: 000000000000003c R09: 00000000000000e7
> > > :R10: ffffffffffffff70 R11: 0000000000000246 R12: 00007fb250706838
> > > :R13: 00007fb25070be80 R14: 0000000000000000 R15: 0000000000000000
> > > :CPU: 42 PID: 454567 Comm: python3.6 Tainted: G B W
> > > 4.14.98-t5.el7.twitter.x86_64 #1
> > >
> > > I just saw it once and it didn't happen on the newer kernel (maybe
> > > just not happen yet). I can't tell why the mapcount could reach -35
> > > since all page_remove_rmap() is protected by page table lock. Then I
> > > looked into pvmw code, and suspected the PTE might be changed after
> > > acquiring ptl.
> > >
> > > With the old check it was fine as long as "page_pfn <= pfn < page_pfn
> > > + 1", but for the base page case, it means we might be dec'ing
> > > mapcount for another page.
> > >
> > > Then I came across this commit. I guess this should be able to solve
> > > the problem, but unfortunately the problem is very rare so basically I
> > > can't prove this commit could solve it.
> > >
> > > If you agree my analysis, we may consider to backport this to stable tree.
> >
> > I am not sure I follow. Your page looks like a normal page cache and the
> > patch you are referring to is only clarifying hugetlb pages. I do not
> > remember details but it shouldn't have any functional effect. Are those
> > even used in your environemnt?
>
> Not only does the code touch hugetlbfs page, please see the below code snippet:
>
> + /* normal page and hugetlbfs page */
> + if (!PageTransCompound(page) || PageHuge(page))
> + return page_pfn == pfn;
>
> The potential problem per my understanding is pvmw does:
>
> 1. read pte
> 2. lock ptl
> 3. check pfn
>
> During step #1 and #2 the PTE might be changed, it is not surprising
> we typically have pte_same in page fault path to check this after
> acquiring ptl, but for pvmw path a full pte_same check might be
> unnecessary, since we just care if the pfn is intact or not.
>
> But before the patch as long as "old_pfn < new_pfn < old_pfn + 512" is
I just realized I misread the old implementation, it looks the old
implementation just did the right pfn check for base page, this commit
just made it more readable. Sorry for the confusion.
> true the check would pass for both normal base page and hugetlbfs page
> even though the new pfn is changed, that commit tightened the check.
> For the normal base page it must be "old_pfn == new_pfn".
>
> > --
> > Michal Hocko
> > SUSE Labs
prev parent reply other threads:[~2020-09-04 1:22 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-11 10:18 Li Xinhai
2020-01-13 22:38 ` Andrew Morton
2020-01-14 14:24 ` Li Xinhai
2020-01-14 9:25 ` Michal Hocko
2020-01-14 13:47 ` Li Xinhai
2020-01-15 9:31 ` Michal Hocko
2020-01-16 0:40 ` Mike Kravetz
2020-01-16 10:29 ` Michal Hocko
2020-09-02 22:26 ` Yang Shi
2020-09-03 8:02 ` Michal Hocko
2020-09-03 16:08 ` Yang Shi
2020-09-04 1:22 ` Yang Shi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAHbLzkrJcmfgtTS_qNtRuTcotdqZb7sQ2VmvOM90yV_K1mhceQ@mail.gmail.com \
--to=shy828301@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-mm@kvack.org \
--cc=lixinhai.lxh@gmail.com \
--cc=mhocko@suse.com \
--cc=mike.kravetz@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox