From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <matthew.wilcox@oracle.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Kernel Team <Kernel-team@fb.com>,
William Kucharski <william.kucharski@oracle.com>,
"srikar@linux.vnet.ibm.com" <srikar@linux.vnet.ibm.com>
Subject: Re: [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP
Date: Tue, 13 Aug 2019 18:05:39 +0300 [thread overview]
Message-ID: <20190813150539.ciai477wk2cratvc@box> (raw)
In-Reply-To: <20190813140552.GB6971@redhat.com>
On Tue, Aug 13, 2019 at 04:05:53PM +0200, Oleg Nesterov wrote:
> On 08/13, Oleg Nesterov wrote:
> >
> > On 08/12, Kirill A. Shutemov wrote:
> > >
> > > On Mon, Aug 12, 2019 at 03:22:58PM +0200, Oleg Nesterov wrote:
> > > > On 08/12, Kirill A. Shutemov wrote:
> > > > >
> > > > > On Fri, Aug 09, 2019 at 06:01:18PM +0000, Song Liu wrote:
> > > > > > + if (pte_none(*pte) || !pte_present(*pte))
> > > > > > + continue;
> > > > >
> > > > > You don't need to check both. Present is never none.
> > > >
> > > > Agreed.
> > > >
> > > > Kirill, while you are here, shouldn't retract_page_tables() check
> > > > vma->anon_vma (and probably do mm_find_pmd) under vm_mm->mmap_sem?
> > > >
> > > > Can't it race with, say, do_cow_fault?
> > >
> > > vma->anon_vma can race, but it doesn't matter. False-negative is fine.
> > > It's attempt to avoid taking mmap_sem where it can be not productive.
> >
> > I guess I misunderstood the purpose of this check or your answer...
> >
> > Let me reword my question. Why can retract_page_tables() safely do
> > pmdp_collapse_flush(vma) without additional checks similar to what
> > collapse_pte_mapped_thp() does?
> >
> > I thought that retract_page_tables() checks vma->anon_vma to ensure that
> > this vma doesn't have a cow'ed PageAnon() page. And I still can't understand
> > why can't it race with __handle_mm_fault() paths.
vma->anon_vma check is a cheap way to exclude MAP_PRIVATE mappings that
got written from userspace. My thinking was that these VMAs are not worth
investing down_write(mmap_sem) as PMD-mapping is likely to be split later.
(It's totally made up reasoning, I don't have numbers to back it up).
vma->anon_vma can be set up after the check but before taking mmap_sem.
But page lock would prevent establishing any new ptes of the page, so we
are safe.
An alternative would be drop the check, but check that page table is clear
before calling pmdp_collapse_flush() under ptl. It has higher chance to
recover THP for the VMA, but has higher cost too.
I don't know which way is better, so I've chosen which is easier to
implement.
> >
> > Suppose that shmem_file was mmaped with PROT_READ|WRITE, MAP_PRIVATE.
> > To simplify, suppose that a non-THP page was already faulted in,
> > pte_present() == T.
> >
> > Userspace writes to this page.
> >
> > Why __handle_mm_fault()->handle_pte_fault()->do_wp_page()->wp_page_copy()
> > can not cow this page and update pte after the vma->anon_vma chech and
> > before down_write_trylock(mmap_sem) ?
>
> OK, probably this is impossible, collapse_shmem() does unmap_mapping_pages(),
> so handle_pte_fault() will call shmem_fault() which iiuc should block in
> find_lock_entry() because new_page is locked, and thus down_write_trylock()
> can't succeed.
You've got it right.
> Nevermind, I am sure I missed something. Perhaps you can update the comments
> to make this more clear.
Let me see first that my explanation makes sense :P
--
Kirill A. Shutemov
next prev parent reply other threads:[~2019-08-13 15:05 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-07 23:37 [PATCH v12 0/6] THP aware uprobe Song Liu
2019-08-07 23:37 ` [PATCH v12 1/6] mm: move memcmp_pages() and pages_identical() Song Liu
2019-08-07 23:37 ` [PATCH v12 2/6] uprobe: use original page when all uprobes are removed Song Liu
2019-08-07 23:37 ` [PATCH v12 3/6] mm, thp: introduce FOLL_SPLIT_PMD Song Liu
2019-08-08 16:37 ` Oleg Nesterov
2019-08-08 17:16 ` Song Liu
2019-08-09 16:35 ` Oleg Nesterov
2019-08-09 16:50 ` Song Liu
2019-08-07 23:37 ` [PATCH v12 4/6] uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT Song Liu
2019-08-07 23:37 ` [PATCH v12 5/6] khugepaged: enable collapse pmd for pte-mapped THP Song Liu
2019-08-08 16:33 ` Oleg Nesterov
2019-08-08 17:05 ` Song Liu
2019-08-09 15:24 ` Oleg Nesterov
2019-08-09 16:30 ` Song Liu
2019-08-09 18:01 ` Song Liu
2019-08-12 12:11 ` Kirill A. Shutemov
2019-08-12 13:22 ` Oleg Nesterov
2019-08-12 14:40 ` Kirill A. Shutemov
2019-08-12 21:04 ` Song Liu
2019-08-13 14:44 ` Song Liu
2019-08-15 10:16 ` Oleg Nesterov
2019-08-15 16:27 ` Song Liu
2019-08-13 13:30 ` Oleg Nesterov
2019-08-13 14:05 ` Oleg Nesterov
2019-08-13 15:05 ` Kirill A. Shutemov [this message]
2019-08-13 16:24 ` Oleg Nesterov
2019-08-16 14:54 ` Kirill A. Shutemov
2019-08-12 13:06 ` Oleg Nesterov
2019-08-12 14:36 ` Song Liu
2019-08-07 23:37 ` [PATCH v12 6/6] uprobe: collapse THP pmd after removing all uprobes Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190813150539.ciai477wk2cratvc@box \
--to=kirill@shutemov.name \
--cc=Kernel-team@fb.com \
--cc=akpm@linux-foundation.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=matthew.wilcox@oracle.com \
--cc=oleg@redhat.com \
--cc=songliubraving@fb.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=william.kucharski@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox