linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>, Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>, Christoph Lameter <cl@gentwo.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Steve Capper <steve.capper@linaro.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	Jerome Marchand <jmarchan@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv5 00/28] THP refcounting redesign
Date: Fri, 15 May 2015 10:55:55 +0200
Message-ID: <5555B49B.3050901@suse.cz>
In-Reply-To: <1429823043-157133-1-git-send-email-kirill.shutemov@linux.intel.com>

On 04/23/2015 11:03 PM, Kirill A. Shutemov wrote:
> Hello everybody,
>
> Here's a reworked version of my patchset. All known issues were addressed.
>
> The goal of the patchset is to make refcounting on THP pages cheaper,
> with simpler semantics, and to allow the same THP compound page to be
> mapped with both PMD and PTEs. This is required to get a reasonable
> THP-pagecache implementation.
>
> With the new refcounting design it's much easier to protect against
> split_huge_page(): a simple reference on a page is enough. It makes the
> gup_fast() implementation simpler and doesn't require a special case in
> futex code to handle tail THP pages.
>
> It should improve THP utilization over the system, since splitting a THP
> in one process doesn't necessarily lead to splitting the page in all
> other processes that have the page mapped.
>
> The patchset drastically lowers the complexity of the get_page()/put_page()
> codepaths. I encourage reviewers to look at this code before and after
> the patchset, to justify the time budget for reviewing it.
>
> = Changelog =
>
> v5:
>    - Tested-by: Sasha Levin!
>    - re-split patchset in the hope of improving readability;
>    - rebased on top of page flags and ->mapping sanitizing patchset;
>    - uncharge compound_mapcount rather than mapcount for hugetlb pages
>      when removing them from rmap;
>    - differentiate page_mapped() from page_mapcount() for compound pages;
>    - rework deferred_split_huge_page() to use the shrinker interface;
>    - fix race in page_remove_rmap();
>    - get rid of __get_page_tail();
>    - a few random bug fixes;
> v4:
>    - fix sizes reported in smaps;
>    - defines instead of enum for RMAP_{EXCLUSIVE,COMPOUND};
>    - skip THP pages on munlock_vma_pages_range(): they are never mlocked;
>    - properly handle huge zero page on FOLL_SPLIT;
>    - fix lock_page() slow path on tail pages;
>    - account page_get_anon_vma() fail to THP_SPLIT_PAGE_FAILED;
>    - fix split_huge_page() on huge page with unmapped head page;
>    - fix transferring 'write' and 'young' from pmd to ptes on split_huge_pmd;
>    - call page_remove_rmap() in unfreeze_page under ptl.
>
> = Design overview =
>
> The main reason why we can't map THP with 4k pages is how refcounting on
> THP is designed. It is built around two requirements:
>
>    - splitting a huge page should never fail;
>    - we can't change the interface of get_user_pages();
>
> To be able to split a huge page at any point, we have to track which tail
> page was pinned. This makes get_page() on tail pages tricky and expensive,
> and it also occupies tail_page->_mapcount.
>
> Most split_huge_page*() users want the PMD to be split into a table of
> PTEs and don't care whether the compound page is going to be split or not.
>
> The plan is:
>
>   - allow split_huge_page() to fail if the page is pinned. It's trivial to
>     split a non-pinned page and it doesn't require tail page refcounting,
>     so tail_page->_mapcount is free to be reused.
>
>   - introduce a new routine -- split_huge_pmd() -- to split a PMD into a
>     table of PTEs. It splits only one PMD, touching neither the other PMDs
>     the page is mapped with nor the underlying compound page. Unlike the
>     new split_huge_page(), split_huge_pmd() never fails.
>
> Fortunately, we have only a few places where split_huge_page() is needed:
> swap out, memory failure, migration, KSM. All of them can handle a
> split_huge_page() failure.
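The new split semantics can be sketched with a similar hypothetical model (struct thp and the model_* helpers are illustrative names, not kernel API). It assumes that every PMD or PTE mapping holds one reference on the head page, so any reference beyond the mappings and the caller's own is a pin that makes split fail; the actual check in the patchset may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define HPAGE_NR 512  /* 4 KiB subpages in a 2 MiB THP */

/* Hypothetical model of one THP under the NEW scheme; not kernel code. */
struct thp {
    int refcount;                /* head refcount: mappings + pins */
    int compound_mapcount;       /* PMD mappings of the whole THP */
    int pte_mapcount[HPAGE_NR];  /* PTE mappings of each subpage */
    bool split;                  /* true once split into small pages */
};

static void model_get(struct thp *t) { t->refcount++; }
static void model_put(struct thp *t) { assert(t->refcount > 0); t->refcount--; }

/* Map the whole THP with one PMD: one mapping, one reference. */
static void model_map_pmd(struct thp *t)
{
    t->compound_mapcount++;
    t->refcount++;
}

static int model_total_mapcount(const struct thp *t)
{
    int sum = t->compound_mapcount;
    for (int i = 0; i < HPAGE_NR; i++)
        sum += t->pte_mapcount[i];
    return sum;
}

/* Replace one PMD mapping with HPAGE_NR PTE mappings. Only bookkeeping
 * changes, so this cannot fail; the compound page itself stays intact. */
static void model_split_huge_pmd(struct thp *t)
{
    assert(t->compound_mapcount > 0);
    t->compound_mapcount--;
    for (int i = 0; i < HPAGE_NR; i++)
        t->pte_mapcount[i]++;
    t->refcount += HPAGE_NR - 1;  /* one reference per PTE mapping */
}

/* Caller must hold one reference. Any reference not explained by a
 * mapping or by the caller is a pin (e.g. from get_user_pages()), and
 * the split is refused instead of tracking pins per tail page. */
static int model_split_huge_page(struct thp *t)
{
    if (t->refcount - 1 != model_total_mapcount(t))
        return -EBUSY;
    t->split = true;
    return 0;
}
```

With the caller's reference plus one extra pin, model_split_huge_page() returns -EBUSY both before and after model_split_huge_pmd(), which always succeeds; once the pin is dropped, the split goes through.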
>
> In the new scheme, page->_mapcount is used to account how many times
> the page is mapped with PTEs. We have a separate compound_mapcount() to
> count mappings with PMD. page_mapcount() returns the sum of PTE and PMD
> mappings of the page.
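The mapcount accounting described above can be illustrated with a minimal model. struct thp_maps and the model_* functions are hypothetical, and counters start at 0 here, while the kernel's _mapcount fields are biased by -1. It also suggests why the v5 changelog differentiates page_mapped() from page_mapcount() for compound pages: a THP can be mapped even when its head subpage is not.

```c
#include <assert.h>
#include <stdbool.h>

#define HPAGE_NR 512  /* 4 KiB subpages in a 2 MiB THP */

/* Hypothetical model of the new mapcount layout; not kernel code. */
struct thp_maps {
    int compound_mapcount;       /* PMD mappings (compound_mapcount()) */
    int pte_mapcount[HPAGE_NR];  /* PTE mappings per subpage (->_mapcount) */
};

/* A PMD mapping covers every subpage; a PTE mapping covers just one. */
static int model_page_mapcount(const struct thp_maps *m, int i)
{
    assert(i >= 0 && i < HPAGE_NR);
    return m->compound_mapcount + m->pte_mapcount[i];
}

/* "Is any part of this THP mapped?" has to scan all subpages: the head
 * subpage can be unmapped while some tail subpages are PTE-mapped. */
static bool model_page_mapped(const struct thp_maps *m)
{
    if (m->compound_mapcount > 0)
        return true;
    for (int i = 0; i < HPAGE_NR; i++)
        if (m->pte_mapcount[i] > 0)
            return true;
    return false;
}
```

For example, after one process drops its PMD mapping while another still PTE-maps only subpage 3, model_page_mapcount() for subpage 0 is 0, yet model_page_mapped() still reports the THP as mapped.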

It would be very beneficial to describe the scheme in full, both before
and after. The same goes for the Documentation patch, where you fixed
what was no longer true, but I think the picture was incomplete before
and still is now. There's the LWN article [1], which helps a lot, but we
shouldn't rely on it exclusively.

So the full scheme should include at least:
- where pins and mapcounts were/are stored
- what exactly get_page()/put_page() did before and do now
- etc.

[1] https://lwn.net/Articles/619738/

Thread overview: 100+ messages
2015-04-23 21:03 Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 01/28] mm, proc: adjust PSS calculation Kirill A. Shutemov
2015-04-29 15:49   ` Jerome Marchand
2015-05-14 14:12   ` Vlastimil Babka
2015-05-15 10:56     ` Kirill A. Shutemov
2015-05-15 11:33       ` Vlastimil Babka
2015-05-15 11:43         ` Kirill A. Shutemov
2015-05-15 12:37           ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 02/28] rmap: add argument to charge compound page Kirill A. Shutemov
2015-04-29 15:53   ` Jerome Marchand
2015-04-30 11:52     ` Kirill A. Shutemov
2015-05-14 16:07   ` Vlastimil Babka
2015-05-15 11:14     ` Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 03/28] memcg: adjust to support new THP refcounting Kirill A. Shutemov
2015-05-15  7:44   ` Vlastimil Babka
2015-05-15 11:18     ` Kirill A. Shutemov
2015-05-15 14:57       ` Dave Hansen
2015-05-16 23:17         ` Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 04/28] mm, thp: adjust conditions when we can reuse the page on WP fault Kirill A. Shutemov
2015-04-29 15:54   ` Jerome Marchand
2015-05-15  9:15   ` Vlastimil Babka
2015-05-15 11:21     ` Kirill A. Shutemov
2015-05-15 11:35       ` Vlastimil Babka
2015-05-15 13:29         ` Kirill A. Shutemov
2015-05-19 13:00           ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 05/28] mm: adjust FOLL_SPLIT for new refcounting Kirill A. Shutemov
2015-05-15 11:05   ` Vlastimil Babka
2015-05-15 11:36     ` Kirill A. Shutemov
2015-05-15 12:01       ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 06/28] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton Kirill A. Shutemov
2015-04-29 15:56   ` Jerome Marchand
2015-05-15 12:46   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 07/28] thp, mlock: do not allow huge pages in mlocked area Kirill A. Shutemov
2015-04-29 15:58   ` Jerome Marchand
2015-05-15 12:56   ` Vlastimil Babka
2015-05-15 13:41     ` Kirill A. Shutemov
2015-05-19 14:37       ` Vlastimil Babka
2015-05-20 12:10         ` Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 08/28] khugepaged: ignore pmd tables with THP mapped with ptes Kirill A. Shutemov
2015-04-29 15:59   ` Jerome Marchand
2015-05-15 12:59   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 09/28] thp: rename split_huge_page_pmd() to split_huge_pmd() Kirill A. Shutemov
2015-04-29 16:00   ` Jerome Marchand
2015-05-15 13:08   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 10/28] mm, vmstats: new THP splitting event Kirill A. Shutemov
2015-04-29 16:02   ` Jerome Marchand
2015-05-15 13:10   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 11/28] mm: temporally mark THP broken Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 12/28] thp: drop all split_huge_page()-related code Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 13/28] mm: drop tail page refcounting Kirill A. Shutemov
2015-05-18  9:48   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 14/28] futex, thp: remove special case for THP in get_futex_key Kirill A. Shutemov
2015-05-18 11:49   ` Vlastimil Babka
2015-05-18 12:13     ` Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 15/28] ksm: prepare to new THP semantics Kirill A. Shutemov
2015-05-18 12:41   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 16/28] mm, thp: remove compound_lock Kirill A. Shutemov
2015-04-29 16:11   ` Jerome Marchand
2015-04-30 11:58     ` Kirill A. Shutemov
2015-05-18 12:57   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 17/28] mm, thp: remove infrastructure for handling splitting PMDs Kirill A. Shutemov
2015-04-29 16:14   ` Jerome Marchand
2015-04-30 12:03     ` Kirill A. Shutemov
2015-05-18 13:40   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 18/28] x86, " Kirill A. Shutemov
2015-04-29  9:13   ` Aneesh Kumar K.V
2015-04-23 21:03 ` [PATCHv5 19/28] mm: store mapcount for compound page separately Kirill A. Shutemov
2015-05-18 14:32   ` Vlastimil Babka
2015-05-19  3:55     ` Kirill A. Shutemov
2015-05-19  9:01       ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 20/28] mm: differentiate page_mapped() from page_mapcount() for compound pages Kirill A. Shutemov
2015-04-29 16:20   ` Jerome Marchand
2015-04-30 12:06     ` Kirill A. Shutemov
2015-05-18 15:35   ` Vlastimil Babka
2015-05-19  4:00     ` Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 21/28] mm, numa: skip PTE-mapped THP on numa fault Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 22/28] thp: implement split_huge_pmd() Kirill A. Shutemov
2015-05-19  8:25   ` Vlastimil Babka
2015-05-20 14:38     ` Kirill A. Shutemov
2015-04-23 21:03 ` [PATCHv5 23/28] thp: add option to setup migration entiries during PMD split Kirill A. Shutemov
2015-05-19 13:55   ` Vlastimil Babka
2015-04-23 21:03 ` [PATCHv5 24/28] thp, mm: split_huge_page(): caller need to lock page Kirill A. Shutemov
2015-05-19 13:55   ` Vlastimil Babka
2015-04-23 21:04 ` [PATCHv5 25/28] thp: reintroduce split_huge_page() Kirill A. Shutemov
2015-05-19 12:43   ` Vlastimil Babka
2015-04-23 21:04 ` [PATCHv5 26/28] thp: introduce deferred_split_huge_page() Kirill A. Shutemov
2015-05-19 13:54   ` Vlastimil Babka
2015-04-23 21:04 ` [PATCHv5 27/28] mm: re-enable THP Kirill A. Shutemov
2015-04-23 21:04 ` [PATCHv5 28/28] thp: update documentation Kirill A. Shutemov
2015-04-27 23:03 ` [PATCHv5 00/28] THP refcounting redesign Andrew Morton
2015-04-27 23:33   ` Kirill A. Shutemov
2015-04-30  8:25 ` [RFC PATCH 0/3] Remove _PAGE_SPLITTING from ppc64 Aneesh Kumar K.V
2015-04-30  8:25   ` [RFC PATCH 1/3] mm/thp: Use pmdp_splitting_flush_notify to clear pmd on splitting Aneesh Kumar K.V
2015-04-30 13:30     ` Kirill A. Shutemov
2015-04-30 15:59       ` Aneesh Kumar K.V
2015-04-30 16:47         ` Aneesh Kumar K.V
2015-04-30  8:25   ` [RFC PATCH 2/3] powerpc/thp: Remove _PAGE_SPLITTING and related code Aneesh Kumar K.V
2015-04-30  8:25   ` [RFC PATCH 3/3] mm/thp: Add new function to clear pmd on collapse Aneesh Kumar K.V
2015-05-15  8:55 ` Vlastimil Babka [this message]
2015-05-15 13:31   ` [PATCHv5 00/28] THP refcounting redesign Kirill A. Shutemov
