From: "Shi, Yang" <yang.shi@linaro.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Hugh Dickins <hughd@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
Christoph Lameter <cl@gentwo.org>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Jerome Marchand <jmarchan@redhat.com>,
Sasha Levin <sasha.levin@oracle.com>,
Andres Lagar-Cavilla <andreslc@google.com>,
Ning Qu <quning@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHv7 00/29] THP-enabled tmpfs/shmem using compound pages
Date: Mon, 18 Apr 2016 15:55:44 -0700
Message-ID: <571565F0.9070203@linaro.org>
In-Reply-To: <1460766240-84565-1-git-send-email-kirill.shutemov@linux.intel.com>
Hi Kirill,

Finally I got some time to look into and try your and Hugh's patches,
and ran into two problems.

1. A quick boot-up test on my ARM64 machine with your v7 tree shows some
unexpected errors:
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:16863: No space left on device
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:16865: No space left on device
Starting DNS forwarder and DHCP server.systemd-journald[285]:
Failed to save stream data /run/systemd/journal/streams/8:16867: No
space left on device
..
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:16869: No space left on device
Starting Postfix Mail Transport Agent...
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:16871: No space left on device
Starting Berkeley Internet Name Domain (DNS)...
Starting Wait for Network to be Configured...
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:2422: No space left on device
[ OK ] Started /etc/rc.local Compatibility.
[FAILED] Failed to start DNS forwarder and DHCP server.
See 'systemctl status dnsmasq.service' for details.
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:2425: No space left on device
[ OK ] Started Serial Getty on ttyS1.
[ OK ] Started Serial Getty on ttyS0.
[ OK ] Started Getty on tty1.
systemd-journald[285]: Failed to save stream data
/run/systemd/journal/streams/8:2433: No space left on device
[FAILED] Failed to start Berkeley Internet Name Domain (DNS).
See 'systemctl status named.service' for details.

The /run dir is mounted as tmpfs.
The x86 boot doesn't hit this error, and Hugh's patches don't have this
problem.

2. I ran my THP test (a generated program with a 4MB text section) on both
x86-64 and ARM64 with your and Hugh's patches (linux-next tree). On x86-64
the program's execution time dropped by ~12%, which looks very
impressive.
But on ARM64 there is only a ~3% change, and sometimes huge tmpfs even
shows worse numbers than non-hugepage.
Both your and Hugh's patches show the same behavior.

Any idea?

Thanks,
Yang
On 4/15/2016 5:23 PM, Kirill A. Shutemov wrote:
> This is probably the last update before the mm summit. The main focus is on
> khugepaged stability.
>
> khugepaged is in more reasonable shape now. I missed quite a few corner
> cases on the first try. I've run this version through LTP, trinity and
> syzkaller without any crashes so far.
>
> The patchset is on top of v4.6-rc3 plus Hugh's "easy preliminaries to
> THPagecache" and Ebru's khugepaged swapin patches from the -mm tree.
>
> Git tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugetmpfs/v7
>
> == Changelog ==
>
> v7:
> - khugepaged updates:
> + fix page leak/page cache corruption on collapse fail;
> + filter out VMAs not suitable for huge pages due to misaligned vm_pgoff;
> + fix build without CONFIG_SHMEM;
> + drop a few over-protective checks;
> - fix bogus VM_BUG_ON() in __delete_from_page_cache();
>
> v6:
> - experimental collapse support;
> - fix swapout mapped huge pages;
> - fix page leak in faultaround code;
> - fix excessive huge page allocation with huge=within_size;
> - rename VM_NO_THP to VM_NO_KHUGEPAGED;
> - fix condition in hugepage_madvise();
> - accounting reworked again;
>
> v5:
> - add FileHugeMapped to /proc/PID/smaps;
> - make FileHugeMapped in meminfo aligned with other fields;
> - Documentation/vm/transhuge.txt updated;
>
> v4:
> - first four patches were applied to -mm tree;
> - drop pages beyond i_size on split_huge_pages;
> - a few small random bugfixes;
>
> v3:
> - huge= mount option now can have values always, within_size, advise and
> never;
> - sysctl handle is replaced with sysfs knob;
> - MADV_HUGEPAGE/MADV_NOHUGEPAGE is now respected on page allocation via
> page fault;
> - mlock() handling has been fixed;
> - bunch of smaller bugfixes and cleanups.
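The four huge= values can be pictured as a small option parser plus an allocation decision. This is a hypothetical illustration, not the patchset's shmem code: all sk_* names are made up, 4 KiB pages and a 2 MiB PMD are assumed, and the policies assume these meanings: always tries a huge page; within_size only if the whole huge page would lie inside i_size (or the VMA has MADV_HUGEPAGE); advise only with MADV_HUGEPAGE; never does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SHIFT	12		/* assumed 4 KiB pages */
#define SK_HPAGE_NR	UINT64_C(512)	/* assumed 2 MiB PMD */

/* Hypothetical names for the four huge= policies. */
enum sk_huge {
	SK_HUGE_INVALID = -1,
	SK_HUGE_NEVER,
	SK_HUGE_ALWAYS,
	SK_HUGE_WITHIN_SIZE,
	SK_HUGE_ADVISE,
};

/* Map the text of a huge= option to a policy. */
static enum sk_huge sk_parse_huge(const char *val)
{
	static const struct { const char *name; enum sk_huge v; } tbl[] = {
		{ "never", SK_HUGE_NEVER },
		{ "always", SK_HUGE_ALWAYS },
		{ "within_size", SK_HUGE_WITHIN_SIZE },
		{ "advise", SK_HUGE_ADVISE },
	};

	for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		if (strcmp(val, tbl[i].name) == 0)
			return tbl[i].v;
	return SK_HUGE_INVALID;
}

/*
 * Should a fault at page @index of a file of @isize bytes try a huge
 * page?  @madvised: the VMA carries MADV_HUGEPAGE.
 */
static bool sk_want_huge(enum sk_huge policy, uint64_t index, uint64_t isize,
			 bool madvised)
{
	/* Exclusive end, in pages, of the huge-page block holding @index. */
	uint64_t block_end = (index / SK_HPAGE_NR + 1) * SK_HPAGE_NR;
	/* Number of pages needed to cover i_size. */
	uint64_t size_pages = (isize + (UINT64_C(1) << SK_PAGE_SHIFT) - 1)
			      >> SK_PAGE_SHIFT;

	assert(policy != SK_HUGE_INVALID);

	switch (policy) {
	case SK_HUGE_ALWAYS:
		return true;
	case SK_HUGE_WITHIN_SIZE:
		return size_pages >= block_end || madvised;
	case SK_HUGE_ADVISE:
		return madvised;
	default:
		return false;
	}
}
```

Under these assumptions a 1 MiB file never gets a huge page with huge=within_size unless the mapping is madvised, while a 4 MiB file does for its first two 2 MiB blocks.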
>
> == Design overview ==
>
> Huge pages are allocated by shmem when it's allowed (by mount option) and
> there are no entries for the range in the radix-tree. A huge page is
> represented by HPAGE_PMD_NR entries in the radix-tree.
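For a sense of scale, HPAGE_PMD_NR is the number of base pages one PMD entry covers. A minimal sketch of the arithmetic, assuming each page-table page is one base page filled with 8-byte entries (the case on x86-64 and ARM64); the sketch_* helpers are hypothetical, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Base pages mapped by one PMD entry (what HPAGE_PMD_NR expresses). */
static uint64_t sketch_hpage_pmd_nr(unsigned int page_shift)
{
	assert(page_shift >= 12 && page_shift <= 16);	/* 4K..64K pages */
	return (UINT64_C(1) << page_shift) / 8;		/* PTEs per table */
}

/* Bytes mapped by one PMD entry (what HPAGE_PMD_SIZE expresses). */
static uint64_t sketch_hpage_pmd_size(unsigned int page_shift)
{
	return sketch_hpage_pmd_nr(page_shift) << page_shift;
}
```

With 4 KiB pages this gives 512 entries and a 2 MiB huge page; with 64 KiB pages, a configuration ARM64 kernels can be built with, it gives 8192 entries and a 512 MiB huge page.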
>
> MM core maps a page with a PMD if ->fault() returns a huge page and the VMA
> is suitable for huge pages (size, alignment). There's no need for two
> requests to the filesystem: the filesystem returns a huge page if it can,
> with graceful fallback to small pages otherwise.
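A minimal sketch of the "suitable for huge pages (size, alignment)" test, assuming 4 KiB pages and a 2 MiB PMD. The struct and function are hypothetical simplifications, not the kernel's vm_area_struct or the patchset's code; the vm_pgoff condition corresponds to the v7 change that filters out VMAs with misaligned vm_pgoff:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12			/* assumed 4 KiB pages */
#define SKETCH_HPAGE_NR		UINT64_C(512)		/* assumed 2 MiB PMD */
#define SKETCH_HPAGE_SIZE	(SKETCH_HPAGE_NR << SKETCH_PAGE_SHIFT)
#define SKETCH_HPAGE_MASK	(~(SKETCH_HPAGE_SIZE - 1))

/* Hypothetical, heavily simplified stand-in for a VMA. */
struct sketch_vma {
	uint64_t vm_start;	/* first byte of the mapping */
	uint64_t vm_end;	/* one past the last byte */
	uint64_t vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Can the fault at @addr be mapped with a single PMD?
 * 1. File offset and virtual address must agree modulo the huge page
 *    size, or a page-cache huge page cannot sit under one PMD.
 * 2. The whole PMD-aligned range around @addr must lie inside the VMA.
 */
static bool sketch_vma_suitable(const struct sketch_vma *vma, uint64_t addr)
{
	uint64_t haddr = addr & SKETCH_HPAGE_MASK;

	assert(addr >= vma->vm_start && addr < vma->vm_end);

	/* Unsigned wraparound is harmless: 2^64 is a multiple of 512. */
	if (((vma->vm_start >> SKETCH_PAGE_SHIFT) - vma->vm_pgoff) %
	    SKETCH_HPAGE_NR)
		return false;
	if (haddr < vma->vm_start || haddr + SKETCH_HPAGE_SIZE > vma->vm_end)
		return false;
	return true;
}
```

When a check like this fails, the fault simply maps the small page with a PTE, which is the graceful fallback the design relies on.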
>
> As with DAX, split_huge_pmd() is implemented by unmapping the PMD: we can
> re-fault the page with PTEs later.
>
> The basic scheme for split_huge_page() is the same as for anon-THP.
> A few differences:
>
> - File pages are in the radix-tree, so head->_count is offset by
> HPAGE_PMD_NR. The count gets distributed to small pages during split.
>
> - mapping->tree_lock prevents non-lockless access via the radix-tree to
> pages under split;
>
> - Lockless access is prevented by setting head->_count to 0 during
> split, so get_page_unless_zero() fails;
>
> - After split, some pages can be beyond i_size. We drop them from the
> radix-tree.
>
> - We don't set up migration entries; we just unmap pages. This helps
> handle cases where i_size is in the middle of the page: there's no need
> to unmap pages beyond i_size manually.
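The refcount rules in the list above can be illustrated with a small userspace model built on C11 atomics. It is a simplified sketch, not the patchset's code: one atomic counter stands in for head->_count, the splitter is assumed to hold exactly one extra pin, mapping->tree_lock is not modelled, and all model_* names are hypothetical. It shows why freezing the count to 0 makes get_page_unless_zero()-style lookups fail mid-split, and how subpages at or beyond i_size end up with no page-cache reference:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR 512	/* assumed HPAGE_PMD_NR (4 KiB pages, 2 MiB PMD) */

/* Lockless lookup: take a reference only if the count is not zero. */
static bool model_get_unless_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
		/* on failure @old holds the reloaded value; retry */
	}
	return false;
}

/*
 * Freeze: succeed only if the count is exactly 1 (splitter's pin) plus
 * MODEL_NR (radix-tree references), atomically replacing it with 0.
 */
static bool model_freeze(atomic_int *count)
{
	int expected = 1 + MODEL_NR;

	return atomic_compare_exchange_strong(count, &expected, 0);
}

/*
 * Distribute after split: each subpage below i_size keeps one
 * radix-tree reference; subpages at or beyond i_size are dropped from
 * the tree and get none. The splitter's pin goes back to the head.
 * Returns the number of subpages left in the page cache.
 */
static int model_distribute(atomic_int counts[MODEL_NR], long first_index,
			    long isize_pages)
{
	int kept = 0;

	assert(isize_pages >= 0);
	for (int i = 0; i < MODEL_NR; i++) {
		bool beyond = first_index + i >= isize_pages;

		atomic_store(&counts[i], beyond ? 0 : 1);
		kept += !beyond;
	}
	atomic_fetch_add(&counts[0], 1);	/* splitter's pin on head */
	return kept;
}
```

Because the freeze is a compare-and-swap against the exact expected count, any unexpected extra reference makes it fail, so the split can back off rather than race with that holder.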
>
> COW mappings are handled at the PTE level. It's not clear how beneficial
> allocating huge pages on COW faults would be, and it would require some
> code to make them work.
>
> I think at some point we can consider teaching khugepaged to collapse
> pages in COW mappings, but allocating huge pages on fault is probably
> overkill.
>
> As with anon THP, we mlock a file huge page only if it is mapped with a
> PMD. PTE-mapped THPs are never mlocked. This way we avoid all sorts of
> scenarios in which we could leak an mlocked page.
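The mlock rule reduces to a small predicate. A sketch, assuming the caller already knows whether the VMA is VM_LOCKED and at which level the page is mapped; the sketch_* names are hypothetical, not the patchset's functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the page-table level one mapping of a page uses. */
enum sketch_map_level { SKETCH_MAP_PTE, SKETCH_MAP_PMD };

/*
 * Should this mapping mlock the page?  Small pages follow the usual
 * rule (mlock if the VMA is locked); a THP is mlocked only through a
 * PMD mapping, so PTE-mapped THPs are never mlocked.
 */
static bool sketch_should_mlock(bool vma_locked, bool is_thp,
				enum sketch_map_level level)
{
	/* A small page can never be mapped by a PMD. */
	assert(is_thp || level == SKETCH_MAP_PTE);

	if (!vma_locked)
		return false;
	if (is_thp && level != SKETCH_MAP_PMD)
		return false;
	return true;
}
```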
>
> As with anon THP, we split huge page on swap out.
>
> Truncate and punch hole that cover only part of a THP range are
> implemented by zeroing out that part of the THP.
>
> This has a visible effect on fallocate(FALLOC_FL_PUNCH_HOLE) behaviour.
> As we don't really create a hole in this case, lseek(SEEK_HOLE) may return
> inconsistent results depending on what pages happened to be allocated.
> I don't think this will be a problem.
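One way to observe this from userspace is to punch a hole in the middle of a file and ask lseek(SEEK_HOLE) where the first hole is. A minimal sketch, assuming Linux with glibc; the path is the caller's choice (for example a file on a tmpfs mounted with huge=always) and the function name is hypothetical. With small pages one would typically expect the punched offset back; if the range sits inside a huge page that was only zeroed, lseek() may report no hole before end-of-file:

```c
#define _GNU_SOURCE		/* fallocate(), FALLOC_FL_*, SEEK_HOLE */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create @path holding @size bytes of data, punch a hole of @len bytes
 * at @off, and return the first hole offset reported by
 * lseek(SEEK_HOLE), or -1 on any error (for example if the filesystem
 * does not support FALLOC_FL_PUNCH_HOLE).
 */
static off_t sketch_punch_then_seek_hole(const char *path, size_t size,
					 off_t off, off_t len)
{
	off_t ret = -1;
	char *buf;
	int fd;

	assert(off >= 0 && len > 0 && (size_t)(off + len) <= size);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	buf = malloc(size);
	if (!buf)
		goto out_close;
	memset(buf, 0xaa, size);		/* make every page hold data */
	if (write(fd, buf, size) != (ssize_t)size)
		goto out_free;

	/* PUNCH_HOLE must be combined with KEEP_SIZE. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len))
		goto out_free;

	ret = lseek(fd, 0, SEEK_HOLE);
out_free:
	free(buf);
out_close:
	close(fd);
	return ret;
}
```

Running this against a tmpfs mounted with huge=never and one mounted with huge=always would be one way to see the inconsistency described above; the exact offsets depend on which pages were allocated.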
>
> == Patchset overview ==
>
> [01/29]
> Update documentation on THP vs. mlock. I've posted it separately
> before. It can go in.
>
> [02-04/29]
> Rework fault path and rmap to handle file pmd. Unlike DAX with
> vm_ops->pmd_fault, we don't need to ask the filesystem twice -- first
> for a huge page and then for a small one. If ->fault happens to return a
> huge page and the VMA is suitable for mapping it as huge, we do so.
>
> [05/29]
> Add support for huge file pages in rmap.
>
> [06-15/29]
> Various preparations of the THP core for file pages.
>
> [16-20/29]
> Various preparations of the MM core for file pages.
>
> [21-24/29]
> And finally, bring huge pages into tmpfs/shmem.
>
> [25/29]
> Wire up existing madvise() hints for file THP.
> We can implement fadvise() later.
>
> [26/29]
> Documentation update.
>
> [27-29/29]
> Extend khugepaged to support shmem/tmpfs.
>
> Hugh Dickins (1):
> shmem: get_unmapped_area align huge page
>
> Kirill A. Shutemov (28):
> thp, mlock: update unevictable-lru.txt
> mm: do not pass mm_struct into handle_mm_fault
> mm: introduce fault_env
> mm: postpone page table allocation until we have page to map
> rmap: support file thp
> mm: introduce do_set_pmd()
> thp, vmstats: add counters for huge file pages
> thp: support file pages in zap_huge_pmd()
> thp: handle file pages in split_huge_pmd()
> thp: handle file COW faults
> thp: skip file huge pmd on copy_huge_pmd()
> thp: prepare change_huge_pmd() for file thp
> thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
> thp: file pages support for split_huge_page()
> thp, mlock: do not mlock PTE-mapped file huge pages
> vmscan: split file huge pages before paging them out
> page-flags: relax policy for PG_mappedtodisk and PG_reclaim
> radix-tree: implement radix_tree_maybe_preload_order()
> filemap: prepare find and delete operations for huge pages
> truncate: handle file thp
> mm, rmap: account shmem thp pages
> shmem: prepare huge= mount option and sysfs knob
> shmem: add huge pages support
> shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
> thp: update Documentation/vm/transhuge.txt
> thp: extract khugepaged from mm/huge_memory.c
> khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
> khugepaged: add support of collapse for tmpfs/shmem pages
>
> Documentation/filesystems/Locking | 10 +-
> Documentation/vm/transhuge.txt | 130 ++-
> Documentation/vm/unevictable-lru.txt | 21 +
> arch/alpha/mm/fault.c | 2 +-
> arch/arc/mm/fault.c | 2 +-
> arch/arm/mm/fault.c | 2 +-
> arch/arm64/mm/fault.c | 2 +-
> arch/avr32/mm/fault.c | 2 +-
> arch/cris/mm/fault.c | 2 +-
> arch/frv/mm/fault.c | 2 +-
> arch/hexagon/mm/vm_fault.c | 2 +-
> arch/ia64/mm/fault.c | 2 +-
> arch/m32r/mm/fault.c | 2 +-
> arch/m68k/mm/fault.c | 2 +-
> arch/metag/mm/fault.c | 2 +-
> arch/microblaze/mm/fault.c | 2 +-
> arch/mips/mm/fault.c | 2 +-
> arch/mn10300/mm/fault.c | 2 +-
> arch/nios2/mm/fault.c | 2 +-
> arch/openrisc/mm/fault.c | 2 +-
> arch/parisc/mm/fault.c | 2 +-
> arch/powerpc/mm/copro_fault.c | 2 +-
> arch/powerpc/mm/fault.c | 2 +-
> arch/s390/mm/fault.c | 2 +-
> arch/score/mm/fault.c | 2 +-
> arch/sh/mm/fault.c | 2 +-
> arch/sparc/mm/fault_32.c | 4 +-
> arch/sparc/mm/fault_64.c | 2 +-
> arch/tile/mm/fault.c | 2 +-
> arch/um/kernel/trap.c | 2 +-
> arch/unicore32/mm/fault.c | 2 +-
> arch/x86/mm/fault.c | 2 +-
> arch/xtensa/mm/fault.c | 2 +-
> drivers/base/node.c | 13 +-
> drivers/char/mem.c | 24 +
> drivers/iommu/amd_iommu_v2.c | 3 +-
> drivers/iommu/intel-svm.c | 2 +-
> fs/proc/meminfo.c | 7 +-
> fs/proc/task_mmu.c | 10 +-
> fs/userfaultfd.c | 22 +-
> include/linux/huge_mm.h | 36 +-
> include/linux/khugepaged.h | 6 +
> include/linux/mm.h | 51 +-
> include/linux/mmzone.h | 4 +-
> include/linux/page-flags.h | 19 +-
> include/linux/radix-tree.h | 1 +
> include/linux/rmap.h | 2 +-
> include/linux/shmem_fs.h | 29 +-
> include/linux/userfaultfd_k.h | 8 +-
> include/linux/vm_event_item.h | 7 +
> include/trace/events/huge_memory.h | 3 +-
> ipc/shm.c | 6 +-
> lib/radix-tree.c | 68 +-
> mm/Makefile | 2 +-
> mm/filemap.c | 226 ++--
> mm/gup.c | 7 +-
> mm/huge_memory.c | 2028 ++++++----------------------------
> mm/internal.h | 4 +-
> mm/khugepaged.c | 1772 +++++++++++++++++++++++++++++
> mm/ksm.c | 5 +-
> mm/memory.c | 859 +++++++-------
> mm/mempolicy.c | 4 +-
> mm/migrate.c | 5 +-
> mm/mmap.c | 26 +-
> mm/nommu.c | 3 +-
> mm/page-writeback.c | 1 +
> mm/page_alloc.c | 21 +
> mm/rmap.c | 78 +-
> mm/shmem.c | 689 ++++++++++--
> mm/swap.c | 2 +
> mm/truncate.c | 22 +-
> mm/util.c | 6 +
> mm/vmscan.c | 6 +
> mm/vmstat.c | 4 +
> 74 files changed, 3919 insertions(+), 2395 deletions(-)
> create mode 100644 mm/khugepaged.c
>