From: "Shi, Yang" <yang.shi@linaro.org>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Christoph Lameter <cl@gentwo.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Jerome Marchand <jmarchan@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Ning Qu <quning@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCHv7 00/29] THP-enabled tmpfs/shmem using compound pages
Date: Mon, 18 Apr 2016 15:55:44 -0700
Message-ID: <571565F0.9070203@linaro.org>
In-Reply-To: <1460766240-84565-1-git-send-email-kirill.shutemov@linux.intel.com>

Hi Kirill,

I finally got some time to look into and try your patches and Hugh's,
and ran into two problems.

1. A quick boot-up test on my ARM64 machine with your v7 tree shows some
unexpected errors:

systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:16863: No space left on device
systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:16865: No space left on device
          Starting DNS forwarder and DHCP server.systemd-journald[285]: 
Failed to save stream data /run/systemd/journal/streams/8:16867: No 
space left on device
..
systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:16869: No space left on device
          Starting Postfix Mail Transport Agent...
systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:16871: No space left on device
          Starting Berkeley Internet Name Domain (DNS)...
          Starting Wait for Network to be Configured...
systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:2422: No space left on device
[  OK  ] Started /etc/rc.local Compatibility.
[FAILED] Failed to start DNS forwarder and DHCP server.
See 'systemctl status dnsmasq.service' for details.
systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:2425: No space left on device
[  OK  ] Started Serial Getty on ttyS1.
[  OK  ] Started Serial Getty on ttyS0.
[  OK  ] Started Getty on tty1.
systemd-journald[285]: Failed to save stream data 
/run/systemd/journal/streams/8:2433: No space left on device
[FAILED] Failed to start Berkeley Internet Name Domain (DNS).
See 'systemctl status named.service' for details.


The /run dir is mounted as tmpfs.

An x86 boot doesn't hit this error, and Hugh's patches don't have this
problem.

2. I ran my THP test (a generated program with a 4MB text section) on both
x86-64 and ARM64 with your patches and Hugh's (linux-next tree). On x86-64
the program's execution time dropped by ~12%, which looks very
impressive.

But on ARM64 there is only a ~3% change, and sometimes huge tmpfs even
shows worse numbers than non-hugepage.

Both your patches and Hugh's show the same behavior.

Any idea?

Thanks,
Yang


On 4/15/2016 5:23 PM, Kirill A. Shutemov wrote:
> This is probably the last update before the mm summit. Main focus is on
> khugepaged stability.
>
> khugepaged is in more reasonable shape now. I missed quite a few corner
> cases on the first try. I ran this version through LTP, trinity and
> syzkaller without crashes so far.
>
> The patchset is on top of v4.6-rc3 plus Hugh's "easy preliminaries to
> THPagecache" and Ebru's khugepaged swapin patches from the -mm tree.
>
> Git tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugetmpfs/v7
>
> == Changelog ==
>
> v7:
>    - khugepaged updates:
>      + fix page leak/page cache corruption on collapse fail;
>      + filter out VMAs not suitable for huge pages due to misaligned vm_pgoff;
>      + fix build without CONFIG_SHMEM;
>      + drop a few over-protective checks;
>    - fix bogus VM_BUG_ON() in __delete_from_page_cache();
>
> v6:
>    - experimental collapse support;
>    - fix swapout of mapped huge pages;
>    - fix page leak in faultaround code;
>    - fix excessive huge page allocation with huge=within_size;
>    - rename VM_NO_THP to VM_NO_KHUGEPAGED;
>    - fix condition in hugepage_madvise();
>    - accounting reworked again;
>
> v5:
>    - add FileHugeMapped to /proc/PID/smaps;
>    - make FileHugeMapped in meminfo aligned with other fields;
>    - Documentation/vm/transhuge.txt updated;
>
> v4:
>    - first four patches were applied to -mm tree;
>    - drop pages beyond i_size on split_huge_pages;
>    - few small random bugfixes;
>
> v3:
>    - huge= mount option can now have the values always, within_size,
>      advise and never;
>    - sysctl handle is replaced with sysfs knob;
>    - MADV_HUGEPAGE/MADV_NOHUGEPAGE is now respected on page allocation via
>      page fault;
>    - mlock() handling has been fixed;
>    - bunch of smaller bugfixes and cleanups.
>
> == Design overview ==
>
> Huge pages are allocated by shmem when it's allowed (by mount option) and
> there are no entries for the range in radix-tree. A huge page is
> represented by HPAGE_PMD_NR entries in radix-tree.
>
> MM core maps a page with PMD if ->fault() returns huge page and the VMA is
> suitable for huge pages (size, alignment). There's no need for two
> requests to the filesystem: it returns a huge page if it can and
> gracefully falls back to small pages otherwise.
>
> As with DAX, split_huge_pmd() is implemented by unmapping the PMD: we can
> re-fault the page with PTEs later.
>
> Basic scheme for split_huge_page() is the same as for anon-THP.
> Few differences:
>
>    - File pages are on radix-tree, so we have head->_count offset by
>      HPAGE_PMD_NR. The count gets distributed to small pages during split.
>
>    - mapping->tree_lock prevents non-lockless access to pages under split
>      over radix-tree;
>
>    - Lockless access is prevented by setting the head->_count to 0 during
>      split, so get_page_unless_zero() would fail;
>
>    - After split, some pages can be beyond i_size. We drop them from
>      radix-tree.
>
>    - We don't set up migration entries. Just unmap pages. It helps
>      handling cases when i_size is in the middle of the page: no need
>      to handle unmapped pages beyond i_size manually.
>
> COW mappings are handled on PTE-level. It's not clear how beneficial
> allocation of huge pages on COW faults would be. And it would require
> some code to make them work.
>
> I think at some point we can consider teaching khugepaged to collapse
> pages in COW mappings, but allocating huge on fault is probably overkill.
>
> As with anon THP, we mlock a file huge page only if it is mapped with a
> PMD. PTE-mapped THPs are never mlocked. This way we can avoid all sorts
> of scenarios where we can leak an mlocked page.
>
> As with anon THP, we split huge page on swap out.
>
> Truncate and punch hole that cover only part of a THP range are
> implemented by zeroing out that part of the THP.
>
> This has a visible effect on fallocate(FALLOC_FL_PUNCH_HOLE) behaviour.
> As we don't really create a hole in this case, lseek(SEEK_HOLE) may have
> inconsistent results depending on which pages happened to be allocated.
> I don't think this will be a problem.
>
> == Patchset overview ==
>
> [01/29]
> 	Update documentation on THP vs. mlock. I've posted it separately
> 	before. It can go in.
>
> [02-04/29]
> 	Rework fault path and rmap to handle file pmd. Unlike DAX with
> 	vm_ops->pmd_fault, we don't need to ask filesystem twice -- first
> 	for huge page and then for small. If ->fault happened to return
> 	huge page and VMA is suitable for mapping it as huge, we would
> 	do so.
>
> [05/29]
> 	Add support for huge file pages in rmap;
>
> [06-15/29]
> 	Various preparation of THP core for file pages.
>
> [16-20/29]
> 	Various preparation of MM core for file pages.
>
> [21-24/29]
> 	And finally, bring huge pages into tmpfs/shmem.
>
> [25/29]
> 	Wire up madvise() existing hints for file THP.
> 	We can implement fadvise() later.
>
> [26/29]
> 	Documentation update.
>
> [27-29/29]
> 	Extend khugepaged to support shmem/tmpfs.
>
> Hugh Dickins (1):
>    shmem: get_unmapped_area align huge page
>
> Kirill A. Shutemov (28):
>    thp, mlock: update unevictable-lru.txt
>    mm: do not pass mm_struct into handle_mm_fault
>    mm: introduce fault_env
>    mm: postpone page table allocation until we have page to map
>    rmap: support file thp
>    mm: introduce do_set_pmd()
>    thp, vmstats: add counters for huge file pages
>    thp: support file pages in zap_huge_pmd()
>    thp: handle file pages in split_huge_pmd()
>    thp: handle file COW faults
>    thp: skip file huge pmd on copy_huge_pmd()
>    thp: prepare change_huge_pmd() for file thp
>    thp: run vma_adjust_trans_huge() outside i_mmap_rwsem
>    thp: file pages support for split_huge_page()
>    thp, mlock: do not mlock PTE-mapped file huge pages
>    vmscan: split file huge pages before paging them out
>    page-flags: relax policy for PG_mappedtodisk and PG_reclaim
>    radix-tree: implement radix_tree_maybe_preload_order()
>    filemap: prepare find and delete operations for huge pages
>    truncate: handle file thp
>    mm, rmap: account shmem thp pages
>    shmem: prepare huge= mount option and sysfs knob
>    shmem: add huge pages support
>    shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
>    thp: update Documentation/vm/transhuge.txt
>    thp: extract khugepaged from mm/huge_memory.c
>    khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
>    khugepaged: add support of collapse for tmpfs/shmem pages
>
>   Documentation/filesystems/Locking    |   10 +-
>   Documentation/vm/transhuge.txt       |  130 ++-
>   Documentation/vm/unevictable-lru.txt |   21 +
>   arch/alpha/mm/fault.c                |    2 +-
>   arch/arc/mm/fault.c                  |    2 +-
>   arch/arm/mm/fault.c                  |    2 +-
>   arch/arm64/mm/fault.c                |    2 +-
>   arch/avr32/mm/fault.c                |    2 +-
>   arch/cris/mm/fault.c                 |    2 +-
>   arch/frv/mm/fault.c                  |    2 +-
>   arch/hexagon/mm/vm_fault.c           |    2 +-
>   arch/ia64/mm/fault.c                 |    2 +-
>   arch/m32r/mm/fault.c                 |    2 +-
>   arch/m68k/mm/fault.c                 |    2 +-
>   arch/metag/mm/fault.c                |    2 +-
>   arch/microblaze/mm/fault.c           |    2 +-
>   arch/mips/mm/fault.c                 |    2 +-
>   arch/mn10300/mm/fault.c              |    2 +-
>   arch/nios2/mm/fault.c                |    2 +-
>   arch/openrisc/mm/fault.c             |    2 +-
>   arch/parisc/mm/fault.c               |    2 +-
>   arch/powerpc/mm/copro_fault.c        |    2 +-
>   arch/powerpc/mm/fault.c              |    2 +-
>   arch/s390/mm/fault.c                 |    2 +-
>   arch/score/mm/fault.c                |    2 +-
>   arch/sh/mm/fault.c                   |    2 +-
>   arch/sparc/mm/fault_32.c             |    4 +-
>   arch/sparc/mm/fault_64.c             |    2 +-
>   arch/tile/mm/fault.c                 |    2 +-
>   arch/um/kernel/trap.c                |    2 +-
>   arch/unicore32/mm/fault.c            |    2 +-
>   arch/x86/mm/fault.c                  |    2 +-
>   arch/xtensa/mm/fault.c               |    2 +-
>   drivers/base/node.c                  |   13 +-
>   drivers/char/mem.c                   |   24 +
>   drivers/iommu/amd_iommu_v2.c         |    3 +-
>   drivers/iommu/intel-svm.c            |    2 +-
>   fs/proc/meminfo.c                    |    7 +-
>   fs/proc/task_mmu.c                   |   10 +-
>   fs/userfaultfd.c                     |   22 +-
>   include/linux/huge_mm.h              |   36 +-
>   include/linux/khugepaged.h           |    6 +
>   include/linux/mm.h                   |   51 +-
>   include/linux/mmzone.h               |    4 +-
>   include/linux/page-flags.h           |   19 +-
>   include/linux/radix-tree.h           |    1 +
>   include/linux/rmap.h                 |    2 +-
>   include/linux/shmem_fs.h             |   29 +-
>   include/linux/userfaultfd_k.h        |    8 +-
>   include/linux/vm_event_item.h        |    7 +
>   include/trace/events/huge_memory.h   |    3 +-
>   ipc/shm.c                            |    6 +-
>   lib/radix-tree.c                     |   68 +-
>   mm/Makefile                          |    2 +-
>   mm/filemap.c                         |  226 ++--
>   mm/gup.c                             |    7 +-
>   mm/huge_memory.c                     | 2028 ++++++----------------------------
>   mm/internal.h                        |    4 +-
>   mm/khugepaged.c                      | 1772 +++++++++++++++++++++++++++++
>   mm/ksm.c                             |    5 +-
>   mm/memory.c                          |  859 +++++++-------
>   mm/mempolicy.c                       |    4 +-
>   mm/migrate.c                         |    5 +-
>   mm/mmap.c                            |   26 +-
>   mm/nommu.c                           |    3 +-
>   mm/page-writeback.c                  |    1 +
>   mm/page_alloc.c                      |   21 +
>   mm/rmap.c                            |   78 +-
>   mm/shmem.c                           |  689 ++++++++++--
>   mm/swap.c                            |    2 +
>   mm/truncate.c                        |   22 +-
>   mm/util.c                            |    6 +
>   mm/vmscan.c                          |    6 +
>   mm/vmstat.c                          |    4 +
>   74 files changed, 3919 insertions(+), 2395 deletions(-)
>   create mode 100644 mm/khugepaged.c
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
