linux-mm.kvack.org archive mirror
From: James Houghton <jthoughton@google.com>
To: Junxiao Chang <junxiao.chang@intel.com>
Cc: akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
	 mhocko@suse.com, jmarchan@redhat.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, mike.kravetz@oracle.com,
	muchun.song@linux.dev
Subject: Re: [PATCH] mm: fix hugetlb page unmap count balance issue
Date: Fri, 12 May 2023 14:26:49 -0700
Message-ID: <CADrL8HV25JyeaT=peaR7NWhUiaBz8LzpyFosYZ3_0ACt+twU6w@mail.gmail.com>
In-Reply-To: <20230512072036.1027784-1-junxiao.chang@intel.com>

On Fri, May 12, 2023 at 12:20 AM Junxiao Chang <junxiao.chang@intel.com> wrote:
>
> hugetlb page usually is mapped with pmd, but occasionally it might be
> mapped with pte. QEMU can use udma-buf to create host dmabufs for guest
> framebuffers. When QEMU is launched with parameter "hugetlb=on",
> udmabuffer driver maps hugetlb page with pte in page fault handler.
> Call chain looks like:
>
> page_add_file_rmap
> do_set_pte
> finish_fault
> __do_fault -> udmabuf_vm_fault, it maps hugetlb page here.
> do_read_fault
>
> In function page_add_file_rmap, compound is false since it is pte mapping.
>
> When qemu exits and page is unmapped in function page_remove_rmap, the
> hugetlb page should not be handled in pmd way.
>
> This change is to check compound parameter as well as hugetlb flag. It
> fixes below kernel bug which is reproduced with 6.3 kernel:
>
> [  114.027754] BUG: Bad page cache in process qemu-system-x86  pfn:37aa00
> [  114.034288] page:000000000dd2153b refcount:514 mapcount:-4 mapping:000000004b01ca30 index:0x13800 pfn:0x37aa00
> [  114.044277] head:000000000dd2153b order:9 entire_mapcount:-4 nr_pages_mapped:4 pincount:512
> [  114.052623] aops:hugetlbfs_aops ino:6f93
> [  114.056552] flags: 0x17ffffc0010001(locked|head|node=0|zone=2|lastcpupid=0x1fffff)
> [  114.064115] raw: 0017ffffc0010001 fffff7338deb0008 fffff7338dea0008 ffff98dc855ea870
> [  114.071847] raw: 000000000000009c 0000000000000002 00000202ffffffff 0000000000000000
> [  114.079572] page dumped because: still mapped when deleted
> [  114.085048] CPU: 0 PID: 3122 Comm: qemu-system-x86 Tainted: G    BU  W   E      6.3.0-v3+ #62
> [  114.093566] Hardware name: Intel Corporation Alder Lake Client Platform DDR5 SODIMM SBS RVP, BIOS ADLPFWI1.R00.3084.D89.2303211034 03/21/2023
> [  114.106839] Call Trace:
> [  114.109291]  <TASK>
> [  114.111405]  dump_stack_lvl+0x4c/0x70
> [  114.115073]  dump_stack+0x14/0x20
> [  114.118395]  filemap_unaccount_folio+0x159/0x220
> [  114.123021]  filemap_remove_folio+0x54/0x110
> [  114.127295]  remove_inode_hugepages+0x111/0x5b0
> [  114.131834]  hugetlbfs_evict_inode+0x23/0x50
> [  114.136111]  evict+0xcd/0x1e0
> [  114.139083]  iput.part.0+0x183/0x1e0
> [  114.142663]  iput+0x20/0x30
> [  114.145466]  dentry_unlink_inode+0xcc/0x130
> [  114.149655]  __dentry_kill+0xec/0x1a0
> [  114.153325]  dput+0x1ca/0x3c0
> [  114.156293]  __fput+0xf4/0x280
> [  114.159357]  ____fput+0x12/0x20
> [  114.162502]  task_work_run+0x62/0xa0
> [  114.166088]  do_exit+0x352/0xae0
> [  114.169321]  do_group_exit+0x39/0x90
> [  114.172892]  get_signal+0xa09/0xa30
> [  114.176391]  arch_do_signal_or_restart+0x33/0x280
> [  114.181098]  exit_to_user_mode_prepare+0x11f/0x190
> [  114.185893]  syscall_exit_to_user_mode+0x2a/0x50
> [  114.190509]  do_syscall_64+0x4c/0x90
> [  114.194095]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> Fixes: 53f9263baba6 ("mm: rework mapcount accounting to enable 4k mapping of THPs")
> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
> ---
>  mm/rmap.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 19392e090bec6..b42fc0389c243 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1377,9 +1377,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
>         VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> -       /* Hugetlb pages are not counted in NR_*MAPPED */
> -       if (unlikely(folio_test_hugetlb(folio))) {
> -               /* hugetlb pages are always mapped with pmds */
> +       /* Hugetlb pages usually are not counted in NR_*MAPPED */
> +       if (unlikely(folio_test_hugetlb(folio) && compound)) {
> +               /* hugetlb pages are mapped with pmds */
>                 atomic_dec(&folio->_entire_mapcount);
>                 return;
>         }

This change alone doesn't fix mapcounting for PTE-mapped HugeTLB pages;
you need something like [1]. I can resend that patch if that's the
direction we want to take, but note that this mapcounting scheme doesn't
work once the page structs have been freed.
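
To make the imbalance in the quoted report concrete, here is a minimal
user-space sketch. It is a toy model, not kernel code: struct toy_folio,
toy_add_rmap(), toy_remove_rmap() and toy_run() are hypothetical names, and
it assumes each PTE mapping covers a distinct subpage exactly once. It only
reproduces the counter values from the BUG output (entire_mapcount:-4,
nr_pages_mapped:4); it does not model the remaining HugeTLB mapcounting
problems mentioned above.

```c
#include <stdbool.h>

/* Toy stand-in for the two folio counters shown in the quoted BUG output. */
struct toy_folio {
	bool hugetlb;
	int entire_mapcount;	/* value as printed in the report; 0 = no PMD-level mapping */
	int nr_pages_mapped;	/* subpages that currently have a PTE mapping */
};

/* Add side: a PTE mapping (compound == false) is counted per subpage. */
static void toy_add_rmap(struct toy_folio *f, bool compound)
{
	if (compound)
		f->entire_mapcount++;
	else
		f->nr_pages_mapped++;
}

/*
 * Remove side. With check_compound == false this mimics the 6.3 behaviour
 * from the quoted diff: every hugetlb folio takes the PMD-style branch.
 * With check_compound == true it mimics the proposed extra "&& compound".
 */
static void toy_remove_rmap(struct toy_folio *f, bool compound,
			    bool check_compound)
{
	if (f->hugetlb && (!check_compound || compound)) {
		f->entire_mapcount--;
		return;
	}
	if (compound)
		f->entire_mapcount--;
	else
		f->nr_pages_mapped--;
}

/* Map four subpages with PTEs, then unmap them again. */
static struct toy_folio toy_run(bool check_compound)
{
	struct toy_folio f = { .hugetlb = true };

	for (int i = 0; i < 4; i++)
		toy_add_rmap(&f, false);
	for (int i = 0; i < 4; i++)
		toy_remove_rmap(&f, false, check_compound);
	return f;
}
```

In this model, toy_run(false) leaves the counters at -4 and 4, matching the
report, while toy_run(true) returns both to 0. That only shows the two
counters pairing up again in the toy; it says nothing about the rest of the
accounting.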

It seems like it was a mistake to include support for hugetlb memfds in udmabuf.

[1]: https://lore.kernel.org/linux-mm/20230306230004.1387007-2-jthoughton@google.com/

- James

