Re: [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe

From: Peter Xu <peterx@redhat.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	James Houghton <jthoughton@google.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	David Hildenbrand <david@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe
Date: Sun, 6 Nov 2022 11:41:42 -0500
Message-ID: <Y2fjxgojqKazzINq@x1n>
In-Reply-To: <202211061521.28931f7-oliver.sang@intel.com>

On Sun, Nov 06, 2022 at 04:14:10PM +0800, kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed WARNING:suspicious_RCU_usage due to commit (built with gcc-11):
> 
> commit: 8b7e3b7ca3897ebc4cb7b23c65a4618d64056e3b ("[PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe")
> url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-hugetlb-Make-huge_pte_offset-thread-safe-for-pmd-unshare/20221031-053221
> base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/lkml/20221030212929.335473-6-peterx@redhat.com
> patch subject: [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe
> 
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-9313ba54-1_20221017
> with following parameters:
> 
> 	sc_nr_hugepages: 2
> 	group: vm
> 
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> 
> 
> on test machine: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> If you fix the issue, kindly add following tag
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Link: https://lore.kernel.org/oe-lkp/202211061521.28931f7-oliver.sang@intel.com
> 
> 
> kern  :warn  : [  181.942648] WARNING: suspicious RCU usage
> kern  :warn  : [  181.943175] 6.1.0-rc1-00309-g8b7e3b7ca389 #1 Tainted: G S
> kern  :warn  : [  181.943972] -----------------------------
> kern  :warn  : [  181.944526] include/linux/rcupdate.h:364 Illegal context switch in RCU read-side critical section!
> kern  :warn  : [  181.945559]
> other info that might help us debug this:
> 
> kern  :warn  : [  181.946625]
> rcu_scheduler_active = 2, debug_locks = 1
> kern  :warn  : [  181.947473] 2 locks held by hmm-tests/9934:
> kern :warn : [  181.948016] #0: ffff8884325b2d18 (&mm->mmap_lock#2){++++}-{3:3}, at: dmirror_fault (test_hmm.c:?) test_hmm
> kern :warn : [  181.949129] #1: ffffffff858a7860 (rcu_read_lock){....}-{1:2}, at: walk_hugetlb_range (pagewalk.c:?) 
> kern  :warn  : [  181.950161]
> stack backtrace:
> kern  :warn  : [  181.950780] CPU: 9 PID: 9934 Comm: hmm-tests Tainted: G S                 6.1.0-rc1-00309-g8b7e3b7ca389 #1
> kern  :warn  : [  181.951863] Hardware name: Dell Inc. Vostro 3670/0HVPDY, BIOS 1.5.11 12/24/2018
> kern  :warn  : [  181.952709] Call Trace:
> kern  :warn  : [  181.953070]  <TASK>
> kern :warn : [  181.953403] dump_stack_lvl (??:?) 
> kern :warn : [  181.953890] __might_resched (??:?) 
> kern :warn : [  181.954403] __mutex_lock (mutex.c:?) 
> kern :warn : [  181.954886] ? validate_chain (lockdep.c:?) 
> kern :warn : [  181.955405] ? hugetlb_fault (??:?) 
> kern :warn : [  181.955926] ? mark_lock+0xca/0xac0 
> kern :warn : [  181.956450] ? mutex_lock_io_nested (mutex.c:?) 
> kern :warn : [  181.957039] ? check_prev_add (lockdep.c:?) 
> kern :warn : [  181.957580] ? hugetlb_vm_op_pagesize (hugetlb.c:?) 
> kern :warn : [  181.958177] ? hugetlb_fault (??:?) 
> kern :warn : [  181.958690] hugetlb_fault (??:?) 
> kern :warn : [  181.959199] ? find_held_lock (lockdep.c:?) 
> kern :warn : [  181.959709] ? hugetlb_no_page (??:?) 
> kern :warn : [  181.960255] ? __lock_release (lockdep.c:?) 
> kern :warn : [  181.960772] ? lock_downgrade (lockdep.c:?) 
> kern :warn : [  181.961292] ? lock_is_held_type (??:?) 
> kern :warn : [  181.961830] ? handle_mm_fault (??:?) 
> kern :warn : [  181.962363] handle_mm_fault (??:?) 
> kern :warn : [  181.962870] ? hmm_vma_walk_hugetlb_entry (hmm.c:?) 
> kern :warn : [  181.963501] hmm_vma_fault (hmm.c:?) 
> kern :warn : [  181.964096] walk_hugetlb_range (pagewalk.c:?) 
> kern :warn : [  181.964639] __walk_page_range (pagewalk.c:?) 
> kern :warn : [  181.965160] walk_page_range (??:?) 
> kern :warn : [  181.965670] ? __walk_page_range (??:?) 
> kern :warn : [  181.966213] ? rcu_read_unlock (main.c:?) 
> kern :warn : [  181.966718] ? lock_is_held_type (??:?) 
> kern :warn : [  181.967259] ? mmu_interval_read_begin (??:?) 
> kern :warn : [  181.967855] ? lock_is_held_type (??:?) 
> kern :warn : [  181.968400] hmm_range_fault (??:?) 
> kern :warn : [  181.968911] ? down_read (??:?) 
> kern :warn : [  181.969383] ? hmm_vma_fault (??:?) 
> kern :warn : [  181.969891] ? __lock_release (lockdep.c:?) 
> kern :warn : [  181.970416] dmirror_fault (test_hmm.c:?) test_hmm
> kern :warn : [  181.971012] ? dmirror_migrate_to_system+0x590/0x590 test_hmm
> kern :warn : [  181.971847] ? find_held_lock (lockdep.c:?) 
> kern :warn : [  181.972355] ? dmirror_write+0x202/0x310 test_hmm
> kern :warn : [  181.973069] ? __lock_release (lockdep.c:?) 
> kern :warn : [  181.973586] ? lock_downgrade (lockdep.c:?) 
> kern :warn : [  181.974107] ? lock_is_held_type (??:?) 
> kern :warn : [  181.974641] ? dmirror_write+0x202/0x310 test_hmm
> kern :warn : [  181.975355] ? lock_release (??:?) 
> kern :warn : [  181.975845] ? __mutex_unlock_slowpath (mutex.c:?) 
> kern :warn : [  181.976444] ? bit_wait_io_timeout (mutex.c:?) 
> kern :warn : [  181.977008] ? lock_is_held_type (??:?) 
> kern :warn : [  181.977547] ? dmirror_do_write (test_hmm.c:?) test_hmm
> kern :warn : [  181.978185] dmirror_write+0x1bf/0x310 test_hmm
> kern :warn : [  181.978881] ? dmirror_fault (test_hmm.c:?) test_hmm
> kern :warn : [  181.979484] ? lock_is_held_type (??:?) 
> kern :warn : [  181.980021] ? __might_fault (??:?) 
> kern :warn : [  181.980523] ? lock_release (??:?) 
> kern :warn : [  181.981019] dmirror_fops_unlocked_ioctl (test_hmm.c:?) test_hmm
> kern :warn : [  181.981732] ? dmirror_exclusive+0x780/0x780 test_hmm
> kern :warn : [  181.982485] ? do_user_addr_fault (fault.c:?) 
> kern :warn : [  181.983042] ? __lock_release (lockdep.c:?) 
> kern :warn : [  181.983562] __x64_sys_ioctl (??:?) 
> kern :warn : [  181.984074] do_syscall_64 (??:?) 
> kern :warn : [  181.984545] ? do_user_addr_fault (fault.c:?) 
> kern :warn : [  181.985103] ? do_user_addr_fault (fault.c:?) 
> kern :warn : [  181.985654] ? irqentry_exit_to_user_mode (??:?) 
> kern :warn : [  181.986256] ? lockdep_hardirqs_on_prepare (lockdep.c:?) 
> kern :warn : [  181.986945] entry_SYSCALL_64_after_hwframe (??:?) 

So it is caused by the HMM code triggering a page fault during the page
walk, where it goes into the hugetlb fault logic and tries to take sleepable
locks (a mutex, per the backtrace above) while walk_hugetlb_range() still
holds rcu_read_lock().

That's somewhat unexpected to me, because logically I think the page walk
hooks should only do trivial work on the pte/pmd/.. being walked, rather
than something as complicated as triggering a page fault, as HMM does.  It's
also surprising to me that sleeping is actually allowed there.  But so far
it looks safe.

Besides HMM, it seems there's yet another user (enable_skey_walk_ops) that
can also yield the CPU by calling cond_resched().

My current plan is to add some helpers so that when a hook decides to call
code that can sleep, it notifies the walker API first.  They could be called
something like walk_page_pause() and walk_page_cont(); then, for either a
fault or cond_resched(), we could do:

  walk_page_pause(&walk);
  hmm_vma_fault(); // or cond_resched(), etc.
  walk_page_cont(&walk);
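
For illustration only, here is a minimal userspace sketch of the
pause/continue idea.  Nothing in it is kernel code: rcu_depth,
fake_might_sleep(), fake_fault(), struct fake_walk and fake_walk_range() are
made-up stand-ins, and the bodies of walk_page_pause()/walk_page_cont() are
just one assumption about what such helpers could do (drop and retake the
read-side lock).  The real helpers do not exist yet and may look different.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the RCU read-side nesting depth. */
static int rcu_depth;
/* Number of times "sleeping" code ran inside the fake read-side section. */
static int sleep_violations;

static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

/* Stand-in for might_sleep(): only records the violation, never blocks. */
static void fake_might_sleep(void)
{
	if (rcu_depth > 0)
		sleep_violations++;
}

/* Hypothetical walker state; the real struct mm_walk has no such field. */
struct fake_walk {
	bool paused;
};

/* Assumed behaviour: leave the read-side section before sleeping. */
static void walk_page_pause(struct fake_walk *walk)
{
	assert(!walk->paused);
	fake_rcu_read_unlock();
	walk->paused = true;
}

/* Assumed behaviour: re-enter the read-side section afterwards. */
static void walk_page_cont(struct fake_walk *walk)
{
	assert(walk->paused);
	fake_rcu_read_lock();
	walk->paused = false;
}

/* Stand-in for a hook action that may sleep, such as a page fault. */
static void fake_fault(void)
{
	fake_might_sleep();
}

typedef void (*entry_fn)(struct fake_walk *walk);

/* A hook that sleeps directly, like the case in the report. */
static void entry_sleeps_unsafely(struct fake_walk *walk)
{
	(void)walk;
	fake_fault();
}

/* A hook that tells the walker before and after it sleeps. */
static void entry_sleeps_with_pause(struct fake_walk *walk)
{
	walk_page_pause(walk);
	fake_fault();
	walk_page_cont(walk);
}

/* Models the walker calling one hook under the read-side lock. */
static int fake_walk_range(entry_fn fn)
{
	struct fake_walk walk = { .paused = false };

	sleep_violations = 0;
	fake_rcu_read_lock();
	fn(&walk);
	fake_rcu_read_unlock();
	return sleep_violations;
}
```

In this model, fake_walk_range(entry_sleeps_unsafely) counts one violation,
while fake_walk_range(entry_sleeps_with_pause) counts none, and the nesting
depth is back to zero after either call.  The sketch says nothing about
whether pointers read before the pause are still valid afterwards; a real
implementation would have to deal with that.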

We should probably also emphasize somewhere that the mmap lock must never be
released during the whole page walk, because walk_page_range() caches vma
pointers.

If there are any better suggestions, please feel free to comment; otherwise
I'll give the above approach a shot in the next version.

-- 
Peter Xu

Thread overview: 30+ messages
2022-10-30 21:29 [PATCH RFC 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Peter Xu
2022-10-30 21:29 ` [PATCH RFC 01/10] mm/hugetlb: Let vma_offset_start() to return start Peter Xu
2022-11-03 15:25   ` Mike Kravetz
2022-10-30 21:29 ` [PATCH RFC 02/10] mm/hugetlb: Comment huge_pte_offset() for its locking requirements Peter Xu
2022-11-01  5:46   ` Nadav Amit
2022-11-02 20:51     ` Peter Xu
2022-11-03 15:42   ` Mike Kravetz
2022-11-03 18:11     ` Peter Xu
2022-11-03 18:38       ` Mike Kravetz
2022-10-30 21:29 ` [PATCH RFC 03/10] mm/hugetlb: Make hugetlb_vma_maps_page() RCU-safe Peter Xu
2022-10-30 21:29 ` [PATCH RFC 04/10] mm/hugetlb: Make userfaultfd_huge_must_wait() RCU-safe Peter Xu
2022-11-02 18:06   ` James Houghton
2022-11-02 21:17     ` Peter Xu
2022-10-30 21:29 ` [PATCH RFC 05/10] mm/hugetlb: Make walk_hugetlb_range() RCU-safe Peter Xu
2022-11-06  8:14   ` kernel test robot
2022-11-06 16:41     ` Peter Xu [this message]
2022-10-30 21:29 ` [PATCH RFC 06/10] mm/hugetlb: Make page_vma_mapped_walk() RCU-safe Peter Xu
2022-10-30 21:29 ` [PATCH RFC 07/10] mm/hugetlb: Make hugetlb_follow_page_mask() RCU-safe Peter Xu
2022-11-02 18:24   ` James Houghton
2022-11-03 15:50     ` Peter Xu
2022-10-30 21:30 ` [PATCH RFC 08/10] mm/hugetlb: Make follow_hugetlb_page RCU-safe Peter Xu
2022-10-30 21:30 ` [PATCH RFC 09/10] mm/hugetlb: Make hugetlb_fault() RCU-safe Peter Xu
2022-11-02 18:04   ` James Houghton
2022-11-03 15:39     ` Peter Xu
2022-10-30 21:30 ` [PATCH RFC 10/10] mm/hugetlb: Comment at rest huge_pte_offset() places Peter Xu
2022-11-01  5:39   ` Nadav Amit
2022-11-02 21:21     ` Peter Xu
2022-11-04  0:21 ` [PATCH RFC 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Mike Kravetz
2022-11-04 15:02   ` Peter Xu
2022-11-04 15:44     ` Mike Kravetz
