Date: Wed, 6 Jul 2022 22:21:36 +0800
From: Oliver Sang
To: Mel Gorman
Cc: Andrew Morton, 0day robot, LKML, linux-mm@kvack.org, lkp@lists.01.org,
    Nicolas Saenz Julienne, Marcelo Tosatti, Vlastimil Babka, Michal Hocko,
    Hugh Dickins
Subject: Re: [mm/page_alloc] 2bd8eec68f: BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
Message-ID:
References: <20220613125622.18628-8-mgorman@techsingularity.net>
    <20220703132209.875b823d1cb7169a8d51d56d@linux-foundation.org>
    <20220706095535.GD27531@techsingularity.net>
    <20220706115328.GE27531@techsingularity.net>
In-Reply-To: <20220706115328.GE27531@techsingularity.net>

Hi Mel Gorman,

On Wed, Jul 06, 2022 at 12:53:29PM +0100, Mel Gorman wrote:
> On Wed, Jul 06, 2022 at 10:55:35AM +0100, Mel Gorman wrote:
> > On Tue, Jul 05, 2022 at 09:51:25PM +0800, Oliver Sang wrote:
> > > Hi Andrew Morton,
> > >
> > > On Sun, Jul 03, 2022 at 01:22:09PM -0700, Andrew Morton wrote:
> > > > On Sun, 3 Jul 2022 17:44:30 +0800 kernel test robot wrote:
> > > >
> > > > > FYI, we noticed the following commit (built with gcc-11):
> > > > >
> > > > > commit: 2bd8eec68f740608db5ea58ecff06965228764cb ("[PATCH 7/7] mm/page_alloc: Replace local_lock with normal spinlock")
> > > > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Drain-remote-per-cpu-directly/20220613-230139
> > > > > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> > > > > patch link: https://lore.kernel.org/lkml/20220613125622.18628-8-mgorman@techsingularity.net
> > > > >
> > > >
> > > > Did this test include the followup patch
> > > > mm-page_alloc-replace-local_lock-with-normal-spinlock-fix.patch?
> > >
> > > No, we just fetched the original patch set and tested on top of it.
> > >
> > > We have now applied the patch you pointed us to on top of 2bd8eec68f and
> > > found that the issue still exists.
> > > (attached dmesg FYI)
> > >
> >
> > Thanks Oliver.
> >
> > The trace is odd in that it hits in GUP when the page allocator is no
> > longer active and the context is a syscall. First, is this definitely
> > the first patch where the problem occurs?
> >
>
> I tried reproducing this on a 2-socket machine with Xeon
> Gold 5218R CPUs. It was necessary to set timeouts in both
> vm/settings and kselftest/runner.sh to avoid timeouts. Testing with
> a standard config on my original 5.19-rc3 baseline and the baseline
> b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 both passed. I tried your kernel
> config with i915 disabled (would not build) and necessary storage drivers
> and network drivers enabled (for boot and access). The kernel log shows
> a bunch of warnings related to UBSAN during boot and during some of the
> tests but otherwise compaction_test completed successfully as well as
> the other VM tests.
>
> Is this always reproducible?

Not always, but at a high rate. We also observed other dmesg stats for both
2bd8eec68f74 and its parent, but the
dmesg.BUG:sleeping_function_called_from_invalid_context_at* entries seem to
occur only on 2bd8eec68f74 and on the '-fix' commit.
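Mel's point that the report fires in GUP from a syscall, long after the page allocator has returned, fits a generic pattern worth keeping in mind: on a !PREEMPT_RT kernel spin_lock() disables preemption, and the might_sleep() debug check (CONFIG_DEBUG_ATOMIC_SLEEP) prints "BUG: sleeping function called from invalid context at <file>:<line>" naming the site of the check, not the site that entered atomic context. The following is a userspace model only, assuming a plain counter in place of the kernel's preempt_count(); the model_* helpers and leaky_path() are hypothetical and do not describe the code in 2bd8eec68f74. It shows how one unbalanced lock makes every later might_sleep() site report a BUG, which would be one way for reports to spread across gup.c, migrate.c, vmalloc.c and the other files in the table below. The thread has not established that this is what happens here.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace model only: a plain counter stands in for the kernel's
 * preempt_count(). All names here are hypothetical.
 */
static int model_preempt_count;	/* > 0 means "atomic context" in this model */
static int model_bug_reports;	/* BUG lines printed so far */

/* On !PREEMPT_RT kernels spin_lock() disables preemption; model that. */
static void model_spin_lock(void)
{
	model_preempt_count++;
}

static void model_spin_unlock(void)
{
	assert(model_preempt_count > 0);	/* catch an unbalanced unlock */
	model_preempt_count--;
}

/* Simplified might_sleep(): report the calling site if still atomic. */
static void model_might_sleep(const char *site)
{
	if (model_preempt_count > 0) {
		model_bug_reports++;
		printf("BUG: sleeping function called from invalid context at %s\n",
		       site);
	}
}

/* Hypothetical buggy path: an early return leaks the lock. */
static void leaky_path(int take_early_return)
{
	model_spin_lock();
	if (take_early_return)
		return;		/* bug: lock, and disabled preemption, leaked */
	model_spin_unlock();
}
```

In this model, leaky_path(0) followed by model_might_sleep("mm/gup.c") prints nothing, while after leaky_path(1) each later model_might_sleep() call prints a BUG line naming its own site, whichever file it stands for.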
=========================================================================================
compiler/group/kconfig/rootfs/sc_nr_hugepages/tbox_group/testcase/ucode:
  gcc-11/vm/x86_64-rhel-8.3-kselftests/debian-11.1-x86_64-20220510.cgz/2/lkp-csl-2sp9/kernel-selftests/0x500320a

commit:
  eec0ff5df294 ("mm/page_alloc: Remotely drain per-cpu lists")
  2bd8eec68f74 ("mm/page_alloc: Replace local_lock with normal spinlock")
  292baeb4c714 ("mm/page_alloc: replace local_lock with normal spinlock -fix")

eec0ff5df2945d19 2bd8eec68f740608db5ea58ecff 292baeb4c7149ac2cb844137481
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
               |              |            |              |            |
             :20            75%        15:20            70%        14:21  dmesg.BUG:scheduling_while_atomic
             :20             5%         1:20             0%          :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/binfmt_elf.c
             :20             5%         1:20            10%         2:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
             :20             5%         1:20             5%         1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/freezer.h
             :20            10%         2:20            25%         5:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/mmu_notifier.h
             :20             5%         1:20             0%          :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
             :20            40%         8:20            40%         8:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
             :20            10%         2:20             0%          :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
             :20            10%         2:20            10%         2:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
             :20            55%        11:20            65%        13:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/gup.c
             :20            15%         3:20             5%         1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/memory.c
             :20            60%        12:20            55%        11:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/migrate.c
             :20             5%         1:20             5%         1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
             :20             0%          :20             5%         1:21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/rmap.c
             :20            15%         3:20             0%          :21  dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmalloc.c
             :20            45%         9:20            45%         9:21  dmesg.BUG:workqueue_leaked_lock_or_atomic
             :20            25%         5:20            15%         3:21  dmesg.Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=
             :20             5%         1:20             0%          :21  dmesg.RIP:__clear_user
           20:20             0%        20:20             5%        21:21  dmesg.RIP:rcu_eqs_exit
           20:20             0%        20:20             5%        21:21  dmesg.RIP:sched_clock_tick
             :20             5%         1:20             0%          :21  dmesg.RIP:smp_call_function_many_cond
           20:20             0%        20:20             5%        21:21  dmesg.WARNING:at_kernel/rcu/tree.c:#rcu_eqs_exit
           20:20             0%        20:20             5%        21:21  dmesg.WARNING:at_kernel/sched/clock.c:#sched_clock_tick
             :20             5%         1:20             0%          :21  dmesg.WARNING:at_kernel/smp.c:#smp_call_function_many_cond
           20:20             0%        20:20             5%        21:21  dmesg.WARNING:suspicious_RCU_usage
           20:20             0%        20:20             5%        21:21  dmesg.boot_failures
            9:20           -15%         6:20            -5%         8:21  dmesg.include/linux/rcupdate.h:#rcu_read_lock()used_illegally_while_idle
            9:20           -15%         6:20            -5%         8:21  dmesg.include/linux/rcupdate.h:#rcu_read_unlock()used_illegally_while_idle
           20:20             0%        20:20             5%        21:21  dmesg.include/trace/events/error_report.h:#suspicious_rcu_dereference_check()usage
           20:20             0%        20:20             5%        21:21  dmesg.include/trace/events/lock.h:#suspicious_rcu_dereference_check()usage

>
> --
> Mel Gorman
> SUSE Labs
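The %reproduction column in the table above is not explained in the report. Every row is consistent with one reading: the change in failure count against the parent eec0ff5df294, as a percentage of the parent's 20 runs. For example, 14:21 against :20 gives 14 * 100 / 20 = 70% (a plain difference of failure rates would give about 67% instead), and 8:21 against 9:20 gives -1 * 100 / 20 = -5%. This is an inference from the numbers, not taken from the lkp-tests source, and struct fail_runs and reproduction_pct() below are hypothetical helpers.

```c
#include <assert.h>

/* One "fail:runs" cell; an empty left side such as ":20" means 0 failures. */
struct fail_runs {
	int fails;
	int runs;
};

/*
 * Inferred (not official) reading of the %reproduction column:
 * (candidate failures - parent failures) as a percentage of parent runs.
 * Integer division truncates toward zero; with 20 parent runs every
 * value in this table divides exactly.
 */
static int reproduction_pct(struct fail_runs parent, struct fail_runs candidate)
{
	assert(parent.runs > 0);
	return (candidate.fails - parent.fails) * 100 / parent.runs;
}
```

Under this reading, both the 2bd8eec68f74 and the 292baeb4c714 columns are measured against the first column, which is why the '-fix' commit's 5:21 on the mmu_notifier.h row shows 25% rather than a delta from 2:20.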