From: zhenyu zhang
Date: Mon, 18 Sep 2023 23:35:56 +0800
Subject: Re: [PATCH v6 0/7] Optimize mremap during mutual alignment within PMD
To: joel@joelfernandes.org
Cc: linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, Shuah Khan, Vlastimil Babka, Michal Hocko, Linus Torvalds, Lorenzo Stoakes, Kirill A Shutemov, "Liam R. Howlett", "Paul E. McKenney", Suren Baghdasaryan, Kalesh Singh, Lokesh Gidra, gshan@redhat.com, david@redhat.com
References: <20230903151328.2981432-1-joel@joelfernandes.org>
In-Reply-To: <20230903151328.2981432-1-joel@joelfernandes.org>

With a 4k guest on a 64k host, on aarch64 (Ampere's Altra Max CPU), I hit the following call trace.

Steps:
1) Set up hugepages on the host.
   # echo 50 > /proc/sys/vm/nr_hugepages
2) Mount the hugepages at /mnt/kvm_hugepage.
   # mount -t hugetlbfs -o pagesize=524288K none /mnt/kvm_hugepage
3) HugePages didn't leak when using non-existent mem-path.
   # cd /home/kar/workspace/avocado-vt/virttest; mkdir -p /mnt/tmp
4) Run memory heavy stress inside the guest.
   # /usr/libexec/qemu-kvm \
   ...
   -m 25600 \
   -object '{"size": 26843545600, "mem-path": "/mnt/tmp", "id": "mem-machine_mem", "qom-type": "memory-backend-file"}' \
   -smp 60,maxcpus=60,cores=30,threads=1,clusters=1,sockets=2 \
   login guest:
   # nohup stress --vm 50 --vm-bytes 256M --timeout 30s > /dev/null &
   ------> hit Call trace

On guest kernel:
2023-09-18 07:54:03: [ 76.592706] CPU: 23 PID: 254 Comm: kworker/23:1 Kdump: loaded Not tainted 6.6.0-rc2-zhenyzha_4k+ #3
2023-09-18 07:54:03: [ 76.593782] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20230524-3.el9 05/24/2023
2023-09-18 07:54:03: [ 76.594641] Workqueue: rcu_gp wait_rcu_exp_gp
2023-09-18 07:54:03: [ 76.595248] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
2023-09-18 07:54:03: [ 76.596025] pc : smp_call_function_single+0xe4/0x1e8
2023-09-18 07:54:03: [ 76.596833] lr : __sync_rcu_exp_select_node_cpus+0x27c/0x428
2023-09-18 07:54:03: [ 76.597534] sp : ffff800084a0bc60
2023-09-18 07:54:03: [ 76.598078] x29: ffff800084a0bc60 x28: ffff0003fdad9440 x27: 0000000000000001
2023-09-18 07:54:03: [ 76.598874] x26: ffff800081a541b0 x25: ffff800081e0af40 x24: ffff0000c425ed80
2023-09-18 07:54:03: [ 76.599817] x23: 0000000000000004 x22: ffff800081532fa0 x21: 0000000000000ffe
2023-09-18 07:54:03: [ 76.600621] x20: ffff800081537440 x19: ffff800084a0bca0 x18: 0000000000000001
2023-09-18 07:54:03: [ 76.601420] x17: 0000000000000000 x16: ffff800080f352e8 x15: 0000ffff97d02fff
2023-09-18 07:54:03: [ 76.602212] x14: 0000000000000000 x13: 0000000000000030 x12: 0101010101010101
2023-09-18 07:54:03: [ 76.603158] x11: ffff800081532fa0 x10: 0000000000000001 x9 : ffff80008014c714
2023-09-18 07:54:03: [ 76.603963] x8 : ffff800081e03130 x7 : ffff800081521008 x6 : ffff80008014e070
2023-09-18 07:54:03: [ 76.604759] x5 : 0000000000000000 x4 : ffff0003fda34c88 x3 : 0000000000000001
2023-09-18 07:54:03: [ 76.605703] x2 : 0000000000000000 x1 : ffff0003fda34c80 x0 : 000000000000001c
2023-09-18 07:54:03: [ 76.606507] Call trace:
2023-09-18 07:54:03: [ 76.606990] smp_call_function_single+0xe4/0x1e8
2023-09-18 07:54:03: [ 76.607617] __sync_rcu_exp_select_node_cpus+0x27c/0x428
2023-09-18 07:54:03: [ 76.608290] sync_rcu_exp_select_cpus+0x164/0x2e0
2023-09-18 07:54:03: [ 76.608963] wait_rcu_exp_gp+0x1c/0x38
2023-09-18 07:54:03: [ 76.609563] process_one_work+0x174/0x3c8
2023-09-18 07:54:03: [ 76.610181] worker_thread+0x2c8/0x3e0
2023-09-18 07:54:03: [ 76.610776] kthread+0x100/0x110
2023-09-18 07:54:03: [ 76.611330] ret_from_fork+0x10/0x20
2023-09-18 07:54:15: [ 88.396191] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
2023-09-18 07:54:15: [ 88.397195] rcu: 11-...0: (18 ticks this GP) idle=79ec/1/0x4000000000000000 softirq=577/579 fqs=1215
2023-09-18 07:54:15: [ 88.398244] rcu: 25-...0: (1 GPs behind) idle=599c/1/0x4000000000000000 softirq=300/301 fqs=1215
2023-09-18 07:54:15: [ 88.399254] rcu: 33-...0: (36 ticks this GP) idle=e454/1/0x4000000000000000 softirq=717/719 fqs=1216
2023-09-18 07:54:15: [ 88.400275] rcu: (detected by 19, t=6006 jiffies, g=1173, q=61327 ncpus=38)
2023-09-18 07:54:15: [ 88.401135] Task dump for CPU 11:
2023-09-18 07:54:15: [ 88.401711] task:stress state:R running task stack:0 pid:3182 ppid:3178 flags:0x00000202
2023-09-18 07:54:15: [ 88.402794] Call trace:
2023-09-18 07:54:15: [ 88.403312] __switch_to+0xc8/0x110
2023-09-18 07:54:15: [ 88.403915] do_page_fault+0x198/0x4e0
2023-09-18 07:54:15: [ 88.404533] do_translation_fault+0x38/0x68
2023-09-18 07:54:15: [ 88.405169] do_mem_abort+0x48/0xa0
2023-09-18 07:54:15: [ 88.405771] el0_da+0x4c/0x180
2023-09-18 07:54:15: [ 88.406337] el0t_64_sync_handler+0xdc/0x150
2023-09-18 07:54:15: [ 88.406991] el0t_64_sync+0x17c/0x180
2023-09-18 07:54:15: [ 88.407601] Task dump for CPU 25:
2023-09-18 07:54:15: [ 88.408182] task:stress state:R running task stack:0 pid:3200 ppid:3178 flags:0x00000203
2023-09-18 07:54:15: [ 88.409258] Call trace:
2023-09-18 07:54:15: [ 88.409769] __switch_to+0xc8/0x110
2023-09-18 07:54:15: [ 88.410339] 0x440dc0
2023-09-18 07:54:15: [ 88.410816] Task dump for CPU 33:
2023-09-18 07:54:15: [ 88.411362] task:stress state:R running task stack:0 pid:3191 ppid:3178 flags:0x00000203
2023-09-18 07:54:15: [ 88.412403] Call trace:
2023-09-18 07:54:15: [ 88.412866] __switch_to+0xc8/0x110
2023-09-18 07:54:15: [ 88.413405] __memcg_kmem_charge_page+0x270/0x2c0
2023-09-18 07:54:15: [ 88.414033] __alloc_pages+0x100/0x278
2023-09-18 07:54:15: [ 88.414585] memcg_stock+0x0/0x58

On host kernel:
173242 Sep 18 08:57:51 virt-mtsnow-02 kernel: ------------[ cut here ]------------
173243 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 52 kernel messages
173244 Sep 18 08:57:51 virt-mtsnow-02 kernel: do_cow_fault+0xf0/0x300
173245 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 162 kernel messages
173246 Sep 18 08:57:51 virt-mtsnow-02 kernel: CPU: 14 PID: 11294 Comm: qemu-kvm Tainted: G W 6.6.0-rc2-zhenyzha-64k+ #1
173247 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 226 kernel messages
173248 Sep 18 08:57:51 virt-mtsnow-02 kernel: x21: 0000000000000000
173249 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 120 kernel messages
173250 Sep 18 08:57:51 virt-mtsnow-02 kernel: __do_fault+0x40/0x210
173251 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 39 kernel messages
173252 Sep 18 08:57:51 virt-mtsnow-02 kernel: do_el0_svc+0xb4/0xd0
173253 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 325 kernel messages
173254 Sep 18 08:57:51 virt-mtsnow-02 kernel: get_user_pages_unlocked+0xc4/0x3b8
173255 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 255 kernel messages
173256 Sep 18 08:57:51 virt-mtsnow-02 kernel: pci_hyperv_intf i2c_designware_core
173257 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 87 kernel messages
173258 Sep 18 08:57:51 virt-mtsnow-02 kernel: xfs_filemap_fault+0x54/0x68 [xfs]
173259 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 248 kernel messages
173260 Sep 18 08:57:51 virt-mtsnow-02 kernel: pci_hyperv_intf
173261 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 69 kernel messages
173262 Sep 18 08:57:51 virt-mtsnow-02 kernel: Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F18v (SCP: 1.08.20211002) 12/01/2021
173263 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 297 kernel messages
173264 Sep 18 08:57:51 virt-mtsnow-02 kernel: __filemap_add_folio+0x33c/0x4e0
173265 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 12 kernel messages
173266 Sep 18 08:57:51 virt-mtsnow-02 kernel: x26: 0000000000000001
173267 Sep 18 08:57:51 virt-mtsnow-02 systemd-journald[15184]: Missed 74 kernel messages

[ 5456.588346] ------------[ cut here ]------------
[ 5456.588358] x10: 000000000000000a
[ 5456.588365] dm_mod
[ 5456.588372] nft_compat
[ 5456.588374] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F18v (SCP: 1.08.20211002) 12/01/2021
[ 5456.588417] fat
[ 5456.588421] x16: 000000009872d4d0
[ 5456.588430] ipmi_msghandler arm_cmn
[ 5456.588439] x10: 000000000000000a
[ 5456.588414] __xfs_filemap_fault+0x60/0x3c0 [xfs]
[ 5456.588454] x5 : 0000000000000028
[ 5456.588460] nvme_core
[ 5456.588474] pci_hyperv_intf
[ 5456.588482] ------------[ cut here ]------------
[ 5456.588488] page_cache_async_ra+0x64/0xa8
[ 5456.588491] filemap_fault+0x238/0xaa8
[ 5456.588506] nf_defrag_ipv4 nf_tables
[ 5456.588514] nfs_acl
[ 5456.588518] x22: ffffffc202880000
[ 5456.588525] netfs
[ 5456.588527] stp
[ 5456.588539] acpi_ipmi
[ 5456.588546] x10: 000000000000000a
[ 5456.588554] x7 : ffff07ffa0a67210
[ 5456.588562] get_user_pages_unlocked+0xc4/0x3b8
[ 5456.588567] __gfn_to_pfn_memslot+0xa4/0xf8
[ 5456.588575] xas_split_alloc+0xf8/0x128
[ 5456.588581] sha1_ce
[ 5456.588588] i2c_algo_bit
[ 5456.588592] page_cache_async_ra+0x64/0xa8

With gshan@redhat.com's patch "KVM: arm64: Fix soft-lockup on relaxing PTE permission" applied, I still hit a call trace:

2023-09-18 10:56:20: [ 57.494201] watchdog: BUG: soft lockup - CPU#58 stuck for 22s! [gsd-power:4858]
2023-09-18 10:56:20: [ 57.495674] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr vfat fat fuse xfs libcrc32c virtio_gpu virtio_dma_buf drm_shmem_helper nvme_tcp drm_kms_helper nvme_fabrics nvme_core nvme_common sg drm crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_net net_failover virtio_scsi failover virtio_mmio dm_multipath dm_mirror dm_region_hash dm_log dm_mod be2iscsi cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
2023-09-18 10:56:20: [ 57.501871] CPU: 58 PID: 4858 Comm: gsd-power Kdump: loaded Not tainted 6.6.0-rc2-zhenyzha_4k+ #3
2023-09-18 10:56:20: [ 57.502719] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20230524-3.el9 05/24/2023
2023-09-18 10:56:20: [ 57.503540] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
2023-09-18 10:56:20: [ 57.504612] pc : smp_call_function_many_cond+0x16c/0x618
2023-09-18 10:56:20: [ 57.505425] lr : smp_call_function_many_cond+0x188/0x618
2023-09-18 10:56:20: [ 57.505974] sp : ffff8000870f38f0
2023-09-18 10:56:20: [ 57.506370] x29: ffff8000870f38f0 x28: 000000000000003c x27: ffff00063c5dcaa0
2023-09-18 10:56:20: [ 57.507041] x26: 000000000000003c x25: 000000000000003b x24: ffff00063c5b6848
2023-09-18 10:56:20: [ 57.507812] x23: 0000000000000000 x22: ffff00063c5b6848 x21: ffff800081a541b0
2023-09-18 10:56:20: [ 57.508513] x20: ffff00063c5b6840 x19: ffff800081a4f840 x18: 0000000000000014
2023-09-18 10:56:20: [ 57.509247] x17: 00000000fd875552 x16: 0000000044ca0210 x15: 000000005df1120b
2023-09-18 10:56:20: [ 57.509947] x14: 00000000ac15cb21 x13: 00000000b7ff1817 x12: 0000000006d3918c
2023-09-18 10:56:20: [ 57.510645] x11: 00000000ba65fdab x10: 00000000f60c2b88 x9 : ffff80008061a9dc
2023-09-18 10:56:20: [ 57.511264] x8 : ffff00063c5b6a50 x7 : 0000000000000000 x6 : 0000000001000000
2023-09-18 10:56:20: [ 57.511817] x5 : 000000000000003c x4 : 0000000000000007 x3 : ffff00063bf28aa8
2023-09-18 10:56:20: [ 57.512415] x2 : 0000000000000000 x1 : 0000000000000011 x0 : 0000000000000007
2023-09-18 10:56:20: [ 57.513092] Call trace:
2023-09-18 10:56:20: [ 57.515105] smp_call_function_many_cond+0x16c/0x618
2023-09-18 10:56:20: [ 57.515684] kick_all_cpus_sync+0x48/0x80
2023-09-18 10:56:20: [ 57.516039] flush_icache_range+0x40/0x60
2023-09-18 10:56:20: [ 57.516413] bpf_int_jit_compile+0x1ac/0x5f8
2023-09-18 10:56:20: [ 57.516821] bpf_prog_select_runtime+0xd4/0x110
2023-09-18 10:56:20: [ 57.517279] bpf_prepare_filter+0x1e8/0x220
2023-09-18 10:56:20: [ 57.517727] __get_filter+0xdc/0x180
2023-09-18 10:56:20: [ 57.518231] sk_attach_filter+0x1c/0xb0
2023-09-18 10:56:20: [ 57.518605] sk_setsockopt+0x9dc/0x1230
2023-09-18 10:56:20: [ 57.518909] sock_setsockopt+0x18/0x28
2023-09-18 10:56:20: [ 57.519177] __sys_setsockopt+0x164/0x190
2023-09-18 10:56:20: [ 57.519501] __arm64_sys_setsockopt+0x2c/0x40
2023-09-18 10:56:20: [ 57.519911] invoke_syscall.constprop.0+0x7c/0xd0
2023-09-18 10:56:20: [ 57.520345] do_el0_svc+0xb4/0xd0
2023-09-18 10:56:20: [ 57.520670] el0_svc+0x50/0x228
2023-09-18 10:56:20: [ 57.521331] el0t_64_sync_handler+0x134/0x150
2023-09-18 10:56:20: [ 57.521758] el0t_64_sync+0x17c/0x180
2023-09-18 10:56:23: [ 60.724199] watchdog: BUG: soft lockup - CPU#28 stuck for 26s! [(fwupd):5108]

[ 6253.928601] CPU: 64 PID: 18885 Comm: qemu-kvm Kdump: loaded Tainted: G W 6.6.0-rc1-zhenyzha_64k+ #2
[ 6253.939021] Hardware name: GIGABYTE R152-P31-00/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
[ 6253.948312] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6253.955262] pc : xas_split_alloc+0xf8/0x128
[ 6253.959432] lr : __filemap_add_folio+0x33c/0x4e0
[ 6253.964037] sp : ffff80008b10f210
[ 6253.967338] x29: ffff80008b10f210 x28: ffffba8c43708c00 x27: 0000000000000001
[ 6253.974461] x26: 0000000000000001 x25: ffffffffffffc005 x24: 0000000000000000
[ 6253.981583] x23: ffff80008b10f2c0 x22: 00000a36da000101 x21: 0000000000000000
[ 6253.988706] x20: ffffffc203be2a00 x19: 000000000000000d x18: 0000000000000014
[ 6253.995828] x17: 00000000be237f61 x16: 000000001baa68cc x15: ffffba8c429a5944
[ 6254.002950] x14: ffffba8c429b57bc x13: ffffba8c429a5944 x12: ffffba8c429b57bc
[ 6254.010073] x11: ffffba8c4297160c x10: ffffba8c4365d414 x9 : ffffba8c4365857c
[ 6254.017195] x8 : ffff80008b10f210 x7 : ffff07ffa1304900 x6 : ffff80008b10f210
[ 6254.024317] x5 : 000000000000000e x4 : 0000000000000000 x3 : 0000000000012c40
[ 6254.031439] x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
[ 6254.038562] Call trace:
[ 6254.040995] xas_split_alloc+0xf8/0x128
[ 6254.044818] __filemap_add_folio+0x33c/0x4e0
[ 6254.049076] filemap_add_folio+0x48/0xd0
[ 6254.052986] page_cache_ra_unbounded+0xf0/0x1f0
[ 6254.057504] page_cache_ra_order+0x8c/0x310
[ 6254.061675] filemap_fault+0x67c/0xaa8
[ 6254.065412] __xfs_filemap_fault+0x60/0x3c0 [xfs]
[ 6254.070163] xfs_filemap_fault+0x54/0x68 [xfs]
[ 6254.074651] __do_fault+0x40/0x210
[ 6254.078040] do_cow_fault+0xf0/0x300
[ 6254.081602] do_pte_missing+0x140/0x238
[ 6254.085426] handle_pte_fault+0x100/0x160
[ 6254.089423] __handle_mm_fault+0x100/0x310
[ 6254.093506] handle_mm_fault+0x6c/0x270
[ 6254.097330] faultin_page+0x70/0x128
[ 6254.100893] __get_user_pages+0xc8/0x2d8
[ 6254.104802] get_user_pages_unlocked+0xc4/0x3b8
[ 6254.109320] hva_to_pfn+0xf8/0x468
[ 6254.112709] __gfn_to_pfn_memslot+0xa4/0xf8
[ 6254.116879] user_mem_abort+0x174/0x7e8
[ 6254.120702] kvm_handle_guest_abort+0x2dc/0x450
[ 6254.125220] handle_exit+0x70/0x1c8
[ 6254.128696] kvm_arch_vcpu_ioctl_run+0x224/0x5b8
[ 6254.133300] kvm_vcpu_ioctl+0x28c/0x9d0
[ 6254.137123] __arm64_sys_ioctl+0xa8/0xf0
[ 6254.141033] invoke_syscall.constprop.0+0x7c/0xd0
[ 6254.145725] do_el0_svc+0xb4/0xd0
[ 6254.149028] el0_svc+0x50/0x228
[ 6254.152157] el0t_64_sync_handler+0x134/0x150
[ 6254.156501] el0t_64_sync+0x17c/0x180
[ 6254.160151] ---[ end trace 0000000000000000 ]---
[ 6254.164766] ------------[ cut here ]------------
[ 6254.169370] WARNING: CPU: 64 PID: 18885 at lib/xarray.c:1010 xas_split_alloc+0xf8/0x128
[ 6254.177361] Modules linked in: loop isofs cdrom vhost_net vhost vhost_iotlb tap tun bluetooth tls nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat ast drm_shmem_helper drm_kms_helper acpi_ipmi ipmi_ssif arm_spe_pmu ipmi_devintf ipmi_msghandler arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq drm fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c crct10dif_ce ghash_ce igb sha2_ce sha256_arm64 sha1_ce sbsa_gwdt i2c_designware_platform i2c_algo_bit i2c_designware_core xgene_hwmon sg dm_mirror dm_region_hash dm_log dm_mod

Tested-by: Zhenyu Zhang

On Mon, Sep 4, 2023 at 1:20 AM Joel Fernandes (Google) wrote:
>
> Hello!
>
> Here is v6 of the mremap start address optimization / fix for exec warning.
> Should be hopefully final now and only 2/7 and 6/7 need a tag. Thanks a lot to
> Lorenzo and Linus for the detailed reviews.
>
> Description of patches
> ======================
> These patches optimizes the start addresses in move_page_tables() and tests the
> changes. It addresses a warning [1] that occurs due to a downward, overlapping
> move on a mutually-aligned offset within a PMD during exec. By initiating the
> copy process at the PMD level when such alignment is present, we can prevent
> this warning and speed up the copying process at the same time. Linus Torvalds
> suggested this idea. Check the individual patches for more details.
> [1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
>
> History of patches:
> v5->v6:
> 1. Reworking the stack case a bit more and tested it (should be final now).
> 2. Other small nits.
>
> v4->v5:
> 1. Rebased on mainline.
> 2. Several improvement suggestions from Lorenzo.
>
> v3->v4:
> 1. Care to be taken to move purely within a VMA, in other words this check
>    in call_align_down():
>     if (vma->vm_start != addr_masked)
>             return false;
>
>     As an example of why this is needed:
>     Consider the following range which is 2MB aligned and is
>     a part of a larger 10MB range which is not shown. Each
>     character is 256KB below making the source and destination
>     2MB each. The lower case letters are moved (s to d) and the
>     upper case letters are not moved.
>
>     |DDDDddddSSSSssss|
>
>     If we align down 'ssss' to start from the 'SSSS', we will end up destroying
>     SSSS. The above if statement prevents that and I verified it.
>
>     I also added a test for this in the last patch.
>
> 2. Handle the stack case separately. We do not care about #1 for stack movement
>    because the 'SSSS' does not matter during this move. Further we need to do this
>    to prevent the stack move warning.
>
>     if (!for_stack && vma->vm_start <= addr_masked)
>             return false;
>
> v2->v3:
> 1. Masked address was stored in int, fixed it to unsigned long to avoid truncation.
> 2. We now handle moves happening purely within a VMA, a new test is added to handle this.
> 3. More code comments.
>
> v1->v2:
> 1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing
> that it works correctly.
>
> 2. Fix issue with bogus return value found by Linus if we broke out of the
> above loop for the first PMD itself.
>
> v1: Initial RFC.
>
> Joel Fernandes (1):
>   selftests: mm: Add a test for moving from an offset from start of
>     mapping
>
> Joel Fernandes (Google) (6):
>   mm/mremap: Optimize the start addresses in move_page_tables()
>   mm/mremap: Allow moves within the same VMA for stack moves
>   selftests: mm: Fix failure case when new remap region was not found
>   selftests: mm: Add a test for mutually aligned moves > PMD size
>   selftests: mm: Add a test for remapping to area immediately after
>     existing mapping
>   selftests: mm: Add a test for remapping within a range
>
>  fs/exec.c                                |   2 +-
>  include/linux/mm.h                       |   2 +-
>  mm/mremap.c                              |  73 +++++-
>  tools/testing/selftests/mm/mremap_test.c | 301 +++++++++++++++++++----
>  4 files changed, 329 insertions(+), 49 deletions(-)
>
> --
> 2.42.0.283.g2d96d420d3-goog
>
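
For anyone following the v3->v4 note in the quoted cover letter, here is a minimal userspace sketch of the align-down guard it describes. It models only the quoted condition "if (!for_stack && vma->vm_start <= addr_masked) return false;". The struct, the function names, the 2MB PMD size and the base address are assumptions made for illustration; none of them are taken from mm/mremap.c, and the real helper may need further checks (for example against neighbouring VMAs) that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

#define SZ_256K_SKETCH  (256UL << 10)
#define PMD_SIZE_SKETCH (2UL << 20)                /* assumption: 2MB PMD */
#define PMD_MASK_SKETCH (~(PMD_SIZE_SKETCH - 1))

/* Hypothetical stand-in for the two VMA fields the quoted check reads. */
struct vma_sketch {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Illustrative only: returns false when aligning addr down to a PMD
 * boundary would start the move inside this VMA, on bytes that are
 * not supposed to move. Stack moves skip this restriction, as the
 * quoted note says.
 */
bool align_down_allowed_sketch(const struct vma_sketch *vma,
			       unsigned long addr, bool for_stack)
{
	unsigned long addr_masked = addr & PMD_MASK_SKETCH;

	if (!for_stack && vma->vm_start <= addr_masked)
		return false;

	return true;
}

/* Lays out |DDDDddddSSSSssss| at the start of a 10MB VMA, 256KB per character. */
void align_down_example(void)
{
	const unsigned long base = 0x40000000UL;       /* 2MB aligned, arbitrary */
	const struct vma_sketch vma = { base, base + (10UL << 20) };
	/* 'ssss' begins after DDDDdddd (2MB) and SSSS (4 x 256KB). */
	const unsigned long ssss = base + PMD_SIZE_SKETCH + 4 * SZ_256K_SKETCH;

	/* Aligning 'ssss' down lands on 'SSSS', which lives in the same VMA. */
	assert((ssss & PMD_MASK_SKETCH) == base + PMD_SIZE_SKETCH);
	assert(!align_down_allowed_sketch(&vma, ssss, false));

	/* The stack case does not care about 'SSSS'. */
	assert(align_down_allowed_sketch(&vma, ssss, true));
}
```

Calling align_down_example() from a small main() exercises both cases from the cover letter: the ordinary move is refused because the aligned-down start would cover 'SSSS', while the stack move is allowed.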