From: Nico Pache
Date: Fri, 14 Feb 2025 18:47:13 -0700
Subject: Re: [PATCH v2 00/17] khugepaged: Asynchronous mTHP collapse
To: Dev Jain
Cc: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org,
 kirill.shutemov@linux.intel.com, ryan.roberts@arm.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org,
 vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org,
 jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com,
 hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
 peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
 ziy@nvidia.com, jglisse@google.com, surenb@google.com,
 vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com,
 jhubbard@nvidia.com, 21cnbao@gmail.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20250211111326.14295-1-dev.jain@arm.com>
In-Reply-To: <20250211111326.14295-1-dev.jain@arm.com>

Hi Dev,

I tried to run your kernel to get some performance numbers out of it,
but ran into the following issue while running my defer-mthp-test.sh
workload.

[ 297.393032] =====================================
[ 297.393618] WARNING: bad unlock balance detected!
[ 297.394201] 6.14.0-rc2mthpDEV #2 Not tainted
[ 297.394732] -------------------------------------
[ 297.395421] khugepaged/111 is trying to release lock (&mm->mmap_lock) at:
[ 297.396509] [] khugepaged+0x23a/0xb40
[ 297.397205] but there are no more locks to release!
[ 297.397865]
[ 297.397865] other info that might help us debug this:
[ 297.398684] no locks held by khugepaged/111.
[ 297.399155]
[ 297.399155] stack backtrace:
[ 297.399591] CPU: 10 UID: 0 PID: 111 Comm: khugepaged Not tainted 6.14.0-rc2mthpDEV #2
[ 297.399593] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 297.399595] Call Trace:
[ 297.399599]
[ 297.399602] dump_stack_lvl+0x6e/0xa0
[ 297.399607] ? khugepaged+0x23a/0xb40
[ 297.399610] print_unlock_imbalance_bug.part.0+0xfb/0x110
[ 297.399612] ? khugepaged+0x23a/0xb40
[ 297.399614] lock_release+0x283/0x3f0
[ 297.399620] up_read+0x1b/0x30
[ 297.399622] khugepaged+0x23a/0xb40
[ 297.399631] ? __pfx_khugepaged+0x10/0x10
[ 297.399633] kthread+0xf2/0x240
[ 297.399636] ? __pfx_kthread+0x10/0x10
[ 297.399638] ret_from_fork+0x34/0x50
[ 297.399640] ? __pfx_kthread+0x10/0x10
[ 297.399642] ret_from_fork_asm+0x1a/0x30
[ 297.399649]
[ 297.505555] ------------[ cut here ]------------
[ 297.506044] DEBUG_RWSEMS_WARN_ON(tmp < 0): count = 0xffffffffffffff00, magic = 0xffff8c6e03bc1f88, owner = 0x1, curr 0xffff8c6e0eccb700, list empty
[ 297.507362] WARNING: CPU: 8 PID: 1946 at kernel/locking/rwsem.c:1346 __up_read+0x1ba/0x220
[ 297.508220] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill nf_tables intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 i2c_smbus lpc_ich virtio_net net_failover failover virtio_balloon joydev fuse loop nfnetlink zram xfs polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 virtio_console virtio_blk sha1_ssse3 serio_raw qemu_fw_cfg
[ 297.513474] CPU: 8 UID: 0 PID: 1946 Comm: thp_test Not tainted 6.14.0-rc2mthpDEV #2
[ 297.514314] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 297.515265] RIP: 0010:__up_read+0x1ba/0x220
[ 297.515756] Code: c6 78 8b e1 95 48 c7 c7 88 0e d3 95 48 39 c2 48 c7 c2 be 39 e4 95 48 c7 c0 29 8b e1 95 48 0f 44 c2 48 8b 13 50 e8 e6 44 f5 ff <0f> 0b 58 e9 20 ff ff ff 48 8b 57 60 48 8d 47 60 4c 8b 47 08 c6 05
[ 297.517659] RSP: 0018:ffffa8a943533ac8 EFLAGS: 00010282
[ 297.518209] RAX: 0000000000000000 RBX: ffff8c6e03bc1f88 RCX: 0000000000000000
[ 297.518884] RDX: ffff8c7366ff0980 RSI: ffff8c7366fe1a80 RDI: ffff8c7366fe1a80
[ 297.519577] RBP: ffffa8a943533b58 R08: 0000000000000000 R09: 0000000000000001
[ 297.520272] R10: 0000000000000000 R11: 0770076d07650720 R12: ffffa8a943533b10
[ 297.520949] R13: ffff8c6e03bc1f88 R14: ffffa8a943533b58 R15: ffffa8a943533b10
[ 297.521651] FS: 00007f24de01b740(0000) GS:ffff8c7366e00000(0000) knlGS:0000000000000000
[ 297.522425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 297.522990] CR2: 0000000a7ffef000 CR3: 000000010d9d6000 CR4: 0000000000750ef0
[ 297.523799] PKRU: 55555554
[ 297.524100] Call Trace:
[ 297.524367]
[ 297.524597] ? __warn.cold+0xb7/0x151
[ 297.525072] ? __up_read+0x1ba/0x220
[ 297.525442] ? report_bug+0xff/0x140
[ 297.525804] ? console_unlock+0x9d/0x150
[ 297.526233] ? handle_bug+0x58/0x90
[ 297.526590] ? exc_invalid_op+0x17/0x70
[ 297.526993] ? asm_exc_invalid_op+0x1a/0x20
[ 297.527420] ? __up_read+0x1ba/0x220
[ 297.527783] ? __up_read+0x1ba/0x220
[ 297.528160] vms_complete_munmap_vmas+0x19c/0x1f0
[ 297.528628] do_vmi_align_munmap+0x20a/0x280
[ 297.529069] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.529552] do_vmi_munmap+0xd0/0x190
[ 297.529920] __vm_munmap+0xb1/0x1b0
[ 297.530293] __x64_sys_munmap+0x1b/0x30
[ 297.530677] do_syscall_64+0x95/0x180
[ 297.531058] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.531534] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.532167] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.532640] ? syscall_exit_to_user_mode+0x97/0x290
[ 297.533226] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.533701] ? do_syscall_64+0xa1/0x180
[ 297.534097] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.534587] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.535129] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.535603] ? syscall_exit_to_user_mode+0x97/0x290
[ 297.536092] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.536568] ? do_syscall_64+0xa1/0x180
[ 297.536954] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.537444] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.537936] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.538524] ? syscall_exit_to_user_mode+0x97/0x290
[ 297.539044] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.539526] ? do_syscall_64+0xa1/0x180
[ 297.539931] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.540597] ? do_user_addr_fault+0x5a9/0x8a0
[ 297.541102] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.541580] ? trace_hardirqs_off+0x4b/0xc0
[ 297.542011] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.542488] ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 297.542991] ? srso_alias_return_thunk+0x5/0xfbef5
[ 297.543466] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 297.543960] RIP: 0033:0x7f24de1367eb
[ 297.544344] Code: 73 01 c3 48 8b 0d 2d f6 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd f5 0c 00 f7 d8 64 89 01 48
[ 297.546074] RSP: 002b:00007ffc7bb2e2b8 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
[ 297.546796] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f24de1367eb
[ 297.547488] RDX: 0000000080000000 RSI: 0000000080000000 RDI: 0000000480000000
[ 297.548182] RBP: 00007ffc7bb2e390 R08: 0000000000000064 R09: 00000000fffffffe
[ 297.548884] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000006
[ 297.549594] R13: 0000000000000000 R14: 00007f24de258000 R15: 0000000000403e00
[ 297.550292]
[ 297.550530] irq event stamp: 64417291
[ 297.550903] hardirqs last enabled at (64417291): [] seqcount_lockdep_reader_access+0x82/0x90
[ 297.551859] hardirqs last disabled at (64417290): [] seqcount_lockdep_reader_access+0x4e/0x90
[ 297.552810] softirqs last enabled at (64413640): [] __irq_exit_rcu+0xe2/0x100
[ 297.553654] softirqs last disabled at (64413627): [] __irq_exit_rcu+0xe2/0x100
[ 297.554504] ---[ end trace 0000000000000000 ]---

On Tue, Feb 11, 2025 at 4:13 AM Dev Jain wrote:
>
> This patchset extends khugepaged from collapsing only PMD-sized THPs to
> collapsing anonymous mTHPs.
>
> mTHPs were introduced in the kernel to improve memory management by allocating
> chunks of larger memory, so as to reduce number of page faults, TLB misses (due
> to TLB coalescing), reduce length of LRU lists, etc. However, the mTHP property
> is often lost due to CoW, swap-in/out, and when the kernel just cannot find
> enough physically contiguous memory to allocate on fault.
> Henceforth, there is a
> need to regain mTHPs in the system asynchronously. This work is an attempt in
> this direction, starting with anonymous folios.
>
> In the fault handler, we select the THP order in a greedy manner; the same has
> been used here, along with the same sysfs interface to control the order of
> collapse. In contrast to PMD-collapse, we (hopefully) get rid of the mmap_write_lock().
>
> ---------------------------------------------------------
> Testing
> ---------------------------------------------------------
>
> The set has been build tested on x86_64.
> For Aarch64,
> 1. mm-selftests: No regressions.
> 2. Analyzing with tools/mm/thpmaps on different userspace programs mapping
>    aligned VMAs of a large size, faulting in basepages/mTHPs (according to sysfs),
>    and then madvise()'ing the VMA, khugepaged is able to 100% collapse the VMAs.
>
> This patchset is rebased on mm-unstable (4637fa5d47a49c977116321cc575ea22215df22d).
>
> v1->v2:
>  - Handle VMAs less than PMD size (patches 12-15)
>  - Do not add mTHP into deferred split queue
>  - Drop lock optimization and collapse mTHP under mmap_write_lock()
>  - Define policy on what to do when we encounter a folio order larger than
>    the order we are scanning for
>  - Prevent the creep problem by enforcing tunable simplification
>  - Update Documentation
>  - Drop patch 12 from v1 updating selftest w.r.t the creep problem
>  - Drop patch 1 from v1
>
> v1:
> https://lore.kernel.org/all/20241216165105.56185-1-dev.jain@arm.com/
>
> Dev Jain (17):
>   khugepaged: Generalize alloc_charge_folio()
>   khugepaged: Generalize hugepage_vma_revalidate()
>   khugepaged: Generalize __collapse_huge_page_swapin()
>   khugepaged: Generalize __collapse_huge_page_isolate()
>   khugepaged: Generalize __collapse_huge_page_copy()
>   khugepaged: Abstract PMD-THP collapse
>   khugepaged: Scan PTEs order-wise
>   khugepaged: Introduce vma_collapse_anon_folio()
>   khugepaged: Define collapse policy if a larger folio is already mapped
>   khugepaged: Exit early on fully-mapped aligned mTHP
>   khugepaged: Enable sysfs to control order of collapse
>   khugepaged: Enable variable-sized VMA collapse
>   khugepaged: Lock all VMAs mapping the PTE table
>   khugepaged: Reset scan address to correct alignment
>   khugepaged: Delay cond_resched()
>   khugepaged: Implement strict policy for mTHP collapse
>   Documentation: transhuge: Define khugepaged mTHP collapse policy
>
>  Documentation/admin-guide/mm/transhuge.rst |  49 +-
>  include/linux/huge_mm.h                    |   2 +
>  mm/huge_memory.c                           |   4 +
>  mm/khugepaged.c                            | 603 ++++++++++++++++-----
>  4 files changed, 511 insertions(+), 147 deletions(-)
>
> --
> 2.30.2
>
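
A side note on the second splat: the count printed by DEBUG_RWSEMS_WARN_ON(tmp < 0) can be decoded by hand. The sketch below is only an illustration, not kernel code. It assumes the rwsem reader bias is 1 << 8 (RWSEM_READER_SHIFT of 8, as I understand kernel/locking/rwsem.c; please check against the tree being run), and the helper names to_signed64() and decode_rwsem_count() are made up for this example.

```python
# Decode the rwsem count from the DEBUG_RWSEMS_WARN_ON line of the splat.
# Assumption: readers are counted in units of 1 << 8 (RWSEM_READER_SHIFT == 8)
# and the low 8 bits hold flag bits. Verify against your kernel tree.
RWSEM_READER_SHIFT = 8
RWSEM_READER_BIAS = 1 << RWSEM_READER_SHIFT
RWSEM_FLAG_MASK = RWSEM_READER_BIAS - 1


def to_signed64(value: int) -> int:
    """Interpret a raw 64-bit value as a signed long (two's complement)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


def decode_rwsem_count(raw: int) -> dict:
    """Split a raw count into its signed value, reader count and flag bits."""
    signed = to_signed64(raw)
    return {
        "signed": signed,
        # Python's >> on a negative int is an arithmetic shift, so the
        # sign of an underflowed reader count is preserved.
        "readers": signed >> RWSEM_READER_SHIFT,
        "flags": raw & RWSEM_FLAG_MASK,
    }


# Value taken from the log line "count = 0xffffffffffffff00".
count = decode_rwsem_count(0xFFFFFFFFFFFFFF00)
print(count)  # {'signed': -256, 'readers': -1, 'flags': 0}
```

Under that assumption the count sits exactly one reader bias below zero, which would be consistent with one more up_read() than down_read() on mmap_lock, matching the "bad unlock balance" report from khugepaged just before it.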