From: Avi Kivity
To: Yang Shi
Cc: linux-mm, Baolin Wang
Subject: Re: Possible regression with file madvise(MADV_COLLAPSE)
Date: Sun, 13 Oct 2024 14:04:51 +0300
Message-ID: <9ebd0aad60ca97fd3987fc4730dee0e3e021929c.camel@scylladb.com>

On Sat, 2024-10-12 at 16:50 -0700, Yang Shi wrote:
> On Sat, Oct 12, 2024 at 1:24 PM Avi Kivity wrote:
> >
> > On Sat, 2024-10-12 at 13:05 -0700, Yang Shi wrote:
> > > On Sat, Oct 12, 2024 at 8:38 AM Avi Kivity
> > > wrote:
> > > >
> > > > On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> > > > > On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity
> > > > > wrote:
> > > > > >
> > > > > > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > > > > > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> > > > > >
> > > > > > To reproduce, compile the reproducer with
> > > > > >
> > > > > > clang -g -o text-hugepage text-hugepage.c \
> > > > > >         -fuse-ld=lld \
> > > > > >         -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
> > > > > >         -Wl,-z,separate-loadable-segments
> > > > > >
> > > > > > and run:
> > > > >
> > > > > Didn't clang make the page cache dirty?
> > > > >
> > > > > Having sync between clang and the execution made the problem go
> > > > > away for me.
> > > > >
> > > >
> > > > I see it even with sync (and msync just before the madvise
> > > > calls).
> > >
> > > Did you stop khugepaged? It may race with MADV_COLLAPSE. If it
> > > failed due to a race with khugepaged, you should see -EAGAIN
> > > instead of -EINVAL.
> >
> >
> > I did not, but I don't imagine I hit the race in all my attempts.
> >
> > >
> > > I ran the commands below in a loop 1000 times and it never failed
> > > (I modified the test program slightly to print a message if
> > > MADV_COLLAPSE returns failure). I had khugepaged stopped and ran
> > > the test on a v6.12-rc1 kernel on my AmpereOne machine.
> > >
> > > rm text-hugepage
> > > clang -g -o text-hugepage text-hugepage.c -fuse-ld=lld \
> > >     -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
> > >     -Wl,-z,separate-loadable-segments
> > > sync
> > > ./text-hugepage
> > >
> > > >
> > > >
> > > > Tracing shows this (last lines before syscall exit):
> > > >
> > > > >         hpage_collapse_scan_file() {
> > > > >           __rcu_read_lock();
> > > > >           __rcu_read_unlock();
> > > > >         }
> > >
> > > This means collapse_file() was not called at all:
> > > hpage_collapse_scan_file() failed. It can fail for a couple of
> > > reasons, for example an unexpected refcount or a page not on the
> > > LRU. You can trace huge_memory:mm_khugepaged_scan_file to get more
> > > information about the failure.
> >
> >
> >    text-hugepage-689146 [023] 200457.073794: mm_khugepaged_scan_file: mm=0xffff92fc512aac00, scan_pfn=0x5a4310, filename=text-hugepage, present=0, swap=0, result=page_compound
>
> Aha, it is because v6.10 doesn't support collapsing non-PMD-order
> large folios. It has been fixed in v6.12-rc1. The patch series is:
> https://lore.kernel.org/all/cover.1724140601.git.baolin.wang@linux.alibaba.com/
>
> The subject says "shmem", but it actually works for regular files
> too.

Thanks a lot. I will retest when 6.12 reaches Fedora testing.