From: Yang Shi
Date: Sat, 12 Oct 2024 16:50:32 -0700
Subject: Re: Possible regression with file madvise(MADV_COLLAPSE)
To: Avi Kivity
Cc: linux-mm, Baolin Wang
References: <8ac28fb858a2394cc72c3dc5924f1fd031fc6fe0.camel@scylladb.com> <04a11431e9edffb85470da3611c287cb7caf3281.camel@scylladb.com>
In-Reply-To: <04a11431e9edffb85470da3611c287cb7caf3281.camel@scylladb.com>

On Sat, Oct 12, 2024 at 1:24 PM Avi Kivity wrote:
>
> On Sat, 2024-10-12 at 13:05 -0700, Yang Shi wrote:
> > On Sat, Oct 12, 2024 at 8:38 AM Avi Kivity wrote:
> > >
> > > On Fri, 2024-10-11 at 15:29 -0700, Yang Shi wrote:
> > > > On Wed, Oct 9, 2024 at 9:04 AM Avi Kivity wrote:
> > > > >
> > > > > On Linux 6.10.10 with CONFIG_READ_ONLY_THP_FOR_FS=y,
> > > > > madvise(MADV_COLLAPSE) on program text fails with EINVAL.
> > > > >
> > > > > To reproduce, compile the reproducer with
> > > > >
> > > > >   clang -g -o text-hugepage text-hugepage.c \
> > > > >       -fuse-ld=lld \
> > > > >       -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
> > > > >       -Wl,-z,separate-loadable-segments
> > > > >
> > > > > and run:
> > > >
> > > > Didn't clang make the page cache dirty?
> > > >
> > > > Having sync between clang and the execution made the problem go away
> > > > for me.
> > > >
> > >
> > > I see it even with sync (and msync just before the madvise calls).
> >
> > Did you stop khugepaged? It may race with MADV_COLLAPSE. If it failed
> > due to race with khugepaged, you should see -EAGAIN instead of
> > -EINVAL.
>
>
> I did not, but I don't imagine I hit the race in all my attempts.
>
> >
> > I did the below commands in a loop for 1000 times, it never failed (I
> > modified the test program a little bit to print out failure if
> > MADV_COLLAPSE returns failure). I had khugepaged stopped and ran the
> > test on v6.12-rc1 kernel on my AmpereOne machine.
> >
> > rm text-hugepage
> > clang -g -o text-hugepage text-hugepage.c -fuse-ld=lld \
> >     -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
> >     -Wl,-z,separate-loadable-segments
> > sync
> > ./text-hugepage
> >
> > > >
> > > > > Tracing shows this (last lines before syscall exit):
> > >
> > > >        hpage_collapse_scan_file() {
> > > >            __rcu_read_lock();
> > > >            __rcu_read_unlock();
> > > >        }
> >
> > It meant collapse_file() was not called at all.
> > hpage_collapse_scan_file() failed. A couple of reasons may fail it,
> > for example, refcount is not expected, not on lru, etc. You can trace
> > huge_memory:mm_khugepaged_scan_file to get more information about the
> > failure.
>
>
> text-hugepage-689146 [023] 200457.073794: mm_khugepaged_scan_file:
> mm=0xffff92fc512aac00, scan_pfn=0x5a4310, filename=text-hugepage,
> present=0, swap=0, result=page_compound

Aha, it is because v6.10 doesn't support collapsing non-PMD-order large
folios. It has been fixed in v6.12-rc1. The patch series is:
https://lore.kernel.org/all/cover.1724140601.git.baolin.wang@linux.alibaba.com/

The subject says "shmem", but it actually works for regular files too.

>
>
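
For reference, here is a rough C sketch of the kind of "print out failure"
check mentioned above, using the errno readings suggested in this thread
(-EAGAIN for a possible race with khugepaged, -EINVAL when the scan rejects
the range). The helper names collapse_hint() and try_collapse() are made up
for illustration and are not part of the reproducer or of any kernel or libc
API. The fallback value for MADV_COLLAPSE is the asm-generic UAPI constant,
defined here only in case the installed libc headers predate it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* asm-generic UAPI value; older libc headers may lack it */
#endif

/* Hypothetical helper: map an errno from madvise(MADV_COLLAPSE) to the
 * reading suggested in this thread. Not a kernel or libc API. */
static const char *collapse_hint(int err)
{
	switch (err) {
	case 0:
		return "collapsed";
	case EAGAIN:
		return "transient; may be a race with khugepaged, retry";
	case EINVAL:
		return "range rejected; trace huge_memory:mm_khugepaged_scan_file";
	default:
		return "other failure";
	}
}

/* Hypothetical helper: try to collapse [addr, addr + len) and print the
 * outcome. Returns 0 on success, otherwise the errno set by madvise(). */
static int try_collapse(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLLAPSE) == 0) {
		printf("MADV_COLLAPSE: %s\n", collapse_hint(0));
		return 0;
	}

	int err = errno; /* save before any call that could overwrite errno */
	fprintf(stderr, "MADV_COLLAPSE: %s (%s)\n", strerror(err),
		collapse_hint(err));
	return err;
}
```

A caller would pass a 2 MiB-aligned address inside the text mapping and a
length that is a multiple of 2 MiB; how that address is found is up to the
test program. Note that EINVAL can also simply mean the running kernel does
not know MADV_COLLAPSE at all, so the tracepoint is still the more reliable
source for the actual scan result.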