Message-ID: <108c788a-0742-4957-aaa3-6e2e257d11bd@redhat.com>
Date: Fri, 11 Oct 2024 16:32:50 +1000
Subject: Re: Possible regression with file madvise(MADV_COLLAPSE)
To: Avi Kivity, linux-mm
From: Gavin Shan
In-Reply-To: <8ac28fb858a2394cc72c3dc5924f1fd031fc6fe0.camel@scylladb.com>

Hi Avi,

On 10/10/24 1:54 AM, Avi Kivity wrote:
> On Linux 6.10.10 with
> CONFIG_READ_ONLY_THP_FOR_FS=y,
> madvise(MADV_COLLAPSE) on program text fails with EINVAL.
>
> To reproduce, compile the reproducer with
>
> clang -g -o text-hugepage text-hugepage.c \
>     -fuse-ld=lld \
>     -Wl,-zcommon-page-size=2097152 -Wl,-zmax-page-size=2097152 \
>     -Wl,-z,separate-loadable-segments
>
> and run:
>
> $ strace -e trace=madvise ./text-hugepage
> madvise(0x400000, 2097152, MADV_HUGEPAGE) = 0
> madvise(0x400000, 2097152, MADV_POPULATE_READ) = 0
> madvise(0x400000, 2097152, MADV_COLLAPSE) = -1 EINVAL (Invalid argument)
>
> (the funky linker options are needed to make sure the .text vma spans a
> hugepage).
>
>
> I say "possible regression" since I haven't tried it with an older
> kernel, but I believe it worked at some point or other seeing that
> others managed to get it to work.
>
> ==== text-hugepage.c ====
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> #include <sys/mman.h>
>
> static
> void
> try_remap_text_segment() {
>     FILE *fp = fopen("/proc/self/maps", "r");
>     if (!fp) {
>         return;
>     }
>     char *buf = NULL;
>     size_t n = 0;
>     while (getline(&buf, &n, fp) >= 0) {
>         char *lstart = buf;
>         char *lmid = strchr(lstart, '-');
>         if (!lmid) {
>             continue;
>         }
>         *lmid++ = '\0';
>         char *lend = strchr(lmid, ' ');
>         if (!lend) {
>             continue;
>         }
>         *lend = '\0';
>
>         size_t start = strtoul(lstart, NULL, 16);
>         size_t end = strtoul(lmid, NULL, 16);
>         uintptr_t some_text_addr = (uintptr_t)&try_remap_text_segment;
>         if (some_text_addr >= start && some_text_addr < end) {
>             end &= ~(uintptr_t)0x1fffff;
>             madvise((void*)start, end - start, MADV_HUGEPAGE);
>             madvise((void*)start, end - start, MADV_POPULATE_READ);
>             madvise((void*)start, end - start, MADV_COLLAPSE);
>             break;
>         }
>     }
>     free(buf);
>     fclose(fp);
> }
>
> void
> huge_function() {
>     // Make sure .text has a huge page full of stuff
>     asm volatile (".fill 4000000, 1, 0x90");
> }
>
> int
> main() {
>     try_remap_text_segment();
> }
> ==== end text-hugepage.c ====
>

I'm able to reproduce the issue with the upstream kernel
(v6.12-rc2) on ARM64, where the base page size is 4KB. I looked into the
issue because of commit d659b715e94a ("mm/huge_memory: avoid PMD-size page
cache if needed"), which makes madvise(MADV_COLLAPSE) fail with -EINVAL on
ARM64 when the base page size is 64KB.

To reproduce the issue, I have to drop the clean page cache and recompile
the test program every time.

[root@dhcp-10-26-1-237 issue]# cat Makefile
default:
	@echo 1 > /proc/sys/vm/drop_caches
	@gcc test.c -o test
	./test
[root@dhcp-10-26-1-237 issue]# make
./test
test: test.c:54: try_remap_text_segment: Assertion `ret == 0' failed.   <<< Error from madvise(MADV_COLLAPSE)
make: *** [Makefile:4: default] Aborted (core dumped)

I traced it a bit and found that SCAN_FAIL is returned, as the following
call trace shows. However, the program ("test") is opened read-only, so I
don't understand how PG_dirty gets set.

Backtrace
=========
sys_madvise
  do_madvise
    madvise_behavior_valid
    madvise_walk_vmas
      madvise_vma_behavior
        can_modify_vma_madv
        madvise_collapse
          thp_vma_allowable_order
          hpage_collapse_scan_file
            collapse_file
              folio_test_dirty        # SCAN_FAIL returned here

Snapshot of /proc/`pidof test`/smaps before calling madvise(MADV_COLLAPSE):

[root@dhcp-10-26-1-237 issue]# cat /proc/`pidof test`/smaps | head -n 25
00400000-00600000 r-xp 00000000 fd:05 101812754   /home/gavin/sandbox/issue/test
Size:               2048 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                2048 kB
Pss:                2048 kB
Pss_Dirty:             0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:      2048 kB
Private_Dirty:         0 kB
Referenced:         2048 kB
Anonymous:             0 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           1
VmFlags: rd ex mr mw me hg

Thanks,
Gavin
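The reproducer above rounds only `end` down to a 2 MiB boundary and relies on the linker flags to make `start` aligned. A minimal sketch of a helper that shrinks an arbitrary text VMA to its largest PMD-aligned subrange, assuming a 2 MiB PMD size (4KB base pages on x86-64 or ARM64); `pmd_aligned_range` and `ASSUMED_PMD_SIZE` are hypothetical names, not libc or kernel APIs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PMD size: 2 MiB, i.e. 4KB base pages on x86-64 or ARM64. */
#define ASSUMED_PMD_SIZE ((uintptr_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: shrink [start, end) to the largest subrange whose
 * bounds are both multiples of ASSUMED_PMD_SIZE. Returns false if no whole
 * PMD-sized block fits inside the range.
 */
static bool pmd_aligned_range(uintptr_t start, uintptr_t end,
                              uintptr_t *out_start, uintptr_t *out_end)
{
    const uintptr_t mask = ASSUMED_PMD_SIZE - 1;

    /* Rounding start up would wrap around near the top of the address space. */
    if (start > UINTPTR_MAX - mask)
        return false;

    uintptr_t s = (start + mask) & ~mask; /* round start up */
    uintptr_t e = end & ~mask;            /* round end down */
    if (s >= e)
        return false;

    *out_start = s;
    *out_end = e;
    return true;
}
```

In try_remap_text_segment() the result would then be passed as `(void *)s, e - s` to each of the three madvise() calls instead of `start, end - start`, so the collapse request never covers a partial huge page at either end.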
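For context on commit d659b715e94a and the 64KB case: assuming 8-byte page-table entries as on ARM64, one page-table page of `page_size` bytes holds `page_size / 8` entries, so a PMD entry covers that many base pages and the PMD size grows with the square of the base page size. A small sketch of that arithmetic; `pmd_size_for` and `pmd_order_for` are hypothetical names:

```c
#include <stdint.h>

/*
 * Assumes 8-byte page-table entries (as on ARM64): a page-table page of
 * page_size bytes holds page_size / 8 entries, and a PMD entry maps that
 * many base pages.
 */
static uint64_t pmd_size_for(uint64_t page_size)
{
    return (page_size / 8) * page_size;
}

/* Order of a PMD-sized folio: log2 of the number of base pages it covers. */
static unsigned int pmd_order_for(uint64_t page_size)
{
    uint64_t pages = page_size / 8;
    unsigned int order = 0;

    while ((UINT64_C(1) << order) < pages)
        order++;
    return order;
}
```

With 4KB pages this gives 2 MiB (order 9), with 16KB pages 32 MiB (order 11), and with 64KB pages 512 MiB (order 13), which is the size of the "PMD-size page cache" folio named in the commit title.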
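One explanation worth testing for the PG_dirty question (a hypothesis, not something established above): the Makefile compiles `test` immediately before running it, so the binary's page-cache folios were just written by the toolchain and may not have been written back yet. Opening the file read-only at exec time does not clean them, and `echo 1 > /proc/sys/vm/drop_caches` only frees clean page cache, never dirty pages. If that is what happens, writing the executable back before the madvise() calls would be expected to get past the dirty check in collapse_file. A minimal sketch; `flush_own_executable` is a hypothetical helper name:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write back any dirty page-cache pages of the
 * running executable before asking for MADV_COLLAPSE. On Linux, fsync()
 * does not require a writable descriptor, so O_RDONLY is enough.
 * Returns 0 on success, -1 if open() or fsync() fails.
 */
static int flush_own_executable(void)
{
    int fd = open("/proc/self/exe", O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = fsync(fd);
    close(fd);
    return ret;
}
```

It would be called at the top of try_remap_text_segment(). Running `sync` between the gcc step and `./test` in the Makefile would test the same idea without changing the program.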