From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D81CFC36010 for ; Fri, 4 Apr 2025 12:53:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B0596280003; Fri, 4 Apr 2025 08:53:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AB4BC280001; Fri, 4 Apr 2025 08:53:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9079D280003; Fri, 4 Apr 2025 08:53:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 6ECDE280001 for ; Fri, 4 Apr 2025 08:53:20 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id B0CDB826B3 for ; Fri, 4 Apr 2025 12:53:20 +0000 (UTC) X-FDA: 83296352160.05.ABB9D0F Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) by imf01.hostedemail.com (Postfix) with ESMTP id AD5BE4000A for ; Fri, 4 Apr 2025 12:53:18 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Rcea3R6t; spf=pass (imf01.hostedemail.com: domain of richard.weiyang@gmail.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=richard.weiyang@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743771198; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BxSV5JdfZYerNlQnuHDGWvwKKG0TLYgttgt+Oowf87E=; b=nuIgMehFyvAnOoQBvAdmGcTf8QZFJDOw7hYnVSPb5YzK28wUf9PE3YBNEW8PXv0cs7i/9H twodsd01yXTxxle0DzO8LT25wLbX7lOJhM7Gkj6q20MlVBlVZ5yBsRueli9a3o0KkeRDpy xVqZHYX2QXFUl+Xfe0UbtlX85khqPOo= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Rcea3R6t; spf=pass (imf01.hostedemail.com: domain of richard.weiyang@gmail.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=richard.weiyang@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743771198; a=rsa-sha256; cv=none; b=A2dcwnQUN+eFVZf8vIGBO/i3TLkpBGYgxfnQUB4hGJKURs10Dx2Ee29eac15oWQ1LkX63g AWWhOLDRiY4OIx2Mk4FQk0BfPtA3VJCnKgoLdLfz2lNLSfQWtU0b99hbzN2cWTK2NwE6Rh WU+9vOTVjxiKiXpIE8Z33l+ex0cOW6k= Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-abbd96bef64so372934466b.3 for ; Fri, 04 Apr 2025 05:53:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743771197; x=1744375997; darn=kvack.org; h=user-agent:in-reply-to:content-disposition:mime-version:references :reply-to:message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=BxSV5JdfZYerNlQnuHDGWvwKKG0TLYgttgt+Oowf87E=; b=Rcea3R6tqgsmUrSit+tPmv7ZsuWENQvU5/8O15zZ8q72buo7+gdnJrDRxdDUUul/6O 43O4FxPzvILnjbNkoerXS431IVcAX4HWKfkvgb34Q1UN6rVaa1FIYXGyWmRPQO+vJ+by ulm6W1iub2qWkxezmXkp47gEADnCcAgsgPSMxBpu9l5HFBcL3emzmGoRcv9G7s7pinnA AHytUY70JqugM37s6AApRlTeHfjpL3RWq3HBQC2VdtL/KdRiGkHkwezP7sE4cZZfZW3O 97gJD29jzKsbqdgvSGTUqwmyq8QMCjhEwxfIk8XlUNIAnJHxV9sqYo0X7A+lUYyTtNg/ hqZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743771197; x=1744375997; h=user-agent:in-reply-to:content-disposition:mime-version:references :reply-to:message-id:subject:cc:to:from:date:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=BxSV5JdfZYerNlQnuHDGWvwKKG0TLYgttgt+Oowf87E=; b=u+elxdzfay5Xv7V3hIoz6+e/UHpFS9qhX+22WNQKvz9jH9Il32Jt4mvyIwxKtifpQn tjw9OAYNmahXnUxNupZDc7vlkEOyUnTemqq5lVtF4tmX4VrXc3ic9SmxD1jYGkibpkB6 iJZUZuy2TN1ROoATJZqdqqKvVukQJN0VNvAPBBlXrwZ1aaojtNMEOExwEpjZIYOnEged cVUmt4oqFm1yuSsFgTl0JAxiyj6TRyX/AkjLzfHbf8RvrwTPdJq6NPYZU9nGKtL9DlVp 4vsTYSMX45h5qLROmBsKFpoKzVnzQ8XJddb6S5X1x/kXXotbQZveNLJJm/1vnaoSUlhd aVdw== X-Forwarded-Encrypted: i=1; AJvYcCXhsdjkQ6xHNLT501hNj81BPEZcUYFRCGPmHySrLoWwQsX4+laUUivLSN0EBlNW2eG6y0w6RxycVg==@kvack.org X-Gm-Message-State: AOJu0YwqA2HUNcXVcuKLHF5oy+O5WPbAiEmsXQXVm+vV+GRliQ7KY3Fm 0sRgzIXWmHuehSHZqJ+/Z8jh6JKVzxDrxXJ811o3Ld+QJo2HYxow X-Gm-Gg: ASbGncvLIQdiEkWjTzmHx1phgnE7XjeBBzOtejVwmvFnLXNsOKGvwKEuqCnQ/0+SqIi 6EgP8jK6InW1iuxGbEv1pFQHdxSQh6tCaEMtrbDuV0591+4IaRKA+J+BU9SOwHaX+z46XavMXbv Z9aFkFUsZSPtPCZtnOhWE0TL/8g6DAs693BfSCz7vCw4n9eKmP2f5GsRApBKcM6FhYm1qjvv6tp 4+tg64WoYT7YHlppOrAlNx0zD0U1zQciMNhf6WBg95I4tnZGaDKL9a9l5lrTn1u1yncxqY0jhx/ J0ZMt4293Q3MvSvtpUhQPX9kBuc6T0up/AYpmYMXXqUOm/NEwahQu8w= X-Google-Smtp-Source: AGHT+IHhpmh0Zd+jSwcDgXoqq4LjVdAHEM1glKtn2rYqjD+Vi3fO5csG4iFVzNS8+2xBUDjtnt2smQ== X-Received: by 2002:a17:907:a4d:b0:ac7:3456:7ece with SMTP id a640c23a62f3a-ac7d1b28752mr343911266b.46.1743771196491; Fri, 04 Apr 2025 05:53:16 -0700 (PDT) Received: from localhost ([185.92.221.13]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-5f0880a3cc5sm2335682a12.74.2025.04.04.05.53.15 (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 04 Apr 2025 05:53:15 -0700 (PDT) Date: Fri, 4 Apr 2025 12:53:15 +0000 From: Wei Yang To: Lorenzo Stoakes Cc: Andrew Morton , Vlastimil Babka , Jann Horn , "Liam R . Howlett" , Suren Baghdasaryan , David Hildenbrand , Matthew Wilcox , Rik van Riel , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] mm/vma: fix incorrectly disallowed anonymous VMA merges Message-ID: <20250404125315.5bou5ays7u7sv4rb@master> Reply-To: Wei Yang References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170113 (1.7.2) X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: AD5BE4000A X-Stat-Signature: pmgcq4zqqm65km5wfzpffeutgnc17sjd X-HE-Tag: 1743771198-463151 X-HE-Meta: U2FsdGVkX19aFCRI6XQd2rKtPXLEJt87B8FFBdn3j0ErCl+FFEUi5OwE9GfVNSatUYUS/c9q5pQ41NOUeY4mpCDgysjdJ/MQYhVcaML+siHwZK9cO7Uyc2ka8WBnp84z1u1BE06q4Ow/W0XXRjedTZMTn+TbXkAUH0NbbEiWIKT7e6gucPZFkK1cwRYokRQPt6j2CgavYUeEFfWvbhx/kwttrFFdmaT0JD20QfChcHanwEW0KkYt6edePiacQ9hNzknyCcAxOnjW9OJdibph9pf9/g3UIQUscwOOF/hyVLMDP4uQO9lx1j7QgFWgxaQK2PUNVB6zNmDa8e7pmT7+pYwk7y/sng2+KV/wNmu5B5H1M+y6uCGiqcm96FquyoVNBkb8siv8wFyayc0LfZgcrKKUx5rMZkirHjIpzxCwBI5WT3Lu0H6hT+HeG0fDJTqg2ztv98bxIrQPWmwqZT51RmavxpGd7nKlRnbCtDLdoSZk00eUcG/ZlBBgAp5MDQnHUFkPoOH4dDD+wPbSxHlBJDyKn1IBQG9rRsAfktfd9Aa2dzeYUAFcTrRgzsqtw3K5cRAJmmpcIn8UH50K1lecAUNaA+WIEpyrSO3ufbhSG33jIJtD7MzbevpQJM2ug+IEDUGcmhabvLTm/Om10g5JqzUNgVMujAFDHd0P3NJCAyb0G0HkNlS/8wbFv9GjQxJ7FFBHemEu2tFk3JHpxJhNoM5sucsFP8mXtEAmJJcUaQ6eLQKQV134nMmvY0U86fCeSfR90UusDqPQ2J5oQcv3i14bG367HgFQ5iGoFAPXn3dNZtcaqocHgBnQrjdaPlQaHKTuM2zs1p5wkLyu+UGnL/N/WC82zfDP57I9uCPkXP8Q8UD9u7KvO2OuR4WZuXbpVXtUgAzhm4c1fLxa4tbtD8F8loO5/hiMSHjpYGEXZswHIBt6jTLomwAzXenl5DbijQKet3FJH0pVx9VtztP L/EfCGe/ 9GPFDpTowIWYGt9ktgidTYuhwpseUvdes9WLhcTmvxfeppigBhfMuKPIiKfOqJMdMBev7du3uBs19kJvF7L3kb6+EWj+W63QNNuMXIViyl3DNyd1w5oCnigKhvVAtDahAor8Te5sTk0oPPtWZEsTtKOOTkbV62N/l5lGQtj783JlKSX1zK2l5FpDctkRk7p2gzCPC3xZKI3zyq4HpEwk9AEPG3/tgykOPmPR0HfGvUpzxlwBDkSNDyImrFsq4PDxKOnm6wORgUTWCIpzqGOQRNrJaY4RXkdc8F7UldrEZ8HlAPeLet5H+w57fgQ9CDPSpifr+4hjNqHMYAjcozNFzRbxCO8TxcfKzrSmxP5QBaZntWHzvYJwJs7DAaNr3jaPZ3mzl6uyuqPehja0dOUGJ1+s/e0B6L0kF2F8ja68zNlwpOgJ3f7xM9jXsPBrwJ3zZcUFbF3q7UzzlFCcb8B0Q2C9XH6eUuD0ymTQd+moc7SugUuf9mQ13f9fZ37NOFMkbDTRiB/yiJZBBZ2JKbhBr6GZ6qJ5+yu6gqPfJrrabv7Tlpk/G0zIIzi0yDE4dsWXwemdL8GBkqNRxrrLKhRgSTO9CNjaNk+XmKuKK X-Bogosity: Ham, tests=bogofilter, spamicity=0.135917, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Mar 17, 2025 at 09:15:03PM +0000, Lorenzo Stoakes wrote: [...] >However, we have a problem here - typically the vma passed here is the >destination VMA. > >For instance in vma_merge_existing_range() we invoke: > >can_vma_merge_left() >-> [ check that there is an immediately adjacent prior VMA ] >-> can_vma_merge_after() > -> is_mergeable_vma() for general attribute check >-> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev) > >So if we were considering a target unfaulted 'prev': > > unfaulted faulted > |-----------|-----------| > | prev | vma | > |-----------|-----------| > >This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev). > >The list_is_singular() check for vma->anon_vma_chain, an empty list on >fault, would cause this merge to _fail_ even though all else indicates a >merge. > Great spot. It is hiding there for 15 years. >Equally a simple merge into a next VMA would hit the same problem: > > faulted unfaulted > |-----------|-----------| > | vma | next | > |-----------|-----------| > [...] >--- > mm/vma.c | 81 +++++++++++++++++++++++--------- > tools/testing/vma/vma.c | 100 +++++++++++++++++++++------------------- > 2 files changed, 111 insertions(+), 70 deletions(-) > >diff --git a/mm/vma.c b/mm/vma.c >index 5cdc5612bfc1..5418eef3a852 100644 >--- a/mm/vma.c >+++ b/mm/vma.c >@@ -57,6 +57,22 @@ struct mmap_state { > .state = VMA_MERGE_START, \ > } > >+/* >+ * If, at any point, the VMA had unCoW'd mappings from parents, it will maintain >+ * more than one anon_vma_chain connecting it to more than one anon_vma. A merge >+ * would mean a wider range of folios sharing the root anon_vma lock, and thus >+ * potential lock contention, we do not wish to encourage merging such that this >+ * scales to a problem. >+ */ I don't follow here. Take a look into do_wp_page(), where CoW happens. But I don't find where it will unlink parent anon_vma from vma->anon_vma_chain. Per my understanding, the unlink behavior happens in unlink_anon_vma() which unlink all anon_vma on vma->anon_vma_chain. And the normal caller of unlink_anon_vma() is free_pgtables(). Other callers are on error path to release prepared data. From this perspective, I don't see the chance to unlink parent anon_vma from vma->anon_vma_chain either. But maybe I missed something. If it is not too bother, would you mind giving me a hint? >+static bool vma_had_uncowed_parents(struct vm_area_struct *vma) >+{ >+ /* >+ * The list_is_singular() test is to avoid merging VMA cloned from >+ * parents. This can improve scalability caused by anon_vma lock. >+ */ >+ return vma && vma->anon_vma && !list_is_singular(&vma->anon_vma_chain); >+} >+ > static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_next) > { > struct vm_area_struct *vma = merge_next ? vmg->next : vmg->prev; >@@ -82,24 +98,28 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex > return true; > } > >-static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1, >- struct anon_vma *anon_vma2, struct vm_area_struct *vma) >+static bool is_mergeable_anon_vma(struct vma_merge_struct *vmg, bool merge_next) > { >+ struct vm_area_struct *tgt = merge_next ? vmg->next : vmg->prev; >+ struct vm_area_struct *src = vmg->middle; /* exisitng merge case. */ ^^^ A trivial typo here. >+ struct anon_vma *tgt_anon = tgt->anon_vma; >+ struct anon_vma *src_anon = vmg->anon_vma; >+ > /* >- * The list_is_singular() test is to avoid merging VMA cloned from >- * parents. This can improve scalability caused by anon_vma lock. >+ * We _can_ have !src, vmg->anon_vma via copy_vma(). In this instance we >+ * will remove the existing VMA's anon_vma's so there's no scalability >+ * concerns. > */ >- if ((!anon_vma1 || !anon_vma2) && (!vma || >- list_is_singular(&vma->anon_vma_chain))) >- return true; >- return anon_vma1 == anon_vma2; >-} >+ VM_WARN_ON(src && src_anon != src->anon_vma); > >-/* Are the anon_vma's belonging to each VMA compatible with one another? */ >-static inline bool are_anon_vmas_compatible(struct vm_area_struct *vma1, >- struct vm_area_struct *vma2) >-{ >- return is_mergeable_anon_vma(vma1->anon_vma, vma2->anon_vma, NULL); >+ /* Case 1 - we will dup_anon_vma() from src into tgt. */ >+ if (!tgt_anon && src_anon) >+ return !vma_had_uncowed_parents(src); >+ /* Case 2 - we will simply use tgt's anon_vma. */ >+ if (tgt_anon && !src_anon) >+ return !vma_had_uncowed_parents(tgt); >+ /* Case 3 - the anon_vma's are already shared. */ >+ return src_anon == tgt_anon; > } > > /* >@@ -164,7 +184,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg) > pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start); > > if (is_mergeable_vma(vmg, /* merge_next = */ true) && >- is_mergeable_anon_vma(vmg->anon_vma, vmg->next->anon_vma, vmg->next)) { >+ is_mergeable_anon_vma(vmg, /* merge_next = */ true)) { > if (vmg->next->vm_pgoff == vmg->pgoff + pglen) > return true; > } >@@ -184,7 +204,7 @@ static bool can_vma_merge_before(struct vma_merge_struct *vmg) > static bool can_vma_merge_after(struct vma_merge_struct *vmg) > { > if (is_mergeable_vma(vmg, /* merge_next = */ false) && >- is_mergeable_anon_vma(vmg->anon_vma, vmg->prev->anon_vma, vmg->prev)) { >+ is_mergeable_anon_vma(vmg, /* merge_next = */ false)) { > if (vmg->prev->vm_pgoff + vma_pages(vmg->prev) == vmg->pgoff) > return true; > } We have two sets API to check vma's mergeability: * can_vma_merge_before/after * can_vma_merge_left/right And xxx_merge_right() calls xxx_merge_before(), which is a little confusing. Now can_vma_merge_before/after looks almost same. Do you think it would be easier for reading to consolidate to one function? -- Wei Yang Help you, Help me