Subject: Re: [PATCH v8 1/2] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
From: Nadav Amit <nadav.amit@gmail.com>
In-Reply-To: <20221108011910.350887-2-mike.kravetz@oracle.com>
Date: Thu, 10 Nov 2022 12:59:45 -0800
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Linux-MM <linux-mm@kvack.org>, kernel list, Naoya Horiguchi,
 David Hildenbrand, Axel Rasmussen, Mina Almasry, Peter Xu,
 Rik van Riel, Vlastimil Babka, Matthew Wilcox, Andrew Morton,
 Wei Chen, stable@vger.kernel.org
Message-Id: <9BB0EA0C-6E7C-462B-8374-5BFEC34E8415@gmail.com>
References: <20221108011910.350887-1-mike.kravetz@oracle.com>
 <20221108011910.350887-2-mike.kravetz@oracle.com>

On Nov 7, 2022, at 5:19 PM, Mike Kravetz <mike.kravetz@oracle.com> wrote:

> madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
> tables associated with the address range.  For hugetlb vmas,
> zap_page_range will call __unmap_hugepage_range_final.  However,
> __unmap_hugepage_range_final assumes the passed vma is about to be
> removed and deletes the vma_lock to prevent pmd sharing as the vma is
> on the way out.  In the case of madvise(MADV_DONTNEED) the vma remains,
> but the missing vma_lock prevents pmd sharing and could potentially
> lead to issues with truncation/fault races.

I understand the problem in general.  Please consider my feedback as
partial though.

> @@ -5203,32 +5194,50 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
>  			  unsigned long end, struct page *ref_page,
>  			  zap_flags_t zap_flags)
>  {
> +	bool final = zap_flags & ZAP_FLAG_UNMAP;
> +

Not sure why caching final in a local variable helps.
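To be concrete: I would just test zap_flags where the decision is made.
A rough sketch of what I mean against the hunk above -- the locking and
the rest of the function are elided, and hugetlb_vma_lock_free() below
is only a placeholder for whatever the real vma_lock teardown is, not
an actual callee:

	void __unmap_hugepage_range_final(struct mmu_gather *tlb,
			  struct vm_area_struct *vma, unsigned long start,
			  unsigned long end, struct page *ref_page,
			  zap_flags_t zap_flags)
	{
		__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);

		/*
		 * ZAP_FLAG_UNMAP means the vma is really going away, so only
		 * then may the vma_lock be torn down.  A MADV_DONTNEED zap
		 * keeps the vma -- and therefore its vma_lock -- around.
		 */
		if (zap_flags & ZAP_FLAG_UNMAP)
			hugetlb_vma_lock_free(vma);	/* placeholder teardown */
	}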
> 
>  void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
>  			  unsigned long end, struct page *ref_page,
>  			  zap_flags_t zap_flags)
>  {
> +	struct mmu_notifier_range range;
>  	struct mmu_gather tlb;
> 
> +	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> +				start, end);
> +	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
>  	tlb_gather_mmu(&tlb, vma->vm_mm);
> +
>  	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);

Is there a reason for not using range.start and range.end?  It is just
that every inconsistency is worrying…

> 
> @@ -1734,6 +1734,9 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
>  	lru_add_drain();
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>  				address, address + size);
> +	if (is_vm_hugetlb_page(vma))
> +		adjust_range_if_pmd_sharing_possible(vma, &range.start,
> +						     &range.end);
>  	tlb_gather_mmu(&tlb, vma->vm_mm);
>  	update_hiwater_rss(vma->vm_mm);
>  	mmu_notifier_invalidate_range_start(&range);
> @@ -1742,6 +1745,12 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
>  	tlb_finish_mmu(&tlb);
>  }
> 
> +void zap_vma_range(struct vm_area_struct *vma, unsigned long address,
> +		unsigned long size)
> +{
> +	__zap_page_range_single(vma, address, size, NULL);

Ugh.  So zap_vma_range() would actually be emitted as a wrapper function
that only calls __zap_page_range_single() (or worse,
__zap_page_range_single(), which is large, would be inlined into it),
unless you use LTO.

Another option is to declare __zap_page_range_single() in the header and
move this one there as an inline wrapper.
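Something along these lines, say -- just a sketch, assuming
__zap_page_range_single() keeps the signature implied by the call above
(a struct zap_details * as the last argument); where exactly the
declaration lands is of course up to you:

	/* in the header, next to the existing zap_* declarations */
	struct zap_details;

	void __zap_page_range_single(struct vm_area_struct *vma,
				     unsigned long address, unsigned long size,
				     struct zap_details *details);

	static inline void zap_vma_range(struct vm_area_struct *vma,
					 unsigned long address, unsigned long size)
	{
		/* expands at the call site; no out-of-line wrapper is emitted */
		__zap_page_range_single(vma, address, size, NULL);
	}

That way callers keep the NULL-details convenience without paying for an
extra out-of-line call.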