From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82424C3DA49 for ; Fri, 26 Jul 2024 00:24:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DBB4B6B009B; Thu, 25 Jul 2024 20:24:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D44046B009C; Thu, 25 Jul 2024 20:24:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BBD6E6B009D; Thu, 25 Jul 2024 20:24:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 9B2536B009B for ; Thu, 25 Jul 2024 20:24:28 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 38F421414BB for ; Fri, 26 Jul 2024 00:24:28 +0000 (UTC) X-FDA: 82380007416.17.443658B Received: from mail-qt1-f175.google.com (mail-qt1-f175.google.com [209.85.160.175]) by imf07.hostedemail.com (Postfix) with ESMTP id 7F80E4000F for ; Fri, 26 Jul 2024 00:24:26 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0a50Oo0c; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf07.hostedemail.com: domain of jthoughton@google.com designates 209.85.160.175 as permitted sender) smtp.mailfrom=jthoughton@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1721953465; a=rsa-sha256; cv=none; b=Vy9OTHU9tRRY04wPu/JvdBvDpJOVqE4L0OR+2TDuZd1LSZmr/llFJdcmZvCVvWgssoE6rs flKT7Gu+Ue4hIeXWQV7TJ2YdnVsKLqIdeHy7Zuu80QnMjdcN4D3gU2k+Xlrg8oPdm6/AmK 2BtlFtX7cWuorNeiFbwAJSMZ5oss43o= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=0a50Oo0c; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf07.hostedemail.com: domain of jthoughton@google.com designates 209.85.160.175 as permitted sender) smtp.mailfrom=jthoughton@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1721953465; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=77qlRcL4PUP5h8ZRxtqeaydLFX3D5DF3IEwpxS5zrYg=; b=8ihY45qxoc7rrIkZDMfWGMcNigwO/QmVvGUbNPWVk4/bVYzmYce/YG0TygZSC4rV1NO4RX G+ei1pbzBPydgCXkN5zSLY8yK0YaiKuFw8zm5zTRjEa97oAJgYLBHMvXR/iU/n8uu2PcuL iJqYomvqKGdH662UKr4izgZYN7Mjw+g= Received: by mail-qt1-f175.google.com with SMTP id d75a77b69052e-44fe76fa0b8so53061cf.0 for ; Thu, 25 Jul 2024 17:24:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1721953465; x=1722558265; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=77qlRcL4PUP5h8ZRxtqeaydLFX3D5DF3IEwpxS5zrYg=; b=0a50Oo0cUwe00O7b8FQAO4WyT/zFzLYqbwUPGhfQQpSsBWpD71hJcj4f4Xa/pKtoec 5BTwhZz7DhxRtlzDxvsj4mZGk6lvad3Etkb8kFRbZqOTi2e+9dQHiq5wTRyO9YGrD65Q GnvSk6uhKrFBiS2QeprVzHN+f9sw7kP0hNSp99lXoBIQDh19HBx+37PivtZcWDix7wEL s4RemZ3gMw+JBMlj7GjkO3aWMQvJqVXOo8fTnwt9Qj6hiMy8bcq8jdtCODHhw1vVrVqr kMTutURFGNwcb+lmsHOpU1XqDpJvgDt88ZKiQkFqQaiQF8mMNxfa76p7TVrTVqlnMV0v MF7A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721953465; x=1722558265; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=77qlRcL4PUP5h8ZRxtqeaydLFX3D5DF3IEwpxS5zrYg=; b=hQz8tP8g0zaHPa+AKw0rnykKlboZ2iNEwup5WETIwapUa6x0aK7/Aqxf0uV+zJKOpZ E4YnP9BDxLYzZ0NCo2h0hJzhSMllrQm084uWsXnQrBoFmdTbdGjw3iagF1V3S/a5qDny NMHhpxNDQUua/ejj3P+G289nu5mc46N8UHusQ9WQqKgX2vy+29WioNXnWjO49z4NA+rF rmdtUJBw/wC7lMVN35zr1GtGYRL+QAyV6bJO/+Yi9HyNkqBgT6baNVM08d1n3O0Pk7hw K5pUWeeiL/k9pHyJaP5kTu6CdoswlcL2S41b8ZAJrflOXyZ+QwUcLCMcshDqKJ9W8gMP +tbw== X-Gm-Message-State: AOJu0YyllZQy74acwgKmq9cbdZgstyIvcVBRRZA1tr4AC1ZPNEgTpCPQ +UMA8RZaCbyF+ASeyJZAGgNu0KosWnilk0L9Ar2CApPHKMr1UtCzgWiSmgLtb4PpBDBR99TzbY+ VGYdXao2+IEqfDFo7RcjaliqwTInJkMX3uBSo X-Google-Smtp-Source: AGHT+IFxRpjgJ6wObQXGzkn9wNtJ+JxPuAaUrFwCV1R1K/Oz3SbJ3MmktuoDvJaW8UnM7obg5LRzDolOme3IjG1RkEI= X-Received: by 2002:ac8:7e8f:0:b0:447:d78d:773b with SMTP id d75a77b69052e-44ff3a6e49amr1570931cf.6.1721953465392; Thu, 25 Jul 2024 17:24:25 -0700 (PDT) MIME-Version: 1.0 References: <20240715192142.3241557-1-peterx@redhat.com> <20240715192142.3241557-9-peterx@redhat.com> In-Reply-To: From: James Houghton Date: Thu, 25 Jul 2024 17:23:48 -0700 Message-ID: Subject: Re: [PATCH v3 8/8] mm/mprotect: fix dax pud handlings To: Peter Xu Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Jiang , Rik van Riel , Dave Hansen , Michael Ellerman , linuxppc-dev@lists.ozlabs.org, Matthew Wilcox , Rick P Edgecombe , Oscar Salvador , Mel Gorman , Andrew Morton , Borislav Petkov , Christophe Leroy , Huang Ying , "Kirill A . Shutemov" , "Aneesh Kumar K . V" , Dan Williams , Thomas Gleixner , Hugh Dickins , x86@kernel.org, Nicholas Piggin , Vlastimil Babka , Ingo Molnar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: 7F80E4000F X-Rspamd-Server: rspam01 X-Stat-Signature: jwrwxoqqjhuqbm3p1xrifpsxaz4whjwm X-HE-Tag: 1721953466-772301 X-HE-Meta: U2FsdGVkX18V9coHuJwSyTE7KNtYMRjf1A+5+XjU0+fEtMiO8+1RFrIXz1HTbkp6vu4kEcwhCZVxuVPXF/XnJ/chQyvAGFJdHSook3bPwtVisOkH3t5HRZh8xW+auyhdTBPKtlCD7Wk8/9RD/wSCNNjHRfR2wukH0onckq490hDRrohP4TiJfh9l8eOSn7X/HXFhsoOPlgH8jz0B7JiUQ0gZlGBLtaY3oGdkvkCdRf786L3e7z8Oegr9/DlYW95EbTtzp6XsKhMSkwJo0ccVuc9K9wMnKPlH4z1iWTdbirTAnCFC6jeXVnpNpeUVejnnOHU/p0LSL4fRLqK3O9I7WE2bfqi2YqSg9luDWWomh78ZOReabjMtJ8UstuyhvMKhNmsyMjtXNpHOrEdehUxHHKoJlVRV9W2AOWM39JoEO/bbmnG15ZANUQBUlpmLPUOM3chRPeCQfFdbIxGstQh11O3SrnHJVDfp/xuNO2XTWk2UwkH37+RUnQuFXAYBdGXLVK2t+bLEV9KbZOfdgfNm3SN5tWoY61lk/JOTmZoGPDzIH0mUnI9kQDTlve1ZrOvibIMZlqGn9zA0htTOtMSWNQ7MbxgAqvWJ2bgJ/PlGpuIpOF/OX1PzRirfBK1g+4CfSf9N26pyH7kr+l3ZQGnao14QlZPb/243VWh/Lgk2edjX2D1yBZGD7E3QadtDR9vC+5yetooSGHmV5n3cfM+iG0T4CcWoeGXIXWQWSTgCC77TqVKjU+wLm1D6rruDuvsSRJG7wgGRqnWznQYfa1X4tqvdubWa7IrVGWVlKnhTDVZ3fRLHeTh6HbjaiNVLNrR7jG4/LMxtTdu9lXWsyHH9ZRXmS343N+bqffeFTEh881cPc+UaoHiJxGgZsBMG1Eb/fo3XTCIgn9d00xER8MSb9ZW8I789CA8LfrzhkJjRjQsfAqibY5Cg5Ougxgpb/qTeIzwcpMDiLeou/6AWFLX UDG9BYqk KCJ4Ig7wfa1CuDOZq9Fetu80MUH2K/NR5wlK4JCj9xyVRaK5qjkHPfrFMy3bu6QnY0L1kkkAoBaAyzHMdHroMe0BxACshgTNvlVTO9NHJRzmDxrAAiCKrLZYg8/OdZQ8TJsPu6u//4CZUdjc4jc1DoBzchQbjbw2lVZ/YP/HbKaypBjjjVMemc3wfF0cWIFnB03fu X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jul 25, 2024 at 3:41=E2=80=AFPM Peter Xu wrote: > > On Thu, Jul 25, 2024 at 11:29:49AM -0700, James Houghton wrote: > > > - pages +=3D change_pmd_range(tlb, vma, pud, addr, next= , newprot, > > > + > > > + if (pud_leaf(pud)) { > > > + if ((next - addr !=3D PUD_SIZE) || > > > + pgtable_split_needed(vma, cp_flags)) { > > > + __split_huge_pud(vma, pudp, addr); > > > + goto again; > > > > IIUC, most of the time, we're just going to end up clearing the PUD in > > this case. __split_huge_pud() will just clear it, then we goto again > > and `continue` to the next pudp. Is that ok? > > > > (I think it's ok as long as: you never map an anonymous page with a > > PUD, > > I think this is true. > > > and that uffd-wp is not usable with non-hugetlb PUD mappings of > > user memory (which I think is only DAX?). > > Uffd-wp has the async mode that can even work with dax puds.. even though= I > don't think anyone should be using it. Just like I'm more sure that nobo= dy > is using mprotect() too with dax pud, and it further justifies why nobody > cared this much.. > > What uffd-wp would do in this case is it'll make pgtable_split_needed() > returns true on this PUD, the PUD got wiped out, goto again, then > change_prepare() will populate this pud with a pgtable page. Then it goe= s > downwards, install PMD pgtable, then probably start installing pte marker= s > ultimately if it's a wr-protect operation. Ah, I didn't understand this patch correctly. You're right, you'll install PMDs underneath -- I thought we'd just find pud_none() and bail out. So this all makes sense, thanks! > > > So it seems ok today...?) > > Yes I think it's ok so far, unless you think it's not. :) > > > > > Also, does the comment in pgtable_split_needed() need updating? > > /* > * Return true if we want to split THPs into PTE mappings in change > * protection procedure, false otherwise. > */ > > It looks to me it's ok for now to me? THP can represents PUD in dax, and = we > indeed want to break THPs (no matter PUD/PMD) finally into PTE mappings. Oh, heh I was thinking of the other comment: /* * pte markers only resides in pte level, if we need pte markers, * we need to split. We cannot wr-protect shmem thp because file * thp is handled differently when split by erasing the pmd so far. */ I guess this is fine too, it just kind of reads as if this function is only called for PMDs. *shrug* > > > > > Somewhat related question: change_huge_pmd() is very careful not to > > clear the PMD before writing the new value. Yet change_pmd_range(), > > when it calls into __split_huge_pmd(), will totally clear the PMD and > > then populate the PTEs underneath (in some cases at least), seemingly > > reintroducing the MADV_DONTNEED concern. But your PUD version, because > > it never re-populates the PUD (or PMDs/PTEs underneath) does not have > > this issue. WDYT? I guess I'm wrong about this -- your PUD version is the same as the PMD version in this respect: both clear the PUD/PMD and then install lower page table entries. > > Could you elaborate more on the DONTNEED issue you're mentioning here? In change_huge_pmd(), there is a comment about not clearing the pmd before setting the new value. I guess the issue is: if a user is MADV_DONTNEEDing some memory and happens to see a cleared PMD, it will just move on. But in this case, if change_huge_pmd() temporarily cleared a PMD, then MADV_DONTNEED saw it and moved on, and then change_huge_pmd() installed a new PMD, the MADV_DONTNEED has basically skipped over the PMD, probably not what the user wanted. It seems like we have the same issue when a PMD is cleared when we're splitting it. But yeah, given that your PUD version is apparently no different from the PMD version in this respect, maybe this question isn't very interesting. :) > > > > > Thanks for this series! > > Thanks for reviewing it, James.