From: James Houghton
Date: Wed, 4 Jan 2023 23:12:53 +0000
Subject: Re: [PATCH] hugetlb: unshare some PMDs when splitting VMAs
To: Peter Xu
Cc: Mike Kravetz, Muchun Song, Axel Rasmussen, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230101230042.244286-1-jthoughton@google.com>

On Wed, Jan 4, 2023 at 8:03 PM Peter Xu wrote:
>
> On Wed, Jan 04, 2023 at 07:10:11PM +0000, James Houghton wrote:
> > > > I'll see if I can confirm that this is indeed possible and send a
> > > > repro if it is.
> > >
> > > I think your analysis above is correct. The key being the failure to unshare
> > > in the non-PUD_SIZE vma after the split.
> >
> > I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't
> > even needed (the UFFDIO_REGISTER does the VMA split before "unsharing
> > all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior
> > is still incorrect: I expect the address range to be write-protected,
> > but it isn't.
> >
> > The reason why is that hugetlb_change_protection uses huge_pte_offset,
> > even if it's being called for a UFFDIO_WRITEPROTECT with
> > UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure
> > we should be using huge_pte_alloc, but even so, it's not trivial to
> > get an allocation failure back up to userspace. The non-hugetlb
> > implementation of UFFDIO_WRITEPROTECT seems to also have this problem.
> >
> > Peter, what do you think?
>
> Indeed. Thanks for spotting that, James.
>
> Non-hugetlb should be fine with having empty pgtable entries. Anon doesn't
> need to care about no-pgtable-populated ranges so far. Shmem does it with a
> few change_prepare() calls to populate the entries so the markers can be
> installed later on.

Ah ok! :)

>
> However I think the fault handling is still not well handled as you pointed
> out even for shmem: that's the path I probably never triggered myself yet
> before and the code stayed there since a very early version:
>
> #define change_pmd_prepare(vma, pmd, cp_flags)                          \
>         do {                                                            \
>                 if (unlikely(uffd_wp_protect_file(vma, cp_flags))) {    \
>                         if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd)))   \
>                                 break;                                  \
>                 }                                                       \
>         } while (0)
>
> I think a better thing we can do here (instead of warning and stop the
> UFFDIO_WRITEPROTECT at the current stage) is returning with -ENOMEM
> properly so the user can know the error. We'll need to touch the stacks up
> to uffd_wp_range() as it's the only one that can trigger the -ENOMEM so
> far, so as to not ignore retval from change_protection().
>
> Meanwhile, I'd also wonder whether we should call pagefault_out_of_memory()
> because it should be the same as when pgtable allocation failure happens in
> page faults, we may want to OOM already. I can take care of hugetlb part
> too along the way.

I might be misunderstanding, but the only case where
hugetlb_change_protection() would *need* to allocate is when it is
called from UFFDIO_WRITEPROTECT, not while handling a #pf.
So I don't think any calls to pagefault_out_of_memory() need to be added.

>
> Man page of UFFDIO_WRITEPROTECT may need a fixup too to introduce -ENOMEM.
>
> I can quickly prepare some patches for this, and hopefully it doesn't need
> to block the current fix on split.

I don't think it should block this splitting fix. I'll send another
version of this fix soon.

>
> Any thoughts?
>
> >
> > >
> > > To me, the fact it was somewhat difficult to come up with this scenario is an
> > > argument what we should just unshare at split time as you propose. Who
> > > knows what other issues may exist.
> > >
> > > > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the
> > > > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice
> > > > for a Fixes: tag (if above is indeed true).
> > >
> > > If the key issue in your above scenario is indeed the failure of
> > > hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag?
> > >
> > > 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when
> > > register wp")
> >
> > SGTM. Thanks Mike.
>
> Looks good here too.

Thanks, Peter!
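
For illustration only, the difference between "warn and carry on" and
"return -ENOMEM so the caller can see it" could be sketched in plain
userspace C roughly as below. This is not kernel code: every mock_*
name and the fail_alloc switch are hypothetical stand-ins invented for
this sketch (for pte_alloc(), uffd_wp_protect_file(), the range walk
under change_protection(), and uffd_wp_range()), and the real call
chain has more layers than shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical switch that makes the mock allocator fail. */
static bool fail_alloc;

/* Hypothetical stand-in for pte_alloc(): nonzero means allocation failed. */
static int mock_pte_alloc(void)
{
	return fail_alloc ? -ENOMEM : 0;
}

/* Hypothetical stand-in for uffd_wp_protect_file(): page tables are only
 * needed when write-protecting a file-backed range. */
static bool mock_needs_pgtable(bool file_backed_wp)
{
	return file_backed_wp;
}

/* Warn-only variant: a failed allocation is logged once and the entry is
 * skipped, so the caller only sees a count and never an error. */
static long mock_change_range_warn(bool file_backed_wp, int nr_entries)
{
	static bool warned;
	long done = 0;

	assert(nr_entries >= 0);
	for (int i = 0; i < nr_entries; i++) {
		if (mock_needs_pgtable(file_backed_wp) && mock_pte_alloc()) {
			if (!warned) {
				fprintf(stderr, "WARN: page table allocation failed\n");
				warned = true;
			}
			continue;
		}
		done++;
	}
	return done;
}

/* Error-returning variant: the first failed allocation aborts the walk
 * and is reported as -ENOMEM. */
static long mock_change_range_err(bool file_backed_wp, int nr_entries)
{
	long done = 0;

	assert(nr_entries >= 0);
	for (int i = 0; i < nr_entries; i++) {
		if (mock_needs_pgtable(file_backed_wp) && mock_pte_alloc())
			return -ENOMEM;
		done++;
	}
	return done;
}

/* Hypothetical stand-in for the ioctl-side caller: it must not ignore the
 * return value, otherwise the error never reaches userspace. */
static int mock_uffd_wp_range(bool file_backed_wp, int nr_entries)
{
	long ret = mock_change_range_err(file_backed_wp, nr_entries);

	return ret < 0 ? (int)ret : 0;
}
```

With fail_alloc set, mock_change_range_warn(true, 4) simply reports
zero entries processed, while mock_uffd_wp_range(true, 4) returns
-ENOMEM, which is the kind of result a UFFDIO_WRITEPROTECT caller
could act on.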