From: "Zach O'Keefe"
Date: Mon, 21 Aug 2023 08:08:42 -0700
Subject: Re: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
To: Yang Shi
Cc: Saurabh Singh Sengar, Matthew Wilcox, Dan Williams, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
References: <20230812210053.2325091-1-zokeefe@google.com>

On Fri, Aug 18, 2023 at 2:21 PM Yang Shi wrote:
>
> On Thu, Aug 17, 2023 at 11:29 AM Zach O'Keefe wrote:
> >
> > On Thu, Aug 17, 2023 at 10:47 AM Yang Shi wrote:
> > >
> > > On Wed, Aug 16, 2023 at 2:48 PM Zach O'Keefe wrote:
> > > >
> > > > > We have a out of tree driver that maps huge pages through a file handle and
> > > > > relies on -> huge_fault.
> > > > > It used to work in 5.19 kernels but 6.1 changed this
> > > > > behaviour.
> > > > >
> > > > > I don’t think reverting the earlier behaviour of fault_path for huge pages should
> > > > > impact kernel negatively.
> > > > >
> > > > > Do you think we can restore this earlier behaviour of kernel to allow page fault
> > > > > for huge pages via ->huge_fault.
> > > >
> > > > That seems reasonable to me. I think using the existence of a
> > > > ->huge_fault() handler as a predicate to return "true" makes sense to
> > > > me. The "normal" flow for file-backed memory along fault path still
> > > > needs to return "false", so that we correctly fallback to ->fault()
> > > > handler. Unless there are objections, I can do that in a v2.
> > >
> > > Sorry for chiming in late. I'm just back from vacation and trying to catch up...
> > >
> > > IIUC the out-of-tree driver tries to allocate huge page and install
> > > PMD mapping via huge_fault() handler, but the cleanup of
> > > hugepage_vma_check() prevents this due to the check to
> > > VM_NO_KHUGEPAGED?
> > >
> > > So you would like to check whether a huge_fault() handler existed
> > > instead of vma_is_dax()?
> >
> > Sorry for the multiple threads here. There are two problems: (a) the
> > VM_NO_KHUGEPAGED check along fault path, and (b) we don't give
> > ->huge_fault() a fair shake, if it exists, along fault path. The
> > current code assumes vma_is_dax() iff ->huge_fault() exists.
> >
> > (a) is easy enough to fix. For (b), I'm currently looking at the
> > possibility of not worrying about ->huge_fault() in
> > hugepage_vma_check(), and just letting create_huge_pud() /
> > create_huge_pmd() check and fallback as necessary. I think we'll need
> > the explicit DAX check still, since we want to keep khugepaged and
> > MADV_COLLAPSE away, and the presence / absence of ->huge_fault() isn't
> > enough to know that (well.. today it kind of is, but we shouldn't
> > depend on it).
>
> You meant something like:
>
> if (vma->vm_ops->huge_fault) {
>         if (vma_is_dax(vma))
>                 return in_pf;
>
>         /* Fall through */
> }

I don't think this will work for Saurabh's case, since IIUC, they
aren't using dax, but are using VM_HUGEPAGE|VM_MIXEDMAP, faulted in
using ->huge_fault().

The old (v5.19) fault path looked like:

static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
                                          unsigned long vm_flags)
{
        /* Explicitly disabled through madvise. */
        if ((vm_flags & VM_NOHUGEPAGE) ||
            test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
                return false;
        return true;
}

/*
 * to be used on vmas which are known to support THP.
 * Use transparent_hugepage_active otherwise
 */
static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
{

        /*
         * If the hardware/firmware marked hugepage support disabled.
         */
        if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
                return false;

        if (!transhuge_vma_enabled(vma, vma->vm_flags))
                return false;

        if (vma_is_temporary_stack(vma))
                return false;

        if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
                return true;

        if (vma_is_dax(vma))
                return true;

        if (transparent_hugepage_flags &
                                (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
                return !!(vma->vm_flags & VM_HUGEPAGE);

        return false;
}

For non-anonymous VMAs, the next check (in create_huge_*) would be for
that ->huge_fault handler, falling back as necessary if it didn't
exist.

The patch I sent out last week[1] somewhat restores this logic -- the
only difference being we do the check for ->huge_fault in
hugepage_vma_check() as well. This is so smaps can surface this
possibility with some accuracy. I just realized it will erroneously
return "true" for the collapse path, however..

Maybe Matthew was right about unifying everything here :P That's 2
mistakes I've made in trying to fix this issue (but maybe that's just
me).

[1] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
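
To make the v5.19 behaviour described above easier to reason about,
here is a rough userspace model of it. This is only a sketch, not
kernel code: struct vma_model, enum thp_policy, enum fault_result,
model_huge_fault_ok(), thp_enabled_v519() and fault_path_v519() are all
hypothetical names made up for illustration, the sysfs "enabled"
setting is reduced to an enum, and the anonymous path is simplified to
always succeed. The check order follows the quoted
__transparent_hugepage_enabled(), followed by the ->huge_fault check
that create_huge_* performs for non-anonymous VMAs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global THP policy (sysfs "enabled"). */
enum thp_policy { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical outcome of the huge fault attempt in this model. */
enum fault_result { FAULT_HUGE, FAULT_FALLBACK };

/* Hypothetical, heavily reduced model of a VMA; not the kernel's struct. */
struct vma_model {
    bool nohugepage;         /* models VM_NOHUGEPAGE or MMF_DISABLE_THP */
    bool hugepage;           /* models VM_HUGEPAGE */
    bool is_dax;             /* models vma_is_dax() */
    bool temporary_stack;    /* models vma_is_temporary_stack() */
    bool is_anonymous;       /* models an anonymous mapping */
    int (*huge_fault)(void); /* models vm_ops->huge_fault, NULL if absent */
};

/* Example handler that always succeeds (assumption for the model). */
int model_huge_fault_ok(void)
{
    return 0;
}

/* Same order of checks as the quoted v5.19 __transparent_hugepage_enabled(). */
bool thp_enabled_v519(const struct vma_model *vma, enum thp_policy policy,
                      bool never_dax)
{
    if (never_dax)
        return false;          /* TRANSPARENT_HUGEPAGE_NEVER_DAX set */
    if (vma->nohugepage)
        return false;          /* explicitly disabled */
    if (vma->temporary_stack)
        return false;
    if (policy == THP_ALWAYS)
        return true;
    if (vma->is_dax)
        return true;
    if (policy == THP_MADVISE)
        return vma->hugepage;  /* only VMAs marked VM_HUGEPAGE */
    return false;
}

/*
 * Models the two-step fault path: the predicate first, then (for
 * non-anonymous VMAs) the ->huge_fault check with fallback.
 */
enum fault_result fault_path_v519(const struct vma_model *vma,
                                  enum thp_policy policy, bool never_dax)
{
    if (!thp_enabled_v519(vma, policy, never_dax))
        return FAULT_FALLBACK;
    if (vma->is_anonymous)
        return FAULT_HUGE;     /* simplification: anonymous path succeeds */
    if (vma->huge_fault != NULL)
        return vma->huge_fault() == 0 ? FAULT_HUGE : FAULT_FALLBACK;
    return FAULT_FALLBACK;     /* no handler: fall back to ->fault() */
}
```

In this model, a non-DAX VMA with hugepage set and a huge_fault handler
reaches the handler under the "madvise" policy, which corresponds to
the VM_HUGEPAGE|VM_MIXEDMAP driver case discussed above. The same VMA
without a handler falls back.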