From: Yang Shi
Date: Fri, 31 May 2024 11:30:45 -0700
Subject: Re: [linus:master] [mm] efa7df3e3b: kernel_BUG_at_include/linux/page_ref.h
To: David Hildenbrand
Cc: kernel test robot, Peter Xu, Jason Gunthorpe, Vivek Kasireddy, Rik van Riel, oe-lkp@lists.linux.dev, lkp@intel.com, linux-kernel@vger.kernel.org, Andrew Morton, Matthew Wilcox, Christopher Lameter, linux-mm@kvack.org
References: <202405311534.86cd4043-lkp@intel.com> <890e5a79-8574-4a24-90ab-b9888968d5e5@redhat.com>

On Fri, May 31, 2024 at 11:24 AM David Hildenbrand wrote:
>
> On 31.05.24 20:13, Yang Shi wrote:
> > On Fri, May 31, 2024 at 11:07 AM Yang Shi wrote:
> >>
> >> On Fri, May 31, 2024 at 10:46 AM David Hildenbrand wrote:
> >>>
> >>> On 31.05.24 18:50, Yang Shi wrote:
> >>>> On Fri, May 31, 2024 at 1:24 AM kernel test robot wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> kernel test robot noticed "kernel_BUG_at_include/linux/page_ref.h" on:
> >>>>>
> >>>>> commit: efa7df3e3bb5da8e6abbe37727417f32a37fba47 ("mm: align larger anonymous mappings on THP boundaries")
> >>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >>>>>
> >>>>> [test failed on linus/master e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc]
> >>>>> [test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]
> >>>>>
> >>>>> in testcase: trinity
> >>>>> version: trinity-x86_64-6a17c218-1_20240527
> >>>>> with following parameters:
> >>>>>
> >>>>>         runtime: 300s
> >>>>>         group: group-00
> >>>>>         nr_groups: 5
> >>>>>
> >>>>>
> >>>>>
> >>>>> compiler: gcc-13
> >>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >>>>>
> >>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
> >>>>>
> >>>>>
> >>>>> we noticed the issue does not always happen. 34 times out of 50 runs as below.
> >>>>> the parent is clean.
> >>>>>
> >>>>> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274
> >>>>> ---------------- ---------------------------
> >>>>>        fail:runs  %reproduction    fail:runs
> >>>>>            |             |             |
> >>>>>            :50          68%          34:50     dmesg.Kernel_panic-not_syncing:Fatal_exception
> >>>>>            :50          68%          34:50     dmesg.RIP:try_get_folio
> >>>>>            :50          68%          34:50     dmesg.invalid_opcode:#[##]
> >>>>>            :50          68%          34:50     dmesg.kernel_BUG_at_include/linux/page_ref.h
> >>>>>
> >>>>>
> >>>>>
> >>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >>>>> the same patch/commit), kindly add following tags
> >>>>> | Reported-by: kernel test robot
> >>>>> | Closes: https://lore.kernel.org/oe-lkp/202405311534.86cd4043-lkp@intel.com
> >>>>>
> >>>>>
> >>>>> [ 275.267158][ T4335] ------------[ cut here ]------------
> >>>>> [ 275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
> >>>>> [ 275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
> >>>>> [ 275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
> >>>>> [ 275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >>>>> [ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
> >>>>> [ 275.271159][ T4335] Code: c3 cc cc cc cc 44 89 e6 48 89 df e8 e4 54 11 00 eb ae 90 0f 0b 90 31 db eb d5 9c 58 0f 1f 40 00 f6 c4 02 0f 84 46 ff ff ff 90 <0f> 0b 48 c7 c6 a0 54 d2 87 48 89 df e8 a9 e9 ff ff 90 0f 0b be 04
> >>>>
> >>>> If I read this BUG correctly, it is:
> >>>>
> >>>> VM_BUG_ON(!in_atomic() && !irqs_disabled());
> >>>>
> >>>
> >>> Yes, that seems to be the one.
> >>>
> >>>> try_grab_folio() actually assumes it is in an atomic context (irq
> >>>> disabled or preempt disabled) for this call path. This is achieved by
> >>>> disabling irq in gup fast or calling it in rcu critical section in
> >>>> page cache lookup path
> >>>
> >>> try_grab_folio()->try_get_folio()->folio_ref_try_add_rcu()
> >>>
> >>> Is called (mm-unstable) from:
> >>>
> >>> (1) gup_fast function, here IRQs are disable
> >>> (2) gup_hugepte(), possibly problematic
> >>> (3) memfd_pin_folios(), possibly problematic
> >>> (4) __get_user_pages(), likely problematic
> >>>
> >>> (1) should be fine.
> >>>
> >>> (2) is possibly problematic on the !fast path. If so, due to commit
> >>>     a12083d721d7 ("mm/gup: handle hugepd for follow_page()") ? CCing Peter.
> >>>
> >>> (3) is possibly wrong. CCing Vivek.
> >>>
> >>> (4) is what we hit here
> >>>
> >>>>
> >>>> And try_grab_folio() is used when the folio is a large folio. The
> >>>
> >>>
> >>> We come via process_vm_rw()->pin_user_pages_remote()->__get_user_pages()->try_grab_folio()
> >>>
> >>> That code was added in
> >>>
> >>> commit 57edfcfd3419b4799353d8cbd6ce49da075cfdbd
> >>> Author: Peter Xu
> >>> Date:   Wed Jun 28 17:53:07 2023 -0400
> >>>
> >>>     mm/gup: accelerate thp gup even for "pages != NULL"
> >>>
> >>>     The acceleration of THP was done with ctx.page_mask, however it'll be
> >>>     ignored if **pages is non-NULL.
> >>>
> >>>
> >>> Likely the try_grab_folio() in __get_user_pages() is wrong?
> >>>
> >>> As documented, we already hold a refcount. Likely we should better do a
> >>> folio_ref_add() and sanity check the refcount.
> >>
> >> Yes, a plain folio_ref_add() seems ok for these cases.
> >>
> >> In addition, the comment of folio_try_get_rcu() says, which is just a
> >> wrapper of folio_ref_try_add_rcu():
> >>
> >> You can also use this function if you're holding a lock that prevents
> >> pages being frozen & removed; eg the i_pages lock for the page cache
> >> or the mmap_lock or page table lock for page tables. In this case, it
> >> will always succeed, and you could have used a plain folio_get(), but
> >> it's sometimes more convenient to have a common function called from
> >> both locked and RCU-protected contexts.
> >>
> >> So IIUC we can use the plain folio_get() at least for
> >> process_vm_readv/writev since mmap_lock is held in this path.
> >>
> >>>
> >>>
> >>> In essence, I think: try_grab_folio() should only be called from GUP-fast where
> >>> IRQs are disabled.
> >>
> >> Yes, I agree. Just the fast path should need to call try_grab_folio().
> >
> > try_grab_folio() also handles FOLL_PIN and FOLL_GET, so we may just
> > keep calling it and add a flag to try_grab_folio, just like:
> >
> > if flag is true
> >     folio_ref_add()
> > else
> >     try_get_folio()
> >
> try_grab_page() is what we use on the GUP-slow path. We'd likely want a
> folio variant of that.
>
> We might want to call that gup_try_grab_folio() and rename the other one
> to gup_fast_try_grab_folio().

Won't we duplicate most of the code with two versions of
try_grab_folio()? I meant something like:

try_grab_folio(struct page *page, int refs, unsigned int flags, bool fast)
{
    if fast
        try_get_folio()
    else
        folio_ref_add()
}

This way we can keep the duplicated code to a minimum.

>
> Or something like that :)
>
> --
> Cheers,
>
> David / dhildenb
>
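
To make the single-function idea a bit more concrete, here is a minimal userspace sketch of the "bool fast" dispatch. It is not kernel code and not a proposed patch: struct toy_folio, toy_ref_try_add(), toy_ref_add() and toy_try_grab() are hypothetical names, C11 atomics stand in for the folio refcount helpers, and the FOLL_PIN/FOLL_GET handling is left out entirely. The fast branch models a speculative "add unless zero", and the slow branch models a plain add where the caller is assumed to already hold a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the reference count is modelled. */
struct toy_folio {
    atomic_int refcount;
};

/*
 * Models the speculative (GUP-fast style) path: the count may drop to
 * zero under us, so only add the references if it is still non-zero.
 */
static bool toy_ref_try_add(struct toy_folio *f, int refs)
{
    int old = atomic_load(&f->refcount);

    do {
        if (old == 0)
            return false;
        /* On failure the CAS reloads 'old', and the zero check repeats. */
    } while (!atomic_compare_exchange_weak(&f->refcount, &old, old + refs));

    return true;
}

/*
 * Models the slow path: the caller is assumed to hold a reference
 * already, so a plain add is enough. The assert plays the role of the
 * "sanity check the refcount" suggestion.
 */
static void toy_ref_add(struct toy_folio *f, int refs)
{
    int old = atomic_fetch_add(&f->refcount, refs);

    assert(old > 0);
}

/* One entry point for both callers, selected by the 'fast' flag. */
static bool toy_try_grab(struct toy_folio *f, int refs, bool fast)
{
    if (fast)
        return toy_ref_try_add(f, refs);

    toy_ref_add(f, refs);
    return true;
}
```

With this shape, a slow-path caller would pass fast = false and never reach the speculative helper, while everything shared between the two paths stays in one function.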