From: Zhaoyang Huang
Date: Wed, 28 May 2025 10:49:36 +0800
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
To: Hyesoo Yu
Cc: jaewon31.kim@samsung.com, David Hildenbrand, John Hubbard, "zhaoyang.huang@unisoc.com", "surenb@google.com", "Steve.Kang@unisoc.com", Jaewon Kim, "linux-mm@kvack.org", Jang-Hyuck Kim
In-Reply-To: <20250528012329.GA1545287@tiffany>

On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu wrote:
>
> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> > On Mon, May 26, 2025 at 7:17 PM Jaewon Kim wrote:
> > >
> > > >On 26.05.25 11:33, Hyesoo Yu wrote:
> > > >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> > > >>> On Mon, May 26, 2025 at 3:50 PM Hyesoo Yu wrote:
> > > >>>>
> > > >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > > >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> > > >>>>> ...
> > > >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> > > >>>>>> We may be able to prepare the patch next week.
> > > >>>>>>
> > > >>>>>> static long
> > > >>>>>> check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > > >>>>>> {
> > > >>>>>> +       bool any_unpinnable;
> > > >>>>>>         LIST_HEAD(movable_folio_list);
> > > >>>>>>
> > > >>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> -       if (list_empty(&movable_folio_list))
> > > >>>>>> -               return 0;
> > > >>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> +       if (list_empty(&movable_folio_list)) {
> > > >>>>>> +               if (any_unpinnable)
> > > >>>>>> +                       pofs_unpin(pofs);
> > > >>>>>
> > > >>>>> I think this is correct, although as I mentioned in the other thread,
> > > >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> > > >>>>> any pof unpinning) is probably not the true or only culprit, right?
> > > >>>>>
> > > >>>>>> +               return any_unpinnable ? -EAGAIN : 0;
> > > >>>>>
> > > >>>>> Ha, the "?" operator almost always does more harm than good.
> > > >>>>>
> > > >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> > > >>>>> is being checked twice, when you could have merged those into a single "if".
> > > >>>>>
> > > >>>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> > > >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > > >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> > > >>>> Is that the expected behavior, or could it possibly indicate a bug?)
> > > >>> The original problem arises from applying CMA as guestOS's memory
> > > >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> > > >>> You can check KVM code if you are interested.
> > > >>>
> > > >>
> > > >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> > > >> is that there are cases where GUP is used on CMA.
> > > >>
> > > >> So does that mean pinning memory from the CMA was actually intended to succeed?
> > > >
> > > >Careful: KVM uses ordinary GUP, not GUP-longterm.
> > >
> > > Hi. David and Zhaoyang
> > >
> > > If possible, could you kindly explain the situation where the 1aaf8c122918 was added?
> > > If KVM does not use FOLL_LONGTERM, then why the function,
> > > collect_longterm_unpinnable_folios, was changed at that time?
> > >
> > > First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> > > I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> > > during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> > > I am confused.
> > >
> > > Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> > > might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> > > would put the page to LRU. And other cma_alloc context or migration context, I guess,
> > > put the pages back to LRU if there was race.
> > Actually, it is pkvm which was introduced by google in AOSP. I am
> > afraid I can just brief the callstack here for security reasons. The
> > pin_user_pages will setup the 2nd stage mapping for the hva by the
> > vm_ops->fault which is registered by kvm memfd driver and all PFNs are
> > from CMA area. The driver will keep the pages out of the LRU which hit
> > the original bug as it is counted but have the movable_page_list be
> > empty and lead to infinite loop within __gup_longterm_locked
> >
> > pkvm_xxx_xxx(equal to user_mem_abort in kvm)
> > {
> >     unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> >     ...
> >     ret = pin_user_pages(hva, 1, flags, &page);
> >         __gup_longterm_locked
> >             do {
> >                 nr_pinned_pages = __get_user_pages_locked(mm, start, nr_pages,
> >                                                           pages, locked, gup_flags);
> >                 rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
> >             } while (rc == -EAGAIN);
> > }
>
> Hello, Zhaoyang.
>
> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>
> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.

That depends on the user of CMA. It is valid for my scenario, since it
worked for the guest OS. For common scenarios such as file/anon
mappings, the page will be judged unpinnable for long-term and be
migrated out of the CMA area.

>
> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?

It is not a pkvm issue but a defect in applying FOLL_LONGTERM to
non-LRU CMA pages.

>
> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?

I don't think so. pin_user_pages is an exported API, so it can't make
assumptions about its caller.
>
> Thanks,
> Regards.
>
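
The retry behaviour discussed above can be sketched as a small user-space model in C. This is an illustration under stated assumptions, not kernel code: struct model_folio, collect_unpinnable(), check_and_migrate() and pin_longterm() are hypothetical names, migration is modelled as simply marking an on-LRU folio pinnable, and the loop is bounded by max_tries so that the non-terminating case becomes observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pinned folio; not the kernel's struct folio. */
struct model_folio {
	bool longterm_pinnable;	/* false for, e.g., memory in a CMA area */
	bool on_lru;		/* false if a driver keeps the page off the LRU */
};

/*
 * Model of the collection step: returns how many folios are unpinnable and
 * stores in *isolated how many of those could be queued for migration
 * (in this model, only folios that are on the LRU).
 */
static int collect_unpinnable(const struct model_folio *f, int n, int *isolated)
{
	int unpinnable = 0;

	*isolated = 0;
	for (int i = 0; i < n; i++) {
		if (f[i].longterm_pinnable)
			continue;
		unpinnable++;
		if (f[i].on_lru)
			(*isolated)++;
	}
	return unpinnable;
}

/*
 * Model of the check-and-migrate step, written with one merged "if"
 * rather than a "?:" followed by a second test of the same condition.
 */
static int check_and_migrate(struct model_folio *f, int n)
{
	int isolated;

	if (collect_unpinnable(f, n, &isolated) == 0)
		return 0;

	/* Assumption: migrating an isolated folio makes it pinnable. */
	for (int i = 0; i < n; i++)
		if (!f[i].longterm_pinnable && f[i].on_lru)
			f[i].longterm_pinnable = true;

	return -EAGAIN;	/* caller is expected to re-pin and re-check */
}

/*
 * Model of the do/while retry loop. Returns the number of passes needed,
 * or -1 if max_tries passes all ended in -EAGAIN.
 */
static int pin_longterm(struct model_folio *f, int n, int max_tries)
{
	assert(max_tries > 0);
	for (int tries = 1; tries <= max_tries; tries++)
		if (check_and_migrate(f, n) != -EAGAIN)
			return tries;
	return -1;
}
```

In this model, a single unpinnable folio that is on the LRU makes pin_longterm() return 2 (one migration pass, then success). An unpinnable folio that is kept off the LRU is counted on every pass but never migrated, so each pass ends in -EAGAIN and the bounded loop gives up with -1; with an unbounded loop, that is the endless retry described above.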