From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68ADEC76196 for ; Fri, 31 Mar 2023 06:47:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 852A26B0071; Fri, 31 Mar 2023 02:47:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7DC826B0072; Fri, 31 Mar 2023 02:47:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 656C76B0074; Fri, 31 Mar 2023 02:47:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 4FABB6B0071 for ; Fri, 31 Mar 2023 02:47:12 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 111EF40371 for ; Fri, 31 Mar 2023 06:47:12 +0000 (UTC) X-FDA: 80628261504.11.10183BD Received: from mail-ed1-f45.google.com (mail-ed1-f45.google.com [209.85.208.45]) by imf24.hostedemail.com (Postfix) with ESMTP id 2348A18000D for ; Fri, 31 Mar 2023 06:47:09 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=NxEF3sYv; spf=pass (imf24.hostedemail.com: domain of ivan.orlov0322@gmail.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=ivan.orlov0322@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1680245230; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jua4gde8du/TNpJ/ggqcjgKQp6GcI0FDmy/rSVOi32s=; b=RA2mfutqez42YDFmP/cdLB7hZzUuhrfq2J69dMdNzZv6KZAPeyBFZgw1zXLIEt53Swa0RO fuivUeAJPznfe2Vdvn2aKoGitdXNxLYZPR720r7qifPYoGgtPlWBe95y8Kac6mY6d74DwI ZH5O4dVWVzpqiaMD3Qa0TsmgMhfv5b8= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=NxEF3sYv; spf=pass (imf24.hostedemail.com: domain of ivan.orlov0322@gmail.com designates 209.85.208.45 as permitted sender) smtp.mailfrom=ivan.orlov0322@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1680245230; a=rsa-sha256; cv=none; b=2ypzDMvRoueWGb5w0zs4goJQso3vSVYQoAihZmNjfyyDMY0SozL5WHEI+kQVcJ3baIA5+6 M5GaEP2zTdm4n1tAskN9SYciIDsFQ7P3pVSELp5YeSWXKTLevf46f+KHW+WfiA6VmkhgEa JAm0sqL5KXI/RsHR+lRl5kWZZWI1Igk= Received: by mail-ed1-f45.google.com with SMTP id er13so44768312edb.9 for ; Thu, 30 Mar 2023 23:47:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680245228; x=1682837228; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=jua4gde8du/TNpJ/ggqcjgKQp6GcI0FDmy/rSVOi32s=; b=NxEF3sYvh1iIM5RIATjDcILbq1foy1eGmaSEMRyr1IUhrIR67C+d2kIl2JNuL7ZGHF lEi46w9nGSGORYDZjZ7AHi72saYCISE5nfM/pSAv2M7bukwAoL3Q1lU9c3jPNQo/kTOS 4qZTDP/wo2qU5TKnAs68YD8kZTk9A29W3QR1saT875LRiVpgiMS5COKsYYXSZ9SzYN5K PXuZyAy3ZEB07qAGrzi3XfPnfZhkKlwJi+tEbQWqAh9UmPIVGCuxJALqFOwlzm0nCiNy J50EymvdjXSL5NIOBkf+NKnkSZnvgsuVbuNB8XEh66Ec9n++sQGlE0s7M5tWiRptqyII EXNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680245228; x=1682837228; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jua4gde8du/TNpJ/ggqcjgKQp6GcI0FDmy/rSVOi32s=; b=nbmSb7yMArP8ANqdk4W9GA3YbOn34m1r9vcAnMLPBCelMQyQ06la/YvgKzUUqklXSX iVmb4dF1xsusM8JwHUdQ5i1bWZMU4Q89qUpTofo0tRPsWMTh5f0dUqF36Y5Q/GkDWndu MeLe2cnzHkxOT1mK3RuwTVctROUsTWod7Yl9n70XYZqL9tI2xVYQrgB+mGATXh0Y5wZj 9cyCqgTIb7nGgyRWnzy5oT5QwPcDfl9aE7tksAKMNRHT8tfTYiBX2LHnUt2W3zAv4s4W GTZpJtwnuNfW/XCUmD/0LJQzbVEGvHm3I+tJEPMPJfvoM2WW8ehMqG4/beS1k2HflKcB 337Q== X-Gm-Message-State: AAQBX9dutWVEoq253rr/em1De3TzC//A2ENr8a8ltjEEAUoMP76s2+TM PvSVPGJwHoa+l0MhOp78AWw= X-Google-Smtp-Source: AKy350b2Iqmv44lD93m8YIgp3BTPaucZOgE+Uv7xecc4F3v+HsnNCqRs5RdZSfBP0Lw8Hr6y9hu4dw== X-Received: by 2002:a05:6402:2803:b0:502:e50:3358 with SMTP id h3-20020a056402280300b005020e503358mr5398408ede.3.1680245228301; Thu, 30 Mar 2023 23:47:08 -0700 (PDT) Received: from [192.168.10.16] ([37.252.81.51]) by smtp.gmail.com with ESMTPSA id d7-20020a50cd47000000b005027ecc148esm277087edj.65.2023.03.30.23.47.07 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 30 Mar 2023 23:47:08 -0700 (PDT) Message-ID: Date: Fri, 31 Mar 2023 10:47:06 +0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.0 Subject: Re: [PATCH v2] mm: khugepaged: Fix kernel BUG in hpage_collapse_scan_file Content-Language: en-US To: Zach O'Keefe Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, himadrispandya@gmail.com, skhan@linuxfoundation.org, linux-kernel-mentees@lists.linuxfoundation.org, shy828301@gmail.com, syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com References: <20230330155305.423051-1-ivan.orlov0322@gmail.com> <20230331013301.ecgkjymaf3ws6rfb@google.com> From: Ivan Orlov In-Reply-To: <20230331013301.ecgkjymaf3ws6rfb@google.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam03 X-Stat-Signature: ubuxmaqeemjagegoxtpjs61sy9w4wce7 X-Rspamd-Queue-Id: 2348A18000D X-HE-Tag: 1680245229-114966 X-HE-Meta: U2FsdGVkX1/o9M8uwOGVzT61VGuN6ieNZe8gVA+qnEHJfl/ysxLRhRcsnzj9TgkC42bgV8FhU24PWCKWPwTqnKdU4aBExxozJZUtxw7SsyYK5i88GhzLOBwHRQMxN1qkY09QlN+w/7kKG5J3LGbRHimb4u2TfdHY33+RxavSY8Y9G5omddQ1MrhE690Gf+gC2gvlfohNKcUVTlsh1Ml/AJJIw/G4qmuwSGuzrMLCdPHOtJarENXy5XVgb3OKbbXdLadPqJqibCFq3asAouK/3ObrmE4DUZXp91Dhmqf3QVvZ3i04c9GD3OPpowiuqg/64+4vCU2gUPKSJ4YQc9DRAU5pfOfsOG4XbFzYKl/lw1M7ZXIH6tSVmyDl0oo+44+haRSSGhJAT+62ltXGpavNVZWz/jstjUMjq7sDB7luq1GBH42z+DM8sZG4NrH+KcH8Q4jG79f07DofMGCLr3VlAA9/Sc2Uc/qWQn1/gaPHKzfbVfRWJBb88ys/hALzIhWwGcm9vN0UTmpGL+tcZwvQXkJCtu86kfmO2nAJTxofleodDDws8Cte62T9FbrovnwAq6U50XVZHDTxPfXOlASmkpEyHF0bPitNRA8AJn9N3f4GsHu8mQl8lA3dmqN/eZ4sIJCYTVIr3qldUNo3jjHhVA4tZJsWd3leE5uMp1Ra4PYOUV+YSGLeKUU+byz0WCiyByxFQWBTjGGNrkHwCKqSB+kJ3W8m/KIdsPWJy63f8Iprm7a4RPbKIhZ+YStUJtEa7DZcSORLNmGMGJHT4SPVRwG52/o49BOeHiNAefguYPOjB1HXdqPKuDiShSSTJKuwglwimcvwAcoTF9WoZ1EPx/DbHyNhhYUct1c14dKps+oMWz8MtlOZrE6+I/yAvxg9tBb7sknuGoLpbCt1kPkRrfGEdJ9WdDOwbIgw2SYS8/cz99aI0Zysu3uK1nXNC5PiDM56vbzds8Sblm/Uy6S nwiiSktg H4xSqBjcEGvu6+iL/GLMFamMMhaDKcFTPFeuuX4BWFv+auVs/s2VIJQuK+czjVHvJVo8yEz+3h5HmNNGOcmMdiu1BY7jo8BCXtQD2cdOqX/JCr0b03ZuE6aDV3k1CIowyYxIkSypFfNV7oEOHkYlYdQfrL7BrdEcXEN/dpKFc17Gep4GptEe+P/QOFs1gIAQXNaFq7fjNeeFt88lyZUGd3tiD03u2Dhx4cSH4F+v1gQLo/hXlM5grsUxnhXZBgbabaSAGH8cpgmE7pbNvsQDbF1yl7U6v/V/7ywLFYWz7+vq2M/g+W4wVKy9yn5Ww1gOGQTGb+seXL3/aAgIUkyfcCRgOShz11xDdMlYMess+UZzPQoUKq+I2gueltjMz66gyTcEbJ+8gTThsvCQD+dTZX5Kqv+YVzu4GJyiWVj0mi2pIGOgvoY8a42eefjYDieW8Z2g5PUv4Ya5AvQ713vIdzIEfCuRyAMTHxZ6KSf5Zc6JPxvUaxeVJ95jBnRHCDJNFXKB3g1DxWlzZ1QCb9cGUPo5BnkHEflZXyJHgqf6MGKpSM+YghrPfmlRe6S1tCOhJhbKoPvJJp9n2FUikVr1zlHS3n2mG8LPmWMFB2GTsggrN2su1jEZmGevs4yZdSPBLjXOerqBIVwi6fLmc2aJ8nNyiCrhBhI282xMt+DtBi4VWxKrxD0Cz8nrV+P0dyM+I/Ss8W5N8eaq1uY0nMkbKdb1YtG8nW8Q4NjIRjgKbzF0teM1RbHf+2y601KDnNu0PKBxdRwp6NWGjCME= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 3/31/23 05:33, Zach O'Keefe wrote: > Thanks, Ivan. > > In the process of reviewing this, I starting thinking if the !shmem case was > also susceptible to a similar race, and I *think* it might be. Unfortunately, my > time has ran out, and I haven't been able to validate ; I'm less familiar with > the file-side of things. > > The underlying problem is race with truncation/hole-punch under OOM condition. > The nice do-while loop near the top of collapse_file() attempts to avoid this > scenario by making sure enough slots are available. However, when we drop xarray > lock, we open ourselves up to concurrent removal + slot deletion. Those slots > then need to be allocated again -- which under OOM condition is failable. > > The syzbot reproducer picks on shmem, but I think this can occur for file as > well. If we find a hole, we unlock the xarray and call > page_cache_sync_readahead(), which if it succeeds, IIUC, will have allocated a > new slot in our mapping pointing to the new page. We *then* locks the page. Only > after the page is locked are we protected from concurrent removal (Note: this is > what provides us protection in many of the xas_store() cases ; we've held the > slot's contained page-lock since verifying the slot exists, protecting us from > removal / reallocation races). > > Maybe I'm just low on caffeine at the end of the day, and am missing something, > but if I had more time, I'd be looking into the file-side some more to verify. > Apologies that hasn't occurred to me until now ; I was looking at one of your > comments and double-checked why I *thought* we were safe. > > Anyways, irrespective of that looming issues, some more notes to follow: > >> The 'xas_store' call during page cache scanning can potentially >> translate 'xas' into the error state (with the reproducer provided >> by the syzkaller the error code is -ENOMEM). However, there are no >> further checks after the 'xas_store', and the next call of 'xas_next' >> at the start of the scanning cycle doesn't increase the xa_index, >> and the issue occurs. >> >> This patch will add the xarray state error checking after the >> 'xas_store' and the corresponding result error code. It will >> also add xarray state error checking via WARN_ON_ONCE macros, >> to be sure that ENOMEM or other possible errors don't occur >> at the places they shouldn't. > > Thanks for the additions here. I think it's worthwhile providing even more > details about the specifics of the race we are fixing and/or guarding against to > help ppl understand how that -ENOMEM comes about if the do-while loop has > "Ensured" we have slots available (additionally, I think that comment can be > augmented). > >> Tested via syzbot. >> >> Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com >> Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc >> Signed-off-by: Ivan Orlov >> --- >> V1 -> V2: Add WARN_ON_ONCE error checking and comments >> >> mm/khugepaged.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index 92e6f56a932d..8b6580b13339 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -55,6 +55,7 @@ enum scan_result { >> SCAN_CGROUP_CHARGE_FAIL, >> SCAN_TRUNCATED, >> SCAN_PAGE_HAS_PRIVATE, >> + SCAN_STORE_FAILED, >> }; > > I'm still reluctant to add a new error code for this as this seems like quite a > rare race that requires OOM to trigger. I'd be happier just reusing SCAN_FAIL, > or, something we might get some millage out of later: SCAN_OOM. > > Also, a reminder to update include/trace/events/huge_memory.h, if you go that > route. > >> >> #define CREATE_TRACE_POINTS >> @@ -1840,6 +1841,15 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, >> goto xa_locked; >> } >> xas_store(&xas, hpage); >> + if (xas_error(&xas)) { >> + /* revert shmem_charge performed >> + * in the previous condition >> + */ > > Nit: Here, and following, I think standard convention for multiline comment is > to have an empty first and last line, eg: > > + /* > + * revert shmem_charge performed > + * in the previous condition > + */ > > Though, checkpatch.pl --strict didn't seem to care. > >> + mapping->nrpages--; >> + shmem_uncharge(mapping->host, 1); >> + result = SCAN_STORE_FAILED; >> + goto xa_locked; >> + } >> nr_none++; >> continue; >> } >> @@ -1992,6 +2002,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, >> >> /* Finally, replace with the new page. */ >> xas_store(&xas, hpage); >> + /* We can't get an ENOMEM here (because the allocation happened before) >> + * but let's check for errors (XArray implementation can be >> + * changed in the future) >> + */ >> + WARN_ON_ONCE(xas_error(&xas)); > > Nit: it's not just that allocation happened before -- need some guarantee we've > been protected from concurrent removal. This is what made me look at the file > side. > >> continue; >> out_unlock: >> unlock_page(page); >> @@ -2029,6 +2044,11 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr, >> /* Join all the small entries into a single multi-index entry */ >> xas_set_order(&xas, start, HPAGE_PMD_ORDER); >> xas_store(&xas, hpage); >> + /* Here we can't get an ENOMEM (because entries were >> + * previously allocated) But let's check for errors >> + * (XArray implementation can be changed in the future) >> + */ >> + WARN_ON_ONCE(xas_error(&xas)); > > Ditto. > > Apologies I won't be around to see this change through -- I'm just out of time, > and will be shutting my computer down tomorrow for 3 months. Sorry for the poor > timing, for raising issues, then disappearing. Hopefully I'm wrong and the > file-side isn't a concern. > > Best, > Zach > >> xa_locked: >> xas_unlock_irq(&xas); >> xa_unlocked: >> -- >> 2.34.1 >> Hello, Zach! Thank you very much for the detailed review! I thought that locking is our guarantee to not get an -ENOMEM, but I didn't have the deep understanding of the problem's cause, due to the fact I'm just starting my memory management journey :) In any case, I will do a research about this problem, and hopefully after you get out of the vacation you will see a new patch fully fixes this problem. Have a nice vacation! Thanks again!