From: Ryan Roberts
Date: Wed, 22 Mar 2023 12:03:31 +0000
Subject: Re: [RFC PATCH 0/6] variable-order, large folios for anonymous memory
To: Andrew Morton, "Matthew Wilcox (Oracle)", "Yin, Fengwei", Yu Zhao
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20230317105802.2634004-1-ryan.roberts@arm.com>
References: <20230317105802.2634004-1-ryan.roberts@arm.com>

Hi Matthew,

On 17/03/2023 10:57, Ryan Roberts wrote:
> Hi All,
>
> [...]
>
> Bug(s)
> ======
>
> When I run this code without the last (workaround) patch, with DEBUG_VM et al,
> PROVE_LOCKING and KASAN enabled, I see occasional oopses. Mostly these are
> relating to invalid kernel addresses (which usually look like either NULL +
> small offset or mostly zeros with a few mid-order bits set + a small offset) or
> lockdep complaining about a bad unlock balance. Call stacks are often in
> madvise_free_pte_range(), but I've seen them in filesystem code too. (I can
> email example oopses out separately if anyone wants to review them).
> My hunch is that struct pages adjacent to the folio are being corrupted, but
> don't have hard evidence.
>
> When adding the workaround patch, which prevents madvise_free_pte_range() from
> attempting to split a large folio, I never see any issues. Although I'm not
> putting the system under memory pressure so guess I might see the same types of
> problem crop up under swap, etc.
>
> I've reviewed most of the code within split_folio() and can't find any smoking
> gun, but I wonder if there are implicit assumptions about the large folio being
> PMD sized that I'm obviously breaking now?
>
> The code in madvise_free_pte_range():
>
> 	if (folio_test_large(folio)) {
> 		if (folio_mapcount(folio) != 1)
> 			goto out;
> 		folio_get(folio);
> 		if (!folio_trylock(folio)) {
> 			folio_put(folio);
> 			goto out;
> 		}
> 		pte_unmap_unlock(orig_pte, ptl);
> 		if (split_folio(folio)) {
> 			folio_unlock(folio);
> 			folio_put(folio);
> 			orig_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> 			goto out;
> 		}
> 		...
> 	}

I've noticed that it's split_folio() on an order-1 folio that causes my
problems. I also see that the page cache code explicitly avoids allocating
order-1 folios:

void page_cache_ra_order(struct readahead_control *ractl,
		struct file_ra_state *ra, unsigned int new_order)
{
	...

	while (index <= limit) {
		unsigned int order = new_order;

		/* Align with smaller pages if needed */
		if (index & ((1UL << order) - 1)) {
			order = __ffs(index);
			if (order == 1)
				order = 0;
		}
		/* Don't allocate pages past EOF */
		while (index + (1UL << order) - 1 > limit) {
			if (--order == 1)
				order = 0;
		}
		err = ra_alloc_folio(ractl, index, mark, order, gfp);
		if (err)
			break;
		index += 1UL << order;
	}

	...
}

Matthew, what is the reason for this? I suspect it's guarding against the same
problem I'm seeing.

If I explicitly prevent order-1 allocations for anon pages, I'm unable to cause
any oops/panic/etc. I'd just like to understand the root cause.
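
For illustration only, here is a rough userspace sketch of the order selection
in the readahead loop quoted above. It is not kernel code: pick_order() and
walk_orders() are hypothetical names, __builtin_ctzl() stands in for __ffs(),
and it assumes the caller passes new_order >= 2 (with a smaller new_order an
aligned index could still yield order 1 in this sketch).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the order selection from the loop quoted above. */
static unsigned int pick_order(unsigned long index, unsigned long limit,
			       unsigned int new_order)
{
	unsigned int order = new_order;

	/* Align with smaller pages if needed; index is non-zero in this branch */
	if (index & ((1UL << order) - 1)) {
		order = (unsigned int)__builtin_ctzl(index); /* stand-in for __ffs() */
		if (order == 1)
			order = 0;
	}
	/* Don't go past the limit; skip order 1 on the way down */
	while (index + (1UL << order) - 1 > limit) {
		if (--order == 1)
			order = 0;
	}
	return order;
}

/*
 * Hypothetical driver: walk [0, limit] the way the loop does and record the
 * order chosen at each step. Assumes new_order >= 2. Returns the number of
 * entries written to orders[].
 */
static size_t walk_orders(unsigned long limit, unsigned int new_order,
			  unsigned int *orders, size_t max)
{
	unsigned long index = 0;
	size_t n = 0;

	while (index <= limit && n < max) {
		unsigned int order = pick_order(index, limit, new_order);

		assert(order != 1);	/* order 1 is never selected */
		orders[n++] = order;
		index += 1UL << order;
	}
	return n;
}
```

With limit = 10 and new_order = 4, walk_orders() records orders 3, 0, 0, 0:
the range is covered by one order-3 folio and three order-0 folios, and the
order-1 case at index 10 is demoted to order 0.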

Thanks,
Ryan

>
> Will normally skip my large folios because they have a mapcount > 1, due to
> incrementing mapcount for each pte, unlike PMD mapped pages. But on occasion it
> will see a mapcount of 1 and proceed. So I guess this is racing against reclaim
> or CoW in this case?
>
> I also see it's doing a dance to take the folio lock and drop the ptl. Perhaps my
> large anon folio is not using the folio lock in the same way as a THP would and
> we are therefore not getting the expected serialization?
>
> I'd really appreciate any suggestions for how to progress here!
>