On Thu, 2023-02-09 at 22:19 -0800, Peter Collingbourne wrote:
> On Wed, Feb 08, 2023 at 05:41:45AM +0000, Qun-wei Lin (林群崴) wrote:
> > On Fri, 2023-02-03 at 18:51 +0100, Andrey Konovalov wrote:
> > > On Fri, Feb 3, 2023 at 4:41 AM Kuan-Ying Lee (李冠穎) wrote:
> > > > > Hi Kuan-Ying,
> > > > >
> > > > > There recently was a similar crash due to incorrectly implemented
> > > > > sampling.
> > > > >
> > > > > Do you have the following patch in your tree?
> > > > >
> > > > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/9f7f5a25f335e6e1484695da9180281a728db7e2__;Kw!!CTRNKA9wMg0ARbw!hUjRlXirPMSusdIWe0RIPt0PNqIHYDCJyd7GSd4o-TgLMP0CKRUkjElH-jcvtaz42-sgE2U58964rCCbuNTJE5Jx$
> > > > >
> > > > > If not, please sync your 6.1 tree with the Android common kernel.
> > > > > Hopefully this will fix the issue.
> > > > >
> > > > > Thanks!
> > > >
> > > > Hi Andrey,
> > > >
> > > > Thanks for your advice.
> > > >
> > > > I saw this patch is to fix ("kasan: allow sampling page_alloc
> > > > allocations for HW_TAGS").
> > > >
> > > > But our 6.1 tree doesn't have the following two commits now.
> > > > ("FROMGIT: kasan: allow sampling page_alloc allocations for HW_TAGS")
> > > > ("FROMLIST: kasan: reset page tags properly with sampling")
> > >
> > > Hi Kuan-Ying,
> > >
> >
> > Hi Andrey,
> > I'll stand in for Kuan-Ying as he's out of office.
> > Thanks for your help!
> >
> > > Just to clarify: these two patches were applied twice: once here on
> > > Jan 13:
> > >
> > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/a2a9e34d164e90fc08d35fd097a164b9101d72ef__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745_3oO-3k$
> > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/435e2a6a6c8ba8d0eb55f9aaade53e7a3957322b__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745sDEOYWY$
> >
> > Our codebase does not contain these two patches.
> >
> > > but then reverted here on Jan 20:
> > >
> > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/5503dbe454478fe54b9cac3fc52d4477f52efdc9__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745Bl77dFY$
> > > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/4573a3cf7e18735a477845426238d46d96426bb6__;Kw!!CTRNKA9wMg0ARbw!kE1XiSmunRcQb9rTpKGkFc1EFJA57qr1cj7v9EZAjUBzXcSzMl-ofCI2mdtEQsxn3J4n7Lkgxb0_G745K-J8O-w$
> > >
> > > And then once again via the link I sent before together with a fix on
> > > Jan 25.
> > >
> > > It might be that you still have the former two patches in your tree if
> > > you synced it before the revert.
> > >
> > > However, if this is not the case:
> > >
> > > Which 6.1 commit is your tree based on?
> >
> > https://urldefense.com/v3/__https://android.googlesource.com/kernel/common/*/53b3a7721b7aec74d8fa2ee55c2480044cc7c1b8__;Kw!!CTRNKA9wMg0ARbw!iEzuh9LYXlwXkpcWaHjncfr6lNgTky7OEAEzQ7cIFjlTD__7lwXqAhPJwWJXEnD8THUS7jnBK7hjnHw$
> >
> > (53b3a77 Merge 6.1.1 into android14-6.1) is the latest commit in our
> > tree.
> >
> > > Do you have any private MTE-related changes in the kernel?
> >
> > No, all the MTE-related code is the same as Android Common Kernel.
> >
> > > Do you have userspace MTE enabled?
> >
> > Yes, we have enabled MTE for both EL1 and EL0.
>
> Hi Qun-wei,
>
> Thanks for the information. We encountered a similar issue internally
> with the Android 5.15 common kernel. We tracked it down to an issue
> with page migration, where the source page was a userspace page with
> MTE tags, and the target page was allocated using KASAN (i.e. having
> a non-zero KASAN tag). This caused tag check faults when the page was
> subsequently accessed by the kernel as a result of the mismatching tags
> from userspace. Given the number of different ways that page migration
> target pages can be allocated, the simplest fix that we could think of
> was to synchronize the KASAN tag in copy_highpage().
>
> Can you try the patch below and let us know whether it fixes the issue?
>
> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> index 24913271e898c..87ed38e9747bd 100644
> --- a/arch/arm64/mm/copypage.c
> +++ b/arch/arm64/mm/copypage.c
> @@ -23,6 +23,8 @@ void copy_highpage(struct page *to, struct page *from)
>
>  	if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
>  		set_bit(PG_mte_tagged, &to->flags);
> +		if (kasan_hw_tags_enabled())
> +			page_kasan_tag_set(to, page_kasan_tag(from));
>  		mte_copy_page_tags(kto, kfrom);
>  	}
>  }
>

Thank you so much, this patch has solved the problem.

> Catalin, please let us know what you think of the patch above. It
> effectively partially undoes commit 20794545c146 ("arm64: kasan: Revert
> "arm64: mte: reset the page tag in page->flags""), but this seems okay
> to me because the mentioned race condition shouldn't affect "new" pages
> such as those being used as migration targets. The smp_wmb() that was
> there before doesn't seem necessary for the same reason.
>
> If the patch is okay, we should apply it to the 6.1 stable kernel.
> The problem appears to be "fixed" in the mainline kernel because of
> a bad merge conflict resolution on my part; when I rebased commit
> e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
> past commit 20794545c146, it looks like I accidentally brought back the
> page_kasan_tag_reset() line removed in the latter. But we should align
> the mainline kernel with whatever we decide to do on 6.1.
>
> Peter
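
For anyone following the thread, the mismatch Peter describes can be
illustrated with a small user-space toy model. This is only a sketch and
not kernel code: the toy_page struct, the 4-bit tag values, the
TOY_TAG_MATCH_ALL constant and every toy_* helper are hypothetical names
made up for illustration. It also assumes that the userspace source page
carries a match-all page-level tag while the migration target was
allocated with an ordinary KASAN tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: none of these names are real kernel APIs. */
#define TOY_GRANULES_PER_PAGE 256   /* 4096-byte page / 16-byte granule */
#define TOY_TAG_MATCH_ALL     0xF   /* assumed "matches anything" pointer tag */

struct toy_page {
	uint8_t kasan_tag;                        /* models the tag kept in page->flags */
	bool mte_tagged;                          /* models PG_mte_tagged */
	uint8_t mem_tags[TOY_GRANULES_PER_PAGE];  /* models per-granule memory tags */
};

/* A kernel access derives its pointer tag from the page-level tag. */
static bool toy_kernel_access_ok(const struct toy_page *p, size_t granule)
{
	if (p->kasan_tag == TOY_TAG_MATCH_ALL)
		return true;
	return p->kasan_tag == p->mem_tags[granule];
}

/* Models the tag handling in copy_highpage(), with and without the fix. */
static void toy_copy_highpage(struct toy_page *to, const struct toy_page *from,
			      bool sync_kasan_tag)
{
	if (from->mte_tagged) {
		to->mte_tagged = true;
		if (sync_kasan_tag)
			to->kasan_tag = from->kasan_tag;  /* the proposed fix */
		memcpy(to->mem_tags, from->mem_tags, sizeof(to->mem_tags));
	}
}

/* Migrates a user page to a KASAN-tagged target, then touches the target. */
bool toy_migrate_and_check(bool sync_kasan_tag)
{
	struct toy_page src, dst;

	/* Source: userspace page with user-chosen memory tags (0x5 here). */
	src.kasan_tag = TOY_TAG_MATCH_ALL;
	src.mte_tagged = true;
	memset(src.mem_tags, 0x5, sizeof(src.mem_tags));

	/* Target: freshly allocated with KASAN tag 0x3. */
	dst.kasan_tag = 0x3;
	dst.mte_tagged = false;
	memset(dst.mem_tags, 0x3, sizeof(dst.mem_tags));

	toy_copy_highpage(&dst, &src, sync_kasan_tag);
	return toy_kernel_access_ok(&dst, 0);
}
```

In this model toy_migrate_and_check(false) returns false, which stands in
for the tag check fault: the target keeps page tag 0x3 but now holds the
user's memory tags. toy_migrate_and_check(true) returns true because the
page-level tag is copied along with the memory tags.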