Date: Wed, 20 Sep 2023 09:48:11 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka
Cc: Zi Yan, Mike Kravetz, Andrew Morton, Mel Gorman, Miaohe Lin,
 Kefeng Wang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene
Message-ID: <20230920134811.GB124289@cmpxchg.org>
References: <20230916195739.GB618858@monkey>
 <20230918145204.GB16104@cmpxchg.org>
 <20230918174037.GA112714@monkey>
 <20230919064914.GA124289@cmpxchg.org>
 <20230919184731.GC112714@monkey>
 <20230920003239.GD112714@monkey>
 <149ACAE8-D3E4-4009-828A-D3AC881FFB9C@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:

On Wed, Sep 20, 2023 at 08:07:53AM +0200, Vlastimil Babka wrote:
> On 9/20/23 03:38, Zi Yan wrote:
> > On 19 Sep 2023, at 20:32, Mike Kravetz wrote:
> >
> >> On 09/19/23 16:57, Zi Yan wrote:
> >>> On 19 Sep 2023, at 14:47, Mike Kravetz wrote:
> >>>
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -1651,8 +1651,13 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page,
> >>>>  	end = pageblock_end_pfn(pfn) - 1;
> >>>>
> >>>>  	/* Do not cross zone boundaries */
> >>>> +#if 0
> >>>>  	if (!zone_spans_pfn(zone, start))
> >>>>  		start = zone->zone_start_pfn;
> >>>> +#else
> >>>> +	if (!zone_spans_pfn(zone, start))
> >>>> +		start = pfn;
> >>>> +#endif
> >>>>  	if (!zone_spans_pfn(zone, end))
> >>>>  		return false;
> >>>>
> >>>> I can still trigger warnings.
> >>>
> >>> OK. One thing to note is that the page type in the warning changed from
> >>> 5 (MIGRATE_ISOLATE) to 0 (MIGRATE_UNMOVABLE) with my suggested change.
> >>>
> >>
> >> Just to be really clear,
> >> - the 5 (MIGRATE_ISOLATE) warning was from the __alloc_pages call path.
> >> - the 0 (MIGRATE_UNMOVABLE) as above was from the alloc_contig_range call
> >>   path WITHOUT your change.
> >>
> >> I am guessing the difference here has more to do with the allocation path?
> >>
> >> I went back and reran focusing on the specific migrate type.
> >> Without your patch, and coming from the alloc_contig_range call path,
> >> I got two warnings of 'page type is 0, passed migratetype is 1' as above.
> >> With your patch I got one 'page type is 0, passed migratetype is 1'
> >> warning and one 'page type is 1, passed migratetype is 0' warning.
> >>
> >> I could be wrong, but I do not think your patch changes things.
> >
> > Got it. Thanks for the clarification.
> >
> >>>> One idea about recreating the issue is that it may have to do with size
> >>>> of my VM (16G) and the requested allocation sizes 4G. However, I tried
> >>>> to really stress the allocations by increasing the number of hugetlb
> >>>> pages requested and that did not help. I also noticed that I only seem
> >>>> to get two warnings and then they stop, even if I continue to run the
> >>>> script.
> >>>>
> >>>> Zi asked about my config, so it is attached.
> >>>
> >>> With your config, I still have no luck reproducing the issue. I will keep
> >>> trying. Thanks.
> >>>
> >>
> >> Perhaps try running both scripts in parallel?
> >
> > Yes. It seems to do the trick.
> >
> >> Adjust the number of hugetlb pages allocated to equal 25% of memory?
> >
> > I am able to reproduce it with the script below:
> >
> > while true; do
> >   echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages&
> >   echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages&
> >   wait
> >   echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> >   echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > done
> >
> > I will look into the issue.

Nice! I managed to reproduce it ONCE, triggering it not even a second
after starting the script. But I can't seem to do it twice, even after
several reboots and letting it run for minutes.
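
[ For context on the prep_move_freepages_block() hunk quoted above:
  the function rounds pfn down to its pageblock, and when a zone
  starts in the middle of a pageblock that rounded-down start falls
  below zone_start_pfn - the two #if branches differ only in what
  they clamp it to. A standalone userspace sketch of the arithmetic;
  the pageblock order and pfn values are invented for illustration: ]

#include <stdio.h>

#define PAGEBLOCK_ORDER		9
#define PAGEBLOCK_NR_PAGES	(1UL << PAGEBLOCK_ORDER)

/* Round a pfn down to the first pfn of its pageblock, as
 * pageblock_start_pfn() does in the kernel. */
static unsigned long block_start(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

int main(void)
{
	unsigned long zone_start_pfn = 0x1f00;	/* zone begins mid-block */
	unsigned long pfn = 0x1fff;		/* page in the same block */
	unsigned long start = block_start(pfn);	/* 0x1e00: below the zone */

	/* Old clamp: start = zone_start_pfn (0x1f00).
	 * Mike's experiment: start = pfn (0x1fff). */
	printf("block %#lx, zone %#lx, pfn %#lx\n",
	       start, zone_start_pfn, pfn);
	return 0;
}
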
> With migratetypes 0 and 1 and somewhat harder to reproduce scenario (= less
> deterministic, more racy) it's possible we now see what I suspected can
> happen here:
> https://lore.kernel.org/all/37dbd4d0-c125-6694-dec4-6322ae5b6dee@suse.cz/
> In that there are places reading the migratetype outside of zone lock.

Good point! I had already written up a fix for this issue. Still trying
to get the reproducer to work, but attaching the fix below in case
somebody with a working environment beats me to it.
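
[ To make the window Vlastimil points at concrete, here is a tiny
  standalone pthreads sketch - not kernel code, all names invented:
  a freer samples the block type before taking the "zone lock", a
  stealer retypes the block under the lock, and the page is then
  filed under the stale type: ]

#include <pthread.h>
#include <stdio.h>

#define MIGRATE_MOVABLE		0
#define MIGRATE_UNMOVABLE	1

static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;
static int block_type = MIGRATE_MOVABLE;	/* pageblock migratetype */
static int freelist_used = -1;			/* list the page went on */

/* Freer: mimics the old __free_pages_ok() ordering - optimistic
 * migratetype read first, zone lock second. */
static void *racy_free(void *arg)
{
	int mt = block_type;		/* read outside the lock */

	pthread_mutex_lock(&zone_lock);
	freelist_used = mt;		/* may be stale by now */
	pthread_mutex_unlock(&zone_lock);
	return NULL;
}

/* Stealer: converts the whole block under the zone lock, the way
 * fallback stealing does. */
static void *steal_block(void *arg)
{
	pthread_mutex_lock(&zone_lock);
	block_type = MIGRATE_UNMOVABLE;
	pthread_mutex_unlock(&zone_lock);
	return NULL;
}

int main(void)
{
	pthread_t freer, stealer;

	pthread_create(&freer, NULL, racy_free, NULL);
	pthread_create(&stealer, NULL, steal_block, NULL);
	pthread_join(freer, NULL);
	pthread_join(stealer, NULL);

	/* The bad interleaving only happens sometimes - which is
	 * also why the reproducer above is so flaky. */
	if (freelist_used != block_type)
		printf("page filed as %d, block is now %d\n",
		       freelist_used, block_type);
	return 0;
}

[ The fix below closes the window by doing the only lookup under
  the lock. ]
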
---
>From 94f67bfa29a602a66014d079431b224cacbf79e9 Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Fri, 15 Sep 2023 16:23:38 -0400
Subject: [PATCH] mm: page_alloc: close migratetype race between freeing and
 stealing

There are three freeing paths that read the page's migratetype
optimistically before grabbing the zone lock. When this races with
block stealing, those pages go on the wrong freelist.

The paths in question are:
- when freeing >costly orders that aren't THP
- when freeing pages to the buddy upon pcp lock contention
- when freeing pages that are isolated
- when freeing pages initially during boot
- when freeing the remainder in alloc_pages_exact()
- when "accepting" unaccepted VM host memory before first use
- when freeing pages during unpoisoning

None of these are so hot that they would need this optimization at the
cost of hampering defrag efforts. Especially when contrasted with the
fact that the most common buddy freeing path - free_pcppages_bulk - is
checking the migratetype under the zone->lock just fine.

In addition, isolated pages need to look up the migratetype under the
lock anyway, which adds branches to the locked section, and results in
a double lookup when the pages are in fact isolated.

Move the lookups into the lock.

Reported-by: Vlastimil Babka
Signed-off-by: Johannes Weiner
---
 mm/page_alloc.c | 47 +++++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 30 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0ca999d24a00..d902a8aaa3fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1222,18 +1222,15 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-static void free_one_page(struct zone *zone,
-			  struct page *page, unsigned long pfn,
-			  unsigned int order,
-			  int migratetype, fpi_t fpi_flags)
+static void free_one_page(struct zone *zone, struct page *page,
+			  unsigned long pfn, unsigned int order,
+			  fpi_t fpi_flags)
 {
 	unsigned long flags;
+	int migratetype;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	if (unlikely(has_isolate_pageblock(zone) ||
-		     is_migrate_isolate(migratetype))) {
-		migratetype = get_pfnblock_migratetype(page, pfn);
-	}
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
@@ -1249,18 +1246,8 @@ static void __free_pages_ok(struct page *page, unsigned int order,
 	if (!free_pages_prepare(page, order, fpi_flags))
 		return;
 
-	/*
-	 * Calling get_pfnblock_migratetype() without spin_lock_irqsave() here
-	 * is used to avoid calling get_pfnblock_migratetype() under the lock.
-	 * This will reduce the lock holding time.
-	 */
-	migratetype = get_pfnblock_migratetype(page, pfn);
-
 	spin_lock_irqsave(&zone->lock, flags);
-	if (unlikely(has_isolate_pageblock(zone) ||
-		     is_migrate_isolate(migratetype))) {
-		migratetype = get_pfnblock_migratetype(page, pfn);
-	}
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
@@ -2404,7 +2391,7 @@ void free_unref_page(struct page *page, unsigned int order)
 	struct per_cpu_pages *pcp;
 	struct zone *zone;
 	unsigned long pfn = page_to_pfn(page);
-	int migratetype, pcpmigratetype;
+	int migratetype;
 
 	if (!free_pages_prepare(page, order, FPI_NONE))
 		return;
@@ -2416,23 +2403,23 @@ void free_unref_page(struct page *page, unsigned int order)
 	 * get those areas back if necessary. Otherwise, we may have to free
 	 * excessively into the page allocator
 	 */
-	migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn);
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
 			return;
 		}
-		pcpmigratetype = MIGRATE_MOVABLE;
+		migratetype = MIGRATE_MOVABLE;
 	}
 
 	zone = page_zone(page);
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_unref_page_commit(zone, pcp, page, pcpmigratetype, order);
+		free_unref_page_commit(zone, pcp, page, migratetype, order);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, migratetype, FPI_NONE);
+		free_one_page(zone, page, pfn, order, FPI_NONE);
 	}
 	pcp_trylock_finish(UP_flags);
 }
@@ -2465,7 +2452,7 @@ void free_unref_page_list(struct list_head *list)
 		migratetype = get_pfnblock_migratetype(page, pfn);
 		if (unlikely(is_migrate_isolate(migratetype))) {
 			list_del(&page->lru);
-			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, 0, FPI_NONE);
 			continue;
 		}
 	}
@@ -2498,8 +2485,7 @@ void free_unref_page_list(struct list_head *list)
 			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
-				free_one_page(zone, page, pfn,
-					      0, migratetype, FPI_NONE);
+				free_one_page(zone, page, pfn, 0, FPI_NONE);
 				locked_zone = NULL;
 				continue;
 			}
@@ -6537,13 +6523,14 @@ bool take_page_off_buddy(struct page *page)
 bool put_page_back_buddy(struct page *page)
 {
 	struct zone *zone = page_zone(page);
-	unsigned long pfn = page_to_pfn(page);
 	unsigned long flags;
-	int migratetype = get_pfnblock_migratetype(page, pfn);
 	bool ret = false;
 
 	spin_lock_irqsave(&zone->lock, flags);
 	if (put_page_testzero(page)) {
+		unsigned long pfn = page_to_pfn(page);
+		int migratetype = get_pfnblock_migratetype(page, pfn);
+
 		ClearPageHWPoisonTakenOff(page);
 		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
 		if (TestClearPageHWPoison(page)) {
-- 
2.42.0
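
[ Distilled from the hunks above: the whole fix is a reordering of
  one lookup relative to the zone lock. A simplified before/after of
  the freeing paths, callers and error handling elided - an excerpt
  for reading, not a standalone program: ]

	/* Before: optimistic read, redone under the lock only when
	 * isolation might be involved. */
	migratetype = get_pfnblock_migratetype(page, pfn);
	spin_lock_irqsave(&zone->lock, flags);
	if (unlikely(has_isolate_pageblock(zone) ||
		     is_migrate_isolate(migratetype))) {
		migratetype = get_pfnblock_migratetype(page, pfn);
	}
	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
	spin_unlock_irqrestore(&zone->lock, flags);

	/* After: a single lookup, serialized against block stealing
	 * by zone->lock. */
	spin_lock_irqsave(&zone->lock, flags);
	migratetype = get_pfnblock_migratetype(page, pfn);
	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
	spin_unlock_irqrestore(&zone->lock, flags);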