Message-ID: <6b1d9860-3581-0b99-4fb7-4c1f5a2a05f3@bytedance.com>
Date: Tue, 10 Oct 2023 18:01:23 +0100
Subject: Re: [External] Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
From: Usama Arif
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, muchun.song@linux.dev, songmuchun@bytedance.com, fam.zheng@bytedance.com, liangma@liangbit.com, punit.agrawal@bytedance.com, Konrad Dybcio
In-Reply-To: <20231010012345.GA108129@monkey>
References: <20231009145605.2150897-1-usama.arif@bytedance.com> <20231010012345.GA108129@monkey>

On 10/10/2023 02:23, Mike Kravetz wrote:
> On 10/09/23 15:56, Usama Arif wrote:
>> Calling prep_and_add_allocated_folios when allocating gigantic pages
>> at boot time causes the kernel to crash as folio_list is empty
>> and iterating it causes a NULL pointer dereference. Call this only
>> for non-gigantic pages when folio_list has entries.
>
> Thanks!
>
> However, are you sure the issue is the result of iterating through a
> NULL list?  For reference, the routine prep_and_add_allocated_folios is:
>

Yes, you are right, it wasn't an issue with the list, but with the lock.
If I apply the diff below, it boots.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 73803d62066a..f428af13e98a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2178,18 +2178,19 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 static void prep_and_add_allocated_folios(struct hstate *h,
 					struct list_head *folio_list)
 {
+	unsigned long flags;
 	struct folio *folio, *tmp_f;

 	/* Send list for bulk vmemmap optimization processing */
 	hugetlb_vmemmap_optimize_folios(h, folio_list);

 	/* Add all new pool pages to free lists in one lock cycle */
-	spin_lock_irq(&hugetlb_lock);
+	spin_lock_irqsave(&hugetlb_lock, flags);
 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
 		__prep_account_new_huge_page(h, folio_nid(folio));
 		enqueue_hugetlb_folio(h, folio);
 	}
-	spin_unlock_irq(&hugetlb_lock);
+	spin_unlock_irqrestore(&hugetlb_lock, flags);
 }

 /*

FYI, this was an x86 VM with kvm enabled.

Thanks,
Usama

> static void prep_and_add_allocated_folios(struct hstate *h,
> 					struct list_head *folio_list)
> {
> 	struct folio *folio, *tmp_f;
>
> 	/* Add all new pool pages to free lists in one lock cycle */
> 	spin_lock_irq(&hugetlb_lock);
> 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
> 		__prep_account_new_huge_page(h, folio_nid(folio));
> 		enqueue_hugetlb_folio(h, folio);
> 	}
> 	spin_unlock_irq(&hugetlb_lock);
> }
>
> If folio_list is empty, then the only code that should be executed is
> acquiring the lock, notice the list is empty, release the lock.
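
To illustrate the empty-list case: even when the loop body never runs, the
lock/unlock pair itself can change the interrupt state. The following is a
minimal userspace sketch of that effect, not kernel code. The model_*
helpers, the two global flags, and the irqs_after_* functions are all
hypothetical names made up for this illustration; they only imitate the
documented behaviour that spin_unlock_irq() enables interrupts
unconditionally while spin_unlock_irqrestore() puts back the saved state.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of this is the kernel's implementation. */
static bool irqs_enabled;	/* simulated CPU interrupt flag */
static bool lock_held;		/* simulated hugetlb_lock */

static void model_spin_lock_irq(void)
{
	irqs_enabled = false;
	lock_held = true;
}

static void model_spin_unlock_irq(void)
{
	assert(lock_held);
	lock_held = false;
	irqs_enabled = true;	/* enabled unconditionally, whatever the entry state was */
}

static void model_spin_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;	/* remember the state on entry */
	irqs_enabled = false;
	lock_held = true;
}

static void model_spin_unlock_irqrestore(bool flags)
{
	assert(lock_held);
	lock_held = false;
	irqs_enabled = flags;	/* restore the state the caller had */
}

/* Empty folio_list with the _irq variant: only the lock pair has an effect. */
bool irqs_after_irq_variant(bool irqs_on_entry)
{
	irqs_enabled = irqs_on_entry;
	model_spin_lock_irq();
	/* list_for_each_entry_safe() over an empty list: nothing to do */
	model_spin_unlock_irq();
	return irqs_enabled;
}

/* Same empty-list path with the _irqsave/_irqrestore variant. */
bool irqs_after_irqsave_variant(bool irqs_on_entry)
{
	bool flags;

	irqs_enabled = irqs_on_entry;
	model_spin_lock_irqsave(&flags);
	/* empty list: nothing to do */
	model_spin_unlock_irqrestore(flags);
	return irqs_enabled;
}
```

In this model, irqs_after_irq_variant(false) returns true (interrupts end up
on although the caller entered with them off, which is the early-boot
situation), while irqs_after_irqsave_variant(false) returns false. When the
caller enters with interrupts on, both variants leave them on.
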
>
> In the case of gigantic pages addressed below, I do see the warning:
>
> [    0.055140] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
> [    0.055149] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4345 lockdep_hardirqs_on_prepare+0x1a8/0x1b0
> [    0.055153] Modules linked in:
> [    0.055155] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc4+ #40
> [    0.055157] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> [    0.055158] RIP: 0010:lockdep_hardirqs_on_prepare+0x1a8/0x1b0
> [    0.055160] Code: 00 85 c0 0f 84 5e ff ff ff 8b 0d a7 20 74 01 85 c9 0f 85 50 ff ff ff 48 c7 c6 48 25 42 82 48 c7 c7 70 7f 40 82 e8 18 10 f7 ff <0f> 0b 5b e9 e0 d8 af 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> [    0.055162] RSP: 0000:ffffffff82603d40 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> [    0.055164] RAX: 0000000000000000 RBX: ffffffff827911e0 RCX: 0000000000000000
> [    0.055165] RDX: 0000000000000004 RSI: ffffffff8246b3e1 RDI: 00000000ffffffff
> [    0.055166] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
> [    0.055166] R10: ffffffffffffffff R11: 284e4f5f4e524157 R12: 0000000000000001
> [    0.055167] R13: ffffffff82eb6316 R14: ffffffff82603d70 R15: ffffffff82ee5f70
> [    0.055169] FS:  0000000000000000(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
> [    0.055170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.055171] CR2: ffff88847ffff000 CR3: 000000000263a000 CR4: 00000000000200b0
> [    0.055174] Call Trace:
> [    0.055174]  <TASK>
> [    0.055175]  ? lockdep_hardirqs_on_prepare+0x1a8/0x1b0
> [    0.055177]  ? __warn+0x81/0x170
> [    0.055181]  ? lockdep_hardirqs_on_prepare+0x1a8/0x1b0
> [    0.055182]  ? report_bug+0x18d/0x1c0
> [    0.055186]  ? early_fixup_exception+0x92/0xb0
> [    0.055189]  ? early_idt_handler_common+0x2f/0x40
> [    0.055194]  ? lockdep_hardirqs_on_prepare+0x1a8/0x1b0
> [    0.055196]  trace_hardirqs_on+0x10/0xa0
> [    0.055198]  _raw_spin_unlock_irq+0x24/0x50
> [    0.055201]  hugetlb_hstate_alloc_pages+0x311/0x3e0
> [    0.055206]  hugepages_setup+0x220/0x2c0
> [    0.055210]  unknown_bootoption+0x98/0x1d0
> [    0.055213]  parse_args+0x152/0x440
> [    0.055216]  ? __pfx_unknown_bootoption+0x10/0x10
> [    0.055220]  start_kernel+0x1af/0x6c0
> [    0.055222]  ? __pfx_unknown_bootoption+0x10/0x10
> [    0.055225]  x86_64_start_reservations+0x14/0x30
> [    0.055227]  x86_64_start_kernel+0x74/0x80
> [    0.055229]  secondary_startup_64_no_verify+0x166/0x16b
> [    0.055234]  </TASK>
> [    0.055235] irq event stamp: 0
> [    0.055236] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.055238] hardirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.055239] softirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.055240] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.055240] ---[ end trace 0000000000000000 ]---
>
> This is because interrupts are not enabled this early in boot, and the
> spin_unlock_irq() would incorrectly enable interrupts too early.  I wonder
> if this 'warning' could translate to a panic or NULL deref under certain
> configurations?
>
> Konrad, I am interested to see if this addresses your booting problem.  But,
> your stack trace is a bit different.  My 'guess' is that this will not address
> your issue.  If it does not, can you try the following patch?  This
> applies to next-20231009.