From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Fri, 31 May 2024 18:17:33 +0800
Subject: Re: [PATCHv3] mm: fix incorrect vbq reference in purge_fragmented_block
To: Barry Song <21cnbao@gmail.com>
Cc: Uladzislau Rezki, zhaoyang.huang, Andrew Morton, Christoph Hellwig,
 Lorenzo Stoakes, Baoquan He, Thomas Gleixner, Hailong Liu,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, steve.kang@unisoc.com
References: <20240531030520.1615833-1-zhaoyang.huang@unisoc.com>

On Fri, May 31, 2024 at 5:56 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, May 31, 2024 at 9:13 PM Zhaoyang Huang wrote:
> >
> > On Fri, May 31, 2024 at 4:05 PM Uladzislau Rezki wrote:
> > >
> > > On Fri, May 31, 2024 at 11:05:20AM +0800, zhaoyang.huang wrote:
> > > > From: Zhaoyang Huang
> > > >
> > > > The vmalloc area runs out in our ARM64 system during an erofs test,
> > > > with vm_map_ram() failing[1]. Following the debug log, we found that
> > > > vm_map_ram()->vb_alloc() allocates a new vb->va, corresponding to a
> > > > 4MB vmalloc area, because list_for_each_entry_rcu returns
> > > > immediately when vbq->free->next points back to vbq->free. That is
> > > > to say, 65536 page faults after the list is broken will run out of
> > > > the whole vmalloc area. The cause is one vbq->free->next pointing
> > > > to vbq->free itself, which makes list_for_each_entry_rcu unable to
> > > > iterate the list and expose the BUG.
> > > >
> > > > [1]
> > > > PID: 1  TASK: ffffff80802b4e00  CPU: 6  COMMAND: "init"
> > > >  #0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
> > > >  #1 [ffffffc08006b040] __schedule at ffffffc08111dde0
> > > >  #2 [ffffffc08006b0a0] schedule at ffffffc08111e294
> > > >  #3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
> > > >  #4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
> > > >  #5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
> > > >  #6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
> > > >  #7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
> > > >  #8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
> > > >  #9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
> > > >
> > > > Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
> > > >
> > > > Suggested-by: Hailong.Liu
> > > > Signed-off-by: Zhaoyang Huang
> > > >
> > > Is the problem related to running out of vmalloc space _only_, or is
> > > it a problem with the broken list? From the commit message it is hard
> > > to follow the reason.
> > >
> > > Could you please post a full trace or panic?
> > Please refer to the scenario below for how vbq->free gets broken.
> > step 1: new_vmap_block() is called on CPU0 and gets vb->va->addr =
> > 0xffffffc000400000
> > step 2: vb is added to CPU1's vbq->vmap_blocks (xarray) by xa =
> > addr_to_vb_xa(va->va_start);
> >         fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully
> > utilized blocks") introduced a per-CPU-like xarray mechanism that adds
> > vb to the xarray of the CPU derived from the address, not the local one.
> > step 3: vb is added to CPU0's vbq->free by
> > list_add_tail_rcu(&vb->free_list, &vbq->free);
> > step 4: purge_fragmented_blocks() gets CPU1's vbq and then gets the
> > above vb
> > step 5: purge_fragmented_blocks() deletes vb from CPU0's list while
> > taking CPU1's vbq->lock
> > step 5': vb_alloc() on CPU0 can race with step 5 and break CPU0's
> > vbq->free
> >
> > As fc1e0d980037 solved the stale TLB issue, we need to introduce a new
> > variable to record the CPU in vmap_block instead of reverting to
> > iterating the list (which would leave stale TLB entries).
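
To make the CPU mismatch concrete, here is a stand-alone sketch (plain
C, not kernel code): the xarray index is derived from the block address,
following the modulo hashing idea of addr_to_vb_xa(), while the free
list is picked by whichever CPU the task happens to run on. The 4MB
block size comes from the report above; the 8-CPU count is an assumption
for the demo.

#include <stdio.h>
#include <stdint.h>

#define VMAP_BLOCK_SIZE	(1024UL * 4096)	/* 4MB per vmap_block, as in the report */

/* Same idea as the kernel's addr_to_vb_xa(): hash the address to a CPU. */
static unsigned int addr_to_vb_cpu(uint64_t addr, unsigned int nr_cpus)
{
	return (addr / VMAP_BLOCK_SIZE) % nr_cpus;
}

int main(void)
{
	uint64_t va_start = 0xffffffc000400000ULL;	/* vb->va->addr from step 1 */
	unsigned int nr_cpus = 8;			/* assumed CPU count */
	unsigned int alloc_cpu = 0;			/* task runs on CPU0 (raw_cpu_ptr) */

	/* Prints "xarray: CPU1, free list: CPU0" -- the same vb is reachable
	 * through CPU1's xarray but linked into CPU0's vbq->free. */
	printf("xarray: CPU%u, free list: CPU%u\n",
	       addr_to_vb_cpu(va_start, nr_cpus), alloc_cpu);
	return 0;
}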
> > > > ---
> > > > v2: introduce cpu in vmap_block to record the right CPU number
> > > > v3: use get_cpu()/put_cpu() to prevent being scheduled between cores
> > > > ---
> > > > ---
> > > >  mm/vmalloc.c | 12 ++++++++----
> > > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 22aa63f4ef63..ecdb75d10949 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2458,6 +2458,7 @@ struct vmap_block {
> > > >         struct list_head free_list;
> > > >         struct rcu_head rcu_head;
> > > >         struct list_head purge;
> > > > +       unsigned int cpu;
> > > >  };
> > > >
> > > >  /* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
> > > > @@ -2586,10 +2587,12 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> > > >                 return ERR_PTR(err);
> > > >         }
> > > >
> > > > +       vb->cpu = get_cpu();
> > > >         vbq = raw_cpu_ptr(&vmap_block_queue);
> > > >         spin_lock(&vbq->lock);
> > > >         list_add_tail_rcu(&vb->free_list, &vbq->free);
> > > >         spin_unlock(&vbq->lock);
> > > > +       put_cpu();
> > > >
> > > Why do you need get_cpu() here? Can you go with raw_smp_processor_id()
> > > and then access the per-cpu "vmap_block_queue"? get_cpu() disables
> > > preemption, and then a spin lock is taken within this critical
> > > section. At first glance, PREEMPT_RT is broken in this case.
> > get_cpu() here is to prevent the current task from being migrated to
> > another core before we get the per-CPU vmap_block_queue. Could you
> > please suggest a correct way of doing this?
>
> Not quite sure you have to pay the price of disabling preemption.
> Does the change below, suggested by Hailong, fix your problem?
>
> vb->cpu = raw_smp_processor_id();
> vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);

emm, it looks like 2 could race with 2', which also leads to a wrong
vbq->free state, right?

taskA
1. CPU0: vb->cpu = raw_smp_processor_id();
2. CPU1: vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu(0));

taskB
2'. CPU0:
static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
{
	rcu_read_lock();
	vbq = raw_cpu_ptr(&vmap_block_queue);
	list_for_each_entry_rcu(vb, &vbq->free, free_list) {

> > > I am on vacation, responses can be delayed.
> > >
> > > --
> > > Uladzislau Rezki
> Thanks
> Barry
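
For reference, a toy model (plain C, not kernel code; the scheduler
"migration" is simulated by flipping a variable) of what the
raw_smp_processor_id() variant guarantees: vb->cpu and vbq are derived
from the same captured id, so they stay consistent even if the task
migrates between the two statements. Whether the remaining concurrency
with vb_alloc() on that CPU is safe is exactly the open question above.

#include <stdio.h>

#define NR_CPUS 4

struct vmap_block_queue { int id; };		/* stand-in for the kernel struct */
static struct vmap_block_queue queues[NR_CPUS];

static int cur_cpu;				/* which CPU the task runs on */
static int fake_raw_smp_processor_id(void) { return cur_cpu; }

int main(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		queues[i].id = i;

	cur_cpu = 0;					/* task starts on CPU0 */
	int vb_cpu = fake_raw_smp_processor_id();	/* vb->cpu = raw_smp_processor_id() */
	cur_cpu = 1;					/* simulated migration to CPU1 */
	struct vmap_block_queue *vbq = &queues[vb_cpu];	/* per_cpu_ptr(..., vb->cpu) */

	/* Both still refer to CPU0: the list the vb goes on matches the cpu
	 * recorded in the vb, unlike vb->cpu captured on one CPU paired
	 * with raw_cpu_ptr() evaluated on another. */
	printf("vb->cpu=%d, vbq is CPU%d's queue\n", vb_cpu, vbq->id);
	return 0;
}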