Date: Tue, 23 May 2023 17:35:30 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org,
	Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra,
	John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland,
	Marc Zyngier, x86@kernel.org, Nadav Amit
Subject: Re: [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly
In-Reply-To: <87h6s4w20b.ffs@tglx>
References: <87r0rg73wp.ffs@tglx> <87edng6qu8.ffs@tglx> <87cz2w415t.ffs@tglx>
 <87jzx1xou9.ffs@tglx> <87wn10wp45.ffs@tglx> <87h6s4w20b.ffs@tglx>

On 05/22/23 at 10:21pm, Thomas Gleixner wrote:
> On Mon, May 22 2023 at 22:34, Baoquan He wrote:
> > On 05/22/23 at 02:02pm, Thomas Gleixner wrote:
> >> > @@ -1736,6 +1737,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >> >  	list_replace_init(&purge_vmap_area_list, &local_purge_list);
> >> >  	spin_unlock(&purge_vmap_area_lock);
> >> >
> >> > +	vb = container_of(va, struct vmap_block, va);
> >>
> >> This cannot work: vmap_area is not embedded in vmap_block. vmap_block::va
> >> is a pointer. vmap_area does not link back to vmap_block, so there is no
> >> way to find it based on a vmap_area.
> >
> > Oh, the code is buggy. va->flags can tell if it's vmap_block, then we
> > can deduce the vb pointer.
>
> No. It _CANNOT_ work whether you check the flags or not.
>
>     struct foo {
>         .....
>         struct bar bar;
>     };
>
> container_of(ptr_to_bar, struct foo, bar) returns the pointer to the
> struct foo which has struct bar embedded.
>
> But
>
>     struct foo {
>         .....
>         struct bar *bar;
>     };
>
> cannot do that because ptr_to_bar points to some object which is
> completely disconnected from struct foo.
>
> Care to look at the implementation of container_of()?
>
> Here is what it boils down to:
>
>     void *member_pointer = bar;
>
>     p = (struct foo *)(member_pointer - offsetof(struct foo, bar));
>
> So it uses the pointer to bar and subtracts the offset of bar in struct
> foo. This obviously can only work when struct bar is embedded in struct
> foo.
>
> Let's assume that *bar is the first member of foo, i.e. the offset of
> *bar in struct foo is 0:
>
>     p = (struct foo *)(member_pointer - 0);
>
> So you end up with
>
>     p == member_pointer == bar
>
> But you won't get there because the static_assert() in container_of()
> will catch that and the compiler will tell you in colourful ways.

Thanks a lot, I have learned it now. I had never noticed that container_of()
is not suitable for the case where the member is a pointer in the struct.

> Once the vmap area is handed over for cleaning up, the vmap block is gone,
> and even if you let it stay around the vmap area does not have any
> information where to find the block.
>
> You'd need to have a pointer to the vmap block in vmap area or embed
> vmap area into vmap block.

Got it now. Embedding vmap_area into vmap_block does not seem feasible,
because the va needs to be reused when it is inserted back into
free_vmap_area_root/list. Adding a pointer to the vmap_block looks doable.
Since a vm_map_ram area has no vm_struct associated with it, we can reuse
the space of '->vm' for a vb pointer, as below. Because the existing code
checks 'if (!va->vm)' in several places to tell whether a va is a normal
vmalloc area, we need to be careful to find all of those places and replace
them with a new, tighter check. I will post a draft code change once all of
that is done and has passed testing.

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..e2ba6d59d679 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -15,6 +15,7 @@
 struct vm_area_struct;		/* vma defining user mapping in mm_types.h */
 struct notifier_block;		/* in notifier.h */
 struct iov_iter;		/* in uio.h */
+struct vmap_block;		/* in mm/vmalloc.c */
 
 /* bits in flags of vmalloc's vm_struct below */
 #define VM_IOREMAP		0x00000001	/* ioremap() and friends */
@@ -76,6 +77,7 @@ struct vmap_area {
 	union {
 		unsigned long subtree_max_size; /* in "free" tree */
 		struct vm_struct *vm;           /* in "busy" tree */
+		struct vmap_block *vb;          /* in "busy and purge" tree */
 	};
 	unsigned long flags; /* mark type of vm_map_ram area */
 };
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c0f80982eb06..d97343271e27 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2061,6 +2061,10 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 		return ERR_PTR(err);
 	}
 
+	spin_lock(&vmap_area_lock);
+	va->vb = vb;
+	spin_unlock(&vmap_area_lock);
+
 	vbq = raw_cpu_ptr(&vmap_block_queue);
 	spin_lock(&vbq->lock);
 	list_add_tail_rcu(&vb->free_list, &vbq->free);
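
As a footnote to the container_of() explanation above, the offset arithmetic
can be made concrete with a minimal userspace sketch (plain C, reusing the
foo/bar names from the quoted text; my_container_of() is a hypothetical,
simplified stand-in for the kernel macro without its compile-time type
check):

/* Minimal userspace demo: container_of()-style arithmetic only recovers
 * the container when the member is embedded in it.  Everything here is
 * illustrative only; my_container_of() omits the kernel's static_assert(). */
#include <stddef.h>
#include <stdio.h>

struct bar { int x; };

struct foo_embedded {
	long pad;
	struct bar bar;		/* embedded member */
};

struct foo_pointer {
	long pad;
	struct bar *bar;	/* pointer member, pointee lives elsewhere */
};

#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
	struct foo_embedded fe = { .pad = 1, .bar = { .x = 42 } };
	struct bar separate = { .x = 7 };
	struct foo_pointer fp = { .pad = 2, .bar = &separate };

	/* Valid: &fe.bar really lies inside fe, so subtracting the member
	 * offset lands exactly on &fe. */
	struct foo_embedded *ok =
		my_container_of(&fe.bar, struct foo_embedded, bar);
	printf("embedded case: x = %d (expected 42)\n", ok->bar.x);

	/* Invalid: fp.bar points at 'separate', a distinct object, so the
	 * computed address is unrelated memory and must not be dereferenced.
	 * The kernel's container_of() rejects this at compile time because
	 * its static_assert() sees 'struct bar' vs. 'struct bar *'. */
	struct foo_pointer *bogus =
		my_container_of(fp.bar, struct foo_pointer, bar);
	(void)bogus;

	return 0;
}

Only the embedded layout makes the subtraction meaningful, which is why the
vmap_area either has to carry a back pointer to the vmap_block or be
embedded in it.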