From: David Hildenbrand <david@redhat.com>
To: Uladzislau Rezki
Cc: LKML, Ping Fang, Andrew Morton, Roman Gushchin, Michal Hocko,
 Oscar Salvador, Linux Memory Management List
Subject: Re: [PATCH v1] mm/vmalloc: fix exact allocations with an alignment > 1
Date: Fri, 17 Sep 2021 10:47:54 +0200
Message-ID: <221e38c1-4b8a-8608-455a-6bde544adaf0@redhat.com>
In-Reply-To: <20210916193403.GA1940@pc638.lan>
References: <20210908132727.16165-1-david@redhat.com>
 <20210916193403.GA1940@pc638.lan>

>>> This patch looks KASAN-specific to me. So I would like to avoid
>>> such changes to the vmalloc internals in order to simplify further
>>> maintenance and keep it generic instead.
>>
>> There is nothing KASAN-specific in there :) It's specific to exact
>> allocations -- and KASAN may be one of the only users for now.
>>
> Well, you can name it either way :) It should not be specific by
> design; otherwise, as I mentioned, the allocator would become a piece
> of code that handles corner cases, which is actually not acceptable.

Let's not overstress the situation of adding essentially 3 LOC in order
to fix a sane use case of the vmalloc allocator that was not considered
properly by internal changes due to 68ad4a330433 ("mm/vmalloc.c: keep
track of free blocks for vmap allocation").

>
>>>
>>> Currently, find_vmap_lowest_match() adjusts the search size
>>> criteria for any alignment value, even for PAGE_SIZE alignment.
>>> That is not optimal, because PAGE_SIZE, i.e. one page, is the
>>> minimum granularity we operate on for vmalloc allocations; thus all
>>> free blocks are page aligned.
>>>
>>> All that means that we should adjust the search length only if an
>>> alignment request is bigger than one page. In the one-page case,
>>> which corresponds to the PAGE_SIZE value, there is no reason for
>>> such an extra adjustment, because all free->va_start lie on a page
>>> boundary anyway.
>>>
>>> Could you please test the patch below, which is a generic
>>> improvement?
>>
>> I played with the exact approach below (almost exactly the same code
>> in find_vmap_lowest_match()), and it might make sense as a general
>> improvement -- if we can guarantee that start+end of ranges are
>> always PAGE-aligned; I was not able to come to that conclusion...
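The patch itself is snipped from this reply. In essence, the
improvement described above widens the search length by the alignment
overhead only when the alignment exceeds the page granularity; a
minimal sketch of that idea, assuming the current code always computes
the search length as size + align - 1:

	/*
	 * Sketch only, reconstructed from the description above -- not
	 * the patch from this thread. All free->va_start are page
	 * aligned, so the alignment overhead is only needed in the
	 * search length when align > PAGE_SIZE.
	 */
	length = (align > PAGE_SIZE) ? size + align - 1 : size;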
> All free blocks are PAGE aligned, that is how it has to be. A vstart
> also should be aligned, otherwise __vunmap() will complain about
> freeing a bad address:
>
> if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
>                 addr))
>         return;
>
> BTW, we should add an extra check to alloc_vmap_area(), something
> like below:
>
> if (!PAGE_ALIGNED(ALIGN(vstart, align))) {
>         WARN_ONCE(1, "vmalloc: vstart or align are not page aligned (0x%lx, 0x%lx)\n",
>                         vstart, align);
>         return ERR_PTR(-EBUSY);
> }
>
> to check that the passed pair of vstart and align is correct.

Yes, we'd better.

>> vmap_init_free_space() shows me that our existing alignment
>> code/checks might be quite fragile.
>>
> It is not important to page align the first address. As I mentioned
> before, vstart and align have to be correct, and we should add such a
> check.

>> But I mainly decided to go with my patch instead because it will
>> also work for exact allocations with align > PAGE_SIZE: most
>> notably, when we try population of huge pages instead of base pages
>> in __vmalloc_node_range(), by increasing the alignment. If the exact
>> range allows for using huge pages, placing huge pages will work just
>> fine with my modifications, while it won't with your approach.
>>
>> Long story short: my approach handles exact allocations also for
>> bigger alignments; your optimization makes sense as a general
>> improvement for any vmalloc allocation.
>>
>> Thanks for having a look!
>>
> The problem is that there are no users (seems only KASAN) who set the
> range so that it corresponds to the exact size. If you add an
> alignment overhead on top of

So there is one user, and it was broken by 68ad4a330433 ("mm/vmalloc.c:
keep track of free blocks for vmap allocation").

> it, it means that the search size should include it, because we may
> end up with an exactly fitting free block, and after applying the
> alignment it will not fit. This is how allocators handle alignment.

This is an implementation detail of the vmalloc allocator ever since
68ad4a330433 ("mm/vmalloc.c: keep track of free blocks for vmap
allocation").

If callers pass an exact range, and the alignment they specify applies,
why should we reject such an allocation? That's leaking an
implementation detail, which can easily be fixed internally, into
callers.
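To make that failure mode concrete, consider a hypothetical exact
allocation (made-up numbers, and assuming the internal search length
has been size + align - 1 ever since that commit):

	unsigned long align  = PAGE_SIZE;
	unsigned long size   = 4 * PAGE_SIZE;
	unsigned long vstart = 0xffffc90000000000UL; /* page aligned */
	unsigned long vend   = vstart + size;  /* exact range, no slack */

	unsigned long length = size + align - 1; /* internal search length */

	/*
	 * length > vend - vstart, so no free block within [vstart, vend)
	 * can satisfy the search, although ALIGN(vstart, align) == vstart
	 * and the request would fit the exact range perfectly.
	 */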
>
> Another approach is (you mentioned it in your commit message):
>
> urezki@pc638:~/data/raid0/coding/linux-next.git$ git diff mm/kasan/shadow.c
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 082ee5b6d9a1..01d3bd76c851 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -200,7 +200,7 @@ static int __meminit kasan_mem_notifier(struct notifier_block *nb,
>                 if (shadow_mapped(shadow_start))
>                         return NOTIFY_OK;
>
> -               ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start,
> +               ret = __vmalloc_node_range(shadow_size, 1, shadow_start,
>                                         shadow_end, GFP_KERNEL,
>                                         PAGE_KERNEL, VM_NO_GUARD,
>                                         pfn_to_nid(mem_data->start_pfn),
> urezki@pc638:~/data/raid0/coding/linux-next.git$
>
> why not? Also, you can increase the range in the KASAN code.

No, that's leaking implementation details to the caller. And no,
increasing the range and eventually allocating something bigger (e.g.,
placing a huge page where it might not have been possible) is not
acceptable for KASAN.

If you're terribly unhappy with this patch, please suggest something
reasonable to fix exact allocations that:

a) Fixes the KASAN use case.
b) Allows for automatic placement of huge pages for exact allocations.
c) Doesn't leak implementation details into the caller.

Thanks

-- 
Thanks,
David / dhildenb