From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
Subject: Re: [RFC PATCH 0/6] decrease unnecessary gap due to pmem kmem alignment
Date: Wed, 29 Jul 2020 08:36:36 +0200
In-Reply-To: <20200729033424.2629-1-justin.he@arm.com>
References: <20200729033424.2629-1-justin.he@arm.com>
To: Jia He
Cc: Dan Williams, Vishal Verma, Mike Rapoport, David Hildenbrand, Catalin Marinas,
 Will Deacon, Greg Kroah-Hartman, "Rafael J. Wysocki", Dave Jiang, Andrew Morton,
 Steve Capper, Mark Rutland, Logan Gunthorpe, Anshuman Khandual, Hsin-Yi Wang,
 Jason Gunthorpe, Dave Hansen, Kees Cook, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org,
 Wei Yang, Pankaj Gupta, Ira Weiny, Kaly Xin
Content-Type: text/plain; charset=utf-8

> On 29.07.2020 at 05:35, Jia He wrote:
>
> When enabling dax pmem as a RAM device on arm64, I noticed that the kmem_start
> addr in dev_dax_kmem_probe() has to be aligned with SECTION_SIZE_BITS (30), i.e.
> the 1G memblock size. Even though Dan Williams' sub-section patch series [1] has
> been merged upstream, it does not help here because of the hard limitation on
> kmem_start:
> $ ndctl create-namespace -e namespace0.0 --mode=devdax --map=dev -s 2g -f -a 2M
> $ echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> $ echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> $ cat /proc/iomem
> ...
> 23c000000-23fffffff : System RAM
>   23dd40000-23fecffff : reserved
>   23fed0000-23fffffff : reserved
> 240000000-33fdfffff : Persistent Memory
>   240000000-2403fffff : namespace0.0
>   280000000-2bfffffff : dax0.0  <- aligned with the 1G boundary
>     280000000-2bfffffff : System RAM
> Hence there is a big gap between 0x2403fffff and 0x280000000 due to the 1G
> alignment.
>
> Without this series, if qemu creates a 4G nvdimm device, we can only use 2G
> of it for dax pmem (kmem) in the worst case, e.g.
> 240000000-33fdfffff : Persistent Memory
> We can only use the memblock between [240000000, 2ffffffff] due to the hard
> limitation.
> It wastes too much memory space.
>
> Decreasing SECTION_SIZE_BITS on arm64 might be an alternative, but there are
> too many concerns from other constraints, e.g. PAGE_SIZE, hugetlb,
> SPARSEMEM_VMEMMAP, page bits in struct page ...
>
> Besides decreasing SECTION_SIZE_BITS, we can also relax the requirement that
> kmem be aligned with memory_block_size_bytes().
>
> Tested on arm64 and x86 guests with a 4G pmem device created by qemu: dax pmem
> can be used as RAM with a smaller gap. kmem hotplug add/remove were also
> tested on both arm64 and x86 guests.
>

Hi,

I am not convinced this use case is worth such hacks (that's what they are)
for now. On real machines pmem is big - your example (losing 50%) is extreme.

I would much rather want to see the section size on arm64 reduced. I remember
there were patches, and that at least with a base page size of 4k it can be
reduced drastically (64k base pages are more problematic due to the ridiculous
THP size of 512M). But it could be that a section size of 512M is possible on
all configs right now.

In the long term we might want to rework the memory block device model
(eventually supporting old/new as discussed with Michal some time ago, using a
kernel parameter), dropping the fixed sizes
- allowing sizes / addresses aligned with the subsection size
- drastically reducing the number of devices for boot memory to only a handful
  (e.g., one per resource / DIMM we can actually unplug again).

Long story short, I don't like this hack.

> This patch series (mainly patch 6/6) is based on the fixing patch, ~v5.8-rc5 [2].
>
> [1] https://lkml.org/lkml/2019/6/19/67
> [2] https://lkml.org/lkml/2020/7/8/1546
>
> Jia He (6):
>   mm/memory_hotplug: remove redundant memory block size alignment check
>   resource: export find_next_iomem_res() helper
>   mm/memory_hotplug: allow pmem kmem not to align with memory_block_size
>   mm/page_alloc: adjust the start,end in dax pmem kmem case
>   device-dax: relax the memblock size alignment for kmem_start
>   arm64: fall back to vmemmap_populate_basepages if not aligned with
>     PMD_SIZE
>
>  arch/arm64/mm/mmu.c    |  4 ++++
>  drivers/base/memory.c  | 24 ++++++++++++++++--------
>  drivers/dax/kmem.c     | 22 +++++++++++++---------
>  include/linux/ioport.h |  3 +++
>  kernel/resource.c      |  3 ++-
>  mm/memory_hotplug.c    | 39 ++++++++++++++++++++++++++++++++++++++-
>  mm/page_alloc.c        | 14 ++++++++++++++
>  7 files changed, 90 insertions(+), 19 deletions(-)
>
> --
> 2.17.1
>
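To make the numbers above concrete, here is a standalone C sketch (an editor's
illustration, not code from the series; the ALIGN_UP macro is a stand-in for
the kernel's helper and the addresses are taken from the /proc/iomem dump
quoted above). It computes where the hotplugged kmem range would start with
the current 1G section alignment versus a 2M subsection alignment, and how
much of the namespace is lost to the gap.

#include <stdio.h>
#include <stdint.h>

/* Round x up to the next multiple of the power-of-two alignment a. */
#define ALIGN_UP(x, a) (((x) + (uint64_t)(a) - 1) & ~((uint64_t)(a) - 1))

int main(void)
{
	uint64_t usable_start = 0x240400000ULL; /* first byte after namespace0.0 */
	uint64_t section_size = 1ULL << 30;     /* SECTION_SIZE_BITS = 30 -> 1G  */
	uint64_t subsection   = 2ULL << 20;     /* 2M subsection granularity     */

	uint64_t start_1g = ALIGN_UP(usable_start, section_size);
	uint64_t start_2m = ALIGN_UP(usable_start, subsection);

	printf("1G-aligned kmem start: %#llx, gap: %llu MiB\n",
	       (unsigned long long)start_1g,
	       (unsigned long long)((start_1g - usable_start) >> 20));
	printf("2M-aligned kmem start: %#llx, gap: %llu MiB\n",
	       (unsigned long long)start_2m,
	       (unsigned long long)((start_2m - usable_start) >> 20));
	return 0;
}

With the quoted layout this prints a 1G-aligned start of 0x280000000 (a gap of
1020 MiB) versus a 2M-aligned start of 0x240400000 (no gap), which is the
space the series is trying to recover.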
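On the section-size point in the reply, a rough standalone sketch of why 64k
base pages are the problematic case (assuming 8-byte page-table entries, so
one PMD maps PAGE_SIZE/8 pages; this mirrors arm64's layout rather than being
taken from kernel headers). Since a pageblock/THP cannot be larger than a
section, the 512M PMD size with 64k pages is effectively the floor for any
reduced section size, while 4k pages leave far more room.

#include <stdio.h>
#include <stdint.h>

/* One PMD maps 2^(page_shift - 3) base pages with 8-byte entries. */
static uint64_t pmd_size(unsigned int page_shift)
{
	unsigned int pmd_shift = page_shift + (page_shift - 3);
	return 1ULL << pmd_shift;
}

int main(void)
{
	printf("4k base pages:  PMD/THP size = %4llu MiB\n",
	       (unsigned long long)(pmd_size(12) >> 20)); /* 2 MiB   */
	printf("64k base pages: PMD/THP size = %4llu MiB\n",
	       (unsigned long long)(pmd_size(16) >> 20)); /* 512 MiB */
	return 0;
}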