Date: Wed, 11 Jan 2023 14:24:43 +0200
From: Mike Rapoport
To: Michal Hocko
Cc: Jonathan Corbet, Andrew Morton, Bagas Sanjaya, David Hildenbrand,
 Johannes Weiner, Lorenzo Stoakes, "Matthew Wilcox (Oracle)", Mel Gorman,
 Vlastimil Babka, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH v2 2/2] docs/mm: Physical Memory: add structure, introduction and nodes description
References: <20230110152358.2641910-1-rppt@kernel.org>
 <20230110152358.2641910-3-rppt@kernel.org>

On Tue, Jan 10, 2023 at 05:54:10PM +0100, Michal Hocko wrote:
> On Tue 10-01-23 17:23:58, Mike Rapoport wrote:
> [...]
> > +* ``ZONE_DMA`` and ``ZONE_DMA32`` represent memory suitable for DMA by
> > +  peripheral devices that cannot access all of the addressable memory.
>
> I think it would be better to not keep the historical DMA based meaning
> and teach that to future developers. You can say something like
>
> ZONE_DMA and ZONE_DMA32 have historically been used for memory suitable
> for DMA. For many years there have been better, more robust interfaces to
> get memory with DMA specific requirements (Documentation/core-api/dma-api.rst).

But even today ZONE_DMA(32) means that the memory is suitable for DMA. This
is nicely encapsulated by the DMA APIs and there should be no new GFP_DMA
users, but memory outside ZONE_DMA is still not suitable for DMA.

> > +  Depending on the architecture, either of these zone types or even both
> > +  of them can be disabled at build time using ``CONFIG_ZONE_DMA`` and
> > +  ``CONFIG_ZONE_DMA32`` configuration options. Some 64-bit platforms may need
> > +  both zones as they support peripherals with different DMA addressing
> > +  limitations.
> > +
> > +* ``ZONE_NORMAL`` is for normal memory that can be accessed by the kernel all
> > +  the time. DMA operations can be performed on pages in this zone if the DMA
> > +  devices support transfers to all addressable memory. ``ZONE_NORMAL`` is
> > +  always enabled.
> > +
> > +* ``ZONE_HIGHMEM`` is the part of the physical memory that is not covered by a
> > +  permanent mapping in the kernel page tables. The memory in this zone is only
> > +  accessible to the kernel using temporary mappings. This zone is available
> > +  only on some 32-bit architectures and is enabled with ``CONFIG_HIGHMEM``.
> > +
> > +* ``ZONE_MOVABLE`` is for normal accessible memory, just like ``ZONE_NORMAL``.
> > +  The difference is that most pages in ``ZONE_MOVABLE`` are movable.
>
> This is really confusing because those pages are not really movable. You
> cannot move a page itself. I guess you meant to say something like
>
> The difference is that there are means to migrate memory via
> migrate_pages interface.
> A typical example would be memory mapped to
> userspace, for which the kernel can relocate the underlying memory content
> and update page tables so that userspace doesn't notice the physical data
> placement has changed.

I agree that this sentence is a bit confusing, but there's a clarification
below. Also, I'd like to keep this at a high level without going into the
details of how exactly the pages can be migrated.

> > That means
> > +  that while virtual addresses of these pages do not change, their content may
> > +  move between different physical pages. ``ZONE_MOVABLE`` is only enabled when
> > +  one of ``kernelcore``, ``movablecore`` and ``movable_node`` parameters is
> > +  present in the kernel command line. See :ref:`Page migration
> > +  ` for additional details.
>
> This is not really true. The movable zone can be also enabled by memory
> hotplug. In fact it is one of the more common usecases for the zone
> because memory hot remove largely depends on memory to be migrated for
> offlining to succeed in most cases.

Right. How about this version of ZONE_MOVABLE description:

* ``ZONE_MOVABLE`` is for normal accessible memory, just like ``ZONE_NORMAL``.
  The difference is that the contents of most pages in ``ZONE_MOVABLE`` are
  movable. That means that while virtual addresses of these pages do not
  change, their content may move between different physical pages. Often
  ``ZONE_MOVABLE`` is populated during memory hotplug, but it may also be
  populated at boot using one of the ``kernelcore``, ``movablecore`` and
  ``movable_node`` kernel command line parameters. See :ref:`Page migration `
  and :ref:`Memory Hot(Un)Plug <admin_guide_memory_hotplug>` for additional
  details.

> > +* ``ZONE_DEVICE`` represents memory residing on devices such as PMEM and GPU.
> > +  It has different characteristics than RAM zone types and it exists to provide
> > +  :ref:`struct page ` and memory map services for device driver
> > +  identified physical address ranges.
> > +  ``ZONE_DEVICE`` is enabled with
> > +  configuration option ``CONFIG_ZONE_DEVICE``.
> > +
> > +It is important to note that many kernel operations can only take place using
> > +``ZONE_NORMAL`` so it is the most performance critical zone. Zones are
> > +discussed further in Section :ref:`Zones `.
> > +
> > +The relation between node and zone extents is determined by the physical memory
> > +map reported by the firmware, architectural constraints for memory addressing
> > +and certain parameters in the kernel command line.
> > +
> > +For example, with 32-bit kernel on an x86 UMA machine with 2 Gbytes of RAM the
> > +entire memory will be on node 0 and there will be three zones: ``ZONE_DMA``,
> > +``ZONE_NORMAL`` and ``ZONE_HIGHMEM``::
> > +
> > +  0                                                            2G
> > +  +-------------------------------------------------------------+
> > +  |                           node 0                            |
> > +  +-------------------------------------------------------------+
> > +
> > +  0         16M                    896M                        2G
> > +  +----------+-----------------------+--------------------------+
> > +  | ZONE_DMA |      ZONE_NORMAL      |       ZONE_HIGHMEM       |
> > +  +----------+-----------------------+--------------------------+
> > +
> > +
> > +With a kernel built with ``ZONE_DMA`` disabled and ``ZONE_DMA32`` enabled and
> > +booted with ``movablecore=80%`` parameter on an arm64 machine with 16 Gbytes of
> > +RAM equally split between two nodes, there will be ``ZONE_DMA32``,
> > +``ZONE_NORMAL`` and ``ZONE_MOVABLE`` on node 0, and ``ZONE_NORMAL`` and
> > +``ZONE_MOVABLE`` on node 1::
> > +
> > +
> > +  1G                               9G                         17G
> > +  +--------------------------------+ +--------------------------+
> > +  |             node 0             | |          node 1          |
> > +  +--------------------------------+ +--------------------------+
> > +
> > +  1G       4G        4200M         9G           9320M         17G
> > +  +---------+----------+-----------+ +------------+-------------+
> > +  |  DMA32  |  NORMAL  |  MOVABLE  | |   NORMAL   |   MOVABLE   |
> > +  +---------+----------+-----------+ +------------+-------------+
>
> I think it is useful to note that nodes and zones can overlap in the
> physical address range. It is not uncommon to interleave two nodes and
> it is also possible that memory holes are memory hotplugged into MOVABLE
> zone arbitrarily in the physical address range.

Hmm, not sure I understand what you mean by "overlap".
For interleaved nodes, do you mean that node 0 may span, say, [0x0, 0x2000)
and [0x4000, 0x6000) and node 1 spans [0x2000, 0x4000) and [0x6000, 0x8000)?

And as for the MOVABLE zone, do you mean that it can appear between ranges
of the NORMAL zone?

> Other than that looks good to me and thanks for taking care of filling
> up these gaps! This is highly appreciated.

Thanks! I'd appreciate more inputs ;-)

> --
> Michal Hocko
> SUSE Labs

--
Sincerely yours,
Mike.
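
To make the interleaving question above concrete, here is a minimal
userspace sketch in C. It assumes one possible reading of "overlap": each
node owns disjoint physical ranges, but the span of a node (lowest start to
highest end) can still intersect the span of another node. The struct and
function names (mem_range, addr_to_nid, node_span, spans_overlap) are
hypothetical and are not kernel interfaces; the addresses are the ones used
in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only: this is not a kernel data structure. */
struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;	/* node that owns the range */
};

/* Interleaved layout from the question: node 0 and node 1 alternate. */
static const struct mem_range layout[] = {
	{ 0x0000, 0x2000, 0 },
	{ 0x2000, 0x4000, 1 },
	{ 0x4000, 0x6000, 0 },
	{ 0x6000, 0x8000, 1 },
};

#define NR_RANGES (sizeof(layout) / sizeof(layout[0]))

/* Sanity check: ranges are sorted and never overlap each other. */
static void check_layout(void)
{
	for (size_t i = 0; i + 1 < NR_RANGES; i++)
		assert(layout[i].end <= layout[i + 1].start);
}

/* Return the node owning addr, or -1 if addr is not in any range. */
static int addr_to_nid(uint64_t addr)
{
	for (size_t i = 0; i < NR_RANGES; i++)
		if (addr >= layout[i].start && addr < layout[i].end)
			return layout[i].nid;
	return -1;
}

/* Compute the span [*lo, *hi) of a node; return 0 if it has no memory. */
static int node_span(int nid, uint64_t *lo, uint64_t *hi)
{
	int found = 0;

	for (size_t i = 0; i < NR_RANGES; i++) {
		if (layout[i].nid != nid)
			continue;
		if (!found || layout[i].start < *lo)
			*lo = layout[i].start;
		if (!found || layout[i].end > *hi)
			*hi = layout[i].end;
		found = 1;
	}
	return found;
}

/* Return 1 if the spans of nodes a and b intersect, 0 otherwise. */
static int spans_overlap(int a, int b)
{
	uint64_t lo_a, hi_a, lo_b, hi_b;

	if (!node_span(a, &lo_a, &hi_a) || !node_span(b, &lo_b, &hi_b))
		return 0;
	return lo_a < hi_b && lo_b < hi_a;
}
```

With this layout every address belongs to exactly one node, yet node 0
spans [0x0, 0x6000) and node 1 spans [0x2000, 0x8000), so the two spans
intersect. Whether that is the sense of "overlap" intended in the review
comment is exactly what the question asks.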