Date: Tue, 04 Dec 2018 10:26:00 +0100
From: osalvador@suse.de
Subject: [RFC Get rid of shrink code - memory-hotplug]
Message-ID: <72455c1d4347d263cb73517187bc1394@suse.de>
To: mhocko@suse.com
Cc: david@redhat.com, dan.j.williams@gmail.com, pasha.tatashin@soleen.com, linux-mm@kvack.org

[Sorry, I forgot to cc linux-mm before]

Hi,

I wanted to bring up a topic that showed up during a discussion about
simplifying the shrink code [1].
During that discussion, Michal suggested that we might be able to get
rid of the shrink code altogether.

To put you on track: the shrink code was introduced by 815121d2b5cd
("memory_hotplug: clear zone when removing the memory") just to match
the work we do in __add_zone() and perform the reverse operation there.

It is a nice thing to have, as a) it keeps zone/node boundaries strict
and b) we stay consistent by doing the reverse of
move_pfn_range_to_zone().

But I think we can live without it:

1) Since c6f03e2903c9ecd8fd709a5b3fa8cf0a8ae0b3da ("mm, memory_hotplug:
   remove zone restrictions") we became more flexible, and we can now
   have ZONE_NORMAL and ZONE_MOVABLE interleaved during hotplug.
   So keeping a strict zone boundary does not really make sense
   anymore. In the same way, we can also have interleaved nodes.

2) From the point of view of a pfn walker, we should not care whether
   the removed section was the first one, the last one, or some section
   in between, as walkers are expected to skip non-valid pfns anyway
   (a minimal sketch of that walker pattern follows the first usage
   list below).

When the topic came up, I was a bit worried because I was not sure
whether anything out there trusts the node/zone boundaries blindly,
without checking anything.
So I started to dig in to find the users of:

- zone_start_pfn
- zone_end_pfn
- zone_intersects
- zone_spans_pfn
- node_start_pfn
- node_end_pfn

Below is a list of the places I found that use these fields and
helpers. For the sake of simplicity, I left out the places where they
are only used during boot time, as there is no danger there.

=== ZONE related ===

[Usages of zone_start_pfn / zone_end_pfn]

* split_huge_pages_set()
  - It uses pfn_valid()
* alloc_gigantic_page()
  - It uses pfn_range_valid_gigantic()->pfn_valid()
* pagetypeinfo_showblockcount_print()
  - It uses pfn_to_online_page()
* mark_free_pages()
  - It uses pfn_valid()
* __reset_isolation_suitable()
  - It uses pfn_to_online_page()
* reset_cached_positions()
* isolate_freepages_range()
* isolate_migratepages_range()
* isolate_migratepages()
  - They use pageblock_pfn_to_page().
    In case !zone->contiguous, we call
    __pageblock_pfn_to_page()->pfn_to_online_page().
    In case zone->contiguous, we just return pfn_to_page().
    So we only need to make sure that zone->contiguous holds the right
    value.
* create_mem_extents()
  - I am not sure about this one yet.
* count_highmem_pages() / count_data_pages() / copy_data_pages()
  - They use page_is_saveable()->pfn_valid()
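To make the pattern from 2) explicit, here is a minimal sketch of such
a walker (walk_zone_pfns() is a made-up name for illustration, not one
of the functions above); assuming the walker goes through
pfn_to_online_page() or pfn_valid(), a stale zone span only costs a
few extra loop iterations:

	/*
	 * Illustrative sketch only: iterate the zone span and let
	 * pfn_to_online_page() filter out holes and offlined sections.
	 * If the span is stale because we did not shrink it on
	 * hot-remove, the loop just covers a few more pfns; it never
	 * dereferences an invalid page.
	 */
	static void walk_zone_pfns(struct zone *zone)
	{
		unsigned long pfn;

		for (pfn = zone->zone_start_pfn; pfn < zone_end_pfn(zone); pfn++) {
			struct page *page = pfn_to_online_page(pfn);

			if (!page)
				continue;	/* hole or offline section */

			/* ... operate on the page ... */
		}
	}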
[Usages of zone_spans_pfn]

* move_freepages_block()
* set_pfnblock_flags_mask()
* page_outside_zone_boundaries()
  - I would say this is safe: if anything, without the shrink code the
    system can think that we span more than we actually do, not the
    other way around.

[Usages of zone_intersects]

* default_zone_for_pfn() / default_kernel_zone_for_pfn()
  - It should not be a problem.

=== NODE related ===

[Usages of node_start_pfn / node_end_pfn]

* vmemmap_find_next_valid_pfn()
  - I am not really sure whether this represents a problem.
* memtrace_alloc_node()
  - Should not be a problem, as we already support interleaved nodes.
* kmemleak_scan()
  - It is ok, but I think we should check that the pfn belongs to the
    node here?
* Crash core
  - VMCOREINFO_OFFSET(pglist_data, node_start_pfn): is this a problem?
* lookup_page_ext()
  - For !CONFIG_SPARSEMEM, node_start_pfn is used.
* kcore_ram_list()
  - Safe, as kclist_add_private() uses pfn_valid().

So overall, besides a couple of places I am not sure about, I would
tend to say this is doable.

Another thing worth remarking is that patchset [3] aims at not touching
pages during the hot-remove path, so we will have to find another way
to trigger clear/set_zone_contiguous() (see the sketch at the end of
this mail for why that matters), but that is another topic.

While it is true that the current shrink code can be simplified as
shown in [2], I think that getting rid of it would be a nice thing to
do, unless there is a reason to keep the code around.

I would like to hear other opinions, though.
Is it too risky? Is there anything I overlooked that might cause
trouble?

[1] https://patchwork.kernel.org/patch/10700791/
[2] https://patchwork.kernel.org/patch/10700791/
[3] https://patchwork.kernel.org/cover/10700783/

Thanks
Oscar Salvador
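P.S.: since zone->contiguous came up a couple of times above, this is
roughly the pageblock_pfn_to_page() logic I was referring to (a
condensed sketch, not the verbatim kernel code):

	/*
	 * Condensed sketch of pageblock_pfn_to_page(): with
	 * zone->contiguous set, the checks are skipped and the zone
	 * span is trusted blindly. Once the shrink code is gone, that
	 * is only safe if zone->contiguous is cleared on hot-remove,
	 * hence the need for a new trigger for
	 * clear/set_zone_contiguous().
	 */
	static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
						  unsigned long end_pfn,
						  struct zone *zone)
	{
		if (zone->contiguous)
			/* Fast path: no holes can exist in the zone */
			return pfn_to_page(start_pfn);

		/*
		 * Slow path: __pageblock_pfn_to_page() validates the
		 * pageblock via pfn_to_online_page() and checks that
		 * both ends belong to the same zone.
		 */
		return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
	}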