From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=tzVr=HD=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-5.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 40856C433E0
	for <linux-mm@archiver.kernel.org>; Mon,  1 Feb 2021 20:00:13 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 9FAE664EA2
	for <linux-mm@archiver.kernel.org>; Mon,  1 Feb 2021 20:00:12 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9FAE664EA2
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=soleen.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 049B46B0074; Mon,  1 Feb 2021 15:00:12 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 021536B0075; Mon,  1 Feb 2021 15:00:11 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id EA0526B007D; Mon,  1 Feb 2021 15:00:11 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0236.hostedemail.com [216.40.44.236])
	by kanga.kvack.org (Postfix) with ESMTP id D305C6B0074
	for <linux-mm@kvack.org>; Mon,  1 Feb 2021 15:00:11 -0500 (EST)
Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay05.hostedemail.com (Postfix) with ESMTP id 9D9BA181AC9C6
	for <linux-mm@kvack.org>; Mon,  1 Feb 2021 20:00:11 +0000 (UTC)
X-FDA: 77770765422.10.seat16_32158e1275c4
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin10.hostedemail.com (Postfix) with ESMTP id 75C7416A0B9
	for <linux-mm@kvack.org>; Mon,  1 Feb 2021 20:00:11 +0000 (UTC)
X-HE-Tag: seat16_32158e1275c4
X-Filterd-Recvd-Size: 7639
Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41])
	by imf23.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Mon,  1 Feb 2021 20:00:10 +0000 (UTC)
Received: by mail-ej1-f41.google.com with SMTP id p20so6966325ejb.6
        for <linux-mm@kvack.org>; Mon, 01 Feb 2021 12:00:10 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=soleen.com; s=google;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=lQZW1SaFjZh0MEmlR6482SykUFNPeKKF6EWw3EBP+ns=;
        b=e3CD2TSS+KFsGyXpJNM0Kv1gIGyh1JkUR8Zx2Z7dLDvc4cYhNziZyNYYIQeLEs5B7C
         lHA7cwxfZ6IiaQbN8XOiFzRjlnwlzlpZIwABdf3zK4Mo1QmZuRX9FgXmU7XQeyp7sWMW
         WXuFFWGooyRJf2qibd1clRGkCKunnpYaAIEWU6RrgyOY49WU1OVn6HZtj8S8S+PwRIR7
         G3yjsbGahAIXRQDe65MrtlOIejn2NvYyYP8TzgOvRd31AxBqbXJdpEGtCuD98Yk8XJxo
         pB7ywaFJYo2o1ZRUyjhrHiliee2Zs9lyvgfmVsBRCeI2wMOjzRzvJHP/rwC6T3qM7qvB
         BdbA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=lQZW1SaFjZh0MEmlR6482SykUFNPeKKF6EWw3EBP+ns=;
        b=qYesByKxwxzSSip4B0nJ6HH+xcnclcqaITlLcev5eviCFe1oiV2Gnctz9pq5wA8Zk5
         t0LjUT+9DYaCr3RweDlNSsZX50LljNu1CkG7eWbTcYtrtj6CuAsLfCFw4Fg0QPlfKKYF
         6N/iLw3D3Al7WK5dM2kDb3lIfFLeHZfFzG7QQjQ6k9UJ0fvkONL0PBKVauDGNR93Zs7f
         OeIvSaNcIycmIp8iJe9J9tRnPN4pRoxRQQDTp/jByZ78QKoDPn4+sUYqmBs0jmasDeBO
         SqIAvt+wPMB4f73xiDf4e8njm+sbfv6iUEDBhOM4pc4cABesdEc9lFruW9YEblR1w3my
         Hx8Q==
X-Gm-Message-State: AOAM531drg7rhFFGnNjnzkRMkwEQpGfQbxsbo3VUsQ2ZFIeLP0FhO33p
	PD4j9ueoYENetIlGncAqcoTww5JBerVnIM5vcwxHSQ==
X-Google-Smtp-Source: ABdhPJwsjeYkl+h33w7dOEAgBFIgJ7vAcJe9HM7j751I3zTbupPayPX3GEglcn/eoEfe1Ko6r+yvfynIlbnaQqNSSNE=
X-Received: by 2002:a17:906:eddd:: with SMTP id sb29mr10428190ejb.383.1612209608749;
 Mon, 01 Feb 2021 12:00:08 -0800 (PST)
MIME-Version: 1.0
References: <20210127172706.617195-1-pasha.tatashin@soleen.com> <a8a72826-0a27-de9b-bfb7-be8286cd61fe@arm.com>
In-Reply-To: <a8a72826-0a27-de9b-bfb7-be8286cd61fe@arm.com>
From: Pavel Tatashin <pasha.tatashin@soleen.com>
Date: Mon, 1 Feb 2021 14:59:32 -0500
Message-ID: <CA+CK2bBSJaL9Hn_LBy78ccaCt7=r9cSaEqUVemRVmKg6cwpLnQ@mail.gmail.com>
Subject: Re: [PATCH v11 0/6] arm64: MMU enabled kexec relocation
To: James Morse <james.morse@arm.com>
Cc: James Morris <jmorris@namei.org>, Sasha Levin <sashal@kernel.org>, 
	"Eric W. Biederman" <ebiederm@xmission.com>, kexec mailing list <kexec@lists.infradead.org>, 
	LKML <linux-kernel@vger.kernel.org>, Jonathan Corbet <corbet@lwn.net>, 
	Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, 
	Linux ARM <linux-arm-kernel@lists.infradead.org>, Marc Zyngier <maz@kernel.org>, 
	Vladimir Murzin <vladimir.murzin@arm.com>, Matthias Brugger <matthias.bgg@gmail.com>, 
	linux-mm <linux-mm@kvack.org>, Mark Rutland <mark.rutland@arm.com>, steve.capper@arm.com, 
	rfontana@redhat.com, Thomas Gleixner <tglx@linutronix.de>, Selin Dag <selindag@gmail.com>, 
	Tyler Hicks <tyhicks@linux.microsoft.com>
Content-Type: text/plain; charset="UTF-8"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

Hi James,

> The problem I see with this is rewriting the relocation code. It needs to work whether the
> machine has enough memory to enable the MMU during kexec, or not.
>
> In off-list mail to Pavel I proposed an alternative implementation here:
> https://gitlab.arm.com/linux-arm/linux-jm/-/tree/kexec+mmu/v0
>
> By using a copy of the linear map, and passing the phys_to_virt offset into
> arm64_relocate_new_kernel() its possible to use the same code when we fail to allocate the
> page tables, and run with the MMU off as it does today.
> I'm convinced someone will crawl out of the woodwork screaming 'regression' if we
> substantially increase the amount of memory needed to kexec at all.
>
> From that discussion: this didn't meet Pavel's timing needs.
> If you depend on having all the src/dst pages lined up in a single line, it sounds like
> you've over-tuned this to depend on the CPU's streaming mode. What causes the CPU to
> start/stop that stuff is very implementation specific (and firmware configurable).
> I don't think we should let this rule out systems that can kexec today, but don't have
> enough extra memory for the page tables.
> Having two copies of the relocation code is obviously a bad idea.

I understand that an extra set of page tables could potentially
waste memory, especially if VAs are sparse, but in this case we use
page tables exclusively for contiguous VA space (copy [src, src +
size]). Therefore, the extra memory usage is tiny. The ratio for
kernels with 4K page size is (size of relocated memory) / 512. A
normal initrd + kernel is usually under 64M, which means ~128K of
extra space for the page tables. Even with a huge relocation, where
the initrd is ~512M, the extra memory usage in the worst case is just
~1M. I really doubt users will have any problem with such a small
overhead in comparison to the total kexec-load size.
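
To make the arithmetic concrete, here is a rough sketch (not kernel
code). It assumes 4K pages, 8-byte page table entries, four
translation levels, and a contiguous range that starts on a table
boundary; the function and constant names are made up for
illustration.

```python
PAGE_SIZE = 4096                 # assumed 4K base pages
PTE_SIZE = 8                     # assumed 8-byte page table entries
ENTRIES = PAGE_SIZE // PTE_SIZE  # 512 entries per table page


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def page_table_overhead(reloc_bytes, levels=4):
    """Bytes of page tables needed to map one contiguous range.

    Hypothetical helper: counts one table page per 512 entries at each
    level, assuming the range is aligned so no table is split.
    """
    entries = ceil_div(reloc_bytes, PAGE_SIZE)  # leaf entries, one per page
    total = 0
    for _ in range(levels):
        tables = ceil_div(entries, ENTRIES)     # table pages at this level
        total += tables * PAGE_SIZE
        entries = tables                        # next level maps these tables
    return total


for size_mb in (64, 512):
    overhead = page_table_overhead(size_mb << 20)
    print(f"{size_mb}M relocated -> {overhead // 1024}K of page tables")
```

The leaf level dominates: 64M needs 32 leaf tables (128K) and 512M
needs 256 (1M). Under these assumptions the upper levels add only one
page each.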

>
>
> (as before: ) Instead of trying to make the relocations run quickly, can we reduce them?
> This would benefit other architectures too.

This was exactly my first approach [1], where I tried to pre-reserve
memory similar to how it is done for a crash kernel, but I was asked
to go away [2] because this is an ARM64-specific problem: the current
relocation performance there is prohibitively slow. I have tested on
x86, and it does not suffer from this problem; relocation performance
is just as fast as on ARM64 with the MMU enabled.

>
> Can the kexec core code allocate higher order pages, instead of doing everything page at
> at time?

Yes. However, kexec-load failing because huge pages could not be
coalesced would add extra hassle for users, so this should only be
an optimization with a fallback to base pages.

>
> If you have a crash kernel reservation, can we use that to eliminate the relocations
> completely?
> (I think this suggestion has been lost in translation each time I make it.
> I mean like this:
> https://gitlab.arm.com/linux-arm/linux-jm/-/tree/kexec/kexec_in_crashk/v0
> Runes to test it:
> | sudo ./kexec -p -u
> | sudo cat /proc/iomem | grep Crash
> |  b0200000-f01fffff : Crash kernel
> | sudo ./kexec --mem-min=0xb0200000 --mem-max=0xf01ffffff -l ~/Image --reuse-cmdline
>
> I bet its even faster!)

There is a problem with this approach. While the kexec_load() call
makes it possible to specify a physical destination for each segment,
kexec_file_load() does not. Secure systems that do IMA checks during
kexec load require kexec_file_load(), and we cannot specify the
destinations for these segments ahead of time (at least not without
substantially changing common kexec code, which is not going to
happen as this is an arm64-specific problem).
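
For illustration only, the difference shows up in the syscall
interfaces themselves. The sketch below mirrors struct kexec_segment
as described in the kexec_load(2) man page using ctypes; the class
name and the memsz value are made up, and the address is just the
start of the crash kernel region from your example.

```python
import ctypes


class KexecSegment(ctypes.Structure):
    """Python mirror of struct kexec_segment, as passed to kexec_load()."""
    _fields_ = [
        ("buf", ctypes.c_void_p),    # source buffer in user space
        ("bufsz", ctypes.c_size_t),  # size of the source buffer
        ("mem", ctypes.c_void_p),    # physical destination, chosen by the caller
        ("memsz", ctypes.c_size_t),  # size of the destination region
    ]


# With kexec_load() the caller fills in "mem" for every segment.
# The memsz value here is a hypothetical placeholder.
seg = KexecSegment(mem=0xB0200000, memsz=0x200000)
print(hex(seg.mem))

# kexec_file_load(kernel_fd, initrd_fd, cmdline_len, cmdline, flags) takes
# file descriptors instead; it has no per-segment "mem" argument, so the
# kernel decides where each segment lands.
```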

>
>
> I think 'as fast as possible' and 'memory constrained' are mutually exclusive
> requirements. We need to make the page tables optional with a single implementation.

In my opinion, having two different types of relocation will only add
extra corner cases, confusion about differing performance, and bugs.
It is better to have just two types: 1. crash kernel type without
relocation, 2. fast relocation where the MMU is enabled.

[1] https://lore.kernel.org/lkml/20190709182014.16052-1-pasha.tatashin@soleen.com
[2] https://lore.kernel.org/lkml/20190710065953.GA4744@localhost.localdomain/

Thank you,
Pasha