linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Mike Rapoport <rppt@kernel.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"song@kernel.org" <song@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"pmladek@suse.com" <pmladek@suse.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"dborkman@redhat.com" <dborkman@redhat.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"bp@alien8.de" <bp@alien8.de>, "mbenes@suse.cz" <mbenes@suse.cz>,
	"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
Date: Thu, 21 Apr 2022 07:29:23 +0000	[thread overview]
Message-ID: <B070EF43-F4E0-42E4-A0AF-D7FAA99D5C82@fb.com> (raw)
In-Reply-To: <3F75142B-3E87-4195-A026-3A7F1E595960@fb.com>


[-- Attachment #1.1: Type: text/plain, Size: 1418 bytes --]



> On Apr 20, 2022, at 7:42 AM, Song Liu <songliubraving@fb.com> wrote:
>
> Hi Linus,
>
>> On Apr 19, 2022, at 7:18 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> On Tue, Apr 19, 2022 at 7:03 PM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>>
>>> Here is the quote from Song's cover letter for bpf_prog_pack series:
>>
>> I care about performance as much as the next person, but I care about
>> correctness too.
>>
>> That large-page code was a disaster, and was buggy and broken.
>>
>> And even with those four patches, it's still broken.
>>
>> End result: there's no way that thigh gets re-enabled without the
>> correctness being in place.
>>
>> At a minimum, to re-enable it, it needs (a) that zeroing and (b)
>> actual numbers on real loads. (not some artificial benchmark).
>>
>> Because without (a) there's no way in hell I'll enable it.
>>
>> And without (b), "performance" isn't actually an argument.
>
> I will send patch to do (a) later this week.
>
> For (b), we have seen direct map fragmentation causing visible
> performance drop for our major services. This is the shadow
> production benchmark, so it is not possible to run it out of
> our data centers. Tracing showed that BPF program was the top
> trigger of these direct map splits.

Attached is the patch for (a). I also sent it to the mail list.

Thanks,
Song






[-- Attachment #1.2: Type: text/html, Size: 2208 bytes --]

[-- Attachment #2: 0001-bpf-invalidate-unused-part-of-bpf_prog_pack.patch --]
[-- Type: application/octet-stream, Size: 4638 bytes --]

From e1c1a094ffc1b82f1d7636f33d990a43cf6a7ff8 Mon Sep 17 00:00:00 2001
From: Song Liu <song@kernel.org>
Date: Wed, 20 Apr 2022 23:58:28 -0700
Subject: [PATCH bpf] bpf: invalidate unused part of bpf_prog_pack

bpf_prog_pack enables sharing huge pages among multiple BPF programs.
These pages are marked as executable, but some part of these huge page
may not contain proper BPF programs. To make these pages safe, fill such
unused part of these pages with invalid instructions. This is done when a
pack is first allocated, and when a bpf program is freed..

Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator")
Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]")
Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c | 22 ++++++++++++++++++++++
 include/linux/bpf.h         |  2 ++
 kernel/bpf/core.c           | 20 ++++++++++++++++----
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8fe35ed11fd6..57099d89f2bf 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -228,6 +228,28 @@ static void jit_fill_hole(void *area, unsigned int size)
 	memset(area, 0xcc, size);
 }
 
+#define INVALID_BUF_SIZE PAGE_SIZE
+static char invalid_insn_buf[INVALID_BUF_SIZE];
+
+static int __init bpf_init_invalid_insn_buf(void)
+{
+	jit_fill_hole(invalid_insn_buf, INVALID_BUF_SIZE);
+	return 0;
+}
+pure_initcall(bpf_init_invalid_insn_buf);
+
+void bpf_arch_invalidate_text(void *dst, size_t len)
+{
+	size_t i = 0;
+
+	while (i < len) {
+		size_t s = min_t(size_t, len - i, INVALID_BUF_SIZE);
+
+		bpf_arch_text_copy(dst + i, invalid_insn_buf, s);
+		i += s;
+	}
+}
+
 struct jit_context {
 	int cleanup_addr; /* Epilogue code offset */
 
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..2157392a94d1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2382,6 +2382,8 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 
 void *bpf_arch_text_copy(void *dst, void *src, size_t len);
 
+void bpf_arch_invalidate_text(void *dst, size_t len);
+
 struct btf_id_set;
 bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b2a634d0f842..3f8bdbc0e994 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -873,7 +873,7 @@ static size_t select_bpf_prog_pack_size(void)
 	return size;
 }
 
-static struct bpf_prog_pack *alloc_new_pack(void)
+static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
 	struct bpf_prog_pack *pack;
 
@@ -886,6 +886,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 		kfree(pack);
 		return NULL;
 	}
+	bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size);
 	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
 	list_add_tail(&pack->list, &pack_list);
 
@@ -894,7 +895,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 	return pack;
 }
 
-static void *bpf_prog_pack_alloc(u32 size)
+static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns)
 {
 	unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size);
 	struct bpf_prog_pack *pack;
@@ -922,7 +923,7 @@ static void *bpf_prog_pack_alloc(u32 size)
 			goto found_free_area;
 	}
 
-	pack = alloc_new_pack();
+	pack = alloc_new_pack(bpf_fill_ill_insns);
 	if (!pack)
 		goto out;
 
@@ -965,6 +966,7 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 	nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size);
 	pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT;
 
+	bpf_arch_invalidate_text(hdr, hdr->size);
 	bitmap_clear(pack->bitmap, pos, nbits);
 	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
 				       bpf_prog_chunk_count(), 0) == 0) {
@@ -1103,7 +1105,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 
 	if (bpf_jit_charge_modmem(size))
 		return NULL;
-	ro_header = bpf_prog_pack_alloc(size);
+	ro_header = bpf_prog_pack_alloc(size, bpf_fill_ill_insns);
 	if (!ro_header) {
 		bpf_jit_uncharge_modmem(size);
 		return NULL;
@@ -1204,6 +1206,16 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
 	bpf_prog_unlock_free(fp);
 }
 
+void __weak bpf_arch_invalidate_text(void *dst, size_t len)
+{
+	char buf[1] = {};
+	int i;
+
+	WARN_ONCE(1, "Please override bpf_arch_invalidate_text for bpf_prog_pack\n");
+	for (i = 0; i < len; i++)
+		bpf_arch_text_copy(dst + i, buf, 1);
+}
+
 int bpf_jit_get_func_addr(const struct bpf_prog *prog,
 			  const struct bpf_insn *insn, bool extra_pass,
 			  u64 *func_addr, bool *func_addr_fixed)
-- 
2.30.2


  parent reply	other threads:[~2022-04-21  7:30 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-15 16:44 Song Liu
2022-04-15 16:44 ` [PATCH v4 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-15 17:43   ` Rik van Riel
2022-04-15 16:44 ` [PATCH v4 bpf 2/4] page_alloc: use vmalloc_huge for large system hash Song Liu
2022-04-15 17:43   ` Rik van Riel
2022-04-25  7:07     ` Geert Uytterhoeven
2022-04-25  8:17       ` Linus Torvalds
2022-04-25  8:24         ` Geert Uytterhoeven
2022-04-15 16:44 ` [PATCH v4 bpf 3/4] module: introduce module_alloc_huge Song Liu
2022-04-15 18:06   ` Rik van Riel
2022-06-16 16:10   ` Dave Hansen
2022-04-15 16:44 ` [PATCH v4 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
2022-04-15 19:05 ` [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Luis Chamberlain
2022-04-16  1:34   ` Song Liu
2022-04-16  1:42     ` Luis Chamberlain
2022-04-16  1:43       ` Luis Chamberlain
2022-04-16  5:08   ` Christoph Hellwig
2022-04-16 19:55     ` Song Liu
2022-04-16 20:30       ` Linus Torvalds
2022-04-16 22:26         ` Song Liu
2022-04-18 10:06           ` Mike Rapoport
2022-04-19  0:44             ` Luis Chamberlain
2022-04-19  1:56               ` Edgecombe, Rick P
2022-04-19  5:36                 ` Song Liu
2022-04-19 18:42                   ` Mike Rapoport
2022-04-19 19:20                     ` Linus Torvalds
2022-04-20  2:03                       ` Alexei Starovoitov
2022-04-20  2:18                         ` Linus Torvalds
2022-04-20 14:42                           ` Song Liu
2022-04-20 18:28                             ` Luis Chamberlain
2022-04-21  7:29                             ` Song Liu [this message]
2022-04-21  3:25                       ` Nicholas Piggin
2022-04-21  5:48                         ` Linus Torvalds
2022-04-21  6:02                           ` Linus Torvalds
2022-04-21  9:07                             ` Nicholas Piggin
2022-04-21  8:57                           ` Nicholas Piggin
2022-04-21 15:44                             ` Linus Torvalds
2022-04-21 23:30                               ` Nicholas Piggin
2022-04-22  0:49                                 ` Linus Torvalds
2022-04-22  1:51                                   ` Nicholas Piggin
2022-04-22  2:31                                     ` Linus Torvalds
2022-04-22  2:57                                       ` Nicholas Piggin
2022-04-21 15:47                             ` Edgecombe, Rick P
2022-04-21 16:15                               ` Linus Torvalds
2022-04-22  0:12                                 ` Nicholas Piggin
2022-04-22  2:29                                   ` Edgecombe, Rick P
2022-04-22  2:47                                     ` Linus Torvalds
2022-04-22 16:54                                       ` Edgecombe, Rick P
2022-04-22  3:08                                     ` Nicholas Piggin
2022-04-22  4:31                                       ` Nicholas Piggin
2022-04-22 17:10                                         ` Edgecombe, Rick P
2022-04-22 20:22                                           ` Edgecombe, Rick P
2022-04-22  3:33                                     ` Nicholas Piggin
2022-04-21  9:47                           ` Nicholas Piggin
2022-04-19 21:24                 ` Luis Chamberlain
2022-04-19 23:58                   ` Edgecombe, Rick P
2022-04-20  7:58                   ` Petr Mladek
2022-04-19 18:20               ` Mike Rapoport
2022-04-24 17:43       ` Linus Torvalds
2022-04-25  6:48         ` Song Liu
2022-04-21  3:19     ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B070EF43-F4E0-42E4-A0AF-D7FAA99D5C82@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dborkman@redhat.com \
    --cc=edumazet@google.com \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mbenes@suse.cz \
    --cc=mcgrof@kernel.org \
    --cc=pmladek@suse.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rppt@kernel.org \
    --cc=song@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox