From: Jiadong Sun <sunjiadong.lff@bytedance.com>
Date: Wed, 30 Apr 2025 04:16:28 -0500
Subject: [RFC] optimize cost of inter-process communication
To: luto@kernel.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	akpm@linux-foundation.org
Cc: x86@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, viro@zeniv.linux.org.uk,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, yinhongbo@bytedance.com,
	dengliang.1214@bytedance.com, xieyongji@bytedance.com,
	chaiwen.cc@bytedance.com, songmuchun@bytedance.com,
	yuanzhu@bytedance.com, sunjiadong.lff@bytedance.com
# Background

In traditional inter-process communication (IPC) scenarios, Unix domain
sockets are commonly used in conjunction with the epoll() family for
event multiplexing. IPC operations involve system calls on both the
data and control planes, thereby imposing a non-trivial overhead on the
interacting processes. Even when shared memory is employed to optimize
the data plane, two data copies still remain: data is first copied from
one process's private memory into the shared memory area, and then
copied from the shared memory into the private memory of the other
process.

This poses a question: is it possible to reduce the overhead of IPC
with only minimal modifications at the application level? To address
it, we observed that the functionality of IPC, which encompasses data
transfer and invocation of the target thread, is similar to a function
call, where arguments are passed and the callee function is invoked to
process them. Inspired by this analogy, we introduce RPAL (Run Process
As Library), a framework designed to enable one process to invoke
another as if making a local function call, all without going through
the kernel.

# Design

First, let's formalize RPAL's core objectives:

1. Data-plane efficiency: reduce the number of data copies from two (in
   the shared memory solution) to one.
2. Control-plane optimization: eliminate the overhead of system calls
   and the kernel's thread switches.
3. Application compatibility: minimize the modifications to existing
   applications that utilize Unix domain sockets and the epoll() family.

To attain the first objective, processes that use RPAL share the same
virtual address space, so one process can access another's data
directly through a data pointer. This means data can be transferred
from one process to another with just one copy operation, as the sketch
below illustrates.
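To make the data-plane comparison concrete, here is a minimal C sketch
of the two flows. The struct layout and function names are illustrative
only; they are not the actual librpal API.

#include <string.h>

struct msg {
	size_t len;
	char data[256];
};

/* Conventional shared-memory IPC: two copies per message. */
void shm_send(struct msg *shm, const char *src, size_t len)
{
	memcpy(shm->data, src, len);		/* copy 1: private -> shared */
	shm->len = len;
}

void shm_recv(const struct msg *shm, char *dst)
{
	memcpy(dst, shm->data, shm->len);	/* copy 2: shared -> private */
}

/*
 * RPAL-style: with a shared address space, the callee dereferences the
 * caller's pointer directly, so only one copy remains (into the
 * callee's own buffer, if it needs to keep the data at all).
 */
void rpal_recv(const struct msg *caller_msg, char *dst)
{
	memcpy(dst, caller_msg->data, caller_msg->len);
}

int main(void)
{
	struct msg m;
	char out[256];

	shm_send(&m, "hello", 6);	/* two-copy path */
	shm_recv(&m, out);
	rpal_recv(&m, out);		/* one-copy path */
	return 0;
}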
To meet the second goal, RPAL relies on the shared address space to
perform lightweight context switching in user space, which we call an
"RPAL call". This allows one process to execute another process's code
just like a local function call.

To achieve the third target, RPAL stays compatible with the epoll
family of functions, such as epoll_create(), epoll_wait(), and
epoll_ctl(). If an application already uses epoll for IPC, developers
can switch to RPAL with only a few small changes; for instance,
replacing epoll_wait() with rpal_epoll_wait(). The basic epoll
procedure, where a process waits for another to write to a monitored
descriptor through an epoll file descriptor, still works with RPAL.

## Address space sharing

For address space sharing, RPAL partitions the entire userspace virtual
address space and allocates non-overlapping memory ranges to each
process. On x86_64, RPAL uses the memory range covered by a single PUD
(Page Upper Directory) entry, which is 512 GB. This restricts each
process's virtual address space to 512 GB on x86_64, which is
sufficient for most applications in our scenario. The rationale is
straightforward: address space sharing can be achieved simply by
copying the PUD entry from one process's page table to another's, after
which one process can directly use a data pointer to access another's
memory.

|------------| <- 0
|------------| <- 512 GB
| Process A  |
|------------| <- 2*512 GB
|------------| <- n*512 GB
| Process B  |
|------------| <- (n+1)*512 GB
|------------| <- STACK_TOP
|   Kernel   |
|------------|

## RPAL call

We refer to the lightweight userspace context switching mechanism as an
RPAL call. It enables the caller (or sender) thread of one process to
directly switch to the callee (or receiver) thread of another process.
When Process A's caller thread initiates an RPAL call to Process B's
callee thread, the CPU saves the caller's context and loads the
callee's context, transferring control flow from the caller to the
callee entirely in user space. After the callee finishes data
processing, the CPU saves Process B's callee context and switches back
to Process A's caller context, completing a full IPC cycle.

|------------|                     |---------------------|
| Process A  |                     | Process B           |
| |-------|  |                     | |-------|           |
| | caller| ---- RPAL call ----->  | | callee|  handle   |
| | thread| <-------------------   | | thread| -> event  |
| |-------|  |                     | |-------|           |
|------------|                     |---------------------|

# Security and compatibility with kernel subsystems

## Memory protection between processes

Since processes using RPAL share the address space, unintended
cross-process memory access may occur and corrupt another process's
data. To mitigate this, we leverage Memory Protection Keys (MPK) on x86.
MPK assigns 4 bits in each page table entry to a "protection key",
which is paired with a userspace register (PKRU). The PKRU register
defines access permissions for memory regions protected by specific
keys (for details, refer to the kernel documentation "Memory Protection
Keys"). With MPK, even though the address space is shared among
processes, cross-process access is restricted: a process can only
access memory protected by a given key if its PKRU register is
configured with the corresponding permission. This ensures that
processes cannot access each other's memory unless an explicit PKRU
configuration is set.
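To illustrate the primitive RPAL builds on, here is a minimal userspace
sketch using the stock Linux pkeys API (pkey_alloc(2), pkey_mprotect(2),
and glibc's pkey_set(), available since glibc 2.27 on MPK-capable CPUs).
It only demonstrates how PKRU gates access to tagged pages; it is not
RPAL's in-kernel use of protection keys.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* Allocate a protection key whose default rights deny all access. */
	int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	if (pkey < 0) {
		perror("pkey_alloc");
		return 1;
	}

	char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	/*
	 * Tag the page with the key: any thread whose PKRU lacks the
	 * corresponding permission bits cannot touch it.
	 */
	pkey_mprotect(buf, 4096, PROT_READ | PROT_WRITE, pkey);

	pkey_set(pkey, 0);			/* grant ourselves access */
	buf[0] = 42;				/* allowed */
	pkey_set(pkey, PKEY_DISABLE_ACCESS);	/* revoke access again */
	/* buf[0] = 43; would now fault with a protection-key error */
	return 0;
}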
## Page fault handling and TLB flushing

Due to the shared address space architecture, both page fault handling
and TLB flushing require careful consideration. For instance, when
Process A accesses Process B's memory, a page fault may occur in
Process A's context, but the faulting address belongs to Process B. In
this case, we must pass Process B's mm_struct to the page fault handler.

TLB flushing is more complex. When a thread flushes the TLB, because
the address space is shared, the affected memory may be accessed not
only by other threads in the current process but also by other
processes sharing the address space. Therefore, the CPU mask used for
TLB flushing should be the union of the mm_cpumasks of all processes
that share the address space.

## Lazy switch of kernel context

In RPAL, a mismatch may arise between the user context and the kernel
context. The RPAL call is designed solely to switch the user context,
leaving the kernel context unchanged. For instance, when an RPAL call
transitions from the caller thread to the callee thread and a system
call is subsequently initiated within the callee thread, the kernel
would incorrectly use the caller's kernel context (such as the kernel
stack) to process the system call.

To resolve such context mismatches, a kernel context switch is
triggered at the kernel entry point when the callee initiates a syscall
or an exception/interrupt occurs. This ensures context consistency
before system calls, interrupts, or exceptions are processed. We call
this kernel context switch a "lazy switch" because it defers the
switching operation from the traditional thread switch point to the
next kernel entry point.

Lazy switches should be minimized as much as possible, since they
significantly degrade performance. We currently use RPAL in an RPC
framework, in which the RPC sender thread relies on the RPAL call to
invoke the RPC receiver thread entirely in user space. In most cases,
the receiver thread makes no system calls and its code execution time
is relatively short, which effectively reduces the probability of a
lazy switch.

## Time slice correction

After an RPAL call, the callee's user-mode code executes. However, the
kernel attributes this CPU time to the caller because the kernel
context is unchanged. To correct this, we use the Time Stamp Counter
(TSC) register to measure the CPU time consumed by the callee thread in
user space. The kernel then uses this user-reported timing data to
adjust the CPU accounting for both the caller and the callee thread,
similar to how CPU steal time is implemented.

## Process recovery

Since processes can access each other's memory, there is a risk that
the target process's memory becomes invalid at access time (e.g., if
the target process has exited unexpectedly). The kernel must handle
such cases; otherwise, the accessing process could be terminated by a
failure originating in another process.

To address this, each thread pre-establishes a recovery point before
accessing the memory of other processes. When such an invalid access
occurs, the thread traps into the kernel, and inside the page fault
handler the kernel restores the thread's user context to the recovery
point. This keeps processes mutually independent and prevents cascading
failures caused by cross-process memory issues.
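As a rough userspace analogy for such a recovery point, the sketch
below uses sigsetjmp()/siglongjmp() to unwind a failed access through a
hypothetical stale peer pointer. In RPAL itself the unwinding is done
by the kernel's page fault handler, not a signal handler; this is only
an illustration of the control flow.

#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf recovery_point;

static void segv_handler(int sig)
{
	/* The peer's memory is gone; unwind to the recovery point. */
	siglongjmp(recovery_point, 1);
}

int main(void)
{
	struct sigaction sa = { .sa_handler = segv_handler };

	sigaction(SIGSEGV, &sa, NULL);

	if (sigsetjmp(recovery_point, 1) == 0) {
		volatile char *peer = (char *)0x10;	/* stale peer pointer */
		char c = *peer;				/* faults */
		printf("read %d\n", c);
	} else {
		puts("peer vanished; recovered instead of crashing");
	}
	return 0;
}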
# Performance

To quantify the performance improvements delivered by RPAL, we measured
latency before and after its deployment. Experiments were conducted on
a server equipped with two Intel(R) Xeon(R) Platinum 8336C CPUs
(2.30 GHz) and 1 TB of memory. Latency was defined as the duration from
when the client thread initiates a message to when the server thread is
invoked and receives it. During testing, the client transmitted
1 million 32-byte messages, and we computed the per-message average
latency. The results are as follows:

*****************
Without RPAL:
Message length: 32 bytes, Total TSC cycles: 19616222534,
Message count: 1000000, Average latency: 19616 cycles

With RPAL:
Message length: 32 bytes, Total TSC cycles: 1703459326,
Message count: 1000000, Average latency: 1703 cycles
*****************

These results confirm that RPAL delivers substantial latency
improvements over the current epoll implementation: a 17,913-cycle
reduction (roughly 91.3%) per 32-byte message.

We have applied RPAL to an RPC framework that is widely used in our
data center. With RPAL, we achieved up to a 15.5% reduction in the CPU
utilization of the participating processes in a real-world microservice
scenario. The gains stem primarily from minimizing control-plane
overhead through userspace context switches; additionally, address
space sharing significantly reduces the number of memory copies.

# Future Work

Currently, RPAL requires the MPK (Memory Protection Key) hardware
feature, which is available on a range of Intel CPUs. On AMD, MPK is
supported only on recent processors, specifically 4th Generation AMD
EPYC™ processors and later generations. Patch sets that extend RPAL
support to systems lacking MPK hardware will be provided later.

RPAL is currently implemented on the Linux v5.15 kernel and is publicly
available at: https://github.com/openvelinux/kernel/tree/5.15-rpal

Accompanying test programs are provided in the samples/rpal/ directory,
and the user-mode RPAL library, which implements the userspace RPAL
call, is in the samples/rpal/librpal directory.

We are in the process of porting RPAL to the latest kernel version,
which still requires substantial effort. We hope first to gather
community discussion and feedback on RPAL's optimization approach and
architecture. We look forward to your comments.
Jiadong Sun (11):
  rpal: enable rpal service registration
  rpal: enable virtual address space partitions
  rpal: add user interface for rpal service
  rpal: introduce service level operations
  rpal: introduce thread level operations
  rpal: enable epoll functions support for rpal
  rpal: enable lazy switch
  rpal: enable pku memory protection
  rpal: support page fault handling and tlb flushing
  rpal: allow user to disable rpal
  samples: add rpal samples

 arch/x86/Kconfig                                 |    2 +
 arch/x86/entry/entry_64.S                        |  140 ++++++++++
 arch/x86/events/amd/core.c                       |   16 ++
 arch/x86/include/asm/cpufeatures.h               |    3 +-
 arch/x86/include/asm/pgtable.h                   |   13 +
 arch/x86/include/asm/pgtable_types.h             |   11 +
 arch/x86/include/asm/tlbflush.h                  |    5 +
 arch/x86/kernel/Makefile                         |    2 +
 arch/x86/kernel/asm-offsets.c                    |    4 +-
 arch/x86/kernel/nmi.c                            |   21 ++
 arch/x86/kernel/process.c                        |   19 ++
 arch/x86/kernel/process_64.c                     |  106 +++++++
 arch/x86/kernel/rpal/Kconfig                     |   21 ++
 arch/x86/kernel/rpal/Makefile                    |    4 +
 arch/x86/kernel/rpal/core.c                      |  698 ++++++++++++++++++
 arch/x86/kernel/rpal/internal.h                  |  130 ++++++++
 arch/x86/kernel/rpal/mm.c                        |  456 +++++++++++++
 arch/x86/kernel/rpal/pku.c                       |  240 ++++++++++
 arch/x86/kernel/rpal/proc.c                      |  208 +++++++
 arch/x86/kernel/rpal/service.c                   |  869 ++++++++++++++++++++++++
 arch/x86/kernel/rpal/thread.c                    |  432 ++++++++++++++
 arch/x86/mm/fault.c                              |  243 ++++++++++
 arch/x86/mm/mmap.c                               |   10 +
 arch/x86/mm/tlb.c                                |  170 +++++++-
 config.x86_64                                    |    2 +
 fs/binfmt_elf.c                                  |  103 +++++-
 fs/eventpoll.c                                   |  306 ++++++++++
 fs/exec.c                                        |   11 +
 fs/file_table.c                                  |   10 +
 include/linux/file.h                             |   13 +
 include/linux/mm_types.h                         |    3 +
 include/linux/rpal.h                             |  529 ++++++++++++++++
 include/linux/sched.h                            |   15 ++
 init/init_task.c                                 |    8 +
 kernel/entry/common.c                            |   29 +++
 kernel/exit.c                                    |    5 +
 kernel/fork.c                                    |   23 ++
 kernel/sched/core.c                              |  749 +++++++++++++++++++++
 kernel/sched/fair.c                              |  128 +++++
 mm/memory.c                                      |   13 +
 mm/mmap.c                                        |   35 +++
 mm/mprotect.c                                    |  112 ++++++
 mm/rmap.c                                        |    5 +
 samples/rpal/Makefile                            |   14 ++
 samples/rpal/client.c                            |  182 ++++++
 samples/rpal/librpal/asm_define.h                |    9 +
 samples/rpal/librpal/asm_x86_64_rpal_call.S      |   57 +++
 samples/rpal/librpal/debug.h                     |   12 +
 samples/rpal/librpal/fiber.c                     |  119 ++++
 samples/rpal/librpal/fiber.h                     |   64 +++
 samples/rpal/librpal/jump_x86_64_sysv_elf_gas.S  |   81 +++
 samples/rpal/librpal/make_x86_64_sysv_elf_gas.S  |   82 +++
 samples/rpal/librpal/ontop_x86_64_sysv_elf_gas.S |   84 +++
 samples/rpal/librpal/private.h                   |  302 ++++++++++
 samples/rpal/librpal/rpal.c                      | 2560 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 samples/rpal/librpal/rpal.h                      |  155 +++++
 samples/rpal/librpal/rpal_pkru.h                 |   78 +++
 samples/rpal/librpal/rpal_queue.c                |  239 ++++++++
 samples/rpal/librpal/rpal_queue.h                |   55 ++++
 samples/rpal/librpal/rpal_x86_64_call_ret.S      |   45 ++++
 samples/rpal/server.c                            |  249 +++++++
 61 files changed, 10304 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/rpal/Kconfig
 create mode 100644 arch/x86/kernel/rpal/Makefile
 create mode 100644 arch/x86/kernel/rpal/core.c
 create mode 100644 arch/x86/kernel/rpal/internal.h
 create mode 100644 arch/x86/kernel/rpal/mm.c
 create mode 100644 arch/x86/kernel/rpal/pku.c
 create mode 100644 arch/x86/kernel/rpal/proc.c
 create mode 100644 arch/x86/kernel/rpal/service.c
 create mode 100644 arch/x86/kernel/rpal/thread.c
 create mode 100644 include/linux/rpal.h
 create mode 100644 samples/rpal/Makefile
 create mode 100644 samples/rpal/client.c
 create mode 100644 samples/rpal/librpal/asm_define.h
 create mode 100644 samples/rpal/librpal/asm_x86_64_rpal_call.S
 create mode 100644 samples/rpal/librpal/debug.h
 create mode 100644 samples/rpal/librpal/fiber.c
 create mode 100644 samples/rpal/librpal/fiber.h
 create mode 100644 samples/rpal/librpal/jump_x86_64_sysv_elf_gas.S
 create mode 100644 samples/rpal/librpal/make_x86_64_sysv_elf_gas.S
 create mode 100644 samples/rpal/librpal/ontop_x86_64_sysv_elf_gas.S
 create mode 100644 samples/rpal/librpal/private.h
 create mode 100644 samples/rpal/librpal/rpal.c
 create mode 100644 samples/rpal/librpal/rpal.h
 create mode 100644 samples/rpal/librpal/rpal_pkru.h
 create mode 100644 samples/rpal/librpal/rpal_queue.c
 create mode 100644 samples/rpal/librpal/rpal_queue.h
 create mode 100644 samples/rpal/librpal/rpal_x86_64_call_ret.S
 create mode 100644 samples/rpal/server.c

-- 
2.20.1