Remote Procedure Call

The importance of RPC

RPC (Remote Procedure Call) is a model that hides low-level communication details and so simplifies the programmer's task. Its purpose is to make the invocation of a remote service look like a local procedure call.

The parameters of the invocation are carried in a request message, and the result of the service in a response message. All imperative languages (like C) have procedure calls, so RPC integrates well with traditional programming languages and lets you write centralized and distributed programs in a uniform way.

Communication schema

RPC vs. Local Procedure Call

There are some important semantic differences between an RPC and a local procedure call.

Typically, the data formats between client and server are different. There are three data formats: client-side, server-side and common.

In an RPC, two different processes are involved. They do not share an address space, so all function parameters must be passed by value. They have separate lifetimes, since each process runs independently of the other. The processes can run on heterogeneous machines, which leads to different data formats. Finally, communicating with a server introduces possible malfunctions, either in the nodes or in the interconnection.

All the function parameters are passed by value, since they do not share the same address space. You cannot pass pointers to an RPC.

Different Implementations

There are many different implementations of RPC, with significantly different characteristics, including:

Sun/ONC RPC: a fairly widespread RPC mechanism in the Unix/Linux world, developed by Sun as part of Open Network Computing (ONC) and used in NFS and NIS/YP
DCE RPC: RPC mechanism underlying the Distributed Computing Environment (DCE) standardized by The Open Group
MSRPC, Microsoft-specific implementation of DCE/RPC used in many proprietary network services and middleware (including DCOM)
XML-RPC and SOAP, based on Web standards (XML and HTTP)
gRPC (Google Remote Procedure Call)

Be careful not to confuse the general RPC mechanism with a particular implementation.

RPC Realization

stub procedure

The Client invokes a stub, which is in charge of marshaling the parameters and asking the transport to deliver the request. The request message arrives at a Server stub, which extracts the parameters from the message, invokes the procedure, and upon completion of the service sends the result back to the Client.

A Client has a local stub procedure for each remote procedure that it can call. A Server has a local stub for each procedure that can be called by a remote Client.

Tasks of stubs:

parameter marshaling, i.e. packing parameters into a message
extracting parameters from messages
transforming the data representation
accessing communication primitives to send and receive messages

Stubs are automatically generated from the specification of the interface.
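The tasks listed above can be sketched with a pair of hand-written stubs. This is only an illustration under stated assumptions: the procedure name multiply and the class StubSketch are hypothetical, the transport is simulated by handing a byte array from one stub to the other, and real stubs would be generated from the interface specification rather than written by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hand-written stubs for a hypothetical remote procedure multiply(a, b).
// The transport is simulated by handing byte arrays from one stub to the other.
class StubSketch {

    // The service procedure itself, which lives on the server.
    static float multiply(float a, float b) {
        return a * b;
    }

    // Client stub: marshals the parameters, "sends" the request, unmarshals the result.
    static float multiplyStub(float a, float b) {
        try {
            ByteArrayOutputStream request = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(request);
            out.writeFloat(a);
            out.writeFloat(b);
            byte[] response = serverStub(request.toByteArray()); // stands in for send + receive
            return new DataInputStream(new ByteArrayInputStream(response)).readFloat();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Server stub: unmarshals the parameters, invokes the procedure, marshals the result.
    static byte[] serverStub(byte[] request) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
        float a = in.readFloat();
        float b = in.readFloat();
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        new DataOutputStream(response).writeFloat(multiply(a, b));
        return response.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(multiplyStub(3.0f, 4.0f)); // prints 12.0
    }
}
```

From the caller's point of view, multiplyStub looks like an ordinary local procedure, which is the effect RPC aims for.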

Stub acts like an Internet proxy server!

Realization

Interface Definition Language (IDL)

RPC involves the definition of abstract languages for service specification. IDLs are languages used to specify interface operations and their parameters.

Example of IDL of DCE RPC:

interface calcolatore {
    void moltiplica([in] float a, [in] float b, [out] float *res);
    void radice([in] float a, [out] float *res);
}

In the above code, float is the type defined by DCE, not the C type. Moreover, float *res does not pass a pointer to the Server: it is an output parameter through which the result is returned.

An IDL specification is processed by a compiler that produces both the Client and the Server stub. Compilers for IDLs can produce stubs in different languages.

Binding between Client & Server

Binding binds the RPC interface used by a Client to the implementation provided by a Server. Several solutions (static vs. dynamic) are possible for binding between the Client and the Server.

With static binding, the binding is performed at compile time and remains unchanged throughout the lifetime of the application. With dynamic binding, the choice is made at run time, which allows requests to be redirected to the least loaded handler or to the one available in the system at that time.

You use the same binding for many requests and calls to the same Server.

Binding Phases

It’s important to distinguish two different binding phases: a static one, and then a dynamic one.

The first one is Naming (service name): the Client must specify who it wants to be connected to, in terms of the identifying name of the service (this is the naming, static phase). Each service, every operation, must be identified by a unique name.

The second phase is Addressing (Server address): the Client must actually be connected to the Server that provides the service in the system at the time of invocation (this is the addressing, dynamic phase).

In RPC, binding occurs in two stages:

Naming: resolved by statically associating an identifier (e.g., a number) to the service interface.
Addressing: any servants ready for the service are sought (use of a naming system). There are two possibilities:
1. Client can directly query the server for address using broadcast/multicast
2. Client can query a name server, which registers all servants

Binding usually occurs less frequently than the calls themselves, typically using the same binding for many requests and calls to the same server.

Binding and Name Server

Realization with Name Server

The name server enables the connection between the Client and the Server. The following operations are provided:

search: lookup(service, version, &server)
registration: register(service, version, server)
delete: unregister(service, version, server)

An update operation is also provided.

Caution: if the registered address depends on the node where the server resides, any change of node must be communicated to the name server.
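The three operations of the name server can be sketched as a simple in-memory table. This is a minimal illustration, not the interface of any real name server: the class NameServerSketch, the (service, version) key format and the example address are assumptions made for the sketch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// In-memory name server keyed by (service, version). All names are hypothetical.
class NameServerSketch {
    private final Map<String, String> table = new ConcurrentHashMap<>();

    private static String key(String service, int version) {
        return service + "#" + version;
    }

    // registration: associates a server address with a service and version
    void register(String service, int version, String serverAddress) {
        table.put(key(service, version), serverAddress);
    }

    // delete: removes the entry only if it is still mapped to this server
    void unregister(String service, int version, String serverAddress) {
        table.remove(key(service, version), serverAddress);
    }

    // search: returns the server address, if any server is registered
    Optional<String> lookup(String service, int version) {
        return Optional.ofNullable(table.get(key(service, version)));
    }

    public static void main(String[] args) {
        NameServerSketch ns = new NameServerSketch();
        ns.register("calculator", 1, "10.0.0.5:4000");
        System.out.println(ns.lookup("calculator", 1).orElse("not found"));
        ns.unregister("calculator", 1, "10.0.0.5:4000");
        System.out.println(ns.lookup("calculator", 1).orElse("not found"));
    }
}
```

A server that moves to another node would call unregister with its old address and register with the new one, which is the kind of change the caution above refers to.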

Naming

Several possible architectures:

Centralized: a central server that maintains the list of all services within the network. It is a single point of failure.
Decentralized: one server at each host that maintains a list of local services. The caller must know in advance which host is providing the desired service.
Hybrid: one central server, possibly replicated, that maintains a list of service providers.

Sun/ONC RPC Naming: port mapper

In Sun/ONC RPC, naming is completely decentralized and is realized through the port mapper. portmap is a service listening on port 111 that provides lookup and service registration. Each service is uniquely identified by a program ID, and a service can exist in different versions.

Malfunctions

An important difference between RPCs and local procedure calls concerns the possible malfunctions.

Possible malfunctions are:

the Client machine may crash during the call
messages may be lost or duplicated, due to malfunctions on the communication channel
the Server machine may crash before receiving the request, or during execution of the service before sending the response

Moreover, in the case of a stateful Server, the Server may lose the state of its interaction with the Client as a result of a crash. Service procedures that are still running but were requested by crashed Clients are called orphans. For very heavy services, it would be appropriate to notify the Server of Client crashes in order to terminate the orphans. Depending on the measures that are taken to deal with these failures, we get different remote procedure call semantics.

Client Malfunctions and orphans

In the event of a client crash, you must deal with orphans. There are different strategies:

extermination: each orphan resulting from a crash is destroyed,
timed termination: each computation has a deadline, beyond which it is automatically aborted,
reincarnation (in epochs): time is divided into epochs, and everything that relates to a previous epoch is considered obsolete

Semantic Types

may-be

In the case where no fault-tolerance measures are taken, we obtain a remote procedure call semantics of type may-be. If the call fails (typically due to the end of a time-out) the Client cannot infer what has happened. Possible scenarios:

request message loss
server crash
remote procedure executed but loss of response message

This semantics is called may-be since in case of call failure the Client is not able to tell whether the remote procedure was executed or not. Moreover, if the call is successful, the remote procedure may have been executed even more than once (Client sends only one request, but there may be messages duplicated by the underlying communication service). It has a simple implementation, but raises issues of state consistency on the Server.

at-least-once

With this semantics, the client stub retries sending the request message up to N times. If the call is successful, the procedure has been executed one or more times (because of duplicate requests and messages), hence at-least-once. If the call fails, the client can suspect:

permanent network malfunction
server crash

This type of semantics may be fine for idempotent procedures, i.e., procedures that can be executed many times with the same effect on the state of the Client/Server interaction. Example: the stateless server of NFS.
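The following sketch shows why retransmission is only safe for idempotent procedures. It is a simulation under stated assumptions: the class AtLeastOnceSketch is hypothetical, the channel is modeled by a counter of responses to drop, and the service is a deliberately non-idempotent increment.

```java
// At-least-once sketch: the client stub retransmits until a response arrives.
class AtLeastOnceSketch {
    private int counter = 0;       // server state changed by a non-idempotent procedure
    private int responsesToLose;   // simulated channel: number of responses to drop

    AtLeastOnceSketch(int responsesToLose) {
        this.responsesToLose = responsesToLose;
    }

    // Server side: executes the procedure on every request it receives.
    // Returns null when the response is "lost", which the client sees as a time-out.
    Integer handleRequest() {
        counter++;
        if (responsesToLose > 0) {
            responsesToLose--;
            return null;
        }
        return counter;
    }

    // Client stub: retries up to maxAttempts times, then reports failure.
    int callWithRetry(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Integer response = handleRequest();
            if (response != null) {
                return response;
            }
        }
        throw new IllegalStateException("call failed after " + maxAttempts + " attempts");
    }

    int executions() {
        return counter;
    }

    public static void main(String[] args) {
        AtLeastOnceSketch rpc = new AtLeastOnceSketch(1);
        int result = rpc.callWithRetry(3);
        // prints "result = 2, executions = 2"
        System.out.println("result = " + result + ", executions = " + rpc.executions());
    }
}
```

The client issued one logical call, yet the increment ran twice because the first response was lost. With an idempotent procedure the repeated execution would have been harmless.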

at-most-once

The Server could keep track of request identifiers and discard duplicate requests. It could also store the response message and retransmit it until it receives an acknowledgement from the Client. This allows for at-most-once semantics, in that it is guaranteed that in each case the remote procedure is:

not executed
executed only partially (in case of a server crash)
executed only once
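The duplicate-suppression idea can be sketched as a table from request identifiers to stored responses. This is a minimal in-memory illustration: the class AtMostOnceServer and its method names are hypothetical, the service is a simple counter increment, and crash recovery is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// At-most-once sketch: the server remembers request identifiers and their responses.
class AtMostOnceServer {
    private final Map<Long, Integer> storedResponses = new HashMap<>();
    private int counter = 0; // state changed by the (non-idempotent) procedure

    // Executes the procedure only for unseen request identifiers.
    // A duplicate request gets the stored response and causes no new execution.
    int handle(long requestId) {
        Integer stored = storedResponses.get(requestId);
        if (stored != null) {
            return stored;
        }
        counter++;
        storedResponses.put(requestId, counter);
        return counter;
    }

    // Acknowledgement from the client: the stored response can be discarded.
    void acknowledge(long requestId) {
        storedResponses.remove(requestId);
    }

    int executions() {
        return counter;
    }

    public static void main(String[] args) {
        AtMostOnceServer server = new AtMostOnceServer();
        int first = server.handle(7L);
        int duplicate = server.handle(7L); // retransmission of the same request
        // prints "1 1 executions=1"
        System.out.println(first + " " + duplicate + " executions=" + server.executions());
    }
}
```

Because the table lives in memory, a server crash would lose it, which is one reason the procedure may still end up executed only partially or not at all.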

exactly-once

The Server could implement procedures as atomic transactions, so that procedures are executed either entirely or not at all. This allows for exactly-once semantics, in that it is guaranteed that in each case the remote procedure is:

not executed
executed only once

This semantics represents the ideal case. RPC implementations usually present semantics of type at-most-once.

Heterogeneity and Data Representation Conversion

Internet heterogeneity concerns both machines and networks. The Client and Server can run on different architectures that use different data representations:

Characters (ASCII, ISO 8859-*, Unicode UTF-*, …)
Integers (size, one's or two's complement)
Length (integers of 2 or 4 bytes)
Reals (exponent and mantissa length, format, …)
Byte order within a word (little endian vs. big endian)

To communicate between heterogeneous nodes, there are two solutions:

Each node converts the data for the recipient (better performance)
All nodes agree on a common data representation format (more flexibility)

There are some important standards:

XDR (Sun Microsystems)
ASN.1/X.680 (ITU-T/OSI)
XML (W3C)

eXternal Data Representation (XDR)

XDR provides a set of conversion procedures to transform the native representation of the data into an external XDR representation and vice versa. XDR uses a stream (a buffer) to build a message with data in XDR form. Data items are inserted into or extracted from the XDR stream one at a time, through serialization and deserialization operations.

XDR

Serialization example:

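#include <rpc/rpc.h>   // XDR type and xdr_* routines

// ...
XDR xdrs;              // stream handle, allocated by the caller
char buf[BUFSIZE];
int i;

// create an XDR stream over buf, in encoding mode
xdrmem_create(&xdrs, buf, BUFSIZE, XDR_ENCODE);

// insert an integer into the stream,
// converted to XDR format
i = 260;
xdr_int(&xdrs, &i);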

Deserialization is done in the same way, with the same routines used in the opposite direction. To decode, we pass the XDR_DECODE flag to xdrmem_create().
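To see what ends up in the buffer for the integer 260, the encoding can be reproduced in Java. XDR represents an integer as 4 bytes in big-endian order; the sketch below relies on that fact and on java.nio.ByteBuffer. The class XdrIntSketch and its helper methods are hypothetical names, and the little-endian output is shown only for comparison with a typical native layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Reproduces the XDR encoding of a 32-bit integer: 4 bytes, big-endian.
class XdrIntSketch {

    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    // Formats bytes as space-separated hexadecimal pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // XDR (big-endian) form of 260: prints 00 00 01 04
        System.out.println(hex(encode(260, ByteOrder.BIG_ENDIAN)));
        // little-endian form of the same value: prints 04 01 00 00
        System.out.println(hex(encode(260, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

A little-endian host therefore has to swap bytes when encoding and decoding, while a big-endian host can copy the value as it is.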

Conversion built-in functions:

Function	Converted Data Type
`xdr_bool()`	Boolean
`xdr_char()`, `xdr_u_char()`	Character
`xdr_short()`, `xdr_u_short()`	16 bit Integer
`xdr_enum()`	Enumeration
`xdr_float()`	Floating point number
`xdr_int()`, `xdr_u_int()`	Integer
`xdr_long()`, `xdr_u_long()`	32 bit Integer
`xdr_void()`	Null
`xdr_opaque()`	Bytes with no meaning in particular
`xdr_double()`	Double precision floating point number

Functions for composed data types:

Function	Converted Data Type
`xdr_array()`	Variable-length vector with elements of any type
`xdr_vector()`	Fixed-length vectors
`xdr_string()`	Character sequence terminated by `NULL`
`xdr_bytes()`	Bytes vector without terminator
`xdr_reference()`	Data reference
`xdr_pointer()`	Data reference, `NULL` included
`xdr_union()`	Unions

Caution: in the case of structured information, XDR serializes it with pointer chasing and flattening.

OSI Levels & RPC

RPC & ISO/OSI model

Java Remote Method Invocation

Java RMI is a Java-specific implementation of the RPC model. The Java platform simplifies the management of node heterogeneity and makes it possible to mask the syntactic differences between local and remote calls completely.

Distributed Object Model

In the distributed object model of Java RMI, a remote object is an object with methods that can be invoked from another JVM (Java Virtual Machine), possibly running on a different host. The object is described by remote interfaces that declare the methods accessible remotely.

Architecture

RMI System

Stub is a local proxy on which the client performs the calls to the remote object.

Skeleton is a remote element that receives the stub’s invocations and realizes them by making the corresponding calls on the server.

Remote Reference Layer provides support for calls forwarded from the stub. The RRL is responsible for establishing the connection between the client and the server by performing encoding and decoding operations on the data.

Transport Layer locates the RMI server related to the remote object requested, handles connections (TCP/IP, timeout) and transmissions (sequential, serialized).

Registry is a naming service that allows the server to publish a service and the client to retrieve its proxy.

Application Development

To implement a distributed application using Java RMI, you first define the service interface. Then you design the implementations of the remotely usable components and compile the necessary classes (using the javac tool). Next, you generate stubs and skeletons of the remotely usable classes with the RMI compiler rmic (this step applies to older JDKs: since Java 5 stubs can be generated dynamically at run time, and recent JDKs no longer ship rmic). At this point, it is possible to activate the service registry. The service must then be registered: the server performs a bind on the registry. The client performs a lookup on the registry to get the reference to the remote object. From then on, the interaction between the client and the server can proceed.

Let’s build an Echo Application as an example. The client application asks the user to enter a string and communicates with a server through RMI, which echoes the user's message back.

Service Interface

The service interface is the contract that must be respected by both servers and clients (both of whom must know it). Service interfaces are Java interfaces that must inherit from java.rmi.Remote. All public methods of these interfaces can be invoked remotely. Such methods must declare that they throw java.rmi.RemoteException in case of errors.

import java.rmi.Remote;
import java.rmi.RemoteException;

interface EchoService extends Remote {
    public String getEcho(String echo) throws RemoteException;
}

Server

The server must extend the UnicastRemoteObject class and implement a constructor and the methods of the remote interface.

import java.rmi.Naming;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class EchoServer extends UnicastRemoteObject implements EchoService {
    public EchoServer() throws RemoteException {
        super();
    }

    @Override
    public String getEcho(String echo) throws RemoteException {
        return echo;
    }

    public static void main(String[] args) {
        try {
            EchoService service = new EchoServer();
            Naming.rebind("EchoService", service);
        } catch(Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Client

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.rmi.Naming;

public class EchoClient {
    public static void main(String[] args) {
        try {
            InputStreamReader isr = new InputStreamReader(System.in);
            BufferedReader sReader = new BufferedReader(isr);

            // connection to remote RMI registry
            EchoService service = (EchoService) Naming.lookup("EchoService");

            // interaction with user
            System.out.print("# PROMPT: ");
            String msg = sReader.readLine();

            // request of remote service
            String echString = service.getEcho(msg);

            // print the outcome
            System.out.println("# ECHO: " + echString + "\n");
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Compiling Phase

Open a terminal and type:

# java compile: generate .class files
javac EchoService.java EchoServer.java EchoClient.java

# RMI compile: generate stubs and skeletons
rmic EchoServer

On separate terminals, start the registry:

rmiregistry

Start the server:

java EchoServer

Start the client:

java EchoClient

The user can interact with the client.
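The publish, lookup and invoke sequence can also be tried inside a single JVM, without rmic and without separate terminals. The sketch below is an assumption-laden variant of the Echo example, not a replacement for it: the class RmiLoopbackSketch and its nested types are hypothetical, the registry is created in-process on port 0 (assumed to select any free port), the stub is generated dynamically by exportObject, and the java.rmi.server.hostname property is set to the loopback address so that the call does not depend on host name resolution.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Single-JVM variant of the Echo example (hypothetical names, for experimentation only).
class RmiLoopbackSketch {

    public interface Echo extends Remote {
        String getEcho(String msg) throws RemoteException;
    }

    static class EchoImpl implements Echo {
        @Override
        public String getEcho(String msg) {
            return msg;
        }
    }

    static String echoOverRmi(String msg) {
        // Assumption: use the loopback address in the exported stub.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        EchoImpl impl = new EchoImpl();
        Registry registry = null;
        try {
            registry = LocateRegistry.createRegistry(0);          // in-process registry
            Echo stub = (Echo) UnicastRemoteObject.exportObject(impl, 0);
            registry.rebind("EchoService", stub);                 // server: publish the service
            Echo proxy = (Echo) registry.lookup("EchoService");   // client: retrieve the proxy
            return proxy.getEcho(msg);                            // invocation through the stub
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            quietUnexport(impl);      // unexport so that the JVM can terminate
            quietUnexport(registry);
        }
    }

    private static void quietUnexport(Remote object) {
        if (object == null) {
            return;
        }
        try {
            UnicastRemoteObject.unexportObject(object, true);
        } catch (NoSuchObjectException ignored) {
            // the object was never exported
        }
    }

    public static void main(String[] args) {
        System.out.println("# ECHO: " + echoOverRmi("hello"));
    }
}
```

Here the server object is exported explicitly instead of extending UnicastRemoteObject; both styles lead to the same kind of remote reference.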

Serialization

In Java, marshaling and unmarshaling are performed using the built-in serialization mechanism. Serialization allows complex data to be read and written transparently with respect to the medium (i.e., to the stream type).

The write process is called serialization. It transforms complex objects into simple sequences of bytes on an output stream. The default Java method is writeObject.

The read process is called deserialization. It decodes a byte sequence from an input stream to build a copy of the original object. The default Java method is readObject.

Java uses serialization in two important areas: data persistence, and the transmission of objects between different machines. RMI parameters and return values are serialized for transmission and deserialized once received.

An object is not serialized in its entirety: only the information that characterizes its instance is selected. This excludes methods, constants, and static and transient variables. If a class contains non-serializable objects, it must declare them as transient. Upon deserialization, a copy of the transmitted instance is recreated using the class code and the received information, which implies that the .class file must be accessible to the receiver.
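The effect of transient can be observed with an in-memory round trip. This is a small sketch under stated assumptions: the class Session and its fields are hypothetical, and a byte array stands in for the file or network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientSketch {

    // Hypothetical class: user is part of the serialized state, password is not.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String password; // skipped by serialization

        Session(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    // Serializes the object to a byte array and reads a copy back.
    static Session roundTrip(Session original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("alice", "secret"));
        System.out.println(copy.user);     // prints alice
        System.out.println(copy.password); // prints null
    }
}
```

The copy is a distinct object: its user field was rebuilt from the stream, while the transient field received the default value null.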

Persistence Example

This example uses serialization to save a string in the file record.dat.

Object Serialization

import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

/**
 * Perform serialization of the string.
 */
public class StringSer {
    public static void main(String[] args) {
        // hard-coded message
        String recString = "Hello, World!";

        try {
            FileOutputStream fOutputStream = new FileOutputStream("record.dat");
            ObjectOutputStream outputStream = new ObjectOutputStream(fOutputStream);

            outputStream.writeObject(recString);
            outputStream.close();

            System.out.println("# Serialization: " + recString + "\n");
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Object Deserialization

import java.io.FileInputStream;
import java.io.ObjectInputStream;

/**
 * Perform deserialization of the string.
 */
public class StringDeser {
    public static void main(String[] args) {
        String recString = new String();

        try {
            FileInputStream fInputStream = new FileInputStream("record.dat");
            ObjectInputStream inputStream = new ObjectInputStream(fInputStream);

            recString = (String)inputStream.readObject();
            inputStream.close();
            System.out.println("# Deserialization: " + recString + "\n");
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Try to run StringSer and then StringDeser. If you try to read the file, you’ll get something like:

<AC><ED>^@^Et^@^MHello, World!
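The unreadable prefix is the header written by the serialization stream. The sketch below prints it in hexadecimal so that each byte can be identified; the class StreamHeaderSketch is a hypothetical helper that serializes to memory instead of record.dat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class StreamHeaderSketch {

    // Serializes the string in memory and returns the first 7 bytes in hexadecimal.
    static String headerHex(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] data = bytes.toByteArray();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 7; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints AC ED 00 05 74 00 0D
        System.out.println(headerHex("Hello, World!"));
    }
}
```

AC ED is the stream magic number, 00 05 the stream version, 74 the tag that introduces a string, and 00 0D its length (13 bytes), after which the characters of the string follow.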

Parameter Passing

How a parameter is passed depends on the type of parameter being considered.

Type	Local Method	Remote Method
Primitive Types	pass-by-value	pass-by-value
Objects	pass-by-reference	pass-by-value (serialization)
Remote Exported Objects	pass-by-reference (Stub object)	pass-by-reference (remote)

Pass-by-value is performed over serializable objects, whose location is not relevant to the state. Pass-by-reference is performed over remote objects, whose function is strictly related to the location where they execute.
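The difference between the two columns for ordinary objects can be simulated without any network. In the sketch below, which uses hypothetical names throughout, the "remote" call is imitated by copying the argument through serialization before handing it to the callee. That copy is what a serializable argument goes through in a remote invocation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class PassByValueSketch {

    // Copies an object by serializing and deserializing it in memory.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyViaSerialization(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The callee modifies the list it receives.
    static void callee(ArrayList<String> list) {
        list.add("added by callee");
    }

    // Local call: the callee works on the caller's object.
    static int sizeAfterLocalCall() {
        ArrayList<String> list = new ArrayList<>();
        list.add("a");
        callee(list);
        return list.size();
    }

    // Simulated remote call: the callee works on a deserialized copy.
    static int sizeAfterSimulatedRemoteCall() {
        ArrayList<String> list = new ArrayList<>();
        list.add("a");
        callee(copyViaSerialization(list));
        return list.size();
    }

    public static void main(String[] args) {
        // prints "local: 2, remote: 1"
        System.out.println("local: " + sizeAfterLocalCall()
                + ", remote: " + sizeAfterSimulatedRemoteCall());
    }
}
```

Code that relies on a callee mutating its arguments therefore behaves differently once the call becomes remote.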

Service Localization

A client running on a machine needs to locate the server it wants to connect to. Typically, the server is running on another machine. There are many solutions:

the client knows in advance where the server is
the user tells the client application where the server is
the client uses a naming service placed in a well-known Internet location to get the server address

Java uses the RMI registry, which is a naming service. It stores a set of pairs (name, reference). The first field, name, is an arbitrary uninterpreted string; reference is the actual reference to the remote object.

java.rmi.Naming class defines RMI registry services. The provided static methods are:

// method summary (exception clauses omitted)
public final class Naming {
    public static Remote lookup(String name);
    public static void bind(String name, Remote obj);
    public static void unbind(String name);
    public static void rebind(String name, Remote obj);
    public static String[] list(String name);
}

Registry Security

The use of a registry raises a serious security issue. The registry can be discovered by probing the ports of a host. By accessing the registry, an attacker could redirect calls to registered RMI servers for malicious purposes. The solution adopted by Java is to allow the invocation of the bind(), rebind() and unbind() methods only from the host on which the registry is running, so changes to the Client/Server architecture are not accepted from the outside. As a consequence, there must be a registry running on every host from which these registry calls are made.

Each JVM defines differentiated and protected execution environments for different application scopes, with a focus on interactions with distributed applications and the use of remote code. A further step toward securing RMI is to prevent classes downloaded from remote hosts from performing operations for which they have not been enabled in advance.

The ClassLoader class controls the loading of remote code, in association with a security manager that specifies the protection policy. Java provided the RMISecurityManager class for this purpose; it was deprecated in Java 8 in favor of SecurityManager, and the Security Manager itself was deprecated for removal in Java 17. The Java security manager mechanism is hard to use properly, and there is no real prospect of improving the mechanism to make it safe. See JEP 411 for more details.

As you can also read from this StackOverflow answer, you should re-architect your applications so that you don’t need to download RMI classes from an untrusted source or via a channel that is insecure.

Serialization Filtering

The deserialization of untrusted data, particularly from an unknown, untrusted or unauthenticated client, is a dangerous practice. The content of the input data stream determines the objects that are created, the values of their fields, and the references between them. Through careful stream construction, an adversary can execute code in arbitrary classes with malicious intent. For a more detailed treatment, see the Oracle documentation.
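A filter can be attached to an ObjectInputStream to reject unexpected classes before they are instantiated. The sketch below uses the java.io.ObjectInputFilter API, available from Java 9 onward. The class FilterSketch, its helper methods and the chosen pattern are assumptions made for illustration, and a real application would choose its allow-list according to the classes it actually expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class FilterSketch {

    static byte[] serialize(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    // Returns true if the object can be deserialized under the given filter pattern.
    static boolean accepted(Object object, String pattern) {
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(pattern);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialize(object)))) {
            in.setObjectInputFilter(filter);
            in.readObject();
            return true;
        } catch (InvalidClassException rejected) {
            return false; // the filter rejected a class found in the stream
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Allow java.lang.String and reject every other class.
        String pattern = "java.lang.String;!*";
        System.out.println(accepted("hello", pattern));                 // prints true
        System.out.println(accepted(new ArrayList<String>(), pattern)); // prints false
    }
}
```

An allow-list followed by a final reject-all entry keeps the set of deserializable classes explicit, which is generally easier to reason about than a list of forbidden classes.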

Distributed Objects Garbage Collector

In a distributed object system, it is desirable to automatically deallocate the remote objects that no longer have any reference within clients. The RMI system uses a distributed garbage collection algorithm based on reference counting.

Each JVM maintains a set of counters, each associated with a specific remote object. A counter represents the number of references to that object that are currently live in the JVM. Each time a reference to a remote object is created, its counter is incremented; only for the first reference, a notification message is sent to the server hosting the object. When a reference is deleted, its counter is decremented; if it was the last reference, a notification message is sent to the server.
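The counting rule can be sketched as follows. This models only the bookkeeping inside one JVM, not the actual RMI implementation: the class RefCountSketch, the object identifiers and the alert strings are hypothetical, and the alerts are collected in a list instead of being sent over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Per-JVM reference counters for remote objects (illustrative model only).
class RefCountSketch {
    private final Map<String, Integer> counters = new HashMap<>();
    final List<String> alerts = new ArrayList<>(); // alert messages that would be sent

    // A new reference to the remote object appears in this JVM.
    void referenceCreated(String objectId) {
        int count = counters.merge(objectId, 1, Integer::sum);
        if (count == 1) {
            alerts.add("referenced " + objectId); // only for the first reference
        }
    }

    // A reference to the remote object is dropped in this JVM.
    void referenceDeleted(String objectId) {
        Integer count = counters.get(objectId);
        if (count == null) {
            return; // nothing to release
        }
        if (count == 1) {
            counters.remove(objectId);
            alerts.add("unreferenced " + objectId); // only for the last reference
        } else {
            counters.put(objectId, count - 1);
        }
    }

    public static void main(String[] args) {
        RefCountSketch jvm = new RefCountSketch();
        jvm.referenceCreated("obj1");
        jvm.referenceCreated("obj1");
        jvm.referenceDeleted("obj1");
        jvm.referenceDeleted("obj1");
        System.out.println(jvm.alerts); // prints [referenced obj1, unreferenced obj1]
    }
}
```

Four reference events produce only two alerts, since intermediate increments and decrements stay local to the JVM.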