DEV Community

Ali Naqvi

Pipy - A Programmable network proxy for cloud, edge, and IoT

In this article we introduce Pipy, an open-source, cloud-native network stream processor. We explain its modular design and architecture, and walk through a quick example of getting a high-performance network proxy up and running to serve our specific needs. Pipy has been battle-tested and is already in use by multiple commercial clients.

Pipy is an open-source, lightweight, high-performance, modular, programmable, cloud-native network stream processor that suits a variety of use cases, including (but not limited to) edge routers, load balancers and proxies, API gateways, static HTTP servers, service mesh sidecars, and other applications.

As it turns out, each of these attributes has a specific meaning for Pipy, so let's take a look.

Lightweight

The compiled Pipy executable is around 10 MB in size and has a very small memory footprint at runtime.

High-performance

Pipy is written in C++ and built on top of the Asio asynchronous I/O library.

Modular

Pipy has a modular design at its core, with many small reusable modules (filters) that can be linked together to form a pipeline through which network data flows and is processed.

Stream processor

Pipy operates on network streams in an event-driven fashion, abstracting the raw bytes of a network stream into events. A pipeline consumes an input stream of events, applies user-provided transformations, and outputs the resulting stream.

Pipy streams abstract out data bytes into events belonging to one of four categories:

| Event | Description |
| --- | --- |
| Data | Network streams are composed of data bytes and come in chunks. Pipy abstracts out chunks into a Data event. |
| MessageStart, MessageEnd, StreamEnd | These three non-data events work as markers, giving the raw byte streams high-level semantics for business logic to rely on. |
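To make the event model concrete, here is a small plain-JavaScript sketch (runnable with Node.js) of how one connection carrying two messages might be represented as a sequence of these events, and how a consumer can rely on the markers to regroup raw chunks. The factory names and fields (kind, bytes, head) are hypothetical illustrations, not PipyJS's own event classes, and real chunk boundaries depend on the network.

```javascript
// Plain objects standing in for Pipy's four event types (hypothetical, not PipyJS).
const dataEvent = bytes => ({ kind: 'Data', bytes });
const messageStart = head => ({ kind: 'MessageStart', head });
const messageEnd = () => ({ kind: 'MessageEnd' });
const streamEnd = () => ({ kind: 'StreamEnd' });

// One message whose body arrived in arbitrary chunks becomes:
// a MessageStart marker, one Data event per chunk, then a MessageEnd marker.
function messageToEvents(head, chunks) {
  return [
    messageStart(head),
    ...chunks.map(c => dataEvent(Buffer.from(c))),
    messageEnd(),
  ];
}

// A connection carrying two messages, closed after the second one.
const events = [
  ...messageToEvents({ path: '/hi' }, ['Hel', 'lo']),
  ...messageToEvents({ path: '/ip' }, ['x']),
  streamEnd(),
];
console.log(events.map(e => e.kind).join(' '));
// MessageStart Data Data MessageEnd MessageStart Data MessageEnd StreamEnd

// The markers let business logic regroup raw chunks into whole message bodies.
function collectBodies(events) {
  const bodies = [];
  let parts = null;
  for (const e of events) {
    if (e.kind === 'MessageStart') parts = [];
    else if (e.kind === 'Data' && parts) parts.push(e.bytes);
    else if (e.kind === 'MessageEnd' && parts) {
      bodies.push(Buffer.concat(parts).toString());
      parts = null;
    }
  }
  return bodies;
}
console.log(collectBodies(events)); // [ 'Hello', 'x' ]
```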

Programmable

Pipy takes care of the low-level details and gives you the power to decide how the individual pieces of the puzzle fit together to achieve your business goals, using one of the most widely used programming languages on the planet. It does so with the help of a built-in JavaScript engine called PipyJS, which is part of the Pipy code base but has no dependency on it. PipyJS is highly customizable and predictable in performance, with no garbage collection overhead. In the future, PipyJS might move into its own standalone package.

Zero dependency

Pipy comes with batteries included, and its compiled executable has no external dependencies. A Pipy build compiled against glibc depends on glibc at runtime, while Pipy is also available as a fully static executable with literally zero runtime dependencies.

Compatibility

Pipy supports architectures like Intel, AMD, ARM, LoongArch, and Hygon, and is constantly tested on platforms like RHEL/CentOS, Debian, Ubuntu, macOS (Intel and M1), FreeBSD, OpenEuler, OpenWrt, ArchLinux, etc.

Pipy Design

Pipy's internal workings resemble Unix pipelines, but whereas Unix pipelines deal with discrete bytes, Pipy deals with streams of events.

Pipy processes incoming streams via a chain of filters, where each filter handles a general concern such as request logging, authentication, SSL offloading, or request forwarding. Each filter reads from its input and writes to its output, and the output of one filter is connected to the input of the next.
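The filter-chain idea can be sketched in plain JavaScript as function composition. This is a simplified, hypothetical model rather than PipyJS: each filter maps one event to an array of zero or more events, and for brevity the whole input array is pushed through one filter before the next, whereas Pipy forwards events through the chain as they arrive.

```javascript
// Compose filters into a pipeline: the output of one filter feeds the next.
// Each filter maps a single event to an array of zero or more events.
function pipeline(...filters) {
  return events => filters.reduce(
    (stream, filter) => stream.flatMap(filter),
    events
  );
}

// A logging filter that records event kinds and passes events through unchanged.
const log = [];
const logger = e => { log.push(e.kind); return [e]; };

// A transforming filter that upper-cases Data events and leaves markers alone.
const upper = e =>
  e.kind === 'Data' ? [{ ...e, text: e.text.toUpperCase() }] : [e];

const run = pipeline(logger, upper);
const out = run([
  { kind: 'MessageStart' },
  { kind: 'Data', text: 'hi, ' },
  { kind: 'Data', text: 'there' },
  { kind: 'MessageEnd' },
]);

console.log(out.filter(e => e.kind === 'Data').map(e => e.text).join('')); // HI, THERE
console.log(log.join(',')); // MessageStart,Data,Data,MessageEnd
```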

Pipelines

A chain of filters is called a pipeline, and Pipy classifies pipelines into four categories according to their input source.

  • Port pipeline
    is created when there is an incoming TCP connection. It reads Data events from the connection, processes them, and then writes the result back to the client. This resembles the "request and response" communication model of HTTP, where the input to the pipeline is the request and the output from the pipeline is the response. It's safe to assume that every incoming connection to Pipy has a port pipeline associated with it, handling the communication on that connection.

  • Timer pipeline
    is one that is created periodically and gets a pair of MessageStart and MessageEnd events as its only input. Whatever it outputs is simply discarded after all the processing in the filters. This type of pipeline can be used to carry out cron job-like tasks.

  • Signal pipeline
    is created when a signal is sent to the Pipy process. It gets a pair of MessageStart and MessageEnd events as its only input. Whatever it outputs is simply discarded after all the processing in the filters. This type of pipeline is useful when certain tasks need to be performed when a signal is received.

  • Sub-pipeline
    is a pipeline that can be started from other pipelines by using a joint filter. The most basic joint filter, among a couple of others, is link. It receives events from its predecessor in the main pipeline, sends them to a sub-pipeline for processing, and then pumps whatever that sub-pipeline outputs down to its succeeding filter in the main pipeline.

    One way to look at joint filters and sub-pipelines is to compare them to callers and callees in a subroutine call in procedural programming. The input to the joint filter is like the subroutine's parameters, and its output is like the return value.

    Unlike sub-pipelines, the other types of pipelines, namely port pipelines, timer pipelines and signal pipelines, cannot be "called" internally from a joint filter. They can only be started externally by an incoming connection, a timer or a signal. We call these pipelines root pipelines.
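The caller/callee analogy can be sketched in plain JavaScript by modeling pipelines as functions. The subPipelines table and link helper below are hypothetical illustrations of the idea, not Pipy's implementation of the link filter; only sub-pipelines are registered as callable, mirroring the rule that root pipelines cannot be called from a joint filter.

```javascript
// Pipelines modeled as functions from an array of events to an array of events.
const wrapBody = events => events.map(e =>
  e.kind === 'Data' ? { ...e, text: `[${e.text}]` } : e);
const upperBody = events => events.map(e =>
  e.kind === 'Data' ? { ...e, text: e.text.toUpperCase() } : e);

// Only sub-pipelines can be "called"; root pipelines are started externally
// (by a connection, a timer or a signal), so they are not registered here.
const subPipelines = { wrap: wrapBody };

// link(name) acts like a call site: its input becomes the callee's
// parameters, and the callee's output is passed on as the return value.
function link(name) {
  const sub = subPipelines[name];
  if (!sub) throw new Error(`no sub-pipeline named "${name}"`);
  return events => sub(events);
}

// Main pipeline: link('wrap') followed by an ordinary filter.
const main = events => upperBody(link('wrap')(events));

const result = main([
  { kind: 'MessageStart' },
  { kind: 'Data', text: 'hi' },
  { kind: 'MessageEnd' },
]);
console.log(result.map(e => e.text || e.kind).join(' ')); // MessageStart [HI] MessageEnd
```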

Module

A module is a PipyJS source file containing scripts that configure a set of pipeline layouts.

A pipeline layout tells Pipy what filters a pipeline has and in what order. Note that configuring a pipeline layout in a module doesn't create any pipelines at that moment. It only defines what a pipeline looks like when it is actually created at runtime to handle some input, though in some cases when the meaning is obvious, we use the term "pipeline" for "pipeline layout" just for brevity.

Since "modules" and "files" have a one-to-one relationship, we use the two terms interchangeably.
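The distinction between a pipeline layout and a pipeline can be illustrated with a plain-JavaScript sketch in which a layout is only a list of filter factories and pipelines are created from it on demand. The names layout and createPipeline are hypothetical, and giving every created pipeline its own filter state is an assumption made for illustration, not a description of how Pipy allocates pipelines internally.

```javascript
// A layout is just a recipe: an ordered list of filter factories.
// Nothing runs until a pipeline is created from it to handle some input.
const layout = [
  // Numbers the events this particular pipeline has seen.
  () => { let count = 0; return e => ({ ...e, seq: ++count }); },
  // Lower-cases the event kind.
  () => e => ({ ...e, kind: e.kind.toLowerCase() }),
];

// Creating a pipeline instantiates fresh filters from the layout,
// so each pipeline (for example, one per connection) keeps its own state.
function createPipeline(layout) {
  const filters = layout.map(makeFilter => makeFilter());
  return event => filters.reduce((e, filter) => filter(e), event);
}

const p1 = createPipeline(layout);
const p2 = createPipeline(layout);
console.log(p1({ kind: 'Data' }).seq, p1({ kind: 'Data' }).seq, p2({ kind: 'Data' }).seq); // 1 2 1
```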

Context

Another important notion in Pipy is that of contexts. A context is a set of variables attached to a pipeline. Every pipeline in a Pipy instance has access to the same set of variables; in other words, all contexts have the same shape. When you start a Pipy instance, the first thing you do is define the shape of the context by declaring variables and their initial values.

Every root pipeline clones the initial context you define at the start. When a sub-pipeline starts, it either shares or clones its parent’s context, depending on which joint filter you use. For instance, a link filter shares its parent’s context while a demux filter clones it.

To the scripts embedded in a pipeline, these context variables are global variables, which means they are always accessible to scripts anywhere in the same script file.

This might seem odd to a seasoned programmer because global variables usually mean they are globally unique. You have only one set of these variables, whereas in Pipy we can have many sets of them (aka contexts) depending on how many root pipelines are open for incoming network connections and how many sub-pipelines clone their parents’ contexts.
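The sharing and cloning rules above can be modeled in a few lines of plain JavaScript. This is a hypothetical sketch: a context is a plain object, cloning uses a shallow object spread, and the startRoot/startSub helpers are illustrative names, not PipyJS functions.

```javascript
// The initial context defines the shape: variable names and initial values.
const initialContext = { _target: '', __turnDown: false };

// Each root pipeline (e.g. one per incoming connection) clones the initial context.
const startRoot = () => ({ ...initialContext });

// A sub-pipeline either shares its parent's context (like link)
// or gets its own shallow clone of it (like demux).
const startSub = (parentCtx, joint) =>
  joint === 'link' ? parentCtx : { ...parentCtx };

const conn1 = startRoot();
const conn2 = startRoot();
conn1._target = '127.0.0.1:8080'; // only affects connection 1's context

const linked = startSub(conn1, 'link');
const demuxed = startSub(conn1, 'demux');
linked.__turnDown = true;    // visible to conn1, since the context is shared
demuxed._target = 'changed'; // not visible to conn1, since it is a clone

console.log(conn1._target, conn1.__turnDown);        // 127.0.0.1:8080 true
console.log(conn2._target === '', conn2.__turnDown); // true false
```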

Quick Start

For the impatient, we can run the production version of Pipy via Docker with one of the tutorial scripts provided in the official Pipy GitHub repository. Following the tradition of the classic Hello World!, we'll just change the wording to Hi, there!

The Pipy Docker image can be configured with a few environment variables:

  • PIPY_CONFIG_FILE=</path/to/config-file> denotes the location of the Pipy configuration file.

  • PIPY_SPAWN=n sets the number of Pipy instances to start. The value is zero-based, so 0 means one instance; for example, use PIPY_SPAWN=3 for four instances.

$ docker run --rm -e PIPY_CONFIG_FILE=\
https://raw.githubusercontent.com/flomesh-io/pipy/main/tutorial/01-hello/hello.js \
-e PIPY_SPAWN=1 -p 8080:8080 flomesh/pipy-pjs:latest

This starts the Pipy server with the provided script. Keen readers may have noticed that instead of a local file, we passed the URL of a Pipy script in the repository via the PIPY_CONFIG_FILE environment variable; Pipy is smart enough to handle such cases. The script looks like this:

pipy()

.listen(8080)
  .serveHTTP(
    new Message('Hi, there!\n')
  )


This script defines a single port pipeline that listens on port 8080 and responds with "Hi, there!" to every HTTP request it receives.

Since the docker run command above exposes local port 8080, we can test it on that port:

$ curl http://localhost:8080

Executing the above command should print Hi, there! to the console.

For learning, development, or debugging purposes, a local installation of Pipy is recommended (either build it from source or download a pre-built release for your OS), as it comes with an admin web console along with documentation and tutorials.

Once installed locally, running pipy without any arguments starts the admin console on port 6060; it can be configured to listen on a different port via the --admin-port= option.

Pipy admin console running on port 6060

To build Pipy from source or to install a pre-compiled binary for your operating system, please refer to the Getting Started document in the Pipy GitHub repository.

Run via CLI

To start a Pipy proxy, run pipy with a PipyJS script file, for example, tutorial/01-hello/hello.js, the simple server shown above that responds with Hi, there! to every incoming request:

$ pipy tutorial/01-hello/hello.js

Alternatively, while developing and debugging, you can start Pipy with its built-in web UI:

$ pipy tutorial/01-hello/hello.js --admin-port=6060

Show Command-Line Options

$ pipy --help

Writing Network Proxy

Suppose we are running separate instances of different services, and we would like to add a proxy that forwards traffic to the relevant service based on the request URL path. This lets us expose a single URL and scale our services in the backend without users having to remember each service's URL. Normally, your services would run on different nodes, and each service could have multiple instances. In this example, though, we assume the services below are running, and we want to distribute traffic to them based on the URI.

| Service | URI | Host:Port |
| --- | --- | --- |
| service-hi | /hi/* | 127.0.0.1:8080, 127.0.0.1:8082 |
| service-echo | /echo | 127.0.0.1:8081 |
| service-tell-ip | /ip/* | 127.0.0.1:8082 |
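The routing decision in this table amounts to matching the request path against a set of patterns. The following plain-JavaScript sketch is a hypothetical stand-in for that lookup, not Pipy's algo.URLRouter: it ignores the host, checks patterns in insertion order, and assumes that a trailing /* matches both the bare prefix and anything beneath it, which is consistent with the later /hi request in this article reaching the balancer instead of the 404 fallback.

```javascript
// Hypothetical prefix router (NOT algo.URLRouter). Assumed matching rules:
//   "/echo" matches only the path "/echo"
//   "/hi/*" matches "/hi" and anything under "/hi/"
// The host name is ignored, and the first matching pattern wins.
function makeRouter(routes) {
  const entries = Object.entries(routes);
  return path => {
    for (const [pattern, service] of entries) {
      if (pattern.endsWith('/*')) {
        const prefix = pattern.slice(0, -2);
        if (path === prefix || path.startsWith(prefix + '/')) return service;
      } else if (path === pattern) {
        return service;
      }
    }
    return undefined; // no route: the request falls through to the default handler
  };
}

const find = makeRouter({
  '/hi/*': 'service-hi',
  '/echo': 'service-echo',
  '/ip/*': 'service-tell-ip',
});

console.log(find('/hi'), find('/hi/there'), find('/echo'), find('/ip/x'), find('/'));
// service-hi service-hi service-echo service-tell-ip undefined
```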

Pipy scripts are written in JavaScript, and you can use any text editor of your choice to edit them. Alternatively, if you have installed Pipy locally, you can use the Pipy admin web UI, which comes with syntax highlighting, autocompletion, and hints, and lets you run scripts, all from the same console.

So, let's start a Pipy instance without any arguments, which starts the Pipy admin console on port 6060. Now open your favorite web browser and navigate to http://localhost:6060. You will see the built-in Pipy Administration Web UI (Figure 1).

Create a Pipy program

A good design practice is to keep code and configuration separate. Pipy supports such a modular design via plugins, which you can think of as JavaScript modules. Accordingly, we will store our configuration data under the config folder and our logic in separate files under the plugins folder. The main proxy script (proxy.js) will live in the root folder and will include and combine the functionality defined in the separate modules. Once we are done with the steps below, our final folder structure will look like this:

├── config
│   ├── balancer.json
│   ├── proxy.json
│   └── router.json
├── plugins
│   ├── balancer.js
│   ├── default.js
│   └── router.js
└── proxy.js

So let's start:

  1. Click New Codebase, enter /proxy (or any name you would like to give to your code base) for the Code-base name in the dialog and then click Create. You will be brought to a code editor for the newly created code-base.
  2. Click the + button up above to add a new file. Enter /config/proxy.json (this is the configuration file which we will use to configure our proxy) for its filename and then click Create.
  3. You will now see proxy.json listed under the config folder in the left pane. Click the file to open it, add the configuration shown below, and make sure you save the file by clicking the disk icon on the top panel.

    {
      "listen": 8000,
      "plugins": [
        "plugins/router.js",
        "plugins/balancer.js",
        "plugins/default.js"
      ]
    }
    
  4. Repeat steps 2 and 3 to create another file, /config/router.json, which will store route information, with the following configuration data:

    {
      "routes": {
        "/hi/*": "service-hi",
        "/echo": "service-echo",
        "/ip/*": "service-tell-ip"
      }
    }
    
  5. Repeat steps 2 and 3 to create another file, /config/balancer.json, which will store our service-to-target map, with the following data:

    {
      "services": {
        "service-hi"      : ["127.0.0.1:8080", "127.0.0.1:8082"],
        "service-echo"    : ["127.0.0.1:8081"],
        "service-tell-ip" : ["127.0.0.1:8082"]
      }
    }
    
  6. Let's write our very first Pipy script, which will serve as a default fallback when we receive a request for which no target (endpoint/URL) is configured. Repeat the above steps to create the file /plugins/default.js. The name here is just a convention; Pipy doesn't rely on names, so you can choose any name you like. The script contains the code shown below, which returns HTTP status code 404 with the message No handler found:

    pipy()
    
    .pipeline('request')
    .replaceMessage(
        new Message({ status: 404 }, 'No handler found')
    )
    
  7. Create the file /plugins/router.js, which stores our routing logic:

    (config =>
    
    pipy({
    _router: new algo.URLRouter(config.routes),
    })
    
    .export('router', {
    __serviceID: '',
    })
    
    .pipeline('request')
    .handleMessageStart(
        msg => (
        __serviceID = _router.find(
            msg.head.headers.host,
            msg.head.path,
        )
        )
    )
    
    )(JSON.decode(pipy.load('config/router.json')))
    
  8. Create the file /plugins/balancer.js, which stores our load-balancing logic. As a side note, Pipy comes with multiple load-balancing algorithms, but for simplicity's sake we will use the round-robin algorithm here.

    (config =>
    
    pipy({
    _services: (
        Object.fromEntries(
        Object.entries(config.services).map(
            ([k, v]) => [
            k, new algo.RoundRobinLoadBalancer(v)
            ]
        )
        )
    ),
    
    _balancer: null,
    _balancerCache: null,
    _target: '',
    })
    
    .import({
    __turnDown: 'proxy',
    __serviceID: 'router',
    })
    
    .pipeline('session')
    .handleStreamStart(
        () => (
        _balancerCache = new algo.Cache(
            // k is a balancer, v is a target
            (k  ) => k.select(),
            (k,v) => k.deselect(v),
        )
        )
    )
    .handleStreamEnd(
        () => (
        _balancerCache.clear()
        )
    )
    
    .pipeline('request')
    .handleMessageStart(
        () => (
        _balancer = _services[__serviceID],
        _balancer && (_target = _balancerCache.get(_balancer)),
        _target && (__turnDown = true)
        )
    )
    .link(
        'forward', () => Boolean(_target),
        ''
    )
    
    .pipeline('forward')
    .muxHTTP(
        'connection',
        () => _target
    )
    
    .pipeline('connection')
    .connect(
        () => _target
    )
    
    )(JSON.decode(pipy.load('config/balancer.json')))
    
  9. Now let's write the entry point, i.e., the proxy server script that uses the above plugins. Creating a new code base (step 1) also created a default main.js file as an entry point. We could use that as our main entry point, or, if you prefer a different name, delete main.js and create a new file with the name of your choice. Let's delete it and create a new file named /proxy.js. Make sure you click the flag icon at the top to make it the main entry point; this ensures script execution starts from it when you hit the run button (the arrow icon on the right).

    (config =>
    
    pipy()
    
    .export('proxy', {
    __turnDown: false,
    })
    
    .listen(config.listen)
    .use(config.plugins, 'session')
    .demuxHTTP('request')
    
    .pipeline('request')
    .use(
        config.plugins,
        'request',
        'response',
        () => __turnDown
    )
    
    )(JSON.decode(pipy.load('config/proxy.json')))
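
As we read balancer.js, each client connection gets its own cache of selected targets: the first request for a service on a connection picks the next target in round-robin order, later requests on that connection reuse it, and the targets are released when the connection ends. The sketch below models that behavior in plain JavaScript; RoundRobin and SessionCache are hypothetical stand-ins for algo.RoundRobinLoadBalancer and algo.Cache, not their actual implementations.

```javascript
// Hypothetical round-robin balancer (NOT algo.RoundRobinLoadBalancer).
class RoundRobin {
  constructor(targets) {
    this.targets = targets;
    this.cursor = 0;
    this.inUse = new Map(); // target -> number of connections currently using it
  }
  select() {
    const target = this.targets[this.cursor];
    this.cursor = (this.cursor + 1) % this.targets.length;
    this.inUse.set(target, (this.inUse.get(target) || 0) + 1);
    return target;
  }
  deselect(target) {
    this.inUse.set(target, this.inUse.get(target) - 1);
  }
}

// Hypothetical per-connection cache (NOT algo.Cache): a miss calls select(),
// and clear() hands every cached target back via deselect().
class SessionCache {
  constructor() { this.selected = new Map(); } // balancer -> chosen target
  get(balancer) {
    if (!this.selected.has(balancer)) this.selected.set(balancer, balancer.select());
    return this.selected.get(balancer);
  }
  clear() {
    for (const [balancer, target] of this.selected) balancer.deselect(target);
    this.selected.clear();
  }
}

// Build one balancer per service, as balancer.js does from balancer.json.
const config = { services: {
  'service-hi': ['127.0.0.1:8080', '127.0.0.1:8082'],
  'service-echo': ['127.0.0.1:8081'],
  'service-tell-ip': ['127.0.0.1:8082'],
} };
const services = Object.fromEntries(
  Object.entries(config.services).map(([k, v]) => [k, new RoundRobin(v)])
);

// Two connections, each sending two requests for service-hi.
const hi = services['service-hi'];
const conn1 = new SessionCache();
const conn2 = new SessionCache();
const picks = [conn1.get(hi), conn1.get(hi), conn2.get(hi), conn2.get(hi)];
console.log(picks.join(' '));
// 127.0.0.1:8080 127.0.0.1:8080 127.0.0.1:8082 127.0.0.1:8082

conn1.clear(); // connection 1 closes and releases its target
console.log(hi.inUse.get('127.0.0.1:8080'), hi.inUse.get('127.0.0.1:8082')); // 0 1
```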
    

If you have followed the steps above, you will have something similar to the screenshot below:

Pipy Web UI

Now let's run our script by clicking the play button (fourth icon from the right). If there are no mistakes in our scripts, Pipy will run the proxy script and we will see output like this:

Pipy Web UI

With the proxy running, let's send it a request:

$ curl -i http://localhost:8000

HTTP/1.1 404 Not Found
content-length: 16
connection: keep-alive

No handler found

That makes sense, as we haven't configured any route for the root path /. Let's try one of our configured routes, e.g., /hi:

$ curl -i http://localhost:8000/hi

HTTP/1.1 502 Connection Refused
content-length: 0
connection: keep-alive

We got 502 Connection Refused because no service is running on the configured target port yet.
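Why does / return 404 while /hi returns 502? As we read proxy.js, the request pipelines of the plugins listed in proxy.json run in order (router, balancer, default), and the chain stops at the first plugin that sets __turnDown. For /, the router finds no service, so the balancer never sets __turnDown and default.js answers with 404; for /hi, the balancer selects a target and turns the request down, so default.js is skipped and the 502 comes from the failed connection to the target. The plain-JavaScript sketch below models only this control flow; handle, ctx, and the simplified plugins are hypothetical, not Pipy's use filter.

```javascript
// Simplified stand-ins for router.json and balancer.json (exact-match routing only).
const routes = { '/hi': 'service-hi' };
const services = { 'service-hi': ['127.0.0.1:8080'] };

// Each "plugin" reads and writes a shared per-request context object.
const router = ctx => { ctx.serviceID = routes[ctx.path] || ''; };
const balancer = ctx => {
  const targets = services[ctx.serviceID];
  if (targets) {
    ctx.response = `forward to ${targets[0]}`; // the real proxy would connect here
    ctx.turnDown = true;                       // the request is handled: stop the chain
  }
};
const fallback = ctx => { ctx.response = '404 No handler found'; };

// Run the plugins in order until one of them turns the request down.
function handle(path, plugins = [router, balancer, fallback]) {
  const ctx = { path, serviceID: '', turnDown: false, response: null };
  for (const plugin of plugins) {
    plugin(ctx);
    if (ctx.turnDown) break; // later plugins never see this request
  }
  return ctx.response;
}

console.log(handle('/'));   // 404 No handler found
console.log(handle('/hi')); // forward to 127.0.0.1:8080
```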

You can update /config/balancer.json with the hosts and ports of your own running services to fit your use case, or we can write a Pipy script that listens on our configured ports and returns simple messages.

Save the snippet below to a file on your local computer named mock-proxy.js and remember the location where you stored it.

pipy()

.listen(8080)
  .serveHTTP(
    new Message('Hi, there!\n')
  )

.listen(8081)
  .serveHTTP(
    msg => new Message(msg.body)
  )

.listen(8082)
  .serveHTTP(
    msg => new Message(
      `You are requesting ${msg.head.path} from ${__inbound.remoteAddress}\n`
    )
  )

Open a new terminal window and run this script with Pipy (where /path/to refers to the location where you stored the script file):

$ pipy /path/to/mock-proxy.js
2022-02-15 18:56:31 [INF] [config]
2022-02-15 18:56:31 [INF] [config] Module /mock-proxy.js
2022-02-15 18:56:31 [INF] [config] ================
2022-02-15 18:56:31 [INF] [config]
2022-02-15 18:56:31 [INF] [config]  [Listen on :::8080]
2022-02-15 18:56:31 [INF] [config]  ----->|
2022-02-15 18:56:31 [INF] [config]        |
2022-02-15 18:56:31 [INF] [config]       serveHTTP
2022-02-15 18:56:31 [INF] [config]        |
2022-02-15 18:56:31 [INF] [config]  <-----|
2022-02-15 18:56:31 [INF] [config]  
2022-02-15 18:56:31 [INF] [config]  [Listen on :::8081]
2022-02-15 18:56:31 [INF] [config]  ----->|
2022-02-15 18:56:31 [INF] [config]        |
2022-02-15 18:56:31 [INF] [config]       serveHTTP
2022-02-15 18:56:31 [INF] [config]        |
2022-02-15 18:56:31 [INF] [config]  <-----|
2022-02-15 18:56:31 [INF] [config]  
2022-02-15 18:56:31 [INF] [config]  [Listen on :::8082]
2022-02-15 18:56:31 [INF] [config]  ----->|
2022-02-15 18:56:31 [INF] [config]        |
2022-02-15 18:56:31 [INF] [config]       serveHTTP
2022-02-15 18:56:31 [INF] [config]        |
2022-02-15 18:56:31 [INF] [config]  <-----|
2022-02-15 18:56:31 [INF] [config]  
2022-02-15 18:56:31 [INF] [listener] Listening on port 8080 at ::
2022-02-15 18:56:31 [INF] [listener] Listening on port 8081 at ::
2022-02-15 18:56:31 [INF] [listener] Listening on port 8082 at ::

Now our mock services are listening on ports 8080, 8081, and 8082. Let's test our proxy server again; this time you should see the responses returned by the mock services.

Summary

We have used a number of Pipy features, including variable declarations, importing/exporting variables, plugins, pipelines, sub-pipelines, filter chaining, Pipy filters like handleMessageStart, handleStreamStart, and link, and Pipy classes like JSON, algo.URLRouter, algo.RoundRobinLoadBalancer, and algo.Cache. A thorough explanation of all these concepts is beyond the scope of this article, but you are encouraged to read the Pipy documentation, which is accessible via Pipy's admin web UI, and follow the step-by-step tutorials that come with it.

Key Takeaways

  • Pipy is an open-source, lightweight, high-performance, cloud-native network traffic processor.
  • Due to its modular design, performance, low memory footprint, and network support, Pipy is suitable for a multitude of use cases such as edge routers, load balancers, proxying solutions, API gateways, and service mesh sidecars.
  • Pipy comes with a small, embeddable JavaScript engine built in, and its support for expressions, functions, and contexts gives you full control over the logic for handling streams in a way specific to your needs.
  • It has low overhead: with very little code and configuration, we can have full-fledged network services up and running.
  • Pipy is multi-platform and runs on almost any platform supported by C/C++ compilers like Clang or GCC.
  • Pipy is battle-tested and in use by many commercial clients in industries such as finance, insurance, automotive, and retail. Pipy is under active development and has a dedicated full-time team of committers.

Conclusion

Pipy is an open-source, extremely fast, and lightweight network traffic processor that can be used in a variety of use cases, ranging from edge routers, load balancing and proxying (forward/reverse), API gateways, and static HTTP servers to service mesh sidecars and many other applications. Pipy is under active development and maintained by full-time committers and contributors; though still an early version, it has been battle-tested and is in production use by several commercial clients.

This article provided a very brief overview and a high-level introduction to Pipy. Step-by-step tutorials and documentation can be found on its GitHub page or accessed via Pipy admin console web UI. The community is welcome to contribute to Pipy development, give it a try for their use-case, or provide their feedback and insights.
