DEV Community


Posted on

Unified Stream and Batch Processing of WorldQuant 101 Alphas in DolphinDB

Image description

In 2015, the formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. To conveniently calculate these 101 alphas in DolphinDB, you can use the functions encapsulated in the module wq101alpha (wq101alpha.dos).

This module has the following advantages over Python:

Better Performance: DolphinDB outperforms Python by a median of 15.5x. For 27.5% of the alphas, DolphinDB is more than 100 times faster than Python.
Unified Stream and Batch Processing: Functions defined in this module can be used for both stream and batch processing.
Simplicity: Most of the 101 alphas can be written with DolphinDB built-in functions. No need to develop new functions.

All scripts in this tutorial are compatible with DolphinDB V2.00.8, 1.30.20 or higher.

  1. Naming Conventions All function names in module wq101alpha start with “WQAlpha” followed by a number from 1 to 101. For examples, WQAlpha1,WQAlpha2, etc. The following is a list of parameters for alphas defined in the wq101alpha module. Each alpha may use a different set of parameters (see appendix).

Image description

Two types of alphas are defined in the wq101alpha module: alphas with and without industry information.

Image description

  1. Examples This chapter expounds how to calculate alphas with specific examples.

2.1 Environment Setup
Add the module file wq101alpha.dos to [home]/modules.

The [home] directory is specified by the configuration parameter home. (To check the value of home, use getHomeDir())

2.2 Data Preparation
You can simulate daily data with DailyDataSimulation and industry information with IndustryInfo.

Alternatively, if you already have tables of daily data and industry information, you need to perform an equal join to combine these two tables, and make sure that the column names are consistent with the parameters defined in the module.

If you need to change the column names, you can use function prepareData in module prepare101 (add prepare101.dos to [home]/modules). It converts the column names to the defined parameters.

rawData is a table containing non-industry information.
infoData is a table containing industry information.
startTime and endTime determine the start time and end time of data.
Other parameters are column names to be converted.
Import the wq101alpha module and load the data you prepared:

use wq101alpha
use prepare101
login('admin', '123456')
rawData = loadTable("dfs://k_day_level", "k_day")
infoData = select * from loadTable("dfs://info", "info_data")
startTime = timestamp(2010.01.01)
endTime = timestamp(2010.01.31)
data = prepareData(rawData=rawData, startTime=startTime, endTime=endTime, securityidName="securityid", tradetimeName="tradetime", openName="open", closeName="close", highName="high", lowName="low", volumeName="vol", vwapName="vwap", infoSecurityidName="securityid", capName="cap", indclassName="indclass", infoData=infoData)
Enter fullscreen mode Exit fullscreen mode

2.3 Calculating Alphas Without Industry Information
In the wq101alpha module, the calculation of alphas without industry information is generally conducted on two dimensions: time-series and cross-section. For these factors, you need to prepare a matrix as the input, and then call function WQAlpha#. Check Appendix for specific parameters.

For example, you can calculate alpha 1 and alpha 2 as follows:

use wq101alpha
input1 = exec close from data where tradetime between startTime : endTime pivot by tradetime, securityid
res1 = WQAlpha1(input1)

input2 = dict(`vol`close`open, panel(data.tradetime, data.securityid, [data.vol, data.close,]))
res2 = WQAlpha2(input2.vol, input2.close,

Enter fullscreen mode Exit fullscreen mode

We provide function prepare# and calAlpha# in the prepare101 module to save your time spent on specifying parameters.

Function prepare# prepares the parameters required for each alpha and function calAlpha# encapsulates function prepare# and wqAlpha#.

Take alpha 1 as an example:

def prepare1(data, startTime, endTime){
    p = exec close from data where tradetime between startTime : endTime pivot by tradetime, securityid
    return p

def calAlpha1(data, startTime, endTime){
    input = prepare1(data, startTime, endTime)
    return WQAlpha1(input)

//call the module
use prepare101

res = calAlpha1(data, startTime, endTime)
Enter fullscreen mode Exit fullscreen mode

In addition to matrices, parameters of function WQAlpha41, WQAlpha54, and WQAlpha101 can also be vectors.

For example, you can calculate alpha 101 using a SQL statement as follows:

use wq101alpha

res = select tradetime, securityid, `alpha101 as factorname, WQAlpha101(close, open, high, low) as val from data where tradetime between startTime : endTime

2.4 Calculating Alphas with Industry Information
To calculate alphas with industry information, you need to specify a table as the input.

Take alpha 48 as an example:

use wq101alpha

res = WQAlpha48(data)

You can also use function calAlpha# in prepare101 module.

def calAlpha48(data, startTime, endTime){
input = select * from data where tradetime between startTime : endTime
return WQAlpha48(input)

//call the module
use prepare101

res = calAlpha48(data, startTime, endTime)

The alpha calculation in the paper 101 Formulatic Alphas adopts several industry classifications, such as IndClass, subindustry, IndClass.industry, IndClass.sector. For the sake of convenience, only IndClass is used in this module.

Functions in the wq101alpha module return a matrix or a table. You can save your results to database if needed. Please refer to wq101alphaStorage.

  1. Performance Comparison Our testings show that the wq101alpha module of DolphinDB outperforms Python pandas and Numpy.

CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz

OS: 64-bit CentOS Linux 7 (Core)

We use the simulated daily data in a year to conduct performance testing (see TestData).

3.1 DolphinDB vs. Python Pandas
We compare the performance of alpha calculation implemented by DolphinDB module wq101alpha and Python pandas.

The following is the main script for performance testing of the wq101alpha module (see full script in wq101alphaDDBTime):

times = array(INT, 0)
for (i in 1:102){
if (i in passList) times.append!(NULL)
alphaName = exec name from defs() where name = "wq101alpha::WQAlpha"+string(i)
alphaSyntax = exec syntax from defs() where name = "wq101alpha::WQAlpha"+string(i)
function = alphaName + alphaSyntax
t1 = time(now())
res = parseExpr(function[0]).eval()
t2 = time(now())
times.append!(t2 - t1)

The following is the main script for performance testing of Python pandas (see full script in wq101alphaPyTime):

times = []

nofunc = [48, 56, 58, 59, 63, 67, 69, 70, 76, 79, 80, 82, 87, 89, 90, 91, 93, 97, 100]

for i in range(1, 102):
if i in nofunc:
times.append('no function')
factor = getattr(Alphas, "alpha{:03d}".format(i))
t1 = time.time()
res = factor(stock)
t2 = time.time()
times.append(t2 - t1)
except Exception:

The execution time of all 101 alphas can be found in PerformanceComparison.

Alphas that have not yet been implemented in Python pandas or whose results are questionable are excluded.

69 alphas are available for comparison (in millisecond):

Image description

Image description

The result shows that wq101alpha in DolphinDB outperforms python pandas. DolphinDB is faster than Python by a median of 15.5x. For 27.5% of the alphas, DolphinDB is more than 100 times faster than python.

3.2 DolphinDB vs. NumPy
Considering NumPy may have better performance than pandas, we choose 11 alphas that are time-consuming in pandas and implement them with NumPy. See partialAlphaNumpyTime for test results of NumPy.

Performance comparison of DolphinDB and NumPy:

Image description

We can see that while NumPy is faster than pandas, DolphinDB outperforms both.

DolphinDB has optimized the implementation of its window functions. In comparison, NumPy is not optimized for window calculations implemented by numpy.lib.stride_tricks.sliding_window_view.

  1. Stream Processing It is complex to implement most alphas in real time, which requires more than one stream engine. DolphinDB provides the streamEngineParser function to automatically form a pipeline of stream engines to carry out the specified metrics calculation. In streamEngineParser, you can directly call functions in module wq101alpha .

See full script in wq101alphaStreamTest for the implementation of real-time alpha calculations.

Take alpha 1 for example:

Define the schemata of input and output tables.

inputSchema = table(1:0, ["SecurityID","TradeTime","close"], [SYMBOL,TIMESTAMP,DOUBLE])
resultStream = table(10000:0, ["SecurityID","TradeTime", "factor"], [SYMBOL,TIMESTAMP, DOUBLE])

Call the wq101alpha module and use WQAlpha1 as the metrics for the streamEngineParser function.

use wq101alpha
metrics = <[WQAlpha1(close)]>
streamEngine = streamEngineParser(name="WQAlpha1Parser", metrics=metrics, dummyTable=inputSchemaT, outputTable=resultStream, keyColumn="SecurityID", timeColumn=
tradetime, triggeringPattern='perBatch', triggeringInterval=4000)
Check the status of the stream engines with function getStreamEngineStat().



name user status lastErrMsg numGroups ...
------------- ----------- ------ ---------- --------- ...
WQAlpha1Parser0 admin OK 0 0

WQAlpha1Parser2 admin OK 0 0

name user status lastErrMsg numRows ...
--------------- ----- ------ ---------- ------- ...
WQAlpha1Parser1 admin OK 0 2


Append data to stream engines and check the output table resultStream.

//check the result
res = exec factor from resultStream pivot by TradeTime, SecurityID

  1. Conclusion This tutorial introduces how to calculate 101 alphas with DolphinDB built-in functions in the wq101alpha module. This module features efficiency, speed, and simplicity, and achieves unified batch and stream processing.

Appendix: Required Parameters for Each Alpha
Alphas without industry information

Image description

Alphas with industry information

Image description

Top comments (0)