[![](https://travis-ci.org/kaelzhang/stock-pandas.svg?branch=master)](https://travis-ci.org/kaelzhang/stock-pandas)
[![](https://codecov.io/gh/kaelzhang/stock-pandas/branch/master/graph/badge.svg)](https://codecov.io/gh/kaelzhang/stock-pandas)
[![](https://img.shields.io/pypi/v/stock-pandas.svg)](https://pypi.org/project/stock-pandas/)
[![](https://img.shields.io/pypi/l/stock-pandas.svg)](https://github.com/kaelzhang/stock-pandas)

[ma]: #ma-simple-moving-averages
[ema]: #ema-exponential-moving-average
[macd]: #macd-moving-average-convergence-divergence
[boll]: #boll-bollinger-bands
[rsv]: #rsv-raw-stochastic-value
[kdj]: #kdj-a-variety-of-stochastic-oscillator
[kdjc]: #kdjc-another-variety-of-stochastic-oscillator
[rsi]: #rsi-relative-strength-index
[bbi]: #bbi-bull-and-bear-index
[llv]: #llv-lowest-of-low-values
[hhv]: #hhv-highest-of-high-values
[column]: #column
[increase]: #increase
[style]: #style
[repeat]: #repeat
[change]: #change
[cumulation]: #cumulation-and-datetimeindex
[datetimeindex]: https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html

# [stock-pandas](https://github.com/kaelzhang/stock-pandas)

**stock-pandas** inherits and extends `pandas.DataFrame` to support:
- Stock Statistics
- Stock Indicators, including:
  - Trend-following momentum indicators, such as [**MA**][ma], [**EMA**][ema], [**MACD**][macd], [**BBI**][bbi]
  - Dynamic support and resistance indicators, such as [**BOLL**][boll]
  - Over-bought / over-sold indicators, such as [**KDJ**][kdj], [**RSI**][rsi]
  - Other indicators, such as [**LLV**][llv], [**HHV**][hhv]
  - For more indicators, feel free to [request a proposal](https://github.com/kaelzhang/stock-pandas/issues/new?assignees=&labels=feature&template=FEATURE_REQUEST.md&title=), or fork and send me a pull request, or extend stock-pandas yourself. You might read the [Advanced Sections](https://github.com/kaelzhang/stock-pandas#advanced-sections) below.
- To [cumulate][cumulation] kline data based on a given time frame, so that it can easily handle real-time data updates.

`stock-pandas` makes automated trading much easier. `stock-pandas` requires Python >= **3.6** and Pandas >= **1.0.0** (for now).

With the help of `stock-pandas` and mplfinance, we can easily draw something like:

![](boll.png)

The code example is available [here](https://github.com/kaelzhang/stock-pandas-examples/blob/master/example/bollinger_bands.ipynb).

## Install

For now, before installing `stock-pandas` in your environment

#### Have `g++` compiler installed

```sh
# With yum, for CentOS, Amazon Linux, etc
yum install gcc-c++

# With apt-get, for Ubuntu
apt-get install g++

# For macOS, install XCode commandline tools
xcode-select --install
```

If you use docker with `Dockerfile` and use python image,

```Dockerfile
FROM python:3.8

...
```

The default `python:3.8` image already contains g++, so we do not need to install g++ additionally.

#### Install `stock-pandas`

```sh
# Installing `stock-pandas` requires `numpy` to be installed first
pip install numpy

pip install stock-pandas
```

Be careful, you still need to install `numpy` explicitly even if `numpy` and `stock-pandas` are both listed in `requirement.txt`

```txt
numpy
stock-pandas
other-dependencies
...
```

```sh
pip install numpy
pip install -r requirement.txt
```

## Usage

```py
from stock_pandas import StockDataFrame

# or
import stock_pandas as spd
```

We also have some examples with annotations in the [`example`](https://github.com/kaelzhang/stock-pandas/tree/master/example) directory; you can use [JupyterLab](https://jupyter.org/) or Jupyter notebook to play with them.

### StockDataFrame

`StockDataFrame` inherits from `pandas.DataFrame`, so if you are familiar with `pandas.DataFrame`, you are already ready to use `stock-pandas`

```py
import pandas as pd
stock = StockDataFrame(pd.read_csv('stock.csv'))
```

As we know, we could use `[]`, which is called **pandas indexing** (a.k.a.
`__getitem__` in python) to select out lower-dimensional slices. In addition to indexing with `colname` (column name of the `DataFrame`), we could also do indexing by `directive`s.

```py
stock[directive] # Gets a pandas.Series

stock[[directive0, directive1]] # Gets a StockDataFrame
```

We have an example to show the most basic indexing using `[directive]`

```py
stock = StockDataFrame({
    'open' : ...,
    'high' : ...,
    'low'  : ...,
    'close': [5, 6, 7, 8, 9]
})

stock['ma:2']

# 0    NaN
# 1    5.5
# 2    6.5
# 3    7.5
# 4    8.5
# Name: ma:2,close, dtype: float64
```

Which prints the 2-period simple moving average on column `"close"`.

#### Parameters

- **date_col** `Optional[str] = None` If set, then the column named `date_col` will be converted and set as the [`DatetimeIndex`][datetimeindex] of the data frame
- **to_datetime_kwargs** `dict = {}` the keyword arguments to be passed to `pandas.to_datetime()`. It only takes effect if `date_col` is specified.
- **time_frame** `str | TimeFrame | None = None` time frame of the stock. For now, only the following time frames are supported:
    - `'1m'` or `TimeFrame.M1`
    - `'3m'` or `TimeFrame.M3`
    - `'5m'` or `TimeFrame.M5`
    - `'15m'` or `TimeFrame.M15`
    - `'30m'` or `TimeFrame.M30`
    - `'1h'` or `TimeFrame.H1`
    - `'2h'` or `TimeFrame.H2`
    - `'4h'` or `TimeFrame.H4`
    - `'6h'` or `TimeFrame.H6`
    - `'8h'` or `TimeFrame.H8`
    - `'12h'` or `TimeFrame.H12`

### stock.exec(directive: str, create_column: bool=False) -> np.ndarray

Executes the given directive and returns a numpy ndarray according to the directive.
```py
stock['ma:5'] # returns a Series

stock.exec('ma:5', create_column=True) # returns a numpy ndarray
```

```py
# This will only calculate without creating a new column in the dataframe
stock.exec('ma:20')
```

The difference between `stock[directive]` and `stock.exec(directive)` is that
- the former will create a new column for the result of `directive` as a cache for later use, while `stock.exec(directive)` does not unless we pass the parameter `create_column` as `True`
- the former accepts other pandas indexing targets, while `stock.exec(directive)` only accepts a valid **stock-pandas** directive string
- the former returns a `pandas.Series` or `StockDataFrame` object while the latter returns an [`np.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html)

### stock.alias(alias: str, name: str) -> None

Defines a column alias or a directive alias

- **alias** `str` the alias name
- **name** `str` the name of an existing column or the directive string

```py
# Some plot libraries such as `mplfinance` require a capitalized column name `Open`,
# but it is ok, we could create an alias.
stock.alias('Open', 'open')

stock.alias('buy_point', 'kdj.j < 0')
```

### stock.get_column(key: str) -> pd.Series

Directly gets the column value by `key`, returns a pandas `Series`.

If the given `key` is an alias name, it will return the value of the corresponding original column.

If the column is not found, a `KeyError` will be raised.

```py
stock = StockDataFrame({
    'open' : ...,
    'high' : ...,
    'low'  : ...,
    'close': [5, 6, 7, 8, 9]
})

stock.get_column('close')
# 0    5
# 1    6
# 2    7
# 3    8
# 4    9
# Name: close, dtype: float64
```

```py
try:
    stock.get_column('Close')
except KeyError as e:
    print(e)

    # KeyError: column "Close" not found

stock.alias('Close', 'close')

stock.get_column('Close')
# The same as `stock.get_column('close')`
```

### stock.append(other, *args, **kwargs) -> StockDataFrame

Appends rows of `other` to the end of caller, returning a new object.
This method has nearly the same behavior as [`pandas.DataFrame.append()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html), but instead it returns an instance of `StockDataFrame`, and it applies `date_col` to the newly-appended row(s) if possible.

### stock.directive_stringify(directive: str) -> str

> Since 0.26.0

Gets the full name of the `directive`, which is also the actual column name of the data frame

```py
stock.directive_stringify('kdj.j')
# "kdj.j:9,3,3,50.0"
```

And also

```py
from stock_pandas import directive_stringify

directive_stringify('kdj.j')
# "kdj.j:9,3,3,50.0"
```

Actually, `directive_stringify` does not rely on StockDataFrame instances.

### stock.rolling_calc(size, on, apply, forward, fill) -> np.ndarray

> Since 0.27.0

Applies a 1-D function along the given column or directive `on`

- **size** `int` the size of the rolling window
- **on** `str | Directive` along which the function should be applied
- **apply** `Callable[[np.ndarray], Any]` the 1-D function to apply
- **forward?** `bool = False` whether to look forward to get each rolling window. By default (`False`), it looks backward.
- **fill?** `Any = np.nan` the value used to fill where there are not enough items to form a rolling window

```py
stock.rolling_calc(5, 'open', max)

# Whose return value is equal to
stock['hhv:5,open'].to_numpy()
```

### stock.cumulate() -> StockDataFrame

Cumulates the current data frame `stock` based on its time frame setting

```py
StockDataFrame(one_minute_kline_data_frame, time_frame='5m').cumulate()

# And you will get 5-minute kline data
```

See [Cumulation and DatetimeIndex][cumulation] for details

### stock.cum_append(other: DataFrame) -> StockDataFrame

Appends `other` to the end of the current data frame `stock` and applies cumulation to them.
And the following slice of code is equivalent to the above one:

```py
StockDataFrame(time_frame='5m').cum_append(one_minute_kline_data_frame)
```

See [Cumulation and DatetimeIndex][cumulation] for details

### stock.fulfill() -> self

> Since 1.2.0

Fulfills all stock indicator columns. By default, adding new rows to a `StockDataFrame` will not update stock indicators of the new row.

Stock indicators will only be updated when accessing the stock indicator column or calling `stock.fulfill()`

Check the [test cases](https://github.com/kaelzhang/stock-pandas/blob/master/test/test_fulfill.py) for details

### directive_stringify(directive_str) -> str

> Since 0.30.0

Similar to `stock.directive_stringify()` but can be called without class initialization

```py
from stock_pandas import directive_stringify

directive_stringify('boll')
# boll:21,close
```

## Cumulation and DatetimeIndex

Suppose we have a csv file containing kline data of a stock in 1-minute time frame

```py
csv = pd.read_csv(csv_path)

print(csv)
```

```
                   date   open   high    low  close    volume
0   2020-01-01 00:00:00  329.4  331.6  327.6  328.8  14202519
1   2020-01-01 00:01:00  330.0  332.0  328.0  331.0  13953191
2   2020-01-01 00:02:00  332.8  332.8  328.4  331.0  10339120
3   2020-01-01 00:03:00  332.0  334.2  330.2  331.0   9904468
4   2020-01-01 00:04:00  329.6  330.2  324.9  324.9  13947162
5   2020-01-01 00:04:00  329.6  330.2  324.8  324.8  13947163 <- There is an update of 2020-01-01 00:04:00
...
16  2020-01-01 00:16:00  333.2  334.8  331.2  334.0  12428539
17  2020-01-01 00:17:00  333.0  333.6  326.8  333.6  15533405
18  2020-01-01 00:18:00  335.0  335.2  326.2  327.2  16655874
19  2020-01-01 00:19:00  327.0  327.2  322.0  323.0  15086985
```

> Note that duplicated records of the same timestamp will not be cumulated. All records except the latest one will be discarded.
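
The rule in the note above can be illustrated with plain pandas. This is only a sketch of the described behavior (keep the latest record per timestamp), not the actual implementation inside `stock-pandas`; the `raw` frame is a hypothetical excerpt built from the rows shown above.

```python
import pandas as pd

# Hypothetical excerpt of 1-minute records: 00:04:00 arrives twice,
# the second record being a real-time update of the first one
raw = pd.DataFrame({
    'date': [
        '2020-01-01 00:03:00',
        '2020-01-01 00:04:00',
        '2020-01-01 00:04:00',
    ],
    'close': [331.0, 324.9, 324.8],
    'volume': [9904468, 13947162, 13947163],
})

# Keep only the latest record of each timestamp, discard the earlier ones
latest = raw.drop_duplicates(subset='date', keep='last').reset_index(drop=True)

print(latest)
```

Only the updated `00:04:00` record survives, which is why its volume (`13947163`) is the one that contributes to the cumulated result.
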
```py
stock = StockDataFrame(
    csv,
    date_col='date',
    # Which is equivalent to `time_frame=TimeFrame.M5`
    time_frame='5m'
)

print(stock)
```

```
                      open   high    low  close    volume
2020-01-01 00:00:00  329.4  331.6  327.6  328.8  14202519
2020-01-01 00:01:00  330.0  332.0  328.0  331.0  13953191
2020-01-01 00:02:00  332.8  332.8  328.4  331.0  10339120
2020-01-01 00:03:00  332.0  334.2  330.2  331.0   9904468
2020-01-01 00:04:00  329.6  330.2  324.9  324.9  13947162
2020-01-01 00:04:00  329.6  330.2  324.8  324.8  13947163
...
2020-01-01 00:16:00  333.2  334.8  331.2  334.0  12428539
2020-01-01 00:17:00  333.0  333.6  326.8  333.6  15533405
2020-01-01 00:18:00  335.0  335.2  326.2  327.2  16655874
2020-01-01 00:19:00  327.0  327.2  322.0  323.0  15086985
```

You must have figured out that the data frame now has a [`DatetimeIndex`][datetimeindex].

But it will not become 5-minute kline data unless we cumulate it. Also note that `stock.cum_append(them)` only cumulates the new frames `them`.

```py
stock_5m = stock.cumulate()

print(stock_5m)
```

Now we get a 5-minute kline

```
                      open   high    low  close      volume
2020-01-01 00:00:00  329.4  334.2  324.8  324.8  62346461.0
2020-01-01 00:05:00  325.0  327.8  316.2  322.0  82176419.0
2020-01-01 00:10:00  323.0  327.8  314.6  327.6  74409815.0
2020-01-01 00:15:00  330.0  335.2  322.0  323.0  82452902.0
```

For more details and about how to get full control of everything, check the online Google Colab notebook here.
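
To make the idea of cumulation concrete, here is a minimal sketch in plain pandas that aggregates 1-minute klines into 5-minute klines with `DataFrame.resample`. It is an illustration under assumptions, not how `stock-pandas` implements `cumulate()`: it assumes duplicated timestamps have already been reduced to their latest record, the first five rows are taken from the example above, and the last five rows are made-up values.

```python
import pandas as pd

# Ten 1-minute records starting at 2020-01-01 00:00:00.
# Rows 0-4 come from the example above (with the updated 00:04:00 record),
# rows 5-9 are made-up values for illustration only.
index = pd.date_range('2020-01-01 00:00:00', periods=10, freq='min')

one_minute = pd.DataFrame({
    'open':   [329.4, 330.0, 332.8, 332.0, 329.6, 325.0, 326.0, 324.0, 323.0, 322.5],
    'high':   [331.6, 332.0, 332.8, 334.2, 330.2, 327.8, 327.0, 325.0, 324.0, 323.0],
    'low':    [327.6, 328.0, 328.4, 330.2, 324.8, 324.0, 323.0, 316.2, 320.0, 321.0],
    'close':  [328.8, 331.0, 331.0, 331.0, 324.8, 326.0, 324.0, 323.0, 322.5, 322.0],
    'volume': [14202519, 13953191, 10339120, 9904468, 13947163,
               100, 200, 300, 400, 500],
}, index=index)

# A 5-minute kline takes the first open, the highest high, the lowest low,
# the last close and the total volume of the 1-minute klines it covers
five_minute = one_minute.resample('5min').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
})

print(five_minute)
```

The first aggregated row reproduces the `2020-01-01 00:00:00` row of the cumulated output above: open `329.4`, high `334.2`, low `324.8`, close `324.8` and volume `62346461`.
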
## Syntax of `directive`

```ebnf
directive := command | command operator expression
operator := '/' | '\' | '><' | '<' | '<=' | '==' | '>=' | '>'
expression := float | command

command := command_name | command_name : arguments
command_name := main_command_name | main_command_name.sub_command_name
main_command_name := alphabets
sub_command_name := alphabets

arguments := argument | argument , arguments
argument := empty_string | string | ( directive )
```

#### `directive` Example

Here are several use cases of column names

```py
# The middle band of bollinger bands
# which is actually a 20-period (default) moving average
stock['boll']

# kdj j less than 0
# This returns a series of bool type
stock['kdj.j < 0']

# kdj %K cross up kdj %D
stock['kdj.k / kdj.d']

# 5-period simple moving average
stock['ma:5']

# 10-period simple moving average on open prices
stock['ma:10,open']

# Dataframe of 5-period, 10-period, 30-period ma
stock[[
    'ma:5',
    'ma:10',
    'ma:30'
]]

# Which means we use the default values of the first and the second parameters,
# and specify the third parameter
stock['macd:,,10']

# We must wrap a parameter which is a nested command or directive
stock['increase:(ma:20,close),3']

# stock-pandas has a powerful directive parser,
# so we could even write directives like this:
stock['''
repeat
    :
        (
            column:close > boll.upper
        ),
        5
''']
```

## Built-in Commands of Indicators

Document syntax explanation:

- **param0** `int` which means `param0` is a required parameter of type `int`.
- **param1?** `str='close'` which means parameter `param1` is optional with default value `'close'`.

Actually, all parameters of a command are of string type, so the `int` here means an integer-like string.

### `ma`, simple Moving Averages

```
ma:<period>,<column>
```

Gets the `period`-period simple moving average on column named `column`.

`SMA` is often confused between simple moving average and smoothed moving average.
So `stock-pandas` will use `ma` for simple moving average and `smma` for smoothed moving average.

- **period** `int` (required)
- **column?** `enum<'open'|'high'|'low'|'close'>='close'` Which column the calculation should be based on. Defaults to `'close'`

```py
# which is equivalent to `stock['ma:5,close']`
stock['ma:5']

stock['ma:10,open']
```

### `ema`, Exponential Moving Average

```
ema:<period>,<column>
```

Gets the Exponential Moving Average, also known as the Exponential Weighted Moving Average.

The arguments of this command are the same as `ma`.

### `macd`, Moving Average Convergence Divergence

```
macd:<fast_period>,<slow_period>
macd.signal:<fast_period>,<slow_period>,<signal_period>
macd.histogram:<fast_period>,<slow_period>,<signal_period>
```

- **fast_period?** `int=12` fast period (short period). Defaults to `12`.
- **slow_period?** `int=26` slow period (long period). Defaults to `26`
- **signal_period?** `int=9` signal period. Defaults to `9`

```py
# macd
stock['macd']
stock['macd.dif']

# macd signal band, which is a shortcut for stock['macd.signal']
stock['macd.s']
stock['macd.signal']
stock['macd.dea']

# macd histogram band, which is equivalent to stock['macd.h']
stock['macd.histogram']
stock['macd.h']
stock['macd.macd']
```

### `boll`, BOLLinger bands

```
boll:<period>,<column>
boll.upper:<period>,<times>,<column>
boll.lower:<period>,<times>,<column>
```

- **period?** `int=20`
- **times?** `float=2.`
- **column?** `str='close'`

```py
# boll
stock['boll']

# bollinger upper band, a shortcut for stock['boll.upper']
stock['boll.u']
stock['boll.upper']

# bollinger lower band, which is equivalent to stock['boll.l']
stock['boll.lower']
stock['boll.l']
```

### `rsv`, Raw Stochastic Value

```
rsv:<period>
```

Calculates the raw stochastic value which is often used to calculate KDJ

### `kdj`, a variety of stochastic oscillator

The variety of [Stochastic Oscillator](https://en.wikipedia.org/wiki/Stochastic_oscillator) indicator created by [Dr.
George Lane](https://en.wikipedia.org/wiki/George_Lane_(technical_analyst)), which follows the formula:

```
RSV = rsv(period_rsv)
%K = ema(RSV, period_k)
%D = ema(%K, period_d)
%J = 3 * %K - 2 * %D
```

And the `ema` here is the exponential weighted moving average with initial value as `init_value`.

PAY ATTENTION that the calculation formula is different from the one on Wikipedia, but it is much more popular and more widely used in the industry.

**Directive Arguments**:

```
kdj.k:<period_rsv>,<period_k>,<init_value>
kdj.d:<period_rsv>,<period_k>,<period_d>,<init_value>
kdj.j:<period_rsv>,<period_k>,<period_d>,<init_value>
```

- **period_rsv?** `int=9` The period for calculating RSV, which is used for %K
- **period_k?** `int=3` The period for calculating the EMA of RSV, which is used for %K
- **period_d?** `int=3` The period for calculating the EMA of %K, which is used for %D
- **init_value?** `float=50.0` The initial value for calculating ema. Trading software from different companies usually uses different initial values, typically `0.0`, `50.0` or `100.0`.

```py
# The %D series of KDJ
stock['kdj.d']
# which is equivalent to
stock['kdj.d:9,3,3,50.0']

# The KDJ series with parameters 9, 9, and 9
stock[['kdj.k:9,9', 'kdj.d:9,9,9', 'kdj.j:9,9,9']]
```

### `kdjc`, another variety of stochastic oscillator

Unlike `kdj`, `kdjc` uses the **close** value instead of the high and low values to calculate `rsv`, which makes the indicator more sensitive than `kdj`

The arguments of `kdjc` are the same as `kdj`

### `rsi`, Relative Strength Index

```
rsi:<period>
```

Calculates the N-period RSI (Relative Strength Index)

- **period** `int` The period to calculate RSI. `period` should be an int which is larger than `1`

### `bbi`, Bull and Bear Index

```
bbi:<a>,<b>,<c>,<d>
```

Calculates indicator BBI (Bull and Bear Index) which is the average of `ma:3`, `ma:6`, `ma:12`, `ma:24` by default

- **a?** `int=3`
- **b?** `int=6`
- **c?** `int=12`
- **d?** `int=24`

### `llv`, Lowest of Low Values

```
llv:<period>,<column>
```

Gets the lowest of low prices in N periods

- **period** `int`
- **column?** `str='low'` Defaults to `'low'`.
But you could also get the lowest value of close prices

```py
# The 10-period lowest prices
stock['llv:10']

# The 10-period lowest close prices
stock['llv:10,close']
```

### `hhv`, Highest of High Values

```
hhv:<period>,<column>
```

Gets the highest of high prices in N periods. The arguments of `hhv` are the same as `llv`

## Built-in Commands for Statistics

### `column`

```
column:<name>
```

Just gets the series of a column. This command is designed to be used together with an operator to compare with another command or as a parameter of some statistics command.

- **name** `str` the name of the column

```py
# A bool-type series indicates whether the current price is higher than the upper bollinger band
stock['column:close > boll.upper']
```

### `increase`

```
increase:<on>,<repeat>,<direction>
```

Gets a `bool`-type series, each item of which is `True` if the value of indicator `on` has been increasing for the last `repeat` periods.

- **on** `str` the command name of an indicator on which the calculation should be based
- **repeat?** `int=1`
- **direction?** `1 | -1` the direction of "increase". `-1` means decreasing

For example:

```py
# Which means whether the `ma:20,close` line
# (a.k.a. 20-period simple moving average on column `'close'`)
# has been increasing repeatedly for 3 times (maybe 3 days)
stock['increase:(ma:20,close),3']

# If the close price has been decreasing repeatedly for 5 times (maybe 5 days)
stock['increase:close,5,-1']
```

### `style`

```
style: