Mastering Financial Analysis of Value Investing with Python for Better Investment Decisions in Taiwan Exchange Market — (3)

Asset Allocation via Capital Asset Pricing Model (CAPM) & Efficient Frontier

AgriEnvCoder
11 min readMar 2, 2023

Investing Stock is a necessary task, after you realise that the money you deposit in bank is getting smaller and smaller due to inflation. Nonetheless, did I invest the right stock? How can I make ourselves feel at ease? I believe financial analyses can definitely help with that. Therefore, here we propose a framework of value investing and also build a practical work flow step-by-step. Stay tuned! Let’s dive in!

These analyses aim for effective long-term investment rather than swing trading.

  1. Choose worth investing companies at present
  2. Allocate personal asset into stock market.

This is the 3rd article in the series.

To have an introduction and practical exercise of DuPont analysis for single stock please visit here. (1st article)

To learn about how to conduct an objective Stock Selection Strategy please visit here. (2nd article)

Allocating Asset

From the 2nd article, we chose 5 stocks from ETF 0050 list which have consistently positive performance throughout years at the top 15. There are ‘9910’, ’3008', ’2330', ’2395', ’5871'.

As determining in several stocks in the market, the problem comes to: how should I allocate my limit asset in the exchange market.

“High Risk High Return”.

It is a rule of thumb we all understand. Some stocks can bring you big fortune, but accompany with low predictability, vice versa. “Spread risk through asset allocation,” namely “Don’t put all your eggs in one basket.” In exchange market, there are some models to help us analyse the risk-return profile and find the optimal asset allocation no matter how much money you have on hand.

CAPM:Capital Asset Pricing Model

Theory

CAPM, abbreviated from Capital Asset Pricing Model, is a mathematical model that estimates the risk-return profile related to the market and expected return by analyzing long-term historical data.

We introduce this model here so as to make an overview of our candidate stocks from previous analyses. At first, let’s decode the model from its formula:Daily, monthly or yearly return rate here is 1-dimensional time-series data based on historical stock data. Every value is a return from individual stock. Expected return rate of investment is the individual stock, for instance, TSMC:2330, AUO: 2409, LARGAN:3008. The return of a time period is calculated as

(ER_i — R_f) = \beta_i(ER_m-R_f)

  • ER_i: Expected return rate of investment

Daily, monthly or yearly return rate here is 1-dimensional time-series data based on historical stock data. Every value is a return from individual stock. Expected return rate of investment is the individual stock, for instance, TSMC:2330, AUO:2409, LARGAN:3008. The return of a time period is calculated by

R = \frac{P_t — P_{t-1}}{P_{t-1}}

R: Return

P_t: Stock price at time t

P_{t-1}: Stock price at time

  • Rf: risk-free rate

Risk Free Rate is the rate of return of an investment, which any individual investor can gain the return with zero risk. In Taiwan, there is no universal data that refers to it. Since most tutorial online uses United States 10-Year Government Bond and Taiwan exchange market adheres to U.S. exchange market, we would use it as our Rf here. Also it is easier to track and update.

  • \beta_i: Risk coefficient

Risk coefficient is the simplified relation between return rate of investment and market while eliminating the risk-free rate. It indicates the sensitivity and direction when encountering bull or bear market.

  • ER_m: return rate of market

Return rate of market is the trend of a gross market, which is a 1-dimensional time-series data. Normally, it can be the easiest solution in exchange market by targeting the most prestigious exchange trades, such as S&P500 in the U.S.. In Taiwan, we use ETF 0050 as our market return, representing the average return from Taiwan exchange market.

Practice

Using the API from FinMind, we got real-time daily price of our interested stock, ETF 0050 and also United States 10-Year Government Bond. As obtaining all information we need for CAPM, a model is built for each individual stock. Every stock has its relation to the gross market. Here we demonstrate 3 out of 5 stocks.

However, before we dive into building CAPM. There is one crucial factor needed to be considered: dividend.

Dividend

We should bear in mind that in Taiwan market (at least I know) every stock has dividend distribution. At the ex-dividend date, the stock price would drop according the amount of dividend. It is irrational to count the price drop at ex-dividend date because the difference of price is the dividend you get, not just the loss. Hence, in CAPM we should take it into account. The appropriate way is eliminating the return data from ex-dividend date.

Here are some functions (load_Rf(), load_ERm(), load_ERi) to extract Rf, ER_m, ER_i respectively and exclude data from ex-dividend date but still we keep the dividend for future usage.

Let’s demo the script followed by these steps:

  1. load Rf, ER_m, ER_i
  2. intersect the date among Rf, ER_m, ER_i
  3. calculate the ‘daily return’ column to Rf, ER_m, ER_i
from function.func import load_ERm,load_ERi,load_Rf
import pandas as pd

etf0050price, div_dict = load_ERm()
us_bond = load_Rf()
stockprice, div_dict_can = load_ERi(candidate)

# Overlapped date
day_span = sorted(set(stockprice['date'].values) & set(etf0050price['date'].values) & set(us_bond['date'].values))
df_day_span = pd.DataFrame(day_span,columns=['specific date'])
stockprice = pd.merge(stockprice,df_day_span,left_on='date',right_on='specific date')
etf0050price = pd.merge(etf0050price,df_day_span,left_on='date',right_on='specific date')
us_bond = pd.merge(us_bond,df_day_span,left_on='date',right_on='specific date')
Risk coefficient
# Calculate the Daily Return
candidate = stockprice.stock_id[0]

stockprice[f'daily return {candidate}'] = stockprice['close'].pct_change(periods=1)
etf0050price['daily return'] = etf0050price['close'].pct_change(periods=1)

price0050 = etf0050price['close'].iloc[1:]
pricestock = stockprice['close'].iloc[1:]

ERi = stockprice[f'daily return {candidate}'].iloc[1:]
ERm = etf0050price['daily return'].iloc[1:]
Rf = us_bond['daily return'][1:]/100 # % to real value

We can also makes some plots to show the relation between target stock and ETF0050 (Fig. 1–4).

Fig. 1 Line graphs of price fluctuation. Black line: ETF0050; Orange line:2330
Fig. 2 Scatter plot of daily return (ETF0050 vs. 2330). Red line: regression line
Fig. 3 Line graphs of price fluctuation. Black line: ETF0050; Orange line:2395.
Fig. 4 Scatter plot of daily return (ETF0050 vs. 2330). Red line: regression line

We can see 2330 has the nearly identical trend to ETF0050 in the line graph, and also high correlation according to the scatter plot. It is because ETF0050 consitutes a big proportion of 2330 (~40%).

Revenons à nos moutons! As Rf, ER_m, ER_i being ready, we can fit in our CAPM via scipy.stats.lingress

from scipy import stats
# Calculate the beta and alpha for CAPM
beta,alpha,r_value,p_value,std_err = stats.linregress((ERm-Rf),(ERi-Rf))Then, we get a clean dataset prepared for CAPM, and it is good to go

Subsequently, it’s good to compare some characteristics from CAPM’s result with each other. To do so, we create CAPM.ipynb to illustrate our CAPM’s process and result. Please check it out!

Tab. 1 Result of CAPM in 2330, 2395, 9910.

So far, Risk coefficient (beta) is calculated and can imply the sensitivity of individual stock towards ER_m. 2330 compared with 2395 and 9910 are more sensitive to ER_m, which means its risk is higher. Furthermore, correlation coefficient (r) tells us 2330 is close to ER_m. It makes much sense because ETF 0050 has a big share of 2330.

It also is worthy looking at alpha. According to the document, it is basically the intercept of linear regression. It shows to what extent the ER_i in general preformed better/worse than the ER_m.

To sum up, CAPM gave us some characteristics of our target stocks. We have known better which stock is safer, which stock sticks with the market return and which stock is the most risky. However, it’s time to design a model that theoretically have the lowest risk and the highest return.

Efficient Frontier

Efficient frontier is the set of optimal portfolios in this scatter plot, which is the visualization of modern portfolio theory (MPT), or mean-variance analysis. A theory which Harry Markowitz introduced in 1952, and was award a Nobel Memorial Prize in Economic Sciences Prize in 1990.

Question now here is: how should we do? Let’s find out with me step by step

Until now, CAPM generates the expected return of every interested stock. Furthermore, through the 1-dimensional return data, we can also calculate the variation of expected return.

Expected Return?

Daily return is irrational for long-term investment due to the fluctuation of daily market. The more applicable way is to look at annual return, so the expected return of investing this stock can be unveiled. There are approximately 250 trading day throughout a year.

from function.func import load_ERi
import pandas as pd
from datetime import datetime, timedelta


# list the candidate stock
candidates = ['9910','3008','2330','2395','5871']
# create a day span dataframe template
end_date = datetime.today().date()
start_date = end_date - timedelta(days=365*7)
dnt_range = pd.date_range(start_date,end_date,freq='D')
df_day_span = pd.DataFrame(dnt_range,columns=['date'])
df_all = pd.DataFrame(dnt_range,columns=['date'])

# concatenate and align stocks into a dataframe
stock_dict = {}
div_dict = {}
for candidate in candidates:
stockprice, div_dict_can = load_ERi(candidate)
stockprice = pd.merge(stockprice,df_day_span,left_on='date',right_on='date')
df_day_span = stockprice[['date']]
div_dict.update(div_dict_can)
stockprice[f'return annualised {candidate}'] = stockprice['close'].pct_change(periods=1).values * 250
df_all = pd.merge(df_all,stockprice[['date',f'return annualised {candidate}']],left_on='date',right_on='date')

Annual Return with dividend

Furthermore, in annual return, we can also add the dividend that we get when we excluded ex-dividend date but keep the value.

# add dividend to our annualised return according to the yearly divident record. 
for candidate, div in div_dict.items():
for year, div_val in div.items():
new_val = df_all[df_all['date'].apply(lambda x: x.strftime('%Y')) == year][f'return annualised {candidate}'] + div_val/100
if len(new_val) != 0:
df_all.loc[df_all['date'].apply(lambda x: x.strftime('%Y')) == year,f'return annualised {candidate}'] = new_val

So far we have the historical annual retur. Next step is to calculate variance. We just made a covariance matrix and extract diagonal value which represents variance. Then, we are able to visualise a 2-D scatter plot to demonstrate the risk and return by pointing expected return/mean (x-axis) and volatility/variance (y-axis).

cov_matrix = df_all.iloc[1:,:].cov()
mean = df_all.iloc[1:,1:].mean().values

var = np.diag(cov_matrix)
sd = np.sqrt(var) # Daily standard deviation
variation = sd # Annual standard deviation = volatility

df_plot = pd.DataFrame({'Stock ID': candidates,
'Return': mean,
'Volatility':variation})

plt.style.use("seaborn-whitegrid")

colors = ["r","b","g",'orange','purple']
sns.set_palette(sns.color_palette(colors))

sns.scatterplot(df_plot,x='Volatility', y='Return',hue='Stock ID')
#plt.savefig(f'img/scatter_ef_ori.png',dpi=300)
plt.show()
Fig. 5 Scatter plot/Mean-variance analysis of Return (mean) and Volatility (variance), with five stocks 9910, 3008, 2330, 2395, and 5871

So far, we know how different stocks perform simply by its return and risk. For instance, Fig. 5 shows that 2330 has the lowest risk and the 2nd highest return, and 5871 has the highest return but the 3rd at risk.

But is it the result? We just choose 2330 because it seems to perform the greatest here? Of course no!

Magic step!

If we assign a weight to each stock, a new point on the scatter plot would appear. Then anew combination of stock investment will have an unique mean and standard error. Let’s try [0.2, 0.2, 0.2, 0.2, 0,2] to [9910, 3008, 2330, 2395, 5871] respectively (Fig. 6).

weight = [0.2,0.2,0.2,0.2,0.2]
var = cov_matrix.mul(weight, axis=0).mul(weight, axis=1).sum().sum()
combin_variation = np.sqrt(var)
combin_mean = np.dot(weight, mean)

df_plot_combination = pd.DataFrame({'Stock ID': 'combination',
'Return': [combin_mean],
'Volatility':[combin_variation]})

df_plot2 = pd.concat([df_plot,df_plot_combination],axis=0)

plt.style.use("seaborn-whitegrid")
colors = ["r","b","g",'orange','purple','grey']
sns.set_palette(sns.color_palette(colors))

sns.scatterplot(df_plot2,x='Volatility', y='Return',hue='Stock ID')
#plt.savefig(f'img/scatter_ef_3basic_1con.png',dpi=300)
plt.show()
Fig. 6 Scatter plot/Mean-variance analysis of Return (mean) and Volatility (variance), with five stocks 9910, 3008, 2330, 2395, 5871 and a combination of stocks with equal weight at 0.2 each

The combination point however has lowest volatility and the 3rd hghest return, which is not bad.

Subsequently, as random assigning the weight to our interested stock, there are countless combination that create hundreds of thousands of points on the scatter plot. Every point is regarded as an unique allocation of of asset in stocks. Each weight is the percentage of your asset in each stock. In other word, if we follow a point on the scatter plot, and use the weight that corresponds to the point, we can expect the return and risk according to our asset allocation.

How to get more combination? More Magic!!!

Simply we just need to increase the number of random weight (Fig. 7)

Fig. 7 Combination at 1, 100, 10000 in scatter plot/mean-variance analysis of Return (mean) and Volatility (variance)

The points to the left and top create a curve, so-called Efficient Frontier. Maximum return and minimum risk are easy to find, simply pick the top point (maximum return) or the left point (minimum risk) from our result (Fig. 7). Nonetheless, the most popular is the balance point on efficient frontier. To find it, we should introduce an indicator: Sharpe Ratio

Sharpe Ratio

Sharpe ratio introduces risk-free return in this analysis. Basically, we consider risk-free return point has zero risk, where on the x-axis of the plot. Then starting from x-axis, a increase of unit of risk, the return is also going to increase. The average return earned by increasing per unit of risk is Sharpe Ratio, which is defined as:

SR = \frac{R_p — R_f}{\sigma_p — 0}

SR: Sharpe Ratio

R_p: return of portfolio

R_f: risk-free rate

\sigma_p: standard error of the portfolio

Sharpe ratio is the slope of a line. Steeper a line is, higher Sharpe ratio the point of portfolio contains. Let’s first include our zero-risk investment Rf in our plot (Fig. 8).

Rf = load_Rf()
Rf_mean = (Rf['value'].values/100).mean()
Rf_variations = 0
df_plot_Rf = pd.DataFrame({'Stock ID': 'Rf',
'Return': [Rf_mean],
'Volatility':[Rf_variations]})

df_plot3 = pd.concat([df_plot2,df_plot_Rf],axis=0)

plt.style.use("seaborn-whitegrid")
colors = ["r","b","g",'orange','purple','grey','black']
sns.set_palette(sns.color_palette(colors))
sns.scatterplot(df_plot3,x='Volatility', y='Return',hue='Stock ID')
#plt.savefig(f'img/scatter_ef_points_{num_portfolios}_Rf.png',dpi=300)
plt.show()
Fig. 8 Mean-variance analysis of Return (mean) and Volatility (variance) with 10,000 combination and R_f

The maximal Sharpe Ratio is the point that can make the line from R_f to it the steepest. Let’s try to find it and plot it here (Fig. 9).

# Calculate Sharpe Ratio 
SR_all_points = (df_plot2.Return.values - Rf_mean)/df_plot2.Volatility.values
SR_max = np.max(SR_all_points)
x = df_plot2.Return.values[np.argmax(SR_all_points)]
y = df_plot2.Volatility.values[np.argmax(SR_all_points)]
SR_max_point = df_plot2.loc[(df_plot2['Return']==x)&(df_plot2['Volatility']==y)].replace('combination','max sharpe ratio')
SR_max_point.replace('combination','max sharpe ratio')
Tab. 2 Return and volatility of the max sharpe ratio point
Fig. 9 Capital Market Line passes through R_f and Max Sharpe Ratio point

From Fig. 9 we can see, with the maximal Sharpe Ratio, the line starts from R_f on x-axis will be exact the tangent line of efficient frontier, giving the best balance of risk and return. Moving upward or downward on the frontier, the risk-return combination cannot defeat the line, which we call it Capital Market Line (CML).

Finally, we have selected a point portfolios. Then, we just need to look into this point. Basically, the difference among the points is the weights of each stock. In the other word, we have the weights of each stock that turns out to be the Max Sharpe Ratio point. That is exactly what the result we want to get.

Leveraging the weights, we can allocate our asset to each stock at the exact given way. The expected best result is now coming to light.

# query the best weight point
z = zip(weights,combin_means,combin_variations)
for w, m, v in list(z) :
if m == SR_max_point.Return.values[0] and v == SR_max_point.Volatility.values[0]:
print(f'Best weight allocation: {w}')
break

pd.DataFrame({'Stock ID':candidates, 'Asset Allocation':w})
Tab. 3 Asset allocation expectedly performance the best. It gives us expected 25.7% expected annual return.

This is the third (3rd) section of our “Mastering Financial Analysis of Value Investing with Python for Better Investment Decisions in Taiwan Exchange Market”. This is also the last article in this series. I hope you have a great journey in the exchange stock market! Bon Voyage et Au Revoir!

Last but not least, we provide a tutorial in Python on GitHub. You can click on this link and play by yourself.

--

--