# Mastering Financial Analysis of Value Investing with Python for Better Investment Decisions in Taiwan Exchange Market — (3)

## Asset Allocation via C**apital Asset Pricing Model** (**CAPM**) & Efficient Frontier

Investing Stock is a necessary task, after you realise that the money you deposit in bank is getting smaller and smaller due to inflation. Nonetheless, did I invest the right stock? How can I make ourselves feel at ease? I believe financial analyses can definitely help with that. Therefore, here we propose a framework of value investing and also build a practical work flow step-by-step. Stay tuned! Let’s dive in!

These analyses aim for effective **long-term** investment rather than swing trading.

- Choose worth investing companies at present
- Allocate personal asset into stock market.

This is the **3rd article** in the series.

To have an** introduction and practical exercise of DuPont analysis **for single stock please visit here. (1st article)

To learn about how to **conduct an objective Stock Selection Strategy** please visit here. (2nd article)

# Allocating Asset

From the 2nd article, we chose 5 stocks from ETF 0050 list which have consistently positive performance throughout years at the top 15. There are ‘9910’, ’3008', ’2330', ’2395', ’5871'.

As determining in several stocks in the market, the problem comes to: how should I allocate my limit asset in the exchange market.

“High Risk High Return”.

It is a rule of thumb we all understand. Some stocks can bring you big fortune, but accompany with low predictability, vice versa. **“Spread risk through asset allocation,”** namely** “Don’t put all your eggs in one basket.” **In exchange market, there are some models to help us analyse the risk-return profile and find the optimal asset allocation no matter how much money you have on hand.

# CAPM:Capital Asset Pricing Model

## Theory

CAPM, abbreviated from Capital Asset Pricing Model, is a mathematical model that estimates the risk-return profile related to the market and expected return by analyzing long-term historical data.

We introduce this model here so as to make an overview of our candidate stocks from previous analyses. At first, let’s decode the model from its formula:Daily, monthly or yearly return rate here is 1-dimensional time-series data based on historical stock data. Every value is a return from individual stock. Expected return rate of investment is the individual stock, for instance, TSMC:2330, AUO: 2409, LARGAN:3008. The return of a time period is calculated as

*(ER_i — R_f) = \beta_i(ER_m-R_f)*

*ER_i: Expected return rate of investment*

Daily, monthly or yearly return rate here is 1-dimensional time-series data based on historical stock data. Every value is a return from individual stock. Expected return rate of investment is the individual stock, for instance, TSMC:2330, AUO:2409, LARGAN:3008. The return of a time period is calculated by

*R = \frac{P_t — P_{t-1}}{P_{t-1}}*

*R: Return*

*P_t: Stock price at time t*

*P_{t-1}: Stock price at time*

*Rf: risk-free rate*

*Risk Free Rate* is the rate of return of an investment, which any individual investor can gain the return with zero risk. In Taiwan, there is no universal data that refers to it. Since most tutorial online uses United States 10-Year Government Bond and Taiwan exchange market adheres to U.S. exchange market, we would use it as our *Rf* here. Also it is easier to track and update.

*\beta_i: Risk coefficient*

*Risk coefficient* is the simplified relation between return rate of investment and market while eliminating the risk-free rate. It indicates the sensitivity and direction when encountering bull or bear market.

*ER_m: return rate of market*

*Return rate of market* is the trend of a gross market, which is a 1-dimensional time-series data. Normally, it can be the easiest solution in exchange market by targeting the most prestigious exchange trades, such as S&P500 in the U.S.. In Taiwan, we use ETF 0050 as our market return, representing the average return from Taiwan exchange market.

## Practice

Using the API from FinMind, we got real-time daily price of our interested stock, ETF 0050 and also United States 10-Year Government Bond. As obtaining all information we need for CAPM, a model is built for each individual stock. Every stock has its relation to the gross market. Here we demonstrate 3 out of 5 stocks.

However, before we dive into building CAPM. There is one crucial factor needed to be considered: **dividend.**

**Dividend**

We should bear in mind that in Taiwan market (at least I know) every stock has dividend distribution. At the ex-dividend date, the stock price would drop according the amount of dividend. It is irrational to count the price drop at ex-dividend date because the difference of price is the dividend you get, not just the loss. Hence, in CAPM we should take it into account. The appropriate way is eliminating the return data from ex-dividend date.

Here are some functions (load_Rf(), load_ERm(), load_ERi) to extract *Rf, ER_m*, *ER_i *respectively* *and exclude data from ex-dividend date but still we keep the dividend for future usage.

Let’s demo the script followed by these steps:

- load
*Rf, ER_m, ER_i* - intersect the date among
*Rf, ER_m, ER_i* - calculate the ‘daily return’ column to
*Rf, ER_m, ER_i*

`from function.func import load_ERm,load_ERi,load_Rf`

import pandas as pd

etf0050price, div_dict = load_ERm()

us_bond = load_Rf()

stockprice, div_dict_can = load_ERi(candidate)

# Overlapped date

day_span = sorted(set(stockprice['date'].values) & set(etf0050price['date'].values) & set(us_bond['date'].values))

df_day_span = pd.DataFrame(day_span,columns=['specific date'])

stockprice = pd.merge(stockprice,df_day_span,left_on='date',right_on='specific date')

etf0050price = pd.merge(etf0050price,df_day_span,left_on='date',right_on='specific date')

us_bond = pd.merge(us_bond,df_day_span,left_on='date',right_on='specific date')

Risk coefficient

# Calculate the Daily Return

candidate = stockprice.stock_id[0]

stockprice[f'daily return {candidate}'] = stockprice['close'].pct_change(periods=1)

etf0050price['daily return'] = etf0050price['close'].pct_change(periods=1)

price0050 = etf0050price['close'].iloc[1:]

pricestock = stockprice['close'].iloc[1:]

ERi = stockprice[f'daily return {candidate}'].iloc[1:]

ERm = etf0050price['daily return'].iloc[1:]

Rf = us_bond['daily return'][1:]/100 # % to real value

We can also makes some plots to show the relation between target stock and ETF0050 (Fig. 1–4).

We can see 2330 has the nearly identical trend to ETF0050 in the line graph, and also high correlation according to the scatter plot. It is because ETF0050 consitutes a big proportion of 2330 (~40%).

Revenons à nos moutons! As *Rf, ER_m, ER_i *being ready, we can fit in our CAPM via scipy.stats.lingress

`from scipy import stats`

# Calculate the beta and alpha for CAPM

beta,alpha,r_value,p_value,std_err = stats.linregress((ERm-Rf),(ERi-Rf))Then, we get a clean dataset prepared for CAPM, and it is good to go

Subsequently, it’s good to compare some characteristics from CAPM’s result with each other. To do so, we create CAPM.ipynb to illustrate our CAPM’s process and result. Please check it out!

So far,* **Risk coefficient (beta)** *is calculated and can imply the sensitivity of individual stock towards *ER_m*. 2330 compared with 2395 and 9910 are more sensitive to *ER_m*, which means its risk is higher. Furthermore, **correlation coefficient (r)** tells us 2330 is close to *ER_m*. It makes much sense because ETF 0050 has a big share of 2330.

It also is worthy looking at **alpha**. According to the document, it is basically the intercept of linear regression. It shows to what extent the *ER_i* in general preformed better/worse than the *ER_m*.

To sum up, CAPM gave us some characteristics of our target stocks. We have known better which stock is safer, which stock sticks with the market return and which stock is the most risky. However, it’s time to design a model that theoretically have the lowest risk and the highest return.

# Efficient Frontier

Efficient frontier is the set of optimal portfolios in this scatter plot, which is the visualization of modern portfolio theory (MPT), or mean-variance analysis. A theory which Harry Markowitz introduced in 1952, and was award a Nobel Memorial Prize in Economic Sciences Prize in 1990.

Question now here is: how should we do? Let’s find out with me step by step

Until now, CAPM generates the expected return of every interested stock. Furthermore, through the 1-dimensional return data, we can also **calculate the variation of expected return.**

## Expected Return?

Daily return is irrational for long-term investment due to the fluctuation of daily market. The more applicable way is to look at annual return, so the expected return of investing this stock can be unveiled. There are approximately 250 trading day throughout a year.

`from function.func import load_ERi`

import pandas as pd

from datetime import datetime, timedelta

# list the candidate stock

candidates = ['9910','3008','2330','2395','5871']

# create a day span dataframe template

end_date = datetime.today().date()

start_date = end_date - timedelta(days=365*7)

dnt_range = pd.date_range(start_date,end_date,freq='D')

df_day_span = pd.DataFrame(dnt_range,columns=['date'])

df_all = pd.DataFrame(dnt_range,columns=['date'])

# concatenate and align stocks into a dataframe

stock_dict = {}

div_dict = {}

for candidate in candidates:

stockprice, div_dict_can = load_ERi(candidate)

stockprice = pd.merge(stockprice,df_day_span,left_on='date',right_on='date')

df_day_span = stockprice[['date']]

div_dict.update(div_dict_can)

stockprice[f'return annualised {candidate}'] = stockprice['close'].pct_change(periods=1).values * 250

df_all = pd.merge(df_all,stockprice[['date',f'return annualised {candidate}']],left_on='date',right_on='date')

## Annual Return with dividend

Furthermore, in annual return, we can also add the **dividend** that we get when we excluded ex-dividend date but keep the value.

`# add dividend to our annualised return according to the yearly divident record. `

for candidate, div in div_dict.items():

for year, div_val in div.items():

new_val = df_all[df_all['date'].apply(lambda x: x.strftime('%Y')) == year][f'return annualised {candidate}'] + div_val/100

if len(new_val) != 0:

df_all.loc[df_all['date'].apply(lambda x: x.strftime('%Y')) == year,f'return annualised {candidate}'] = new_val

So far we have the historical annual retur. Next step is to calculate variance. We just made a covariance matrix and extract diagonal value which represents variance. Then, we are able to visualise a 2-D scatter plot to demonstrate the risk and return by pointing expected return/mean (x-axis) and volatility/variance (y-axis).

`cov_matrix = df_all.iloc[1:,:].cov()`

mean = df_all.iloc[1:,1:].mean().values

var = np.diag(cov_matrix)

sd = np.sqrt(var) # Daily standard deviation

variation = sd # Annual standard deviation = volatility

df_plot = pd.DataFrame({'Stock ID': candidates,

'Return': mean,

'Volatility':variation})

plt.style.use("seaborn-whitegrid")

colors = ["r","b","g",'orange','purple']

sns.set_palette(sns.color_palette(colors))

sns.scatterplot(df_plot,x='Volatility', y='Return',hue='Stock ID')

#plt.savefig(f'img/scatter_ef_ori.png',dpi=300)

plt.show()

So far, we know how different stocks perform simply by its return and risk. For instance, Fig. 5 shows that 2330 has the lowest risk and the 2nd highest return, and 5871 has the highest return but the 3rd at risk.

But is it the result? We just choose 2330 because it seems to perform the greatest here? Of course no!

**Magic step!**

If we assign a weight to each stock, a new point on the scatter plot would appear. Then anew combination of stock investment will have an unique mean and standard error. Let’s try [0.2, 0.2, 0.2, 0.2, 0,2] to [9910, 3008, 2330, 2395, 5871] respectively (Fig. 6).

`weight = [0.2,0.2,0.2,0.2,0.2]`

var = cov_matrix.mul(weight, axis=0).mul(weight, axis=1).sum().sum()

combin_variation = np.sqrt(var)

combin_mean = np.dot(weight, mean)

df_plot_combination = pd.DataFrame({'Stock ID': 'combination',

'Return': [combin_mean],

'Volatility':[combin_variation]})

df_plot2 = pd.concat([df_plot,df_plot_combination],axis=0)

plt.style.use("seaborn-whitegrid")

colors = ["r","b","g",'orange','purple','grey']

sns.set_palette(sns.color_palette(colors))

sns.scatterplot(df_plot2,x='Volatility', y='Return',hue='Stock ID')

#plt.savefig(f'img/scatter_ef_3basic_1con.png',dpi=300)

plt.show()

The combination point however has lowest volatility and the 3rd hghest return, which is not bad.

Subsequently, as random assigning the weight to our interested stock, there are countless combination that create hundreds of thousands of points on the scatter plot. Every point is regarded as an unique allocation of of asset in stocks. Each weight is the percentage of your asset in each stock. In other word, if we follow a point on the scatter plot, and use the weight that corresponds to the point, we can expect the return and risk according to our asset allocation.

How to get more combination? **More Magic!!!**

Simply we just need to increase the number of random weight (Fig. 7)

The points to the left and top create a curve, so-called **Efficient Frontier**. **Maximum return** and **minimum risk** are easy to find, simply pick the top point (maximum return) or the left point (minimum risk) from our result (Fig. 7). Nonetheless, the most popular is the balance point on efficient frontier. To find it, we should introduce an indicator: **Sharpe Ratio**

## Sharpe Ratio

Sharpe ratio introduces risk-free return in this analysis. Basically, we consider risk-free return point has zero risk, where on the x-axis of the plot. Then starting from x-axis, a increase of unit of risk, the return is also going to increase. The average return earned by increasing per unit of risk is Sharpe Ratio, which is defined as:

*SR = \frac{R_p — R_f}{\sigma_p — 0}*

*SR: Sharpe Ratio*

*R_p: return of portfolio*

*R_f: risk-free rate*

*\sigma_p: standard error of the portfolio*

Sharpe ratio is the slope of a line. Steeper a line is, higher Sharpe ratio the point of portfolio contains. Let’s first include our zero-risk investment *Rf* in our plot (Fig. 8).

`Rf = load_Rf()`

Rf_mean = (Rf['value'].values/100).mean()

Rf_variations = 0

df_plot_Rf = pd.DataFrame({'Stock ID': 'Rf',

'Return': [Rf_mean],

'Volatility':[Rf_variations]})

df_plot3 = pd.concat([df_plot2,df_plot_Rf],axis=0)

plt.style.use("seaborn-whitegrid")

colors = ["r","b","g",'orange','purple','grey','black']

sns.set_palette(sns.color_palette(colors))

sns.scatterplot(df_plot3,x='Volatility', y='Return',hue='Stock ID')

#plt.savefig(f'img/scatter_ef_points_{num_portfolios}_Rf.png',dpi=300)

plt.show()

The maximal Sharpe Ratio is the point that can make the line from *R_f *to it the steepest. Let’s try to find it and plot it here (Fig. 9).

`# Calculate Sharpe Ratio `

SR_all_points = (df_plot2.Return.values - Rf_mean)/df_plot2.Volatility.values

SR_max = np.max(SR_all_points)

x = df_plot2.Return.values[np.argmax(SR_all_points)]

y = df_plot2.Volatility.values[np.argmax(SR_all_points)]

SR_max_point = df_plot2.loc[(df_plot2['Return']==x)&(df_plot2['Volatility']==y)].replace('combination','max sharpe ratio')

SR_max_point.replace('combination','max sharpe ratio')

From Fig. 9 we can see, with the maximal Sharpe Ratio, the line starts from *R_f* on x-axis will be exact the tangent line of efficient frontier, giving the best balance of risk and return. Moving upward or downward on the frontier, the risk-return combination cannot defeat the line, which we call it Capital Market Line (CML).

Finally, we have selected a point portfolios. Then, we just need to look into this point. Basically, the difference among the points is the weights of each stock. In the other word, we have the weights of each stock that turns out to be the Max Sharpe Ratio point. That is exactly what the result we want to get.

Leveraging the weights, we can allocate our asset to each stock at the exact given way. The expected best result is now coming to light.

`# query the best weight point`

z = zip(weights,combin_means,combin_variations)

for w, m, v in list(z) :

if m == SR_max_point.Return.values[0] and v == SR_max_point.Volatility.values[0]:

print(f'Best weight allocation: {w}')

break

pd.DataFrame({'Stock ID':candidates, 'Asset Allocation':w})

This is the third (3rd) section of our “Mastering Financial Analysis of Value Investing with Python for Better Investment Decisions in Taiwan Exchange Market”. This is also the last article in this series. I hope you have a great journey in the exchange stock market! Bon Voyage et Au Revoir!

*Last but not least, we provide a tutorial in Python on GitHub. You can click on this **link** and play by yourself.*