This article uses Python to visualize stock data, for example by drawing a candlestick (K-line) chart, explores the meaning of each indicator and the relationships between them, and finally uses moving averages to explore an investment strategy.

Data import

The stock data is stored in the text file stockData.txt. We use the pandas.read_table() function to read the file into a DataFrame.

The parameter usecols=range(15) limits reading to the first 15 columns, parse_dates=[0] parses the first column as dates, and index_col=0 uses the first column as the index.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt


%matplotlib inline


%config InlineBackend.figure_format = 'retina'


%pylab inline


pylab.rcParams['figure.figsize'] = (10, 6) # Set drawing size


# Read data

stock = pd.read_table('stockData.txt', usecols=range(15), parse_dates=[0], index_col=0)

stock = stock[::-1]  # Reverse the row order

stock.head()  # Show the first 5 rows
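As a minimal sketch of what these three parameters do, the snippet below reads a small made-up tab-separated table from memory instead of stockData.txt. The column names and values are hypothetical and only stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for stockData.txt (newest row first)
raw = (
    "date\topen\tclose\textra\n"
    "2015-01-06\t10.5\t10.8\tx\n"
    "2015-01-05\t10.0\t10.4\ty\n"
)

demo = pd.read_table(
    io.StringIO(raw),
    usecols=range(3),   # keep only the first 3 columns, dropping 'extra'
    parse_dates=[0],    # parse the first column as dates
    index_col=0,        # use the first column as the index
)
demo = demo[::-1]       # reverse so that dates run from oldest to newest
print(demo)
```

The result has a DatetimeIndex and only the 'open' and 'close' columns, which is the same shape of result the article relies on for the real data.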


The output above shows the first 5 rows. To get more information about the data, use the .info() method. It tells us that there are 20 rows, that the index is in datetime format with dates running from January 5, 2015 to January 30, 2015, and that there are 14 columns in total. The name and data type of each column are listed, and there are no missing values.

<class 'pandas.core.frame.DataFrame'>

DatetimeIndex: 20 entries, 2015-01-05 to 2015-01-30

Data columns (total 14 columns):

    open        20 non-null float64

high            20 non-null float64

close           20 non-null float64

low             20 non-null float64

volume          20 non-null float64

price_change    20 non-null float64

p_change        20 non-null float64

ma5             20 non-null float64

ma10            20 non-null float64

ma20            20 non-null float64

v_ma5           20 non-null float64

v_ma10          20 non-null float64

v_ma20          20 non-null float64

turnover        20 non-null float64

dtypes: float64(14)

memory usage: 2.3 KB

Looking at the column names, we notice that 'open' does not line up with the others. For a clearer view, use .columns to list all the column names:


Index(['    open', 'high', 'close', 'low', 'volume', 'price_change',

       'p_change', 'ma5', 'ma10', 'ma20', 'v_ma5', 'v_ma10', 'v_ma20',

       'turnover'],

      dtype='object')



This shows that the 'open' column name has extra leading spaces. We correct the column name as follows.

stock.rename(columns={'    open':'open'}, inplace=True)
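If more than one column name might carry stray whitespace, an alternative is to strip all of them at once instead of renaming a single column. This is a minimal sketch on a hypothetical two-column frame, not the article's data:

```python
import pandas as pd

# Hypothetical frame with the same kind of padded column name
df = pd.DataFrame({"    open": [10.0, 10.5], "high": [10.6, 10.9]})

# Strip leading and trailing whitespace from every column name
df.columns = df.columns.str.strip()
print(list(df.columns))
```

The explicit rename in the article is the safer choice when only one known column is affected, since it fails loudly if the name is not what you expect.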

With that, the stock data has been imported and cleaned. The next step is to look at the data visually.

Data observation

First, let's look at the column names in the data. The indicators fall into two categories:

Price-related indicators

Price of the day: opening price (open), closing price (close), highest price (high), lowest price (low)

Price change: absolute price change (price_change) and percentage change (p_change)

Average price: 5-, 10-, and 20-day average price (ma5, ma10, ma20)

Volume-related indicators

Volume: trading volume of the day (volume)

Turnover rate (turnover): volume / total shares issued × 100%

Average volume: 5-, 10-, and 20-day average volume (v_ma5, v_ma10, v_ma20)

Because these indicators change over time, let's first look at their time series.

Time series plot

With time on the horizontal axis and the daily closing price on the vertical axis, a line chart shows how the stock price fluctuates over time. Here we use the plotting method built into the DataFrame directly; its advantage is that it produces a plot quickly and formats the output automatically.
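For the data in this article the call would presumably be stock['close'].plot(grid=True), the same idiom used later for p_change. The sketch below is self-contained and uses hypothetical closing prices rather than the real file:

```python
import pandas as pd

# Hypothetical closing prices indexed by business day
dates = pd.date_range("2015-01-05", periods=5, freq="B")
demo = pd.DataFrame({"close": [10.4, 10.8, 10.6, 11.0, 11.3]}, index=dates)

# Series.plot() puts the index on the x-axis and the values on the y-axis
ax = demo["close"].plot(grid=True)
ax.set_ylabel("close")
```

Because the index is a DatetimeIndex, pandas formats the date ticks on its own, which is the convenience the paragraph above refers to.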


If we draw the daily opening, closing, highest, and lowest prices together as lines, the result is cluttered and hard to analyze. So what is a good way to show these four indicators in one picture? The answer follows.

Candlestick (K-line) chart

According to legend, the K-line chart originated in Japan during the Tokugawa shogunate, when merchants used it to record the market and price fluctuations of the rice market; it was later introduced to the stock market. The four indicators of each day are recorded with a candle-shaped figure, and different colors represent rises and falls.

Image source: K-line theory

The matplotlib.finance module (in recent releases, the separate mplfinance package) provides the function candlestick_ohlc() for drawing candlestick charts, but a better-looking chart takes some extra work. The pandas_candlestick_ohlc() function defined below draws the candlestick chart; most of its code formats the axes.

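# candlestick_ohlc lived in matplotlib.finance in older Matplotlib releases;
# it is now provided by the separate mplfinance package
from mplfinance.original_flavor import candlestick_ohlc
from matplotlib.dates import DateFormatter, WeekdayLocator, DayLocator, MONDAY, date2num


def pandas_candlestick_ohlc(stock_data, otherseries=None):

    # Set plot parameters, mainly for the axes
    mondays = WeekdayLocator(MONDAY)    # major ticks on Mondays
    alldays = DayLocator()              # minor ticks on every day
    dayFormatter = DateFormatter('%d')

    fig, ax = plt.subplots()

    # Use short date labels for spans under two years, full dates otherwise
    if stock_data.index[-1] - stock_data.index[0] < pd.Timedelta('730 days'):
        weekFormatter = DateFormatter('%b %d')
        ax.xaxis.set_major_locator(mondays)
        ax.xaxis.set_minor_locator(alldays)
    else:
        weekFormatter = DateFormatter('%b %d, %Y')
    ax.xaxis.set_major_formatter(weekFormatter)
    ax.grid(True)

    # Build the candlestick chart (the index must be named 'date')
    stock_array = np.array(stock_data.reset_index()[['date','open','high','low','close']])
    stock_array[:,0] = date2num(stock_array[:,0])
    candlestick_ohlc(ax, stock_array, colorup="red", colordown="green",
                     width=0.6)  # width is cut off in the source; 0.6 is an assumed value

    # Other line charts can be drawn on the same axes
    if otherseries is not None:
        for each in otherseries:
            plt.plot(stock_data[each], label=each)
        plt.legend()

    ax.xaxis_date()
    ax.autoscale_view()
    plt.setp(plt.gca().get_xticklabels(), rotation=45,
             horizontalalignment='right')  # alignment is cut off in the source; 'right' is assumed


pandas_candlestick_ohlc(stock)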

Here red stands for a rise and green for a fall.

Relative change

What matters for a stock is not the absolute price but the relative change. There are many ways to measure relative change in a stock price; the simplest is to divide the price by the initial price.

stock['return'] = stock['close'] / stock.close.iloc[0]


The second method is to calculate the daily change, but there are two ways to calculate it:

The two can lead to different analysis results. The percentage change in the sample data (p_change) uses the first formula, multiplied by 100%.

stock['p_change'].plot(grid=True).axhline(y=0, color='black', lw=2)

To resolve the ambiguity in the second method, we introduce a third approach based on the logarithm of the price: the log change on day t is log(price_t) - log(price_{t-1}), computed as follows:

close_price = stock['close']

log_change = np.log(close_price) - np.log(close_price.shift(1))

log_change.plot(grid=True).axhline(y=0, color='black', lw=2)
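One property that makes log changes convenient is that they add up across days, whereas percentage changes do not. The sketch below illustrates this with three hypothetical prices; it uses pandas' pct_change(), which divides by the previous day's price and may or may not match the formula used for p_change in the sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: up from 100 to 110, then back down to 100
close = pd.Series([100.0, 110.0, 100.0])

# Percentage change relative to the previous day: (p_t - p_{t-1}) / p_{t-1}
pct_change = close.pct_change()

# Log change: log(p_t) - log(p_{t-1})
log_change = np.log(close) - np.log(close.shift(1))

print(pct_change.sum())   # not zero, although the price ended where it started
print(log_change.sum())   # zero up to floating-point rounding
```

A rise of 10% followed by a fall back to the starting price leaves a positive sum of percentage changes, while the log changes cancel.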


Having looked at the price trend, let's examine the relationships between the indicators. We select a few representative indicators below and use the pandas.plotting.scatter_matrix() function to draw pairwise scatter plots of them; the diagonal shows the histogram of each indicator.

small = stock[['close', 'price_change', 'ma20','volume', 'v_ma20','turnover']]

_ = pd.plotting.scatter_matrix(small)

The figure clearly shows a very strong linear relationship between trading volume (volume) and turnover rate (turnover). In fact, the turnover rate is defined as volume divided by the total number of shares issued, multiplied by 100%. So we drop the turnover indicator from the following analysis, using correlation to reduce the dimensionality of the data.

The scatter plot matrix above is a bit overwhelming, so we can use numpy.corrcoef() to compute the correlation coefficients of the indicators directly.

small = stock[['close', 'price_change', 'ma20','volume', 'v_ma20']]

cov = np.corrcoef(small.T)


array([[ 1.        ,  0.30308764,  0.10785519,  0.91078009, -0.37602193],

       [ 0.30308764,  1.        , -0.45849273,  0.3721832 , -0.25950305],

       [ 0.10785519, -0.45849273,  1.        , -0.06002202,  0.51793654],

       [ 0.91078009,  0.3721832 , -0.06002202,  1.        , -0.37617624],

       [-0.37602193, -0.25950305,  0.51793654, -0.37617624,  1.        ]])
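As an alternative to numpy.corrcoef(), pandas can compute the same Pearson coefficients with DataFrame.corr(), which keeps the column names as row and column labels. The sketch below uses randomly generated, hypothetical data; only the column names mirror the ones in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical indicator data; the column names mirror the ones used above
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(20, 3)), columns=["close", "volume", "ma20"])

corr_np = np.corrcoef(demo.T)   # plain array, rows and columns in column order
corr_pd = demo.corr()           # same Pearson coefficients, labeled by column name

print(corr_pd.round(2))
```

With labels attached, a value can be looked up as corr_pd.loc['close', 'volume'] instead of by position.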

If the numbers are inconvenient to read, we can turn the correlation matrix above into a figure, shown below, in which color represents the correlation coefficient. The coefficient at position (0,3) stands out, with a value of 0.91. The two strongly positively correlated indicators are closing price and volume.

img = plt.matshow(cov)  # any further arguments are cut off in the source

plt.colorbar(img, ticks=[-1,0,1])

With the matrix plot we quickly find the strongly correlated indicators. Next we draw a line chart of closing price and trading volume; because their values differ greatly in scale, we use two separate vertical axes.

stock[['close','volume']].plot(secondary_y='volume', grid=True)

Looking at the trends of the two indicators, most of the time volume rises when the share price rises, and vice versa. In some cases this does not hold, perhaps because volume is affected by inertia from the previous period, or by other factors.

Moving average

Mr. Wu Jun once talked about his investment experience. His main point was that a good way to invest is not to make predictions but to respond and decide correctly at the right time. The stock market cannot be predicted; what we can do is choose the right strategy for each situation.

Good indicators drive decisions. The indicators we have not yet used in the analysis above are the 5-, 10-, and 20-day average prices, also called moving averages. Let's use them to demonstrate a simple stock trading strategy. (Warning: this is only a demonstration, not investment advice.)

To get more data for the demonstration, we use pandas_datareader to download recent Google stock data directly from Yahoo.

import datetime
import pandas_datareader.data as web

# Set the time span of the stock data
start = datetime.datetime(2016,10,1)
end = datetime.date.today()  # the value is cut off in the source; today's date is assumed

# Get Google share price data from Yahoo
goog = web.DataReader("GOOG", "yahoo", start, end)

# Rename the index and columns to match the rest of this article
goog.index.rename('date', inplace=True)
goog.rename(columns={'Open':'open', 'High':'high', 'Low':'low', 'Close':'close'}, inplace=True)



The data contains only daily prices and volume, so we need to compute the 5-day and 20-day average prices ourselves and draw the average-price lines (also called moving averages) together with the candlestick chart.

goog["ma5"] = np.round(goog["close"].rolling(window=5, center=False).mean(), 2)

goog["ma20"] = np.round(goog["close"].rolling(window=20, center=False).mean(), 2)

goog = goog.loc['2017-01-01':].copy()  # copy so that new columns can be added later


pandas_candlestick_ohlc(goog, ['ma5','ma20'])
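To make the rolling calculation concrete, here is a minimal sketch on six hypothetical closing prices with a 3-day window instead of the 5- and 20-day windows used above:

```python
import pandas as pd

# Hypothetical closing prices
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 3-day moving average: each value is the mean of the current day and the 2 days before it
ma3 = close.rolling(window=3).mean()
print(ma3.tolist())
```

The first window - 1 values are NaN because there are not yet enough days to fill the window. That is presumably why the article downloads data from October 2016 but only keeps rows from 2017 onward, after the averages have been computed.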

Looking at the chart above, we see that the 5-day average stays close to the candlesticks, while the 20-day average is flatter. The moving average smooths out short-term fluctuations and better reflects the long-term trend. Compare the 5-day and 20-day averages and pay particular attention to where they cross: these are the times to trade. The simplest moving average strategy is: when the 5-day average crosses above the 20-day average from below, buy the stock; when the 5-day average drops below the 20-day average from above, sell the stock.

To find out when to trade, we compute the difference between the 5-day and 20-day average prices and take its sign, which is plotted below. Each jump of the horizontal line in the figure is a trading opportunity.

goog['ma5-20'] = goog['ma5'] - goog['ma20']

goog['diff'] = np.sign(goog['ma5-20'])

goog['diff'].plot(ylim=(-2,2)).axhline(y=0,color='black', lw=2)

To make this easier to see, we take the sign of the average-price difference computed above and then the difference between adjacent dates to get a signal indicator. When the signal is 1, buy the stock; when the signal is -1, sell the stock; when the signal is 0, do nothing.

goog['signal'] = np.sign(goog['diff'] - goog['diff'].shift(1))
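The two steps can be traced by hand on a small example. The sketch below uses hypothetical short and long moving averages over six days (the column names short_ma and long_ma are made up for illustration) and applies the same sign-and-difference logic:

```python
import numpy as np
import pandas as pd

# Hypothetical short and long moving averages over six days
demo = pd.DataFrame({
    "short_ma": [9.0, 9.5, 10.5, 11.0, 10.0, 9.0],
    "long_ma":  [10.0, 10.0, 10.0, 10.2, 10.3, 10.2],
})

# +1 while the short average is above the long one, -1 while it is below
demo["diff"] = np.sign(demo["short_ma"] - demo["long_ma"])

# A change in sign marks a crossover: +1 = buy, -1 = sell, 0 = hold
demo["signal"] = np.sign(demo["diff"] - demo["diff"].shift(1))
print(demo)
```

The first signal value is NaN because the first day has no previous day to compare with. The short average crosses above on the third day (signal 1) and back below on the fifth (signal -1).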


As the figure above shows, there have been two rounds of buying and selling opportunities since the beginning of this year. So far everything seems to be going well. Let's see how much profit these two rounds of trades made.

trade = pd.concat([
    pd.DataFrame({"price": goog.loc[goog["signal"] == 1, "close"],
                  "operation": "Buy"}),
    pd.DataFrame({"price": goog.loc[goog["signal"] == -1, "close"],
                  "operation": "Sell"})
])

trade.sort_index(inplace=True)  # order the trades by date
trade





The table above lists the date, operation, and price of each trade. Unfortunately, in both rounds the selling price is lower than the buying price, so trading this way actually lost money!
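The profit of each round can also be computed rather than read off the table. The sketch below uses a hypothetical trade log in the same shape (the dates and prices are made up) and assumes the log alternates strictly between Buy and Sell, starting with a Buy:

```python
import pandas as pd

# Hypothetical trade log in the same shape as the table above
trade = pd.DataFrame(
    {"price": [100.0, 97.0, 105.0, 103.0],
     "operation": ["Buy", "Sell", "Buy", "Sell"]},
    index=pd.to_datetime(["2017-01-10", "2017-02-01", "2017-02-20", "2017-03-15"]),
)

# Pair each buy with the sell that follows it (assumes strict alternation)
buys = trade.loc[trade["operation"] == "Buy", "price"].to_numpy()
sells = trade.loc[trade["operation"] == "Sell", "price"].to_numpy()

rounds = pd.DataFrame({"buy": buys, "sell": sells})
rounds["profit"] = rounds["sell"] - rounds["buy"]
print(rounds)
print("Total profit per share:", rounds["profit"].sum())
```

If the real log starts with a Sell or ends with an unmatched Buy, those rows would need to be dropped before pairing.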

Are you angry that the whole analysis so far came to nothing? As warned earlier, the analysis here only demonstrates the idea of a moving average strategy and is not real investment advice. The stock market is far too complex for one small strategy to beat.

Is the strategy useless, then? Not quite. Over a longer time span, such as 5 or 10 years, and with longer averages, for example comparing the 20-day and 50-day averages, there will still be losing periods, but the chance of coming out ahead is higher. In other words, the strategy is feasible on a longer time scale. But even if it makes money, can it beat the market? That requires other methods, such as allocating investments in sensible proportions.

As the saying goes, the stock market is risky and investment requires caution. This article is not really an analysis of stocks; it borrows stock data to illustrate basic methods of data analysis and to show what makes a good indicator.

References:

An Introduction to Stock Market Data Analysis with Python (Part 1)

An Introduction to Stock Market Data Analysis with Python (Part 2)

K-line theory

K-line chart drawing example

Source: Fish heart DrFish
