This article uses Python to visualize stock data, for example by drawing K-line (candlestick) charts. It explores the meaning of each indicator and how the indicators relate to one another, and finally uses the moving average method to explore an investment strategy.

Data import

Here the stock data is stored in the text file stockData.txt. We use the pandas.read_table() function to read the file into a DataFrame.

Among its parameters, usecols=range(15) limits reading to the first 15 columns, parse_dates=[0] parses the first column as dates, and index_col=0 makes the first column the index.
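To see what each of these parameters does before using them on the real file, here is a minimal sketch on a small, hypothetical tab-separated sample held in memory (io.StringIO stands in for stockData.txt; the column names and values are made up, and usecols=range(5) replaces range(15) only because the sample is narrower):

```python
import io

import pandas as pd

# Hypothetical tab-separated sample, newest date first like the real file
sample = io.StringIO(
    "date\topen\thigh\tclose\tlow\textra\n"
    "2015-01-06\t2.0\t2.2\t2.1\t1.9\tx\n"
    "2015-01-05\t1.8\t2.0\t1.9\t1.7\ty\n"
)

# usecols=range(5): keep only the first 5 columns (drops 'extra')
# parse_dates=[0]:  parse column 0 as datetimes
# index_col=0:      use column 0 as the row index
df = pd.read_table(sample, usecols=range(5), parse_dates=[0], index_col=0)
df = df[::-1]  # reverse the rows so dates run from oldest to newest
print(df)
```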

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

plt.rcParams['figure.figsize'] = (10, 6)  # Set drawing size

# Read data: first 15 columns, parse column 0 as dates and use it as the index
stock = pd.read_table('stockData.txt', usecols=range(15), parse_dates=[0], index_col=0)
stock = stock[::-1]  # Reverse order so dates run from oldest to newest
stock.head()  # Show the first 5 rows


The output above shows the first 5 rows. To get more information about the data, we can use the info() method (stock.info()). It tells us that there are 20 rows, the index is in datetime format, and the dates run from January 5, 2015 to January 30, 2015. There are 14 columns in total; the name and data type of each column are listed, and there are no missing values.

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 20 entries, 2015-01-05 to 2015-01-30
Data columns (total 14 columns):
    open        20 non-null float64
high            20 non-null float64
close           20 non-null float64
low             20 non-null float64
volume          20 non-null float64
price_change    20 non-null float64
p_change        20 non-null float64
ma5             20 non-null float64
ma10            20 non-null float64
ma20            20 non-null float64
v_ma5           20 non-null float64
v_ma10          20 non-null float64
v_ma20          20 non-null float64
turnover        20 non-null float64
dtypes: float64(14)
memory usage: 2.3 KB

Looking at the column names, we notice that 'open' is not aligned with the other column names. To take a closer look, use the .columns attribute to get all of the column names:


stock.columns

Index(['    open', 'high', 'close', 'low', 'volume', 'price_change',
       'p_change', 'ma5', 'ma10', 'ma20', 'v_ma5', 'v_ma10', 'v_ma20',
       'turnover'],
      dtype='object')



This reveals extra spaces before the 'open' column name. We correct the column name as follows:

stock.rename(columns={'    open':'open'}, inplace=True)
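Renaming works when you know the bad name in advance. A more general option, sketched here on a small hypothetical frame, is to strip surrounding whitespace from every column name at once with the Index.str.strip() string method:

```python
import pandas as pd

# Hypothetical frame reproducing the problem: one column name has leading spaces
df = pd.DataFrame({'    open': [1.0, 2.0], 'high': [1.5, 2.5]})

# str.strip() removes leading/trailing whitespace from every column name
df.columns = df.columns.str.strip()
print(list(df.columns))  # ['open', 'high']
```

This also catches trailing spaces or names that are not yet known to be dirty.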

With that, we have finished importing and cleaning the stock data. The next step is to examine the data visually.

Data observation

First, let us look at what each column in the data means.

These indicators can be divided into two categories:

Price-related indicators

Daily prices: open, close, high and low prices

Price changes: price change and percentage change

Average prices: 5-, 10- and 20-day average prices

Volume-related indicators

Turnover rate: volume / total shares issued × 100%

Average volume: 5-, 10- and 20-day average volume

Because these indicators change over time, let's look at them as time series.

Time series diagram

With time on the horizontal axis and the daily closing price on the vertical axis, we draw a line chart to observe how the stock price fluctuates over time. Here we use the plotting tools built into the DataFrame directly; they produce charts quickly and format the output automatically.
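In the notebook this is presumably a one-liner on the stock DataFrame, along the lines of stock['close'].plot(grid=True). A self-contained sketch with hypothetical prices (the Agg backend is selected only so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import pandas as pd

# Hypothetical closing prices on five consecutive business days
close = pd.Series([10.0, 10.4, 10.1, 10.8, 11.2],
                  index=pd.bdate_range("2015-01-05", periods=5), name="close")

# Series.plot() draws a line chart with the DatetimeIndex on the x axis
ax = close.plot(grid=True)
ax.set_ylabel("close")
```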


If we draw the daily open, close, high and low prices together as lines, the result is a mess that is hard to analyze. So what is a good way to show all four indicators in one chart? The answer follows.

K-line (candlestick) chart

Legend has it that the K-line chart originated in Japan during the Tokugawa shogunate, when merchants used it to record the rice market and fluctuations in rice prices; it was later introduced to the stock market. Each day's four prices are recorded as a candle-shaped figure, with different colors representing rises and falls.

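(Image source: K-line theory.) The candlestick_ohlc() function draws K-line charts. It originally lived in matplotlib's finance module, which has since been removed from matplotlib; it is now available from the separate mplfinance package as mplfinance.original_flavor. Drawing a nicer-looking K-line chart takes some extra work, however. The pandas_candlestick_ohlc() function defined below plots the K-line chart; most of its code formats the axes.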

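import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator, DayLocator, MONDAY, date2num
# matplotlib.finance was removed from matplotlib; candlestick_ohlc now ships with mplfinance
from mplfinance.original_flavor import candlestick_ohlc


def pandas_candlestick_ohlc(stock_data, otherseries=None):
    # Set drawing parameters, mainly for the x axis
    mondays = WeekdayLocator(MONDAY)  # major ticks on Mondays
    alldays = DayLocator()            # minor ticks on every day

    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=0.2)
    if stock_data.index[-1] - stock_data.index[0] < pd.Timedelta('730 days'):
        weekFormatter = DateFormatter('%b %d')  # e.g. Jan 12
        ax.xaxis.set_major_locator(mondays)
        ax.xaxis.set_minor_locator(alldays)
    else:
        weekFormatter = DateFormatter('%b %d, %Y')
    ax.xaxis.set_major_formatter(weekFormatter)
    ax.grid(True)

    # Build the K-line chart; candlestick_ohlc expects rows of (date, open, high, low, close)
    stock_array = np.array(stock_data.reset_index()[['date', 'open', 'high', 'low', 'close']])
    stock_array[:, 0] = date2num(stock_array[:, 0])
    candlestick_ohlc(ax, stock_array, colorup="red", colordown="green", width=0.4)  # width in days

    # Other line charts (e.g. moving averages) can be drawn at the same time
    if otherseries is not None:
        for each in otherseries:
            plt.plot(stock_data[each], label=each)
        plt.legend()

    ax.xaxis_date()
    ax.autoscale_view()
    plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
    plt.show()


# Draw the K-line chart of the sample data
pandas_candlestick_ohlc(stock)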


In the chart, red candles indicate a rise and green candles a decline.

Relative variation

In stocks, what matters is not the absolute price but its relative change. There are many ways to measure relative changes in stock prices; the simplest is to divide each price by the initial price.

stock['return'] = stock['close'] / stock.close.iloc[0]
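A quick check of the arithmetic on hypothetical prices: each value is the close divided by the first close, so 1.05 means the price is 5% above its starting level and 0.95 means 5% below.

```python
import pandas as pd

close = pd.Series([20.0, 21.0, 19.0, 22.0])  # hypothetical closing prices
ret = close / close.iloc[0]                  # price relative to the first day
print(ret.tolist())  # [1.0, 1.05, 0.95, 1.1]
```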


The second method is to compute the daily change, which can be calculated in two ways.

The two formulas can lead to different conclusions. The percentage change (p_change) in the sample data uses the first formula, multiplied by 100%.

stock['p_change'].plot(grid=True).axhline(y=0, color='black', lw=2)
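The two formulas themselves are not reproduced above. Assuming they are the usual pair, dividing the day's change by the previous close (consistent with how p_change is described) or by the current close, this sketch with hypothetical prices shows how far they can diverge:

```python
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0])  # hypothetical closing prices
change = close - close.shift(1)          # day-over-day price change

# Formula 1 (assumed): change relative to the previous close, as for p_change
pct_prev = change / close.shift(1) * 100
# Formula 2 (assumed): change relative to the current close
pct_curr = change / close * 100

print(pct_prev.round(2).tolist())  # [nan, 10.0, -10.0]
print(pct_curr.round(2).tolist())  # [nan, 9.09, -11.11]
```

With formula 1, a 10% rise followed by a 10% fall does not return to the starting price (100 to 110 to 99), which is one reason the choice of formula matters.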

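To resolve the dilemma in the second method, we introduce a third approach: the difference of the logarithms of the prices. The formula is as follows, where p_t is the closing price on day t:

$$\Delta_t = \ln p_t - \ln p_{t-1} = \ln\frac{p_t}{p_{t-1}}$$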

close_price = stock['close']

log_change = np.log(close_price) - np.log(close_price.shift(1))

log_change.plot(grid=True).axhline(y=0, color='black', lw=2)
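One reason log changes are convenient is that they add up over time: the sum of the daily log changes equals the log of the last price divided by the first. A small check with hypothetical prices:

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 120.0])  # hypothetical closing prices
log_change = np.log(close) - np.log(close.shift(1))

# Sum of daily log changes (sum() skips the leading NaN) vs. log(last / first)
total = log_change.sum()
print(np.isclose(total, np.log(close.iloc[-1] / close.iloc[0])))  # True
```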


After looking at the price trends, let's look at the relationships between the indicators. Below we select some representative indicators and use the pandas.plotting.scatter_matrix() function to draw pairwise scatter plots; the diagonal shows a histogram of each indicator.

small = stock[['close', 'price_change', 'ma20','volume', 'v_ma20','turnover']]

_ = pd.plotting.scatter_matrix(small)  # pd.scatter_matrix was removed in pandas 1.0

The figure clearly shows a very strong linear relationship between trading volume (volume) and turnover rate (turnover). In fact, the turnover rate is defined as volume divided by the total number of shares issued, multiplied by 100%. So in the following analysis we drop the turnover indicator, using correlation to reduce the dimensionality of the data.
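Why the relationship is exactly linear: if the total number of shares issued stays constant over the period (an assumption), turnover is just volume times a constant, so their correlation coefficient is 1. A sketch with hypothetical volumes:

```python
import numpy as np

# Hypothetical daily volumes and a constant share count (assumption)
volume = np.array([1.2e6, 3.4e6, 2.1e6, 5.0e6, 0.8e6])
shares_issued = 1.0e8
turnover = volume / shares_issued * 100  # turnover rate in percent

r = np.corrcoef(volume, turnover)[0, 1]
print(round(r, 6))  # 1.0
```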

The scatter plots above are a bit dazzling, so we can use numpy.corrcoef() to compute the correlation coefficients between the indicators directly.

small = stock[['close', 'price_change', 'ma20','volume', 'v_ma20']]

cov = np.corrcoef(small.T)


array([[ 1.        ,  0.30308764,  0.10785519,  0.91078009, -0.37602193],
       [ 0.30308764,  1.        , -0.45849273,  0.3721832 , -0.25950305],
       [ 0.10785519, -0.45849273,  1.        , -0.06002202,  0.51793654],
       [ 0.91078009,  0.3721832 , -0.06002202,  1.        , -0.37617624],
       [-0.37602193, -0.25950305,  0.51793654, -0.37617624,  1.        ]])
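As an aside, np.corrcoef() treats each row as a variable, which is why the code transposes the DataFrame with .T. pandas' DataFrame.corr() computes the same Pearson coefficients column by column and keeps the column labels; a quick check on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with three indicator columns
df = pd.DataFrame({
    "close": [10.0, 10.5, 10.2, 11.0, 11.4],
    "volume": [100.0, 130.0, 110.0, 160.0, 170.0],
    "ma20": [9.8, 9.9, 10.0, 10.1, 10.3],
})

corr_np = np.corrcoef(df.T)  # rows of df.T are the variables
corr_pd = df.corr()          # Pearson by default, labeled rows and columns
print(np.allclose(corr_np, corr_pd.to_numpy()))  # True
```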

If the numbers are hard to read, we can turn the correlation matrix above into a chart in which color represents the correlation coefficient. We find that the correlation coefficient at position (0, 3) is very large, reaching 0.91: the two strongly positively correlated indicators are the closing price and the volume.

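# Color encodes the correlation coefficient; the colormap choice is an assumption
img = plt.matshow(cov, cmap=plt.cm.coolwarm, vmin=-1, vmax=1)
plt.colorbar(img, ticks=[-1, 0, 1])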

With the matrix chart, we quickly find the strongly correlated pair among these indicators. Next, we draw line charts of the closing price and the trading volume; because their magnitudes differ greatly, we use two vertical axes.

stock[['close','volume']].plot(secondary_y='volume', grid=True)

Looking at the trends of these two indicators, most of the time the volume rises when the share price rises, and vice versa. This is not always the case, though; trading volume may be affected by inertia from the previous period, or by other factors.

Moving average

Mr. Wu Jun once described his investment experience. His main point was that a good way to invest is not to make predictions, but to respond and decide correctly at the right time. The stock market cannot be predicted either; what we can do is choose suitable strategies for different situations.

Good indicators drive decision-making. One indicator we have not used in the analysis above is the 5-, 10- and 20-day average price, also called moving averages. Let's use it to demonstrate a simple stock trading strategy. (Warning: this is only a demonstration, not investment advice.)

To get more data for the demonstration, we use pandas_datareader to download the latest Google stock data directly from Yahoo Finance.

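import datetime
import pandas_datareader.data as web

# Set the time span of the stock data
start = datetime.datetime(2016, 10, 1)
end = datetime.date.today()  # latest data: up to today

# Get Google share price data from Yahoo.
# Note: Yahoo has changed its API several times, so the "yahoo" source in
# pandas_datareader may no longer work; the yfinance package is a common alternative.
goog = web.DataReader("GOOG", "yahoo", start, end)

# Rename the index and columns to match the analysis in this article
goog.index.rename('date', inplace=True)
goog.rename(columns={'Open': 'open', 'High': 'high', 'Low': 'low', 'Close': 'close'},
            inplace=True)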



The data contains only daily prices and volume, so we need to compute the 5-day and 20-day average prices ourselves, and draw the average-price lines (also called moving averages) together with the K-line chart.

# 5-day and 20-day moving averages of the closing price, rounded to 2 decimals
goog["ma5"] = np.round(goog["close"].rolling(window=5, center=False).mean(), 2)
goog["ma20"] = np.round(goog["close"].rolling(window=20, center=False).mean(), 2)

goog = goog.loc['2017-01-01':]

pandas_candlestick_ohlc(goog, ['ma5', 'ma20'])
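A detail of the code above worth spelling out: rolling(window=n).mean() returns NaN for the first n - 1 rows. That is presumably why the data is downloaded from October 2016 but only the part from 2017 onward is kept, so the 20-day average is already defined on the first plotted day. A small sketch with a hypothetical 3-day window:

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # hypothetical closes

# 3-day simple moving average; the first window - 1 = 2 values are NaN
ma3 = np.round(close.rolling(window=3).mean(), 2)
print(ma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0]
```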

Looking at the chart above, the 5-day moving average stays close to the K-line chart, while the 20-day average is flatter. So moving averages smooth out short-term fluctuations and better reflect the long-term trend. When comparing the 5-day and 20-day averages, pay particular attention to where they cross: these are the trading opportunities. The simplest moving average strategy is: when the 5-day average crosses above the 20-day average from below, buy the stock; when the 5-day average falls below the 20-day average, sell it.

To find when to trade, we compute the difference between the 5-day and 20-day average prices and take its sign, as shown below. The points where the horizontal line jumps are the trading opportunities.

goog['ma5-20'] = goog['ma5'] - goog['ma20']

goog['diff'] = np.sign(goog['ma5-20'])

goog['diff'].plot(ylim=(-2,2)).axhline(y=0,color='black', lw=2)

To make these easier to spot, we take the difference of this sign between adjacent dates to obtain a signal indicator. When the signal is 1, buy shares; when it is -1, sell shares; when it is 0, do nothing.

goog['signal'] = np.sign(goog['diff'] - goog['diff'].shift(1))
goog['signal'].plot(ylim=(-2, 2))
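The steps above (moving averages, the sign of their difference, the change of that sign) can be wrapped in one function. This is a sketch under the assumption that a signal of 1 marks the short average crossing above the long one and -1 crossing below; crossover_signals is a hypothetical helper name, not part of pandas, and the short windows in the usage line are chosen only to fit the tiny hypothetical series:

```python
import numpy as np
import pandas as pd


def crossover_signals(close: pd.Series, short: int = 5, long: int = 20) -> pd.Series:
    """Hypothetical helper: 1 where the short MA crosses above the long MA,
    -1 where it crosses below, 0 otherwise (including warm-up rows)."""
    ma_short = close.rolling(window=short).mean()
    ma_long = close.rolling(window=long).mean()
    regime = np.sign(ma_short - ma_long)  # +1 above, -1 below, NaN during warm-up
    return np.sign(regime - regime.shift(1)).fillna(0)


# Hypothetical prices: falling, then rising, then falling again
close = pd.Series([5, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2], dtype=float)
sig = crossover_signals(close, short=2, long=4)
print(sig[sig != 0])  # buy at index 5, sell at index 9
```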


As the signal plot shows, from the beginning of this year (2017) up to now there have been two rounds of buying and selling opportunities. So far everything seems to be going well; let's see how much profit these two rounds of trades made.

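# Collect the closing prices on buy (signal == 1) and sell (signal == -1) days
trade = pd.concat([
    pd.DataFrame({"price": goog.loc[goog["signal"] == 1, "close"],
                  "operation": "Buy"}),
    pd.DataFrame({"price": goog.loc[goog["signal"] == -1, "close"],
                  "operation": "Sell"})
]).sort_index()  # chronological order
trade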

The table above lists the trade dates, the operation and the price on each day. Unfortunately, in both rounds the selling price was lower than the buying price; in other words, this approach actually lost money!
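To put numbers on the result, each buy can be paired with the following sell. The sketch below assumes the trade log starts with a buy and alternates strictly (a real table that begins with a sell would need that row dropped first); the prices and dates are hypothetical:

```python
import pandas as pd

# Hypothetical trade log in the same shape as the trade table
trade = pd.DataFrame(
    {"price": [100.0, 95.0, 110.0, 104.0],
     "operation": ["Buy", "Sell", "Buy", "Sell"]},
    index=pd.to_datetime(["2017-02-01", "2017-03-10", "2017-04-20", "2017-06-15"]),
)

buys = trade.loc[trade["operation"] == "Buy", "price"].to_numpy()
sells = trade.loc[trade["operation"] == "Sell", "price"].to_numpy()
n = min(len(buys), len(sells))   # an open position at the end has no sell yet
profit = sells[:n] - buys[:n]    # per-share profit of each buy/sell round trip
print(profit.tolist())           # [-5.0, -6.0]
print(profit.sum())              # -11.0
```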

Feeling cheated? Was all the analysis up to now for nothing? I did warn you earlier: the analysis here only demonstrates the idea behind the moving average strategy and is not real investment advice. The stock market is so complex; how could one small strategy beat it?

So is this strategy useless? Not necessarily! If we consider a longer time span, such as 5 or 10 years, and longer averages, for example comparing the 20-day and 50-day averages, then although there will be losing periods along the way, winning becomes more likely. In other words, the strategy can work on longer time scales. But even if it makes money, can it beat the market? That calls for other methods, such as a sensible allocation of investment proportions.

The same old saying applies: the stock market is risky, so invest with caution. This is not really an analysis of stocks; rather, it borrows stock data to demonstrate basic data analysis methods and what makes a good indicator.

Reference material:

An Introduction to Stock Market Data Analysis with Python (Part 1)

An Introduction to Stock Market Data Analysis with Python (Part 2)

K-line theory

Example of drawing K-line charts

Source: Fish Heart (DrFish)
