Reprinted with permission from the WeChat official account THU Data School (ID: datapi)

Author: Dou Yingtong




Introduction





In July 2015, I created a WeChat group chat for sharing Didi ride-hailing red envelopes (coupons). The goal was to let everyone share red envelopes without disturbing others and to pick one up easily whenever they needed it. As the number of members and the types of shared red envelopes grew, the group became a distribution center for coupons from all kinds of O2O (online-to-offline) service apps. From August 2015 to August 2017, the group generated about 20,000 red envelope sharing records. I recently exported these records and analyzed them along quantity, time, semantic and other dimensions. Here I share my own interpretation for everyone to learn from and discuss.

 



Quantity dimension





Most members of this group are university students in Beijing. Over two years, the group generated 21,477 chat records in total, of which about 20,000 are valid red envelope sharing records. Membership grew from a dozen or so people to 500.




Chat records can be exported as an Excel table. Figure 1 shows the format of a single chat record.






Figure 1





The columns are, in order: the WeChat group ID (yes, WeChat groups have IDs too), the time the message was sent, the sender's WeChat nickname, the sender's WeChat ID, the direction (received or sent), the message type (text, web page, animated sticker, image) and the message content. Most red envelopes are shared as web pages, and each app uses its own fixed domain, such as xiaojukeji.com for Didi and ele.me for Ele.me. By counting the different domains, I identified the 12 apps with the most red envelopes and their share of the total (Figure 2). Together these 12 apps account for 95% of all red envelopes.
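The domain-based counting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's script. Each exported row is assumed to be a dict with hypothetical keys `msg_type` and `content`, and web-page messages are assumed to carry their link in `content`. Only the two domains named in the text (`xiaojukeji.com`, `ele.me`) are filled into the mapping.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping: only the two domains named in the article are filled in;
# the other apps' domains would have to be added by hand.
DOMAIN_TO_APP = {
    "xiaojukeji.com": "Didi",
    "ele.me": "Ele.me",
}

URL_RE = re.compile(r"https?://\S+")


def app_for_url(url):
    """Map a URL to an app by matching its host against the known domains."""
    host = (urlparse(url).hostname or "").lower()
    for domain, app in DOMAIN_TO_APP.items():
        # Accept the domain itself or any subdomain, e.g. h5.ele.me -> ele.me
        if host == domain or host.endswith("." + domain):
            return app
    return None


def count_by_app(records):
    """Count web-page messages per app.

    `records` is assumed to be a list of dicts with the hypothetical keys
    "msg_type" and "content", mirroring two columns of the exported table.
    """
    counts = Counter()
    for rec in records:
        if rec["msg_type"] != "webpage":
            continue
        for url in URL_RE.findall(rec["content"]):
            app = app_for_url(url)
            if app is not None:
                counts[app] += 1
                break  # count each shared message at most once
    return counts


def shares(counts):
    """Return each app's fraction of all matched red envelopes, largest first."""
    total = sum(counts.values())
    return {app: n / total for app, n in counts.most_common()}
```

Matching on the host suffix rather than on a substring of the whole URL avoids counting look-alike hosts such as `notele.me` as Ele.me.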






Figure 2




The figure shows that food-delivery red envelopes are the most numerous of all types. Among food, clothing, housing and transportation, "food" is the most frequent need. Ele.me red envelopes alone account for nearly half of all red envelopes. This is inconsistent with the conclusions of the 2016 and 2017 market share reports on food-delivery apps (covering Ele.me and Meituan Waimai). The reason is that the group's participants are limited in identity and region, so the statistics can only reflect app market share within a small population.




Besides the apps above, red envelopes from many other apps also appeared: Qunar, Unibike, Dida Carpooling, Aixianfeng, Yimixian, Ctrip, Missfresh, Lehui, Youku, Happy Xiaole, Airbnb, China Mobile, Zupool and others. Note that the JD red envelopes in Figure 2 include JD Mall, JD Daojia and JD Finance, and NetEase's red envelopes include NetEase Yanxuan, NetEase Kaola and Onmyoji.




The apps above cover most of the mainstream O2O service apps in China. They also reflect the consumption habits of college students, whose main forms of consumption are ride-sharing, food delivery, fresh-produce delivery, online shopping and entertainment.




As Figure 1 shows, each red envelope comes with an advertising slogan when it is shared. I analyzed the high-frequency words in these slogans and made them into a word cloud, shown in Figure 3.
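A rough sketch of the word-frequency step follows. The article does not say which tools were used. Here, overlapping two-character windows stand in for real Chinese word segmentation, a job that a segmenter such as jieba would normally do. The resulting counts could then be passed to a word-cloud library, for example the `wordcloud` package's `generate_from_frequencies`. The function names are illustrative.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs (basic block only); digits, Latin letters and
# punctuation break a run.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_bigrams(text):
    """Return overlapping two-character windows within each run of Chinese
    characters: a crude stand-in for real word segmentation."""
    grams = []
    for run in CJK_RUN.findall(text):
        grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def term_frequencies(slogans, stopwords=frozenset()):
    """Count bigram frequencies across all slogans, skipping stopwords."""
    counts = Counter()
    for slogan in slogans:
        counts.update(g for g in char_bigrams(slogan) if g not in stopwords)
    return counts


if __name__ == "__main__":
    # The first slogan follows the pattern quoted later in the article:
    # "the 3rd person to open it gets the largest red envelope".
    demo = ["第3个领取的红包最大", "红包最大啦"]
    print(term_frequencies(demo).most_common(3))
```

Bigrams overcount fragments that straddle word boundaries (for example "的红"). A stopword list or a proper segmenter is what keeps such fragments out of a real word cloud.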






Figure 3




Careful readers may notice that the slogans fall into three types. The first promotes the app itself (and the services it provides). The second advertises other brands, including films, TV series and brand promotion campaigns. The third features the app's own celebrity endorsers, such as Ele.me's Wang Zulan and Kobe Bryant. I analyzed the proportions of these three types in August 2015, August 2016 and August 2017, as shown in Figure 4.
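Once each slogan has a type label, the monthly proportions in Figure 4 reduce to grouped counting. The sketch below assumes the labels already exist, because the article does not describe how slogans were classified. It uses the hypothetical label names `self`, `other_brand` and `endorsement`, and assumes timestamps start with `YYYY-MM`.

```python
from collections import Counter, defaultdict


def category_shares_by_month(rows, months):
    """Return {month: {label: fraction}} for the requested months.

    rows: iterable of (timestamp_str, label) pairs; timestamp_str is assumed to
    start with 'YYYY-MM' and label is a hypothetical slogan type such as
    'self', 'other_brand' or 'endorsement'.
    """
    counts = defaultdict(Counter)
    for ts, label in rows:
        month = ts[:7]
        if month in months:
            counts[month][label] += 1
    shares = {}
    for month in months:
        total = sum(counts[month].values())
        # A month with no labeled slogans gets an empty dict instead of dividing by zero
        shares[month] = {lab: n / total for lab, n in counts[month].items()} if total else {}
    return shares


if __name__ == "__main__":
    demo = [("2015-08-03 11:02:10", "self"), ("2016-08-05 18:40:00", "other_brand")]
    print(category_shares_by_month(demo, ["2015-08", "2016-08", "2017-08"]))
```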









Figure 4






In the summer of 2015, O2O services were just beginning their rapid growth and their market shares were still low, so red envelopes mainly promoted the providers' own services. By the summer of 2016, competition among O2O services (food delivery and ride-hailing) had become fierce. Red envelope discounts were relatively large and more people shared them, so the proportion of ads for other brands rose significantly. Red envelope advertising can thus serve as one of the revenue sources for O2O service providers. I have no industry experience, but I suspect that red envelope ads achieve higher exposure and click-through rates than some other forms of advertising. By the summer of 2017, the food-delivery and ride-hailing markets had settled, red envelope discounts shrank and fewer people shared them, so most ads promoted the app itself. A common slogan is "The Xth person to open it gets the largest red envelope", meant to entice people to click the link and make a purchase.




Time dimension





Figure 5 shows how the number of red envelopes changed over the two years for the seven apps that shared the most.
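The monthly series behind Figure 5 could be built roughly as below. Two details are assumptions about the export, not details given in the article: the timestamp format `%Y-%m-%d %H:%M:%S` and the `(timestamp, app)` input pairs, which could come from domain matching like the earlier counting step. Months in which an app had no shares are filled with zero so the series line up.

```python
from collections import Counter, defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format of the export


def monthly_counts(events, top_n=7):
    """events: iterable of (timestamp_str, app) pairs.

    Returns {app: {'YYYY-MM': count}} for the top_n apps by total count,
    with every month seen in the data present (zero-filled) for each app.
    """
    per_app = defaultdict(Counter)
    months = set()
    for ts, app in events:
        month = datetime.strptime(ts, TIME_FORMAT).strftime("%Y-%m")
        per_app[app][month] += 1
        months.add(month)
    totals = Counter({app: sum(c.values()) for app, c in per_app.items()})
    top_apps = [app for app, _ in totals.most_common(top_n)]
    ordered_months = sorted(months)  # 'YYYY-MM' strings sort chronologically
    return {app: {m: per_app[app][m] for m in ordered_months} for app in top_apps}
```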







Figure 5





The trends in red envelope counts support the following conclusions. First, Ele.me and Meituan Waimai red envelopes make up the bulk of all food-delivery red envelopes. Before August 2016, Meituan Waimai shared more red envelopes than Ele.me. After that, Ele.me overtook it and shared far more than Meituan. The change was not caused by Ele.me stepping up its promotion. Instead, most group members (college students in Beijing) moved from one campus to another, and Meituan Waimai had a larger presence at the original campus than at the new one. Likewise, the overall decline in red envelopes after June 2017 came about because most group members graduated from university and their demand for food delivery dropped. Seen from another angle, this shows how unstable small-scale data is.




Second, the food-delivery red envelopes show a seasonal pattern. In February 2016 and February 2017, that is, during the Spring Festival and winter vacation, red envelopes from every food-delivery app dropped significantly. Clearly, most group members went home for the New Year and their demand for food delivery fell sharply. Interestingly, the number of Didi red envelopes did not change significantly. One reason is the effect of the Spring Festival itself; another is that Didi did a good job of expanding into third- and fourth-tier cities.





Finally, the number of Didi red envelopes rose steadily until it peaked in July 2016, then fell continuously from August 2016. I think the decline has little to do with the group members. The main reason is that on August 1, 2016, Didi announced its acquisition of Uber China, which left Didi as the only major player in China's ride-sharing market. I clearly remember that from then on, Didi's red envelopes shrank considerably and Didi Express rides came with a minimum fare. Fewer discounts were on offer, and some wavering passengers may have switched to other modes of travel. I think this is why Didi's share of red envelopes declined.









Figure 6






Figure 6 singles out the trends in red envelope counts for Didi and ofo, so the changes are easier to see. Mobike is not included because its red envelopes were shared less often and would barely show up on the chart. If summer 2015 was when ride-hailing began to grow rapidly, the figure suggests that summer 2016 was when bike sharing took off. In fact, Didi started taxi hailing in 2012, and ofo began promoting shared bicycles on campus as early as 2014. Many factors came together, including the spread of 4G networks and smartphones and a growing number of WeChat users. As a result, these travel O2O services began developing rapidly after 2015.




Now let's narrow the time dimension to a single day and look at how the number of travel and food-delivery red envelopes shared varies over the course of a day (Figure 7).








Figure 7





Generally speaking, each red envelope shared online indicates that the sharer carried out a corresponding activity offline, so changes in the number of shares track changes in that activity. For "food" and "transportation", the statistics match our general intuition. Food-delivery shares cluster just before 12 noon and before 7 pm, while travel shares are spread fairly evenly throughout the day.
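The time-of-day comparison above amounts to an hourly histogram per category. In this sketch, the app-to-category table and the timestamp format are hypothetical. The text's observation corresponds to food-delivery counts peaking in the 11:00 and 18:00 hour buckets.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format of the export

# Hypothetical grouping of apps into the two categories compared in the text.
CATEGORY = {"Ele.me": "food", "Meituan Waimai": "food", "Didi": "travel", "ofo": "travel"}


def hourly_profile(events):
    """events: iterable of (timestamp_str, app) pairs.

    Returns {category: [count for hour 0..23]}; apps not in CATEGORY are ignored.
    """
    hist = {cat: [0] * 24 for cat in set(CATEGORY.values())}
    for ts, app in events:
        cat = CATEGORY.get(app)
        if cat is None:
            continue
        hour = datetime.strptime(ts, TIME_FORMAT).hour
        hist[cat][hour] += 1
    return hist


def peak_hours(hist_row, k=2):
    """Return the k hours with the most shares, busiest first."""
    return sorted(range(24), key=lambda h: hist_row[h], reverse=True)[:k]


def share_by_hour(hist_row):
    """Normalize one category's histogram so categories of different size compare."""
    total = sum(hist_row)
    return [n / total for n in hist_row] if total else [0.0] * 24
```

Normalizing each category separately matters here: food-delivery red envelopes far outnumber travel ones, so raw counts would flatten the travel curve.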






Alipay New Year red envelopes




I believe most readers have been swept up in Alipay's New Year red envelope campaign. As the owner of a red envelope group who keeps a close eye on such things, I noticed that Alipay "Zhi codes" (吱口令, shareable text codes) began to appear in the group in mid-December and peaked at the end of the year. Figure 8 shows the number of Alipay New Year red envelopes shared in the group from December 10, 2017 to January 10, 2018.









Figure 8





Because I was abroad, I didn't take part in the scramble for red envelopes. What made me curious is that from December 12 to December 22, 2017, the three characters "支付宝" (Alipay) in the shared messages appeared in many variant forms, which at one point led people to suspect scam messages. I collected all the Alipay variants from these ten days and made them into the word cloud in Figure 9.






Figure 9






"支付宝" appeared in ten variant forms. At first I guessed that Alipay was trying to avoid being detected and blocked by WeChat. However, I don't think such variants would stop WeChat from detecting the messages, and red envelopes shared before and after that period looked normal. I would welcome an explanation from any friends who understand this.






Summary





In short, a dataset of 20,000 records is too small to support macro-level conclusions, and most of the conclusions are obvious. It is also unrealistic to use this dataset for further behavior prediction, such as user profiling. In addition, this dataset is peculiar in that it is unique. Unlike publicly available data such as Weibo posts, data like this can only be collected by deliberately organizing a group of people. So even if the dataset were large enough, a model built on it would have little practical value.





Suppose, then, that I had enough group members. I could collect their gender, occupation and income, together with the time, type and frequency of the red envelopes they share, and might draw some interesting economic conclusions. If I could also see which group members clicked each red envelope, that would add another data dimension. One could then optimize red envelope distribution by combining time, slogan and click-through rate, and combine other in-group data to build user profiles, predict behavior, and so on. Of course, all of this requires enough group members and red envelopes. Given that, we could analyze the development of the O2O industry at a macro level and observe it from a new perspective.





These limitations, however, are not a problem for WeChat itself. WeChat uses its platform advantage to connect countless apps. Drawing on these different data sources, it can build user profiles through collaborative filtering and multi-view learning to make more accurate recommendations. Seen from another angle, more and more of our actions are being collected by BAT (Baidu, Alibaba and Tencent), and people are becoming more and more transparent on the Internet. Protecting privacy is therefore increasingly important. It depends not only on corporate self-discipline but also on stronger legislation by the state.




My main takeaway from this analysis is how one-sided small datasets are. Reaching millions of records does not by itself stop data from being "small". What matters is to recognize the limitations of existing datasets, avoid overgeneralizing, and obtain comprehensive, macro-level data wherever possible. This lesson has some relevance for data mining practitioners.




WeChat group chat records can be exported to a computer with "Sync Assistant" as a text document, spreadsheet or web page. With Excel and related Python toolkits, it is easy to mine WeChat group chat data, and readers can mine their own chat records. I have also anonymized the dataset used in this article and published it online for everyone to study and use.
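For readers who want to try this on their own chat history, the sketch below offers a starting point. It loads an export that has been saved from Excel as CSV and pseudonymizes the sender columns. The column names, the CSV step and the salted SHA-256 scheme are all assumptions for illustration; the article does not say how its published dataset was anonymized. Reading `.xlsx` directly would need a third-party library, so only the standard library is used here.

```python
import csv
import hashlib
import io

# Hypothetical header names for the seven exported columns: group ID, send time,
# nickname, WeChat ID, direction (received/sent), message type, content.
FIELDS = ["group_id", "time", "nickname", "wechat_id", "direction", "msg_type", "content"]


def load_records(fileobj):
    """Read a CSV export with a header row and keep only the expected columns."""
    reader = csv.DictReader(fileobj)
    missing = [f for f in FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{f: row[f] for f in FIELDS} for row in reader]


def anonymize(records, salt):
    """Return copies of the records with nickname and WeChat ID replaced by
    short salted SHA-256 pseudonyms; the same person maps to the same token."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is left unchanged
        for key in ("nickname", "wechat_id"):
            digest = hashlib.sha256((salt + rec[key]).encode("utf-8")).hexdigest()
            rec[key] = digest[:12]
        out.append(rec)
    return out


if __name__ == "__main__":
    sample = io.StringIO(
        "group_id,time,nickname,wechat_id,direction,msg_type,content\n"
        "g1,2016-08-01 12:00:00,Alice,wxid_a,received,webpage,https://h5.ele.me/x\n"
    )
    for rec in anonymize(load_records(sample), salt="change-me"):
        print(rec)
```

To load a real file, open it with `open(path, newline="", encoding="utf-8-sig")` and pass the handle to `load_records`. The `-sig` variant skips the byte-order mark that Excel's "CSV UTF-8" format writes. Keep the salt secret, or the pseudonyms can be reversed by hashing candidate IDs.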




Dataset download:

http://ytongdou.com/wp-content/uploads/2018/01/W



