Reprinted with permission from the WeChat public account THU Data Pie (ID: datapi).

Author: Du Ying Tong




Introduction





In July 2015, I created a WeChat group for sharing Didi ride-hailing red envelopes (coupons). The idea was to let everyone share red envelopes conveniently without disturbing other contacts, and to grab one easily whenever needed. As membership grew and more types of red envelopes were shared, the group became a distribution center for coupons from all kinds of O2O service apps. From August 2015 to August 2017, the group produced about 20,000 red envelope sharing records. I recently exported these records and analyzed them along the dimensions of quantity, time, and semantics. Here I share my own interpretation for readers to learn from and discuss.

 



Quantitative dimension





Most members of the group are university students in Beijing. Over two years, the group produced 21,477 chat records in total, of which about 20,000 are valid red envelope sharing records. Membership grew from a dozen or so people to the 500-member limit.




Chat records can be exported in Excel spreadsheet format. The format of a single chat record is shown in Figure 1.






Figure 1: Format of a single exported chat record





The columns are: WeChat group number (that's right, WeChat groups have numbers too), send time, sender's WeChat nickname, sender's WeChat ID, direction (received or sent), message type (text, web page, animated sticker, or image), and message content. Most red envelopes are shared as web pages, and each app uses only its own fixed domain names: Didi, for example, uses xiaojukeji.com, and Ele.me uses ele.me. By counting the different domain names, I identified the 12 apps with the most red envelopes and their shares of the total (Figure 2). Together, these 12 apps account for 95% of all red envelopes.
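Counting red envelopes by link domain, as described above, takes only a few lines of standard-library Python. The following is a minimal sketch rather than the author's code: `DOMAIN_TO_APP` lists only the two domains named in this article, the sample URLs are invented, and taking the last two labels of the host name is a simplifying assumption that breaks for suffixes such as `.com.cn`.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup table: only the two domains named in the article.
# Add the remaining apps' domains before running it on real data.
DOMAIN_TO_APP = {
    "xiaojukeji.com": "Didi",
    "ele.me": "Ele.me",
}


def registered_domain(url: str) -> str:
    """Return the last two labels of the host, e.g. 'h5.ele.me' -> 'ele.me'.

    Assumption: a naive rule that is wrong for multi-label suffixes such as
    '.com.cn'; a public-suffix list would be needed there.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def app_shares(urls):
    """Count red envelope links per app and return (app, count, share) rows,
    most frequent first. Unknown domains are grouped under 'other'."""
    counts = Counter(DOMAIN_TO_APP.get(registered_domain(u), "other") for u in urls)
    total = sum(counts.values())
    return [(app, n, n / total) for app, n in counts.most_common()]


# Invented example links, not taken from the data set.
sample = [
    "https://h5.ele.me/hongbao/#sn=1",
    "https://h5.ele.me/hongbao/#sn=2",
    "https://pay.xiaojukeji.com/share",
    "https://example.com/coupon",
]
for app, n, share in app_shares(sample):
    print(f"{app}: {n} ({share:.0%})")
```

On the invented sample this prints Ele.me with 2 links (50%), followed by Didi and "other" at 25% each. Grouping unknown domains under "other" makes it easy to see how much of the data the lookup table still misses.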






Figure 2: Share of red envelopes for the 12 apps with the most red envelopes




As the figure shows, takeout red envelopes are the most numerous of all. Among food, clothing, housing, and transportation, "food" comes up most often. Ele.me red envelopes alone account for nearly half of the total, which does not match the conclusions of the 2016 and 2017 market share reports for takeout apps (the Ele.me and Meituan Waimai market shares). The reason is that the group chat's participants are limited in identity and region, so these statistics only reflect app market share within a small population.




Besides the apps above, red envelopes were also shared for: Qunar, uniBike, Dida Pinche, Aixianfeng, Yimixian, Ctrip, Missfresh, Happiness, Youku, Happy Xiaole, Airbnb, China Mobile, Zupool, and In stock. Note also that the JD red envelopes in Figure 2 include JD Mall, JD Daojia, and JD Finance, and the NetEase red envelopes include NetEase Yanxuan, Kaola Haigou, and Onmyoji.




The apps above cover most of China's mainstream O2O service apps and also reflect the consumption patterns of college students: carpooling, takeout, fresh-food delivery, online shopping, and entertainment are their main forms of consumption.




As Figure 1 shows, each red envelope is shared with an accompanying slogan. I analyzed the high-frequency words in these advertising slogans and turned them into the word cloud shown in Figure 3.
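The high-frequency word counts behind a word cloud can be sketched as follows. This is an illustration under stated assumptions: the real slogans are in Chinese, so a real run would need a word segmenter (jieba is a common choice) in place of the regex split used here, and the stopword list and sample slogans are hypothetical English stand-ins.

```python
import re
from collections import Counter

# Hypothetical stopword list for the English stand-in slogans below.
STOPWORDS = {"a", "and", "for", "of", "the", "to", "you", "your"}


def top_terms(slogans, k=10):
    """Return the k most frequent non-stopword tokens across all slogans.

    The regex split only suits space-delimited text; Chinese slogans would
    first need word segmentation.
    """
    counts = Counter()
    for text in slogans:
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(k)


slogans = [
    "Share the red envelope",
    "Grab a red envelope now",
    "A red envelope for you",
]
print(top_terms(slogans, k=2))
```

The resulting (term, count) pairs are what a word-cloud renderer takes as input; for example, the third-party wordcloud package accepts a frequency dictionary through `WordCloud.generate_from_frequencies`.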






Figure 3: Word cloud of red envelope slogans




Careful readers may notice that the slogans fall into several types: one promotes the app itself (and the services it provides); one advertises other brands, including films, TV series, and brand promotions; and another features the app's own celebrity endorsers, such as Ele.me's Wang Zulan and Kobe Bryant. I analyzed the proportions of these three kinds of red envelopes in three months, August 2015, August 2016, and August 2017, as shown in Figure 4.
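One way to compute the per-month proportions of the three slogan types is sketched below. The article does not say how slogans were labeled, so the keyword rules here are hypothetical placeholders (hand labeling may well be more accurate), and the sample records are invented.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules, checked in order; a slogan that matches
# no rule is treated as the app promoting itself.
RULES = [
    ("endorsement", ["wang zulan", "kobe"]),
    ("other_brand", ["movie", "tv series", "sponsored"]),
]


def classify(slogan: str) -> str:
    text = slogan.lower()
    for label, keywords in RULES:
        if any(k in text for k in keywords):
            return label
    return "self_promotion"


def monthly_proportions(records):
    """records: iterable of (year_month, slogan) pairs.
    Returns {year_month: {label: share of that month's slogans}}."""
    by_month = defaultdict(Counter)
    for month, slogan in records:
        by_month[month][classify(slogan)] += 1
    return {
        month: {label: n / sum(c.values()) for label, n in c.items()}
        for month, c in by_month.items()
    }


records = [
    ("2015-08", "Lunch deals on Ele.me"),
    ("2015-08", "Kobe says: eat well"),
    ("2016-08", "New movie in theaters now"),
]
print(monthly_proportions(records))
```

Checking the endorsement rule before the brand rule is a design choice: a slogan naming both a celebrity and a film is counted as an endorsement.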









Figure 4: Proportions of the three slogan types in August 2015, August 2016, and August 2017






Summer 2015 was the start of the rapid growth of O2O services. Their market shares were still low, so red envelopes mainly promoted the services themselves. By summer 2016, competition among O2O services (takeout and travel) had reached its most intense stage. Red envelope discounts were relatively large and more people shared them, so the proportion of ads for other brands rose significantly; red envelope advertising can serve as one of the revenue sources of O2O service providers. I have no industry experience, but I speculate that red envelope ads achieve higher exposure and click-through rates than some other forms of advertising. By summer 2017, the takeout and travel markets had settled, red envelope discounts shrank, and fewer people shared them, so most ads promoted the apps themselves. A common slogan was "The Xth person to claim gets the largest red envelope", designed to get people to click the link and spend.




Time dimension





Figure 5 shows the trend in the number of red envelopes for the top seven apps over the two years.
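Per-app monthly counts of this kind can be tallied as follows. This sketch assumes each record has already been reduced to a timestamp string and an app name (for example, with the domain lookup described earlier), and that timestamps use the format `%Y-%m-%d %H:%M:%S`; both are assumptions about the export, not facts from it, and the sample records are invented.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format of the exported "send time" column.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def monthly_counts(records, apps):
    """records: iterable of (timestamp, app) pairs.
    Returns {app: {'YYYY-MM': count}} for the requested apps only."""
    counts = Counter()
    for ts, app in records:
        if app in apps:
            month = datetime.strptime(ts, TIME_FORMAT).strftime("%Y-%m")
            counts[(app, month)] += 1
    result = {app: {} for app in apps}
    for (app, month), n in sorted(counts.items()):
        result[app][month] = n
    return result


records = [
    ("2016-07-15 12:01:00", "Didi"),
    ("2016-07-20 08:00:00", "Didi"),
    ("2016-08-02 09:00:00", "Didi"),
    ("2016-08-02 11:30:00", "Ele.me"),
    ("2016-08-03 11:30:00", "Mobike"),
]
print(monthly_counts(records, ["Didi", "Ele.me"]))
```

Months with no red envelopes are simply absent from the result, so gaps should be filled with zero before plotting a trend line.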







Figure 5: Number of red envelopes for the top seven apps over two years





Several conclusions can be drawn from these trends. First, Ele.me and Meituan Waimai red envelopes dominate all takeout red envelopes. Before August 2016, Meituan Waimai red envelopes outnumbered Ele.me's; after that, Ele.me's far exceeded Meituan's. The change was not because Ele.me stepped up its promotion, but because most group members (college students in Beijing) moved from one campus to another, and Meituan Waimai had a larger presence at the original campus than at the new one. Likewise, the overall decline in the number of red envelopes after June 2017 is because most group members graduated from university and their demand for takeout fell. From another angle, this shows how unstable small-scale data can be.




Second, again for takeout red envelopes, we can see that in February 2016 and February 2017, during the Spring Festival and winter vacation, the number of red envelopes from every takeout app dropped significantly. Clearly, most group members went home for the new year, and demand for takeout fell sharply. Interestingly, the number of Didi red envelopes did not change significantly. One reason is the Spring Festival itself; another is that Didi had done a good job expanding into third- and fourth-tier cities.





Finally, the number of Didi red envelopes grew steadily until it peaked in July 2016, then fell continuously from August 2016 onward. I believe the decline had little to do with the group members. The main reason is that on August 1, 2016, Didi announced its acquisition of Uber China, leaving it without a major rival in China's ride-sharing market. I clearly remember that from then on, Didi's red envelopes were greatly reduced and Express rides got a minimum fare. On the one hand, the discounts shrank; on the other hand, some passengers who had been on the fence may have chosen other modes of transport. I think this is why Didi's share of red envelopes declined.









Figure 6: Number of red envelopes for Didi and ofo






Figure 6 singles out the trends in the number of Didi and ofo red envelopes so the changes can be seen more directly. Mobike is not included because few of its red envelopes were shared, so it would barely show on the chart. If summer 2015 was when ride-hailing began to grow rapidly, the figure shows that summer 2016 was when bike sharing took off. In fact, Didi started taxi hailing in 2012, and ofo began promoting shared bicycles on campuses as early as 2014. With the spread of 4G networks and smartphones, ever more WeChat users, and other factors combined, these travel O2O services began to develop rapidly after 2015.




Now let's narrow the time scale to a single day and look at how the number of travel and takeout red envelopes shared varies over the course of a day (Figure 7).
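A time-of-day breakdown like this amounts to a 24-bin histogram per category. The sketch below assumes records are (timestamp, category) pairs using the same hypothetical timestamp format as the export, and uses invented sample data.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed export format


def hourly_histogram(records):
    """records: iterable of (timestamp, category) pairs.
    Returns {category: list of 24 counts, index = hour of day}."""
    hist = {}
    for ts, category in records:
        hour = datetime.strptime(ts, TIME_FORMAT).hour
        hist.setdefault(category, [0] * 24)[hour] += 1
    return hist


def peak_hours(counts, k=2):
    """Return the k busiest hours, busiest first (ties keep earlier hours first)."""
    return sorted(range(24), key=lambda h: counts[h], reverse=True)[:k]


records = [
    ("2016-10-10 11:20:00", "food"),
    ("2016-10-11 11:45:00", "food"),
    ("2016-10-12 11:05:00", "food"),
    ("2016-10-10 18:30:00", "food"),
    ("2016-10-11 18:50:00", "food"),
    ("2016-10-12 15:00:00", "food"),
    ("2016-10-10 08:10:00", "travel"),
    ("2016-10-10 13:40:00", "travel"),
    ("2016-10-10 20:15:00", "travel"),
]
hist = hourly_histogram(records)
print(peak_hours(hist["food"]))  # busiest hours for takeout shares
```

Because Python's sort stays stable with `reverse=True`, hours with equal counts come out in clock order, which keeps the peak list deterministic.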








Figure 7: Number of travel and takeout red envelopes shared by time of day





Generally speaking, each red envelope shared online suggests that the sharer took the corresponding action offline, so changes in the number of shares reveal trends in behavior. For "food" and "travel", the statistics match common intuition: takeout shares are concentrated just before 12 noon and 7 p.m., while travel shares are distributed fairly evenly through the day.






Alipay New Year red envelopes




I believe most readers went through the barrage of Alipay's New Year red envelopes. As the group owner, alert to anything red envelope related, I noticed that Alipay red envelope codes began to appear in the group in mid-December and peaked at the end of the year. Figure 8 shows how the number of Alipay New Year red envelopes shared in the group changed from December 10, 2017 to January 10, 2018.
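Counting shares per day over a fixed window, including days with no shares, can be done as follows. This is a minimal sketch: it assumes the Alipay messages have already been picked out, reuses the hypothetical timestamp format from the export, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import date, datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed export format


def daily_counts(timestamps, start: date, end: date):
    """Return [(day, count)] for every day from start to end inclusive,
    with zero for days that have no messages."""
    counts = Counter(datetime.strptime(ts, TIME_FORMAT).date() for ts in timestamps)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=d) for d in range(n_days)]
    return [(day, counts[day]) for day in days]


timestamps = [
    "2017-12-10 09:00:00",
    "2017-12-10 21:00:00",
    "2017-12-12 08:00:00",
    "2018-01-20 10:00:00",  # outside the window, so it is ignored
]
series = daily_counts(timestamps, date(2017, 12, 10), date(2018, 1, 10))
print(len(series), series[:3])
```

Emitting explicit zero days matters for a chart like this one: without them, a plotting library would draw a straight line across quiet days instead of showing the dip.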









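Figure 8: Number of Alipay New Year red envelopes shared in the group, December 10, 2017 to January 10, 2018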





Because I was abroad, I did not join the scramble for red envelopes. What made me curious, though, was that from December 12 to December 22, 2017, the three characters for "Alipay" (支付宝) in shared red envelope messages appeared in many variant forms, which at first looked like scam messages. I analyzed all the Alipay variants from these ten days and turned them into the word cloud in Figure 9.
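One simple way to surface such variants automatically is to scan each message for three-character windows that differ from 支付宝 in exactly one position. This is a hedged sketch, not the author's method: it only catches single-character substitutions (not inserted spaces or symbols), it also flags ordinary words that share two characters, such as 支付方 inside 支付方式 ("payment method"), so the output needs a manual check, and the sample messages are invented.

```python
from collections import Counter

TARGET = "支付宝"  # "Alipay"


def near_variants(text: str, target: str = TARGET):
    """Yield windows of len(target) characters that differ from the target
    in exactly one position (a Hamming distance of 1)."""
    n = len(target)
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches == 1:
            yield window


def count_variants(messages):
    """Tally near-variants of the target across all messages."""
    counts = Counter()
    for msg in messages:
        counts.update(near_variants(msg))
    return counts


messages = ["打开支付寶领红包", "吱付宝红包", "支付宝红包"]
print(count_variants(messages))
```

The correctly spelled 支付宝 is deliberately excluded (zero mismatches), so the tally contains only the altered forms, which is what a variant word cloud needs.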






Figure 9: Word cloud of "Alipay" variants, December 12 to 22, 2017






The name "Alipay" appeared in ten variant forms. At first I speculated that this was meant to keep WeChat from tracking and blocking Alipay's messages. But I don't think such variations would stop WeChat from detecting the messages, and red envelopes shared before and after this period used the normal name. So I look forward to hearing from any friends who understand what was going on.






Summary





In short, a data set of 20,000 records is too small to support macro-level conclusions, and most of the conclusions here are obvious. Using this data set for further behavior prediction, such as user profiling, is also unrealistic. In addition, this data set is peculiar in that it is unique: unlike publicly available data such as Weibo posts, data like this can only be collected by people who organize such a group themselves. So even if the data set were large enough, models built on it would have little practical value.





Therefore, if I had enough group members and could collect their gender, occupation, and income, together with the time, type, and frequency of the red envelopes they share online, some interesting economic conclusions might be drawn. Further, if we could obtain whether each group member clicked each red envelope, that would add another data dimension: we could optimize red envelope sending by combining time, slogan, and click-through rate, and combine other data dimensions within the group for user profiling, behavior prediction, and so on. Of course, all of this presupposes enough group members and red envelopes. Under that premise, we could analyze the development of the O2O industry at a macro level and observe it from a new perspective.
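If per-member click data were available, as imagined above, a first step toward optimizing red envelope sending would be the click-through rate (clicks divided by impressions) grouped by time or slogan. The sketch below is purely hypothetical: the event fields `hour`, `slogan`, and `clicked` are invented, since the exported chat records contain no click information.

```python
from collections import defaultdict


def ctr_by(events, key):
    """Click-through rate per value of `key`.

    events: iterable of dicts with a boolean 'clicked' field; each event is
    one member seeing one shared red envelope (an impression).
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for e in events:
        shown[e[key]] += 1
        clicked[e[key]] += int(e["clicked"])
    return {k: clicked[k] / shown[k] for k in shown}


# Invented impressions for illustration only.
events = [
    {"hour": 11, "slogan": "largest envelope", "clicked": True},
    {"hour": 11, "slogan": "new movie", "clicked": False},
    {"hour": 18, "slogan": "largest envelope", "clicked": True},
    {"hour": 18, "slogan": "new movie", "clicked": True},
]
print(ctr_by(events, "hour"))
print(ctr_by(events, "slogan"))
```

Passing the grouping field as a parameter lets the same function compare sending times, slogans, or any other dimension added later.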





But these restrictions are not a problem for WeChat itself. Using its platform advantage, WeChat connects countless apps and, drawing on different data sources, can build more accurate user profiles for recommendation through collaborative filtering (Collaborative Filtering) and multi-view learning (Multi-view Learning). Seen from another angle, more and more of our behavior is being collected by BAT (Baidu, Alibaba, and Tencent), and people are becoming ever more transparent on the internet. Protecting privacy is therefore increasingly important; it cannot rely on self-discipline alone but also requires stronger legislation from the state.
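To make "collaborative filtering" concrete, here is a textbook user-based variant on a toy table of how often each member shared each app's red envelopes. It only illustrates the general technique, not how WeChat implements recommendation (which the article does not describe); the member names and counts are invented, and multi-view learning is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {item: weight} dicts."""
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def recommend(target, users, k=1):
    """User-based collaborative filtering: score items the target has not
    used by other users' similarity-weighted usage; return the top k items."""
    scores = {}
    for name, prefs in users.items():
        if name == target:
            continue
        sim = cosine(users[target], prefs)
        for item, w in prefs.items():
            if item not in users[target]:
                scores[item] = scores.get(item, 0.0) + sim * w
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented share counts per member and app.
users = {
    "alice": {"Ele.me": 3, "Didi": 1},
    "bob": {"Ele.me": 2, "Didi": 1, "ofo": 4},
    "carol": {"Mobike": 5},
}
print(recommend("alice", users))
```

Here alice's usage resembles bob's and not carol's, so ofo outranks Mobike. Real systems replace the toy table with large, sparse interaction matrices and more scalable similarity methods.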




My main takeaway from this analysis is the one-sidedness of small data sets. The point is not that data stops being small once it reaches millions of records; rather, we need to recognize the limitations of the data we have, avoid overgeneralizing, and try to obtain comprehensive, macro-level data whenever possible. This has some implications for data mining practitioners.




WeChat group chat records can be exported to a computer with the "Synchronization Assistant" tool, as a text document, a spreadsheet, or a web page. Combined with Excel and related Python toolkits, this makes it easy to mine WeChat group chat data, and readers can try mining their own chat records. I have also anonymized the data set used in this article and published it online for everyone to learn from and use.
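As a starting point for mining your own export, the sketch below reads records saved as CSV (for example, by saving the exported spreadsheet from Excel) into dictionaries and keeps the web-page messages that carry most red envelope links. The column names are hypothetical labels for the seven fields described earlier; the absence of a header row and the literal message type value "web page" are assumptions to check against a real export.

```python
import csv

# Hypothetical names for the seven exported columns, in the order described
# in the article: group number, send time, nickname, WeChat ID, direction,
# message type, content.
FIELDS = ["group_id", "time", "nickname", "wechat_id", "direction", "msg_type", "content"]


def load_records(lines):
    """Parse CSV lines (a file object or any iterable of strings, no header)
    into dicts keyed by FIELDS; rows missing the content column are dropped."""
    reader = csv.DictReader(lines, fieldnames=FIELDS)
    return [row for row in reader if row["content"]]


def web_page_links(records, page_type="web page"):
    """Return the content of web-page messages, where most red envelopes are
    shared. The page_type value is an assumed label; match it to the export."""
    return [r["content"] for r in records if r["msg_type"] == page_type]


sample = [
    "123,2016-08-02 11:30:00,Tom,tom01,receive,web page,https://h5.ele.me/hongbao/#sn=1",
    "123,2016-08-02 11:31:00,Amy,amy02,send,text,thanks",
]
records = load_records(sample)
print(web_page_links(records))
```

With a real file, pass an opened file object instead of the list, e.g. `open(path, newline="", encoding="utf-8")`, as the csv module recommends.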




Dataset download address:

http://ytongdou.com/wp-content/uploads/2018/01/W



