Measuring Corporate Innovation Based on LDA Topic Model
Ye Qin1, Cai Jianfeng1, Zhang Qiuyun2
(1. School of Management, Northwestern Polytechnical University; 2. School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China)
Abstract The Chinese government firmly adheres to a path of independent innovation with Chinese characteristics and implements an innovation-driven development strategy. As the main body of innovation, enterprises play a pivotal role in promoting national innovation and transformation, so research on corporate innovation has received extensive attention from the academic community. Scholars have carried out a variety of theoretical and empirical studies on corporate innovation and have made notable progress. However, the important question of how to measure corporate innovation accurately remains to be addressed. This problem is challenging for both academia and industry, especially against the background of China's national innovation-driven development strategy. The current mainstream proxy indicators of corporate innovation, such as patent counts and research and development (R&D) expenditures, have been criticized because they reflect only some aspects of corporate innovation while ignoring other vital parts of corporate innovation activities. This paper therefore develops a new method to measure corporate innovation comprehensively and accurately, based on text analysis using natural language processing techniques and machine learning algorithms.

This research introduces unsupervised machine learning and develops a new method of measuring corporate innovation by constructing a Latent Dirichlet Allocation (LDA) topic model on the text of analyst reports on listed companies. Analyst reports contain both objective description and professional evaluation of various aspects of corporate innovation, such as product innovation, process innovation, market innovation, and supply source innovation. In addition, the reports share similar text structure and wording, which lays a good foundation for LDA topic modeling. First, a program written in Python 3.8 automatically downloads all analyst reports issued for China's A-share listed companies from 2010 to 2019 from the Hexun Finance website, the Sina Finance website, and the Wind Financial Terminal, yielding a total of 201,569 analyst reports. After a series of data cleaning steps, 47,563 samples remain and are used as the corpus to train the LDA topic model, identify the corporate innovation topic, and calculate the load intensity of each analyst report on that topic. This load intensity is taken as the text-based measure of corporate innovation, since it reflects the extent to which an analyst report describes the company in terms of the innovation topic and thus reflects the company's innovation practice. The text-based measure produced by the new method is then compared with commonly used proxy indicators of corporate innovation.

This study finds that the text-based measure of corporate innovation is applicable to companies both with and without patents, and both with and without R&D expenditures. For firms with patents, text-based corporate innovation is significantly related to patent applications, while for firms without patents, the new measure effectively identifies innovative practices including but not limited to using new technologies and entering new markets. The same holds for R&D expenditures: for firms with R&D expenditures, text-based corporate innovation is significantly related to those expenditures, while for firms without them, it still captures corporate innovation activities. Time series analysis shows that text-based corporate innovation effectively reflects the macro trend of corporate innovation during the sample period.

This research has theoretical and practical significance: it not only systematically clarifies the incompleteness and inaccuracy of traditional proxy indicators of corporate innovation, but also develops a new method of measuring corporate innovation based on text analysis of analyst reports. It further broadens the application of big textual data in management and organization research.
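
The topic-loading step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which LDA library, inference method, hyperparameters, or topic-identification rule were used. The sketch assumes a small collapsed Gibbs sampler, a toy corpus of pre-tokenized English text (real analyst reports are in Chinese and would first need word segmentation), and a hypothetical seed-word rule for picking the innovation topic. The names `fit_lda`, `innovation_topic`, and `SEED_WORDS` are illustrative only.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (theta, phi, vocab): theta[d][k] is the share of document d
    attributed to topic k, and phi[k][v] is the probability of word v
    under topic k. Hyperparameter defaults are assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assign = []                                            # topic of every token

    def update(d, v, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][v] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            update(d, word_id[w], k, +1)

    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                update(d, v, assign[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                assign[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, v, assign[d][i], +1)

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


def innovation_topic(phi, vocab, seed_words):
    """Pick the topic that puts the most probability on the seed words."""
    scores = [sum(p for w, p in zip(vocab, row) if w in seed_words) for row in phi]
    return max(range(len(phi)), key=scores.__getitem__)


# Hypothetical seed words and toy "reports"; not taken from the paper.
SEED_WORDS = {"innovation", "technology", "patent", "product"}
docs = [
    "new product technology innovation patent launch".split(),
    "technology innovation new market product research".split(),
    "revenue profit margin earnings dividend cost".split(),
    "earnings revenue cost profit forecast margin".split(),
]

theta, phi, vocab = fit_lda(docs, n_topics=2)
k = innovation_topic(phi, vocab, SEED_WORDS)
loadings = [row[k] for row in theta]  # one load intensity per report
print(loadings)
```

Each entry of `loadings` plays the role of the load intensity of one report on the innovation topic. How per-report loadings would be aggregated to the firm or firm-year level is not specified in the abstract, so that step is left out here.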
Received: 17 April 2022