Mapping Global Research Output in Big Data during 2007-16

The paper examines global research in big data, as covered in the Scopus database during 2007-16, on a series of bibliometric indicators. The study finds that big data research registered exceedingly fast growth (135.2%), but averaged a low citation impact per paper (3.75) and produced a very low share of highly cited papers (0.86%) over the 10 years. The study reports publication trends in big data research by top countries, top institutions, top authors, top journals, major subject areas, publication modes, and country-level share of international collaborative publications. The study concludes that big data is a subject of recent origin. Given its major potential to impact business, governance, society, healthcare, industry and many other sectors, big data is fast emerging as a major discipline of interest and importance to nations, corporations, and institutions across developed and fast-emerging economies.

analytics available to all the parts of the organization that need them to discover valuable insights, make better decisions and solve actual business problems.

1-1-Literature Review
Of late, a number of bibliometric studies, national and global in scope, have been conducted. Among these, Halevi and Moed [4] analyzed publication data on big data from several perspectives: timeline, types of published papers, geographic output, disciplinary output, and thematic and conceptual development. They downloaded data from the Scopus database and, in conclusion, described the emergence of big data as a global research topic. Singh, Banshal, Singhal and Uddin [5] studied big data research output published during 2010-14, as covered in both Web of Knowledge and Scopus, for research growth, authorship patterns, country-level research collaboration patterns, major contributors (countries, institutions and individuals), top publication sources, and thematic and emerging themes in the field. Singh and Singh [6] mapped Indian research output in big data published during 2001-15, using the Scopus database, to understand the current status, growth, and collaboration trends in big data research and its diffusion in Indian scientific literature. Liu [7] analyzed big data research output (282 records from the SSCI database during 2005-15) to understand the distribution of research by publication year, growth pattern, top journals, top subject areas, top countries/territories, academic institutions, top authors, and the applicability of Lotka's law. Porter, Huang, Schuehle and Youtie [8] presented a meta-analysis of big data research activity, covering 7006 research publications since 2009 from the Web of Science database. Using "tech mining" (bibliometric and text analyses of research publication abstract record sets), the authors provided a research landscape of who is doing what, where, and when. Mathisen, Wienhofen and Roman [9] presented the current status of empirical research in big data by mapping the collected research (covering 1778 contributions) according to the labels variety, volume and velocity. They also identified application areas of big data. The authors concluded that the share of publications reporting empirical results is well below the average for computer science research as a whole. Kalantri et al. [10] analyzed 6572 papers in the big data field indexed in the Web of Science Core Collection database from 1980 to 19 March 2015 and reported publication trends by document type and language, year of publication, top countries, top journals, top research areas, and top authors.
A few other bibliometric studies covered only the application dimensions of big data in medical science. Liao, Liao, Lee, Li, Chiclana and Zeng [11] used visualization tools (GraphPad Prism 5, VOSviewer and CiteSpace) to identify annual trends, top authors, top journals, top institutes, country-level citations and h-index, keyword distribution, highly cited papers, and co-authorship status. Gua, Lia, Lia and Lianga [12] provided an overview of healthcare big data research, research hotspots and future research directions. Youtie, Porter and Huang [13] examined a dataset of 488 social science and humanities papers written about big data and concluded that eight sub-fields are important in framing social science research about big data. The big data field covering social science is evolving from general sociological considerations towards social science applications, issues, and privacy concerns.

1-2-Objectives
The study analyses the performance of global big data research during 2007-16, based on publications, citations and international collaborative publications covered in the Scopus database. In particular, the study focuses on: the growth characteristics and pattern of world research output and the citations it received; the global publication output, share and citations of the top 12 most productive countries; the international collaboration share of the top 12 most productive countries; the subject-wise distribution of global research output and its growth and decline; the identification of significant keywords; the publication productivity and citation impact of the 100 most productive organizations and authors; the leading media of communication; and the characteristics of the 96 highly cited papers.

2-Methodology
The global research output on big data was identified, retrieved and downloaded from the Scopus database (http://www.scopus.com) for 2007-16, using a well-formulated search strategy. The search strategy used the term "big data" in the "keyword tag", "article title tag", and "source title tag", and restricted the search output to the period 2007-16 in the "date range tag". This main search string was further restricted to individual countries by name in the "country tag" to ascertain the publication output of the top 12 most productive countries in big data research. The main search string was also refined by "subject area tag", "country tag", "source title tag", "journal title name" and "affiliation tag", one by one, to determine the distribution of big data publication output by subject, collaborating country, author, organization and journal. Citations to publications were counted from the date of publication until 27 January 2018.
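The search strategy can be approximated as a Scopus advanced-search string. The sketch below is an assumption, not the authors' actual query: the paper names the tags but does not reproduce the string, so the field codes chosen here (KEY, TITLE, SRCTITLE, PUBYEAR, AFFILCOUNTRY) and the helper build_query are illustrative only.

```python
def build_query(term="big data", start=2007, end=2016, country=None):
    """Assemble a Scopus-style advanced search string (illustrative only)."""
    # Keyword, article title and source title, mirroring the three tags
    # named in the methodology.
    fields = ["KEY", "TITLE", "SRCTITLE"]
    topic = " OR ".join(f'{field}("{term}")' for field in fields)
    # Strict inequalities on PUBYEAR bracket the inclusive range start..end.
    query = f"({topic}) AND PUBYEAR > {start - 1} AND PUBYEAR < {end + 1}"
    if country:
        # Optional restriction to one country, as done for the top 12.
        query += f' AND AFFILCOUNTRY("{country}")'
    return query


print(build_query())
print(build_query(country="India"))
```

Refinements by subject area, source title or affiliation would follow the same pattern, each appending one further AND clause to the main string.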

3-1-Publications Distribution
Big data research registered 136.84% CAGR growth, cumulated 26566 publications globally in the 10 years 2007-16, and witnessed a big jump in annual output from just 2 publications in 2007 to 11104 in 2016. The second half of the study period (2012-16) accumulated 26477 publications, compared to just 89 during the first half (2007-11), an absolute five-year growth of 29649%. Big data research averaged 3.75 citations per paper since publication over the 10 years, and the citation impact of its five-year output dropped from 39.58 CPS in 2007-11 to 3.63 CPS in 2012-16 (Figure 1, Table 1).
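The growth figures can be checked with a short calculation. This is a minimal sketch under one assumption: the paper does not state how many compounding periods it used for the CAGR, and the value periods=10 is chosen here only because it comes close to the reported 136.84%. Counting the nine year-on-year intervals between 2007 and 2016 instead would give a higher rate.

```python
def cagr_percent(first, last, periods):
    """Compound annual growth rate, in percent."""
    return ((last / first) ** (1 / periods) - 1) * 100


def growth_percent(earlier, later):
    """Simple percentage growth from one block of years to the next."""
    return (later - earlier) / earlier * 100


annual_2007, annual_2016 = 2, 11104   # annual output reported in the text
first_half, second_half = 89, 26477   # 2007-11 and 2012-16 totals

# periods=10 is an assumption (see the note above), not stated in the paper.
cagr = cagr_percent(annual_2007, annual_2016, periods=10)
five_year = growth_percent(first_half, second_half)
total = first_half + second_half

print(f"CAGR: {cagr:.2f}%")
print(f"Five-year growth: {five_year:.0f}%")
print(f"Total publications: {total}")
```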

3-2-Top 12 Most Productive Countries
In all, 161 countries participated in big data research during 2007-16, but only 12 led the field, together accounting for 92.14% of global output; their individual shares varied between 2.40% and 27.98% of global output. The USA leads the world with a 27.98% global share, followed closely by China (24.58%). Together, the USA and China account for more than 50% of global output, followed by India (6.62%), U.K. (5.75%), Germany (5.11%), and others (Figure 2, Table 2).
Of the 12 top countries, six registered a relative citation index above the group average of 1.30: USA (1.88), UK (1.77), Canada (1.72), Australia (1.62), Italy (1.48) and Spain (1.31) during the period (Table 2). China, India, Germany, and South Korea, which rank among the most productive countries after the USA, failed to register an above-average relative citation index, highlighting a gap between the quantity and quality of their research.
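The relative citation index is not defined in this section. A common definition, assumed here, is a country's share of world citations divided by its share of world publications, which is equivalent to the country's citations per paper divided by the world average. The sketch below illustrates both forms with a hypothetical country; only the world publication count (26566) and world citations per paper (3.75) come from the study, and the function names are illustrative.

```python
def rci_from_shares(country_pubs, country_cites, world_pubs, world_cites):
    """RCI as share of world citations over share of world publications."""
    pub_share = country_pubs / world_pubs
    cite_share = country_cites / world_cites
    return cite_share / pub_share


def rci_from_cpp(country_cpp, world_cpp):
    """Equivalent form: country citations per paper over the world average."""
    return country_cpp / world_cpp


world_pubs = 26566                    # reported in the study
world_cpp = 3.75                      # reported in the study
world_cites = world_pubs * world_cpp  # implied by the average, not a reported total

# Hypothetical country, for illustration only (not taken from Table 2).
country_pubs, country_cites = 1000, 7500

by_shares = rci_from_shares(country_pubs, country_cites, world_pubs, world_cites)
by_cpp = rci_from_cpp(country_cites / country_pubs, world_cpp)

print(f"RCI from shares: {by_shares:.2f}")
print(f"RCI from CPP:    {by_cpp:.2f}")
```

Under this definition, a value above 1 means the country's papers are cited more often than the world average.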

3-4-Subject-Wise Distribution of Research Output
Global big data research cuts across several disciplines, as reflected in the Scopus database classification. Computer science is the most studied subject in big data research, accounting for the highest subject share (67.99%), followed by engineering (42.65%), social sciences (13.80%), and the other subjects covered in Table 3.

3-6-Significant Keywords
Around 69 significant keywords that highlight broad trends in big data research were identified from the literature. These keywords are listed in Table 4 in decreasing order of their occurrence during 2007-16.

3-7-Top 100 Organizations in Big Data Research
The top 100 most productive organizations originated from 15 countries. They contributed 60 to 347 publications each and together accounted for a 36.01% (9566) share of global publications and a 61.33% (61074) share of global citations during 2007-16. Of the 100 organizations, 70 originated from just two countries: 36 from the USA (with 3232 papers) and 34 from China (3801 papers). The rest originated from 13 countries, including 7 from Australia (613 papers) and 4 from Hong Kong (275 papers). Singapore registered the highest citation impact (14.23) and the highest h-index (15.50), and Hong Kong the highest share of international collaborative publications (69.82% of national output) (Figure 3, Table 5).
The top 100 organizations posted a constant decline in publication productivity, with the top 40 declining faster than the remaining 60. The top 100 organizations revealed an inconsistent trend in citation impact: the bottom 60 organizations (associated with lower productivity) posted higher citations per paper than the top 40 (associated with higher productivity). This implies that the top 40 organizations differ in terms of quality and quantity of research far more than the remaining 60. Furthermore, the productivity data reveal that the top 10 of the 100 organizations lead in publication productivity (Figure 4).
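The share and average-productivity figures for the top 100 organizations follow directly from the reported totals. The sketch below recomputes them as a consistency check; the citations-per-paper value it prints is derived from the reported totals rather than quoted from the study.

```python
world_pubs = 26566     # global publications, 2007-16
top100_pubs = 9566     # publications by the top 100 organizations
top100_cites = 61074   # citations received by the top 100 organizations
n_orgs = 100

# Share of global publications contributed by the top 100 organizations.
pub_share = top100_pubs / world_pubs * 100

# Group average productivity, used as the threshold for "above average".
avg_productivity = top100_pubs / n_orgs

# Derived here, not reported in the study: group citations per paper.
group_cpp = top100_cites / top100_pubs

print(f"Publication share: {pub_share:.2f}%")
print(f"Average productivity: {avg_productivity:.2f} papers per organization")
print(f"Citations per paper (derived): {group_cpp:.2f}")
```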

3-8-Top 100 Most Productive Authors in Big Data Research
A total of 9269 authors participated in big data research during 2007-16, of which 8452 contributed 1-5 papers each, 621 authors 6-10 papers each, 174 authors 11-20 papers each and 22 authors 21-47 papers each.
The top 100 most productive authors in big data research varied in their productivity from 13 to 47 publications in 10 years. Of these 100 authors, 24 were from the USA (with 422 papers), 19 from China (3410 papers), 12 from Australia (257 papers), 6 each from Italy and the U.K. (121 and 99 papers), and others. In terms of citation impact per paper, Sweden registered the highest impact (18.60), followed by Italy (12.59), Australia (12.41) and others (Table 8). The top 100 authors post a consistent fall in their productivity and citation impact as their ranking order drops from 1 to 100. These data reveal that the top 10 authors lead in publication productivity and citation impact (Figure 5).

4-Conclusion
The study provides a comprehensive description and analysis of big data research on a series of bibliometric indicators, covering research publications published across the world in the 10 years 2007-16. Big data research cumulated 26566 publications and averaged 3.75 citations per paper since publication during the period. In addition, the study reports publication trends in big data research by top countries, top institutions, top authors, top journals, and popular subject areas. The study also characterizes analytical outcomes on indicators such as average citations per paper, relative citation index, average productivity, and country-level international collaboration share.
The study concludes that big data is a subject of recent origin. Given its major potential to impact business, governance, society, healthcare, industry and many other sectors, big data has emerged as a major discipline. Within a decade, big data research has witnessed a big surge, growing at 135.6%. Top countries such as the USA, China, India, UK, and Germany have played a prominent role in the growth of big data research, even as 161 countries in all participated in and contributed to research in the field during the period. Top countries, top organizations, and top authors, however, differ in the qualitative dimensions of big data research as measured by relative citation index, citations per paper, and highly cited paper count. Highly cited papers are limited to less than 1 percent (96, 0.86%) of total big data research output in the world. Moreover, highly cited papers are concentrated in a select few countries such as the USA and China. The USA and China are the global leaders in big data research, whereas other high-productivity countries in this field still lag well behind.

Figure 3. Top 100 Organizations in Big Data Research by Country of Origin: 2007-16.

Figure 4. Top 100 Organizations in Big Data Research: Quantity vs Quality Performance.

Figure 5. Top 100 Authors in Big Data Research: Quantity vs Quality Performance.

A total of 1603 journals reported 7538 papers in big data research in the 10 years 2007-16, of which 1025 journals contributed 1-5 papers each, 426 journals 6-10 papers each, 109 journals 11-20 papers each, 18 journals 21-30 papers each, 24 journals 31-100 papers each and 1 journal 135 papers. Of the 1603 reporting journals, 25 accounted for a 16.61% share of the 7538 journal papers, each reporting 31 to 135 papers during the period. The most productive journal (with 135 papers) was Big Data, followed by International Journal of Applied Engineering Research (83 papers), Research & Development (72 papers), Future Generation Computer Systems (71 papers), IEEE Access (64 papers), etc.
These 96 highly cited papers involved the participation of 397 authors from 269 organizations. The top organizations contributing to the 96 highly cited papers include: University of California, Berkeley, USA, MIT, USA and Nanyang Technological University, Singapore (5 papers each); Harvard University, USA (4 papers); University of Cincinnati, OH, USA, Institute of Computing Technology, CAS, China, Tsinghua University, China, University of Toronto, Canada, University of Macau and Imperial College London, U.K. (3 papers each); University of Arizona, USA, Florida Atlantic University, USA, Cornell University, Ithaca, USA, Auburn University, USA, Duke University, USA, University of Southern California, USA, University of Pittsburgh, USA, University of Michigan, USA, University of California, San Diego, USA, Johns Hopkins University, USA, University of Science & Technology of China, China, Microsoft Research Asia, China, University of Melbourne, Australia, University of Wollongong, Australia and University College London, U.K. (2 papers each), etc. These 96 highly cited papers were published in 60 journals, with 3 papers each in Communications of the ACM, IEEE Access and Nature, 2 papers each in Big Data, Dialogues in Human Geography, Harvard Business Review, IEEE Communications Surveys & Tutorials, IEEE Intelligent Systems, IEEE Signal Processing Magazine, IEEE Transactions on Emerging Topics in Computing, International Journal of Production Economics, Journal of Parallel & Distributed Computing and MIT Sloan Management Review, and 1 paper each in 31 other journals.

Figure 2. Quantity & Quality Comparative Study of Top Most Productive Countries: 2007-16.

Table 3. Subject-Wise Break-up of Global Publications on Big Data Research during 2007-16.
*Research output overlaps across subjects.

Table 5. Publication and Citation Profile of Top 100 Organizations by Country of Origin: 2007-16.
Big data research involved the participation of a total of 6802 organizations globally, of which 5137 contributed 1-5 papers each, 766 organizations 6-10 papers each, 310 organizations 11-20 papers each, 230 organizations 21-40 papers each, 124 organizations 41-100 papers each and 235 organizations 101-348 papers each. On further analyzing the top 100 organizations, it was observed that 38 registered productivity above the group average of 95.66 publications per organization: Tsinghua University, China (347 papers), Beijing University of Posts and Telecommunications, China (220 papers), Ministry of Education, China (209 papers), Shanghai Jiao Tong University, China (182 papers), Wuhan University, China (163 papers), National University of Defense Technology, China (161 papers), Carnegie Mellon University, USA (142 papers), Massachusetts Institute of Technology, USA (139 papers), CNRS Centre National de la Recherche Scientifique, France (139 papers), Beihang University, China (136 papers), Peking University, China (136 papers), IBM Thomas J. Watson Research Center, USA (132 papers), etc. Among the 38 organizations, 15 each were from the USA and China, 3 from Australia, and 1 each from Canada, France, India, Singapore and the U.K. The scientific profiles of the top 20 most productive organizations are shown in