# Regression for citation data: An evaluation of different methods

@article{Thelwall2014RegressionFC, title={Regression for citation data: An evaluation of different methods}, author={Mike A Thelwall and Paul Wilson}, journal={ArXiv}, year={2014}, volume={abs/1510.08877} }

Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have…
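As a rough illustration of what regressing citation counts on a candidate factor can look like, the sketch below fits ordinary least squares to ln(1 + citations). The log transform with a +1 offset, the synthetic data, and the names `fit_log_citation_regression` and `n_authors` are assumptions made for this example; it is not presented as the strategy the paper recommends.

```python
import numpy as np


def fit_log_citation_regression(citations, predictors):
    """Fit OLS of ln(1 + citations) on the predictors.

    Returns an array [intercept, slope_1, ..., slope_k].
    Hypothetical helper for illustration only.
    """
    # log1p(c) = ln(1 + c), so uncited papers (c = 0) stay in the sample.
    y = np.log1p(np.asarray(citations, dtype=float))
    X = np.asarray(predictors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Prepend a column of ones for the intercept.
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic data (assumed): a continuous lognormal variable is floored to
# integers, giving skewed counts loosely resembling a discretised lognormal.
rng = np.random.default_rng(0)
n = 5000
n_authors = rng.integers(1, 6, size=n)  # hypothetical predictor, values 1..5
latent = np.exp(0.5 + 0.2 * n_authors + rng.normal(0.0, 1.0, size=n))
citations = np.floor(latent).astype(int)

coef = fit_log_citation_regression(citations, n_authors)
print("intercept, slope:", coef)
```

A slope estimated this way describes the change in ln(1 + citations) per unit of the predictor, so it is interpreted on the log scale rather than as a number of citations.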

#### 95 Citations

Stopped Sum Models for Citation Data

- Mathematics, Computer Science
- ISSI
- 2015

This article assesses stopped sum models for citation data and compares them with two previously used models, the discretised lognormal and negative binomial distributions, using the Akaike Information Criterion.

Citation count distributions for large monodisciplinary journals

- Computer Science, Mathematics
- J. Informetrics
- 2016

Fitting statistical distributions to 50 large subject-specific journals, in the belief that individual journals can be purer than subject categories and may therefore give clearer findings, suggests that the discretised lognormal is the more appropriate distribution for modelling pure citation data.

Stopped sum models and proposed variants for citation data

- Mathematics, Computer Science
- Scientometrics
- 2016

Based upon data from 20 Scopus categories, some of the stopped sum variant models had lower AIC values than the discretised lognormal models, which were otherwise the best (with respect to AIC).

The h index for research assessment: Simple and popular, but shown by mathematical analysis to be inconsistent and misleading

- Computer Science, Mathematics
- ArXiv
- 2020

In synthetic series, the number of citations and the mean number of citations are much better indicators of research performance than h and h/N, and it is discussed that this conclusion can be extended to real citation series.

Are the discretised lognormal and hooked power law distributions plausible for citation data?

- Mathematics, Computer Science
- J. Informetrics
- 2016

This article investigates the plausibility of the discretised lognormal and hooked power law distributions for modelling the full range of citation counts, with an offset of 1.0, and finds that both distributions fail a Kolmogorov–Smirnov goodness of fit test.

More precise methods for national research citation impact comparisons

- Mathematics, Computer Science
- J. Informetrics
- 2015

Two new methods to identify national differences in average citation impact are introduced, one based on linear modelling for normalised data and the other using the geometric mean, which has the advantage of distinguishing between national contributions to internationally collaborative articles.

The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression

- Mathematics, Computer Science
- J. Informetrics
- 2016

Comparisons of the discretised lognormal and the hooked power law with citation data are reported, adding 1 to citation counts in order to include zeros.

Double rank analysis for research assessment

- Mathematics, Computer Science
- J. Informetrics
- 2018

The double rank analysis is developed, in which publications that have a low number of citations are also included, in order to achieve the same purpose without restrictions.

Does quality and content matter for citedness? A comparison with para-textual factors and over time

- Computer Science
- J. Informetrics
- 2015

It is found that the JIF has a larger influence on the citation impact of a publication than the quality (measured by judgments of peers).

The application of citation count regression to identify important papers in the literature on non-audit fees

- Economics
- Managerial Auditing Journal
- 2019

Purpose: This paper aims to show that when conducting a literature review, important papers can be identified by regressing citation counts on prior publications’…

#### References

Universality of performance indicators based on citation and reference counts

- Computer Science, Physics
- Scientometrics
- 2012

This work demonstrates that comparisons can be made between publications from different disciplines and publication dates, regardless of their citation count and without expensive access to the whole world-wide citation graph.

Multiple regression analysis of a patent’s citation frequency and quantitative characteristics: the case of Japanese patents

- Computer Science
- Scientometrics
- 2013

The multiple regression analyses demonstrate that the number of classifications of cited patents contributes more to the regression than do other factors, which implies that, if confounding between factors is taken into account, it is the diversity of classifications assigned to backward citations that more largely influences the number of forward citations.

Distributions for cited articles from individual subjects and years

- Computer Science, Mathematics
- J. Informetrics
- 2014

The results show that the power law is not a suitable model for collections of articles from a single subject and year, even for the purpose of estimating the slope of the tail of the citation data, and only the hooked power law and discrete lognormal distributions should be considered for subject-and-year-based citation analysis in future.

Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal

- Computer Science, Medicine
- J. Assoc. Inf. Sci. Technol.
- 2010

A large-scale empirical analysis of journals from every field in Thomson Reuters' Web of Science database suggests that the discrete lognormal distribution is a globally accurate model for the distribution of “eventual impact” of scientific papers published in single-discipline journals in a single year that is removed sufficiently from the present date.

International coauthorship and citation impact: A bibliometric study of six LIS journals, 1980-2008

- Geography, Computer Science
- J. Assoc. Inf. Sci. Technol.
- 2011

This researcher proposes geographies of invisible colleagues and a geographic scope effect to further investigate the relationships between author geographic affiliation and citation impact.

How to calculate the practical significance of citation impact differences? An empirical example from evaluative institutional bibliometrics using adjusted predictions and marginal effects

- Mathematics, Computer Science
- J. Informetrics
- 2013

This study will explain what adjusted predictions and marginal effects are and how useful they are for institutional evaluative bibliometrics, and focus particularly on Average Adjusted Predictions (AAPs), Average Marginal Effects (AMEs), Adjusted Predictions at Representative Values (APRVs) and Marginal Effects at Representative Values (MERVs).

On determinants of citation scores: a case study in chemical engineering

- Computer Science
- 1994

Using multiple regression analysis, it is found that the factor ‘top‐author,’ i.e., the ‘personal variation’ contributes the largest number of citations.

Understanding journal usage: A statistical analysis of citation and use

- Computer Science
- J. Assoc. Inf. Sci. Technol.
- 2007

The regression results indicated that print journal use was a significant predictor of local journal citations prior to the adoption of online journals, and that publisher-provided and locally recorded online journal use measures were also significant predictors of local citations.

Modeling nonuniversal citation distributions: the role of scientific journals

- Economics, Physics
- ArXiv
- 2013

A model for citation networks via an intrinsic nodal weight function and an intuitive ageing mechanism is developed that addresses the intrinsic heterogeneity of a paper determined by the impact factor of the journal publishing it.

How well developed are altmetrics? A cross-disciplinary analysis of the presence of ‘alternative metrics’ in scientific publications

- Computer Science
- Scientometrics
- 2014

The main result of the study is that the altmetrics source that provides the most metrics is Mendeley, with metrics on readerships for 62.6% of all the publications studied; other sources only provide marginal information.