Scholar indices are intended to measure the contributions of authors to their fields of research. Jorge E. Hirsch suggested the h-index in 2005 as an author-level metric intended to measure both the productivity and citation impact of the publications of an author. An author has index h if h of his or her N papers have at least h citations each, and the other (N-h) papers have no more than h citations each.
In response to a comment, we will use our trusty RISmed package and the PubMed database to develop a script for calculating an h-index, as well as two similar metrics, the m-quotient and the g-index. Here is the code to conduct the search. The citation information is stored in the EUtilsSummary() object and retrieved with Cited().
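library(RISmed)

# Search PubMed by author name; the summary object also holds the citation counts
x <- "Yi-Kuo Yu"
res <- EUtilsSummary(x, type="esearch", db="pubmed", datetype='pdat', mindate=1900, maxdate=2015, retmax=500)
# One citation count per record returned by the search
citations <- Cited(res)
citations <- as.data.frame(citations)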
h-index
Calculating the h-index is just a matter of cleverly arranging the data. Above, we created a data frame with one column containing all the values of Cited() in our search. We will sort them in descending order, then make a new column with the index values. The highest index value that is less than or equal to the number of citations in its row is that author’s h-index. The following code will return that index number.
# Sort the citation counts from highest to lowest
citations <- citations[order(citations$citations,decreasing=TRUE),]
citations <- as.data.frame(citations)
# Add a numeric rank column (1 = most cited paper)
citations <- cbind(id=rownames(citations),citations)
citations$id <- as.numeric(as.character(citations$id))
# h-index: the largest rank whose paper has at least that many citations
hindex <- max(which(citations$id<=citations$citations))
hindex
# 12
Here is the data frame we created above that shows that Dr. Yi-Kuo Yu has an h-index of 12, since he has 12 publications with 12 or more citations.
citations

id  citations
 1        181
 2         62
 3         34
 4         31
 5         23
 6         19
 7         19
 8         18
 9         14
10         14
11         13
12         13
13         10
14          8
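The same calculation can be wrapped in a small function that works on any vector of citation counts, with no PubMed query needed. This is only a minimal sketch: `h_index` is a hypothetical helper name, not something RISmed provides, and dropping missing counts before ranking is an assumption on our part.

```r
# Hypothetical helper (not part of RISmed): h-index from a vector of citation counts.
h_index <- function(cites) {
  # Assumption: missing counts are dropped before the papers are ranked.
  cites <- sort(cites[!is.na(cites)], decreasing = TRUE)
  # With counts in descending order, the paper at rank i counts toward h
  # when it has at least i citations, so h is the number of such papers.
  sum(cites >= seq_along(cites))
}

# Citation counts copied from the 14 rows shown above.
cites <- c(181, 62, 34, 31, 23, 19, 19, 18, 14, 14, 13, 13, 10, 8)
h_index(cites)
# 12
```

Counting the rows that pass the test gives the same answer as max(which(...)) here because the counts are sorted, and it returns 0 rather than a warning when no paper qualifies.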
m-quotient
Although the h-index is a useful metric to measure an author’s impact, it has some disadvantages. For instance, a long, less impactful career will typically outscore a superstar junior scientist. To correct for this, the m-quotient divides the h-index by the number of years since the author’s first publication. In this sense it is a way to normalize by career span. The code below approximates that span as the number of years between the earliest and the most recent publication returned by the search.
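# Publication year of every record in the search
y <- YearPubmed(EUtilsGet(res))
low <- min(y)
high <- max(y)
# Career span: years between the first and the most recent publication
den <- high-low
mquotient <- hindex/den
mquotient
# 0.92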
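One edge case to keep in mind: if every paper in the search falls in the same year, the span is zero and the division is undefined. Below is a minimal sketch of a guarded version. The name `m_quotient` and the choice to return NA in that case are our own assumptions, and the publication years are made up for illustration.

```r
# Hypothetical helper: m-quotient from an h-index and a vector of publication years.
m_quotient <- function(h, years) {
  span <- max(years) - min(years)  # years between first and most recent paper
  # Assumption: report NA when all papers share a single year.
  if (span == 0) return(NA_real_)
  h / span
}

# Made-up publication years spanning 13 years, for illustration only.
years <- c(2002, 2005, 2009, 2015)
round(m_quotient(12, years), 2)
# 0.92
```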
g-index
Another weakness of the h-index is that it doesn’t give extra credit for highly cited publications. Once a paper has enough citations to count toward h, any further citations are ignored, so an author with a few very highly cited publications can end up with the same h-index as a relatively obscure author. The g-index was developed to address this situation. The g-index is the largest rank (where papers are arranged in decreasing order of the number of citations they received) such that the first g papers have (together) at least g^2 citations. Here is code to calculate the g-index.
citations$square <- citations$id^2
citations$sums <- cumsum(citations$citations)
# g-index: the largest rank g whose top g papers total at least g^2 citations
gindex <- max(which(citations$square<=citations$sums))
gindex
# 22
We made two new columns, one for the squares of the index column and one for the cumulative sum of the citations in descending order. Similar to the h-index, we need the index of the highest squared index value that is less than or equal to the cumulative sum. Our output with the two new columns below shows that Dr. Yu has a g-index of 22, well above his h-index of 12, largely because his top two publications are so heavily cited.
citations

id  citations  square  sums
 1        181       1   181
 2         62       4   243
 3         34       9   277
 4         31      16   308
 5         23      25   331
 6         19      36   350
 7         19      49   369
 8         18      64   387
 9         14      81   401
10         14     100   415
11         13     121   428
12         13     144   441
13         10     169   451
14          8     196   459
15          7     225   466
16          7     256   473
17          7     289   480
18          7     324   487
19          7     361   494
20          7     400   501
21          6     441   507
22          5     484   512
23          4     529   516
24          4     576   520
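As with the h-index, the g-index logic can be packaged as a function of a plain vector of citation counts. This is a minimal sketch under our own assumptions: `g_index` is a hypothetical helper, missing counts are dropped, and the result is capped at the number of papers in the vector.

```r
# Hypothetical helper: g-index from a vector of citation counts.
g_index <- function(cites) {
  # Assumption: missing counts are dropped before the papers are ranked.
  cites <- sort(cites[!is.na(cites)], decreasing = TRUE)
  ranks <- seq_along(cites)
  # Ranks g at which the top g papers together have at least g^2 citations.
  ok <- which(cumsum(cites) >= ranks^2)
  if (length(ok) == 0) 0L else max(ok)
}

# Citation counts copied from the 24 rows shown above.
cites <- c(181, 62, 34, 31, 23, 19, 19, 18, 14, 14, 13, 13,
           10, 8, 7, 7, 7, 7, 7, 7, 6, 5, 4, 4)
g_index(cites)
# 22
```

Row 22 is the last one that passes (484 is at most 512), while row 23 fails (529 exceeds 516), which matches the table.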
Check out the updated Shiny App to let the App do the work for you.