| Title: | How Fair Are You When You Publish/Cite Scientific Works? |
|---|---|
| Description: | Provides a user-friendly way to compute the non-profit and academic friendly ratio of the bibliographic reference list before submitting a manuscript for peer review. |
| Authors: | Nicolas Casajus [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-5537-5294>) |
| Maintainer: | Nicolas Casajus <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.0.0 |
| Built: | 2026-05-31 09:55:39 UTC |
| Source: | https://github.com/FRBCesab/fairpub |
This helper cleans DOIs (Digital Object Identifiers) by removing the prefix
(doi:, https://doi.org/ or http://dx.doi.org/) and converting to lower case.
fp_clean_doi(doi = NULL)
doi |
a |
A character vector of DOIs without prefix and in lower case.
dois <- c(
  "10.1098/rsos.160384",
  "10.1098/RSOS.160384",
  "doi: 10.1098/rsos.160384",
  "http://dx.doi.org/10.1098/rsos.160384",
  "https://doi.org/10.1098/rsos.160384",
  "HTTPS://DOI.ORG/10.1098/RSOS.160384",
  NA
)

fp_clean_doi(dois)
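The cleaning rule described for fp_clean_doi() can be sketched in base R. This is a minimal illustration, not the package's implementation: clean_doi_sketch() is a hypothetical helper, and its regular expression is an assumption that covers only the three prefixes listed in the description.

```r
# Hypothetical helper (not part of fairpub): strips the listed DOI prefixes
# and lower-cases the result. NA values are passed through unchanged.
clean_doi_sketch <- function(doi) {
  doi <- trimws(tolower(doi))
  # Assumed pattern: "doi:", "https://doi.org/" or "http(s)://dx.doi.org/"
  sub("^(https?://(dx\\.)?doi\\.org/|doi:[[:space:]]*)", "", doi)
}

cleaned <- clean_doi_sketch(
  c(
    "10.1098/RSOS.160384",
    "doi: 10.1098/rsos.160384",
    "HTTPS://DOI.ORG/10.1098/RSOS.160384",
    NA
  )
)
cleaned
```

Lower-casing before stripping the prefix means a single pattern handles both "https://doi.org/" and "HTTPS://DOI.ORG/".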
Scientific journals operate over a broad spectrum of publishing strategies, from strictly for-profit, to non-profit, and in-between business models (e.g. for-profit but academic friendly journals).
From a list of references, this function computes three citation ratios: the proportion of non-profit citations, the proportion of for-profit and academic friendly citations, and the proportion of for-profit and non-academic friendly citations (Beck et al. 2026).
It uses the OpenAlex bibliographic database (https://openalex.org) to retrieve journal names from article DOIs and the DAFNEE database (https://dafnee.isem-evolution.fr/) to get the business model and the academic friendly status of journals.
fp_compute_citation_ratio(doi = NULL)
doi |
a |
A list of two elements:
summary, a data.frame with two columns (metric and value)
reporting the following statistics:
number of total references (length of doi argument)
number of references with DOI
number of deduplicated references
number of references found in the OpenAlex database
number of references whose journal is indexed in the DAFNEE database
number of non-profit and academic friendly references
number of for-profit and academic friendly references
number of for-profit and non-academic friendly references
ratios, a vector of three ratios:
non-profit and academic friendly ratio
for-profit and academic friendly ratio
for-profit and non-academic friendly ratio
Beck M et al. (2026) Citation self-awareness for a fairer academic publishing landscape. BioScience. DOI: 10.1093/biosci/biag028
# Be polite and send your email to OpenAlex API ----
options(openalexR.mailto = '[email protected]')

# Path to the BibTeX provided by <fairpub> ----
filename <- system.file(
  file.path("extdata", "references.bib"),
  package = "fairpub"
)

# Extract DOI from BibTeX ----
doi_list <- fp_extract_doi(file = filename)

# Print DOI ----
head(doi_list)

## Not run:
# Compute citation ratio ----
fp_compute_citation_ratio(doi_list)

#> $summary
#>                                            metric value
#> 1                                Total references    38
#> 2                             References with DOI    33
#> 3                         Deduplicated references    33
#> 4                    References found in OpenAlex    33
#> 5                      References found in DAFNEE    11
#> 6     Non-profit and academic friendly references     9
#> 7     For-profit and academic friendly references     2
#> 8 For-profit and non-academic friendly references     0
#>
#> $ratios
#>     Non-profit and academic friendly     For-profit and academic friendly
#>                                 0.82                                 0.18
#> For-profit and non-academic friendly
#>                                 0.00

## End(Not run)
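Judging from the example output for fp_compute_citation_ratio(), each ratio appears to be the count for one fairness class divided by the number of references found in DAFNEE, rounded to two decimals. The sketch below only reproduces that arithmetic under this assumption; it is not taken from the package source.

```r
# Counts copied from the example summary (rows 6 to 8).
counts <- c(
  "Non-profit and academic friendly"     = 9,
  "For-profit and academic friendly"     = 2,
  "For-profit and non-academic friendly" = 0
)

# Assumed rule: divide by the number of references found in DAFNEE
# (here 9 + 2 + 0 = 11), then round to two decimals.
ratios <- round(counts / sum(counts), 2)
ratios
```

Under this reading, references that are missing from OpenAlex or DAFNEE do not enter the denominator.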
This function detects and extracts DOIs from bibliographic records. Users can
provide either a character vector (argument x) or the path to a BibTeX
file (argument file).
fp_extract_doi(x = NULL, file = NULL)
x |
a |
file |
a |
A character vector with the extracted DOIs. Some values can be NA in the case of
books, chapters, etc., or if references are malformed in the BibTeX.
# Argument 'x' (one DOI per element) ----
string <- c(
  "Beck M (2026) Citation self-awareness... 10.1093/biosci/biag028.",
  "Galtier N (2026) Time to publish... DOI: 10.32942/X24933",
  "Doe J (9999) Title... http://dx.doi.org/10.1162/qss(c)_00305",
  "Receveur A (2024) David vs Goliath... https://doi.org/10.1111/ele.14395",
  "Smith J (9999) This is a fake article."
)

## Extract DOI from a vector ----
fp_extract_doi(x = string)

# Argument 'x' (many DOI per element) ----
string <- paste(string, collapse = "\n")
cat(string)

## Extract DOI from a vector ----
fp_extract_doi(x = string)

# Argument 'file' ----

## Path to the BibTeX provided by <fairpub> ----
filename <- system.file(
  file.path("extdata", "references.bib"),
  package = "fairpub"
)

## Extract DOI from BibTeX ----
fp_extract_doi(file = filename)
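DOI detection in free-text references can be sketched with a regular expression in base R. The helper extract_doi_sketch() and its pattern are assumptions made for illustration, not the rule used by fp_extract_doi(). The sketch returns at most one DOI per element and yields NA when no DOI is found.

```r
# Hypothetical helper (not part of fairpub): returns the first DOI-like
# token of each element, or NA when none is found.
extract_doi_sketch <- function(x) {
  # Assumed pattern: "10.", 4 to 9 digits, "/", then any non-space characters
  pattern <- "10\\.[0-9]{4,9}/[^[:space:]]+"
  m <- regexpr(pattern, x)
  found <- !is.na(m) & m > 0
  out <- rep(NA_character_, length(x))
  out[found] <- regmatches(x, m)
  # Drop sentence punctuation that may trail the DOI
  sub("[.,;]+$", "", out)
}

refs <- c(
  "Beck M (2026) Citation self-awareness... 10.1093/biosci/biag028.",
  "Galtier N (2026) Time to publish... DOI: 10.32942/X24933",
  "Smith J (9999) This is a fake article."
)
extract_doi_sketch(refs)
```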
By querying the OpenAlex bibliographic database (https://openalex.org) and the DAFNEE database (https://dafnee.isem-evolution.fr/), this function returns the business model and the academic friendly status of an article (more precisely the fairness status of the journal).
fp_get_article_fairness(doi = NULL)
doi |
a |
A data.frame with two columns: journal, the journal name, and
fairness, the fairness status with the following possible values:
Non-profit and academic friendly
For-profit and academic friendly
For-profit and non-academic friendly
Record not found in OpenAlex
Record not found in DAFNEE database
# Be polite and send your email to OpenAlex API ----
options(openalexR.mailto = '[email protected]')

## Not run:
# Fairness status ----
fp_get_article_fairness(doi = "10.1126/science.162.3859.1243")

#>   journal                         fairness
#> 1 Science Non-profit and academic friendly

fp_get_article_fairness(doi = "10.1111/j.1461-0248.2005.00792.x")

#>           journal                         fairness
#> 1 Ecology Letters For-profit and academic friendly

fp_get_article_fairness(doi = "10.1038/35002501")

#>   journal                             fairness
#> 1  Nature For-profit and non-academic friendly

# Article not found in OA ----
fp_get_article_fairness(doi = "10.xxxx/xxxx")

#>   journal                     fairness
#> 1      NA Record not found in OpenAlex

# Journal not found in the DAFNEE database ----
fp_get_article_fairness(doi = "10.21105/joss.05753")

#>                               journal                            fairness
#> 1 The Journal of Open Source Software Record not found in DAFNEE database

## End(Not run)
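The fairness labels shown in the examples combine two journal attributes, the business model and the academic friendly status. The sketch below shows one way such a label could be assembled once the journal name is known. The helper fairness_label_sketch() and the lookup table dafnee_toy are hypothetical: the three rows mirror the example outputs, and the column values (non-profit/for-profit, yes/no) are assumed to follow the description of fp_list_dafnee_journals().

```r
# Toy lookup table, assumed layout: journal, business_model, academic_friendly.
dafnee_toy <- data.frame(
  journal           = c("Science", "Ecology Letters", "Nature"),
  business_model    = c("non-profit", "for-profit", "for-profit"),
  academic_friendly = c("yes", "yes", "no")
)

# Hypothetical helper (not part of fairpub): builds the fairness label
# for one journal name, or reports that the journal is not in the table.
fairness_label_sketch <- function(journal, table) {
  i <- match(tolower(journal), tolower(table$journal))
  if (is.na(i)) {
    return("Record not found in DAFNEE database")
  }
  model <- if (table$business_model[i] == "non-profit") {
    "Non-profit"
  } else {
    "For-profit"
  }
  friendly <- if (table$academic_friendly[i] == "yes") {
    "academic friendly"
  } else {
    "non-academic friendly"
  }
  paste(model, "and", friendly)
}

fairness_label_sketch("Nature", dafnee_toy)
fairness_label_sketch("The Journal of Open Source Software", dafnee_toy)
```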
By querying the OpenAlex bibliographic database (https://openalex.org) and the DAFNEE database (https://dafnee.isem-evolution.fr/), this function returns the business model and the academic friendly status of a journal.
fp_get_journal_fairness(journal = NULL)
journal |
a |
A data.frame with two columns: journal, the journal name, and
fairness, the fairness status with the following possible values:
Non-profit and academic friendly
For-profit and academic friendly
For-profit and non-academic friendly
Record not found in OpenAlex
Record not found in DAFNEE database
# Be polite and send your email to OpenAlex API ----
options(openalexR.mailto = '[email protected]')

## Not run:
# Fairness status ----
fp_get_journal_fairness("Science")

#>   journal                         fairness
#> 1 Science Non-profit and academic friendly

# Fuzzy search ----
fp_get_journal_fairness("Science of Nature")

#> No exact match found!
#> The fuzzy search returns these three best candidates:
#>   'The Science of Nature'
#>   'Science Advances'
#>   'People and Nature'

fp_get_journal_fairness("The Science of Nature")

#>                 journal                             fairness
#> 1 The Science of Nature For-profit and non-academic friendly

## End(Not run)
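The fuzzy search illustrated in the example can be approximated by ranking candidate journal names by edit distance to the query. This sketch uses utils::adist() from base R; the matching method actually used by fp_get_journal_fairness() is not documented here, so both the helper fuzzy_candidates_sketch() and the choice of distance are assumptions.

```r
# Hypothetical helper (not part of fairpub): returns the n journal names
# closest to the query according to a case-insensitive edit distance.
fuzzy_candidates_sketch <- function(query, journals, n = 3) {
  d <- utils::adist(query, journals, ignore.case = TRUE)[1, ]
  ranked <- journals[order(d)]
  ranked[seq_len(min(n, length(ranked)))]
}

# Journal names taken from the examples in this manual.
journals <- c(
  "Ecology Letters",
  "People and Nature",
  "Science Advances",
  "The Science of Nature"
)
fuzzy_candidates_sketch("Science of Nature", journals)
```

The closest candidate here is 'The Science of Nature', which differs from the query only by the leading "The ". The order of the remaining candidates depends on the distance measure, so it may differ from the package output.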
Queries the OpenAlex bibliographic database (https://openalex.org) to retrieve an author's identifier.
fp_get_openalex_author_id(author = NULL, n = 10)
author |
a |
n |
an |
A data frame with the following columns:
id, the OpenAlex author ID
display_name, the author name in OpenAlex
orcid, the ORCID identifier
works_count, the number of publications
## Not run:
# Be polite and send your email to OpenAlex API ----
options(openalexR.mailto = '[email protected]')

fp_get_openalex_author_id("Nicolas Casajus")

#>            id    display_name               orcid works_count
#> 1 A5004806463 Nicolas Casajus 0000-0002-5537-5294         102

fp_get_openalex_author_id("Nicolas Mouquet")

#>            id    display_name               orcid works_count
#> 1 A5001034207 Nicolas Mouquet 0000-0003-1840-6984         210

## End(Not run)
Queries the OpenAlex bibliographic database (https://openalex.org) to retrieve works associated with an OpenAlex author identifier. Optionally filters publication types and incomplete records.
fp_get_openalex_author_works(
  author_id = NULL,
  select = c("article", "review", "letter"),
  drop_na = TRUE
)
author_id |
a |
select |
a |
drop_na |
a |
This function is a wrapper around the OpenAlex API using the
openalexR package. Results are automatically standardized and
cleaned for downstream bibliometric analyses.
Some repositories and preprint servers (e.g. Zenodo, HAL, bioRxiv, figshare) may be excluded depending on the selected work types.
A data frame containing one row per work with the following columns:
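id, the OpenAlex work identifier
authors, the work (first) author
title, the work title
publication_year, the year of publication
source_display_name, the journal or source name
source_id, the OpenAlex source identifier
doi, the Digital Object Identifier
cited_by_count, the citation count in OpenAlex
type, the OpenAlex work type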
## Not run:
# Be polite and send your email to OpenAlex API ----
options(openalexR.mailto = '[email protected]')

fp_get_openalex_author_works("A5004806463")

#>            id                    authors                             title
#> 1 W7143431770  Brunno F. Oliveira et al.     Species range shifts often...
#> 2 W7153879999         Miriam Beck et al.    Citation self-awareness for...
#> 3 W4406766122 Érica Rievrs Borges et al.         Road‐river intersectio...
#> 4 W4415048605   Jonathan Bonfanti et al. Geographic, taxonomic and metr...
#> 5 W4411408576      Matthew McLean et al.       Conserving the beauty of...
#> 6 W4415113473     Nicolas Casajus et al.           forcis: An R package...
#>   publication_year                             source_display_name
#> 1             2026 Proceedings of the National Academy of Sciences
#> 2             2026                                      BioScience
#> 3             2025                      Applied Vegetation Science
#> 4             2025                                 Ecology Letters
#> 5             2025 Proceedings of the National Academy of Sciences
#> 6             2025             The Journal of Open Source Software
#>     source_id                     doi cited_by_count    type
#> 1  S125754415 10.1073/pnas.2515903123              1 article
#> 2  S121830084  10.1093/biosci/biag028              0 article
#> 3  S179963793      10.1111/avsc.70011              0 article
#> 4   S80967739       10.1111/ele.70220              1  review
#> 5  S125754415 10.1073/pnas.2415931122              1 article
#> 6 S4210214273     10.21105/joss.09217              0 article

## End(Not run)
Queries the OpenAlex bibliographic database (https://openalex.org) to retrieve metadata about publications matching a given title, including their DOI.
fp_get_openalex_doi(title = NULL, n = 10)
title |
a |
n |
an |
A data frame with the following columns:
display_name, the title of the publication
publication_year, the year of publication
source_display_name, the journal or source name
doi, the Digital Object Identifier (DOI)
## Not run:
# Be polite and send your email to OpenAlex API ----
options(openalexR.mailto = '[email protected]')

# Search for a full title ----
fp_get_openalex_doi(
  "Citation self-awareness for a fairer academic publishing landscape"
)

#>                                       display_name publication_year
#> 1 Citation self-awareness for a fairer academic...             2026
#>   source_display_name                    doi
#> 1          BioScience 10.1093/biosci/biag028

# Search for a partial title ----
fp_get_openalex_doi("Citation fairer academic landscape")

#>                                       display_name publication_year
#> 1     Strategic citations for a fairer academic...             2025
#> 2 Citation self-awareness for a fairer academic...             2026
#>                       source_display_name                       doi
#> 1 bioRxiv (Cold Spring Harbor Laboratory) 10.1101/2025.08.06.668908
#> 2                              BioScience    10.1093/biosci/biag028

## End(Not run)
Groups potentially duplicate bibliographic records by computing pairwise string distances between work titles and clustering similar items.
fp_identify_duplicate_works(
  data = NULL,
  string_dist = "lv",
  hclust_method = "single",
  threshold = 0.2
)
data |
a |
string_dist |
a |
hclust_method |
a |
threshold |
a |
Title similarity is computed after basic text normalization
(lowercasing, punctuation removal, whitespace trimming).
Distances are calculated using stringdist::stringdistmatrix() and
normalized by title length before hierarchical clustering.
This function does not remove duplicates but assigns a cluster identifier that can be used for downstream deduplication or grouping.
The input data.frame with an additional column:
Integer cluster identifier grouping similar titles.
## Not run:
df <- data.frame(
  title = c(
    "Deep Learning for NLP",
    "Deep learning for natural language processing",
    "Quantum Computing Basics"
  )
)

fp_identify_duplicate_works(df)

## End(Not run)
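The steps described for fp_identify_duplicate_works() (normalize titles, compute pairwise distances, normalize by title length, cluster, cut at a threshold) can be sketched with base R only. This is not the package's code: utils::adist() stands in for stringdist::stringdistmatrix(), dividing by the longer of the two titles is an assumed normalization, and the column name cluster is hypothetical.

```r
# Toy titles: the first two differ only in case and punctuation.
df <- data.frame(
  title = c(
    "Deep Learning for NLP",
    "Deep learning for NLP.",
    "Quantum Computing Basics"
  )
)

# Basic normalization: lower case, no punctuation, trimmed whitespace
norm <- trimws(gsub("[[:punct:]]", "", tolower(df$title)))

# Pairwise Levenshtein distances (base R stand-in for stringdist)
d <- utils::adist(norm)

# Assumed normalization: divide by the length of the longer title
max_len <- outer(nchar(norm), nchar(norm), pmax)
d <- d / pmax(max_len, 1)

# Single-linkage clustering, cut at the default threshold of 0.2
hc <- hclust(as.dist(d), method = "single")
df$cluster <- cutree(hc, h = 0.2)
df
```

The first two titles become identical after normalization and share a cluster, while the third one gets its own. Titles whose normalized distance exceeds the threshold stay in separate clusters.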
The DAFNEE database (Database of Academia‑Friendly Journals in Ecology and Evolution, https://dafnee.isem-evolution.fr/) provides the business model and the academic friendly status of several journals in the field of Ecology and Evolution.
The fairpub package provides a selection of 287 DAFNEE journals and
this function returns information about these journals.
fp_list_dafnee_journals()
A data.frame with three columns:
journal, the name of the journal
business_model, the business model of the journal (non-profit or
for-profit)
academic_friendly, the academic friendly status of the journal (yes or
no)
# List DAFNEE journals in fairpub ----
journals <- fp_list_dafnee_journals()

# Number of journals ----
nrow(journals)

# Preview of the outputs ----
head(journals)
Returns the set of work types recognized by the OpenAlex database and used for filtering bibliographic records.
fp_list_openalex_work_types()
These work types correspond to the classification system used by
OpenAlex to describe scholarly outputs. They can be used to filter
results in functions such as fp_get_openalex_author_works().
A character vector of valid OpenAlex work types.
fp_list_openalex_work_types()