Title: | Link Preprints And Publications By DOI |
---|---|
Description: | Links preprints to publications using the method described in Cabanac G, Oikonomidi T, Boutron I. "Day-to-day discovery of preprint-publication links". Scientometrics. 2021;1–20. DOI: 10.1007/s11192-021-03900-7. |
Authors: | Luke Zappia [aut, cre] (<https://orcid.org/0000-0001-7744-8565>, lazappi) |
Maintainer: | Luke Zappia <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2025-01-21 03:55:31 UTC |
Source: | https://github.com/lazappi/doilinker |
Calculate the Jaccard similarity between two strings. Strings are first tokenised, stop words are removed and tokens are stemmed.
calc_jaccard_similarity(string1, string2)
string1 |
First string to score similarity |
string2 |
Second string to score similarity |
Jaccard similarity score
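To make the scoring concrete, here is a minimal base R sketch of a Jaccard similarity over token sets. It is not the package's implementation: `jaccard_sketch`, its short stop word list and the crude trailing-"s" stemmer are all assumptions made for illustration, and the real function may tokenise and stem differently.

```r
# Hypothetical stand-in for calc_jaccard_similarity(); not the package's code.
jaccard_sketch <- function(string1, string2,
                           stop_words = c("a", "an", "and", "of", "the",
                                          "in", "for", "to")) {
  tokenise <- function(x) {
    # Split on anything that is not a lower-case letter or digit
    tokens <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    # Drop stop words (assumed list, setdiff also removes duplicates)
    tokens <- setdiff(tokens, stop_words)
    # Crude stemming stand-in: strip a trailing "s" from tokens longer than 3
    tokens <- ifelse(nchar(tokens) > 3, sub("s$", "", tokens), tokens)
    unique(tokens)
  }

  set1 <- tokenise(string1)
  set2 <- tokenise(string2)

  n_union <- length(union(set1, set2))
  if (n_union == 0) {
    return(NA_real_)
  }

  # Jaccard similarity: size of the intersection over size of the union
  length(intersect(set1, set2)) / n_union
}

jaccard_sketch("Cats and dogs", "dog cat")
```

A score of 1 means the two token sets are identical after preprocessing, and 0 means they share no tokens.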
Convert the authors table from Crossref into a single string. It has the form "LastName1, FirstName1; LastName2, FirstName2;...".
get_cr_authors_str(cr_authors)
cr_authors |
Authors data.frame from a Crossref query |
Character vector with author names
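The string format can be illustrated with a short base R sketch. It assumes the Crossref authors data.frame has `family` and `given` columns, which is an assumption about the input rather than something stated here, and `authors_str_sketch` is a hypothetical name.

```r
# Hypothetical stand-in for get_cr_authors_str(); column names are assumed.
authors_str_sketch <- function(cr_authors) {
  # "LastName, FirstName" for each author, joined with "; "
  paste(paste0(cr_authors$family, ", ", cr_authors$given), collapse = "; ")
}

authors <- data.frame(
  family = c("Cabanac", "Oikonomidi", "Boutron"),
  given  = c("G", "T", "I")
)

authors_str_sketch(authors)
```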
Return the ORCiD for the first author of a reference.
get_first_orcid(cr_authors)
cr_authors |
Authors data.frame from a Crossref query |
Character vector with ORCiD, or NA if not found
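A rough sketch of this lookup in base R is shown below. It assumes the authors data.frame stores ORCiDs in an `ORCID` column, possibly as full URLs, and that the first row is the first author. These are assumptions for illustration, as is the name `first_orcid_sketch`, and the package may handle the column differently.

```r
# Hypothetical stand-in for get_first_orcid(); the ORCID column is assumed.
first_orcid_sketch <- function(cr_authors) {
  # No ORCID column or no authors: nothing to return
  if (!"ORCID" %in% names(cr_authors) || nrow(cr_authors) == 0) {
    return(NA_character_)
  }

  # Assume the first row is the first author
  orcid <- as.character(cr_authors$ORCID[1])
  if (is.na(orcid)) {
    return(NA_character_)
  }

  # Strip a leading URL so only the identifier remains (assumed behaviour)
  sub("^https?://orcid\\.org/", "", orcid)
}

with_orcid <- data.frame(
  family = "Zappia",
  given  = "Luke",
  ORCID  = "https://orcid.org/0000-0001-7744-8565"
)
without_orcid <- data.frame(family = "Zappia", given = "Luke")

first_orcid_sketch(with_orcid)
first_orcid_sketch(without_orcid)
```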
Check whether a DOI is from a preprint server.
is_doi_preprint(doi, container_title)
doi |
DOI to check |
container_title |
Whether or not there is a container title (journal name) associated with the DOI |
Logical whether the DOI is from a preprint
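The exact rule used by `is_doi_preprint()` is not documented here. One plausible way to combine the two arguments is sketched below: treat a DOI as a preprint if its prefix belongs to a known preprint server, and otherwise fall back on whether a container title is missing. The prefix list, the fallback and the name `is_preprint_sketch` are all assumptions.

```r
# Hypothetical stand-in for is_doi_preprint(); the rule itself is assumed.
is_preprint_sketch <- function(doi, container_title,
                               preprint_prefixes = c("10.1101", "10.48550")) {
  # The DOI prefix is the part before the first "/"
  prefix <- sub("/.*$", "", doi)

  if (prefix %in% preprint_prefixes) {
    return(TRUE)
  }

  # Assumed fallback: no journal name suggests a preprint
  !isTRUE(container_title)
}

is_preprint_sketch("10.1101/133173", container_title = FALSE)
is_preprint_sketch("10.1007/s11192-021-03900-7", container_title = TRUE)
```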
Decide whether a result from a DOI query is a match.
is_result_match( query_title, result_title, query_orcid, result_orcid, query_authors, result_authors )
query_title |
Title of the query reference |
result_title |
Title of the result reference |
query_orcid |
ORCiD of the query first author |
result_orcid |
ORCiD of the result first author |
query_authors |
Authors string for the query |
result_authors |
Authors string for the result |
A result is a match if the similarity between titles is sufficiently high, or if the similarity is lower but the first author ORCiDs match or the first author names match.
Logical whether the result is a match or not
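The decision rule for `is_result_match()` can be sketched as a two-tier threshold check. The cut-offs `high` and `low` below are hypothetical placeholders, not the package's values, and the sketch takes a precomputed title similarity and two logical flags rather than the raw titles, ORCiDs and author strings.

```r
# Hypothetical stand-in for the is_result_match() rule; thresholds are assumed.
result_match_sketch <- function(title_sim, orcid_match, author_match,
                                high = 0.9, low = 0.5) {
  # Titles alone are similar enough
  if (title_sim >= high) {
    return(TRUE)
  }

  # Weaker title similarity needs support from the first author
  title_sim >= low && (isTRUE(orcid_match) || isTRUE(author_match))
}

result_match_sketch(0.95, orcid_match = FALSE, author_match = FALSE)
result_match_sketch(0.60, orcid_match = TRUE, author_match = FALSE)
result_match_sketch(0.60, orcid_match = FALSE, author_match = FALSE)
```

Requiring author evidence only in the middle band keeps near-identical titles from being rejected when ORCiDs are missing, while still filtering out loosely similar titles by different authors.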
Check if two author strings have the same first author.
same_first_author(authors_str1, authors_str2)
authors_str1 |
Character vector containing the first list of authors |
authors_str2 |
Character vector containing the second list of authors |
Last names of the first authors are compared. If these match then first names are checked and TRUE is returned if either the whole first name or first initial matches. If first and last names cannot be separated the whole author names are compared. Some simplification of characters in names is performed to improve matches.
Logical whether first authors are the same
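The comparison described for `same_first_author()` can be sketched in base R as follows. The sketch assumes author strings in the "LastName, FirstName; ..." form and uses a very light simplification (lower case, full stops removed). The package's own character simplification is likely more thorough, and `same_first_author_sketch` is a hypothetical name.

```r
# Hypothetical stand-in for same_first_author(); simplification is minimal.
same_first_author_sketch <- function(authors_str1, authors_str2) {
  # The first author is the text before the first ";"
  first_author <- function(x) trimws(strsplit(x, ";", fixed = TRUE)[[1]][1])
  # Minimal simplification: lower case, drop full stops, trim whitespace
  simplify <- function(x) trimws(gsub(".", "", tolower(x), fixed = TRUE))

  a1 <- first_author(authors_str1)
  a2 <- first_author(authors_str2)

  # Without a comma, first and last names cannot be separated
  if (!grepl(",", a1, fixed = TRUE) || !grepl(",", a2, fixed = TRUE)) {
    return(simplify(a1) == simplify(a2))
  }

  parts1 <- simplify(strsplit(a1, ",", fixed = TRUE)[[1]])
  parts2 <- simplify(strsplit(a2, ",", fixed = TRUE)[[1]])

  # Last names must match before first names are considered
  if (parts1[1] != parts2[1]) {
    return(FALSE)
  }

  # Whole first name or first initial must match
  isTRUE(parts1[2] == parts2[2] ||
           substr(parts1[2], 1, 1) == substr(parts2[2], 1, 1))
}

same_first_author_sketch("Zappia, Luke; Cabanac, G", "Zappia, L.; Boutron, I")
same_first_author_sketch("Zappia, Luke", "Cabanac, G")
```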
Search Crossref for references potentially linked to a DOI.
search_doi_links( doi, preprint = NULL, limit = 20, filter_matches = FALSE, verbose = TRUE )
doi |
DOI to query |
preprint |
Logical. Whether |
limit |
Number of Crossref results to return. Default is 20, max is 1000. |
filter_matches |
Logical. If |
verbose |
Logical. Whether or not to print progress messages. |
Based on the method described in "Day-to-day discovery of preprint–publication links" https://doi.org/10.1007/s11192-021-03900-7 and code from https://github.com/gcabanac/preprint-publication-linker. Query filters have been modified to allow reverse linking publications to preprints as well as preprints to publications.
tibble of potential links
search_doi_links("10.1101/133173")