papers.name module

papers.name.contains_initials(s)[source]
papers.name.deduplicate_words(words, separators)[source]

Remove name duplicates in a list of words

papers.name.has_only_initials(string)[source]
papers.name.is_fully_capitalized(s)[source]
papers.name.match_first_names(pair)[source]

Returns True when the given pair of first names is compatible.

>>> match_first_names(('A','Amanda'))
True
>>> match_first_names(('Amanda','Amanda'))
True
>>> match_first_names(('Alfred','Amanda'))
False
>>> match_first_names(('patrick','P'))
True
>>> match_first_names((None,'Iryna'))
True
>>> match_first_names(('Clément','Clement'))
True
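The doctests above can be reproduced by a much simpler rule than the module probably uses. The following is a minimal sketch, not the library's implementation: the names `_fold` and `match_first_names_sketch` are hypothetical, and it assumes single-word first names, treats a missing name as compatible with anything, and treats a one-letter name as an initial.

```python
import unicodedata


def _fold(word):
    # Hypothetical helper: strip diacritics, a trailing period, and case.
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.rstrip(".").lower()


def match_first_names_sketch(pair):
    a, b = pair
    # Assumption: a missing first name is compatible with any name.
    if a is None or b is None:
        return True
    a, b = _fold(a), _fold(b)
    if not a or not b:
        return True
    # Assumption: a single letter is an initial and only needs to match
    # the first letter of the other name.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b
```

Multi-word first names such as 'Robin J.' are deliberately out of scope for this sketch.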
papers.name.match_names(a, b)[source]

Returns a boolean: are these two names compatible? Examples:

('Robin', 'Ryder'), ('R.', 'Ryder'): True
('Robin J.', 'Ryder'), ('R.', 'Ryder'): True
('R. J.', 'Ryder'), ('J.', 'Ryder'): True
('R. K.', 'Ryder'), ('K.', 'Ryder'): False
('Claire', 'Mathieu'), ('Claire', 'Kenyon-Mathieu'): False

papers.name.most_similar_author(ref_name, authors)[source]

Given a name, returns the index of the most similar name in the authors list, or None if no name in the list is compatible.
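The selection logic described above can be sketched as an argmax over similarity scores. This is an illustration under stated assumptions, not the module's code: the similarity function is passed in explicitly, a score of 0 is assumed to mean "incompatible", and `toy_similarity` is a hypothetical stand-in for name_similarity.

```python
def most_similar_author_sketch(ref_name, authors, similarity):
    # Keep the index of the best strictly positive score seen so far.
    best_index, best_score = None, 0.0
    for index, candidate in enumerate(authors):
        score = similarity(ref_name, candidate)
        if score > best_score:
            best_index, best_score = index, score
    return best_index


def toy_similarity(a, b):
    # Hypothetical scoring: last names must agree; a full first-name
    # match beats a match on the first letter only.
    if a[1].lower() != b[1].lower():
        return 0.0
    if a[0].lower() == b[0].lower():
        return 1.0
    return 0.5 if a[0][:1].lower() == b[0][:1].lower() else 0.0


authors = [("Claire", "Mathieu"), ("R.", "Ryder"), ("Robin", "Ryder")]
best = most_similar_author_sketch(("Robin", "Ryder"), authors, toy_similarity)
missing = most_similar_author_sketch(("Amanda", "Brown"), authors, toy_similarity)
```

With the toy scoring, `best` is the index of the exact match and `missing` is None because no candidate scores above zero.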

papers.name.name_normalization(ident)[source]
papers.name.name_signature(first, last)[source]
papers.name.name_similarity(a, b)[source]

Returns a float: how similar are these two names? Examples:

>>> int(10*name_similarity(('Robin', 'Ryder'),('Robin', 'Ryder')))
8
>>> int(10*name_similarity(('Robin', 'Ryder'),('R.', 'Ryder')))
4
>>> int(10*name_similarity(('R.', 'Ryder'),('R.', 'Ryder')))
4
>>> int(10*name_similarity(('Robin J.', 'Ryder'),('R.', 'Ryder')))
3
>>> int(10*name_similarity(('Robin J.', 'Ryder'),('R. J.', 'Ryder')))
8
>>> int(10*name_similarity(('R. J.', 'Ryder'),('J.', 'Ryder')))
3
>>> int(10*name_similarity(('Robin', 'Ryder'),('Robin J.', 'Ryder')))
7
>>> int(10*name_similarity(('W. Timothy','Gowers'), ('Timothy','Gowers')))
7
>>> int(10*name_similarity(('Robin K.','Ryder'), ('Robin J.', 'Ryder')))
0
>>> int(10*name_similarity(('Claire', 'Mathieu'),('Claire', 'Kenyon-Mathieu')))
0
>>> int(10*name_similarity(('Amanda P.','Brown'),('Patrick','Brown')))
0
papers.name.name_unification(a, b)[source]

Returns the unified name of two matching names

Parameters:
  • a – the first name pair (pair of unicode strings)
  • b – the second name pair (likewise a pair of unicode strings)
Returns:

a unified name pair.

papers.name.normalize_last_name(last)[source]

Removes diacritics and hyphens from last names for comparison
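One common way to remove diacritics is Unicode decomposition followed by dropping combining marks. The sketch below assumes that approach and also assumes hyphens are replaced by spaces rather than deleted; the docstring does not say which, and the name `normalize_last_name_sketch` is hypothetical.

```python
import unicodedata


def normalize_last_name_sketch(last):
    # Decompose accented characters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", last)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    # Assumption: a hyphen becomes a space so compound names stay readable.
    return without_marks.replace("-", " ").strip()
```

Case is left untouched here; a comparison routine may well lowercase its inputs too.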

papers.name.normalize_name_words(w)[source]

If the word is an initial, ensures it is of the form “T.”, and recapitalizes fully capitalized words. Also converts things like “Jp.” to “J.-P.”. This function is to be called on first or last names only.

papers.name.num_caps(a)[source]

Number of capitalized letters
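A count like this is a one-liner in Python. The sketch below assumes "capitalized letters" means characters for which `str.isupper()` is true; the name `num_caps_sketch` is hypothetical.

```python
def num_caps_sketch(a):
    # Count the characters that are uppercase letters.
    return sum(1 for ch in a if ch.isupper())
```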

papers.name.parse_comma_name(name)[source]

Parses a name of the form “Last name, First name” into (first name, last name). Tries to do something reasonable if there is no comma.
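The comma case is straightforward; the fallback is not specified. The sketch below is one possible reading, with the assumption (not confirmed by the docstring) that without a comma the last whitespace-separated word is the last name. The name `parse_comma_name_sketch` is hypothetical.

```python
def parse_comma_name_sketch(name):
    name = name.strip()
    if "," in name:
        # Split on the first comma only: "Last, First" -> (First, Last).
        last, first = name.split(",", 1)
        return first.strip(), last.strip()
    # Assumed fallback: treat the final word as the last name.
    words = name.split()
    if not words:
        return "", ""
    return " ".join(words[:-1]), words[-1]
```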

papers.name.predsplit_backwards(predicate, words)[source]
papers.name.predsplit_forward(predicate, words)[source]
papers.name.rebuild_name(name_words, separators)[source]

Reconstructs a name string out of words and separators, as returned by split_name_words. len(name_words) = len(separators) + 1 is assumed.

Parameters:
  • name_words – The list of name words (without periods)
  • separators – The list of separators (‘’ or ‘-’).
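A minimal sketch of the reconstruction follows. It assumes an empty separator is rendered as a space and that one-letter words are initials that get their period back (since the words are passed without periods); neither detail is confirmed by the docstring, and the name `rebuild_name_sketch` is hypothetical.

```python
def rebuild_name_sketch(name_words, separators):
    if not name_words:
        return ""

    def render(word):
        # Assumption: a single letter is an initial and needs a period.
        return word + "." if len(word) == 1 else word

    parts = [render(name_words[0])]
    # separators[i] sits between name_words[i] and name_words[i + 1].
    for separator, word in zip(separators, name_words[1:]):
        parts.append("-" if separator == "-" else " ")
        parts.append(render(word))
    return "".join(parts)
```

For instance, the words ['J', 'P'] with the separators ['-'] are rendered as 'J.-P.' under these assumptions.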
papers.name.recapitalize_word(w, force=False)[source]

Turns a fully capitalized word into a word in which only the first character is capitalized. By default (force=False), this is done only if the word is fully capitalized.
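The behaviour can be sketched in a few lines. This version assumes that "fully capitalized" corresponds to `str.isupper()` and that force=True applies the transformation to any word; the name `recapitalize_word_sketch` is hypothetical.

```python
def recapitalize_word_sketch(w, force=False):
    # Only rewrite non-empty words that are all caps, unless forced.
    if w and (force or w.isupper()):
        return w[0].upper() + w[1:].lower()
    return w
```

Leaving mixed-case words alone by default preserves names such as 'McDonald', which forcing would flatten to 'Mcdonald'.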

papers.name.remove_final_comma(w)[source]

Remove all commas following words

papers.name.shallower_name_similarity(a, b)[source]

Same as name_similarity, but accepts differences in the last names. This heuristic is more costly, but it is only used to attribute an ORCID affiliation to the right author in papers fetched from ORCID (by the function that follows this one in the source).

papers.name.shorten_first_name(string)[source]
papers.name.split_name_words(string)[source]
Returns: A pair of lists. The first one is the list of words, the second is the list of separators (either ‘’ or ‘-’).
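One way to produce such a pair is to scan for runs of word characters and inspect the gap between consecutive words. The sketch below assumes that whitespace, periods and hyphens all delimit words, and that a gap containing a hyphen yields the separator '-'; the name `split_name_words_sketch` is hypothetical.

```python
import re


def split_name_words_sketch(string):
    words, separators = [], []
    previous_end = None
    # A word is any run of characters other than whitespace, '.' and '-'.
    for match in re.finditer(r"[^\s.\-]+", string):
        if previous_end is not None:
            gap = string[previous_end:match.start()]
            separators.append("-" if "-" in gap else "")
        words.append(match.group())
        previous_end = match.end()
    return words, separators
```

By construction the result has one separator fewer than words whenever at least one word is found, which matches the invariant rebuild_name assumes.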
papers.name.to_plain_name(name)[source]

Converts a Name instance to a (firstname, lastname) pair.

papers.name.unify_name_lists(a, b)[source]

Unify two name lists, by matching compatible names and unifying them, and inserting the other names as they are. The names are sorted by average rank in the two lists.

Returns: the unified list of pairs. The first component of each pair is the unified name (itself a pair); the second is the pair of indices in the original lists that this name was created from (None when there is no corresponding name in one of the lists).
papers.name.weight_first_name(word)[source]
papers.name.weight_first_names(name_pair)[source]
papers.name.zipNone(lstA, lstB)[source]

Like zip(), but pads the shorter list with None so that the list lengths match.
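The standard library already provides this padding behaviour through `itertools.zip_longest`, whose default fill value is None. A minimal equivalent sketch (the name `zip_none_sketch` is hypothetical, and returning a list rather than an iterator is an assumption):

```python
from itertools import zip_longest


def zip_none_sketch(lst_a, lst_b):
    # zip_longest pads the shorter input with None by default.
    return list(zip_longest(lst_a, lst_b))


padded = zip_none_sketch([1, 2, 3], ["a"])
```

Here `padded` pairs 1 with 'a' and pairs the remaining elements 2 and 3 with None.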