Data Types
TypedDict schema definitions used internally to structure Author, Publication, and Journal results.
- class scholarly2.data_types.Author[source]
Author object used to represent an author entry on Google Scholar. (When source is not specified, the field is present in all sources.)
- Parameters:
scholar_id – The id of the author on Google Scholar
name – The name of the author
affiliation – The affiliation of the author
organization – A unique ID of the organization (source: AUTHOR_PROFILE_PAGE)
email_domain – The email domain of the author (source: SEARCH_AUTHOR_SNIPPETS, AUTHOR_PROFILE_PAGE)
url_picture – The URL for the picture of the author
homepage – URL of the homepage of the author
citedby – The number of citations to all publications. (source: SEARCH_AUTHOR_SNIPPETS)
filled – The list of sections filled out of the total set of sections that can be filled
interests – Fields of interest of this Author (sources: SEARCH_AUTHOR_SNIPPETS, AUTHOR_PROFILE_PAGE)
citedby5y – The number of new citations in the last 5 years to all publications. (source: SEARCH_AUTHOR_SNIPPETS)
hindex – The h-index is the largest number h such that h publications have at least h citations. (source: SEARCH_AUTHOR_SNIPPETS)
hindex5y – The largest number h such that h publications have at least h new citations in the last 5 years. (source: SEARCH_AUTHOR_SNIPPETS)
i10index – This is the number of publications with at least 10 citations. (source: SEARCH_AUTHOR_SNIPPETS)
i10index5y – The number of publications that have received at least 10 new citations in the last 5 years. (source: SEARCH_AUTHOR_SNIPPETS)
cites_per_year – Breakdown of the number of citations to all publications over the years (source: SEARCH_AUTHOR_SNIPPETS)
public_access – Number of articles that are available and not available in accordance with public access mandates. (source: SEARCH_AUTHOR_SNIPPETS, AUTHOR_PROFILE_PAGE)
publications – A list of Publication objects. (source: SEARCH_AUTHOR_SNIPPETS)
coauthors – A list of coauthors (list of Author objects) (source: SEARCH_AUTHOR_SNIPPETS)
container_type – Used by the source code to identify whether this container object is an Author or a Publication object.
source – The place from which the author information is derived
- affiliation: str
- citedby: int
- citedby5y: int
- cites_per_year: Dict[int, int]
- coauthors: List
- container_type: str
- email_domain: str
- filled: List[str]
- hindex: int
- hindex5y: int
- homepage: str
- i10index: int
- i10index5y: int
- interests: List[str]
- name: str
- organization: int
- public_access: PublicAccess
- publications: List[Publication]
- scholar_id: str
- source: AuthorSource
- url_picture: str
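The hindex and i10index definitions above can be illustrated with a short worked example. This is a minimal sketch of the definitions only, not the way the library obtains these values; the helper names `h_index` and `i10_index` and the citation counts are hypothetical.

```python
from typing import List


def h_index(citations: List[int]) -> int:
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: List[int]) -> int:
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-publication citation counts, for illustration only.
counts = [25, 12, 10, 4, 3, 0]
print(h_index(counts))    # 4
print(i10_index(counts))  # 3
```

The 5-year variants (hindex5y, i10index5y) follow the same definitions, applied to citations received in the last 5 years.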
- class scholarly2.data_types.AuthorSource(value)[source]
Defines the source of the HTML that will be parsed.
Author page: https://scholar.google.com/citations?hl=en&user=yxUduqMAAAAJ
Search authors: https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=jordan&btnG=
Coauthors: From the list of co-authors from an Author page
- AUTHOR_PROFILE_PAGE = 'AUTHOR_PROFILE_PAGE'
- CO_AUTHORS_LIST = 'CO_AUTHORS_LIST'
- SEARCH_AUTHOR_SNIPPETS = 'SEARCH_AUTHOR_SNIPPETS'
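Because several Author fields are documented for specific sources only, it can help to check a field against the source before reading it. The sketch below mirrors a few of the source annotations from the Author parameter list; the local `AuthorSource` enum is a stand-in for the library class, and `FIELD_SOURCES` and `documented_for` are hypothetical names introduced here.

```python
from enum import Enum
from typing import Dict, Set


class AuthorSource(str, Enum):
    # Local stand-in mirroring the documented enum values.
    AUTHOR_PROFILE_PAGE = "AUTHOR_PROFILE_PAGE"
    CO_AUTHORS_LIST = "CO_AUTHORS_LIST"
    SEARCH_AUTHOR_SNIPPETS = "SEARCH_AUTHOR_SNIPPETS"


# A few source annotations copied from the Author parameter list.
# Fields that are absent from this map are documented for all sources.
FIELD_SOURCES: Dict[str, Set[AuthorSource]] = {
    "organization": {AuthorSource.AUTHOR_PROFILE_PAGE},
    "email_domain": {
        AuthorSource.SEARCH_AUTHOR_SNIPPETS,
        AuthorSource.AUTHOR_PROFILE_PAGE,
    },
    "interests": {
        AuthorSource.SEARCH_AUTHOR_SNIPPETS,
        AuthorSource.AUTHOR_PROFILE_PAGE,
    },
    "citedby": {AuthorSource.SEARCH_AUTHOR_SNIPPETS},
}


def documented_for(field: str, source: AuthorSource) -> bool:
    """Return True if the field is documented as present for this source."""
    sources = FIELD_SOURCES.get(field)
    return sources is None or source in sources


print(documented_for("name", AuthorSource.CO_AUTHORS_LIST))
print(documented_for("organization", AuthorSource.SEARCH_AUTHOR_SNIPPETS))
```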
- class scholarly2.data_types.BibEntry[source]
BibEntry: the bibliographic entry for a publication. (When source is not specified, the field is present in all sources.)
- Parameters:
pub_type – the type of entry for this bib (for example ‘article’) (source: PUBLICATION_SEARCH_SNIPPET)
bib_id – bib entry id (source: PUBLICATION_SEARCH_SNIPPET)
abstract – description of the publication
title – title of the publication
author – list of the author names that contributed to this publication
pub_year – the year the publication was first published
venue – the venue of the publication (source: PUBLICATION_SEARCH_SNIPPET)
journal – Journal Name
volume – number of years a publication has been circulated
number – NA number of a publication
pages – range of pages
publisher – The publisher’s name
citation – Formatted citation string, usually containing journal name, volume and page numbers (source: AUTHOR_PUBLICATION_ENTRY)
pub_url – url of the website providing the publication
- abstract: str
- author: str
- bib_id: str
- citation: str
- journal: str
- number: str
- pages: str
- pub_type: str
- pub_year: str
- publisher: str
- title: str
- venue: str
- volume: str
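To show how the BibEntry fields fit together, the following sketch renders a BibEntry-like dictionary as a BibTeX record. It is an illustration under stated assumptions: `BibEntrySketch` is a local subset of the documented fields, `to_bibtex` is a hypothetical helper rather than a library function, and the entry values are placeholders.

```python
from typing import TypedDict


class BibEntrySketch(TypedDict, total=False):
    # Local subset of the documented BibEntry fields; all are optional here.
    pub_type: str
    bib_id: str
    title: str
    author: str
    journal: str
    pub_year: str


def to_bibtex(bib: BibEntrySketch) -> str:
    """Render the fields that are present as a BibTeX record."""
    lines = ["@{}{{{},".format(bib.get("pub_type", "misc"), bib.get("bib_id", "unknown"))]
    for key in ("title", "author", "journal"):
        value = bib.get(key)
        if value is not None:
            lines.append("  {} = {{{}}},".format(key, value))
    if "pub_year" in bib:
        # BibTeX uses "year" where BibEntry uses "pub_year".
        lines.append("  year = {{{}}},".format(bib["pub_year"]))
    lines.append("}")
    return "\n".join(lines)


# Placeholder values, for illustration only.
entry: BibEntrySketch = {
    "pub_type": "article",
    "bib_id": "example2020",
    "title": "An Example Title",
    "author": "A. Author and B. Author",
    "pub_year": "2020",
}
print(to_bibtex(entry))
```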
- scholarly2.data_types.CitesPerYear
Lightweight data structure holding the number of citations per year, keyed by year.
alias of Dict[int, int]
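Since CitesPerYear is a plain mapping from year to citation count, ordinary dictionary operations apply. A minimal sketch, with a hypothetical helper `citations_since` and made-up counts:

```python
from typing import Dict

# Same shape as the documented alias.
CitesPerYear = Dict[int, int]


def citations_since(cites_per_year: CitesPerYear, first_year: int) -> int:
    """Sum the citations received in first_year or later."""
    return sum(
        count for year, count in cites_per_year.items() if year >= first_year
    )


# Made-up counts, for illustration only.
history: CitesPerYear = {2019: 3, 2020: 5, 2021: 8, 2022: 13}
print(sum(history.values()))           # 29
print(citations_since(history, 2021))  # 21
```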
- class scholarly2.data_types.Journal[source]
Journal object used to represent a journal entry on Google Scholar. (When source is not specified, the field is present in all sources.)
- Parameters:
name – The name of the journal
h5_index – The h5-index is the h-index for articles published in the journal during the last 5 complete years.
h5_median – The h5-median for a publication is the median number of citations for the articles that make up its h5-index.
url_citations – The URL for the cached citations page of the journal
comment – String representing the ranking for the journal in various categories
- comment: str
- h5_index: int
- h5_median: int
- name: str
- url_citations: str
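The relationship between the h5-index and the h5-median can be made concrete with a small calculation. This sketch only illustrates the definitions; Google Scholar reports both numbers directly, and the helper name `h5_metrics` and the citation counts are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def h5_metrics(citations: List[int]) -> Tuple[int, float]:
    """Return (h5-index, h5-median) for articles from the last 5 complete years.

    The h5-median is the median citation count of the h articles
    that make up the h5-index.
    """
    ranked = sorted(citations, reverse=True)
    # In descending order, count >= rank holds for exactly the first h ranks.
    h = sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)
    core = ranked[:h]
    return h, (median(core) if core else 0)


# Hypothetical citation counts for a journal's recent articles.
print(h5_metrics([40, 30, 20, 10, 5, 2, 1]))  # (5, 20)
```

For an even-sized core, `statistics.median` averages the two middle values, so the result may be fractional, whereas the documented h5_median attribute is an int.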
- class scholarly2.data_types.Mandate[source]
Mandate: a funding mandate for a given year.
- Parameters:
agency – name of the funding agency
url_policy – url of the policy for this mandate
url_policy_cached – url of the policy cached by Google Scholar
effective_date – date from which the policy is effective
embargo – period within which the article must be publicly available
acknowledgement – text in the paper acknowledging the funding
grant – grant ID that supported this work
- acknowledgement: str
- agency: str
- effective_date: str
- embargo: str
- grant: str
- url_policy: str
- url_policy_cached: str
- class scholarly2.data_types.ProxyMode(value)[source]
Defines the supported proxy modes.
SOCKS5_PROXIES is the recommended mode. The remaining proxy modes are deprecated compatibility paths.
- FREE_PROXIES = 'FREE_PROXIES'
- LUMINATI = 'LUMINATI'
- SCRAPERAPI = 'SCRAPERAPI'
- SINGLEPROXY = 'SINGLEPROXY'
- SOCKS5_PROXIES = 'SOCKS5_PROXIES'
- TOR_EXTERNAL = 'TOR_EXTERNAL'
- TOR_INTERNAL = 'TOR_INTERNAL'
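Calling code that accepts a proxy mode may want to flag the deprecated ones. The sketch below uses a local stand-in for the documented enum; `RECOMMENDED_MODE` and `is_deprecated` are hypothetical names, not part of the library.

```python
from enum import Enum


class ProxyMode(str, Enum):
    # Local stand-in mirroring the documented enum values.
    FREE_PROXIES = "FREE_PROXIES"
    LUMINATI = "LUMINATI"
    SCRAPERAPI = "SCRAPERAPI"
    SINGLEPROXY = "SINGLEPROXY"
    SOCKS5_PROXIES = "SOCKS5_PROXIES"
    TOR_EXTERNAL = "TOR_EXTERNAL"
    TOR_INTERNAL = "TOR_INTERNAL"


RECOMMENDED_MODE = ProxyMode.SOCKS5_PROXIES


def is_deprecated(mode: ProxyMode) -> bool:
    """Every mode other than the recommended one is a deprecated path."""
    return mode is not RECOMMENDED_MODE


deprecated = [mode.value for mode in ProxyMode if is_deprecated(mode)]
print(deprecated)
```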
- class scholarly2.data_types.Publication[source]
Publication object used to represent a publication entry on Google Scholar. (When source is not specified, the field is present in all sources.)
- Parameters:
BibEntryCitation – contains additional information about the publication
gsrank – position of the publication in the query (source: PUBLICATION_SEARCH_SNIPPET)
author_id – list of the corresponding author ids of the authors that contributed to the Publication (source: PUBLICATION_SEARCH_SNIPPET)
num_citations – number of citations of this Publication
cites_id – This corresponds to a “single” publication on Google Scholar. Used in the web search request to return all the papers that cite the publication. If cites_id = 16766804411681372720 then: https://scholar.google.com/scholar?cites=<cites_id>&hl=en If the publication comes from a “merged” list of papers from an author's page, the “cites_id” will be a comma-separated list of values. It is also used to return the “cluster” of all the different versions of the paper: https://scholar.google.com/scholar?cluster=16766804411681372720&hl=en (source: AUTHOR_PUBLICATION_ENTRY)
citedby_url – This corresponds to a “single” publication on Google Scholar. Used in the web search request to return all the papers that cite the publication: https://scholar.google.com/scholar?cites=16766804411681372720&hl=en If the publication comes from a “merged” list of papers from an author's page, the “citedby_url” will be a comma-separated list of values. It is also used to return the “cluster” of all the different versions of the paper: https://scholar.google.com/scholar?cluster=16766804411681372720&hl=en
cites_per_year – a dictionary containing the number of citations per year for this Publication (source: AUTHOR_PUBLICATION_ENTRY)
eprint_url – digital version of the Publication. Usually it is a pdf.
pub_url – url of the website providing the publication
author_pub_id – The id of the paper on Google Scholar from an author page. Comes from the parameter “citation_for_view=PA9La6oAAAAJ:YsMSGLbcyi4C”. It combines the author id with a publication id. It may correspond to a merging of multiple publications, and therefore may have multiple “cites_id” values. (source: AUTHOR_PUBLICATION_ENTRY)
public_access – Boolean corresponding to whether the article is available or not in accordance with public access mandates.
mandates – List of mandates with funding information and public access requirements.
url_related_articles – the url containing link for related articles of a publication (needs fill() for AUTHOR_PUBLICATION_ENTRIES)
url_add_sclib – (source: PUBLICATION_SEARCH_SNIPPET)
url_scholarbib – the url containing links for the BibTeX entry, EndNote, RefMan and RefWorks (source: PUBLICATION_SEARCH_SNIPPET)
filled – whether the publication is fully filled or not
source – The source of the publication entry
container_type – Used by the source code to identify whether this container object is an Author or a Publication object.
- author_id: List[str]
- author_pub_id: str
- citedby_url: str
- cites_id: List[str]
- cites_per_year: Dict[int, int]
- container_type: str
- eprint_url: str
- filled: bool
- gsrank: int
- num_citations: int
- pub_url: str
- public_access: bool
- source: PublicationSource
- url_add_sclib: str
- url_scholarbib: str
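The identifier fields above map onto Google Scholar URLs in a regular way. The sketch below splits an author_pub_id into its two parts and builds the "cited by" and "cluster" URLs from the patterns quoted in the parameter descriptions. The function names are hypothetical helpers, not library functions, and joining multiple ids with commas follows the description of merged entries.

```python
from typing import List, Tuple
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def split_author_pub_id(author_pub_id: str) -> Tuple[str, str]:
    """Split '<author id>:<publication id>' into its two parts."""
    author_id, _, pub_id = author_pub_id.partition(":")
    return author_id, pub_id


def cited_by_url(cites_id: List[str]) -> str:
    """Build the search URL listing the papers that cite a publication.

    Merged entries carry several ids, joined here with commas.
    """
    query = urlencode({"cites": ",".join(cites_id), "hl": "en"}, safe=",")
    return f"{SCHOLAR_SEARCH}?{query}"


def cluster_url(cluster_id: str) -> str:
    """Build the URL listing all versions of a publication."""
    return f"{SCHOLAR_SEARCH}?{urlencode({'cluster': cluster_id, 'hl': 'en'})}"


print(split_author_pub_id("PA9La6oAAAAJ:YsMSGLbcyi4C"))
print(cited_by_url(["16766804411681372720"]))
print(cluster_url("16766804411681372720"))
```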
- class scholarly2.data_types.PublicationSource(value)[source]
Defines the source of the publication. In general, a publication on Google Scholar has two forms:
* Appearing as a PUBLICATION SNIPPET, and
* Appearing as a paper in an AUTHOR PAGE
“PUBLICATION SEARCH SNIPPET”. This form captures the publication when it appears as a “snippet” in the context of the results of a publication search. For example:
Publication search: https://scholar.google.com/scholar?hl=en&q=adaptive+fraud+detection&btnG=&as_sdt=0%2C33
The entries appear under the <div class="gs_r gs_or gs_scl"> tags. Each entry has a data-cid attribute (e.g., data-cid="pthm1bWT96oJ").
The same type of results will also appear when someone searches using the “cited by”, “related articles”, and “all XX versions” links that appear under the publication snippet.
“Cited By” link: https://scholar.google.com/scholar?cites=12319477714873931942&as_sdt=5,33&sciodt=0,33&hl=en
“Related Articles” link: https://scholar.google.com/scholar?q=related:pthm1bWT96oJ:scholar.google.com/&scioq=adaptive+fraud+detection&hl=en&as_sdt=0,33
“All versions” link: https://scholar.google.com/scholar?cluster=12319477714873931942&hl=en&as_sdt=0,33
The snippet version of these publications contains the information that appears in the results. Often, the snippet version will miss authors, will have an abbreviated name for the venue, and so on.
We can fill these snippets by clicking on the “Cite” button and getting back the MLA/APA/Chicago/… citation forms, plus links for BibTeX, EndNote, RefMan, and RefWorks.
“AUTHOR PUBLICATION ENTRY”
We also have publications that appear in the “author pages” of Google Scholar. These publications are often a set of publications “merged” together.
The snippet version of these publications contains the title of the publication, a subset of the authors, the (sometimes truncated) venue, the year of the publication, and the number of papers that cite the publication.
The snippet entries appear under the <tr class="gsc_a_tr"> entries in the main page of the author.
To fill in the publication, we open the “detailed view” of the paper.
Detailed view page: https://scholar.google.com/citations?view_op=view_citation&hl=en&citation_for_view=-Km63D4AAAAJ:d1gkVwhDpl0C
- AUTHOR_PUBLICATION_ENTRY = 'AUTHOR_PUBLICATION_ENTRY'
- JOURNAL_CITATION_LIST = 'JOURNAL_CITATION_LIST'
- PUBLICATION_SEARCH_SNIPPET = 'PUBLICATION_SEARCH_SNIPPET'
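The example URLs above differ in path and query parameters, which suggests a simple way to tell the two main forms apart. The following is a rough heuristic for illustration only, not how the library assigns a source: the local `PublicationSource` enum is a stand-in, and `query_param` and `guess_publication_source` are hypothetical helpers.

```python
from enum import Enum
from typing import Optional
from urllib.parse import parse_qs, urlparse


class PublicationSource(str, Enum):
    # Local stand-in mirroring the documented enum values.
    AUTHOR_PUBLICATION_ENTRY = "AUTHOR_PUBLICATION_ENTRY"
    JOURNAL_CITATION_LIST = "JOURNAL_CITATION_LIST"
    PUBLICATION_SEARCH_SNIPPET = "PUBLICATION_SEARCH_SNIPPET"


def query_param(url: str, name: str) -> Optional[str]:
    """Return the first value of a query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


def guess_publication_source(url: str) -> Optional[PublicationSource]:
    """Heuristic: /scholar pages list snippets, view_citation pages are author entries."""
    path = urlparse(url).path
    if path == "/scholar":
        return PublicationSource.PUBLICATION_SEARCH_SNIPPET
    if path == "/citations" and query_param(url, "view_op") == "view_citation":
        return PublicationSource.AUTHOR_PUBLICATION_ENTRY
    return None


versions = "https://scholar.google.com/scholar?cluster=12319477714873931942&hl=en&as_sdt=0,33"
detail = (
    "https://scholar.google.com/citations?view_op=view_citation"
    "&hl=en&citation_for_view=-Km63D4AAAAJ:d1gkVwhDpl0C"
)
print(guess_publication_source(versions))
print(query_param(detail, "citation_for_view"))
```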