Title: | Detect Country Names in Documents |
---|---|
Description: | Detects country names in PDF documents imported with the package 'pdftools'. |
Authors: | Nicolas Casajus [aut, cre, cph] |
Maintainer: | Nicolas Casajus <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1 |
Built: | 2024-11-16 04:35:19 UTC |
Source: | https://github.com/FRBCesab/geoparser |
Detect country names
geoparser(x)
x | a character vector, with one element per page of the document |
A data.frame with the following four columns:
geographic_entity: the name of the country
n_pages: the total number of pages in the document
page: the page number
count: the number of occurrences of the country name on a given page
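As a minimal sketch of how such a return value might be used downstream, the base R snippet below totals the per-page counts for each country. The `res` data frame is built by hand to mimic the four documented columns; its values are illustrative assumptions, not actual `geoparser()` output.

```r
# Hypothetical result mimicking the four documented columns;
# the values are made up for illustration.
res <- data.frame(
  geographic_entity = c("United States", "United States", "Canada"),
  n_pages           = c(2L, 2L, 2L),
  page              = c(1L, 2L, 2L),
  count             = c(3L, 2L, 1L),
  stringsAsFactors  = FALSE
)

# Total number of mentions per country across all pages
totals <- aggregate(count ~ geographic_entity, data = res, FUN = sum)

# Sort from most to least mentioned
totals <- totals[order(-totals$count), ]
totals
```

Because the table holds one row per country and page, the same pattern works for other summaries, for example counting the number of distinct pages on which each country appears.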
    ## Example document ----
    texte <- c(
      " Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt US labore et dolore magna aliqua. USA enim ad minim veniam, quis nostrud exercitation ullamco laboris United States ",
      " Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt US labore et dolore magna aliqua. USA enim ad minim veniam, quis nostrud exercitation ullamco laboris Canada. "
    )

    ## Detect countries ----
    geoparser(texte)
This dataset lists 256 world countries (geographic entities) according to the GADM database (https://gadm.org).
world_countries
A data.frame with 256 rows (geographic entities: official or unofficial countries) and the following 10 variables:
the ISO 3166 alpha 2 code of the country
the ISO 3166 alpha 3 code of the country
the ISO 3166 numeric code of the country
the name of the country (geographic entity)
the name of the country recognized by the UN
the formal name of the geographic entity
the regular expression used to detect the entity in a full text
the name of the continent where the entity is located
the name of the UN region where the entity is located
the name of the UN subregion where the entity is located
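To illustrate how a regular-expression variable of this kind could be applied to page text, here is a minimal base R sketch. It is not the package's implementation: the `lookup` table, its column names `entity` and `pattern`, and the patterns themselves are assumptions made up for this example, and the helper `count_matches()` is hypothetical.

```r
# Hypothetical two-row lookup table; the column names and patterns
# are assumptions for illustration, not taken from world_countries.
lookup <- data.frame(
  entity  = c("United States", "Canada"),
  pattern = c("\\b(US|USA|United States)\\b", "\\bCanada\\b"),
  stringsAsFactors = FALSE
)

# One character string per page
pages <- c(
  "incididunt US labore. USA enim ad minim veniam, laboris United States",
  "incididunt US labore. USA enim ad minim veniam, laboris Canada."
)

# Count the matches of one pattern in one page of text
count_matches <- function(pattern, text) {
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1L) 0L else length(m)
}

# Build one row per (entity, page) pair with at least one match
rows <- list()
for (i in seq_len(nrow(lookup))) {
  for (p in seq_along(pages)) {
    n <- count_matches(lookup$pattern[i], pages[p])
    if (n > 0L) {
      rows[[length(rows) + 1L]] <- data.frame(
        geographic_entity = lookup$entity[i],
        n_pages           = length(pages),
        page              = p,
        count             = n,
        stringsAsFactors  = FALSE
      )
    }
  }
}
detected <- do.call(rbind, rows)
detected
```

The word boundaries (`\\b`) keep short codes such as "US" from matching inside longer words, which is one reason a per-entity pattern can be more reliable than a plain substring search.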
    data("world_countries")
    head(world_countries)