Package 'geoparser'

Title: Detect Country Names in Documents
Description: Detects country names in PDF documents imported with the package 'pdftools'.
Authors: Nicolas Casajus [aut, cre, cph]
Maintainer: Nicolas Casajus <[email protected]>
License: GPL (>= 2)
Version: 0.1
Built: 2024-11-16 04:35:19 UTC
Source: https://github.com/FRBCesab/geoparser

Detect country names

Description

Detects country names in a text and counts their occurrences on each page.

Usage

geoparser(x)

Arguments

x

a character vector in which country names will be detected (in the example below, one element per page).

Value

A data.frame with the following four columns:

  • geographic_entity: the name of the country

  • n_pages: the total number of pages in the document

  • page: the page number

  • count: the number of occurrences of the country name on the given page

Examples

## Example document ----
texte <- c(
  "
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod 
  tempor incididunt US labore et dolore magna aliqua. USA enim ad minim 
  veniam, quis nostrud exercitation ullamco laboris United States
  ",
  "
  Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod 
  tempor incididunt US labore et dolore magna aliqua. USA enim ad minim 
  veniam, quis nostrud exercitation ullamco laboris Canada.
  "
)

## Detect countries ----
geoparser(texte)
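
## Apply to a PDF document (sketch) ----
## Assumption: 'pdf_file' is a hypothetical path; no such file ships with the
## package. pdftools::pdf_text() returns a character vector with one element
## per page, which is the kind of input used in the example above.
pdf_file <- "path/to/document.pdf"
if (requireNamespace("pdftools", quietly = TRUE) && file.exists(pdf_file)) {
  pages <- pdftools::pdf_text(pdf_file)
  geoparser(pages)
}

## Summarise counts per country (sketch) ----
## Assumption: 'res' is a hand-written stand-in with the four documented
## columns; its values are illustrative, not actual geoparser() output.
res <- data.frame(
  geographic_entity = c("United States", "United States", "Canada"),
  n_pages           = c(2L, 2L, 2L),
  page              = c(1L, 2L, 2L),
  count             = c(3L, 2L, 1L)
)
## Sum the per-page counts to get one total per country
totals <- aggregate(count ~ geographic_entity, data = res, FUN = sum)
totals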

Dataset: World countries list

Description

This dataset contains a list of 256 world countries (geographic entities) according to the GADM database (https://gadm.org).

Usage

world_countries

Format

A data.frame with 256 rows (geographic entities, i.e. official or non-official countries) and the following 10 variables:

iso_alpha2

the ISO 3166 alpha-2 code of the country

iso_alpha3

the ISO 3166 alpha-3 code of the country

iso_num

the ISO 3166 numeric code of the country

geographic_entity

the name of the country (geographic entity)

sovereignty

the name of the country as recognized by the UN

formal_name

the formal name of the geographic entity

parser

the regular expression used to detect the entity in full text

continent

the name of the continent where the entity is located

un_region

the name of the UN region where the entity is located

un_subregion

the name of the UN subregion where the entity is located

Examples

data("world_countries")
head(world_countries)
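
## Use a 'parser'-style regular expression by hand (sketch) ----
## Assumption: 'pattern' is a hypothetical regular expression written for this
## example; the actual expressions are stored in world_countries$parser and
## may differ. This only illustrates how such a pattern could be counted per
## page with base R, not how geoparser() is implemented.
pattern <- "\\bCanada\\b"
pages   <- c("Lorem ipsum US dolor", "Lorem ipsum Canada dolor Canada")
## gregexpr() locates every match; regmatches() extracts them page by page
n_matches <- lengths(regmatches(pages, gregexpr(pattern, pages, perl = TRUE)))
n_matches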