Firstnames

Retrieving information about first names and occupations in Wikipedia via Wikidata.

Presentation

The idea is to study the data about given names from wikipedia.org, and in particular which occupations people with a given first name have. Scraping Wikipedia to extract first names and occupations would be difficult, because the data there is unstructured. Fortunately, the Wikidata project exists and gives access to structured data.
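To give an idea of what structured data makes possible, here is a minimal sketch of a SPARQL query listing people with a given first name and their occupations. It is not the query used in queries.R: the helper `build_firstname_query` is hypothetical, and it assumes the standard Wikidata properties P735 (given name), P106 (occupation) and P31 = Q5 (instance of human), plus the prefixes that the Wikidata Query Service predefines.

```r
# Hypothetical helper (not part of queries.R): builds a SPARQL query listing
# humans (P31 = Q5) whose given name (P735) carries the label `firstname`,
# together with their occupations (P106).
build_firstname_query <- function(firstname, lang = "en") {
  # Remove double quotes and backslashes so the name cannot break the string literal
  firstname <- gsub('["\\\\]', "", firstname, perl = TRUE)
  sprintf(
'SELECT ?person ?personLabel ?occupationLabel WHERE {
  ?givenName rdfs:label "%s"@%s .
  ?person wdt:P31 wd:Q5 ;
          wdt:P735 ?givenName ;
          wdt:P106 ?occupation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s". }
}', firstname, lang, lang)
}

# Usage (needs network access; relies on the prefixes predefined by the
# Wikidata Query Service):
# q <- build_firstname_query("Florian")
# res <- SPARQL::SPARQL(url = "https://query.wikidata.org/sparql", query = q)$results
```

For names with accented characters, the repository relies on the UTF-8 fixes in mySPARQL.R rather than on the stock `SPARQL()` function.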

The tools are written in R, and you can see them in action on shinyapps.io. The app provides visualizations and lets you download the data frame as a CSV file directly. You can also download the files and run them locally in R, either through the app (server.R and ui.R, with queries.R and mySPARQL.R in the same folder), or by calling the functions in queries.R (with mySPARQL.R) directly.
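Before launching the app locally, it can help to check that the four files sit in the same folder. The sketch below assumes that folder layout; `app_files_missing` is a hypothetical helper, while `shiny::runApp()` is the standard Shiny launcher.

```r
# Hypothetical helper: returns the names of the files the app needs
# that are not present in `dir` (an empty vector means all are there).
app_files_missing <- function(dir = ".") {
  required <- c("server.R", "ui.R", "queries.R", "mySPARQL.R")
  required[!file.exists(file.path(dir, required))]
}

# Usage (opens the app when everything is in place):
# missing <- app_files_missing("path/to/Firstnames")
# if (length(missing) == 0) shiny::runApp("path/to/Firstnames") else print(missing)
```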

You can knit the Firstnames.Rmd file to get a description of the functions and some examples.

Files and functions

- queries.R provides the main functions:
  - In particular, queryStream takes a first name as a string and returns the dataset.
  - queryStreamWithProgress does the same but also advances the progress bar in the app, so it is only useful within the Shiny app.
- mySPARQL.R is a rewrite of certain functions of the SPARQL package that adds UTF-8 support. This change should eventually be merged into the package, at which point this file will no longer be needed.
- server.R and ui.R are the files for the Shiny app deployed on shinyapps.io.
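Calling the functions directly could look like the sketch below. It assumes mySPARQL.R and queries.R are in the working directory and that the data frame returned by `queryStream` has an `occupation` column; that column name is a guess, so check `names(df)` first. `top_occupations` is a hypothetical helper written in base R only.

```r
# Hypothetical helper: counts how often each value of `col` appears
# and returns the `n` most frequent ones, most frequent first.
top_occupations <- function(df, col = "occupation", n = 10) {
  if (!col %in% names(df)) stop("column '", col, "' not found in data frame")
  counts <- sort(table(df[[col]]), decreasing = TRUE)
  head(counts, n)
}

# Usage (needs network access; the column name is an assumption):
# source("mySPARQL.R")
# source("queries.R")
# df <- queryStream("Florian")
# top_occupations(df)
```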

Packages needed

All the packages are provided on CRAN.

Minimal packages to retrieve the information:

WikidataR to get item and property information,
SPARQL for the main query,
dplyr, tidyr and magrittr for data manipulation and cleaning.

Packages for data visualization:

stringr for string manipulation,
wordcloud for word clouds,
ggplot2 and ggthemes for graphs.

Packages to run the app:

shiny and shinyjs
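To check which of the packages listed above are not installed yet, here is a small base R sketch; the helper `missing_packages` is hypothetical, and the package names are exactly those listed in this README.

```r
# Packages listed in this README
pkgs <- c(
  # minimal: querying, manipulation and cleaning
  "WikidataR", "SPARQL", "dplyr", "tidyr", "magrittr",
  # data visualization
  "stringr", "wordcloud", "ggplot2", "ggthemes",
  # running the app
  "shiny", "shinyjs"
)

# Hypothetical helper: returns the packages whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Usage:
# to_install <- missing_packages(pkgs)
# if (length(to_install) > 0) install.packages(to_install)
```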

Contribute

Do not hesitate to fork me and/or contact me for more information.

Enjoy!

FlorianGD/Firstnames