Getting boardgamegeek Data
October 30, 2017
R web scraping boardgames
Getting BGG Data
I’m going to walk through the steps I take to pull data from the boardgamegeek.com database and combine it into an analytics-ready data.frame. First I’ll explain how to use the API to pull a single game at a time, and then multiple games at once. Then I’ll show how I loop through multiple multi-game pulls, clean up the results, and merge them into a single data.frame.
Libraries used here
The workhorses here are httr/XML, which fetch and parse the data, and foreach, which helps me set up the loop to pull large amounts of data. The complete list of packages is:
foreach
iterators
parallel
doParallel
XML
httr
tibble
pander
I’m only attaching foreach, so I don’t have to use funky syntax for %dopar%. Everything else is called with the package:: prefix.
library(foreach)
Basic API Usage
Single Game
Calling the API
url <- 'http://www.boardgamegeek.com/xmlapi/boardgame/1'
Parsing the XML
xmlChar <- rawToChar(httr::GET(url)$content)
xmlParsed <- XML::xmlParse(xmlChar)
xmlRoot <- XML::xmlRoot(xmlParsed)
data <- XML::xmlSApply(xmlRoot,function(x) XML::xmlSApply(x, XML::xmlValue))
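To see what the nested xmlSApply call returns without hitting the API, here is a minimal sketch on a hand-written XML string. The XML is a made-up, trimmed-down stand-in for a response (the assumed structure is a root element with one boardgame child per game), not actual API output, and the toy* variable names are hypothetical.

```r
# Hand-written stand-in for an API response (structure is an assumption).
xmlText <- paste0(
  '<boardgames><boardgame objectid="1">',
  '<yearpublished>1986</yearpublished>',
  '<minplayers>3</minplayers>',
  '<name>Die Macher</name>',
  '</boardgame></boardgames>'
)

toyParsed <- XML::xmlParse(xmlText, asText = TRUE)
toyRoot <- XML::xmlRoot(toyParsed)

# Outer xmlSApply walks the games; inner xmlSApply extracts each field's text.
toyData <- XML::xmlSApply(toyRoot, function(x) XML::xmlSApply(x, XML::xmlValue))

# With a single game the result simplifies to a character matrix with one row
# per field and one column per game, so t() gives one row per game.
t(toyData)
```

Every value comes back as text, including the numeric fields.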
Viewing the Data
keepColumns <- c("yearpublished", "minplayers", "maxplayers", "playingtime",
"minplaytime", "maxplaytime", "age", "name", "description",
"boardgamemechanic", "boardgamecategory", "boardgamepublisher")
pander::pander(t(data)[,keepColumns])
yearpublished | minplayers | maxplayers | playingtime | minplaytime |
---|---|---|---|---|
1986 | 3 | 5 | 240 | 240 |
maxplaytime | age | name | description |
---|---|---|---|
240 | 14 | Die Macher | Die Macher is a game about seven sequential political races in different regions of Germany. Players are in charge of national political parties, and must manage limited resources to help their party to victory. The winning party will have the most victory points after all the regional elections. There are four different ways of scoring victory points. First, each regional election can supply one to eighty victory points, depending on the size of the region and how well your party does in it. Second, if a party wins a regional election and has some media influence in the region, then the party will receive some media-control victory points. Third, each party has a national party membership which will grow as the game progresses and this will supply a fair number of victory points. Lastly, parties score some victory points if their party platform matches the national opinions at the end of the game. The 1986 edition featured 4 parties from the old West Germany and supported 3-4 players. The 1997 edition supports up to 5 players in the re-united Germany and updated several features of the rules as well. The 2006 edition also supports up to 5 players and adds a shorter 5 round variant and additional rules updates by the original designer. Die Macher is #1 in the Valley Games Classic Line |
boardgamemechanic | boardgamecategory | boardgamepublisher |
---|---|---|
Area Control / Area Influence | Economic | Hans im Glück Verlags-GmbH |
Multiple Games
url <- 'http://www.boardgamegeek.com/xmlapi/boardgame/1,2,3'
xmlChar <- rawToChar(httr::GET(url)$content)
xmlParsed <- XML::xmlParse(xmlChar)
xmlRoot <- XML::xmlRoot(xmlParsed)
data <- XML::xmlSApply(xmlRoot,function(x) XML::xmlSApply(x, XML::xmlValue))
dt <- tibble::as_tibble(t(sapply(data, function(x) x[keepColumns])))
pander::pander(dt)
yearpublished | minplayers | maxplayers | playingtime | minplaytime |
---|---|---|---|---|
1986 | 3 | 5 | 240 | 240 |
1981 | 3 | 4 | 30 | 30 |
1998 | 2 | 4 | 60 | 30 |
maxplaytime | age | name | description |
---|---|---|---|
240 | 14 | Die Macher | Die Macher is a game about seven sequential political races in different regions of Germany. Players are in charge of national political parties, and must manage limited resources to help their party to victory. The winning party will have the most victory points after all the regional elections. There are four different ways of scoring victory points. First, each regional election can supply one to eighty victory points, depending on the size of the region and how well your party does in it. Second, if a party wins a regional election and has some media influence in the region, then the party will receive some media-control victory points. Third, each party has a national party membership which will grow as the game progresses and this will supply a fair number of victory points. Lastly, parties score some victory points if their party platform matches the national opinions at the end of the game. The 1986 edition featured 4 parties from the old West Germany and supported 3-4 players. The 1997 edition supports up to 5 players in the re-united Germany and updated several features of the rules as well. The 2006 edition also supports up to 5 players and adds a shorter 5 round variant and additional rules updates by the original designer. Die Macher is #1 in the Valley Games Classic Line |
30 | 12 | Dragonmaster | Dragonmaster is a trick-taking card game based on an older game called Coup d’etat. Each player is given a supply of plastic gems, which represent points. Each player will get to be the dealer for five different hands, with slightly different goals for each hand. After all cards have been dealt out, the dealer decides which hand best suits his or her current cards, and the other players are penalized points (in the form of crystals) for taking certain tricks or cards. For instance, if “first” or “last” is called, then a player is penalized for taking the first or last tricks. All players will get a chance to be dealer for five hands, but other players can steal this opportunity by taking all of the tricks during certain hands. At the end, the biggest pile of gems wins the game. Jewel contents: 10 clear (2 extra) 14 green (2 extra) 22 red (2 extra) 22 blue (2 extra) |
60 | 10 | Samouraï | Part of the Knizia tile-laying trilogy, Samurai is set in medieval Japan. Players compete to gain the favor of three factions: samurai, peasants, and priests, which are represented by helmet, rice paddy, and Buddha figures scattered about the board, which features the islands of Japan. The competition is waged through the use of hexagonal tiles, each of which help curry favor of one of the three factions — or all three at once! Players can make lightning-quick strikes with horseback ronin and ships or approach their conquests more methodically. As each figure (helmets, rice paddies, and Buddhas) is surrounded, it is awarded to the player who has gained the most favor with the corresponding group. Gameplay continues until all the symbols of one type have been removed from the board or four figures have been removed from play due to a tie for influence. At the end of the game, players compare captured symbols of each type, competing for majorities in each of the three types. Ties are not uncommon and are broken based on the number of other, “non-majority” symbols each player has collected. |
boardgamemechanic | boardgamecategory | boardgamepublisher |
---|---|---|
Area Control / Area Influence | Economic | Hans im Glück Verlags-GmbH |
Trick-taking | Card Game | E.S. Lowe |
Area Control / Area Influence | Abstract Strategy | 999 Games |
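The multi-game request differs from the single-game one only in the comma-separated ID list at the end of the URL. A small helper makes that explicit. This is a sketch in base R; build_bgg_url is a hypothetical name, not part of any package.

```r
# Hypothetical helper: build the request URL for one or more game IDs.
build_bgg_url <- function(ids,
                          base = "http://www.boardgamegeek.com/xmlapi/boardgame/") {
  # as.integer() keeps large IDs from being pasted in scientific notation
  paste0(base, paste(as.integer(ids), collapse = ","))
}

build_bgg_url(1)
build_bgg_url(1:3)
```

The second call produces the same URL used for the three-game pull above.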
Big Data Pull
GetGames <- function(from, to) {
  url <- 'http://www.boardgamegeek.com/xmlapi/boardgame/'
  # Request every ID in from:to with a single comma-separated call
  xmlChar <- rawToChar(httr::GET(paste0(url, paste(from:to, collapse = ',')))$content)
  xmlParsed <- XML::xmlParse(xmlChar)
  xmlRoot <- XML::xmlRoot(xmlParsed)
  data <- XML::xmlSApply(xmlRoot, function(x) XML::xmlSApply(x, XML::xmlValue))
  keepColumns <- c("yearpublished", "minplayers", "maxplayers", "playingtime",
                   "minplaytime", "maxplaytime", "age", "name", "description",
                   "boardgamemechanic", "boardgamecategory", "boardgamepublisher")
  # One row per game, keeping only the columns of interest
  dt <- tibble::as_tibble(t(sapply(data, function(x) x[keepColumns])))
  # Pause for 5 seconds after each pull
  Sys.sleep(5)
  dt
}
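# Each pull covers a block of 200 consecutive game IDs
steps <- seq(1, 238201, by = 200)
nsteps <- length(steps)
nsteps_m1 <- nsteps - 1
# Column 1 is the first ID of each block; column 2 is the last ID (the next block's start minus 1)
controlMatrix <- matrix(c(steps[1:nsteps_m1], steps[2:nsteps]), ncol = 2)
controlMatrix[, 2] <- controlMatrix[, 2] - 1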
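The control matrix is easier to picture at a small scale. The sketch below builds the same kind of from/to table with a hypothetical helper, make_batches, and also caps the last block when the maximum ID is not a multiple of the block size (an extra the code above does not need, since 238200 divides evenly by 200).

```r
# Hypothetical helper: split the IDs 1..max_id into consecutive blocks.
make_batches <- function(max_id, batch_size) {
  from <- seq(1, max_id, by = batch_size)
  # The last block is capped at max_id if it would otherwise overshoot
  to <- pmin(from + batch_size - 1, max_id)
  cbind(from = from, to = to)
}

batches <- make_batches(1000, 200)
batches
```

Each row is one pull: the first covers IDs 1 to 200, the second 201 to 400, and so on.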
I’m just doing a few iterations here, rather than the full sweep. I’m not using .combine because there are some additional cleaning steps before the data can be merged.
# Running in parallel makes everything faster, but isn't necessary
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)
# Only the first 10 ID blocks are pulled here
allGames <- foreach(control = iterators::iter(controlMatrix[1:10,], by = 'row')) %dopar% {
  GetGames(control[1], control[2])
}
# Shut the workers down once the pull is finished
parallel::stopCluster(cl)
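Since parallelism is optional here, the same loop can be written sequentially with base lapply. The sketch below uses a hypothetical stand-in, fake_get_games, in place of GetGames so it runs without network access, and a three-row toy matrix in the same from/to layout as controlMatrix.

```r
# Stand-in for GetGames() so the sketch runs offline (hypothetical).
fake_get_games <- function(from, to) {
  data.frame(id = from:to)
}

# Three ID blocks in the same from/to layout as controlMatrix.
toyControl <- matrix(c(1, 201, 401, 200, 400, 600), ncol = 2)

# Sequential equivalent of the foreach loop: one list element per block.
toyGames <- lapply(seq_len(nrow(toyControl)), function(i) {
  fake_get_games(toyControl[i, 1], toyControl[i, 2])
})

sapply(toyGames, nrow)
```

Swapping fake_get_games for GetGames and toyControl for controlMatrix[1:10, ] should give the same kind of list as the parallel version, just more slowly.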
Not all IDs are valid
Because many of the board game IDs are invalid, we don’t get 200 games per pull.
hist(sapply(allGames, nrow), main = "Histogram of row counts")
Clean up some errors
Some of the ID ranges have no games present, so they return some weird garbage. Because there’s no data, the columns don’t end up with the expected names, so I check that the first column name is yearpublished to weed out those errors.
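is.error <- function(li) {
  # A valid pull has 'yearpublished' as its first column name;
  # identical() also copes with missing names instead of erroring
  !identical(names(li)[1], 'yearpublished')
}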
allGamesClean <- allGames[!sapply(allGames, is.error)]
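The filtering step can be tried on toy data. In the sketch below the three small data.frames are made up, with the middle one standing in for a pull that came back without the expected columns. The names is_bad_pull and toyPulls are hypothetical.

```r
# A pull is treated as bad when its first column is not 'yearpublished'.
is_bad_pull <- function(pull) !identical(names(pull)[1], "yearpublished")

# Made-up pulls: the second one mimics an ID range with no games in it.
toyPulls <- list(
  data.frame(yearpublished = "1986", name = "Die Macher"),
  data.frame(V1 = NA),
  data.frame(yearpublished = "1981", name = "Dragonmaster")
)

# Keep only the pulls that pass the check.
toyClean <- toyPulls[!sapply(toyPulls, is_bad_pull)]
length(toyClean)
```

Only the two well-formed pulls remain in toyClean.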
Make sure Columns are named correctly (because sometimes they aren’t)
This is just to make sure rbind runs peacefully.
properNames <- c("yearpublished", "minplayers", "maxplayers",
"playingtime", "minplaytime", "maxplaytime", "age",
"name", "description", "boardgamemechanic",
"boardgamecategory", "boardgamepublisher")
allGamesCleaner <- lapply(allGamesClean, function(x) {
names(x) <- properNames
x
})
Bind to a single data frame
I use do.call and rbind to turn the list of data.frames into a single data.frame.
gameData <- do.call(rbind, allGamesCleaner)
pander::pander(head(gameData))
yearpublished | minplayers | maxplayers | playingtime | minplaytime |
---|---|---|---|---|
1986 | 3 | 5 | 240 | 240 |
1981 | 3 | 4 | 30 | 30 |
1998 | 2 | 4 | 60 | 30 |
1992 | 2 | 4 | 60 | 60 |
1964 | 3 | 6 | 90 | 90 |
1989 | 2 | 6 | 240 | 240 |
maxplaytime | age | name |
---|---|---|
240 | 14 | Die Macher |
30 | 12 | Dragonmaster |
60 | 10 | Samouraï |
60 | 12 | Tal der Könige |
90 | 12 | Acquire |
240 | 12 | Mare Mediterraneum |
description |
---|
Die Macher is a game about seven sequential political races in different regions of Germany. Players are in charge of national political parties, and must manage limited resources to help their party to victory. The winning party will have the most victory points after all the regional elections. There are four different ways of scoring victory points. First, each regional election can supply one to eighty victory points, depending on the size of the region and how well your party does in it. Second, if a party wins a regional election and has some media influence in the region, then the party will receive some media-control victory points. Third, each party has a national party membership which will grow as the game progresses and this will supply a fair number of victory points. Lastly, parties score some victory points if their party platform matches the national opinions at the end of the game. The 1986 edition featured 4 parties from the old West Germany and supported 3-4 players. The 1997 edition supports up to 5 players in the re-united Germany and updated several features of the rules as well. The 2006 edition also supports up to 5 players and adds a shorter 5 round variant and additional rules updates by the original designer. Die Macher is #1 in the Valley Games Classic Line |
Dragonmaster is a trick-taking card game based on an older game called Coup d’etat. Each player is given a supply of plastic gems, which represent points. Each player will get to be the dealer for five different hands, with slightly different goals for each hand. After all cards have been dealt out, the dealer decides which hand best suits his or her current cards, and the other players are penalized points (in the form of crystals) for taking certain tricks or cards. For instance, if “first” or “last” is called, then a player is penalized for taking the first or last tricks. All players will get a chance to be dealer for five hands, but other players can steal this opportunity by taking all of the tricks during certain hands. At the end, the biggest pile of gems wins the game. Jewel contents: 10 clear (2 extra) 14 green (2 extra) 22 red (2 extra) 22 blue (2 extra) |
Part of the Knizia tile-laying trilogy, Samurai is set in medieval Japan. Players compete to gain the favor of three factions: samurai, peasants, and priests, which are represented by helmet, rice paddy, and Buddha figures scattered about the board, which features the islands of Japan. The competition is waged through the use of hexagonal tiles, each of which help curry favor of one of the three factions — or all three at once! Players can make lightning-quick strikes with horseback ronin and ships or approach their conquests more methodically. As each figure (helmets, rice paddies, and Buddhas) is surrounded, it is awarded to the player who has gained the most favor with the corresponding group. Gameplay continues until all the symbols of one type have been removed from the board or four figures have been removed from play due to a tie for influence. At the end of the game, players compare captured symbols of each type, competing for majorities in each of the three types. Ties are not uncommon and are broken based on the number of other, “non-majority” symbols each player has collected. |
When you see the triangular box and the luxurious, large blocks, you can tell this game was designed to be beautiful as well as functional. The object of the game is to build pyramids out of the different colored blocks. A pyramid scores more points when it’s made from a few colors, but it’s much harder to consistently outbid the other players for the necessary blocks. The game is over when the Pharoah’s Pyramid in the center is completed, which is built using all the blocks that the players don’t use during the course of the game. Final round 1990 Hippodice Spieleautorenwettbewerb. |
In Acquire, each player strategically invests in businesses, trying to retain a majority of stock. As the businesses grow with tile placements, they also start merging, giving the majority stockholders of the acquired business sizable bonuses, which can then be used to reinvest into other chains. All of the investors in the acquired company can then cash in their stocks for current value or trade them 2-for-1 for shares of the newer, larger business. The game is a race to acquire the greatest wealth. This Sid Sackson classic has taken many different forms over the years depending on the publisher. Some versions of the 3M bookshelf edition included rules for a 2-player variant. The original version is part of the 3M Bookshelf Series. Note: many books and websites list this as a 1962 publication. This is incorrect; information from Sid Sackson’s diaries, correspondence, and royalty statements prove that it was published in 1964. However, for some reason admins continue to accept “corrections” of the publication date to 1962. A detailed timeline of the development and publication of the game can be found at https://opinionatedgamers.com/2014/05/29/how-acquire-became-…, for those interested. |
In the ancient lands along the Mediterranean, players attempt to satisfy their unique victory conditions via trade, war and construction. This lavishly produced game contains tons of wooden game components and a beautiful roll-out vinyl map. Players produce a score of different commodities to trade with other cities in the hope of creating enough income to fill their capitals with buildings, produce artwork, and fill warehouses with goods. |
boardgamemechanic | boardgamecategory | boardgamepublisher |
---|---|---|
Area Control / Area Influence | Economic | Hans im Glück Verlags-GmbH |
Trick-taking | Card Game | E.S. Lowe |
Area Control / Area Influence | Abstract Strategy | 999 Games |
Action Point Allowance System | Ancient | KOSMOS |
Hand Management | Economic | 3M |
Dice Rolling | Civilization | Historien Spiele Galerie (Historien Spielegalerie) |
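The do.call(rbind, ...) step can also be tried on toy data. The sketch below uses two small made-up data.frames in place of the cleaned pulls. It also shows one possible follow-up that is an assumption, not part of the pipeline above: converting a numeric column from character to integer, since values extracted with xmlValue are text.

```r
# Made-up stand-ins for two cleaned pulls with identical column names.
toyList <- list(
  data.frame(name = c("Die Macher", "Dragonmaster"),
             minplayers = c("3", "3"),
             stringsAsFactors = FALSE),
  data.frame(name = "Acquire",
             minplayers = "3",
             stringsAsFactors = FALSE)
)

# rbind is called once with every list element as an argument.
toyCombined <- do.call(rbind, toyList)

# Optional follow-up (assumption): store player counts as integers.
toyCombined$minplayers <- as.integer(toyCombined$minplayers)
str(toyCombined)
```

The same conversion would apply to the other numeric columns, such as yearpublished and playingtime.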