Getting boardgamegeek Data

October 30, 2017
R web scraping boardgames

Getting BGG Data

I’m going to walk through the steps I take to pull data from the boardgamegeek.com database and combine it into an analytics-ready data.frame. First I’ll explain how to use the API to pull a single game, and then several games in one call. Then I’ll show how I loop through many multi-game pulls, clean up the results, and merge them into a single data.frame.

Libraries used here

The workhorses here are httr and XML, which fetch and parse the data, and foreach, which helps me set up the loop to pull large amounts of data. The complete list of packages is:

  • foreach
  • iterators
  • parallel
  • doParallel
  • XML
  • httr
  • tibble
  • pander

I’m attaching only foreach, so that I don’t have to use funky syntax for %dopar%; every other package is called with the :: prefix.

library(foreach)

Basic API Usage

Single Game

Calling the API

url <- 'http://www.boardgamegeek.com/xmlapi/boardgame/1'

Parsing the XML

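# Fetch the response and convert its raw body to a character string
xmlChar <- rawToChar(httr::GET(url)$content)
# Parse the string into an XML document and grab its root node
xmlParsed <- XML::xmlParse(xmlChar)
xmlRoot <- XML::xmlRoot(xmlParsed)
# For each game node, extract the text value of every child node
data <- XML::xmlSApply(xmlRoot, function(x) XML::xmlSApply(x, XML::xmlValue))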
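To see what the inner xmlSApply call returns without hitting the API, here’s a small offline sketch. The XML string is hand-written for illustration (it is not real BGG output, and the game and mechanic names are made up); the XML package calls are the same ones used above, plus XML::xmlChildren to pick out the first game node.

```r
# Hand-written stand-in for an API response (an assumption, not real BGG output).
xmlText <- paste0(
  '<boardgames><boardgame objectid="1">',
  '<yearpublished>1986</yearpublished>',
  '<name>Example Game</name>',
  '<boardgamemechanic>Mechanic A</boardgamemechanic>',
  '<boardgamemechanic>Mechanic B</boardgamemechanic>',
  '</boardgame></boardgames>'
)

doc  <- XML::xmlParse(xmlText, asText = TRUE)
root <- XML::xmlRoot(doc)

# Each child of the root is one game; the game's children are its fields.
game   <- XML::xmlChildren(root)[[1]]
fields <- XML::xmlSApply(game, XML::xmlValue)

# A named character vector; repeated tags show up as repeated names.
names(fields)
fields[["yearpublished"]]
```

Everything comes back as text, so numeric fields such as yearpublished need converting later if you want to do arithmetic on them.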

Viewing the Data

keepColumns <- c("yearpublished", "minplayers", "maxplayers", "playingtime", 
                 "minplaytime", "maxplaytime", "age", "name", "description",
                 "boardgamemechanic", "boardgamecategory", "boardgamepublisher")

pander::pander(t(data)[,keepColumns])
Table continues below
yearpublished  minplayers  maxplayers  playingtime  minplaytime
1986           3           5           240          240
Table continues below
maxplaytime  age  name
240          14   Die Macher
Table continues below
description
Die Macher is a game about seven sequential political races in different regions of Germany. Players are in charge of national political parties, and must manage limited resources to help their party to victory. The winning party will have the most victory points after all the regional elections. There are four different ways of scoring victory points. First, each regional election can supply one to eighty victory points, depending on the size of the region and how well your party does in it. Second, if a party wins a regional election and has some media influence in the region, then the party will receive some media-control victory points. Third, each party has a national party membership which will grow as the game progresses and this will supply a fair number of victory points. Lastly, parties score some victory points if their party platform matches the national opinions at the end of the game.

The 1986 edition featured 4 parties from the old West Germany and supported 3-4 players. The 1997 edition supports up to 5 players in the re-united Germany and updated several features of the rules as well. The 2006 edition also supports up to 5 players and adds a shorter 5 round variant and additional rules updates by the original designer.

Die Macher is #1 in the Valley Games Classic Line

boardgamemechanic              boardgamecategory  boardgamepublisher
Area Control / Area Influence  Economic           Hans im Glück Verlags-GmbH

Multiple Games

url <- 'http://www.boardgamegeek.com/xmlapi/boardgame/1,2,3'
xmlChar <- rawToChar(httr::GET(url)$content)
xmlParsed <- XML::xmlParse(xmlChar)
xmlRoot <- XML::xmlRoot(xmlParsed)
data <- XML::xmlSApply(xmlRoot,function(x) XML::xmlSApply(x, XML::xmlValue))
dt <- tibble::as_tibble(t(sapply(data, function(x) x[keepColumns])))
pander::pander(dt)
Table continues below
yearpublished  minplayers  maxplayers  playingtime  minplaytime
1986           3           5           240          240
1981           3           4           30           30
1998           2           4           60           30
Table continues below
maxplaytime  age  name
240          14   Die Macher
30           12   Dragonmaster
60           10   Samouraï
Table continues below
description
Die Macher is a game about seven sequential political races in different regions of Germany. Players are in charge of national political parties, and must manage limited resources to help their party to victory. The winning party will have the most victory points after all the regional elections. There are four different ways of scoring victory points. First, each regional election can supply one to eighty victory points, depending on the size of the region and how well your party does in it. Second, if a party wins a regional election and has some media influence in the region, then the party will receive some media-control victory points. Third, each party has a national party membership which will grow as the game progresses and this will supply a fair number of victory points. Lastly, parties score some victory points if their party platform matches the national opinions at the end of the game.

The 1986 edition featured 4 parties from the old West Germany and supported 3-4 players. The 1997 edition supports up to 5 players in the re-united Germany and updated several features of the rules as well. The 2006 edition also supports up to 5 players and adds a shorter 5 round variant and additional rules updates by the original designer.

Die Macher is #1 in the Valley Games Classic Line

Dragonmaster is a trick-taking card game based on an older game called Coup d’etat. Each player is given a supply of plastic gems, which represent points. Each player will get to be the dealer for five different hands, with slightly different goals for each hand. After all cards have been dealt out, the dealer decides which hand best suits his or her current cards, and the other players are penalized points (in the form of crystals) for taking certain tricks or cards. For instance, if “first” or “last” is called, then a player is penalized for taking the first or last tricks. All players will get a chance to be dealer for five hands, but other players can steal this opportunity by taking all of the tricks during certain hands. At the end, the biggest pile of gems wins the game.

Jewel contents:

10 clear (2 extra)
14 green (2 extra)
22 red (2 extra)
22 blue (2 extra)

Part of the Knizia tile-laying trilogy, Samurai is set in medieval Japan. Players compete to gain the favor of three factions: samurai, peasants, and priests, which are represented by helmet, rice paddy, and Buddha figures scattered about the board, which features the islands of Japan. The competition is waged through the use of hexagonal tiles, each of which help curry favor of one of the three factions — or all three at once! Players can make lightning-quick strikes with horseback ronin and ships or approach their conquests more methodically. As each figure (helmets, rice paddies, and Buddhas) is surrounded, it is awarded to the player who has gained the most favor with the corresponding group.

Gameplay continues until all the symbols of one type have been removed from the board or four figures have been removed from play due to a tie for influence.

At the end of the game, players compare captured symbols of each type, competing for majorities in each of the three types. Ties are not uncommon and are broken based on the number of other, “non-majority” symbols each player has collected.

boardgamemechanic              boardgamecategory  boardgamepublisher
Area Control / Area Influence  Economic           Hans im Glück Verlags-GmbH
Trick-taking                   Card Game          E.S. Lowe
Area Control / Area Influence  Abstract Strategy  999 Games
Big Data Pull

GetGames <- function(from, to) {
  # Request every ID in from:to with a single comma-separated call
  url <- 'http://www.boardgamegeek.com/xmlapi/boardgame/'
  xmlChar <- rawToChar(httr::GET(paste0(url, paste(from:to, collapse = ',')))$content)
  xmlParsed <- XML::xmlParse(xmlChar)
  xmlRoot <- XML::xmlRoot(xmlParsed)
  data <- XML::xmlSApply(xmlRoot, function(x) XML::xmlSApply(x, XML::xmlValue))
  
  keepColumns <- c("yearpublished", "minplayers", "maxplayers", "playingtime", 
                   "minplaytime", "maxplaytime", "age", "name", "description",
                   "boardgamemechanic", "boardgamecategory", "boardgamepublisher")
  
  # One row per game, keeping only the columns of interest
  dt <- tibble::as_tibble(t(sapply(data, function(x) x[keepColumns])))
  # Pause 5 seconds before returning, which spaces out successive requests
  Sys.sleep(5)
  dt
}
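# First ID of each 200-game batch; the final value only serves as an upper bound
steps <- seq(1, 238201, by = 200)
nsteps <- length(steps)
nsteps_m1 <- nsteps - 1
# Column 1 holds each batch's first ID, column 2 the next batch's first ID
controlMatrix <- matrix(c(steps[1:nsteps_m1], steps[2:nsteps]), ncol = 2)
# Subtract 1 so each row is an inclusive, non-overlapping ID range
controlMatrix[, 2] <- controlMatrix[, 2] - 1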
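The batching logic is easy to check on a small scale. This sketch wraps it in a hypothetical helper, makeBatches (not part of the original code). Unlike the code above, it also keeps a final partial batch when the maximum ID is not a multiple of the batch size.

```r
# makeBatches (hypothetical helper): inclusive ID ranges of at most batchSize.
makeBatches <- function(maxId, batchSize) {
  from <- seq(1, maxId, by = batchSize)
  # Cap the last range at maxId so a partial final batch is kept.
  to <- pmin(from + batchSize - 1, maxId)
  cbind(from = from, to = to)
}

# Three batches: 1-4, 5-8, and the partial batch 9-10.
makeBatches(10, 4)

# Same ranges as the control matrix above: 1191 rows of 200 IDs each.
nrow(makeBatches(238200, 200))
```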

I’m only running the first 10 batches here rather than the full sweep. I’m not using foreach’s .combine argument because the results need some additional cleaning before they can be merged.

# Running in parallel makes everything faster, but isn't necessary
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

allGames <- foreach(control = iterators::iter(controlMatrix[1:10,], by = 'row')) %dopar% {
  GetGames(control[1], control[2])
}

# Shut the workers down once the pulls are finished
parallel::stopCluster(cl)
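On a long sweep, a single failed request would otherwise abort the whole loop. One possible safeguard, sketched here sequentially in base R, is to wrap each pull in tryCatch. The fetchBatch function is a hypothetical stand-in for GetGames that fails on purpose for one batch; it makes no network calls.

```r
# fetchBatch (hypothetical stand-in for GetGames): fails for one batch to
# imitate a network error.
fetchBatch <- function(from, to) {
  if (from == 5) stop("simulated failure")
  data.frame(id = from:to)
}

batches <- cbind(from = c(1, 5, 9), to = c(4, 8, 12))

# Wrap each pull so a failure yields NULL instead of stopping the loop.
results <- lapply(seq_len(nrow(batches)), function(i) {
  tryCatch(fetchBatch(batches[i, "from"], batches[i, "to"]),
           error = function(e) NULL)
})

# Drop the failed batches and keep the rest.
results <- Filter(Negate(is.null), results)
length(results)
```

With foreach itself, the .errorhandling argument ('stop', 'remove', or 'pass') offers a similar safeguard.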

Not all IDs are valid

Because many of the board game IDs are invalid, we don’t get 200 games per pull.

hist(sapply(allGames, nrow), main = "Histogram of row counts")

Clean up some errors

Some of the ID ranges contain no games at all, so those pulls return malformed results. Because there’s no data, the columns don’t end up with the expected names, so I check that the first column name is correct to weed out those pulls.

is.error <- function(li) {
  # A valid pull has 'yearpublished' as its first column name
  !identical(names(li)[1], 'yearpublished')
}
allGamesClean <- allGames[!sapply(allGames, is.error)]
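Here’s the same filtering idea on made-up data, so the effect is visible without running a pull. The two data frames are hypothetical stand-ins for a good pull and an empty one, and isError is an illustrative variant of the check above.

```r
# isError (illustrative variant): TRUE when the first column name is not
# the one a valid pull starts with.
isError <- function(df) !identical(names(df)[1], "yearpublished")

# Hypothetical pulls: one well-formed, one with unnamed garbage columns.
pulls <- list(
  data.frame(yearpublished = "1986", name = "Example Game"),
  data.frame(V1 = "garbage")
)

# vapply guarantees one logical value per pull.
keep  <- !vapply(pulls, isError, logical(1))
clean <- pulls[keep]
length(clean)
```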

Make sure columns are named correctly (because sometimes they aren’t)

This just makes sure rbind runs peacefully, since it needs the column names to match across pulls.

properNames <- c("yearpublished", "minplayers", "maxplayers", 
                 "playingtime", "minplaytime", "maxplaytime", "age", 
                 "name", "description", "boardgamemechanic", 
                 "boardgamecategory", "boardgamepublisher")
allGamesCleaner <- lapply(allGamesClean, function(x) {
  names(x) <- properNames
  x
})
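Assigning names blindly assumes every pull has the right number of columns in the right order. A slightly more defensive sketch checks the column count first. The renameIfCompatible helper and the shortened name list are hypothetical, used only to keep the example small.

```r
# Shortened list of expected names, just for the example.
expected <- c("yearpublished", "minplayers")

# renameIfCompatible (hypothetical helper): rename only when the column
# count matches; otherwise return NULL so the pull can be dropped.
renameIfCompatible <- function(df, expected) {
  if (ncol(df) != length(expected)) return(NULL)
  names(df) <- expected
  df
}

ok  <- renameIfCompatible(data.frame(V1 = "1986", V2 = "3"), expected)
bad <- renameIfCompatible(data.frame(V1 = "1986"), expected)

names(ok)
is.null(bad)
```

This guards against mislabelling columns, though it still cannot tell whether the columns are in the expected order.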

Bind to a single data frame

I use do.call with rbind to turn the list of data.frames into a single data.frame.

gameData <- do.call(rbind, allGamesCleaner)
pander::pander(head(gameData))
Table continues below
yearpublished  minplayers  maxplayers  playingtime  minplaytime
1986           3           5           240          240
1981           3           4           30           30
1998           2           4           60           30
1992           2           4           60           60
1964           3           6           90           90
1989           2           6           240          240
Table continues below
maxplaytime  age  name
240          14   Die Macher
30           12   Dragonmaster
60           10   Samouraï
60           12   Tal der Könige
90           12   Acquire
240          12   Mare Mediterraneum
Table continues below
description
Die Macher is a game about seven sequential political races in different regions of Germany. Players are in charge of national political parties, and must manage limited resources to help their party to victory. The winning party will have the most victory points after all the regional elections. There are four different ways of scoring victory points. First, each regional election can supply one to eighty victory points, depending on the size of the region and how well your party does in it. Second, if a party wins a regional election and has some media influence in the region, then the party will receive some media-control victory points. Third, each party has a national party membership which will grow as the game progresses and this will supply a fair number of victory points. Lastly, parties score some victory points if their party platform matches the national opinions at the end of the game.

The 1986 edition featured 4 parties from the old West Germany and supported 3-4 players. The 1997 edition supports up to 5 players in the re-united Germany and updated several features of the rules as well. The 2006 edition also supports up to 5 players and adds a shorter 5 round variant and additional rules updates by the original designer.

Die Macher is #1 in the Valley Games Classic Line

Dragonmaster is a trick-taking card game based on an older game called Coup d’etat. Each player is given a supply of plastic gems, which represent points. Each player will get to be the dealer for five different hands, with slightly different goals for each hand. After all cards have been dealt out, the dealer decides which hand best suits his or her current cards, and the other players are penalized points (in the form of crystals) for taking certain tricks or cards. For instance, if “first” or “last” is called, then a player is penalized for taking the first or last tricks. All players will get a chance to be dealer for five hands, but other players can steal this opportunity by taking all of the tricks during certain hands. At the end, the biggest pile of gems wins the game.

Jewel contents:

10 clear (2 extra)
14 green (2 extra)
22 red (2 extra)
22 blue (2 extra)

Part of the Knizia tile-laying trilogy, Samurai is set in medieval Japan. Players compete to gain the favor of three factions: samurai, peasants, and priests, which are represented by helmet, rice paddy, and Buddha figures scattered about the board, which features the islands of Japan. The competition is waged through the use of hexagonal tiles, each of which help curry favor of one of the three factions — or all three at once! Players can make lightning-quick strikes with horseback ronin and ships or approach their conquests more methodically. As each figure (helmets, rice paddies, and Buddhas) is surrounded, it is awarded to the player who has gained the most favor with the corresponding group.

Gameplay continues until all the symbols of one type have been removed from the board or four figures have been removed from play due to a tie for influence.

At the end of the game, players compare captured symbols of each type, competing for majorities in each of the three types. Ties are not uncommon and are broken based on the number of other, “non-majority” symbols each player has collected.

When you see the triangular box and the luxurious, large blocks, you can tell this game was designed to be beautiful as well as functional. The object of the game is to build pyramids out of the different colored blocks. A pyramid scores more points when it’s made from a few colors, but it’s much harder to consistently outbid the other players for the necessary blocks. The game is over when the Pharoah’s Pyramid in the center is completed, which is built using all the blocks that the players don’t use during the course of the game.

Final round 1990 Hippodice Spieleautorenwettbewerb.

In Acquire, each player strategically invests in businesses, trying to retain a majority of stock. As the businesses grow with tile placements, they also start merging, giving the majority stockholders of the acquired business sizable bonuses, which can then be used to reinvest into other chains. All of the investors in the acquired company can then cash in their stocks for current value or trade them 2-for-1 for shares of the newer, larger business. The game is a race to acquire the greatest wealth.

This Sid Sackson classic has taken many different forms over the years depending on the publisher. Some versions of the 3M bookshelf edition included rules for a 2-player variant. The original version is part of the 3M Bookshelf Series.

Note: many books and websites list this as a 1962 publication. This is incorrect; information from Sid Sackson’s diaries, correspondence, and royalty statements prove that it was published in 1964. However, for some reason admins continue to accept “corrections” of the publication date to 1962. A detailed timeline of the development and publication of the game can be found at https://opinionatedgamers.com/2014/05/29/how-acquire-became-…, for those interested.

In the ancient lands along the Mediterranean, players attempt to satisfy their unique victory conditions via trade, war and construction. This lavishly produced game contains tons of wooden game components and a beautiful roll-out vinyl map. Players produce a score of different commodities to trade with other cities in the hope of creating enough income to fill their capitals with buildings, produce artwork, and fill warehouses with goods.

boardgamemechanic              boardgamecategory  boardgamepublisher
Area Control / Area Influence  Economic           Hans im Glück Verlags-GmbH
Trick-taking                   Card Game          E.S. Lowe
Area Control / Area Influence  Abstract Strategy  999 Games
Action Point Allowance System  Ancient            KOSMOS
Hand Management                Economic           3M
Dice Rolling                   Civilization       Historien Spiele Galerie (Historien Spielegalerie)
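Because XML::xmlValue returns text, the columns of the combined data should all still be character, including the ones that hold numbers. Here’s a minimal base R sketch of binding and then converting types. The two small data frames are made up, and the choice of which columns to treat as numeric is an assumption.

```r
# Two hypothetical cleaned pulls; every column is character.
pulls <- list(
  data.frame(yearpublished = c("1986", "1981"), minplayers = c("3", "3"),
             name = c("Game A", "Game B"), stringsAsFactors = FALSE),
  data.frame(yearpublished = "1998", minplayers = "2",
             name = "Game C", stringsAsFactors = FALSE)
)

# Bind the list into one data frame, as above.
combined <- do.call(rbind, pulls)

# Convert the columns that hold whole numbers (this column list is an assumption).
numericCols <- c("yearpublished", "minplayers")
combined[numericCols] <- lapply(combined[numericCols], as.integer)

str(combined)
```

After this step, columns such as yearpublished can be summarised or plotted as numbers rather than strings.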
