Sacred texts in R.
sacred includes 6 tidy datasets about the ‘Apocrypha’, the ‘King James Bible’, the ‘Greek New Testament’, the ‘Septuagint’, the ‘Tanach’ and the ‘Vulgate’. The package also includes utility datasets such as Middle English keywords and Sentiment terms (positive and negative).
We can have a look at the most mentioned terms in the Bible. Here I use the
king_james_version dataset which is a corpus of documents from the tm package.
sacred also comes with a set of 295 Middle English stopwords.
library(dplyr) library(echarts4r) # devtools::install_github("JohnCoene/sacred") library(tidytext) data("king_james_version") data("middle_english_stopwords") # get middle english stopwords # set echarts4r theme THEME <- "vintage" # add common englishh stopwords sw <- plyr::rbind.fill(stop_words, middle_english_stopwords) king_james_version %>% unnest_tokens(word, text) %>% anti_join(sw, by = "word") %>% count(word, sort = TRUE) %>% slice(1:120) %>% e_charts() %>% e_cloud(word, n) %>% e_theme("westeros") %>% e_title("Most mentioned words", "King James Bible") %>% e_tooltip(trigger = "item") %>% e_theme(THEME)
Most sentiment lexicons are based on online reviews or social media conversations and therefore are not well suited to 17 century text. To remedy to this
bibler also comes with lexicons of positive and negative Middle English terms.
We can now assess the sentiment of each psalm and plot the sentiment score by book in order of appearance in the Bible.
data("middle_english_sentiments") # add common sentiment common_sentiments <- get_sentiments("afinn") common_sentiments <- common_sentiments %>% anti_join(middle_english_sentiments, by = "word") sent <- plyr::rbind.fill( middle_english_sentiments, common_sentiments ) %>% select(-sentiment) %>% unique() king_james_version %>% unnest_tokens(word, text) %>% anti_join(sw) %>% inner_join(sent) %>% group_by(book.num) %>% summarise(score = sum(score)) %>% e_charts(book.num) %>% e_bar(score) %>% e_visual_map( score, show = FALSE, type = "piecewise", pieces = list( list(gt = -1500, lte = 0, color = "#dc322f"), list(gt = 0, lte = 600, color = "#859900") ) ) %>% e_legend(FALSE) %>% e_title("Average sentiment by book", "King James Bible") %>% e_tooltip(trigger = "axis") %>% e_theme(THEME) -> p #> Joining, by = "word" #> Joining, by = "word" p
There is indeed more negativity at the “beginning” of the Bible: the Old Testament is far more gruesome than the New Testament.
The Psalms (book #19) is a lapse of light amidst the dark and spine-chilling godly injunctions of the Old Testament, “psalm” means “praises,” it seems appropriate to be the most positive book.
John 3 in the New Testament is also positive, by the Bible standard anyway: John accepts Jesus as the Messiah, gets baptised, finds the path to God, etc. Good stuff.
The most negative book according to the analysis is Book of Isaiah (book #24) which is not inaccurate, the message of the book is summed by Wikipedia:
“The book opens by setting out the themes of judgment and subsequent restoration for the righteous. God has a plan which will be realised on the”Day of Yahweh“, when Jerusalem will become the centre of his worldwide rule.”
style <- list( normal = list( color = "transparent", borderWidth = 1, borderType = "dashed" ) ) p %>% e_mark_area( itemStyle = style, data = list( list( name = "Old Testament", xAxis = 0, yAxis = -1500 ), list( xAxis = 41, yAxis = 500 ) ) )
Christopher Hitchens once said that Mary only appeared three times in the bibe. I cannot tell exactly what he meant by “appeared” (only God knows now) but we can tell that she is mentioned far more often - 46 times.
length(king_james_version$text[grep("Mary", king_james_version$text)]) # nbr mentions #>  46