1. After importing the CSV as a data frame, I converted the character vector of labels into a factor, since it is a categorical variable. I then created a corpus, a collection of text documents, in this case SMS messages. Individual messages can be viewed with list operations such as double-bracket indexing and lapply().

I then converted all words to lowercase and removed numbers, stop words, and punctuation. Next I applied stemming, which strips suffixes to reduce each word to its root. Each of these steps leaves extra whitespace behind, so the final step in the text cleanup is to strip it.

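library(tm)         # corpus and text-cleaning functions
library(SnowballC)  # stemming backend used by stemDocument()
sms_corpus <- VCorpus(VectorSource(sms_raw$text))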
print(sms_corpus)
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 5574
inspect(sms_corpus[1:5])
## <<VCorpus>>
## Metadata:  corpus specific: 0, document level (indexed): 0
## Content:  documents: 5
## 
## [[1]]
## <<PlainTextDocument>>
## Metadata:  7
## Content:  chars: 111
## 
## [[2]]
## <<PlainTextDocument>>
## Metadata:  7
## Content:  chars: 29
## 
## [[3]]
## <<PlainTextDocument>>
## Metadata:  7
## Content:  chars: 155
## 
## [[4]]
## <<PlainTextDocument>>
## Metadata:  7
## Content:  chars: 49
## 
## [[5]]
## <<PlainTextDocument>>
## Metadata:  7
## Content:  chars: 61
as.character(sms_corpus[[3]])
## [1] "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's"
lapply(sms_corpus[4:7], as.character)
## $`4`
## [1] "U dun say so early hor... U c already then say..."
## 
## $`5`
## [1] "Nah I don't think he goes to usf, he lives around here though"
## 
## $`6`
## [1] "FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv"
## 
## $`7`
## [1] "Even my brother is not like to speak with me. They treat me like aids patent."
sms_corpus_clean <- tm_map(sms_corpus,
                           content_transformer(tolower))
as.character(sms_corpus[[4]])
## [1] "U dun say so early hor... U c already then say..."
as.character(sms_corpus_clean[[4]])
## [1] "u dun say so early hor... u c already then say..."
sms_corpus_clean <- tm_map(sms_corpus_clean, removeNumbers)
sms_corpus_clean <- tm_map(sms_corpus_clean,
                           removeWords, stopwords())
sms_corpus_clean <- tm_map(sms_corpus_clean, removePunctuation)
sms_corpus_clean <- tm_map(sms_corpus_clean, stemDocument)
sms_corpus_clean <- tm_map(sms_corpus_clean, stripWhitespace)
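To make the order of the cleanup steps concrete, here is a minimal base-R sketch applied to a single made-up message. It is not how tm works internally: the stop-word list is a tiny hypothetical one, stemming is skipped, and clean_message is an illustrative helper name rather than anything from the analysis above.

```r
# Toy message, invented for illustration (not from the SMS data)
msg <- "WINNER!! Call 0800 now to claim   your 2 FREE prizes."

# Hypothetical helper mirroring the order of the cleanup steps:
# lowercase -> numbers -> stop words -> punctuation -> whitespace
clean_message <- function(x, stop_words = c("to", "your", "now")) {
  x <- tolower(x)
  x <- gsub("[0-9]+", "", x)
  # Remove stop words only when they occur as whole words
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub("[[:punct:]]+", "", x)
  # The earlier removals leave gaps, so collapse runs of whitespace last
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

clean_message(msg)
```

Collapsing whitespace last matters for the same reason as in the tm pipeline: every earlier removal can leave a gap behind.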
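  1. We first perform tokenization to split the messages into individual components. We then take the corpus and create a data structure called a Document Term Matrix, where rows indicate documents (SMS messages) and columns indicate terms (words). In the portion of the table examined, every cell is zero, meaning none of the words heading the columns appears in any of the first five messages; this is consistent with the reported sparsity, which rounds to 100%. The second matrix, sms_dtm2, is built from the raw corpus with the same cleanup requested through the control list, and it ends up with a slightly different term count (7025 versus 6630).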
sms_dtm <- DocumentTermMatrix(sms_corpus_clean)
sms_dtm2 <- DocumentTermMatrix(sms_corpus, control = list(
  tolower = TRUE,
  removeNumbers = TRUE,
  stopwords = TRUE,
  removePunctuation = TRUE,
  stemming = TRUE
))
sms_dtm
## <<DocumentTermMatrix (documents: 5574, terms: 6630)>>
## Non-/sparse entries: 42680/36912940
## Sparsity           : 100%
## Maximal term length: 40
## Weighting          : term frequency (tf)
sms_dtm2
## <<DocumentTermMatrix (documents: 5574, terms: 7025)>>
## Non-/sparse entries: 43769/39113581
## Sparsity           : 100%
## Maximal term length: 40
## Weighting          : term frequency (tf)
sms_dtm_train <- sms_dtm[1:4169, ]
sms_dtm_test <- sms_dtm[4170:5559, ]
sms_train_labels <- sms_raw[1:4169, ]$type
sms_test_labels <- sms_raw[4170:5559, ]$type
prop.table(table(sms_train_labels))
## sms_train_labels
##       ham      spam 
## 0.8647158 0.1352842
prop.table(table(sms_test_labels))
## sms_test_labels
##       ham      spam 
## 0.8697842 0.1302158
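The structure of a document term matrix is easier to see on a toy example. The sketch below builds one by hand in base R from three made-up, already-cleaned messages; the names docs, tokens, vocab, and toy_dtm are illustrative, and tm stores its matrix in a sparse format rather than as a dense matrix like this one.

```r
# Three made-up, already-cleaned messages (not taken from the SMS data)
docs <- c(d1 = "free prize call", d2 = "call me later", d3 = "free free entry")

# Tokenize on single spaces, then collect the sorted vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document; transpose so rows are documents
toy_dtm <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(toy_dtm) <- vocab

toy_dtm
mean(toy_dtm == 0)  # share of zero cells, i.e. the sparsity
```

Even with three short messages, more than half the cells are zero, which hints at why the full matrix is so sparse.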
  1. The word cloud for the cleaned corpus was created first. The ham and spam clouds are built directly from the raw text vectors, so wordcloud() runs its own cleanup on a corpus built from a vector source and emits "transformation drops documents" warnings, but no documents were dropped.

In training the Naive Bayes classifier, we first eliminate any word that occurs fewer than five times in the training messages. The cells in the sparse matrix are numeric counts of how many times a word appears in a message, so we convert them to a categorical variable, “Yes” or “No”, indicating whether the word appears at all. After training the model, we evaluate its performance on the test set. A word that appears in zero spam or zero ham messages gets a conditional probability of zero for that class, which overrides the evidence from every other word in the message. We therefore also fit a model with a Laplace estimator, so that a word seen only in spam during training does not force every message containing it to be classified as spam.
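The effect of the Laplace estimator can be shown with a small worked example. The counts below are invented for illustration, and cond_prob is a hypothetical helper that applies the usual add-one style formula for a two-level (“Yes”/“No”) feature; it is a sketch of the idea, not a copy of the classifier's internals.

```r
# Hypothetical counts: a word seen in 30 of 500 spam messages
# and in 0 of 3000 ham messages (numbers invented for illustration)
n_with_word <- c(ham = 0, spam = 30)
n_messages  <- c(ham = 3000, spam = 500)

# Estimated P(word = "Yes" | class) with a Laplace value added to each level
cond_prob <- function(laplace, levels = 2) {
  # levels = 2 because each feature is either "Yes" or "No"
  (n_with_word + laplace) / (n_messages + laplace * levels)
}

cond_prob(0)  # ham estimate is exactly 0, which rules out ham outright
cond_prob(1)  # ham estimate becomes small but non-zero
```

With laplace = 0, any message containing this word has a ham likelihood of zero no matter what else it says; with laplace = 1, the other words in the message can still outweigh it.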

library(wordcloud)
wordcloud(sms_corpus_clean, min.freq = 50, random.order = FALSE)

spam <- subset(sms_raw, type == "spam")
ham <- subset(sms_raw, type == "ham")
wordcloud(spam$text, max.words = 30, scale = c(3, 0.5))
## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation):
## transformation drops documents
## Warning in tm_map.SimpleCorpus(corpus, function(x) tm::removeWords(x,
## tm::stopwords())): transformation drops documents

wordcloud(ham$text, max.words = 30, scale = c(3, 0.5))
## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation):
## transformation drops documents

## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation):
## transformation drops documents

findFreqTerms(sms_dtm_train, 5)
##    [1] "â£wk"          "â\200¦"           "â\200“"           "abiola"       
##    [5] "abl"           "abt"           "accept"        "access"       
##    [9] "account"       "across"        "activ"         "actual"       
##   [13] "add"           "address"       "admir"         "adult"        
##   [17] "advanc"        "aft"           "afternoon"     "aftr"         
##   [21] "age"           "ago"           "ahead"         "aight"        
##   [25] "aint"          "air"           "aiyah"         "alex"         
##   [29] "almost"        "alon"          "alreadi"       "alright"      
##   [33] "alrit"         "also"          "alway"         "amp"          
##   [37] "angri"         "announc"       "anoth"         "answer"       
##   [41] "anybodi"       "anymor"        "anyon"         "anyth"        
##   [45] "anytim"        "anyway"        "apart"         "app"          
##   [49] "appli"         "appoint"       "appreci"       "april"        
##   [53] "ard"           "area"          "argument"      "arm"          
##   [57] "around"        "arrang"        "arrest"        "arriv"        
##   [61] "asap"          "ask"           "askd"          "asleep"       
##   [65] "ass"           "attempt"       "auction"       "avail"        
##   [69] "ave"           "avoid"         "await"         "award"        
##   [73] "away"          "awesom"        "babe"          "babi"         
##   [77] "back"          "bad"           "bag"           "bak"          
##   [81] "balanc"        "bank"          "bare"          "bath"         
##   [85] "batteri"       "bcoz"          "bcum"          "bday"         
##   [89] "beauti"        "becom"         "bed"           "bedroom"      
##   [93] "begin"         "believ"        "belli"         "best"         
##   [97] "better"        "bid"           "big"           "bill"         
##  [101] "bird"          "birthday"      "bit"           "black"        
##  [105] "blank"         "bless"         "blue"          "bluetooth"    
##  [109] "bodi"          "bold"          "bonus"         "boo"          
##  [113] "book"          "bore"          "boss"          "bother"       
##  [117] "bout"          "bowl"          "box"           "boy"          
##  [121] "boytoy"        "brand"         "break"         "breath"       
##  [125] "brilliant"     "bring"         "brother"       "bslvyl"       
##  [129] "btnationalr"   "budget"        "bugi"          "bus"          
##  [133] "busi"          "buy"           "buzz"          "cabin"        
##  [137] "cafe"          "cal"           "call"          "caller"       
##  [141] "callertun"     "camcord"       "came"          "camera"       
##  [145] "can"           "cancel"        "cant"          "car"          
##  [149] "card"          "care"          "carlo"         "case"         
##  [153] "cash"          "cashbal"       "catch"         "caus"         
##  [157] "chanc"         "chang"         "charact"       "charg"        
##  [161] "chariti"       "chat"          "cheap"         "check"        
##  [165] "cheer"         "chennai"       "chikku"        "childish"     
##  [169] "children"      "chines"        "choic"         "choos"        
##  [173] "christma"      "cine"          "cinema"        "claim"        
##  [177] "class"         "clean"         "clear"         "click"        
##  [181] "clock"         "close"         "club"          "code"         
##  [185] "coffe"         "coin"          "cold"          "colleagu"     
##  [189] "collect"       "colleg"        "colour"        "come"         
##  [193] "comin"         "comp"          "compani"       "competit"     
##  [197] "complet"       "complimentari" "comput"        "concentr"     
##  [201] "condit"        "confid"        "confirm"       "congrat"      
##  [205] "congratul"     "connect"       "contact"       "content"      
##  [209] "convey"        "cook"          "cool"          "copi"         
##  [213] "correct"       "cos"           "cost"          "countri"      
##  [217] "coupl"         "cours"         "cover"         "coz"          
##  [221] "crave"         "crazi"         "credit"        "cri"          
##  [225] "croydon"       "cuddl"         "cum"           "cup"          
##  [229] "current"       "custcar"       "custom"        "cut"          
##  [233] "cute"          "cuz"           "dad"           "daddi"        
##  [237] "damn"          "darl"          "darlin"        "darren"       
##  [241] "dat"           "date"          "day"           "dead"         
##  [245] "deal"          "dear"          "decid"         "deep"         
##  [249] "definit"       "del"           "delet"         "deliv"        
##  [253] "deliveri"      "den"           "depend"        "detail"       
##  [257] "dey"           "didnt"         "die"           "differ"       
##  [261] "difficult"     "digit"         "din"           "dinner"       
##  [265] "direct"        "dis"           "discount"      "discuss"      
##  [269] "disturb"       "dnt"           "doctor"        "doesnt"       
##  [273] "dog"           "doin"          "dollar"        "don"          
##  [277] "donâ’t"        "donâ\200\230t"       "done"          "dont"         
##  [281] "door"          "doubl"         "download"      "draw"         
##  [285] "dream"         "drink"         "drive"         "drop"         
##  [289] "drug"          "dude"          "dun"           "dunno"        
##  [293] "dvd"           "earli"         "earlier"       "easi"         
##  [297] "eat"           "eatin"         "either"        "els"          
##  [301] "email"         "embarass"      "empti"         "end"          
##  [305] "enemi"         "energi"        "england"       "enjoy"        
##  [309] "enough"        "enter"         "entri"         "envelop"      
##  [313] "especi"        "etc"           "euro"          "eve"          
##  [317] "even"          "ever"          "everi"         "everyon"      
##  [321] "everyth"       "exact"         "exam"          "excel"        
##  [325] "excit"         "excus"         "expect"        "experi"       
##  [329] "expir"         "extra"         "eye"           "face"         
##  [333] "facebook"      "fact"          "fall"          "famili"       
##  [337] "fanci"         "fantasi"       "fantast"       "far"          
##  [341] "fast"          "fat"           "father"        "fault"        
##  [345] "feel"          "felt"          "fetch"         "fight"        
##  [349] "figur"         "file"          "fill"          "film"         
##  [353] "final"         "find"          "fine"          "finger"       
##  [357] "finish"        "first"         "five"          "fix"          
##  [361] "flight"        "flirt"         "flower"        "follow"       
##  [365] "fone"          "food"          "forev"         "forget"       
##  [369] "forgot"        "forward"       "found"         "free"         
##  [373] "freemsg"       "freephon"      "fren"          "fri"          
##  [377] "friday"        "friend"        "friendship"    "frm"          
##  [381] "frnd"          "frnds"         "fuck"          "full"         
##  [385] "fullonsmscom"  "fun"           "funni"         "futur"        
##  [389] "gal"           "game"          "gap"           "gas"          
##  [393] "gave"          "gay"           "gentl"         "get"          
##  [397] "gettin"        "gift"          "girl"          "give"         
##  [401] "glad"          "god"           "goe"           "goin"         
##  [405] "gone"          "gonna"         "good"          "goodmorn"     
##  [409] "goodnight"     "got"           "goto"          "gotta"        
##  [413] "great"         "green"         "greet"         "grin"         
##  [417] "group"         "guarante"      "gud"           "guess"        
##  [421] "guy"           "gym"           "haf"           "haha"         
##  [425] "hai"           "hair"          "half"          "hand"         
##  [429] "hang"          "happen"        "happi"         "hard"         
##  [433] "hav"           "havent"        "head"          "hear"         
##  [437] "heard"         "heart"         "heavi"         "hee"          
##  [441] "hell"          "hello"         "help"          "hey"          
##  [445] "hgsuiteland"   "high"          "hit"           "hiya"         
##  [449] "hmm"           "hmmm"          "hmv"           "hol"          
##  [453] "hold"          "holder"        "holiday"       "home"         
##  [457] "honey"         "hook"          "hop"           "hope"         
##  [461] "horni"         "hospit"        "hot"           "hotel"        
##  [465] "hour"          "hous"          "housemaid"     "how"          
##  [469] "howev"         "howz"          "hrs"           "hug"          
##  [473] "huh"           "hungri"        "hurri"         "hurt"         
##  [477] "iam"           "ice"           "idea"          "identifi"     
##  [481] "ignor"         "ill"           "imagin"        "imma"         
##  [485] "immedi"        "import"        "inc"           "inch"         
##  [489] "includ"        "india"         "indian"        "info"         
##  [493] "inform"        "instead"       "interest"      "interview"    
##  [497] "invit"         "ipod"          "irrit"         "ish"          
##  [501] "issu"          "ive"           "izzit"         "januari"      
##  [505] "jay"           "job"           "john"          "join"         
##  [509] "joke"          "joy"           "jus"           "just"         
##  [513] "juz"           "kalli"         "kate"          "keep"         
##  [517] "kept"          "key"           "kick"          "kid"          
##  [521] "kill"          "kind"          "kinda"         "king"         
##  [525] "kiss"          "knew"          "know"          "knw"          
##  [529] "ladi"          "land"          "landlin"       "laptop"       
##  [533] "lar"           "last"          "late"          "later"        
##  [537] "latest"        "laugh"         "lazi"          "ldn"          
##  [541] "lead"          "learn"         "least"         "leav"         
##  [545] "lect"          "left"          "leh"           "lei"          
##  [549] "lemm"          "less"          "lesson"        "let"          
##  [553] "letter"        "liao"          "librari"       "lick"         
##  [557] "lie"           "life"          "lift"          "light"        
##  [561] "like"          "line"          "link"          "list"         
##  [565] "listen"        "littl"         "live"          "load"         
##  [569] "loan"          "local"         "locat"         "log"          
##  [573] "login"         "lol"           "long"          "longer"       
##  [577] "look"          "lor"           "lose"          "lost"         
##  [581] "lot"           "lovabl"        "love"          "lover"        
##  [585] "loverboy"      "loyalti"       "ltd"           "ltdecimalgt"  
##  [589] "ltgt"          "lttimegt"      "luck"          "lucki"        
##  [593] "lunch"         "luv"           "made"          "mah"          
##  [597] "mail"          "make"          "man"           "mani"         
##  [601] "march"         "mark"          "marri"         "marriag"      
##  [605] "match"         "mate"          "matter"        "maxim"        
##  [609] "may"           "mayb"          "mean"          "meant"        
##  [613] "med"           "medic"         "meet"          "meh"          
##  [617] "mell"          "member"        "men"           "menu"         
##  [621] "merri"         "messag"        "met"           "mid"          
##  [625] "midnight"      "might"         "min"           "mind"         
##  [629] "mine"          "minut"         "miracl"        "miss"         
##  [633] "mistak"        "moan"          "mob"           "mobil"        
##  [637] "mobileupd"     "mode"          "mom"           "moment"       
##  [641] "mon"           "monday"        "money"         "month"        
##  [645] "mood"          "moon"          "morn"          "motorola"     
##  [649] "move"          "movi"          "mrng"          "mrt"          
##  [653] "msg"           "msgs"          "mths"          "much"         
##  [657] "mum"           "murder"        "music"         "must"         
##  [661] "muz"           "nah"           "nake"          "name"         
##  [665] "nation"        "natur"         "naughti"       "near"         
##  [669] "need"          "net"           "network"       "neva"         
##  [673] "never"         "new"           "news"          "next"         
##  [677] "nice"          "nigeria"       "night"         "nite"         
##  [681] "nobodi"        "noe"           "nokia"         "none"         
##  [685] "noon"          "nope"          "normal"        "noth"         
##  [689] "notic"         "now"           "ntt"           "num"          
##  [693] "number"        "nxt"           "nyt"           "offer"        
##  [697] "offic"         "offici"        "okay"          "oki"          
##  [701] "old"           "omw"           "one"           "onlin"        
##  [705] "oop"           "open"          "oper"          "opinion"      
##  [709] "opt"           "optout"        "orang"         "orchard"      
##  [713] "order"         "oredi"         "oso"           "other"        
##  [717] "otherwis"      "outsid"        "pack"          "page"         
##  [721] "paid"          "pain"          "paper"         "parent"       
##  [725] "park"          "part"          "parti"         "partner"      
##  [729] "pass"          "passion"       "password"      "past"         
##  [733] "pay"           "peac"          "peopl"         "per"          
##  [737] "person"        "pete"          "phone"         "photo"        
##  [741] "pic"           "pick"          "pictur"        "piec"         
##  [745] "pix"           "pizza"         "place"         "plan"         
##  [749] "plane"         "play"          "player"        "pleas"        
##  [753] "pleasur"       "pls"           "plus"          "plz"          
##  [757] "pmin"          "pmsg"          "pobox"         "poboxwwq"     
##  [761] "point"         "poli"          "polic"         "poor"         
##  [765] "pop"           "possibl"       "post"          "pound"        
##  [769] "power"         "pple"          "ppm"           "practic"      
##  [773] "pray"          "prefer"        "prepar"        "press"        
##  [777] "pretti"        "price"         "princess"      "privat"       
##  [781] "prize"         "prob"          "probabl"       "problem"      
##  [785] "process"       "project"       "promis"        "pub"          
##  [789] "put"           "qualiti"       "question"      "quick"        
##  [793] "quit"          "quiz"          "quot"          "rain"         
##  [797] "rate"          "rather"        "rcvd"          "reach"        
##  [801] "read"          "readi"         "real"          "realiz"       
##  [805] "realli"        "reason"        "receipt"       "receiv"       
##  [809] "recent"        "record"        "refer"         "regard"       
##  [813] "regist"        "remain"        "rememb"        "remind"       
##  [817] "remov"         "rent"          "rental"        "repli"        
##  [821] "repres"        "request"       "respond"       "respons"      
##  [825] "rest"          "result"        "return"        "reveal"       
##  [829] "review"        "right"         "ring"          "rington"      
##  [833] "rite"          "road"          "rock"          "room"         
##  [837] "roommat"       "rose"          "round"         "rowwjhl"      
##  [841] "rpli"          "rreveal"       "run"           "sad"          
##  [845] "sae"           "safe"          "said"          "sale"         
##  [849] "sam"           "sat"           "saturday"      "savamob"      
##  [853] "save"          "saw"           "say"           "sch"          
##  [857] "school"        "score"         "scream"        "sea"          
##  [861] "search"        "season"        "sec"           "second"       
##  [865] "secret"        "see"           "seem"          "seen"         
##  [869] "select"        "self"          "sell"          "semest"       
##  [873] "send"          "sens"          "sent"          "serious"      
##  [877] "servic"        "set"           "settl"         "sex"          
##  [881] "sexi"          "shall"         "share"         "shd"          
##  [885] "ship"          "shirt"         "shit"          "shop"         
##  [889] "short"         "show"          "shower"        "shuhui"       
##  [893] "sick"          "side"          "sigh"          "sight"        
##  [897] "sign"          "silent"        "simpl"         "sinc"         
##  [901] "sing"          "singl"         "sir"           "sis"          
##  [905] "sister"        "sit"           "situat"        "sky"          
##  [909] "slave"         "sleep"         "slept"         "slow"         
##  [913] "slowli"        "small"         "smile"         "smoke"        
##  [917] "sms"           "smth"          "snow"          "sofa"         
##  [921] "solv"          "somebodi"      "someon"        "someth"       
##  [925] "sometim"       "somewher"      "song"          "soni"         
##  [929] "sonyericsson"  "soon"          "sorri"         "sort"         
##  [933] "sound"         "space"         "speak"         "special"      
##  [937] "specialcal"    "spend"         "spent"         "spoke"        
##  [941] "sport"         "spree"         "stand"         "star"         
##  [945] "start"         "statement"     "station"       "stay"         
##  [949] "std"           "still"         "stock"         "stop"         
##  [953] "store"         "stori"         "str"           "straight"     
##  [957] "street"        "strong"        "student"       "studi"        
##  [961] "stuff"         "stupid"        "style"         "sub"          
##  [965] "subscrib"      "success"       "summer"        "sun"          
##  [969] "sunday"        "sunshin"       "support"       "suppos"       
##  [973] "sure"          "surpris"       "sweet"         "swing"        
##  [977] "system"        "take"          "talk"          "tampa"        
##  [981] "tcs"           "teach"         "team"          "tear"         
##  [985] "teas"          "tel"           "tell"          "ten"          
##  [989] "tenerif"       "term"          "test"          "text"         
##  [993] "thank"         "thanx"         "that"          "thing"        
##  [997] "think"         "thinkin"       "thk"           "thnk"         
## [1001] "tho"           "though"        "thought"       "throw"        
## [1005] "thru"          "tht"           "thur"          "ticket"       
## [1009] "til"           "till"          "time"          "tire"         
## [1013] "titl"          "tmr"           "tncs"          "today"        
## [1017] "togeth"        "told"          "tomo"          "tomorrow"     
## [1021] "tone"          "tonight"       "tonit"         "took"         
## [1025] "top"           "tot"           "total"         "touch"        
## [1029] "tough"         "tour"          "toward"        "town"         
## [1033] "track"         "train"         "transact"      "treat"        
## [1037] "tri"           "trip"          "troubl"        "true"         
## [1041] "trust"         "truth"         "tscs"          "ttyl"         
## [1045] "tuesday"       "turn"          "twice"         "two"          
## [1049] "txt"           "txting"        "txts"          "type"         
## [1053] "ufind"         "ugh"           "umma"          "uncl"         
## [1057] "understand"    "unless"        "unlimit"       "unredeem"     
## [1061] "unsub"         "unsubscrib"    "updat"         "ure"          
## [1065] "urgent"        "urself"        "use"           "usf"          
## [1069] "usual"         "uve"           "valentin"      "valid"        
## [1073] "valu"          "vari"          "verifi"        "via"          
## [1077] "video"         "visit"         "voic"          "voucher"      
## [1081] "wait"          "wake"          "walk"          "wan"          
## [1085] "wana"          "wanna"         "want"          "wap"          
## [1089] "warm"          "wast"          "wat"           "watch"        
## [1093] "water"         "way"           "weak"          "wear"         
## [1097] "weather"       "wed"           "wednesday"     "weed"         
## [1101] "week"          "weekend"       "weight"        "welcom"       
## [1105] "well"          "wen"           "went"          "wer"          
## [1109] "wet"           "what"          "whatev"        "whenev"       
## [1113] "whole"         "wid"           "wif"           "wife"         
## [1117] "wil"           "will"          "win"           "wine"         
## [1121] "winner"        "wish"          "wit"           "within"       
## [1125] "without"       "wiv"           "wkli"          "wnt"          
## [1129] "woke"          "won"           "wonder"        "wont"         
## [1133] "word"          "work"          "workin"        "world"        
## [1137] "worri"         "worth"         "wot"           "wow"          
## [1141] "write"         "wrong"         "wun"           "wwwgetzedcouk"
## [1145] "xmas"          "xxx"           "yahoo"         "yar"          
## [1149] "yeah"          "year"          "yep"           "yes"          
## [1153] "yest"          "yesterday"     "yet"           "yoga"         
## [1157] "yogasana"      "yrs"           "yun"           "yup"
sms_freq_words <- findFreqTerms(sms_dtm_train, 5)
str(sms_freq_words)
##  chr [1:1160] "â£wk" "â\200¦" "â\200“" "abiola" "abl" "abt" "accept" ...
sms_dtm_freq_train <- sms_dtm_train[ , sms_freq_words]
sms_dtm_freq_test <- sms_dtm_test[ , sms_freq_words]
# Recode a vector of term counts as "Yes" (count > 0) or "No"
convert_counts <- function(x) {
  ifelse(x > 0, "Yes", "No")
}
# Apply the recoding column by column (MARGIN = 2)
sms_train <- apply(sms_dtm_freq_train, MARGIN = 2,
                   convert_counts)
sms_test <- apply(sms_dtm_freq_test, MARGIN = 2,
                  convert_counts)
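The two steps above, keeping frequent terms and recoding counts, can be traced on a toy matrix. This is a base-R sketch under stated assumptions: toy_counts is made up, the threshold of 2 stands in for the 5 used above, and filtering on column totals is my approximation of what findFreqTerms does rather than its actual implementation.

```r
# Made-up count matrix: rows are messages, columns are terms
toy_counts <- matrix(
  c(2, 0, 0,
    1, 0, 2,
    3, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("m1", "m2", "m3"), c("free", "usf", "call"))
)

# Keep terms whose total count meets the threshold (2 here, 5 in the report)
keep <- colnames(toy_counts)[colSums(toy_counts) >= 2]
toy_freq <- toy_counts[, keep, drop = FALSE]

# Same idea as convert_counts: any positive count becomes "Yes"
to_yes_no <- function(x) ifelse(x > 0, "Yes", "No")
toy_features <- apply(toy_freq, MARGIN = 2, to_yes_no)

toy_features
```

The rare term is dropped before recoding, and the result keeps one row per message, which is the shape the classifier expects.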
library(e1071)    # naiveBayes()
library(gmodels)  # CrossTable()
sms_classifier <- naiveBayes(sms_train, sms_train_labels)
sms_test_pred <- predict(sms_classifier, sms_test)
CrossTable(sms_test_pred, sms_test_labels,
           prop.chisq = FALSE, prop.t = FALSE,
           dnn = c('predicted', 'actual'))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1390 
## 
##  
##              | actual 
##    predicted |       ham |      spam | Row Total | 
## -------------|-----------|-----------|-----------|
##          ham |      1200 |        20 |      1220 | 
##              |     0.984 |     0.016 |     0.878 | 
##              |     0.993 |     0.110 |           | 
## -------------|-----------|-----------|-----------|
##         spam |         9 |       161 |       170 | 
##              |     0.053 |     0.947 |     0.122 | 
##              |     0.007 |     0.890 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |      1209 |       181 |      1390 | 
##              |     0.870 |     0.130 |           | 
## -------------|-----------|-----------|-----------|
## 
## 
sms_classifier2 <- naiveBayes(sms_train, sms_train_labels,
                              laplace = 1)
sms_test_pred2 <- predict(sms_classifier2, sms_test)
CrossTable(sms_test_pred2, sms_test_labels,
           prop.chisq = FALSE, prop.t = FALSE, prop.r = FALSE,
           dnn = c('predicted', 'actual'))
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1390 
## 
##  
##              | actual 
##    predicted |       ham |      spam | Row Total | 
## -------------|-----------|-----------|-----------|
##          ham |      1202 |        28 |      1230 | 
##              |     0.994 |     0.155 |           | 
## -------------|-----------|-----------|-----------|
##         spam |         7 |       153 |       160 | 
##              |     0.006 |     0.845 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |      1209 |       181 |      1390 | 
##              |     0.870 |     0.130 |           | 
## -------------|-----------|-----------|-----------|
## 
##
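To compare the two models side by side, the counts from the two tables above can be turned into a few summary figures. The sketch below copies the four cell counts from each CrossTable output; cm_summary is an illustrative helper name, and the choice of metrics (accuracy, share of spam caught, number of ham messages flagged as spam) is mine.

```r
# Cell counts copied from the two tables (rows = predicted, cols = actual)
no_laplace <- matrix(c(1200, 20, 9, 161), nrow = 2, byrow = TRUE,
                     dimnames = list(predicted = c("ham", "spam"),
                                     actual = c("ham", "spam")))
with_laplace <- matrix(c(1202, 28, 7, 153), nrow = 2, byrow = TRUE,
                       dimnames = dimnames(no_laplace))

# Summarize one confusion matrix
cm_summary <- function(cm) {
  c(accuracy    = sum(diag(cm)) / sum(cm),
    spam_recall = cm["spam", "spam"] / sum(cm[, "spam"]),
    ham_flagged = cm["spam", "ham"])
}

round(rbind(no_laplace   = cm_summary(no_laplace),
            with_laplace = cm_summary(with_laplace)), 3)
```

On these counts the Laplace model flags fewer ham messages as spam (7 versus 9) but misses more spam (28 versus 20), so the total number of misclassified messages rises from 29 to 35 out of 1390.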