Decision Forests - Revision history

Wikiuser: /* A parallel Decision Forest */

2016-03-19T10:17:15Z

A parallel Decision Forest

Wikiuser: /* Code on Github */

2016-03-19T10:00:49Z

Code on Github

Wikiuser: Created page with " == Code on Github == Here is a link to all of the code that I used for my email spam/ham classification algorithm - [https://github.com/brianjp93/email-classification/ github..."

2016-03-19T09:50:38Z

Created page with " == Code on Github == Here is a link to all of the code that I used for my email spam/ham classification algorithm - [https://github.com/brianjp93/email-classification/ github..."

New page

== Code on Github ==
Here is a link to all of the code I used for my email spam/ham classification algorithm: [https://github.com/brianjp93/email-classification/ github repo]

    <li>extractwords.py - contains the ExtractWords class, which is used for automatic feature generation.</li>
    <ul><li>Reads a large number of emails and finds how frequently each word appears in spam and in non-spam emails, then subtracts one frequency from the other. The words whose differences have the largest magnitudes are used as features, since they should be the words that are particularly spammy or particularly non-spammy. The features are written to a new file, message.fts.</li></ul>
 </ul>

Revision as of 10:00, 19 March 2016 (section renamed from "Code on Github" to "A parallel Decision Forest"; file explanations added):

== A parallel Decision Forest ==
Here is a link to all of the code I used for my email spam/ham classification algorithm: [https://github.com/brianjp93/email-classification/ github repo]

Explanation of some files:
<ul>
<li>emaildata.py: contains the EmailData class, whose methods extract data from each individual email.</li>
<li>extractwords.py: contains the ExtractWords class, which is used for automatic feature generation.</li>
<ul><li>Reads a large number of emails and finds how frequently each word appears in spam and in non-spam emails, then subtracts one frequency from the other. The words whose differences have the largest magnitudes are used as features, since they should be the words that are particularly spammy or particularly non-spammy. The features are written to a new file, message.fts.</li></ul>
</ul>
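The feature-selection step described for extractwords.py can be sketched roughly as follows. This is not the code from the repository; it is a minimal Python sketch that assumes "frequency" means the fraction of emails in each class that contain a word, that tokens are lowercase alphanumeric runs, and that message.fts holds one tab-separated word and score per line. The function names (word_set, doc_frequencies, select_features, write_features) are hypothetical.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def word_set(text):
    """Return the set of distinct lowercase tokens in one email."""
    return set(TOKEN_RE.findall(text.lower()))


def doc_frequencies(emails):
    """Map each word to the fraction of emails it appears in at least once."""
    counts = Counter()
    for text in emails:
        counts.update(word_set(text))
    n = len(emails) or 1  # avoid dividing by zero on an empty list
    return {w: c / n for w, c in counts.items()}


def select_features(spam_emails, ham_emails, k):
    """Return the k (word, spam_freq - ham_freq) pairs with the largest magnitude."""
    spam = doc_frequencies(spam_emails)
    ham = doc_frequencies(ham_emails)
    words = set(spam) | set(ham)
    diff = {w: spam.get(w, 0.0) - ham.get(w, 0.0) for w in words}
    # Rank by magnitude, breaking ties alphabetically so the output is deterministic.
    ranked = sorted(diff, key=lambda w: (-abs(diff[w]), w))
    return [(w, diff[w]) for w in ranked[:k]]


def write_features(features, path="message.fts"):
    """Write one 'word<TAB>score' line per feature (assumed file format)."""
    with open(path, "w") as f:
        for word, score in features:
            f.write(f"{word}\t{score:.4f}\n")


if __name__ == "__main__":
    spam = ["win free money now", "free prize claim now"]
    ham = ["meeting notes attached", "lunch meeting tomorrow"]
    print(select_features(spam, ham, 3))
    # [('free', 1.0), ('meeting', -1.0), ('now', 1.0)]
```

Keeping the sign of each difference lets a later classifier tell spam-indicative words (positive here, since the sketch computes spam minus ham) from ham-indicative words (negative), while ranking by absolute value makes both kinds eligible as features.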