Difference between revisions of "Decision Forests"
Revision as of 02:00, 19 March 2016
A parallel Decision Forest
Here is a link to all of the code that I used for my email spam/ham classification algorithm: github repo (https://github.com/brianjp93/email-classification/)
Explanation of some files
- emaildata.py - contains the EmailData class, whose methods are used to extract data from each individual email.
- extractwords.py - contains the ExtractWords class, which is used for automatic feature generation.
  - Reads a large number of emails and finds how frequently each word appears in spam emails and in non-spam emails. For each word, the two frequencies are then subtracted from each other, and the words whose differences have the largest magnitudes are used as features. (A large magnitude should mark a word as particularly spammy, or particularly not spammy.) A new file, message.fts, is written containing the features.