Determine if the TSN Bio contains one or more spam words.
Certain twam accounts that push pharmaceutical or adult-natured products via Twitter have been observed to use specific 'spammy' words in their account Bio details.
| GRADE | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| DESC | 0 Spam Words | 1 Spam Word | 2 Spam Words | 3 Spam Words | 4 or more Spam Words |
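The grading table above amounts to "spam word count plus one, capped at 5". A minimal Python sketch of that mapping follows; the function name `grade_from_spam_count` is hypothetical and is not taken from TWASE itself.

```python
def grade_from_spam_count(spam_word_count: int) -> int:
    """Map a count of spam words found in a Bio to a module grade (1-5).

    Follows the grading table: 0 words -> 1, 1 -> 2, 2 -> 3, 3 -> 4,
    and 4 or more -> 5. The function name is illustrative only.
    """
    if spam_word_count < 0:
        raise ValueError("spam_word_count must be non-negative")
    # Cap the count at 4 so that any larger count still yields grade 5.
    return min(spam_word_count, 4) + 1
```

For instance, `grade_from_spam_count(2)` returns 3, and any count of 4 or more returns 5.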
The following example is from a TSN that has a total of 10 words in its Bio, 2 of which have been identified as spam-like, resulting in a module grade of 3:
<bio_spam_word>
    <date>1265543618</date>
    <exec_time>8</exec_time>
    <raw_data>
        <total_bio_words>10</total_bio_words>
        <total_spam_words>2</total_spam_words>
        <spam_words>amphetamine,xxx</spam_words>
    </raw_data>
    <result>3</result>
</bio_spam_word>
TWASE holds a list of approximately 400 words commonly used in spam communications (such as junk email), against which the TSN Bio content is compared.
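The comparison can be pictured as a simple word-by-word lookup. The sketch below is only an illustration under stated assumptions: the three-entry `SPAM_WORDS` set stands in for the real list of roughly 400 words (which is not reproduced here), the tokenisation rule is a guess, and the function name `find_spam_words` is hypothetical. It also assumes every occurrence of a spam word is counted, which the source does not specify.

```python
import re

# Hypothetical stand-in for the real spam word list of roughly 400 entries.
SPAM_WORDS = frozenset({"amphetamine", "xxx", "viagra"})


def find_spam_words(bio, spam_words=SPAM_WORDS):
    """Return (total_bio_words, matched_spam_words) for a Bio string.

    Assumption: words are lowercase runs of letters, digits, or
    apostrophes; the actual tokenisation used by TWASE is not documented.
    """
    words = re.findall(r"[a-z0-9']+", bio.lower())
    # Keep every word that appears in the spam list, in Bio order.
    matches = [word for word in words if word in spam_words]
    return len(words), matches


total, matches = find_spam_words(
    "Cheap amphetamine and xxx deals for you every single day"
)
print(total, len(matches), ",".join(matches))
```

The printed values correspond to the `total_bio_words`, `total_spam_words`, and `spam_words` fields of the module output.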
The Twitter Bio offers relatively little space to add information about yourself. This increases the chances of people using abbreviations and shorter, highly specific, spam-like words to describe themselves.
Thus, there is a good chance that this module generates false positives by matching legitimate Bio words against commonly used spam-like words, and so its relevance to the overall Scan Result is low.