bio_spam_word

description

Determine if the TSN Bio contains one or more spam words.

purpose

Certain twam accounts that push pharmaceutical or adult-natured products via Twitter have been observed to include specific 'spammy' words in their account Bio.

grading

GRADE   DESC
1       0 Spam Words
2       1 Spam Word
3       2 Spam Words
4       3 Spam Words
5       4 or more Spam Words
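The grading table reduces to a simple clamp: the grade is one more than the number of spam words found, capped at 5. A minimal Python sketch of that mapping follows. The function name `grade_from_spam_count` is hypothetical and is not part of TWASE.

```python
def grade_from_spam_count(spam_count: int) -> int:
    """Map a count of matched spam words to a module grade (1-5).

    Hypothetical helper reproducing the grading table:
    0 words -> 1, 1 -> 2, 2 -> 3, 3 -> 4, 4 or more -> 5.
    """
    if spam_count < 0:
        raise ValueError("spam_count must be non-negative")
    # Counts above 4 all fall into the top grade.
    return min(spam_count, 4) + 1


for n in range(6):
    print(n, grade_from_spam_count(n))
```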

example

The following example is from a TSN whose Bio contains 10 words in total, 2 of which were identified as spam-like, resulting in a module grade of 3:

<bio_spam_word> 
	<date>1265543618</date> 
	<exec_time>8</exec_time> 
	<raw_data> 
		<total_bio_words>10</total_bio_words> 
		<total_spam_words>2</total_spam_words> 
		<spam_words>amphetamine,xxx</spam_words> 
	</raw_data> 
	<result>3</result> 
</bio_spam_word> 
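A consumer of the module output could read the record above with Python's standard `xml.etree.ElementTree`. The sketch below is an assumption about how such a consumer might look. The helper name `parse_bio_spam_word` is hypothetical. It treats `spam_words` as a comma-separated list and reads `date` and `exec_time` as plain integers, because this page does not state their units. It also checks that `result` agrees with the grading table.

```python
import xml.etree.ElementTree as ET

# The module output from the example above, as a string.
RESULT_XML = """
<bio_spam_word>
    <date>1265543618</date>
    <exec_time>8</exec_time>
    <raw_data>
        <total_bio_words>10</total_bio_words>
        <total_spam_words>2</total_spam_words>
        <spam_words>amphetamine,xxx</spam_words>
    </raw_data>
    <result>3</result>
</bio_spam_word>
"""


def parse_bio_spam_word(xml_text: str) -> dict:
    """Extract the fields of a <bio_spam_word> result into a dict (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    raw = root.find("raw_data")
    spam_text = raw.findtext("spam_words", default="")
    return {
        "date": int(root.findtext("date")),
        "exec_time": int(root.findtext("exec_time")),
        "total_bio_words": int(raw.findtext("total_bio_words")),
        "total_spam_words": int(raw.findtext("total_spam_words")),
        # Assumption: spam_words is a comma-separated list with no spaces.
        "spam_words": [w for w in spam_text.split(",") if w],
        "result": int(root.findtext("result")),
    }


record = parse_bio_spam_word(RESULT_XML)
# Consistency check against the grading table: grade = min(count, 4) + 1.
assert record["result"] == min(record["total_spam_words"], 4) + 1
print(record)
```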

data

TWASE maintains a list of approximately 400 words commonly used in spam communications (such as junk email), against which the TSN Bio content is compared.
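The comparison step might look roughly like the Python sketch below. Several details are assumptions not stated on this page:

- The Bio is lowercased and split into alphanumeric tokens.
- Each distinct spam word is counted once, in order of first appearance.
- `SAMPLE_SPAM_WORDS` holds only the two words from the example above. It is not TWASE's actual list of about 400 words.

The function names and the sample Bio are made up for illustration.

```python
import re

# Hypothetical sample: only the two spam words from the example above.
# TWASE's actual list (~400 words) is not reproduced on this page.
SAMPLE_SPAM_WORDS = frozenset({"amphetamine", "xxx"})


def tokenize(bio: str) -> list[str]:
    """Lowercase the Bio and split it into alphanumeric words (assumed rule)."""
    return re.findall(r"[a-z0-9]+", bio.lower())


def match_spam_words(bio: str, spam_words=SAMPLE_SPAM_WORDS) -> dict:
    """Compare Bio words against a spam word list.

    Assumption: each distinct spam word counts once, in order of first appearance.
    """
    words = tokenize(bio)
    found = []
    for word in words:
        if word in spam_words and word not in found:
            found.append(word)
    return {
        "total_bio_words": len(words),
        "total_spam_words": len(found),
        "spam_words": ",".join(found),
    }


# A made-up 10-word Bio that reproduces the counts in the example output.
bio = "Cheap amphetamine and XXX videos, order online today. DM me."
result = match_spam_words(bio)
print(result)
```

Counting distinct words rather than every occurrence keeps a Bio that repeats one spam word from jumping straight to a high grade. Whether TWASE does the same is not stated here.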

notes

The Twitter Bio offers relatively little space for users to describe themselves. This makes it more likely that people use abbreviations and short, highly specific words, some of which resemble spam words.

As a result, this module has a good chance of generating false positives by matching ordinary Bio words against common spam words, so its relevance to the overall Scan Result is low.

 
module/bio_spam_word.txt · Last modified: 2010/02/28 10:46 (external edit)