I feel so dirty for writing this, but I was inspired by this twitterer.
After reading how to create a Twitter bot in five minutes, I realized I could combine it with SimplePie and build a Twitter spammer in under ten minutes. Well, OK, it took me fifteen.
So the steps were:
- Get something to say.
- Get someone to say it to.
- Say it.
First, I needed an account with something to say. I used my fsbo homes Twitter feed, which was already spamming (http://twitter.com/fsbohomes). It just takes the new items in the fsbo auction site’s RSS feed and tweets them via twitterfeed.
As it turns out, the public timeline probably isn’t the best source for this; a better one would be Twitter’s search at http://search.twitter.com/. Note the RSS feed icon on the results page.
Here’s what I came up with:
<?php
// simplepie makes the rss parsing very easy.
include_once( "libs/simplepie.inc" );
$keywords = array(
    'fsbo' =>
        "Hey come check out great prices on real estate auctions: http://www.fsboauction.info"
);
$username = 'fsbohomes';
$password = 'somepassword';
// read the public timeline. Note that this is cached for 60 seconds, so asking for it more often
// than that is silly. Also notice that it won't catch _everything_ but you should get at least one a day.
$read = 'http://twitter.com/statuses/public_timeline.rss';
$feed = new SimplePie();
$feed->set_feed_url( $read );
$feed->set_cache_duration( 60 );
$feed->init();
$feed->handle_content_type();
$count = 0;
foreach ( $feed->get_items() as $item ) {
    // read the tweet.
    $tweet = $item->get_title();
    // split it into username and the contents.
    // we'll be comparing our keywords against the contents and
    // sending the username an @ message.
    // (explode() instead of split(): split() was deprecated in PHP 5.3 and removed in PHP 7.)
    list( $a, $t ) = explode( ':', $tweet, 2 );
    // I had a lot of multibyte-encoded lines but didn't feel like parsing them,
    // since I don't speak the languages anyway.
    $e = mb_detect_encoding( $t );
    if ( $e == 'ASCII' ) {
        // check each keyword
        foreach ( $keywords as $seek => $message ) {
            $count++;
            if ( strpos( $t, $seek ) !== false ) {
                // if the keyword is found, send a message.
                $msg = '@'.$a.' '.$message;
                $msg = substr( $msg, 0, 140 );
                sendTweet( $msg );
            }
        }
    }
}
// http://twitter.com/statuses/public_timeline.format
// formats: xml, json, rss, atom
// this is from the twitter bot post referenced above
// it's just a function for posting a tweet.
function sendTweet( $msg ) {
    global $username;
    global $password;
    $url = 'http://twitter.com/statuses/update.xml';
    $curl_handle = curl_init();
    curl_setopt( $curl_handle, CURLOPT_URL, $url );
    curl_setopt( $curl_handle, CURLOPT_CONNECTTIMEOUT, 2 );
    curl_setopt( $curl_handle, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl_handle, CURLOPT_POST, 1 );
    // urlencode() so characters like & or + in the message don't break the POST body.
    curl_setopt( $curl_handle, CURLOPT_POSTFIELDS, 'status='.urlencode( $msg ) );
    curl_setopt( $curl_handle, CURLOPT_USERPWD, "$username:$password" );
    $buffer = curl_exec( $curl_handle );
    curl_close( $curl_handle );
    if ( empty( $buffer ) ) {
        echo "failed\n";
    } else {
        echo "succeeded\n";
    }
}
Ta-daa. I set it to run every five minutes, and it caught about one tweet every few hours.
Written by russ on November 16th, 2008 with no comments.
Markov content is automatically generated from some input text, based on the probability that word Y follows word X. You feed a script some input text; it analyzes which words come after each word and then assembles a post that’s supposed to look human enough to fool web spiders. It’s not really a white-hat or grey-hat thing: you can create thousands of pages of keyword-rich content, but it’s meaningless to a human.
Picking each word based only on the single word before it gives very unreadable text, but basing your Markov chain on the previous two words (and then randomly choosing the word that follows them) works better.
All of the code in this example is PHP. There’s a flaw somewhere in it; perhaps you can spot it.
Here are the steps this Markov generator goes through:
- Read in a bunch of topical content.
- Keeping it in order, split it up into an array of words.
- Go through the array, taking each pair of adjacent words:
From “The quick brown fox” you’d get “the quick”, “quick brown”, and “brown fox”.
- Now that you have the list of word pairs, get a list of the words that follow each pair.
Keep that list in an array, but don’t make it unique; the duplicates weight the random selection. For example, if you have the word pair “credit card” and its array is “users, debt, debt,” the word “debt” has twice the weight when you pick randomly from that array. That’s what you want, because in the input text “debt” followed “credit card” twice as often as “users” did.
- Start the content. I chose to start with the first word pair of my input, but it could (and should) be randomized.
- Take the last two words of the output (at this point, the first word pair) and look up the array of words that follow them. Then take a random word out of that array and append it to the output.
- Is it long enough yet? If not, take the last two words of the output again and look up the array of words that follow those two words. Keep doing this until it’s long enough.
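The steps above can be sketched compactly as follows. This is not the script below; it is a minimal illustration assuming the input is already a list of clean words, and the function names build_chain() and generate() and the sample sentence are hypothetical. Duplicate followers are kept so the random pick is weighted, and instead of dying the sketch simply stops when it reaches a pair that nothing follows.

```php
<?php
// Build an order-2 Markov table: "word1 word2" => list of words that followed
// that pair in the input. Duplicates are kept on purpose, so a follower seen
// twice is twice as likely to be picked later.
function build_chain(array $words): array
{
    $chain = array();
    for ($k = 0; $k + 2 < count($words); $k++) {
        $pair = $words[$k] . ' ' . $words[$k + 1];
        $chain[$pair][] = $words[$k + 2];
    }
    return $chain;
}

// Walk the table: look up the last two words of the output, append a random
// follower, and repeat until $wordcount words or a dead end is reached.
function generate(array $chain, string $start, int $wordcount): string
{
    $output = explode(' ', $start);
    while (count($output) < $wordcount) {
        $pair = $output[count($output) - 2] . ' ' . $output[count($output) - 1];
        if (empty($chain[$pair])) {
            break; // this pair only occurred at the very end of the input
        }
        $followers = $chain[$pair];
        $output[] = $followers[mt_rand(0, count($followers) - 1)];
    }
    return implode(' ', $output);
}

// Hypothetical sample input; "credit card" is followed by debt, users, debt.
$text  = 'the credit card debt grows when credit card users ignore credit card debt';
$words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
$chain = build_chain($words);
$keys  = array_keys($chain);
echo generate($chain, $keys[mt_rand(0, count($keys) - 1)], 20), "\n";
```

Here $chain['credit card'] ends up as debt, users, debt, which is exactly the weighting described in the list. Walking the token array once also avoids running a regex search per pair.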
Here’s the script I came up with. Feel free to use it or abuse it. It has some extra-verbose variables in it because I’m trying to track down a flaw: it randomly (heh) dies because it can’t find the word pair in the input. But that should be impossible, because I’m selecting the pair from the input. Hmm, it’s possible that the “find me” part doesn’t have the punctuation that the “input” has, and so it dies.
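One likely cause, offered as a hypothesis rather than a confirmed diagnosis: the script builds a key for every adjacent word pair, including the final pair of the input. If that pair appears only at the very end, no word follows it, so preg_match_all() finds nothing and its list of followers is empty. The generator only dies when the random walk happens to land on that pair, which would make the failure look random. This sketch (the function name find_dead_ends() and the sample words are hypothetical) lists such dead-end pairs:

```php
<?php
// Return every adjacent word pair that is never followed by another word,
// i.e. the keys that would leave the generator with an empty follower list.
function find_dead_ends(array $words): array
{
    $pairs = array();    // every adjacent pair seen
    $followed = array(); // pairs that have at least one following word
    for ($k = 0; $k + 1 < count($words); $k++) {
        $pair = $words[$k] . ' ' . $words[$k + 1];
        $pairs[$pair] = true;
        if ($k + 2 < count($words)) {
            $followed[$pair] = true;
        }
    }
    return array_keys(array_diff_key($pairs, $followed));
}

// "the loan" closes the input and never appears anywhere else.
print_r(find_dead_ends(explode(' ', 'pay off the card then pay off the loan')));
```

If the final pair also occurs earlier in the input (for example "a b c a b"), it does have a follower and nothing is reported.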
If I keep putting together code for this site, I’ll have to install one of those pretty-code plugins.
// how many words is our goal?
$wordcount = 1000;
// read the input files. Really this should be a glob or a readdir() loop.
$i[] = file('source/after-you-pay-off-credit-card-debt.txt');
$i[] = file('source/agency-card-credit-debt-settlement.txt');
$i[] = file('source/a-problem-called-credit-card-debt.txt');
/*
$i[] = file('source/bad-debt-credit-card-what-is-that.txt');
$i[] = file('source/before-you-go-for-credit-card-debt-help.txt');
$i[] = file('source/blogging-consolidation-debt-and-new-information-technology-239.txt');
$i[] = file('source/card-com-credit-debt-en-language-site.txt');
$i[] = file('source/college-student-credit-card-debt.txt');
$i[] = file('source/consolidate-credit-card-debt.txt');
$i[] = file('source/consolidate-your-credit-card-debt.txt');
$i[] = file('source/credit-card-debt-consolidation-loan.txt');
$i[] = file('source/credit-card-debt-consolidation.txt');
$i[] = file('source/credit-card-debt-counseling.txt');
$i[] = file('source/credit-card-debt-management.txt');
$i[] = file('source/credit-card-debt-negotiation.txt');
$i[] = file('source/credit-card-debt-reduction.txt');
$i[] = file('source/credit-card-debt-relief.txt');
$i[] = file('source/credit-card-debt-settlement.txt');
$i[] = file('source/credit-card-debt.txt');
$i[] = file('source/creditcarddebt.txt');
$i[] = file('source/eliminate-credit-card-debt.txt');
$i[] = file('source/excessive-credit-card-debt.txt');
$i[] = file('source/get-out-of-credit-card-debt.txt');
$i[] = file('source/is-consolidating-credit-card-debt-a-good-option.txt');
$i[] = file('source/reduce-credit-card-debt.txt');
$i[] = file('source/re-financing-to-consolidate-debt.txt');
$i[] = file('source/taking-a-step-towards-credit-card-debt-elimination.txt');
$i[] = file('source/teen-credit-card-debt-statistics.txt');
$i[] = file('source/the-benefits-from-credit-card-debt-consolodation.txt');
*/
// some print output to mark my spot while I troubleshoot.
print count( $i )." files loaded\n";
// put them all together. $input has to start out as an empty array,
// because array_merge() won't accept an undefined (null) variable.
$input = array();
for ( $j = 0; $j < count( $i ); $j++ ) {
    $input = array_merge( $input, $i[$j] );
}
print "merged\n";
// clean up the input. We're winding up with just a list of words separated by ONE space.
$input = implode( " ", $input );
$input = preg_replace( '|[^a-zA-Z0-9 ]|', ' ', $input );
// this next line collapses every run of spaces into a single space.
$input = preg_replace( '| +|', ' ', $input );
$inputstring = $input;
$holder = explode( ' ', $input );
unset( $input );
// not sure why I had empty elements in the $input array (probably a leading or
// trailing space, e.g. from each file's final newline) but this is to remove all of them.
$input = array();
foreach ( $holder as $k => $v ) {
    if ( strlen( trim( $v ) ) > 0 ) {
        $input[] = trim( $v );
    }
}
print "cleaned\n";
// this array is for the word pairs.
$segments = array();
$holder = array_shift( $input );
while ( count( $input ) ) {
    // take the current word and the next word from the input array.
    $matchme = $holder.' '.$input[0];
    $pattern = "|$matchme (.*?) |";
    // get all the matches from input- the long string of words not the array
    preg_match_all( $pattern, $inputstring, $matches );
    // add the list of matched words to the segments array
    $segments[ $matchme ] = $matches[1];
    // cut the first item from the input array and move it over one.
    // note that the array_shift feeds the while( count() ) test above, so the shift is important
    $holder = array_shift( $input );
}
// just some output
print count( $segments )." segments calculated\n";
$keys = array_keys( $segments );
// starting to assemble the output. Using $keys[0] to be the first bit of the markoved content.
$output[] = $keys[0];
// $bracket is reused each loop, but $output holds the … wait for it … output.
$bracket = $output[0];
print "assembling\n";
// yeah yeah compare the length of output against the desired length.
while ( count( $output ) < $wordcount ) {
    // $bracket is the key for $segments.
    $index = $bracket;
    $possiblenext = $segments[$index];
    // $possiblenext is the array of words-that-follow 'bracket'.
    if ( count( $possiblenext ) == 0 ) {
        // this is where it's dying oddly
        print "no possible next for $bracket\n";
        die( "\n\naiee!!\n" );
    }
    // pick a random word out of the array. I had this all wrapped together but broke it apart when it wasn't working
    $rand = rand( 0, count( $possiblenext ) - 1 );
    $nextword = $possiblenext[$rand];
    $output[] = $nextword; // no leading space needed: implode() below puts one between words
    // the next bracket is the second word of this group plus the nextword variable.
    $holder = explode( " ", $bracket );
    // makes me sad that I can't use explode(' ', $bracket)[1]; (PHP 5.4 and later allow it)
    $nextbracket = $holder[1].' '.$nextword;
    $bracket = $nextbracket;
}
$output = implode( ' ', $output );
print $output;
print "\n";
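The clean-up stage (stripping punctuation, collapsing spaces, then filtering out empty elements) can probably be done in a single preg_split() call. The empty elements most likely come from a leading or trailing separator, such as the newline file() keeps at the end of each line; the PREG_SPLIT_NO_EMPTY flag drops those. A minimal sketch, where the function name tokenize() and the sample lines are hypothetical:

```php
<?php
// Turn an array of lines (as returned by file()) into a clean list of words.
// Any run of characters that is not a letter or digit acts as a separator,
// and PREG_SPLIT_NO_EMPTY discards the empty strings that leading or
// trailing separators would otherwise produce.
function tokenize(array $lines): array
{
    return preg_split('/[^a-zA-Z0-9]+/', implode(' ', $lines), -1, PREG_SPLIT_NO_EMPTY);
}

$lines = array("Credit card debt, explained.\n", "  Pay it off!\n");
print_r(tokenize($lines));
```

This treats the same characters as word separators as the script's `[^a-zA-Z0-9 ]` replacement followed by space-collapsing, but it needs no loop and leaves no empty strings to clean up afterwards.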
So in a perfect world, I’d find that flaw and then maybe substitute some links into it, pointing from this stuff to a real site (for instance, using str_replace() to turn every “bad debt” in $output into a link), then use XML-RPC to post it to a splog and ping Ping-O-Matic.
Hopefully this gets you going on writing your own Markov generator. It’s important to write your own, so you know how it works and so your output doesn’t look too much like someone else’s. My generator might produce different-looking output from Nickycake’s generator; it’ll definitely produce different output, because it’s all randomized. Feel free to use the code above any way you want. Since it’s unfinished, I’m sure your touches will make it unique.
Written by russ on November 3rd, 2008 with no comments.
Wtf!
I’ve long been a fan of National Novel Writing Month (NaNoWriMo). I don’t think I could keep up with the novel writing because of my other irons in the fire, but the Old Blind Ape is hosting a shindig: we’re writing 50,000 words of content, which works out to (does the math) one hundred 500-word articles for various sites.
I have sites that could use a few 500-word articles. It’ll be good to get content running on them, and if I get the pump primed well enough, I’ll be able to talk myself into writing more often.
For instance, I don’t think I could come up with content for project car catalog, but with this, I should be able to come up with bloody something.
Written by russ on November 3rd, 2008 with no comments.