Scrolling Game Development Kit Forum
General => Off-Topic => Topic started by: bluemonkmn on 2010-08-05, 06:26:58 PM
-
I saw an article on Slashdot.org today saying that CAPTCHA can now be defeated by automated programs. It gave me the idea to try developing my own program to test whether someone is human. Here's what I got (source code and binary included). What do you think?
http://enigmadream.com/misc/HumanVerification.zip
-
It looks nice. I think facebook does something like this too. You have to identify your friends.
-
8) :laugh: I think this is clever. The only downside (I think) is that someone may try to hack into the system to figure out which picture represents what. I'm just saying...
-
Hacking into the system to see which picture is what would require indexing Google images... that's where the images are coming from. That would be a pretty large index, and could possibly be foiled by distorting the pictures.
(You didn't think I had 6,000 images embedded in a 30KB EXE did you? :))
I've "played" this test a couple dozen times and haven't seen the same image twice yet. Every image I see is the first time I see it.
-
I like it a lot! :laugh: Especially when strange and seemingly unrelated images appear. Example: the word "celeste" got 3 girls in bikinis. And "spider" got a really disgusting image of spiders nestled within a hole in some person's leg. Yuck!!!
Concepts that are abstract or difficult to put into images can pick up weird images, I guess. Like "half-brother", this one wasn't obvious. Then again, that's exactly the kind of abstraction a computer cannot make. :)
I must admit that it's perfect 99% of the time.
-
Are you going to embed it in the forums?
-
No. At this point, it would be more work for me to dust off my PHP coding skills and edit someone else's code to change the verification system than it would be to continue manually accepting/rejecting people based on whether they sent me the required email during the sign-up process. Although it's tempting just as an interesting project -- I kind of want to keep my PHP coding skills fresh. But I have to work on the game I am creating with SGDK2... back on the other hand again, the timing is right for a better proof of concept on this human verification system, which will probably interest more people than my game will. But back on the other other hand, I don't know that I'd want to put this system in place on a live system where I'll have to rely on it, especially since some of the images it picks are inappropriate! Decisions, decisions. I think my solution is too simple, and there are other more experienced people working on it, so I'd just be wasting my time. I'll let the experts do what they do, and I'll stick to what I'm good at. I could use this as a game idea, though :). A scrolling game that downloads things from Google images... wouldn't that be interesting... I know durnurd made a template that downloaded snowflake images from an online snowflake generator.
-
It seems like it might be a better idea to have like maybe five different images, each from a different category, and you have to match the category to the image. Or maybe you have four images where three of them are from one category and one unrelated, and you have to find the three that go together. Or some variation on the theme that might not be so strict, since sometimes the images do not obviously go into one particular category, but you could allow for some small deviation.
-
I didn't know you could do that. Can it download images from other sites that you designate it to?
-
It seems like it might be a better idea to have like maybe five different images, each from a different category, and you have to match the category to the image. Or maybe you have four images where three of them are from one category and one unrelated, and you have to find the three that go together. Or some variation on the theme that might not be so strict, since sometimes the images do not obviously go into one particular category, but you could allow for some small deviation.
What exactly does that accomplish that the current program does not? I think the main problem with the current program is that the noun list on which it is based has some problematic nouns in it. So some of the images are inappropriate (offensive) and others are ambiguous. But in the vast majority of cases, a human (or at least I) can tell what the computer is expecting. So given 3 tries at passing 3 rounds in a row, it's very unlikely that I would be unable to pass. Some younger folks might have a problem unless we simplify the word list a bit. But even they would often know that 3 pictures of catamarans probably aren't pictures that relate to the word "pig", even if they don't know what a catamaran is. Or that 3 pictures of a quail would go with the word "quail", even though they don't know exactly what a quail looks like, and only know that it's a bird (it's unlikely another bird appears in the list).
It seems like if you picked "categories", we wouldn't be able to use Google images as the source of the program's data... how would it work? You would search on fewer words, but look farther down the list of result images to pick random images? My problem with that is that the farther down the list you go, I think, the less relevant the images become to the word. But I suppose if you use a really good word, there should be lots of relevant images.
How's your PHP... do you want to try implementing one or both of these to show off as a web site instead of a C# program? :)
-
I didn't know you could do that. Can it download images from other sites that you designate it to?
It's really simple with .NET. It has objects/commands to interact with the internet very easily. You could load the code into C# Express and see exactly how it works. It's pretty short. Of course it has to manually parse the HTML of the pages it gets to find the relevant images. But it's still pretty simple.
-
I didn't know you could do that. Can it download images from other sites that you designate it to?
It's really simple with .NET. It has objects/commands to interact with the internet very easily. You could load the code into C# Express and see exactly how it works. It's pretty short. Of course it has to manually parse the HTML of the pages it gets to find the relevant images. But it's still pretty simple.
I like that! It gives me an idea.
-
What exactly does that accomplish that the current program does not?
Given a set of 5 options and 3 sets, that's a total of only 5^3 or 125 different values to choose randomly from. The probability that two random number generators pick the same number between 1 and 125 is relatively high compared with how quickly a robot can submit tries. And that's taking it completely randomly. If the robot takes the 5 words you give and does a search on those and finds even 1 image that's similar to or the same as one you provide, it can guess with a much greater degree of accuracy.
When I say "category" I mean the list of words you have in the program, not anything more broad than that. If you require matching a set of images to a set of categories, or words, then each answer relies on the previous answers also being correct, since you cannot choose a word that has already been chosen. Matching 5 images gives you 5!, or 120 different possible answers to choose from randomly. That's a similar number, even somewhat lower. But adding even one more image bumps it up to 6! = 720 possibilities, whereas adding another option to each set in the first scheme only jumps it up to 6^3 = 216, and adding another set bumps it to 5^4 = 625, but that requires downloading and looking at more images.
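For anyone who wants to check the arithmetic, here is a small sketch that just counts the possible answer sheets for each scheme. The function names independentChoices and matchingChoices are made up for this illustration and are not from the original program; it assumes PHP 7 or later.

```php
<?php
// Count the possible answer sheets for the two schemes discussed in this thread.

// Independent rounds: pick 1 of $options words in each of $rounds rounds.
function independentChoices(int $options, int $rounds): int {
    return $options ** $rounds;
}

// Matching: assign $n distinct words to $n images, i.e. a permutation (n!).
function matchingChoices(int $n): int {
    $total = 1;
    for ($i = 2; $i <= $n; $i++) {
        $total *= $i;
    }
    return $total;
}

echo independentChoices(5, 3), "\n"; // 5 options, 3 rounds
echo independentChoices(6, 3), "\n"; // one more option per round
echo independentChoices(5, 4), "\n"; // one more round
echo matchingChoices(5), "\n";       // match 5 images to 5 words
echo matchingChoices(6), "\n";       // match 6 images to 6 words
```

The counts only describe blind guessing; they say nothing about a bot that searches on the offered words and compares images.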
-
I've got a PHP version running here:
http://www.findmyed.com/test/
Note that it submits the random seed used to find the current images to verify the results. A given seed always produces the same answer, so once you have found the right word for a seed you could just submit the same URL over and over to get a "correct" response. This can be worked around using a time-of-day hash with a secret salt or various other methods (such as storing the seed on the server and using a session ID to access it), which I didn't bother to implement.
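A minimal sketch of the salted-hash idea, assuming PHP 7.1 or later. The names SECRET_SALT, MAX_AGE_SECONDS, issueToken and verifyToken are hypothetical and are not part of the page linked above; the idea is simply to sign the seed together with the time it was issued, so an old URL stops working after a few minutes.

```php
<?php
// Hypothetical secret and lifetime; in practice keep the secret out of the source tree.
const SECRET_SALT = 'replace-with-a-long-random-secret';
const MAX_AGE_SECONDS = 300;

// Build a token that binds the seed to the time it was issued.
function issueToken(int $seed, int $issuedAt): string {
    $payload = $seed . ':' . $issuedAt;
    return $payload . ':' . hash_hmac('sha256', $payload, SECRET_SALT);
}

// Return the seed if the token is authentic and still fresh, otherwise null.
function verifyToken(string $token, int $now): ?int {
    $parts = explode(':', $token);
    if (count($parts) !== 3) {
        return null; // malformed token
    }
    [$seed, $issuedAt, $mac] = $parts;
    $expected = hash_hmac('sha256', $seed . ':' . $issuedAt, SECRET_SALT);
    if (!hash_equals($expected, $mac)) {
        return null; // tampered with or forged
    }
    $age = $now - (int)$issuedAt;
    if ($age < 0 || $age > MAX_AGE_SECONDS) {
        return null; // issued in the future or expired
    }
    return (int)$seed;
}
```

The page would call issueToken($nextSeed, time()) when building the link and verifyToken($_GET['token'], time()) when checking a guess. A token like this can still be replayed until it expires, and the seed itself stays visible to the client, so keeping the seed in the session is probably the stronger of the two options.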
I particularly like the search for "Random" which turns up an image of white noise, bright colors, a guy staring at a wine glass, and Chuck Norris.
-
Can I see the source of that durnurd?
-
Can I see the source of that durnurd?
Right Click -> View Source
-
Can I see the source of that durnurd?
Right Click -> View Source
It's PHP, it doesn't work like that. It's executed server-side and HTML is sent to the browser.
-
Given a set of 5 options and 3 sets, that's a total of only 5^3 or 125 different values to choose randomly from.
You have to keep in mind how much work the user is doing versus how likely it is that a random guesser could get this right. I'm still not sure I understand your scheme, but it seems to me like you still have basically the same ratio of user work to probability of random correctness. Ideally my version of the test would be presented on 1 page where you have to select 3 correct answers and *then* submit everything at once rather than go through 3 separate trials. But each answer is still a separate question that the user has to think about. Does your solution provide the user with the opportunity to think about fewer images (and/or fewer words to match) while providing a lower chance of correct random guessing?
-
Fewer images to look at, yes. There would be only 6 images instead of 15 or 20 (assuming 6 images for the matching, 5 images for each of 3 or 4 groups for your method). Less of a random correct guess, yes. The number of incorrect answers is much larger for matching vs. your method (719 vs 624). Fewer words to deal with, yes, since there's only 6 words instead of 15 or 20 options to choose 3 or 4 from.
-
I make no claims about the code's prettiness.
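<?php
// $nextSeed generates the challenge shown on this page.
// $seed re-creates the previous challenge when a guess is being checked.
$checkMode = false;
$nextSeed = rand();
if (isset($_GET['seed']) && isset($_GET['value'])) {
    $seed = (int)$_GET['seed']; // srand() needs an integer
    $checkMode = true;
} else {
    $seed = $nextSeed;
}
?>
<html>
<head>
<script language="javascript">
function validate(option) {
    // Reload this page with the seed of the challenge on screen and the chosen word.
    // A relative URL avoids echoing $_SERVER['PHP_SELF'] into the script unescaped.
    location.href = "?seed=<?= $nextSeed ?>&value=" + encodeURIComponent(option.text);
}
</script>
</head>
<body>
<?php
// Download an image and return it base-64 encoded.
function base64EncodeImage($url) {
    $imageData = file_get_contents($url);
    return base64_encode($imageData);
}

// Pick 5 random words; one of them is returned in $searchTerm as the real answer.
function getWords(&$searchTerm) {
    // One word per line; drop the line endings and any blank lines.
    $words = file('words.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $indices = array_rand($words, 5);
    shuffle($indices);
    $searchTerms = array();
    foreach ($indices as $idx) {
        $searchTerms[] = $words[$idx];
    }
    $searchTerm = $searchTerms[array_rand($searchTerms)];
    return $searchTerms;
}

// Seeding the generator makes getWords() repeat the same picks for the same seed.
srand($seed);
$searchTerms = getWords($realTerm);
if ($checkMode) {
    if (trim($realTerm) === trim((string)$_GET['value'])) {
        echo 'Correct<br>';
    } else {
        die('Incorrect'); // die needs parentheses around its message
    }
    // Build a fresh challenge for the next round.
    srand($nextSeed);
    $searchTerms = getWords($realTerm);
}

// Search Google images for the real term and scrape the thumbnail URLs out of the HTML.
$contents = file_get_contents('http://images.google.com/images?q=' . urlencode(trim($realTerm)));
preg_match_all('/"(http[^"]*gstatic[^"]*)"/s', $contents, $arr);
$arr = array_slice($arr[1], 0, 10); //Only use the first 10 results
$indices = array_rand($arr, 4); //Pick 4 random images from the first 10
foreach ($indices as $idx) {
    $imageData = base64EncodeImage($arr[$idx]);
    echo "<img src=\"data:image/png;base64,$imageData\">\n"; //Display the image as a base-64-encoded image so the filename isn't shown
}
?>
<br>
<select onchange="validate(this.options[this.selectedIndex])">
<option selected>--Choose One--</option>
<?php foreach ($searchTerms as $term) { ?>
<option><?= htmlspecialchars($term) ?></option>
<?php } ?>
</select>
</body>
</html>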
-
Fewer images to look at, yes. There would be only 6 images instead of 15 or 20 (assuming 6 images for the matching, 5 images for each of 3 or 4 groups for your method). Less of a random correct guess, yes. The number of incorrect answers is much larger for matching vs. your method (719 vs 624). Fewer words to deal with, yes, since there's only 6 words instead of 15 or 20 options to choose 3 or 4 from.
I'm still not seeing it -- how hard would it be for you to try to make a demo of your version?
-
Have you ever taken a quiz where there was matching involved? Like a list of words on one side with blanks, and a list of definitions on the other side with letters, and you had to fill in the blanks with the letter for the correct definition of the word. That's what I'm talking about, except with images and nouns instead of words and definitions.
-
Well, in a sense, you have to look at more images for your test than mine because the only reason I am picking 3 images is that oftentimes the first Google images result does not give a good picture of a specific term. If you are only picking one image per word, sure, your chances might be a *little* bit better with your test in that you can rule out all the other words using the other images and figure out an ambiguous image, but I think you eliminate a lot more ambiguity by showing 3 images per word. The chances of getting 2 words whose images look completely unrelated to the words are relatively high in your test, I fear. Also, that process of elimination can be significantly more taxing on the user. Easier to have more clues. When I think about it, looking at images is much quicker than picking a word from a list, too, so it seems easier to select 1 of 6 words 3 times than to select 1 of 6 words 5 times (and let the remaining word be the only choice). Technically you're not selecting 1 of 6 words each time, but in a sense you are because you still want to consider every word for every image in case you didn't pick the best match in an earlier choice. If we were to make the user pick from a completely separate list for each image clue (or trio of image clues), I think it would be about the same amount of effort on the user's part, but provide significantly more security in eliminating random success: 6^5 = 7776.
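To put those outcome counts in terms of a bot that keeps retrying, here is a rough sketch. It assumes the bot guesses uniformly at random, ignores the images entirely, and gets a fresh challenge on every attempt; the function name blindSuccessChance and the figure of 100 attempts are made up for illustration.

```php
<?php
// Chance that a purely random guesser passes at least once in $attempts tries,
// when each try has $outcomes equally likely answer sheets: 1 - (1 - 1/N)^k.
function blindSuccessChance(int $outcomes, int $attempts): float {
    return 1.0 - (1.0 - 1.0 / $outcomes) ** $attempts;
}

$schemes = [
    125  => '3 rounds of 1-of-5',
    720  => 'matching 6 images',
    7776 => '5 rounds of 1-of-6',
];
foreach ($schemes as $outcomes => $label) {
    printf("%-20s %5d outcomes, %.1f%% after 100 tries\n",
        $label, $outcomes, 100 * blindSuccessChance($outcomes, 100));
}
```

This only covers blind guessing, so it says nothing about how ambiguous the images are for a human or how well a bot could do by searching on the offered words.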