In this post I will demonstrate a linear web crawler (meaning it does not branch out and search every URL on a page simultaneously, as that would be computationally intensive and the number of pages to visit would grow exponentially).
The crawler loops through the current (target) URL and searches the page for the telltale expression of an href. If one exists, it copies it and builds it as a string into a text box, and copies the last URL found on the page into the target URL box. A button then gives the user control of the web crawler:
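The original code listing did not survive in this post, so here is a minimal sketch of one crawl step in VB.NET. The control names (txtTarget, txtResults, btnCrawl) and the exact href pattern are assumptions for illustration, not the original author's identifiers:

    Private Sub btnCrawl_Click(sender As Object, e As EventArgs) Handles btnCrawl.Click
        ' Download the HTML of the current target URL (txtTarget is an assumed text box name).
        Dim client As New WebClient()
        Dim html As String = client.DownloadString(txtTarget.Text)

        ' Search the page for the telltale href="..." expressions.
        Dim lastUrl As String = ""
        For Each m As Match In Regex.Matches(html, "href\s*=\s*""(http[^""]+)""")
            ' Build each discovered URL into the results text box as a string.
            txtResults.AppendText(m.Groups(1).Value & Environment.NewLine)
            lastUrl = m.Groups(1).Value
        Next

        ' The last URL on the page becomes the next target (the "linear" step).
        If lastUrl <> "" Then txtTarget.Text = lastUrl
    End Sub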
Importing the following packages allows the program to function:
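The original import list also did not survive; assuming the VB.NET sketch above, these two namespaces cover the downloading (WebClient) and the href matching (Regex):

    Imports System.Net                       ' WebClient, for downloading pages
    Imports System.Text.RegularExpressions   ' Regex and Match, for finding hrefs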
However, a feedback loop is important to ensure that all URLs are eventually mapped:
This code should appear just below the Next statement, and it will map all of the n-1 remaining links in the web chain. Here variables is just a placeholder variable, and btncontrol is a counter for a user-controlled button, or for a timer that resets every t minutes. For the button one would write:
btncontrol = btncontrol + 1
while earlier, outside the individual Button_Click handler, declaring that btncontrol is equal to 0 at t = 0, meaning it is 0 when the program starts. By adding this feedback loop, the web crawler will run through the Internet autonomously if the buttons are replaced with timers.
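As a sketch of that wiring, again with assumed names (btnCrawl, tmrCrawl, and a hypothetical CrawlStep routine holding the loop above): btncontrol is declared at form level so it starts at 0, the button handler increments it, and swapping the button for a Timer makes the crawl run on its own.

    ' Declared at form level, so btncontrol = 0 when the program starts (t = 0).
    Dim btncontrol As Integer = 0

    Private Sub btnCrawl_Click(sender As Object, e As EventArgs) Handles btnCrawl.Click
        btncontrol = btncontrol + 1   ' each click advances the crawl by one step
        CrawlStep()                   ' hypothetical routine containing the crawl loop
    End Sub

    ' Replacing the button with a Timer that fires every t minutes makes the crawler autonomous:
    Private Sub tmrCrawl_Tick(sender As Object, e As EventArgs) Handles tmrCrawl.Tick
        btncontrol = btncontrol + 1
        CrawlStep()
    End Sub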