Lesson 9 ( Word document )

Punters Lounge JAVA programming course.
Lesson 9 - Reading a website.

Reading a website from Java is done in much the same way as reading from a file. You open a stream to the site and then read the contents of that site sequentially into a buffer. Before we start digging in we need to talk about HTML and URLs.

9.1 HTML

The World Wide Web is all about sharing information, and sharing it in such a way that what is posted on a website can be accessed from any computer anywhere in the world. So how can you make sure that what you put on your site can be viewed on all those different computers, with their different hardware and different operating systems?

What happens is that we add codes to the information, telling the computer how the information should be displayed. For example the code < b > … < /b > means that the text in between should be displayed in bold. When looking at a website you don't see these codes. They instruct the computer in the background on how to display the information, but you as a user only get to see the actual information, the content.

The codes are a matter of convention. We, as a global community, have made an agreement about this. This agreement has resulted in a standard called HTML, which stands for HyperText Markup Language. Anyone producing computer hardware or software can look at the HTML standard and build that standard into the product they are making. So when you look at a website on a PC or a laptop, a Compaq or Dell or IBM or homemade brand, using Windows or UNIX or Linux, you see the same thing. That is because the manufacturers have built the HTML standard into their products.

What does this all mean for us? We will be reading websites, and we will read the content as well as the HTML codes. Then we will parse what we have read. That means we search through the whole of the codes and content and extract the individual bits of information that we want.

If you are familiar with HTML codes, that's great. It will be helpful as you go along. If you are not familiar with HTML codes, don't worry about it. You will be looking at a screen full of meaningless codes, and at times it will look like gibberish. That's fine, you don't need to know the details of HTML for our purpose. We don't care about the codes; we want the info. And to get at that we don't need to know the meaning of the HTML codes.

There is a catch however. Not all information displayed on a website is necessarily contained in the HTML. Many websites use JavaScript or possibly homemade software to produce what you see. In that case other techniques are needed to get at the info, and those are outside the scope of this course I'm afraid. This will limit the number of sites we can access. But don't worry, there are plenty of sites we can access.

A note on the term "JavaScript". It is totally different from Java as we are learning it. Where we can use Java for anything, JavaScript is mainly used inside websites. You could say it is a programming language inside a web page. It can be used to add some intelligence to a website. Other than the name and a similar syntax it is a totally different language from JAVA.

9.2 URL (Uniform Resource Locator, or web address)

How do we locate specific information on the World Wide Web? We use web addresses that look like this:

http://www.punterslounge.com

The location of information on the WWW is called a resource. The address used to access a resource is called a locator. When using the above naming convention we talk about a URL, Uniform Resource Locator. There are actually several distinct parts to an address like this, but for our purpose we can simply look at the entire name as an address.
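For the curious, the "distinct parts" of an address can be shown with a minimal sketch like the one below. It uses the getter methods of the standard java.net.URL class. The class name UrlParts, the method describe and the second address are made up for this illustration and are not part of the course programs. On recent Java versions the compiler may warn that the URL constructor is deprecated; the sketch should still run.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Minimal sketch: split a web address into its parts using java.net.URL.
class UrlParts {

    // Builds a one-line summary of the parts of an address.
    static String describe(String address) {
        try {
            URL url = new URL(address);
            return "protocol=" + url.getProtocol()
                    + " host=" + url.getHost()
                    + " port=" + url.getPort()     // -1 means no port was given in the address
                    + " path=" + url.getPath()
                    + " query=" + url.getQuery();  // null means the address has no ?... part
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid address: " + address, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.punterslounge.com"));
        // This second address is invented purely to show a port, a path and a query.
        System.out.println(describe("http://www.example.com:8080/forum/show?topicID=119"));
    }
}
```

Creating the URL object does not connect to the internet, so this can be tried offline. For the rest of the lesson the whole address is still all we need.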
Usually a website has a home address, like the one above, and different sections or pages can be reached by extending that address with more detail. We will use the PL as an example.

Normally when you use the above address you do not see the address change as you go from page to page. This is just the way Ezboard has set it up, so that they can make changes to their servers without you noticing anything. If you use a more direct address for the homepage you can see the address change as you go to different locations on the lounge. Bear in mind that this direct address may change over time.

Home page:
http://p076.ezboard.com/bpuntserslounge

This is the URL of the General Chat forum:
http://p076.ezboard.com/fpuntersloungegeneral

And this is the Techy Forum:
http://p076.ezboard.com/fpuntersloungefrm57

As you go from location to location you should see this address change in your browser. I am using Internet Explorer and I can see the address change in the Address line at the top.

This is the URL of the top post in the Techy Forum, the introduction to the course:
http://p076.ezboard.com/fpuntersloun...icID=119.topic

Have a look around and see how the address changes as you move from page to page.

This is the full tutorial on networking by Sun:
http://java.sun.com/docs/books/tutor...ing/index.html

For the SCHOLARS I recommend you have a look at the parts on URLs:
http://java.sun.com/docs/books/tutor...rls/index.html

For the ROOKIES I recommend just having a quick look so you get an idea of what is involved. But don't get into the details, for our purpose all we need is a simple address.

9.3 Examining or researching a web page

When looking at a web page in a browser you can right-click the mouse and then select the option "View source". This opens the HTML source in your default text viewer. In my case I use Windows 2000 and Windows 98 with Internet Explorer. I have "View source" as an option when I right-click, and I can view the HTML source code of the page in Notepad.
I can also find the option in the browser's menu at the top: option View and then Source. This may be different on other operating systems or when using other browsers.

Try it on the page of the Introduction post and see what you get. You should see a lot of HTML codes, and as you scroll down to about halfway you should be able to recognise the text of the posts. You can use the search function of Notepad to search for something you know is there. If you have added a post you should be able to find your username and the text of your own post.

For example, if you search for the name datapunter you get the start of the thread. I have removed empty lines and the far right side of very wide lines to make this more readable. Also, I've had to replace the < character with ! and -- with - because otherwise Ezboard would interpret the text as actual HTML code and probably not display the whole post. So this bit may be misleading; look in the Word document you can download from the link at the top of the post for the right text. The items in bold are the actual visible content, the rest is HTML codes.

!span class=title>
!A HREF=http://b11.ezboard.com/bpunterslounge.showUserPublicProfile?gid=datapunter!Datapunter!/A>!/span>!br>
!span class=usertitle>Techy Punter !br>
!img src=http://www.myezboard.com/projects/e… src=http://www.myezboard.com/projects/ezboard/ezboard_userimages/puntersloun… src=http://www.myezboard.com/projects/ezboard/ezboard_userimages/puntersloun… src=http://www.myezboard.com/projects/ezboard/ezboard_userimages/puntersloun… src=http://www.myezboard.com/projects/ezboard/ezboard_userimages/puntersloun… src=http://www.myezboard.com/projects/ezboard/ezboard_userimages/puntersloun… src=http://www.myezboard.com/projects/ezboard/ezboard_userimages/puntersloun…
Posts: 65!br>
!font color=#FF0000>(17/4/04 14:40)!/font>!br>
!a href=http://p076.ezboard.com/fpuntersloungefrm57.showAddReplyScreenFromWeb?...
!a href=http://p076.ezboard.com/fpuntersloungefrm57.showEditScreen?topicID=119.topic…
!a href=http://p076.ezboard.com/fpuntersloungefrm57.deleteTopicConfirm?topicID=119.topic…
!/td>
!td valign=top align=left class=m>
!span class=title>
!img src="http://www.ezboard.com/images/posticons/pi_smile.gif…
!!-EZCODE BOLD START->!strong>!!-EZCODE UNDERLINE START->!span style="text-decoration:underline">!!-EZCODE ITALIC START->!em>!!-EZCODE LINK START->!a href="http://mars.walagata.com/w/datapunter/IntroPost.doc" target="top">Introduction ( Word document )!/a>!!-EZCODE LINK END->!/em>!!-EZCODE ITALIC END->!/span>!!-EZCODE UNDERLINE END->!/strong>!!-EZCODE BOLD END->!br>
!br>
!!-EZCODE FONT START->!span style="font-size:medium;">!!-EZCODE BOLD START->!strong>Punters Lounge JAVA programming course.!/strong>!!-EZCODE BOLD END->!/span>!!-EZCODE FONT END->!br>
!br>
This course was set up specifically to teach computer programming in a context of sports betting. In the world of sports betting many punters nowadays use the Internet to collect information and place their bets. This can in some cases take a lot of time. Imagine you want to find the best price available on an event, like a football match. You will need to look at various bookmakers, make notes and compare. And you need to do this every time you want to bet. It is exactly this kind of re-occurring action that takes a lot of time. What takes you hours and hours can be done in a matter of minutes with a bit of programming. That is the goal of this course. To provide you with the basic skill to extract information from the internet and use that for your betting purposes.!br>
!br>

And so on. See if you can find your way around a bit. If you know HTML codes then that is great. If you don't, all you need to be able to do is tell the difference between the codes and the content. After all, we don't care about how the info is displayed, we want the info itself.
If you do the course later than May 2004 the threads may no longer exist. If you are having trouble with this you can use this file instead. Simply download it to your PC and use the file as if it was a web address (so you can type for example C:\JAVA\Testpage.html in your browser).

9.4 Reading a web page in JAVA

To read a web page in JAVA we first create an object for the URL. Then we use an input stream to read the page, just like we did when reading from a file. To speed things up we use a buffer.

9.4.1 Import the required object libraries

import java.io.*;
At the top of our program we import the library holding the Input / Output objects.

import java.net.*;
Also at the top, we import the library holding the networking objects.

9.4.2 Create a URL object

URL introthread = new URL("http://p076.ezboard.com/fpuntersloun...cID=119.topic");

new URL("http://p076.ezboard.com/fpuntersloun...cID=119.topic");
we create a new URL object and initialise it with the address of the course introduction thread.

URL introthread =
we declare that we want to use the name introthread for an object of the URL type, and use the = sign to assign that name to the newly created URL object.

This is the description of the URL object:
http://java.sun.com/j2se/1.4.2/docs/...a/net/URL.html

9.4.3 Create a read buffer

BufferedReader readpage = new BufferedReader(new InputStreamReader(introthread.openStream()));

introthread.openStream()
this is a method of the URL object; it opens a stream to the web location for reading.

new InputStreamReader
this object takes the raw bytes coming from openStream() and turns them into characters.

new BufferedReader
we create a BufferedReader object to buffer the characters coming from the InputStreamReader.

BufferedReader readpage =
we declare the name readpage for the BufferedReader object and assign the name to our object with the = sign.

We can then use the readLine() method of the BufferedReader object to read the website.
This is the object BufferedReader, the same one we used for reading and writing files:
http://java.sun.com/j2se/1.4.2/docs/...redReader.html

And this is the object InputStreamReader:
http://java.sun.com/j2se/1.4.2/docs/...eamReader.html

So a program reading a web address and displaying the contents on the screen looks like this:

    // import the libraries with the Input / Output objects and the networking objects
    import java.net.*;
    import java.io.*;

    // our program called Lesson9.java
    public class Lesson9 {

        // our main method, the words throws Exception are required as part of the error handling
        public static void main(String[ ] arguments) throws Exception {

            String aline;

            // creation of a URL object containing the address of the web page we want to read
            URL webadres = new URL("http://p076.ezboard.com/fpuntersloungefrm57.showMessage?topicID=119.topic");

            // creation of a BufferedReader object to hold the info read from the web page
            BufferedReader Readpage = new BufferedReader(new InputStreamReader(webadres.openStream()));

            /* while we are able to read lines from the page
             *
             * the method readLine() from the BufferedReader object is used
             * to read a line from the webpage, that line is stored in the aline variable
             */
            while ((aline = Readpage.readLine()) != null) {
                // display the line read on the screen.
                System.out.println(aline);
            }

            // when all lines from the webpage are read close the BufferedReader object and
            // by closing that object we automatically close the stream to the URL
            Readpage.close();

        } // end bracket of main() method
    } // end bracket of Lesson9 program

As we read the lines from the web page we can manipulate that info just like in lesson 6. For example, here I count the number of lines on which the name datapunter is found.

    int DP = 0;
    while ((aline = Readpage.readLine()) != null) {
        // a line containing the name counts once, however often the name appears on it
        if (aline.indexOf("datapunter") != -1) {
            DP++;
        }
    }
    System.out.println("The name datapunter was found " + DP + " times.");

Download full program here: Lesson9.java

9.4.4 Extracting the information we want

And now we have come to the overall method we will use to get at the information we want. We read a web page, either line by line or all at once into a buffer. We then search for specific pieces of text, sequences of characters we have picked out through some manual research beforehand. And from there we locate the actual information we want and extract that.

A program to extract information using this method could have an outline something like this:

    class GetThatInfo {
        public static void main(String[ ] arguments) throws Exception {

            // create a URL object
            // create a BufferedReader object to read from the URL

            while ( lines left to read from the URL ) {
                if ( some info found ) {
                    // store the info, for example in an array
                }
            }

            // write the contents of our array to a file

        } // end of method main()
    } // end of class GetThatInfo

Now this can become as big and complex as you want. You could search a web page for references to other web pages and automatically read those pages. An application like this is generally known as a Spider.

The trick is to do your homework. Make your functional design, make your technical design. Give these the time they need, it will save you a tremendous amount of time along the way. Then start building your application bit by bit. Start with small workable programs that each cover a single part of the whole application. Then, as you get more and more individual parts working, bring it all together in a single application.

Again, this is not THE right way to do it and it is not the only way to do it, it's simply the way I usually do it. You will have to find out for yourself what works for you. But I think I've outlined a good starting point.

To complete this lesson, here is an application that reads info from the BBC website and stores that info in a file. That file can be read directly by Excel. The info we grab is the current table for the English Premiership football league.
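Before moving on to that application, here is a minimal sketch of the GetThatInfo outline made runnable. To keep it independent of any website it reads from a fixed piece of text instead of a URL. The sample text, the marker "Posts: ", the output file name found.txt and the names OutlineSketch and collect are all made up for this illustration. It is one possible way to fill in the outline, not the only one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch of the outline: read lines, pick out some info, store it, write it to a file.
class OutlineSketch {

    // Reads the text line by line. On every line containing the marker,
    // keeps whatever sits between the marker and the next "<" (or the end of the line).
    static List<String> collect(String text, String marker) {
        List<String> found = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String aline;
            while ((aline = reader.readLine()) != null) {
                int pos = aline.indexOf(marker);
                if (pos != -1) {
                    int start = pos + marker.length();
                    int end = aline.indexOf("<", start);
                    if (end == -1) {
                        end = aline.length();
                    }
                    found.add(aline.substring(start, end));
                }
            }
        } catch (IOException e) {
            // not expected when reading from a String
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // invented stand-in for a page read from a URL
        String page = "<b>Datapunter</b>\nPosts: 65<br>\n<b>Someone</b>\nPosts: 12<br>\n";
        List<String> found = collect(page, "Posts: ");

        // write the stored info to a file, one item per line
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("found.txt"))) {
            for (String item : found) {
                writer.write(item);
                writer.newLine();
            }
        }
        System.out.println(found);
    }
}
```

To turn this into the real thing you would swap the StringReader for the InputStreamReader on webadres.openStream() shown earlier in this lesson.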
Functional design:
Download the Premiership League Table so it can be used in Excel.

Technical design:
By manually browsing the BBC website I found the table is located on this page:
http://news.bbc.co.uk/sport1/hi/foot...em/default.stm

By looking at the source code for that page I found these strings are unique and can be used to find and extract the info I want:

String to identify the start of the table:
Barclays Premiership table

String in which each team is located:
/sport/hi/english/football/teams/a/arsenal/default.stm">Arsenal

Each team is followed by 3 lines like this, where the third line contains the points collected:
td class="fsb" align="right">0< / td>

We could search for exactly 20 teams or use this string to identify the end of the table:
< / table >

To get the info into Excel we will need a CSV file (comma separated value file format), listing the team name and the points scored, separated by a comma. I will call the file Premier.csv and the contents look like this:

Arsenal,12
Aston Villa,10
Birmingham,5

And so on.
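The technical design leaves two ways of knowing when to stop: count exactly 20 teams, or stop at the end-of-table string. The full program counts to 20. As a sketch of the second option, the code below keeps extracting teams until it passes the end-of-table marker. The HTML fragment in samplePage() is invented to match the description above and is not copied from the BBC page, and the names TableEndSketch, extract and samplePage are made up as well. The sketch assumes every team row is complete and does not guard against malformed rows.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: extract "team,points" pairs until the end-of-table marker is reached.
class TableEndSketch {

    static List<String> extract(String page) {
        List<String> rows = new ArrayList<String>();
        int tableEnd = page.indexOf("</table>");
        if (tableEnd == -1) {
            tableEnd = page.length();
        }
        int pos = 0;
        while (true) {
            // find the next team link; stop when there is none left inside the table
            int start = page.indexOf("/sport/hi/english/football/teams", pos);
            if (start == -1 || start > tableEnd) {
                break;
            }
            // the team name sits between the next ">" and the following "<"
            start = page.indexOf(">", start);
            int end = page.indexOf("<", start + 1);
            String team = page.substring(start + 1, end);

            // hop over three "td class=" cells, the points are in the third one
            start = end;
            for (int i = 0; i < 3; i++) {
                start = page.indexOf("td class=", start + 1);
            }
            start = page.indexOf(">", start);
            end = page.indexOf("<", start + 1);
            String points = page.substring(start + 1, end);

            rows.add(team + "," + points);
            pos = end;
        }
        return rows;
    }

    // Invented fragment shaped like the description in the technical design.
    static String samplePage() {
        String cell = "<td class=\"fsb\" align=\"right\">";
        return "Barclays Premiership table<table>\n"
            + "<tr><td><a href=\"/sport/hi/english/football/teams/a/arsenal/default.stm\">Arsenal</a></td>\n"
            + cell + "4</td>\n" + cell + "8</td>\n" + cell + "12</td></tr>\n"
            + "<tr><td><a href=\"/sport/hi/english/football/teams/a/aston_villa/default.stm\">Aston Villa</a></td>\n"
            + cell + "4</td>\n" + cell + "6</td>\n" + cell + "10</td></tr>\n"
            + "</table>\n"
            + "<a href=\"/sport/hi/english/football/teams/c/chelsea/default.stm\">Chelsea</a>\n";
    }

    public static void main(String[] args) {
        for (String row : extract(samplePage())) {
            System.out.println(row);
        }
    }
}
```

The link to Chelsea after the closing table tag is there on purpose: it shows that team links outside the table are ignored. Counting to 20 is simpler, but it breaks if the league ever changes size, which is the trade-off between the two options.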
Download full program here: GetPremier.java

    // import the classes containing the internet class objects
    import java.net.*;
    import java.io.*;

    // ************************************************************
    class GetPremier {

        // ************************************************************
        // main method
        public static void main(String[] args) throws Exception {

            // ************************************************************
            // create a buffered reader object for the page at the BBC
            URL tablepage = new URL("http://news.bbc.co.uk/sport1/hi/football/eng_prem/default.stm");
            BufferedReader ReadPage = new BufferedReader(new InputStreamReader(tablepage.openStream()));

            // stringbuffer to hold the whole page
            StringBuffer WholePage = new StringBuffer(100000);

            // create the object for the output file
            BufferedWriter WriteTable = new BufferedWriter(new FileWriter(new File("Premier.csv")));

            // ************************************************************
            // read the whole page into the stringbuffer
            String OneLine = "";
            while ((OneLine = ReadPage.readLine()) != null ) {
                WholePage.append(OneLine);
            }
            ReadPage.close();

            // ************************************************************
            int start = 0;
            int end = 0;
            String team = "";
            String points = "";

            // find the start of the table
            start = WholePage.indexOf("Barclays Premiership table");

            // ************************************************************
            // loop 20 times, once per team
            for ( int t = 0 ; t < 20 ; t++ ) {

                // find the start position of the next team
                start = WholePage.indexOf("/sport/hi/english/football/teams",start);

                // extract the team's name
                start = WholePage.indexOf(">",start);
                end = WholePage.indexOf("<",start+1);
                team = WholePage.substring(start+1,end);

                // extract the team's points, found in the third "td class=" after the name
                start = WholePage.indexOf("td class=",end);
                start = WholePage.indexOf("td class=",start+1);
                start = WholePage.indexOf("td class=",start+1);
                start = WholePage.indexOf(">",start+1);
                end = WholePage.indexOf("<",start+1);
                points = WholePage.substring(start+1,end);

                // write a line to the output file
                WriteTable.write( team + "," + points );
                WriteTable.newLine();
                System.out.println(team + "," + points);

            } // end of the for loop

            // close the output file.
            WriteTable.close();

        } // end of main() method
    } // end of class definition

Above is the listing of the program. Not that big a deal, is it? It almost fits on 1 page. After you run this program you get a file called Premier.csv that you can open with Excel.

As you look at the program, notice how you need to shift the positions in the string, start and end, to find what you want. Use indexOf() to find a position, then keep shifting that position as you go along. If you are having difficulties with this, simply write it out on paper and do the actions manually until you get the hang of it.

Imagine this string where you want to extract ABC. I've written out each single step, you can follow the program using the line numbers.
code unique code >ABC< code

( note: if you are not using Word to read this the text may be unreadable or the columns may not be spaced equally, just write the text out yourself and count the characters starting at 0 )

Program parts:

.1 start = String.indexOf("unique"); // start search at position 0, start becomes 5
.2 start = String.indexOf(">",start); // start search at 5, start becomes 17
.3 end = String.indexOf("<",start+1); // start search at 18, end becomes 21
.4 ABC = String.substring(start+1,end); // substring starts at 18 and stops just before 21

There are no more assignments. This is almost it. Only one more short lesson to go and then we start building applications. At this point, have a look at some of the sites you visit on a regular basis. See if they use HTML and if you can use this method to get at the info.
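For those who would rather check the four steps of the ABC example on a computer than on paper, here is a minimal sketch. The class name AbcWalkthrough and the method extractAfter are made up for this illustration. The string and the positions in the comments are the ones from the steps above.

```java
// Sketch: the four "program parts" of the ABC example as runnable code.
class AbcWalkthrough {

    // Returns the text between the first ">" after the marker and the next "<".
    static String extractAfter(String text, String marker) {
        int start = text.indexOf(marker);         // step 1: find the unique marker
        start = text.indexOf(">", start);         // step 2: shift to the ">" in front of the info
        int end = text.indexOf("<", start + 1);   // step 3: find the "<" right after the info
        return text.substring(start + 1, end);    // step 4: cut out what lies in between
    }

    public static void main(String[] args) {
        String text = "code unique code >ABC< code";

        int start = text.indexOf("unique");
        System.out.println("step 1, start = " + start);   // 5
        start = text.indexOf(">", start);
        System.out.println("step 2, start = " + start);   // 17
        int end = text.indexOf("<", start + 1);
        System.out.println("step 3, end = " + end);       // 21
        System.out.println("step 4, result = " + text.substring(start + 1, end)); // ABC

        // the same four steps wrapped in one method
        System.out.println(extractAfter(text, "unique"));
    }
}
```

Note that substring(18, 21) returns the characters at positions 18, 19 and 20. The end position itself is not included, which is why pointing end at the "<" gives exactly ABC.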