We are the Official Forum of FreeBetting.net & FCBet.com
| Systems & Strategy Forum Discuss all your strategies, systems, selection methods and staking plans here. Try and keep your match selections to the other forums. |
| | #1 (permalink) |
| Newbie Punter Join Date: 13 Nov 2007
Posts: 23
| I've got a dataset of 8120 matches and I want to predict an away win. I'm building a logistic model and there are 2339 away wins in my dataset, so my other 2339 should consist of home wins and draws. But how do I choose the split? Should it be 2339 away wins, 1170 draws and 1170 home wins? Please help, thanks. |
| | |
| | #2 (permalink) |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| I don't understand. Why do you need the same number of homes/draws as aways?
__________________ I use statistics much as a drunken man uses lamp-posts - as support rather than illumination. (Andrew Lang) Everyone thought Einstein was crazy until he started kicking ass. |
| | |
| | #4 (permalink) |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| As you've got a pretty decent number of games I'd start by using half of the data for training and half for validation.
Last edited by Mr Intensity; 21-01-2008 at 23:29. |
| | |
| | #5 (permalink) |
| Newbie Punter Join Date: 13 Nov 2007
Posts: 23
| The whole point of doing logistic regression is that your "goods" are the same volume as your "bads" so that there is no bias. I'm just wondering whether in football modelling you should follow this standard statistical practice or include everybody in your sample? In a season, on average, about 50% of matches are home wins and the other 50% is made up of draws and away wins. My question is whether your sample should include all observations even though there may be a bias, or whether you should split the population evenly so that you're dealing with equal volumes. |
| | |
| | #6 (permalink) | |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| Quote:
Last edited by Mr Intensity; 22-01-2008 at 11:11. | |
| | |
| | #7 (permalink) |
| Junior Punter Join Date: 30 Oct 2004
Posts: 12,352
| If you artificially force your samples of "bads" to have 50% home wins and 50% draws, then you'll be introducing much more of a bias. Since home wins are actually more frequent than draws, you'll probably be heavily biasing the "bads" in favour of factors that correlate with the home team doing badly. I know what logistic regression is about, more or less, though I don't know much about the nuts and bolts. But I don't understand why you need the samples of goods and bads to have the same size? |
| | |
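A rough sketch of the point in post #7: if you do subsample the non-away matches, drawing them at random keeps whatever home/draw mix the data already has, whereas a forced 50/50 split would not. The home-win and draw counts below are assumptions for illustration (the thread only gives 8120 matches and 2339 away wins), and `sample_negatives` is a hypothetical helper name.

```python
import random


def sample_negatives(results, n, seed=0):
    """Randomly draw n non-away-win results, keeping the data's own home/draw mix."""
    negatives = [r for r in results if r != "A"]
    return random.Random(seed).sample(negatives, n)


# Assumed counts: 4060 home wins and 1721 draws are invented for illustration;
# only the 2339 away wins and the 8120 total come from the thread.
results = ["H"] * 4060 + ["D"] * 1721 + ["A"] * 2339

sample = sample_negatives(results, 2339)

home_share_population = 4060 / (4060 + 1721)
home_share_sample = sample.count("H") / len(sample)

print(f"home share among all non-away matches: {home_share_population:.3f}")
print(f"home share in the random sample:       {home_share_sample:.3f}")
print("home share under a forced even split:  0.500")
```

The random sample should land close to the population share, while the forced split sits at 0.5 regardless of what the data says.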
| | #8 (permalink) | |
| Junior Punter Join Date: 30 Oct 2004
Posts: 12,352
| Quote:
"good" varies when you have knowledge of other factors? So fixing the total sample so that the overall probability is 0.5 doesn't necessarily prejudge the answer. | |
| | |
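To put a number on the remark that fixing the overall probability at 0.5 doesn't necessarily prejudge the answer: with a logit link, sampling on the outcome multiplies the odds by a constant, so the effect of the other factors is left alone and only the baseline shifts. Probabilities fitted on a balanced sample therefore need mapping back before they are read as real-world probabilities. The sketch below assumes all away wins are kept and the other matches are subsampled at random; `population_probability` is a hypothetical helper name.

```python
def population_probability(p_sample, keep_rate_event, keep_rate_other):
    """Map a probability fitted on an outcome-balanced sample back to the full population.

    Sampling on the outcome multiplies the odds by keep_rate_event / keep_rate_other,
    so dividing that factor back out recovers the population odds.
    """
    odds_sample = p_sample / (1.0 - p_sample)
    odds_population = odds_sample * keep_rate_other / keep_rate_event
    return odds_population / (1.0 + odds_population)


# Counts from post #1: all 2339 away wins kept, 2339 of the other 5781 matches kept.
keep_away = 1.0
keep_other = 2339 / 5781

p = population_probability(0.5, keep_away, keep_other)
print(round(p, 3))  # prints 0.288, the raw share of away wins (2339 / 8120)
```

So a 0.5 from the balanced model corresponds to the ordinary away-win rate in the full data, not to a coin flip.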
| | #9 (permalink) | |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| Quote:
. Setting the sample with 50% home wins is wrong but for the reasons you've stated.
| |
| | |
| | #11 (permalink) |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| No, then you're forcing the data, which is bad. Do as the thread title says - choose randomly. |
| | |
| | #12 (permalink) |
| Newbie Punter Join Date: 13 Nov 2007
Posts: 23
| OK, so what should my "good" and "bad" outcomes be? I still don't get what the splits should be. Let's take an example: say I want to model the probability of a home win and my dataset has 8000 matches, of which 4000 are home wins, 2000 are draws and 2000 are away wins. Could you explain how I would build a logistic model based on that? Thanks. |
| | |
| | #13 (permalink) |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| What software are you using? The easiest way to do it is to take your data, order it by date, and use the first 4000 results as your training data. You need to decide which factors you want to include. This is easy: I'd start by including everything you might want to include. You then create your model in the software and do an Analysis of Deviance. Your software should add terms sequentially, which gives you a forward stepwise approach: use a chi-squared test on each term to get a p-value, and use those hypothesis tests to decide which factors to keep in. Once you have decided which factors to keep, you can run the model again using different link functions to see which fits best. That should give you a model to start with. Then you can start messing around and use the testing data. Sorry if that's a bit patronizing; from your posts I'm not sure how much you know. |
| | |
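The date-ordered split and one Analysis of Deviance step from post #13 can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy matches and the yes/no factor `good_form` are invented, and the helper names are hypothetical. It uses the fact that a logistic model with a single yes/no factor fits each group's observed win rate, so the drop in deviance has a closed form and can be tested against a chi-squared distribution with 1 degree of freedom. A real model with several factors needs proper fitting software.

```python
import math
from datetime import date, timedelta


def chronological_split(matches, n_train):
    """Order matches by date; the first n_train train the model, the rest are held out."""
    ordered = sorted(matches, key=lambda m: m["date"])
    return ordered[:n_train], ordered[n_train:]


def binomial_deviance(wins, games):
    """-2 * log-likelihood when every game in the group is given p = wins / games."""
    if wins == 0 or wins == games:
        return 0.0
    p = wins / games
    return -2.0 * (wins * math.log(p) + (games - wins) * math.log(1.0 - p))


def deviance_drop_for_binary_factor(matches, factor, target="y"):
    """Drop in deviance from adding one 0/1 factor to an intercept-only model."""
    counts = {0: [0, 0], 1: [0, 0]}  # factor level -> [wins, games]
    for m in matches:
        counts[m[factor]][0] += m[target]
        counts[m[factor]][1] += 1
    total_wins = counts[0][0] + counts[1][0]
    total_games = counts[0][1] + counts[1][1]
    null_dev = binomial_deviance(total_wins, total_games)
    model_dev = sum(binomial_deviance(w, g) for w, g in counts.values())
    drop = max(null_dev - model_dev, 0.0)  # guard against rounding below zero
    p_value = math.erfc(math.sqrt(drop / 2.0))  # chi-squared tail, 1 degree of freedom
    return drop, p_value


# Toy data, invented for illustration: (good_form, y) pairs repeated over 200 days.
start = date(2007, 8, 11)
pattern = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 3
matches = [
    {"date": start + timedelta(days=i), "good_form": f, "y": y}
    for i, (f, y) in enumerate(pattern * 20)
]

train, test = chronological_split(matches, len(matches) // 2)
drop, p_value = deviance_drop_for_binary_factor(train, "good_form")
print(f"deviance drop = {drop:.3f}, p-value = {p_value:.4f}")
```

A small p-value would be the argument for keeping the factor in; the held-out half stays untouched until the model is settled.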
| | #14 (permalink) |
| Newbie Punter Join Date: 13 Nov 2007
Posts: 23
| Thanks very much for that advice, Mr Intensity. I'm actually using SAS. So what would my target variable be, and how would you define it? As in, what would the "1" represent and what would the "0" represent? |
| | |
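The thread itself doesn't record an answer to this question. One common convention, offered here as an assumption rather than as what was actually advised, is to code the outcome you want a probability for as 1 and everything else as 0. The sketch uses the 8000-match example from post #12; the result codes "H", "D", "A" and the name `binary_target` are illustrative.

```python
def binary_target(result, event="H"):
    """Return 1 if the match ended in the modelled outcome, otherwise 0."""
    return 1 if result == event else 0


# Example from post #12: 4000 home wins, 2000 draws, 2000 away wins.
results = ["H"] * 4000 + ["D"] * 2000 + ["A"] * 2000

y_home = [binary_target(r, event="H") for r in results]  # 1 = home win, 0 = draw or away win
y_away = [binary_target(r, event="A") for r in results]  # 1 = away win, 0 = home win or draw

print(sum(y_home), len(y_home) - sum(y_home))
print(sum(y_away), len(y_away) - sum(y_away))
```

Draws and the opposite result both fall into the 0 class in whatever proportion they occur, so no hand-picked split between them is needed.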
| | #15 (permalink) |
| Mens Doubles Punter Join Date: 21 Dec 2003 Location: Newcastle upon Tyne Age: 24
Posts: 10,003
| Replied on MSN. |
| | |
© 2008 PuntersLounge.Com Ltd