DateTime::Event::Predict::Profile could be a subclass that defines bucket and weight profiles for certain types of predictions. For
instance, there could be a profile for log files (or different types of log files), for weather events, etc.
There could be a method similar to predict that you would give the expected prediction to. If the method does not predict that date
then maybe it could figure out what tinkering of weights (or other options) would be required to produce that prediction. *This would
probably be very hard.
Maybe one way to do comparisons would be to take each date and do a diff between it and the dates before and after it (if there are any), then do
some kind of filtering based on that.
NOTE: Okay, so we calculate all the poisson probabilities for the intervals that we want to predict for (days, months, etc), and then finding the best
fits (highest probabilities) we find the next date for that interval (add the interval onto the most recent date) and match it against the buckets to
see if it's a fit on any of them. Then we can add weights to the distinct buckets so that certain matches can be preferred over others.
IDEA: Create a custom accessor for "Week-segment", or even "work-week-segment", which would split the week up into parts, like Mon-Wed would be
beginning of work week, Tues-Thurs would be middle of work week, and Wed-Fri would be end of work week. Some dates would of course end up in multiple
places (i.e. Tues is both beginning and middle) however I think regression would allow a best-fit line to be made.
*IDEA*: Maybe we can use distribution math for all the buckets, both distinct and interval, and use the standard deviation
of the values for each bucket to sort them (and maybe provide weights?), and then scan for new dates. We could also use the
variance of each scanned new date to determine if it falls within the margin of error, i.e. if we're search for the next date
of Easter and the standard deviation for "day_of_week" is 0, because all of them are on Sunday, then we know that any date
offered as a prediction but fall within that standard deviation, i.e. 0, and therefore MUST be Sunday.
NOTE: Predicting Easter completely accurately will be impossible because it depends on the moon phase. Without taking that
into account (possible?) we won't get an accurate date
IDEA: It should be possible to pass in any (reasonable) number n of arbitrary attributes with which to identify dates. We just add them
on to the dimensions we already have in order to operate over them.
IDEA: For finding a beginning search point in the future we can optionally prevent searching before the current date, so if,
say, there was an equal possibility of a predicted date being in a near-current cluster or a future cluster, if that current
cluster occured in the past and the option was set then the date would HAVE to be in the new cluster.
IDEA: If we assume that for any date-part, the data points follow a normal distribution, then for any given prediction
(supplied by the module or by the end-user) we can calculate the probability that that date part is correct (by its standard
deviation), and then for all the date parts we can use Bayes Theorem to combine their probabilities to determine the
complete probability for that particular date.
SUPPLEMENT: It's also possible that we could combine the probabilities of multiple predicted dates to provide the overall probability
of a field of date predictions.
NOTE: Actual cluster centroids (and comparisons to them) will have to defined through hires epoch time, but that should be OK.
NOTE: By doing this tiered search where we go through each incremental possible date we are probably going to end up with a lareg
number of predictions if the standard deviations are of any decent size, although that depends on the enabled buckets.