Mira Fourier Coefficient Team

Dataset pre-processing Considerations



This page contains dataset pre-processing considerations and guidelines, in some cases quoted from Google Group messages or excerpts thereof.
 
tamino, 17 Sep 2010: "Stars don't have to have perfect light curves.  I suggest we limit ourselves to data since JD 2,430,000*, so there's no need to edit before that.  I also suggest we limit ourselves to data before JD 2,455,000*, so there's no need for editing to be perfectly up-to-date.  That's a reasonable cutoff date, which includes modern data but doesn't require us constantly to update our results as new data come in."

*Note: date range amended below

tamino, Sep 17 2010: "Everyone is welcome to examine the light curves of Miras which are on our list, and visually estimate whether they're "ready to go" or "in need of editing".  Don't neglect to look at the light curve on several time scales (blocks of 5000 days might be good, try other lengths if you think that will be revealing)."
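Splitting a star's record into fixed-length blocks for visual inspection, as suggested above, might look like the following minimal Python sketch. It assumes observations are available as (JD, magnitude) pairs; the name `split_into_blocks` is hypothetical, and plotting each block is left to whatever tool you already use.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one observation is (Julian Date, visual magnitude).
Observation = Tuple[float, float]


def split_into_blocks(
    obs: List[Observation],
    block_days: float = 5000.0,
    start_jd: Optional[float] = None,
) -> Dict[int, List[Observation]]:
    """Group observations into consecutive blocks of `block_days` days.

    Block 0 starts at `start_jd` (default: the earliest JD in the data).
    Returns {block index: observations in that block, sorted by JD}.
    """
    if not obs:
        return {}
    origin = min(jd for jd, _ in obs) if start_jd is None else start_jd
    blocks: Dict[int, List[Observation]] = {}
    for jd, mag in sorted(obs):
        index = int((jd - origin) // block_days)
        blocks.setdefault(index, []).append((jd, mag))
    return blocks


# Example: change block_days to try other time scales.
sample = [(2440000.0, 7.1), (2444999.0, 9.3), (2445000.0, 8.0), (2452500.5, 6.4)]
for index, block in sorted(split_into_blocks(sample).items()):
    print(index, len(block))
```

Each block can then be plotted on its own, and `block_days` varied to look at other time scales.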

tamino, 18 Sep 2010: "It's a good idea for the keeper of the list to create a column to indicate who's selected a given star to work on.  The "status" column also obviously needs more possible entries.  These can include:
    ready-to-go
    needs-editing
    probably ready-to-go but not sure (so someone else should offer a 2nd opinion)
    probably needs-editing (so someone else should offer a 2nd opinion)
    I-haven't-a-clue-so-I'd-like-a-2nd-opinion---Help!"

tamino, 18 Sep 2010: "It's not unlikely that most of the stars are in good shape, editing-wise.  I base this on the fact that Templeton et al. did that big study of period changes in Miras, which would have required well-edited data to work with."
 
tamino, 18 Sep 2010: "I'm now leaning toward using only the data from JD 2,440,000 to 2,455,000.  This will somewhat lighten the workload but will not reduce the number of stars in the sample.  It may also make the sample more homogeneous, since probably most (if not all) of our targets will have good data coverage during that time interval.  If others feel differently, do say so!  Remember, I'm learning how to do this project as we go along just as you all are."
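Restricting a star's data to the amended window can be sketched in a few lines of Python. This assumes (JD, magnitude) pairs, and it treats the window as half-open (JD 2,440,000 included, JD 2,455,000 excluded); the thread does not say whether the endpoints are inclusive, so that choice is an assumption, as is the function name `in_analysis_window`.

```python
from typing import List, Tuple

# Hypothetical representation: one observation is (Julian Date, magnitude).
Observation = Tuple[float, float]

# Amended analysis window from the discussion above.
JD_START = 2_440_000.0
JD_END = 2_455_000.0


def in_analysis_window(
    obs: List[Observation], start: float = JD_START, end: float = JD_END
) -> List[Observation]:
    """Keep observations with start <= JD < end (half-open interval, an assumption)."""
    return [(jd, mag) for jd, mag in obs if start <= jd < end]


raw = [(2439999.9, 8.2), (2440000.0, 8.0), (2454999.9, 7.5), (2455000.0, 7.4)]
print(in_analysis_window(raw))
```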

tamino, 18 Sep 2010: "It might also be good for people to claim a constellation and commit to examining all the stars in that constellation.  This will make the division of labor easier to do, and easier to keep track of."

tamino, 18 Sep 2010: "…it's a very good idea just to examine the light curves visually, to get a "feel" for how they behave.  Take note, as you go along, of those that seem to show unusual behavior like bumps, humps, secondary maxima/minima, visually obvious period/amplitude changes, etc.  Don't feel out of your depth if you spend a fair amount of time just "getting to know" these light curves.  The more you look at them, the more you'll learn -- and you may notice something important and useful."

jbedient, Sep 19 2010: "I've found it only takes a few minutes, once you get used to it, to go through a star's data with Zapper and mark points as discrepant.  Most of the older data, before JD 2448000 or so, is pretty clean; only later do you see any bad data. I might suggest a double-pass method, with two of us vetting the data for each star.  More work, but it probably gives a more consistent product."

david_benn, Sep 19 2010: "Is there some minimum number of data points below which we would either exclude an object or broaden the JD range?"

tamino's reply: "Yes and no.  There is a minimum data density, but if that's not met we should remove the target from our list rather than broaden the JD range -- because we'll be working with time intervals of fixed length."
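The "drop the target rather than widen the window" rule might be sketched like this in Python. The thread does not define the minimum data density, so the measure used here (the fraction of 10-day bins in the fixed window that contain at least one observation) and the 0.8 threshold are placeholders, not agreed team values; `coverage_fraction` and `keep_target` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical representation: one observation is (Julian Date, magnitude).
Observation = Tuple[float, float]


def coverage_fraction(
    obs: List[Observation], start: float, end: float, bin_days: float = 10.0
) -> float:
    """Fraction of whole `bin_days`-wide bins in [start, end) holding >= 1 observation."""
    n_bins = int((end - start) // bin_days)
    if n_bins <= 0:
        raise ValueError("window is shorter than one bin")
    stop = start + n_bins * bin_days  # ignore any partial bin at the end
    filled = {int((jd - start) // bin_days) for jd, _ in obs if start <= jd < stop}
    return len(filled) / n_bins


def keep_target(
    obs: List[Observation],
    start: float,
    end: float,
    bin_days: float = 10.0,
    min_fraction: float = 0.8,  # placeholder threshold, not a team decision
) -> bool:
    """True if the star passes the density test; otherwise drop it (never widen the window)."""
    return coverage_fraction(obs, start, end, bin_days) >= min_fraction


sparse = [(2440005.0, 8.1), (2440015.0, 8.3), (2440025.0, 8.6)]
dense = [(2440000.0 + 10.0 * k + 5.0, 8.0) for k in range(10)]
print(keep_target(sparse, 2440000.0, 2440100.0), keep_target(dense, 2440000.0, 2440100.0))
```

Because the window length is fixed, the function only ever answers keep or drop; it has no option to stretch the interval.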

Older data

I have been noticing, as I work through some stars, that despite what I said about older data, there are still unvalidated discrepant points in the older material that came in when other databases were imported into the AAVSO database. Look carefully regardless of data age.

importing other datasets

Yes, blind data import was a major procedural mistake. If the need was felt to own all visual data extant in the world, it should have been stored in its own separate tables, which would even allow flagging of the source material on plots, in light curve generators, and in downloaded datasets. An analyst could then merge the data according to their own preference, using their own preferred means, until the archivists had the time and expertise to merge the data properly. Blind import, by contrast, throws away information carried by the data. Proper merging could now be impossible if the discriminants have been lost and there is no longer any ready information with which to distinguish the imported data from the AAVSO sequence data. Provenance is not just a politeness game; it is an important characteristic and property of information and data in the practice of scientific investigation. John
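John's point is that a source tag kept on every point is what makes flagging and analyst-controlled merging possible. A minimal Python sketch of such a record follows; it is not the AAVSO database schema, and the field and source names (`source`, "imported-archive-A") are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Observation:
    jd: float
    mag: float
    source: str  # provenance tag; the labels used below are hypothetical


def merge_sources(obs: Iterable[Observation], accepted: Set[str]) -> List[Observation]:
    """Analyst-controlled merge: keep only accepted sources, sorted by JD.

    Every returned point still carries its source tag, so it can be flagged on plots.
    """
    return sorted((o for o in obs if o.source in accepted), key=lambda o: o.jd)


def count_by_source(obs: Iterable[Observation]) -> Dict[str, int]:
    """Tally points per source, e.g. to show the mix in a downloaded dataset."""
    counts: Dict[str, int] = {}
    for o in obs:
        counts[o.source] = counts.get(o.source, 0) + 1
    return counts


archive = [
    Observation(2445010.0, 7.9, "AAVSO"),
    Observation(2445002.0, 8.4, "imported-archive-A"),
    Observation(2445001.0, 8.0, "AAVSO"),
]
print(count_by_source(archive))
print(merge_sources(archive, {"AAVSO"}))
```

Since nothing is discarded at import time, a different analyst can pass a different `accepted` set and get a different, but fully traceable, merge.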

Thanks David

For collating all these comments. Very useful!

And thanks for...

...amending the date range. :)

mathematical algorithms for windowed Fourier techniques

If you examine the introduction and methodology sections of this paper http://articles.adsabs.harvard.edu/full/2001MNRAS.325.1383H you'll find a publicly available formulation of windowed Fourier techniques, as defined and applied by John Howarth of the BAA, which has also been "tested in anger" on Miras and multiperiodic semiregular stars. John H.'s practical use of the method over the years (his first interest was in trying to untangle the horribly messy light curve of the multiperiodic semiregular variable W Cygni), together with an industrial application he had for it, gave him experience of the algorithms in terms of practical data manipulation. Comments on windowing and on filtering are included, for instance, but John was the mathematician, not I, so I cannot speak authoritatively on such matters. Outliers and occasional long gaps were in fact not a problem in practice: as long as the gaps were few (one or two) and flanked by long runs of data, the analysis tended to pick up where it left off (although there could be a jump in the phase "zeropoint" before and after a large gap). Evidently messy data, and stretches where coverage was sparse, could be identified and removed and/or flagged if preferred, by comparing phase over time against the raw magnitude-over-time light curve. The basic windowing technique in this form is quite simple: although I am not a programmer, I was able to convert the algorithm into a form of MS Basic many years ago, despite that language being heavily challenged in its trigonometric functions. It should be fairly amenable to inclusion in VSTAR. You will no doubt recognise a lot of the formulae from your earlier VSTAR work. In the paper noted above, everything before section 3 is purely John Howarth's work, outlining his procedure. He did the maths at that time; I did the practical application.
However, he'd also chip in on the conclusions and any discussion sections, especially when mathematical interpretation of the results was essential. Phase over time derived in this way plots the same as O-C, but with an inverted Y axis, except that it gives a more objectively appraisable, tighter and smoother plot. The reason is quite simple: O-C tracks the migration of a single, special, somewhat subjectively measured point on a light curve, i.e. the time of maximum (or minimum, or whatever other predefined point is being used). Fourier deconvolution of a light curve tracks the migration and evolution of the _whole_ light curve. At core, John tended to take a representative period (usually the main period derived via DFT) as a test period, and track the phase and amplitude evolution of the light curve (from the JD and magnitude data) relative to the midpoint of the data (this is from memory). The test period does not even have to be accurate, but the less accurate it is the less useful the results, and the more accurate it is the more directly conclusions about the real star can be drawn. Remember, as a cardinal rule, that for both O-C and Fourier element evolution over time, a _slope_ away from the horizontal in either direction merely means that the period at that time was different from the test period: the steepness of the slope correlates with the size of the difference, and its direction with whether the period was shorter or longer. It DOES NOT mean the period was changing continually. An arc, sometimes parabolic but certainly well defined, is what results when there is an actual rate of change of period away from the input test period. A horizontal line, albeit often with some scatter and waggle for Miras (usually not too large), represents a static period very close to the test period. John
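The phase-tracking idea described above can be illustrated with a short Python sketch. It is not John Howarth's published formulation (the paper linked above gives that); it is a generic sliding-window least-squares fit of one sinusoid at a fixed test period, with phase measured relative to the data midpoint as the comment recalls. The window length, step, `min_points` and the synthetic star are illustrative assumptions, all function names are hypothetical, and the phase sign convention is this sketch's own, so its plot may be inverted relative to both O-C and the paper's.

```python
import math
from typing import List, Sequence


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_window(ts, ms, test_period, t_ref):
    """Least-squares fit of m = c + a*cos(th) + b*sin(th), th = 2*pi*(t - t_ref)/P0.

    Returns (amplitude, phase) such that m ~ c + amplitude*cos(th - phase).
    """
    normal = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for t, m in zip(ts, ms):
        th = 2.0 * math.pi * (t - t_ref) / test_period
        basis = (1.0, math.cos(th), math.sin(th))
        for i in range(3):
            rhs[i] += basis[i] * m
            for j in range(3):
                normal[i][j] += basis[i] * basis[j]
    _, a, b = _solve3(normal, rhs)
    return math.hypot(a, b), math.atan2(b, a)


def windowed_phase(ts, ms, test_period, window, step, min_points=10):
    """Slide a `window`-day window in `step`-day steps; return (centre, amplitude, phase)."""
    t_ref = 0.5 * (min(ts) + max(ts))  # phase measured relative to the data midpoint
    results = []
    centre = min(ts) + window / 2
    while centre + window / 2 <= max(ts):
        sel = [(t, m) for t, m in zip(ts, ms) if abs(t - centre) <= window / 2]
        if len(sel) >= min_points:  # skip windows that fall in data gaps
            amp, ph = fit_window([t for t, _ in sel], [m for _, m in sel], test_period, t_ref)
            results.append((centre, amp, ph))
        centre += step
    return results


def unwrap(phases: Sequence[float]) -> List[float]:
    """Remove 2*pi jumps so the phase track is continuous."""
    out: List[float] = []
    for p in phases:
        if out:
            d = p - out[-1]
            p = out[-1] + d - 2.0 * math.pi * round(d / (2.0 * math.pi))
        out.append(p)
    return out


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def local_period(phase_slope: float, test_period: float) -> float:
    """Phase drift d(phase)/dt = 2*pi*(1/P0 - 1/P), so 1/P = 1/P0 - slope/(2*pi)."""
    return 1.0 / (1.0 / test_period - phase_slope / (2.0 * math.pi))


# Synthetic Mira-like star: true period 330 d, analysed with a 333 d test period.
ts = [2440000.0 + 2.5 * k for k in range(6000)]
ms = [8.0 + 2.0 * math.cos(2.0 * math.pi * (t - ts[0]) / 330.0) for t in ts]
track = windowed_phase(ts, ms, test_period=333.0, window=1000.0, step=250.0)
centres = [c for c, _, _ in track]
phases = unwrap([ph for _, _, ph in track])
print(round(local_period(slope(centres, phases), 333.0), 1))
```

Here the 3-day mismatch between true and test period shows up as a straight, sloping phase track, from which `local_period` recovers roughly 330 days. A steadily changing period would instead give a curved (arc-shaped) track, and a period equal to the test period a horizontal one. Real data would also need the gap and outlier handling discussed above.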
