For Grant

Posted by PAddy 
PAddy
Patrick McVeigh
Super Moderator
Location: Toronto, ON
Join Date: 12/21/2005
Age: Midlife Crisis
Posts: 358

Rally Car:
Student Loans



For Grant
May 22, 2007 08:11PM
Grant:

I have a cohort of ~60 patients who have had quantitative MRIs done with ROIs defining tissues of interest on all of them. From this ROI I get a spectrum of values (~1000 pixels per patient), the distributions of which appear to be non-normal (Lilliefors test).

* I want to get a measure of how 'similar' each patient's distribution is to every other patient's, to allow a grouping together of similar patients based on some metric. I am presently using a two-sample Kolmogorov-Smirnov test as a measure of 'similarity', but don't know if this is the best test for distributions of this size. Am I going to have a bias as a result of having such a large number of points for K-S, and is there a fancier statistic I should consider? I tend to prefer using what MATLAB or SPSS has built in....
Tom B
Tom B
Infallible Moderator
Location: Douche Canoe, WA
Join Date: 02/27/2006
Age: Midlife Crisis
Posts: 780

Rally Car:
VW Golf



Re: For Grant
May 22, 2007 08:20PM
PAddy Wrote:
-------------------------------------------------------
> Smirnov test as a measure of
> 'similarity'

I can help with this part



-Tom
DemonRallyTeam | Fine Tuning | CTS Turbo & RP Turbos | RalleyTuned | JRM | Meister Autowerks
Spitfire EFI | Product Apparel | JVAB Imports | NLS | AP Tuning | USRT

Add us on Facebook | Next Event: 2013 Olympus Rally June 22-23 Olympia, WA
PAddy
Patrick McVeigh

Re: For Grant
May 22, 2007 09:46PM
Tom B Wrote:
-------------------------------------------------------
> I can help with this part

Ummmmmmmmmm.........fire away?





Edited 1 time(s). Last edit at 05/22/2007 09:48PM by PAddy.
Tom B
Tom B

Re: For Grant
May 23, 2007 08:43AM


you are providing the testing supplies, riiiiight?


PS. I think you spelled Smirnoff wrong up there ;)





-Tom
Josh Wimpey
Josh Wimpey
Super Moderator
Location: VA
Join Date: 12/27/2006
Age: Midlife Crisis
Posts: 649

Rally Car:
Sneak the Golf


Re: For Grant
May 23, 2007 08:51AM
If you are interested in grouping patients together across some metrics, it might be useful to start with some simple PCA, factor analysis, or clustering. This won't give you a measure of how similar the patients are, but it will allow you to identify the unique characteristics/factors that they cluster on.



____________________________________________________________-

One. Class -- 2WD

www.quantumrallysport.com

http://www.facebook.com/home.php?#/pages/Quantum-Rally-Sport/281129179600?ref=nf
NoCoast
Grant Hughes
Ultra Moderator
Location: Whitefish, MT
Join Date: 01/11/2006
Age: Midlife Crisis
Posts: 6,818

Rally Car:
BMW



Re: For Grant
May 23, 2007 09:58AM
PAddy Wrote:
> I have a cohort of ~60 patients who have had
> quantitative MRIs done with ROIs defining
> tissues of interest on all of them. From this ROI
> I get a spectrum of values (~1000 pixels per
> patient), the distributions of which appear to be
> non-normal (Lilliefors test).
>
> * I want to get a measure of how 'similar' each
> patient's distribution is to every other patient's,
> to allow a grouping together of similar patients
> based on some metric. I am presently using a
> two-sample Kolmogorov-Smirnov test as a measure of
> 'similarity', but don't know if this is the best
> test for distributions of this size. Am I going
> to have a bias as a result of having such a large
> number of points for K-S, and is there a fancier
> statistic I should consider? I tend to prefer
> using what MATLAB or SPSS has built in....

Lilliefors is based on K-S and you should be mostly okay there.
Is order important?
What's the range of potential values?
K-S I believe is fairly robust to large sample sizes. It tends to weight towards the center of the distributions and ignore tails, but the most common test to account for that doesn't handle large sample sizes well. I have a conference call in 3 minutes, but I'll post back some more later.
Have any snapshots of data or distribution of data you could email me?
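
On the sample-size question, one way to see the effect is to hold the K-S distance D fixed and vary n. The following is a minimal sketch, assuming the standard asymptotic Kolmogorov series for the two-sample p-value (a large-sample approximation, not an exact test). The function name `ks_asymptotic_pvalue` and the value D = 0.10 are made up for illustration.

```python
import math


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Approximate two-sided p-value for a two-sample K-S distance d.

    Uses the asymptotic Kolmogorov series with effective size n*m/(n+m).
    This is a large-sample approximation, not an exact p-value.
    """
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # For tiny lam the truncated series is unreliable and p is ~1 anyway.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Same gap between the two empirical CDFs, different sample sizes.
D_FIXED = 0.10  # hypothetical K-S distance, chosen only for illustration
for size in (50, 200, 1000, 5000):
    p = ks_asymptotic_pvalue(D_FIXED, size, size)
    print(f"n = m = {size:5d}   D = {D_FIXED:.2f}   p ~ {p:.3g}")
```

Under this approximation the same D goes from clearly non-significant at n = 50 to far below 0.05 at n = 1000, so with ~1000 pixels per patient the p-value mostly tracks sample size. D itself does not scale with n, which may make it the more useful number to tabulate as a 'dissimilarity'.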



Grant Hughes
PAddy
Patrick McVeigh

Re: For Grant
May 23, 2007 10:11AM
I have already done k-means on the overall dataset (i.e., all participant values pooled into one big distribution) and got some significant results. I am now interested in seeing if the individual patient distributions can be correlated to outcome data - hence my searching for a metric to judge how 'similar' they are.


Josh Wimpey Wrote:
-------------------------------------------------------
> If you are interested in grouping patients
> together across some metrics, it might be useful
> to start with some simple PCA, factor analysis, or
> clustering. This won't give you a measure of how
> similar the patients are, but it will allow you to
> identify the unique characteristics/factors that
> they cluster on.


PAddy
Patrick McVeigh

Re: For Grant
May 23, 2007 10:19AM
* Order shouldn't be important; I want to construct a dissimilarity table, so the expectation will be that it is symmetric with a trace of zero. In true scientific fashion I have already done this and am working ahead, assuming some statistician can back me up later on...

* The range of values is from -1000 to 3000, with 95% of the data distributed on the range (1000,1400). Because of the size of the distributions (~1000 points), I thought KS would be better than Anderson-Darling or similar. I will see about posting a histogram...

As above, end goal is a measure of 'dissimilarity' between patients' distributions so I can group together similar patients and see how those groupings correlate to things like disease-free survival time, tumor size etc.
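
The dissimilarity table described above could be built along these lines. This is a minimal sketch in plain Python that uses the two-sample K-S distance D (the largest gap between two empirical CDFs) as the entry for each pair. The helper names and the simulated 'patients' are made up for illustration and are not real data.

```python
import bisect
import random


def ks_distance(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def dissimilarity_matrix(samples):
    """Symmetric table of pairwise K-S distances with a zero diagonal."""
    n = len(samples)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            table[i][j] = table[j][i] = ks_distance(samples[i], samples[j])
    return table


# Hypothetical stand-in data: 6 "patients", ~1000 pixel values each,
# three centred near 1150 and three near 1250 (assumed values).
rng = random.Random(0)
centers = [1150, 1150, 1150, 1250, 1250, 1250]
patients = [[rng.gauss(c, 60) for _ in range(1000)] for c in centers]

table = dissimilarity_matrix(patients)
for row in table:
    print("  ".join(f"{v:.2f}" for v in row))
```

Because D is symmetric in its two arguments and is zero when a sample is compared with itself, the table comes out symmetric with a zero trace by construction.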


NoCoast Wrote:
-------------------------------------------------------
> Lilliefors is based on K-S and you should be
> mostly okay there.
> Is order important?
> What's the range of potential values?
> K-S I believe is fairly robust to large sample
> sizes. It tends to weight towards the center of
> the distributions and ignore tails, but the most
> common test to account for that doesn't handle
> large sample sizes well. I have a conference call
> in 3 minutes, but I'll post back some more later.
> Have any snapshots of data or distribution of data
> you could email me?


NoCoast
Grant Hughes

Re: For Grant
May 23, 2007 10:37AM
PAddy Wrote:
-------------------------------------------------------
> * Order shouldn't be important; I want to
> construct a dissimilarity table, so the expectation
> will be that it is symmetric with a trace of zero.
> In true scientific fashion I have already done
> this and am working ahead, assuming some
> statistician can back me up later on...
>
> * The range of values is from -1000 to 3000, with
> 95% of the data distributed on the range
> (1000,1400). Because of the size of the
> distributions (~1000 points), I thought KS would
> be better than Anderson-Darling or similar. I
> will see about posting a histogram...
>
> As above, end goal is a measure of 'dissimilarity'
> between patients' distributions so I can group
> together similar patients and see how those
> groupings correlate to things like disease-free
> survival time, tumor size etc.

So you want to use the 'grouping' as a predictor? Or do you want to use each patient's distribution and the parameters of the distribution as a predictor?

So far, from what you have told me, it sounds like you aren't making any obvious errors or doing anything blatantly incorrect. I'll think about it more later today.



Grant Hughes
PAddy
Patrick McVeigh

Re: For Grant
May 23, 2007 11:04AM
NoCoast Wrote:
-------------------------------------------------------
> So you want to use the 'grouping' as a predictor?
> Or do you want to use each patient's distribution and
> the parameters of the distribution as a predictor?

Yes, I want to measure how 'similar' each patient's distribution of values is to every other one's, group them together based on that, then use those groupings (and not necessarily the parameters of the group) to do something like Cox regression to correlate with clinical factors - i.e., test how strongly being lumped together with all the hypoxic tumor cases affects your survival at 5 years post radiation.

I tend to just design studies that use t-tests and Cox regression, so I thought I'd ask someone who actually knows more than 2 statistical tests about it...thanks!
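
The "measure similarity, then group" step could look something like the following. It is a minimal sketch, assuming average-linkage agglomerative clustering on a precomputed dissimilarity table. The linkage rule, the choice of two groups, and the 5-patient table are all assumptions made up for illustration, not something settled in this thread.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering on a precomputed symmetric distance table.

    Repeatedly merges the two clusters with the smallest average pairwise
    distance until n_clusters remain. Returns lists of row indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Hypothetical 5-patient dissimilarity table (made-up K-S distances).
table = [
    [0.00, 0.05, 0.07, 0.60, 0.55],
    [0.05, 0.00, 0.04, 0.58, 0.57],
    [0.07, 0.04, 0.00, 0.62, 0.59],
    [0.60, 0.58, 0.62, 0.00, 0.08],
    [0.55, 0.57, 0.59, 0.08, 0.00],
]

groups = average_linkage(table, n_clusters=2)

# One group label per patient, usable later as a categorical covariate.
labels = [0] * len(table)
for label, members in enumerate(groups):
    for i in members:
        labels[i] = label

print(groups)
print(labels)
```

The resulting `labels` list is the kind of per-patient grouping variable that could then be entered into a survival model alongside the clinical factors.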
Josh Wimpey
Josh Wimpey

Re: For Grant
May 23, 2007 12:00PM
PAddy Wrote:

> Yes, I want to measure how 'similar' each
> patient's distribution of values is to every other
> one's, group them together based on that, then use
> those groupings (and not necessarily the
> parameters of the group) to do something like Cox
> regression to correlate with clinical factors -
> i.e., test how strongly being lumped together with
> all the hypoxic tumor cases affects your survival
> at 5 years post radiation.
>
> I tend to just design studies that use t-tests and
> Cox regression, so I thought I'd ask someone who
> actually knows more than 2 statistical tests about
> it...thanks!

You could just use the mean and variance of each patient's distribution as regressors in your Cox regression. I would also include random effects (shared frailty) in the model's specification. I don't know how to do it in SPSS, but if you want some simple Stata code, I can post it here.






NoCoast
Grant Hughes

Re: For Grant
May 23, 2007 12:23PM
I had that thought, but not knowing how the distributions actually look, there could be errors with that. Of course, then we get into errors with the K-S test and tails. Two patient distributions could theoretically have the same mean and variance, but one could be right-tailed while the other is left-tailed.
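
The same-mean, same-variance case can be made concrete with a small constructed example. This is a minimal sketch: a right-skewed sample is mirrored about its own mean, which leaves the mean and variance unchanged but flips the skew. The exponential shape and all numbers are assumptions chosen for illustration only.

```python
import bisect
import random
import statistics


def skewness(values):
    """Sample skewness: mean of the cubed standardized values."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


def ks_distance(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


rng = random.Random(1)
# Hypothetical right-tailed "patient": values piled up just above 1000.
right_tailed = [1000 + rng.expovariate(1 / 100) for _ in range(1000)]
centre = statistics.mean(right_tailed)
# Mirror image about the mean: identical mean and variance, opposite tail.
left_tailed = [2 * centre - x for x in right_tailed]

for name, sample in (("right", right_tailed), ("left", left_tailed)):
    print(name,
          round(statistics.mean(sample), 1),
          round(statistics.pvariance(sample), 1),
          round(skewness(sample), 2))
print("K-S distance:", round(ks_distance(right_tailed, left_tailed), 3))
```

A two-number (mean, variance) summary cannot tell these two samples apart, while the K-S distance between them is clearly non-zero.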

The other thing is that you are talking about a large number of pairwise comparisons and post-hoc manipulations to some extent. By some manipulative grouping you could effectively force significance depending upon your testing parameters.

Still thinking and consulting a few texts...



Grant Hughes
PAddy
Patrick McVeigh

Re: For Grant
May 23, 2007 12:47PM
Thanks, I'm just hesitant to go to a 2-number description of the individuals when the vast majority of them have non-normal distributions. This is why I was leaning towards a test to see if the two samples came from the same distribution or not, thinking it would better account for different patients leaning a different way from the norm.

Ultimately my entire sample is drawn from people with the same underlying root condition, so I would expect the majority of distributions to be around some tight range (call it cancer) of measured values. The task then is to see if this specific measurement can be of any use in delineating subpopulations within this group (cancer + necrosis may lean towards larger values, cancer + angiogenesis may lean towards smaller ones). I happen to have a lot of experience doing cluster analysis (usually hyperspectral, not this univariate stuff), and it seemed like an OK way to proceed in investigating the data...




Josh Wimpey Wrote:
-------------------------------------------------------
> You could just use the mean and variance of each
> patient's distribution as regressors in your Cox
> regression. I would also include random effects
> (shared frailty) in the model's specification.
> I don't know how to do it in SPSS, but if you want
> some simple Stata code, I can post it here.



PAddy
Patrick McVeigh

Re: For Grant
May 23, 2007 12:56PM
Yah, unfortunately using this kind of grouping eliminates most types of tests of significance for the raw clustering results (a t-test for differences between cluster means will almost always return a meaningless 0.0001 since it is maximizing separation by definition).
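
The point about t-tests on cluster means can be checked on pure noise. This is a minimal sketch, assuming a median split as a crude stand-in for 1-D two-group clustering (for roughly symmetric data a two-means boundary lands close to the median). The data are simulated and the helper name `welch_t` is made up for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))


rng = random.Random(2)
# One homogeneous population: there are no real subgroups in here.
noise = [rng.gauss(0, 1) for _ in range(200)]

# "Cluster" by cutting at the median, then test the groups we just made.
cut = statistics.median(noise)
low = [x for x in noise if x < cut]
high = [x for x in noise if x >= cut]

t = welch_t(high, low)
print(f"group sizes: {len(low)}, {len(high)}   t = {t:.1f}")
```

The t statistic comes out very large even though the data contain no structure at all, because the split was chosen to separate the values in the first place.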

My thinking was that by keeping this clustering/measurement blinded to the other outcome variables, then running a regression later on to see if the results correlated with anything, I wouldn't be forcing the false significance of the clusters on the regression. Unless the results mean anything, they should just be clutter in the regression's way...



NoCoast Wrote:
-------------------------------------------------------
> The other thing is that you are talking about a
> large number of pairwise comparisons and post-hoc
> manipulations to some extent. By some
> manipulative grouping you could effectively force
> significance depending upon your testing
> parameters.


Josh Wimpey
Josh Wimpey

Re: For Grant
May 23, 2007 01:18PM
PAddy Wrote:
-------------------------------------------------------
> Thanks, I'm just hesitant to go to a 2-number
> description of the individuals when the vast
> majority of them have non-normal distributions.
> This is why I was leaning towards a test to see if
> the two samples came from the same distribution or
> not, thinking it would better account for
> different patients leaning a different way from
> the norm.


Then you could make it more than a 2-number description, particularly if this is simply exploratory analysis.

Include Mean, variance, skewness, and kurtosis as regressors.

Or, use quintiles or deciles to capture the shape of the distribution. Or use the distance from decile to mean for each decile. I often use a variation on these last 2 tricks to evaluate the validity of probit or logit results where it is not possible to show that you have indeed found a global maximum instead of a local one.
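
The summary-statistic suggestions above could be collected per patient along these lines. This is a minimal sketch using only Python's standard `statistics` module. The function name `shape_features` and the simulated pixel values are made up for illustration, and the skewness and kurtosis here are the simple moment-based versions.

```python
import random
import statistics


def shape_features(values):
    """Moment and decile summaries of one patient's pixel values."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    z = [(x - m) / s for x in values]
    deciles = statistics.quantiles(values, n=10)  # 9 cut points
    return {
        "mean": m,
        "variance": s * s,
        "skewness": sum(v ** 3 for v in z) / len(z),
        "excess_kurtosis": sum(v ** 4 for v in z) / len(z) - 3.0,
        "deciles": deciles,
        "decile_minus_mean": [d - m for d in deciles],
    }


# Hypothetical patient: ~1000 pixel values centred near 1200 (assumed).
rng = random.Random(3)
pixels = [rng.gauss(1200, 80) for _ in range(1000)]

features = shape_features(pixels)
for key, value in features.items():
    print(key, value)
```

Each patient then becomes one row of a small feature table, and any of these columns could be tried as regressors in place of, or alongside, a cluster label.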

I have a decidedly KISS approach to statistics :-0






