and see if T actually has a T -distribution with n + m - 2 degrees of freedom. Also calculate the value of " Z" ,
.
How do the sizes of n and m affect the distribution of T ? How important is it for the distribution variances to be equal for T to actually have a t-distribution with n + m - 2 degrees of freedom?
> randomize():
>
N := 500;
n := 6;
m := 18;
VarX := 1;
VarY := 36;
>
for k from 1 to N do
X := NormalS(0, VarX, n):
Y := NormalS(0, VarY, m):
TT[k] := evalf((Mean(X) - Mean(Y))/sqrt(((n-1)*Variance(X) + (m-1)*Variance(Y))/(n + m - 2)*(1/n + 1/m))):
ZZ[k] := evalf((Mean(X) - Mean(Y))/sqrt(Variance(X)/n + Variance(Y)/m)):
od:
T := [seq(TT[j], j = 1 .. N)]:
Z := [seq(ZZ[j], j = 1 .. N)]:
>
MedianofTs := Median(T);
MedianofZs := Median(Z);
MeanofTs := Mean(T);
MeanofZs := Mean(Z);
SDofTs := StDev(T);
SDofZs := StDev(Z);
Min(T), Max(T);
Min(Z), Max(Z);
> display({BoxPlot(T, Z)}, title = `Box Plots for Z (above) and T (below)`, titlefont=[TIMES,BOLD,16]);
>
xtics := [seq(-3.5 + k*0.5, k = 0 .. 14)]:
ytics := [seq(0.1*k, k = 1 .. 11)]:
>
P1 := HistogramFill(T, -3.5 .. 3.5, 28):
P2 := plot(TPDF(n+m-2, x), x = -3.5 .. 3.5, y = 0 .. 0.75, color=black, thickness=2, xtickmarks=xtics, ytickmarks=ytics, labels=[``,``]):
display({P1, P2}, title = `Histogram of T Observations\nwith T(n+m-2) Superimposed`, titlefont=[TIMES,BOLD,16]);
>
Percentile(T, 0.05), Percentile(T, 0.95);
TP(n+m-2,0.05), TP(n+m-2,0.95);
>
P3 := HistogramFill(Z, -3.5 .. 3.5, 28):
P4 := plot(NormalPDF(0, 1, x), x = -3.5 .. 3.5, y = 0 .. 0.45, color=black, thickness=2, xtickmarks=xtics, ytickmarks=ytics, labels=[``,``]):
display({P3, P4}, title = `Histogram of Z Observations\nwith N(0, 1) p.d.f. Superimposed`, titlefont=[TIMES,BOLD,16]);
>
Percentile(Z, 0.05), Percentile(Z, 0.95);
NormalP(0, 1, 0.05), NormalP(0, 1, 0.95);
>
k := 'k':
TQuants := [seq(TP(n+m-2, k/(N+1)), k = 1 .. N)]:
>
T := sort(T):
P5 := ScatPlotCirc(T, TQuants):
P6 := plot([[-3.5, -3.5], [3.5, 3.5]], x = -3.5 .. 3.5, y = -3.5 .. 3.5, color=black, thickness=2, xtickmarks=xtics, ytickmarks=xtics, labels=[``,``]):
display({P5, P6}, title=`q-q Plot of T(n+m-2) Quantiles\nVersus the T Order Statistics`, titlefont=[TIMES,BOLD,16]);
> ZQuants := [seq(NormalP(0, 1, k/(N+1)), k = 1 .. N)]:
>
Z := sort(Z):
P7 := ScatPlotCirc(Z, ZQuants):
P6 := plot([[-3, -3], [3, 3]], x = -3 .. 3, y = -3.5 .. 3.5, color=black, thickness=2, xtickmarks=xtics, ytickmarks=xtics, labels=[``,``]):
display({P7, P6}, title=`q-q Plot of N(0, 1) Quantiles\nVersus the Z Order Statistics`, titlefont=[TIMES,BOLD,16]);
Welch showed that U = Z, where
,
really has an approximate T distribution with the number of degrees of freedom given by the greatest integer (or the floor) in the following expression:
We shall now fit a t -distribution to the U = Z data using as the number of degrees of freedom Welch's suggestion but with distribution variances rather than sample variances.
> v := floor((VarX/n + VarY/m)^2/((VarX/n)^2/(n-1) + (VarY/m)^2/(m-1)));
>
P1 := HistogramFill(Z, -3.5 .. 3.5, 28):
P2 := plot(TPDF(v, x), x = -3.5 .. 3.5, y = 0 .. 0.45, color=black, thickness=2, xtickmarks=xtics, ytickmarks=ytics, labels=[``,``]):
display({P1, P2}, title=`Histogram of U Observations\nwith T(v) Superimposed`, titlefont=[TIMES,BOLD,16]);
>
Percentile(Z, 0.05), Percentile(Z, 0.95);
TP(v, 0.05), TP(v, 0.95);
> TvQuants := [seq(TP(v, k/(N+1)), k = 1 .. N)]:
>
Z := sort(Z):
P5 := ScatPlotCirc(Z, TvQuants):
P6 := plot([[-3.5, -3.5], [3.5, 3.5]], x = -3.5 .. 3.5, y = -3.5 .. 3.5, color=black, thickness=2, xtickmarks=xtics, ytickmarks=xtics, labels=[``,``]):
display({P5, P6}, title=`q-q Plot of T(v) Quantiles\nVersus the U Order Statistics`, titlefont=[TIMES,BOLD,16]);