Wednesday, April 29, 2015

Assignment 5


Introduction:

I have been asked by the UW System to analyze enrollment numbers for all UW system schools.  I have been given enrollment numbers for the number of students at UW schools from all 72 counties.  The simplified explanation is that the UW System wants to know why students choose the schools they are going to. 

Methods:

                To do this analysis, our project leader gave us data that he received from the UW System. The data include all the UW schools and their enrollment numbers. We only had to choose two schools to analyze, and I picked UW-Eau Claire and UW-Green Bay. The file also contains the number of citizens in each county with a bachelor’s degree and the median household income for each county. We were also given the distance from the center of each county to the different universities. We ran a linear regression analysis, and I found four variables that stood out in terms of their statistical significance (rejecting the null hypothesis, which is that there is no linear association between the two variables). These variables are the distance from the students’ home county to their university (for both UWEC and UWGB), the county’s residual bachelor’s degree count (for UWEC), and the county’s median household income (for UWGB). With these variables I was able to make four separate maps that I could then analyze further to try to pick apart why counties are sending more or fewer students to UWEC and UWGB.
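For readers who want to see what a simple linear regression of this kind computes, here is a minimal sketch in plain Python. It is not the SPSS procedure used for this assignment; the function name and the five data points are hypothetical and exist only for illustration. It fits y = a + b*x by ordinary least squares and returns the same quantities SPSS reports for the predictor: B, its standard error, and t.

```python
import math

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b, se_b, t_b): intercept, slope, standard error of the
    slope, and the t statistic for the slope (b / se_b).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    t_b = b / se_b
    return a, b, se_b, t_b

# Hypothetical predictor and enrollment values for five counties
predictor = [10.0, 20.0, 30.0, 40.0, 50.0]
enrollment = [12.0, 19.0, 33.0, 38.0, 52.0]
a, b, se_b, t_b = simple_linear_regression(predictor, enrollment)
print(f"B = {b:.3f}, Std. Error = {se_b:.3f}, t = {t_b:.3f}")
```

A large absolute t (and a correspondingly small Sig. value) is what leads to rejecting the null hypothesis of no linear association.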

Results:
Model 1       Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    8.518              6.797                            1.253     .214
EAUVAR        .124               .004         .972                34.626    .000
Table 1: Table showing UWEC Distance Variable
Model 1       Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    -126.472           78.935                           -1.602    .114
PerBSDeg      4283.038           1381.570     .347                3.100     .003
Table 2: Table showing UWEC Bachelors Degree Variable
Model 1       Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    22.782             4.911                            4.639     .000
GBYVAR        .026               .001         .981                41.768    .000
Table 3: Table showing UWGB Distance Variable

Model 1       Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)    -80.982            116.509                          -.695     .489
MEDHHI        .006               .004         .193                1.645     .104
Table 4: Table showing UWGB Median Household Income Variable
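As a quick reading aid for the four coefficient tables, the sketch below recomputes t as B divided by Std. Error from the rounded values printed in the tables and compares each Sig. value with a significance level. The 0.05 level is an assumption on my part (the assignment text does not state one), and the function name is hypothetical. Because the printed B and Std. Error are rounded to three decimals, the recomputed t will not match SPSS exactly for the very small coefficients.

```python
# (B, Std. Error, reported t, Sig.) copied from the four tables
reported = {
    "EAUVAR": (0.124, 0.004, 34.626, 0.000),
    "PerBSDeg": (4283.038, 1381.570, 3.100, 0.003),
    "MEDHHI": (0.006, 0.004, 1.645, 0.104),
    "GBYVAR": (0.026, 0.001, 41.768, 0.000),
}

ALPHA = 0.05  # assumed significance level, not stated in the post

def summarize(table, alpha=ALPHA):
    """Recompute t from the rounded B and Std. Error and flag Sig. < alpha."""
    rows = {}
    for name, (b, se, t_reported, sig) in table.items():
        rows[name] = {
            "t_from_rounded": b / se,
            "t_reported": t_reported,
            "below_alpha": sig < alpha,
        }
    return rows

summary = summarize(reported)
for name, row in summary.items():
    print(name, round(row["t_from_rounded"], 3), row["t_reported"], row["below_alpha"])
```

Under that assumed 0.05 level, EAUVAR, PerBSDeg, and GBYVAR have Sig. values below the threshold, while MEDHHI's reported Sig. of .104 is above it.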



Figure 1: Map showing UWEC's Distance Variable
Figure 2: Map showing UWEC's Bachelors Degree Variable
Figure 3: Map showing UWGB's Distance Variable
Figure 4: Map showing UWGB's Bachelors Degree Variable
              
Conclusion:

                We can see from the maps and charts above that there are some spatially recognizable and potentially significant reasons why students may attend certain universities. Let us focus on UW-Eau Claire. The surrounding orange counties suggest that many of those students may have come to Eau Claire because of how close it is to their hometown. As we get farther away, that pattern slowly starts to disperse. Eau Claire County of course sends a lot of its students to UWEC. Then we get to counties such as Milwaukee, Door, and Bayfield. These counties send a low number of students. Why? Maybe because of cultural differences: I would say that people who grew up in Milwaukee may not find Eau Claire very appealing. Maybe because of age differences: Door County has a lot of retired older couples. As for Bayfield, I’m not sure; maybe many of those students want to go to UM-Duluth. When looking at the Eau Claire bachelor’s degree map, we can see that there are four counties that stick out abruptly and in no apparent pattern. These counties are Marathon, Brown, Dane, and Waukesha, and I think I may know why they send a lot. Inside or near these counties we can find UW-Stevens Point, UW-Green Bay, UW-Madison, and UW-Milwaukee. It is highly possible that many of those who graduated from there stayed and raised their families there and are more likely to send their children to college than the average Wisconsin family.

                When looking at UW-Green Bay’s distance map, we can see clearly marked zones of dispersion. These zones are indicated by the purple lines, and as we get farther from Green Bay, fewer and fewer counties send a large number of students. However, why is Brown County itself so low? This may be because of the old adage that kids want to get out of the house and explore different parts of the world (in this case, different parts of Wisconsin). Eau Claire didn’t have this problem, but Brown County has a lot more people than Eau Claire County does, so that may be why the numbers are so skewed. When looking at the other UW-GB map, we can see that many counties close to Green Bay that have a high number of bachelor’s degrees have sent their students to UW-GB. Maybe those high-income families find it easier to send their students to the school nearby. St. Croix County on the west end of Wisconsin has a really low number of students sent. I think that may be because those students decide to go to the Twin Cities for school instead of trekking all the way across Wisconsin.

                You could probably go a lot more in depth in this study to find more reasons as to what pulls people toward or pushes them away from certain schools. It will always be difficult to know for sure, since you cannot be inside the mind of an 18-year-old getting ready to make one of the biggest and most important decisions of their life.

 

Thursday, April 9, 2015

Assignment 4


Part one:

Section 1:

           The first exercise in this assignment has us looking at the correlation between Distance (in feet) and Sound Level (in decibels). My null hypothesis is that there is no linear correlation between Distance and Sound Level. My alternative hypothesis is that there is a linear correlation between Distance and Sound Level. The Pearson Correlation test that I ran shows that there IS a correlation between the two, and it is a very high negative correlation (-.896). Therefore I reject the null hypothesis in favor of the alternative hypothesis.
Table 1: Graph showing a very high negative correlation between Sound Level and Distance. Sound level is on the X-Axis while Distance is on the Y-Axis.
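To make the Pearson correlation concrete, here is a minimal sketch in plain Python. The assignment's distance and sound readings are not reproduced in this post, so the five pairs below are hypothetical values chosen only to show a strong negative relationship; the function name is likewise my own.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: sound level drops as distance grows
distance_ft = [10, 20, 30, 40, 50]
sound_db = [90, 84, 80, 73, 70]
r = pearson_r(distance_ft, sound_db)
print(f"r = {r:.3f}")
```

A value of r close to -1 means that as one variable increases the other decreases almost linearly, which is the pattern the -.896 result describes.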
 
Section 2:
 
           For the second section of Part One we created a Correlation Matrix of Milwaukee County in Wisconsin. There are some patterns that we can gather from this matrix. One variable we can look at is Percent White. When Percent White increases, the percentages of other races decrease. This is rather obvious, since when one race’s share of the population goes up the others must go down; however, in Milwaukee County it is especially true between white and black. Milwaukee is one of the most racially segregated cities in America, and the high negative correlation (-.887) shows that through statistical values.
Another pair of variables we can look at is people who are below the poverty line and people who walk to work. There is a low positive correlation between the two, which means that when the proportion of people below the poverty line increases, the proportion of people who walk to work tends to increase as well, though only weakly.
Part Two:
 
Introduction:
                I have been hired by the Texas Election Commission (TEC) to run some statistics on data from the 1980 and 2008 elections. They have given me the percent democratic and voter turnout for each election. I wanted to add another variable to maybe shed some light on why we might be seeing the patterns that we see after running the statistics tests, so I have also downloaded a Hispanic population dataset.
The reason they want me to analyze this data is to determine whether there is a clustering of voting patterns anywhere in the state, along with whether there is a clustering of voter turnout patterns. The TEC wants to give this information to the governor to see if the patterns have changed throughout the state over a roughly 30-year period. To run this data analysis I will mostly be using GeoDa and SPSS to determine if any patterns take place. The TEC specifically wants me to determine if there is any spatial autocorrelation within the state.
Methods:
                To start the data analysis I first had to gain access to all of the required data. Luckily for me, the wonderful TEC commissioner provided me with the Texas election data. It was up to me to get the Hispanic data, so I chose to go through the Census Bureau. I got the Percent Hispanic Population for 2010, and while I was there I also downloaded the county shapefile for Texas. At that point I had all the files that I needed to run the statistics; all I needed was the software to do the complicated work that I can’t do by hand. There was no need to download any software as it was already on the computers, so I jumped right into GeoDa. Within GeoDa I imported the Texas county shapefile and created a new spatial weights file, since I would be running a spatial autocorrelation test. While creating the weights I selected ROOK as the contiguity type.
                Now that I had determined the weights, I was able to make Moran’s I and LISA Cluster Maps. Creating the Moran’s I was very self-explanatory: I simply clicked on the Moran’s I icon, selected the variable I wanted, and it instantly made the graph. I did that for the rest of the variables and then moved on to the LISA Cluster Maps. This was just as simple: I selected its icon, then the cluster map option, and voilà, I had myself a LISA Cluster Map.
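To show what rook contiguity and Moran’s I mean outside of the GeoDa interface, here is a minimal sketch in plain Python. It works on a hypothetical 4 x 4 grid of square cells rather than the Texas counties, and it uses simple binary (0/1) weights, which is an assumption; GeoDa’s own computation may differ in details such as row-standardizing the weights. The function names and grid values are made up for illustration.

```python
def rook_neighbors(rows, cols):
    """Map each grid cell index to the cells that share an edge with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors

def morans_i(values, neighbors):
    """Global Moran's I using binary contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(adj) for adj in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    return (n / s0) * cross / sum(d * d for d in dev)

grid = rook_neighbors(4, 4)

# Hypothetical clustered pattern: high values in the top half, low in the bottom
clustered = [1, 1, 1, 1,
             1, 1, 1, 1,
             0, 0, 0, 0,
             0, 0, 0, 0]

# Hypothetical checkerboard pattern: every cell differs from its rook neighbors
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

i_clustered = morans_i(clustered, grid)
i_checkerboard = morans_i(checkerboard, grid)
print(i_clustered, i_checkerboard)
```

The clustered pattern gives a positive Moran’s I and the checkerboard gives a negative one, which matches how the statistic is read: positive values mean similar values sit next to each other, negative values mean neighbors tend to differ.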
 
Results:
                After running all the tests I was left with 5 Moran’s I’s and 5 LISA Cluster Maps. I can instantly see from the Moran’s I that over time (from 1980 to 2008) the counties that vote democratic have become more clustered. This is not the case with voter turnout, as it seems to have gotten less clustered over time. When looking at the percent Hispanic population we can see that it is very concentrated, and the beautiful LISA Cluster Maps that GeoDa created show where that is. In Figure 1 we can see that there is high clustering of counties that all have high Hispanic populations in the south along the US-Mexico border. There is also a cluster in the northeast of counties that all have low Hispanic populations. We can see from the other Cluster Maps that the same area that has a high Hispanic population also has a high number of counties that vote democratic in both 1980 and 2008. One other pattern we see in the state (especially in south Texas) is that areas with a high Hispanic population and a high number of democratic voters also have a low percentage of voter turnout.
Figure 1: LISA Cluster Map showing percent Hispanic Population.
 
Legend for LISA MAPS
                One interesting thing that goes against the pattern that we see through most of the state is that the Dallas-Fort Worth area (Figure 2) has a high voter turnout along with a higher Hispanic population. It doesn’t show up on the map in Figure 1 because all the counties around it have a much lower Hispanic population, and because of the Rook contiguity weights that I set up earlier, those high Hispanic counties are affected by the lower Hispanic counties to the north, south, east, and west of them.
Figure 2: LISA Cluster Map showing voter turnout in 2008.
 
Here are the other Moran’s I and LISA Cluster Maps:
Figure 3: Moran's I for Percent Hispanic Population

Figure 4: Moran's I for Percent Democratic 1980

Figure 5: Moran's I for Percent Voter Turnout 2008

Figure 6: Moran's I for Percent Voter Turnout 1980

Figure 7: Moran's I for Percent Democratic 2008

Figure 8: LISA Cluster Map showing Percent Democratic 1980
Figure 9: LISA Cluster Map showing Percent Democratic 2008

Figure 10: LISA Cluster Map showing Percent Voter Turnout 1980

 
 
 
Conclusion:
                 As for whether election patterns have changed over time, I would say they have, but only slightly. There doesn’t seem to be any mass migration of voters but rather individual little pockets of change that pop up over the state. Those pockets of change seem to be around urban areas with regard to voter turnout and rural areas with regard to democratic voters. I would think that this could be due to the huge shift from rural to urban populations that we have experienced over the last 30 years, with a majority of our population now living in urban areas. I do not believe that any of these patterns will affect elections in an astronomical way, nor does anything need to be done to redistrict Texas in order to make up for these changes.
Sources:
                Texas Election Commission
                U.S. Census Bureau
 





















Monday, March 16, 2015

What is "Up North"?


Introduction:

            The Tourism Board of Wisconsin has asked me to conduct some research regarding the concept of “up North”. I have been provided a large data set of variables from the State of Wisconsin. They asked that I choose three variables and conduct a Chi-Square test on each of them as well as create some maps. We have broken up the State of Wisconsin into two parts, northern and southern counties, and we are dividing them along Highway 29.

            The three variables that I have selected are Resident Gun Deer License Sales, Nonresident Gun Deer License Sales, and Nonresident Archery Deer License Sales. I chose these three variables because as a hunter I think I can provide some helpful insight as to why we are seeing any patterns within the maps. I also think that hunting is a fundamental part of Wisconsin, culturally and economically.

            My Null Hypothesis is that there is no difference between Northern and Southern Wisconsin. My Alternative Hypothesis is that there is a difference between Northern and Southern Wisconsin.

Methodology:


            The first issue that needed to be addressed was where the boundary between Northern and Southern Wisconsin was. To do this I downloaded data from the Census Bureau (for the county shapefile), and from ESRI and the Wisconsin DNR for the major roads in Wisconsin. From there I selected Highway 29 through a variety of Select by Attribute queries and found that there were some gaps in each dataset’s version of Highway 29, but when you combine both of them together they create the entire stretch of Highway 29. Then I selected which counties belonged in which part of the state. Most of them fell completely within the Northern or Southern zone, but some border counties created some problems. To fix this I put each border county in whichever zone held most of its area relative to the Highway 29 border. Once I had labeled which counties are in the North or South, I added a field in the attribute table and gave each county an attribute of 1 for North and 2 for South. I ended up with a map that looked something like this:
Map One
 

 The next step was to join the data from the State of Wisconsin to my map of Wisconsin counties and Highway 29. We joined the data based on the county names. I then created more fields within the table for my three variables to set up a Chi Square Test. This was a confusing concept for most, but I found it relatively easy. I went to Symbology in Arc, then Quantities, then Classify, and told it to give me four groups of counties based on an equal interval. This gave me the four breaks that I would base my test on. In the new field in the attribute table I selected Field Calculator and entered 4 as the attribute for all the counties. I then went to Select by Attributes and selected everything that was less than my highest break. Then through the Field Calculator I gave those counties a 3. I did this for the rest of the breaks, and that gave me a field in the table that grouped the counties into four subsets that I could use for my Chi Square Test. Once I did this for all three variables I exported the table as a dBASE file so I could open it in SPSS.
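The grouping step described above can be sketched outside of Arc as follows. This is a minimal Python illustration, not the ArcMap workflow itself: the function names and the five license-sales values are hypothetical, and the handling of values that fall exactly on a break (here, strictly less than the break goes into the lower group) is an assumption that may differ from ArcMap’s classification.

```python
def equal_interval_breaks(values, classes=4):
    """Upper bounds of every class except the last, for equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes
    return [lo + width * k for k in range(1, classes)]

def classify(value, breaks):
    """Return a group number from 1 (lowest) to len(breaks) + 1 (highest).

    Mirrors the manual steps: start everything in the top group, then step
    down through the breaks, reassigning anything below each break.
    """
    group = len(breaks) + 1
    for k in range(len(breaks), 0, -1):
        if value < breaks[k - 1]:
            group = k
    return group

# Hypothetical license sales for five counties
sales = [0, 100, 250, 400, 800]
breaks = equal_interval_breaks(sales)
groups = [classify(v, breaks) for v in sales]
print(breaks, groups)
```

Each county ends up with a group number from 1 to 4, which is the categorical field the Chi Square crosstab needs.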

Next I opened SPSS and the table that I exported from Arc. To get to the Chi Square test I had to go to Analyze, then Descriptive Statistics, then Crosstabs. I selected Chi Square, and it was rather simple after this as SPSS did all the work for me! All I had to do then was analyze the data.
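For anyone curious about what the Crosstabs Chi Square option computes, here is a minimal sketch in plain Python. The crosstab counts are hypothetical (they are not the counts from my SPSS output) and the function name is my own. Rows are the North and South regions and columns are the four sales groups. The critical value 7.815 is the standard chi-square cutoff for 3 degrees of freedom at the 0.05 level; using 0.05 is an assumption.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a crosstab.

    `observed` is a list of rows of counts. Expected counts must be nonzero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical county counts: rows = North, South; columns = sales groups 1-4
crosstab = [[12, 9, 6, 3],
            [8, 11, 14, 9]]
stat, df = chi_square(crosstab)

CRITICAL_05_DF3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05
reject_null = stat > CRITICAL_05_DF3
print(f"chi-square = {stat:.3f}, df = {df}, reject null: {reject_null}")
```

The null hypothesis of no difference between regions is rejected only when the statistic exceeds the critical value, or equivalently when the significance value SPSS reports is below the chosen level.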

Results:

            After conducting the Chi Square tests and creating my maps these are my results.

            Map two below shows the Sale of Resident Gun Deer Licenses. We can see a higher number of licenses in the southern region of Wisconsin with a concentration of sales in Southeastern Wisconsin. This makes sense due to the higher population of Southern Wisconsin. The only Northern County that was in the fourth group (Dark Red) was Marathon County and that could simply be because of the size of the county. This Variable had a Chi Square value of .295 which shows that there is very little similarity between the North and South.
Map Two


            Map three shows the Sale of Nonresident Gun Deer Licenses. In this map we see a complete flip of where the highest numbers are. They have moved from the highly populated southeastern portion of Wisconsin to the north and northwestern part of Wisconsin. In my opinion most people would think of Northern Wisconsin as where everybody goes to hunt deer. So why would the first map of Residents show the opposite and the map of Nonresidents show what we see below? The Chi Square value of this variable was .190, which shows even further dissimilarity between North and South.


Map Three

            In map four we see the sale of Nonresident Archery Deer Licenses. This map complements map three and shows relatively the same pattern: low numbers in the south and high numbers in the north. The Chi Square was even smaller than the previous variables at .085! This would imply a great difference between Northern Wisconsin and Southern Wisconsin. Even with map two we have seen a difference, and with every map after that the difference keeps getting greater and greater.
Map Four
 

Conclusion:

The data has been compiled and my interpretation of the data has ended. This is what I have found. There IS a difference between Northern and Southern Wisconsin with regard to Deer License Sales. Therefore I reject the Null Hypothesis in favor of the Alternative Hypothesis. I have come to this conclusion through analyzing the maps I created and the Chi Square Tests. The concept of “Up North” is a cultural concept, and I think that is shown by these maps. It’s not only residents that hunt deer in Wisconsin; thousands of people migrate to Wisconsin to hunt deer, and where do they go? Northern Wisconsin. Why, you may ask? Well, where else would you go? From tales of the Turdy Point Buck to stories from the Wisconsin wilderness, hunting in Northern Wisconsin has become a staple for this state. I’m sure that other variables show similar results, but I believe that these variables really speak to what “Up North” is. They say that although the majority of the population may live in the Southern region as shown by map 1, when we set aside where people live and try to locate where they hunt, the Nonresidents point us in the right direction. Therefore I can say that statistically there is absolutely a difference between Northern and Southern Wisconsin and that it definitely plays a role in people’s perception of what “Up North” is.

Thank you to the Wisconsin Board of Tourism for letting me conduct this study and to the State of Wisconsin, Wisconsin DNR, and ESRI for their data.

My data concerns are small. I do question the validity of the license sales locations, as I imagine the data shows just where the license was bought. Most people will buy them when they are near their deer camp, and this number doesn’t account for those nonresidents who buy their license in their home state. I don’t think this would change much, but it is worth noting.