Overview

Background and Objectives

Infection with human immunodeficiency virus (HIV) can lead to acquired immunodeficiency syndrome (AIDS) and remains a global public health problem, with an estimated 37 million people worldwide who are HIV-positive. New York City has the oldest and largest HIV epidemic in United States history, and it still leads the nation in the number of new HIV cases.

In this project, we study the relationships between patient characteristics and HIV/AIDS diagnosis outcomes in New York City from 2011 to 2015.

Data Sources

Main data

HIV/AIDS Diagnoses by Neighborhood, Age Group, and Race/Ethnicity from NYC open data. [website]

Data description:

Data reported to the HIV Epidemiology and Field Services Program by June 30, 2016. All data shown are for people ages 13 and older. Borough-wide and citywide totals may include cases assigned to a borough with an unknown UHF or assigned to NYC with an unknown borough, respectively. Therefore, UHF totals may not sum to borough totals and borough totals may not sum to citywide totals.

Other data

  • Shapefile of NYC Zip Codes - tabulation areas provided by NYC Department of Information Technology & Telecommunications (DOITT) [website]

  • Zip code of United Hospital Fund neighborhood [website]

  • 2011-2015 American Community Survey (ACS) Public Use Microdata Sample (PUMS). [website]

PUMS data wrangling:

The raw dataset is very large, with hundreds of variables, so we keep only our target variables: location and total person income. The selected dataset is saved as “selected_pums.csv” in the data folder of our R project. Location in the 2011 ACS is coded with Public Use Microdata Area (PUMA) 2000 codes, for which we could not find a definition. We therefore exclude the 2011 income data.

For the 2012-2015 PUMS data, we convert PUMA 2010 codes to zip codes so that income can be visualized on the NYC map. The conversion is based on two datasets.
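The selection and conversion steps described above can be sketched as follows. This is an illustrative Python stand-in, not the project's R code. The column names `PUMA`, `PINCP` (total person income) and `AGEP` are assumed PUMS variable names, the crosswalk `PUMA_TO_ZIPS` and all codes and values are made up, and both the rule of assigning a PUMA's incomes to every zip code it contains and the use of the median are assumptions, since the source does not state how income was aggregated.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Toy stand-in for a raw PUMS person file. All values are made up.
RAW_PUMS = """SERIALNO,PUMA,PINCP,AGEP
1,P1,52000,34
2,P1,18000,61
3,P2,75000,45
4,P2,,12
"""

# Hypothetical PUMA 2010 -> zip code crosswalk with placeholder codes.
PUMA_TO_ZIPS = {"P1": ["Z1", "Z2"], "P2": ["Z3"]}


def select_columns(raw_text, keep=("PUMA", "PINCP")):
    """Keep only the target columns and drop rows with missing income."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [{k: row[k] for k in keep} for row in reader if row["PINCP"] != ""]


def income_by_zip(rows, crosswalk):
    """Assign each income to every zip code in its PUMA, then take the median per zip."""
    incomes = defaultdict(list)
    for row in rows:
        for zip_code in crosswalk.get(row["PUMA"], []):
            incomes[zip_code].append(float(row["PINCP"]))
    return {zip_code: median(values) for zip_code, values in incomes.items()}


selected = select_columns(RAW_PUMS)          # analogous to selected_pums.csv
zip_income = income_by_zip(selected, PUMA_TO_ZIPS)
print(zip_income)
```

A real crosswalk would probably need area or population weights where a zip code spans several PUMAs. This sketch ignores that.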

Method

After retrieving our main data, “HIV/AIDS Diagnoses by Neighborhood, Age Group, and Race/Ethnicity”, from NYC open data, we tidied it using R. We fitted multiple linear regression models for two outcomes: the number of HIV diagnoses and the HIV-related death rate. We created three geomaps to show how HIV diagnoses, HIV rate, and income vary across United Hospital Fund (UHF) neighborhoods.
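To make the modeling step concrete, here is a minimal sketch of multiple linear regression with indicator (dummy) variables, written in plain Python for illustration. The project itself was done in R, so this is not the authors' code. The toy rows, the helper names `dummy_code`, `solve` and `fit_ols`, and the choice of the alphabetically first level as the reference category are all assumptions.

```python
def dummy_code(rows, factors):
    """Build a design matrix: an intercept plus 0/1 indicators per factor level.

    The first level of each factor (in sorted order) is the reference
    category and gets no column.
    """
    names = ["(Intercept)"]
    kept = {}
    for f in factors:
        levels = sorted({r[f] for r in rows})
        kept[f] = levels[1:]
        names += [f"I({f} = {v})" for v in kept[f]]
    X = [[1.0] + [1.0 if r[f] == v else 0.0 for f in factors for v in kept[f]]
         for r in rows]
    return names, X


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Made-up data generated as y = 1 + 6*I(male) + 9*I(age 20-29).
rows = [
    {"sex": "Female", "age": "13-19", "y": 1.0},
    {"sex": "Male", "age": "13-19", "y": 7.0},
    {"sex": "Female", "age": "20-29", "y": 10.0},
    {"sex": "Male", "age": "20-29", "y": 16.0},
]
names, X = dummy_code(rows, ["sex", "age"])
beta = fit_ols(X, [r["y"] for r in rows])
for name, b in zip(names, beta):
    print(f"{name}: {b:.3f}")
```

Each fitted coefficient is the estimated difference from the reference category with the other factors held fixed. The indicator terms I(...) in the Results equations are read the same way.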

Results

Outcome 1: Number of HIV diagnoses

Model 1:

HIV_diagnoses = 0.983 + 0.297I(borough = Brooklyn) + 3.091I(borough = Manhattan) - 1.245I(borough = Queens) - 4.376I(borough = Staten Island) + 6.083I(male) + 9.600I(age20-29) + 6.870I(age30-39) + 4.627I(age40-49) + 2.355I(age50-59) + 0.427I(age60+)

Model 2:

HIV_diagnoses = 5.143 + 0.357I(borough = Brooklyn) + 3.710I(borough = Manhattan) - 1.494I(borough = Queens) - 5.251I(borough = Staten Island) + 7.299I(male) - 3.628I(Asian/Pacific Islander) + 7.304I(Black) + 5.399I(Latino/Hispanic) - 5.008I(Other/Unknown)

Outcome 2: HIV-related death rate

Model 3:

HIV_related_death_rate = 2.249 + 0.100I(borough = Brooklyn) + 0.118I(borough = Manhattan) - 1.007I(borough = Queens) + 2.902I(borough = Staten Island) - 0.221I(male) + 2.625I(age20-29) + 2.632I(age30-39) + 3.357I(age40-49) + 4.893I(age50-59) + 8.148I(age60+)

Model 4:

HIV_related_death_rate = 4.780 - 1.083I(borough = Brooklyn) + 0.375I(borough = Manhattan) - 1.080I(borough = Queens) + 2.283I(borough = Staten Island) + 0.220I(male) - 0.183I(Asian/Pacific Islander) + 0.430I(Black) + 1.281I(Latino/Hispanic) - 2.153I(Other/Unknown)
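As a worked reading of the fitted equations, the sketch below evaluates Model 1 and Model 3 for a given profile by adding the intercept to the coefficient of every indicator that equals 1. The coefficients are copied from the equations above. The `predict` helper is hypothetical, and the reference categories (Bronx, female, ages 13-19) are an assumption inferred from the levels that do not appear in the equations.

```python
# Coefficients copied from Model 1 (outcome: HIV_diagnoses).
MODEL1 = {
    "intercept": 0.983,
    "borough": {"Brooklyn": 0.297, "Manhattan": 3.091,
                "Queens": -1.245, "Staten Island": -4.376},
    "male": 6.083,
    "age": {"20-29": 9.600, "30-39": 6.870, "40-49": 4.627,
            "50-59": 2.355, "60+": 0.427},
}

# Coefficients copied from Model 3 (outcome: HIV_related_death_rate).
MODEL3 = {
    "intercept": 2.249,
    "borough": {"Brooklyn": 0.100, "Manhattan": 0.118,
                "Queens": -1.007, "Staten Island": 2.902},
    "male": -0.221,
    "age": {"20-29": 2.625, "30-39": 2.632, "40-49": 3.357,
            "50-59": 4.893, "60+": 8.148},
}


def predict(model, borough, male, age):
    """Intercept plus the coefficients of the indicators that equal 1.

    Levels missing from the model (assumed reference categories, for
    example "Bronx" or "13-19") contribute 0. Note that a misspelled
    level is also silently treated as the reference.
    """
    return (model["intercept"]
            + model["borough"].get(borough, 0.0)
            + (model["male"] if male else 0.0)
            + model["age"].get(age, 0.0))


# Model 1, male aged 20-29 in Manhattan: 0.983 + 3.091 + 6.083 + 9.600
diagnoses = predict(MODEL1, "Manhattan", True, "20-29")
# Model 3, male aged 60+ in Manhattan: 2.249 + 0.118 - 0.221 + 8.148
death_rate = predict(MODEL3, "Manhattan", True, "60+")
print(round(diagnoses, 3), round(death_rate, 3))
```

Comparing the age coefficients of the two models shows opposite gradients: the largest age effect on diagnoses is for ages 20-29, while the largest age effect on the death rate is for ages 60+.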

Conclusion

  1. Overall, females are less likely to be diagnosed with HIV and more likely to die from HIV infection than other groups.

  2. HIV infections were most common among people aged 20 to 29.

  3. Rates of new HIV infection are higher among Black and Latino/Hispanic people than among other racial/ethnic groups.

  4. Total HIV incidence in NYC declined from 2011 to 2015.

  5. Income has no significant effect on either the number of HIV diagnoses or the HIV-related death rate.

Screencast

A screencast of our project can be found here.