Steps of the Wizard

Step 1: Select PPA Metrics

Step 1.1 – Select PPA Metrics

On this page, select the metrics to include in your PPA visuals. All PPAs must include Number of Facilities, Care Seeking, and at least one Diagnostic or Treatment metric. A PPA may optionally include more than one form of diagnosis (up to four) and/or more than one form of treatment (up to four).

Step 1.2 – Customize PPA Metric Names

This page gives you the option to overwrite the default names for PPA metrics with custom names. This is particularly recommended for service availability metrics, in order to provide a more descriptive label for the service (e.g. “Smear Microscopy” and “GeneXpert” in lieu of “Diagnostic 1” and “Diagnostic 2”). With this step complete, proceed to Step 2.1.


Step 2: Provide Data

Step 2.1 – Select Data Sources

Here you must identify the file containing the data required for each PPA metric. If your team has already saved the file to your Team Space, it will be available via the dropdown menus on this page; do not upload it again. If you upload a new file from this page, it will then appear in the metric dropdown menus for all PPAs in your Team Space and will be listed on your Team Data Sources page.  You may reuse the same data source for multiple PPA metrics, and across multiple PPAs. For more information on the data sources required for the PPA, review the PPA Data Cheat Sheet or the Data Requirements video playlist.

Step 2.2 – Apply Sample Weights

Many PPAs include data from surveys that use weighted sampling. In this situation, sample weights must be applied so that the proportions shown in the final PPA visuals accurately represent the population. For these data sets, select the column containing sample weights.

The wizard applies the sample weights when the final output is generated. The Value Counts that appear on many of the wizard pages starting with Step 2.3 are not weighted. Value Counts are simply the number of rows in the dataset containing the specified value for the column selected.
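The difference between unweighted Value Counts and weighted proportions can be sketched as follows. This is a minimal illustration, not the wizard's actual code; the rows, facility names, and weights are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical survey rows: (place of care seeking, sample weight).
rows = [
    ("Public Hospital", 2.0),
    ("Public Hospital", 2.0),
    ("Private Clinic", 0.5),
    ("Private Clinic", 0.5),
    ("Private Clinic", 0.5),
    ("Private Clinic", 0.5),
]

# Value Counts: the number of rows containing each value (unweighted).
value_counts = Counter(place for place, _ in rows)

# Weighted totals: the sum of the sample weights for each value.
weighted_totals = defaultdict(float)
for place, weight in rows:
    weighted_totals[place] += weight

# Weighted proportions: each value's share of the total weight.
total_weight = sum(weighted_totals.values())
weighted_share = {place: w / total_weight for place, w in weighted_totals.items()}

print(dict(value_counts))
print(weighted_share)
```

In this invented data set, Private Clinic has the larger Value Count but the smaller weighted share, which is why the counts shown in the wizard should not be read as final proportions.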

For data sets where sample weights are applied, you have the option to adjust the weight multiplier. From a technical standpoint, adjusting the weight multiplier is not essential to complete a PPA. If your team does not know the weight multiplier (though it should be included in the survey documentation) or does not require accurate sample sizes, leave it at one (the default). All proportions reported in the final PPA visuals will still be accurate (though the numbers may not be).

In the context of the Patient Pathway Analysis, care seeking data is the most common scenario in which sample weights are used. Virtually all care seeking data comes from surveys, and most of these surveys use weighted sampling. This includes the DHS, the HEUS, and prevalence surveys. When service availability data comes from a survey of health facilities (rather than a census), applying sample weights is required if those surveys used weighted sampling. This is the case for the SPA and the SARA (when the SARA is conducted as a survey rather than a census).  

The DHS variables have standard numeric codes across countries and years. For individual recode (women’s survey) datasets, the column containing sample weights is v005 and the weight multiplier is 0.000001 (as in the example from the Demo Team).
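As a rough sketch of why the weight multiplier changes the weighted totals but not the proportions, consider the example below. The rows, the v005 values, and the care seeking codes 21 and 31 are made up for illustration, and the helper functions are hypothetical rather than part of the wizard.

```python
import math

# Hypothetical rows from a DHS individual recode file.
# The codes 21 and 31 and the v005 values are illustrative only.
raw_rows = [
    {"h46a_1": 21, "v005": 1250000},
    {"h46a_1": 21, "v005": 750000},
    {"h46a_1": 31, "v005": 2000000},
]

def weighted_totals(rows, value_col, weight_col, multiplier=1.0):
    """Sum (weight * multiplier) for each distinct value in value_col."""
    totals = {}
    for row in rows:
        weight = row[weight_col] * multiplier
        totals[row[value_col]] = totals.get(row[value_col], 0.0) + weight
    return totals

def proportions(totals):
    """Convert weighted totals to shares of the overall total."""
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}

# Multiplier left at one (the default) versus the DHS multiplier.
default_totals = weighted_totals(raw_rows, "h46a_1", "v005")
scaled_totals = weighted_totals(raw_rows, "h46a_1", "v005", multiplier=0.000001)

p_default = proportions(default_totals)
p_scaled = proportions(scaled_totals)

# The proportions agree (up to floating-point rounding); only the totals differ.
same_proportions = all(math.isclose(p_default[k], p_scaled[k]) for k in p_default)
print(same_proportions)
```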

For data sources that don’t use sample weights (census data or unweighted surveys), no action is required in this step—the wizard will apply the default weight of one to all people or health facilities contained in the raw data. 

Step 2.3 – Subset Data

Sub-setting the data is required any time the raw data contains people or health facilities that your team does not want included in the PPA. Operations commonly achieved via sub-setting include:

  • For care seeking data, exclude all survey respondents that did not seek care for illness
  • For health facility data, exclude any facilities deemed not relevant to the analysis
  • For all data sources, exclude entries missing critical information
  • For all data sources, exclude a certain geographic area

On this page, you have the option to subset a data source by up to two variables (columns). Upon selecting a column to subset by, the values contained in the column appear on the right of the screen. Select the values to retain.

If using DHS data for care seeking, standardized variables to subset by include h44a_1 (place of care seeking for child’s diarrhea) or h46a_1 (place of care seeking for child’s fever). Exclude NA values by un-checking them, while retaining all survey respondents that reportedly sought care for their child’s illness (those with non-NA values for the care seeking variable). Some teams may opt to exclude values coded 96 (“other”), though this is solely at their discretion. 

Selecting two subset columns acts as a double filter. If sub-setting is not needed for a data source, no action is required for the data source on this step. When all needed sub-setting is complete, move on to Step 3.1.
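The double filter can be pictured with the sketch below, which keeps only rows whose value is retained in every selected column. It is an assumption-laden illustration: the rows, the "region" column, and the `subset` helper are hypothetical, and `None` stands in for an NA value.

```python
# Hypothetical rows; None stands in for an NA value.
rows = [
    {"h46a_1": 21, "region": "North"},
    {"h46a_1": None, "region": "North"},
    {"h46a_1": 96, "region": "South"},
    {"h46a_1": 31, "region": "South"},
]

def subset(rows, filters):
    """Keep rows whose value in every filter column is among the retained values.

    `filters` maps a column name to the set of values to retain.
    An empty mapping means no sub-setting, so every row is kept.
    """
    if len(filters) > 2:
        raise ValueError("at most two subset columns are supported")
    return [
        row for row in rows
        if all(row[column] in keep for column, keep in filters.items())
    ]

# One filter: drop NA respondents by retaining only the non-NA codes.
sought_care = subset(rows, {"h46a_1": {21, 31, 96}})

# Two filters: non-NA respondents who are also in the South region.
south_only = subset(rows, {"h46a_1": {21, 31, 96}, "region": {"South"}})

print(len(sought_care), len(south_only))
```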


Step 3: Identify Variables

Step 3.1 – Identify Global Variables

Global variables are essential for mapping each data source to the common patient pathway. For example, a “Private Hospital” in one dataset may appear as a “Religious Hospital” or “NGO Hospital” in another data set. Similarly, “Capital Region” from one dataset may appear as “Region 1” in another data set. Before you can map values from one data set to another, you must identify the columns containing the values to be mapped. These are columns for Facility Type, Health Sector, and Level of Geographic Aggregation.

On this page, select a column for Facility Type, Health Sector, or both for each data source. Some data sources include data on health facility sector and type as one variable (e.g. “Private Health Center” could be a value in the Facility Type column), whereas other data sources split sector and type into two variables (e.g. “Private” for Sector and “Health Center” for Facility Type). Either structure is fine; identify columns for Sector and Facility Type only if they are both necessary.

For a subnational PPA, select the column identifying the Level of Geographic Aggregation for each data source. This will correspond to the Level of Geographic Aggregation specified on the Team PPAs page (Region, State, etc.; or a custom level). If conducting a national PPA, the wizard will not prompt the user for this information.

Note that the column names in the raw data need not match “Facility Type” or “Health Sector” verbatim. Columns may have any name. The presence of this data is the only requirement. 
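The two data structures described in this step (sector and type combined in one column, or split across two) can be sketched as below. The column names `fac_kind`, `ownership`, and `kind` and the `facility_key` helper are hypothetical; they only illustrate that column names are arbitrary and that one or both columns may be selected.

```python
def facility_key(row, sector_col=None, type_col=None):
    """Return a (sector, facility type) pair for one row.

    A side for which no column was selected is returned as None.
    """
    if sector_col is None and type_col is None:
        raise ValueError("select a column for Facility Type, Health Sector, or both")
    sector = row[sector_col] if sector_col is not None else None
    facility_type = row[type_col] if type_col is not None else None
    return (sector, facility_type)

# Data source where sector and type are combined in a single column.
combined_row = {"fac_kind": "Private Health Center"}
combined_key = facility_key(combined_row, type_col="fac_kind")

# Data source where sector and type are stored in two columns.
split_row = {"ownership": "Private", "kind": "Health Center"}
split_key = facility_key(split_row, sector_col="ownership", type_col="kind")

print(combined_key)
print(split_key)
```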

Step 3.2 – Identify Service Availability Variables

On this step, the wizard displays each Service Availability metric included in the PPA (per your specifications in Step 1.1). For each health service listed on the left, select the column from the corresponding data source that indicates whether the service is available. After selecting the column, a list of the values it contains will appear on the right. Check the box(es) indicating the service is present.

The raw data may, for example, include a column titled “Service A Present” containing yes/no values. Or, it may include a column like “Services Present,” with values A, B, C, none, etc. Either structure will work within the wizard. After identifying service availability variables from the data sets and specifying the values corresponding to service availability, proceed to Step 4.1.
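Both column structures reduce to the same test: does the row's value in the selected column belong to the set of checked values? A minimal sketch follows, using invented facility rows and a hypothetical `service_available` helper.

```python
def service_available(row, column, present_values):
    """True if the row's value in `column` is one of the checked values."""
    return row[column] in present_values

# Hypothetical facility rows carrying both column structures.
facilities = [
    {"Service A Present": "yes", "Services Present": "A"},
    {"Service A Present": "no", "Services Present": "none"},
    {"Service A Present": "yes", "Services Present": "B"},
]

# Yes/no structure: check the box for "yes".
flags_yes_no = [
    service_available(f, "Service A Present", {"yes"}) for f in facilities
]

# Multi-valued structure: check the box for the value "A".
flags_multi = [
    service_available(f, "Services Present", {"A"}) for f in facilities
]

print(flags_yes_no)
print(flags_multi)
```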


Step 4: Map Facilities

Step 4.1 – Create PPA Sectors and Levels

On this page, teams define the Health Sectors and Levels to include in a PPA. Pre-set sectors include Public, Private, and Informal Private. Your team may opt to include one custom sector and provide a customized name.

You may select up to ten sector/level combinations by checking up to ten boxes. Each box you check will correspond to a row in your final PPA visual. The ten-row limit is to make the visuals easily interpretable and ensure each visual fits on one page.

The Introduction of this How-to Guide describes the rationale for creating Health Sectors and Levels, and relevant considerations. Teams that are new to the PPA should review this section prior to completing this step in the Wizard. This step should be completed as a team, with full participation of NTP managers and other key stakeholders within the NTP. 

If you see more Health Sector/Facility Type combinations than expected, or the number seems too large to map, you may wish to consult the Data Cleaning Tips at the end of this chapter.

Step 4.2 – Map Health Facility Types to PPA Sectors and Levels

Each data source has a unique way of categorizing and naming health facility types. On this page, you will map the health facility types contained in your data sources to the PPA Sectors and Levels your team defined in Step 4.1. When you highlight a data source on the left of the screen, the wizard displays all Health Sectors or all Facility Types contained in the selected dataset on the right of the screen. If the dataset contains fields for “Health Sector” and “Facility Type” (and you identified both in Step 3.1), the wizard will display all Health Sector/Facility Type combinations from the dataset, along with the number of times they appear.

For each data source, assign each Health Sector/Facility Type combination to a PPA Sector and Level. A check mark will show up next to a data source when this task is complete for that data source.
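The mapping task can be thought of as a lookup table from each Health Sector/Facility Type combination to a PPA Sector and Level, plus a check that no combination is left unassigned. The sketch below assumes invented facility rows and hypothetical level names ("Level 1", "Level 2"); it is not the wizard's implementation.

```python
from collections import Counter

# Hypothetical (Health Sector, Facility Type) pairs from one data source.
rows = [
    ("Private", "Hospital"),
    ("Private", "Hospital"),
    ("NGO", "Hospital"),
    ("Public", "Health Center"),
]

# Each combination with the number of times it appears.
combo_counts = Counter(rows)

# Assignments made so far: combination -> (PPA Sector, PPA Level).
mapping = {
    ("Private", "Hospital"): ("Private", "Level 2"),
    ("Public", "Health Center"): ("Public", "Level 1"),
}

def unmapped_combinations(combo_counts, mapping):
    """Combinations present in the data that have no assignment yet."""
    return sorted(combo for combo in combo_counts if combo not in mapping)

remaining_before = unmapped_combinations(combo_counts, mapping)

# Assign the last combination; the data source is then complete.
mapping[("NGO", "Hospital")] = ("Private", "Level 2")
remaining_after = unmapped_combinations(combo_counts, mapping)
is_complete = not remaining_after

print(remaining_before, is_complete)
```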

It is important to consult the broader team on health facility mapping, to ensure consensus around the rationale for grouping health facilities and design the PPA to meet the programmatic needs of the country team.

For care seeking data from the DHS individual recode, look up your care seeking variable in the .FRW file to see the labels (facility types) that correspond to the numerically coded values for that variable. In the example, numeric codes indicate the types of facilities where respondents reportedly sought care for child’s fever (h46a_1).

When all Health Sector/Facility Type combinations for all data sources have been assigned to a PPA Sector/Level, proceed to Step 5.1.

If you see a Health Sector or Facility Type you don’t want included in the PPA, go back to Step 2.3 and exclude this group of facilities via sub-setting.
If you see Health Sector/Facility Type combinations that do not provide enough information to assign a PPA Sector/Level (for example, if one or both fields are blank), you have three choices:
  1. Make an educated guess, where appropriate, and assign the PPA Sector/Level accordingly.
  2. Go back to Step 2.3 and exclude these entries from your analysis. For example, you might exclude all rows where the value in the Facility Type column is blank.
  3. Go back to your original data source and learn more about the problem entries so that you can enter the correct values for Health Sector and Facility Type.

Choice 3 is the only acceptable choice if the problematic entries constitute a substantial proportion of the entries in the data source. Even if they account for a small portion, proceed with caution if conducting a subnational PPA. You may wish to go back to Step 4.1, create an “unknown” Sector, and assign the problem entries to this sector. Then when you run the subnational PPA you can check to see if the unknown entries are concentrated in specific geographies.
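Before choosing how to handle incomplete entries, it can help to check how many there are and where they sit. The sketch below counts rows with a blank Health Sector or Facility Type by geography; the rows and column names are invented, and the check is a suggestion rather than a wizard feature.

```python
from collections import Counter

# Hypothetical facility rows; an empty string marks a blank field.
rows = [
    {"region": "North", "sector": "Public", "facility_type": "Hospital"},
    {"region": "North", "sector": "", "facility_type": "Clinic"},
    {"region": "North", "sector": "Private", "facility_type": ""},
    {"region": "South", "sector": "Public", "facility_type": "Health Center"},
    {"region": "South", "sector": "Private", "facility_type": "Clinic"},
]

def is_problem(row):
    """True if the sector or the facility type is blank."""
    return not row["sector"].strip() or not row["facility_type"].strip()

total_by_region = Counter(row["region"] for row in rows)
problem_by_region = Counter(row["region"] for row in rows if is_problem(row))

# Share of problem entries within each region (a Counter returns 0 for
# regions with no problem entries).
problem_share = {
    region: problem_by_region[region] / total
    for region, total in total_by_region.items()
}

print(problem_share)
```

A share that is high in one geography and near zero elsewhere suggests the problem entries are concentrated, which matters most for a subnational PPA.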


Step 5: Map Geographies

Step 5.1 – Define PPA Geographies

Steps 5.1 and 5.2 pertain to subnational PPAs only. If conducting a national-level PPA, proceed directly to Step 6.1.

If conducting a subnational PPA, your team determined the Level of Geographic Aggregation when creating the PPA on the Team PPAs page. In Step 3.1, you selected the column from each data source containing the geographies (for example, the columns containing values for “Region”).

It is likely that the geographies included in each of your data sources are not identical. The data sources may contain the same set of geographies represented with different names, spellings, abbreviations, or numeric codes. Or, there may be some slight variation across data sources in the true administrative areas that are represented. Defining your PPA Geographies is the first step to addressing this situation.

On this page, specify the “master” geographies that you wish to use for the PPA; these are your PPA Geographies. The wizard will generate a PPA for each PPA Geography listed here. You may create a list of PPA Geographies anew if desired. To do so, click “create new” for each geography to include. It is likely, however, that one of your data sources contains the exact geographies you wish to use or comes close. In this case, select the data source from the dropdown menu and click “populate PPA Geographies from this data source.” The geographies for that data source will then appear. The list is just a starting point; change spellings, add geographies, or remove geographies if desired. When you are satisfied with the PPA Geographies listed on this page, move to Step 5.2.
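Populating the list from a data source and then editing it might look like the sketch below. The rows, the misspelled value, and the `populate_from_source` helper are hypothetical and only mirror the behavior described above.

```python
# Hypothetical rows from the data source chosen as the starting point.
source_rows = [
    {"Region": "Region 1"},
    {"Region": "Region 2"},
    {"Region": "Region 1"},
    {"Region": "Regoin 3"},  # misspelled in the raw data
]

def populate_from_source(rows, column):
    """Return the distinct geographies found in `column`, sorted by name."""
    return sorted({row[column] for row in rows})

# Starting point: the geographies exactly as they appear in the data source.
ppa_geographies = populate_from_source(source_rows, "Region")

# Edit the list: correct a spelling, then add a geography the source lacks.
ppa_geographies = [
    "Region 3" if name == "Regoin 3" else name for name in ppa_geographies
]
ppa_geographies.append("Region 4")

print(ppa_geographies)
```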

If you see more geographies than you expect for a data source, it may be a data cleaning issue. See Data Cleaning Tips.
You may assign multiple geographies from a data source to one PPA Geography. This may be useful if you have an old data source with multiple geographies that have since been combined, and you want a PPA visual for the new combined geography.
You cannot, however, split a geography from a data source into multiple PPA Geographies. In a situation where an old data source contains a geography that has since split into two or more geographies, there are multiple ways to approach it:
  1. Leave the older, larger geography as the master (PPA Geography) and map the smaller geographies from the remaining datasets to it. Programmatically, this will mean that two or more Geographies will share the same PPA for a larger geography of which they are a part. This may be acceptable in some situations and not in others.
  2. If you have enough information from other variables in the data source with the larger geography to subset it to the smaller geography, you can create a separate “national” PPA for only the smaller geography. For example, you could do this if a region was split by district into two regions, and the older data source includes a column identifying the district.

Step 5.2 – Map Data Source Geographies to PPA Geographies

If conducting a subnational PPA, your team created a list of PPA Geographies in Step 5.1. On this page you map the geographies from each dataset to your PPA Geographies.

Highlight a data source on the left of the screen. The wizard then displays all geographies in the selected dataset on the right of the screen.

Assign each geography from the dataset to the appropriate PPA Geography via the dropdown menus. Note that some dropdowns may be auto-filled. This happens in two scenarios:

  1. The data source highlighted was the one selected to populate PPA Geographies
  2. The geographies contained in a data source happen to match the PPA Geographies exactly

Also note: It is okay to assign multiple geographies from a data source to a given PPA Geography (see tips from Step 5.1 regarding situations where this may apply).
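The auto-fill and many-to-one behavior described above can be sketched as follows. The geography names are invented, and the logic is an assumption about how such a mapping could work, not a description of the wizard's internals.

```python
# Hypothetical PPA Geographies and the geographies found in one data source.
ppa_geographies = ["Capital Region", "Northern Region"]
source_geographies = ["Capital Region", "Region 2", "Region 3"]

# Auto-fill: a source geography that matches a PPA Geography exactly
# is assigned to it without any manual selection.
assignments = {g: g for g in source_geographies if g in ppa_geographies}
auto_filled = sorted(assignments)

# Manual assignment: two source geographies mapped to one PPA Geography.
assignments["Region 2"] = "Northern Region"
assignments["Region 3"] = "Northern Region"

# Every source geography must be assigned before generating PPAs.
unassigned = [g for g in source_geographies if g not in assignments]

print(auto_filled, unassigned)
```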

For regions from the DHS individual recode, look up v024 in the .FRW to see the labels that correspond to the numerically coded values. In the example, numeric codes correspond to the five regions reflected in the survey.

You must assign each geography from each data source to a PPA Geography before moving on to Step 6.1 to generate PPAs. 


Step 6: Create Output

Step 6.1 – Generate, Preview, and Download PPAs

For subnational PPAs, you must complete all steps of the wizard prior to generating an output. For national PPAs, you must complete all steps except 5.1 and 5.2. If you have completed all these steps, click “generate new output.” It may take a few minutes for the wizard to generate your PPA. Once it is complete, it will appear on the screen, labeled with the current date and time. Highlight the output to preview your PPA visual(s) on the right of the screen. You may cycle through subnational PPA visuals via the dropdown menu.

You may also download PPAs from this page. The downloaded zip file will contain images for every geography included in the PPA, and an Excel file containing a record of the inputs associated with the visuals.

