5 Assignment and Imputation

5.0.1 Objectives

The objectives of assignment and imputation are to use business rules to assign statistical areas and gears when this information is available from multiple sources, and to fill in missing values for area, gear, mesh category, and effort. The process also provides measures of uncertainty associated with imputed values.

5.1 Overview

All dealer and vessel landings must be assigned to a valid Federal Statistical Area (referred to as “area”) and gear. When multiple sources (e.g. dealer reports and vessel trip reports) provide information about area and gear, business rules are applied to determine the best source of information for the trip. Remaining records with missing areas are then imputed to assign the most likely area. Gear code is imputed next, followed by mesh (if missing) for trawl and gillnet gears. Finally, we impute records with missing effort (days absent, DA, and days fished, DF). DA and DF are calculated at the subtrip level. Metrics of uncertainty are included with all imputations but not with directly assigned values (including areas assigned based on the adjacent port for state trips and lobster-only trips prior to 2023). After assignments are made, state (permit = ‘000000’) and federal trips are imputed independently but with the same methods.

Following the apportionment [6] the imputation proceeds through a series of steps:

  1. Area Assignment and Imputation [5.2]
  2. Gear and Mesh Category Assignment and Imputation [5.3]
  3. Effort Imputation [5.4]

Last modified: 2025-06-06

5.2 Area Assignment and Imputation

5.2.1 Assignment of Area

The joined DLR-VTR landings with federal permits receive their federal statistical area assignment based on permit status and dealer source (Figure 5.1). Records from dealer source 08 are assigned the area from the dealer. These originate in the clam logbooks and are considered the best source of area information; there are no missing areas in these records. The AREA_SOURCE in CAMS tables for these records is assigned as ‘CLAM’. Similarly, records with dealer source 11 receive the CFDERS area assigned in the ACCSP warehouse. These come primarily from the Chesapeake Bay (area = 625) menhaden purse seine fishery. The AREA_SOURCE assigned to these records is ‘WHSE’. When area is missing for these records, it is assigned a value of 625.

The remaining federal permit records are distinguished and processed based on the federally permitted species during each trip. Permit status is classified into lobster class (PERMIT_CLASS) as described in section 4 and table 4.1. If a federal permit is recorded but no species are registered in the vessel permit system (VPS) during the trip (PERMIT_CLASS = ‘NOP’), we assume that the vessel operated under its state permits and fished in state waters unless it submitted a VTR indicating otherwise. If there is a VTR with a valid area (an area in CFG_NEMAREA during data preparation; see section 4), we use that area as the best source of information. The VTR-reported area is used preferentially for groundfish trips and the calculated area is used preferentially for all other trips (section 4). The AREA_SOURCE assigned to these records is ‘AREA’ or ‘CAREA’, respectively. For no-permit (NOP) trips without a VTR area, we assign the area adjacent to the port associated with the record. A trip that is in fact operating as a state trip must be within three miles of shore, so the port-adjacent area is the most likely fishing area. The AREA_SOURCE associated with these records is ‘PORT’ in all CAMS tables. This is a form of assignment imputation without any associated uncertainty; it represents the best information currently available for these trips.

There is currently no VTR-reporting requirement for lobster, and therefore typically neither a VTR-reported area to use nor a representative pool of VTR areas from which to impute. Therefore, trips whose permit allowed only lobster at that time (PERMIT_CLASS = ‘LOO’) are treated identically to the NOP trips described above for area assignment. This may change if regulations are revised to require VTRs on lobster trips; at that point, LOO trips could move into the other imputation process (Figure 5.1).

The remaining federal permits are assigned area from their VTR as described previously. Records missing a VTR area are imputed from the records in this same set that do have a valid area; the area imputation process is described in section 5.2.3. These records are assigned AREA_SOURCE = ‘IMPUTE’. Records from dealer source 05 (state dealers) and dealer source 07 (federal dealers) are imputed from separate pools to avoid inappropriate mixing of data. If area is still missing after imputation, the dealer-reported area in CFDERS is used and the record receives AREA_SOURCE = ‘DLR’. Finally, if area is still missing, the port-adjacent area is used (AREA_SOURCE = ‘PORT’) (Figure 5.1).

Finally, state permits (PERMIT = ‘000000’) are all assigned the port-adjacent area with the assumption that the most likely fishing location is the closest area. As with NOP federal permits, there is no uncertainty associated with this imputation assignment and the AREA_SOURCE is assigned as ‘PORT’.
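The assignment rules above can be condensed into a single decision function. The following is a minimal Python sketch under stated assumptions, not the CAMS code: records are plain dicts with hypothetical keys (`permit`, `dlr_source`, `dealer_area`, `permit_class`, `groundfish`, `vtr_area`, `calc_area`, `port_area`), `valid_areas` stands in for the CFG_NEMAREA check, the order of the checks is assumed, and falling back to the non-preferred VTR or calculated area is one possible reading of "preferentially". The DLR and PORT fallbacks that follow imputation are not shown.

```python
from typing import Optional, Set, Tuple

STATE_PERMIT = "000000"


def pick_vtr_area(rec: dict, valid_areas: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (area, AREA_SOURCE) from the VTR, or (None, None) if no valid area.

    Groundfish trips prefer the VTR-reported area (AREA); all other trips
    prefer the calculated area (CAREA). Falling back to the other field is
    an assumption of this sketch.
    """
    order = [("vtr_area", "AREA"), ("calc_area", "CAREA")]
    if not rec.get("groundfish"):
        order.reverse()
    for field, source in order:
        area = rec.get(field)
        if area in valid_areas:
            return area, source
    return None, None


def assign_area(rec: dict, valid_areas: Set[str]) -> Tuple[Optional[str], str]:
    """Return (area, AREA_SOURCE); (None, 'IMPUTE') means stratified imputation."""
    if rec.get("permit") == STATE_PERMIT:
        # State permits: port-adjacent area, no uncertainty
        return rec.get("port_area"), "PORT"
    if rec.get("dlr_source") == "08":
        # Clam logbooks: dealer area is the best source
        return rec["dealer_area"], "CLAM"
    if rec.get("dlr_source") == "11":
        # ACCSP warehouse (mostly Chesapeake Bay menhaden); default to 625
        return rec.get("dealer_area") or "625", "WHSE"
    area, source = pick_vtr_area(rec, valid_areas)
    if area is not None:
        return area, source
    if rec.get("permit_class") in ("NOP", "LOO"):
        # No VTR area: assume a state-waters trip near the landing port
        return rec.get("port_area"), "PORT"
    # LOP/NOL trips without a VTR area go to stratified imputation
    return None, "IMPUTE"
```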


Figure 5.1: The assignment and imputation of federal statistical area based on data source. Records with missing areas continue down the flow chart until an appropriate area is assigned or imputed.

5.2.2 Area Imputation Summary

  • Impute by stratification level: separate for state and federal dealer data
    1. YEAR, QUARTER, PERMIT, MAIN_SPP_GRP (AREA_IMP_METHOD = ‘B’)
    2. YEAR, QUARTER, MAIN_PORT_GRP, MAIN_SPP_GRP, TONCL2 (AREA_IMP_METHOD = ‘C’)
    3. YEAR, MAIN_PORT_GRP (AREA_IMP_METHOD = ‘D’)
    4. YEAR, STATE (AREA_IMP_METHOD = ‘E’)

5.2.3 Federal Permit Imputation

Federal permits with missing areas that are not assigned an area from another source enter the imputation process (Figure 5.1). For federally permitted trips, we summarize subtrips with VTR areas for imputation. We exclude subtrips with generic statistical areas and areas outside the US EEZ in the set: ('110', '100', '500', '510', '520', '528', '530', '540', '550', '551', '552', '560', '600', '610', '620', '630', '799', '800', '899', '458', '459', '462', '463', '466', '468', '469'). Only areas listed in the CFG_FED_AREA table are considered valid federal statistical areas for imputation. The proportion of subtrips in each area is then used to impute records within each stratification. Imputation is done by weighted random selection of an area based on the cumulative proportion of subtrips in each area within the stratification (multinomial probability selection). The proportion associated with the selected area is reported as AREA_PROP, a measure of uncertainty for the record based on the multinomial proportions observed in the data. Data pools, and therefore proportions, are calculated separately for state dealers (dealer source 05) and federal dealers (dealer source 07) to avoid inappropriate mixing of data sources. All landings sold to state dealers should be from state waters and therefore represent a different distribution of areas from which to impute.
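A minimal Python sketch of the multinomial draw described above, assuming a stratum's pool is simply a list of VTR subtrip areas. The excluded-area set is copied from the text; the CFG_FED_AREA validity check and the split into state (05) and federal (07) dealer pools are left to the caller. Walking the cumulative proportions with one uniform draw is a standard way to do weighted selection; the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Tuple

# Generic statistical areas and areas outside the US EEZ (list from the text)
EXCLUDED_AREAS = {
    "110", "100", "500", "510", "520", "528", "530", "540", "550", "551",
    "552", "560", "600", "610", "620", "630", "799", "800", "899", "458",
    "459", "462", "463", "466", "468", "469",
}


def area_proportions(subtrip_areas: Iterable[str]) -> Dict[str, float]:
    """Proportion of VTR subtrips in each eligible area for one stratum."""
    counts = Counter(a for a in subtrip_areas if a not in EXCLUDED_AREAS)
    total = sum(counts.values())
    return {area: n / total for area, n in counts.items()} if total else {}


def draw_area(props: Dict[str, float], rng: random.Random) -> Tuple[str, float]:
    """Weighted random draw of one area; returns (area, AREA_PROP)."""
    areas = sorted(props)
    u = rng.random()  # uniform on [0, 1)
    cumulative = 0.0
    for area in areas:
        cumulative += props[area]
        if u < cumulative:
            return area, props[area]
    # Guard against floating-point round-off leaving cumulative just below 1
    return areas[-1], props[areas[-1]]
```

Building the proportions separately from the dealer source 05 and 07 pools, and only then calling `draw_area`, keeps the two pools from mixing as the text requires.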

The first level of stratification is permit-specific, relating the behavior of each permit to the area it most likely fished based on its existing VTR reports. The stratification is by year, quarter of the year, permit, and main species group (YEAR, QUARTER, PERMIT, MAIN_SPP_GRP). Main species group is the broad, three-letter species group code (e.g. FIN, PEL, LOB) found in CFG_ITIS with the highest landings on the subtrip. Records with missing areas that match these strata are assigned an area randomly selected in proportion to the number of subtrips in each area within the stratum. In addition to this newly assigned area, the metadata field AREA_IMP_METHOD is recorded as ‘B’ and the AREA_PROP field receives the proportion of subtrips in that stratum that fished in the selected area. Records that still have missing areas continue to the broader stratification levels (i.e. level B -> C -> D -> E).

The remaining stratifications are based on common characteristics of fishing fleets related to the areas they fish rather than being specific to a permit or vessel. Prior to these imputations, subtrips from international areas are removed from the VTR imputation dataset. This is done because only a very limited set of permits are allowed to fish in international waters, and those with missing areas should be imputed at the permit level (level B). Remaining records needing imputation should not be assigned an international area based on fleet characteristics if they do not have international permits. The second level of stratification is by year, quarter, main port group, main species group, and two-digit tonnage class (YEAR, QUARTER, MAIN_PORT_GRP, MAIN_SPP_GRP, TONCL2). The main port group is the port group from CFG_PORT with the highest landings on the subtrip. These records are assigned a value of ‘C’ in the AREA_IMP_METHOD field, and AREA_PROP receives the proportion of subtrips in the stratum that fished in the selected area.

The next area imputation stratification is by year and main port group (YEAR, MAIN_PORT_GRP); these records are assigned a value of ‘D’ for AREA_IMP_METHOD. The final area imputation stratification is slightly broader, using year and state (YEAR, STATE); these records receive a value of ‘E’ for AREA_IMP_METHOD. As with all stratification imputations, AREA_PROP receives the proportion of subtrips that fished in the selected area for that level of stratification. All records with area imputed by stratification have AREA_SOURCE = ‘IMPUTE’.

Only strata with at least 5 subtrips are used for imputation at the permit level (AREA_IMP_METHOD = ‘B’), and only strata with at least 15 subtrips are used for imputation at the fleet-level stratifications (C, D, E). A small number of federal trips enter the CAMS system but do not have at least 15 VTR-reported subtrips in any of these stratifications. These typically have ports and states outside the region and land species without a VTR-reporting requirement. These records are next assigned the dealer-reported area, receive ‘DLR’ as the AREA_SOURCE, and have no value (NULL) for AREA_IMP_METHOD or AREA_PROP; there is no uncertainty associated with this assignment (Figure 5.1). Finally, if area is still missing after dealer assignment, the remaining records are assigned the port-adjacent statistical area. Records with unknown ports, or ports not in CFG_PORT_AREA, are left as unknown with a value of ‘000’. At the end, a check is made to ensure that each NEMAREA falls within the final AREA assigned; if there is a discrepancy, the AREA is set to the final NEMAREA with no further subdivision.
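The fallback through levels B to E, with the subtrip thresholds above and the DLR and PORT fallbacks, might be sketched as follows. This is an illustrative Python sketch, not the CAMS implementation: field names mirror the CAMS columns where the text gives them, while `DLR_AREA`, `PORT_AREA`, and the helper names are hypothetical; international areas are supplied by the caller; and `random.choices` with weights is used as an equivalent of the cumulative-proportion draw. The final NEMAREA consistency check is not shown.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# (AREA_IMP_METHOD, stratum key fields, minimum number of VTR subtrips)
LEVELS: List[Tuple[str, Tuple[str, ...], int]] = [
    ("B", ("YEAR", "QUARTER", "PERMIT", "MAIN_SPP_GRP"), 5),
    ("C", ("YEAR", "QUARTER", "MAIN_PORT_GRP", "MAIN_SPP_GRP", "TONCL2"), 15),
    ("D", ("YEAR", "MAIN_PORT_GRP"), 15),
    ("E", ("YEAR", "STATE"), 15),
]

Pools = Dict[str, Dict[tuple, Counter]]


def build_pools(vtr_subtrips: Sequence[dict], international_areas: Set[str]) -> Pools:
    """Count VTR subtrips per area within each stratum of each level."""
    pools: Pools = {}
    for method, keys, _ in LEVELS:
        level: Dict[tuple, Counter] = {}
        for st in vtr_subtrips:
            # International areas are only eligible at the permit level (B)
            if method != "B" and st["AREA"] in international_areas:
                continue
            key = tuple(st[k] for k in keys)
            level.setdefault(key, Counter())[st["AREA"]] += 1
        pools[method] = level
    return pools


def impute_area(rec: dict, pools: Pools, rng: random.Random) -> dict:
    """Try levels B -> E, then the dealer area, then the port-adjacent area."""
    for method, keys, min_n in LEVELS:
        counts = pools[method].get(tuple(rec[k] for k in keys))
        if counts and sum(counts.values()) >= min_n:
            areas = list(counts)
            weights = [counts[a] for a in areas]
            area = rng.choices(areas, weights=weights)[0]
            return {"AREA": area, "AREA_SOURCE": "IMPUTE",
                    "AREA_IMP_METHOD": method,
                    "AREA_PROP": counts[area] / sum(weights)}
    if rec.get("DLR_AREA"):
        return {"AREA": rec["DLR_AREA"], "AREA_SOURCE": "DLR",
                "AREA_IMP_METHOD": None, "AREA_PROP": None}
    # Unknown ports stay '000'; labelling them 'PORT' is an assumption here
    return {"AREA": rec.get("PORT_AREA") or "000", "AREA_SOURCE": "PORT",
            "AREA_IMP_METHOD": None, "AREA_PROP": None}
```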

All results are then combined and the output is passed to the gear imputation process (section 5.3).

5.2.4 Area Assignment Summary Table

The 2019 data were used as the primary source of evaluation and comparison for the CAMS system. Below is an example of the number of records from each AREA_SOURCE and any associated AREA_IMP_METHOD for 2019 landings.

Table 5.1: Summary of area source and area imputation methods for state and federally permitted vessels in the combined CAMS dealer-VTR landings tables. Query run on 2026-04-05
PERMIT    AREA_SOURCE       NA      B      C      D      E
FEDERAL   AREA          149202
FEDERAL   CAREA         265853
FEDERAL   CLAM            2735
FEDERAL   DLR              185
FEDERAL   IMPUTE                  536    868   7160    653
FEDERAL   PORT          442967
FEDERAL   WHSE              26
STATE     DLR            23418
STATE     PORT         1328526

Last modified: 2025-12-16

5.3 Gear and Mesh Category Assignment and Imputation

5.3.1 Assignment of Gear

In the section below, all references to VTR gears mean the NEGEAR derived from the VTR GEARCODE using the CFG_VLGEAR table, as described in the chapter on cleaning and joining landings (section 4). NEGEAR is the common gear code system used across tables in CAMS.

Following area assignment and imputation, gears are assigned from the appropriate source based on the data source and imputed when missing (Figure 5.2). Carrier trips (indicated by the VTR) do not receive a gear assignment unless gear is recorded on the VTR; these records receive GEAR_SOURCE = ‘CAR’. Records from dealer source 08 are assigned the gear from the dealer. These originate in the clam logbooks and are considered the best source of gear information. There are no missing areas in these records. The GEAR_SOURCE in CAMS tables for these records is assigned as ‘CLAM’. Similarly, records with dealer source 11 receive the CFDERS gear assigned in the ACCSP warehouse. These come primarily from the Chesapeake Bay menhaden purse seine fishery and represent the best source of gear information. The GEAR_SOURCE assigned to these records is ‘WHSE’. When source 11 gear is missing, the records are assigned an NEGEAR of 123 (menhaden purse seine) and a GEAR_SOURCE of ‘VA_MEN’.

The remaining federal permit records are processed based on the trip lobster class (PERMIT_CLASS) as described in section 4 and table 4.1. For no-permit (NOP) and lobster-only (LOO) trips, we use the VTR gear if available (GEAR_SOURCE = ‘VTR’). For records without a VTR gear, we use the dealer-reported gear. The GEAR_SOURCE associated with these records is ‘DLR’ in any CAMS tables. Those NOP and LOO trips that still have missing gears enter the imputation process described below.

Trips with federal permits for non-lobster species (PERMIT_CLASS LOP and NOL) are further divided into two groups (Figure 5.2): dealer orphans (trip and species orphans) with dealer source 05, and the other remaining trips (LOP and NOL trips with dealer source 05 and 07 records that match at the species level, or VTR orphans). The dealer orphans get the dealer-reported gear if available, and then the VTR-reported gear if available. VTR gear is commonly available for species-level dealer orphans (STATUS = ‘DLR_ORPHAN_SPECIES’) but never for trip-level dealer orphans (STATUS = ‘DLR_ORPHAN_TRIP’). Any records still missing gear enter the imputation process (section 5.3.3).

The dealer gear is prioritized for these records to select the most appropriate gear in the rare situation of mixed gears on a trip that combines state and federal permits. This arises when a vessel targets species under its federal permit with one gear and submits a VTR reporting the associated landings and effort, but on the same trip sells other species landed under its state permits using a different gear. In CAMS, the VTR matches the dealer records at the trip level, but the VTR refers only to the species caught with one gear. It would be inappropriate to assume that gear was used for species not on the VTR that do not require a VTR submission at all (dealer source 05). For example, a vessel might use a purse seine, other nets, or even hand gear to catch mackerel in state waters, use some of that mackerel as bait in traps fished in federal waters for lobster or crabs, and then sell the remaining mackerel to a federal dealer. If the VTR lists only lobster or crab pots, assigning the mackerel to that gear would be inaccurate. Therefore, species not on the VTR get the dealer gear if available, as this most commonly matches the species sold, and VTR-reported species are assigned the VTR-reported gear. This results in a separate subtrip for the dealer orphans if the gear is different. For this reason, CAMS processes distinguish between VTR subtrips and overall subtrips, which are defined after the area, gear, and mesh assignment and imputation processes.

The other remaining federal permits are assigned gear from their matched VTR as described previously. Those missing VTR gear enter the gear imputation process described in section 5.3.3 and are assigned GEAR_SOURCE = ‘IMPUTE’.

Finally, state permits (PERMIT = ‘000000’) are assigned the dealer-reported gear, and GEAR_SOURCE is assigned as ‘DLR’. If dealer gear is missing, gear is imputed from the pool of dealer-reported gears. These imputations are kept separate from the federal permit imputations and are therefore assigned a GEAR_SOURCE of ‘IMPUTE_STATE’ for distinction.
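The gear priority rules can be sketched the same way. This Python function is illustrative only: the keys (`permit`, `carrier`, `dlr_source`, `permit_class`, `dlr_orphan`, `dlr_gear`, `vtr_gear`) are hypothetical, `dlr_orphan` stands in for the DLR_ORPHAN_SPECIES and DLR_ORPHAN_TRIP statuses, and the order of the checks is an assumption. A return value of (None, 'IMPUTE') or (None, 'IMPUTE_STATE') marks a record for the imputation step.

```python
from typing import Optional, Tuple


def assign_gear(rec: dict) -> Tuple[Optional[str], str]:
    """Return (NEGEAR, GEAR_SOURCE) following the priority rules in the text."""
    dlr, vtr = rec.get("dlr_gear"), rec.get("vtr_gear")
    if rec.get("permit") == "000000":
        # State permits: dealer gear, otherwise impute from the state pool
        return (dlr, "DLR") if dlr else (None, "IMPUTE_STATE")
    if rec.get("carrier"):
        # Carrier trips: gear only if it is recorded on the VTR
        return vtr, "CAR"
    if rec.get("dlr_source") == "08":
        return dlr, "CLAM"  # clam logbooks
    if rec.get("dlr_source") == "11":
        # ACCSP warehouse; missing gear defaults to menhaden purse seine (123)
        return (dlr, "WHSE") if dlr else ("123", "VA_MEN")
    if rec.get("permit_class") in ("NOP", "LOO"):
        order = [(vtr, "VTR"), (dlr, "DLR")]
    elif rec.get("dlr_orphan") and rec.get("dlr_source") == "05":
        # Dealer orphans: dealer gear first, since it matches the species sold
        order = [(dlr, "DLR"), (vtr, "VTR")]
    else:
        # Other LOP/NOL trips: gear from the matched VTR
        order = [(vtr, "VTR")]
    for gear, source in order:
        if gear:
            return gear, source
    return None, "IMPUTE"
```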


Figure 5.2: A flow chart of the assignment and imputation of gear (NEGEAR). Records with missing gears continue down the flow chart until an appropriate gear is assigned or imputed.

5.3.2 Mesh Categories

Mesh always comes from the VTR (MESH_SOURCE = ‘VTR’) because it is not reported by the dealer. Therefore, mesh is always missing (VTR_MESH, MESH, and MESH_CAT are NULL) for federal dealer trip orphans and state data, since those groups never have a matching VTR record. When mesh is present on the VTR, we classify it into size categories (MESH_CAT). Mesh is categorized as small if the opening is less than 4 inches (0.00 <= SM < 4.00 inches) and large if it is greater than or equal to 4 inches (LM >= 4.00 inches). Additionally, for gillnets (NEGEAR mapped from VTR GEARCODE: ‘100’, ‘105’, ‘115’, ‘117’, ‘116’), mesh greater than or equal to 8.0 inches receives an ‘XL’ classification to support analyses involving regulatory extra-large-mesh gillnet exceptions. Missing mesh is imputed for gillnets and otter trawls (MESH_SOURCE = ‘IMPUTE’); for other gears, mesh is left NULL when not reported on the VTR.
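The size categories amount to a short classification rule. A Python sketch, assuming mesh is recorded in inches and NEGEAR codes are strings; the gillnet codes are the ones listed above and the function name is hypothetical.

```python
from typing import Optional

# Gillnet NEGEAR codes listed in the text
GILLNET_NEGEAR = {"100", "105", "115", "116", "117"}


def mesh_category(mesh_in: Optional[float], negear: str) -> Optional[str]:
    """Classify a VTR mesh size (inches) into MESH_CAT: SM, LM, or XL."""
    if mesh_in is None:
        return None  # not reported on the VTR; may be imputed later
    if mesh_in < 0:
        raise ValueError("mesh size cannot be negative")
    if mesh_in < 4.0:
        return "SM"
    if negear in GILLNET_NEGEAR and mesh_in >= 8.0:
        return "XL"  # extra-large mesh applies to gillnets only
    return "LM"
```

For a gillnet, LM therefore covers 4.0 up to (but not including) 8.0 inches, while other gears have no upper bound on LM.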

5.3.3 Gear-Mesh Imputation

Records missing gear or mesh after the assignment process described above (Figure 5.2) enter the imputation process. Imputation follows the cumulative probability by stratification method described for area (section 5.2), but with different stratifications. Gear and mesh are initially imputed together to minimize mismatches between the proportions of mesh categories and gears. For example, when calculating the proportions for the multinomial distribution, 100-SM, 100-LM, and 100-XL each get their own proportion of subtrips in the stratum, and the gear-mesh combination is selected as a unit. The main species group landed is used in all levels of gear imputation because of the close association between gear and species caught, and to prevent impossible combinations of species and gear not found in the data. The permit-level strata (B, C) require only three VTR subtrips for inclusion, whereas the fleet-level strata (D-F) require a minimum of 15 subtrips. This is because gear can be specific to the vessel associated with a permit; initial trials found higher accuracy in gear imputation when using permit-level information. This limits the chance of incompatible gear being assigned to a trip (e.g. gillnets are unlikely to be assigned to a permit that only fishes trawls and handlines, whereas at the fleet level, based on area and main species, gillnets might be a likely assignment). The stratifications are described in table 5.2.
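In this sketch, each VTR subtrip in a stratum contributes a (NEGEAR, MESH_CAT) pair, so gear and mesh are drawn together as one multinomial outcome. The thresholds (3 subtrips for levels B and C, 15 for D-F) come from the text; the function name and the returned proportion (an analogue of AREA_PROP) are illustrative assumptions, and the gear codes in any example pool are placeholders.

```python
import random
from collections import Counter
from typing import Optional, Sequence, Tuple

GearMesh = Tuple[str, Optional[str]]  # (NEGEAR, MESH_CAT or None)


def draw_gear_mesh(pool: Sequence[GearMesh], level: str,
                   rng: random.Random) -> Optional[Tuple[str, Optional[str], float]]:
    """Draw a gear-mesh pair from one stratum's VTR subtrips.

    Returns (NEGEAR, MESH_CAT, proportion of subtrips with that pair), or
    None if the stratum has too few subtrips for this level.
    """
    min_n = 3 if level in ("B", "C") else 15  # permit-level vs fleet-level
    if len(pool) < min_n:
        return None
    counts = Counter(pool)
    combos = list(counts)
    gear, mesh = rng.choices(combos, weights=[counts[c] for c in combos])[0]
    return gear, mesh, counts[(gear, mesh)] / len(pool)
```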

Table 5.2: Stratification groups for gear-mesh imputation by metadata field GEAR_IMP_METHOD. The MESH_IMP_METHOD field has the same value as the GEAR_IMP_METHOD field for stratifications B-F. Query run on 2026-04-05
GEAR_IMP_METHOD STRATIFICATION
B year, quarter, main_spp_grp, permit, area
C year, permit, main_spp_grp, dlr_source2
D year, quarter, main_port_grp, main_spp_grp, toncl2, area
E year, main_spp_grp, area
F year, main_spp_grp

The GEAR_IMP_METHOD and MESH_IMP_METHOD are the same for stratifications B-F. Gillnets and otter trawls still missing mesh categories after this step are further imputed with stratifications G-I, as described in table 5.3. These are most commonly dealer orphans that received their gear from the dealer and have no matching VTR from which to get mesh information. All federal permit records are then combined and the output is passed to the effort imputation process (section 5.4). State permits are imputed based on the gear reported by the dealer, progressing through gear-mesh stratifications D-F. Without any permit information for state permits (000000), stratifications B and C are skipped as unnecessary given levels D and E.

Table 5.3: Stratification groups for mesh imputation by metadata field MESH_IMP_METHOD.
MESH_IMP_METHOD STRATIFICATION
G year, quarter, negear, main_spp_grp, toncl2, area
H year, negear, area
I year, negear

5.3.4 Gear-Mesh Assignment Summary Tables

The 2019 data were used as the primary source of evaluation and comparison for the CAMS system. Below is an example of the number of records from each GEAR_SOURCE, MESH_SOURCE and any associated imputations for 2019 landings.

Table 5.4: Summary of gear source imputation methods for state and federally permitted vessels in the combined CAMS dealer-VTR landings tables.
GEAR_SOURCE   GEAR_IMP_METHOD   N_RECORDS
CLAM                                 3213
DLR                               1658986
IMPUTE        B                       573
IMPUTE        C                      1440
IMPUTE        D                     11398
IMPUTE        E                     12924
IMPUTE        F                      6158
IMPUTE_STATE  C                     93287
IMPUTE_STATE  D                     10310
IMPUTE_STATE  E                      6816
IMPUTE_STATE                           14
VTR                                417010

Table 5.5: Summary of gear and mesh category source imputation methods for state and federally permitted vessels in the combined CAMS dealer-VTR landings tables.
GEAR_SOURCE   MESH_SOURCE   GEAR_IMP_METHOD   MESH_IMP_METHOD   N_SUBTRIPS
CLAM                                                                  3022
DLR           IMPUTE                          G                        883
DLR           IMPUTE                          H                       2802
DLR           IMPUTE                          I                        926
DLR                                                                 925084
IMPUTE        IMPUTE        B                 B                         38
IMPUTE        IMPUTE        C                 C                        164
IMPUTE        IMPUTE        D                 D                        492
IMPUTE        IMPUTE        E                 E                       1549
IMPUTE        IMPUTE        F                 F                        428
IMPUTE                      B                 B                        164
IMPUTE                      C                 C                        323
IMPUTE                      D                 D                       3361
IMPUTE                      E                 E                       1712
IMPUTE                      F                 F                        539
IMPUTE_STATE                C                                        55382
IMPUTE_STATE                D                                         5179
IMPUTE_STATE                E                                         4637
IMPUTE_STATE                                                            14
VTR           IMPUTE                          G                          6
VTR           IMPUTE                          H                         49
VTR           IMPUTE                          I                         34
VTR           VTR                                                    28654
VTR                                                                  46893

Last modified: 2025-12-16

5.4 Imputation: Effort

5.4.1 Methods

  1. Days Fished and Days Absent

Days Fished (DF) is calculated as the tow duration in hours multiplied by the number of tows and divided by 24 to convert hours to days: \(DF = \sum \frac{Tow_{dur} N_{tows}}{24}\).

Days absent (DA) is calculated as the land date minus the sail date plus one (\(DA = date_{land} - date_{sail} + 1\)).1 We then separate the data by FMCODE and model each group separately.
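A Python sketch of the two effort formulas, assuming tow duration is in hours and that the DF sum runs over the (tow duration, number of tows) entries belonging to one subtrip; both function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple


def days_fished(tows: Iterable[Tuple[float, int]]) -> float:
    """DF = sum(tow duration in hours * number of tows) / 24."""
    return sum(dur_h * n_tows for dur_h, n_tows in tows) / 24.0


def days_absent(sail: date, land: date) -> int:
    """DA = land date - sail date + 1, counting both calendar days."""
    if land < sail:
        raise ValueError("land date precedes sail date")
    return (land - sail).days + 1
```

For example, three 4-hour tows plus two 6-hour tows give 24 hours, or 1.0 day fished, and a trip sailing on 1 May and landing on 3 May has 3 days absent.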

  2. Hierarchical Model for Days Fished and Days Absent

We use a hierarchical log-linear model to estimate the effects of permit, month, area, species group, gear, and port on DA and DF separately for fixed, mobile, and other gear. This results in six separate, independent models (DA and DF for each of the three gear types), each using the same formula. The model uses three random-effect groupings, defined as follows for modeling days fished:

  • \(\gamma\) = permit-quarter-year
  • \(\delta\) = area-month-main species group
  • \(\nu\) = area-negear

Random effect groupings are the same for modeling days absent except \(\nu\) includes port group (\(\nu\) = area-negear-port group) because the transit time between port and fishing area is likely more important for days absent than days fished.

The resulting equation is a linear model with random effects:

\[ \log(y) \sim \mathcal{N}(\mu_{\epsilon}, \sigma_{\epsilon}) \]
\[ \mu_{\epsilon} = \mu + \beta_{\gamma}X_1 + \beta_{\delta}X_2 + \beta_{\nu}X_3 \]
\[ \beta_{\gamma} \sim \mathcal{N}(0, \sigma_{\gamma}) \]
\[ \beta_{\delta} \sim \mathcal{N}(0, \sigma_{\delta}) \]
\[ \beta_{\nu} \sim \mathcal{N}(0, \sigma_{\nu}) \]

where \(y\) is either DA or DF and \(\sigma_{\epsilon}\) is the residual standard deviation, indicating the overall observed error from the expectation. The values of \(\sigma_{\gamma}\), \(\sigma_{\delta}\), and \(\sigma_{\nu}\) are the standard deviations among the levels within each of the three groups estimated by the model. We assume a lognormal distribution because the variables must be positive real numbers and the error is likely to increase with higher effort values. The model is implemented in a Bayesian framework2 using the brms package in R, which is a wrapper for the Stan language.

We treat the groups permit-quarter-year, area-main_spp_grp-month, and area-gear-port_grp as random effects. Therefore, if a record needing imputation belongs to a level of one of these groupings that does not occur in the modeled data, it is assigned an effect drawn from the estimated distribution for that grouping (zero on average).
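To illustrate how an unobserved random-effect level is handled, here is a Python sketch of a single predictive draw from the model above for one set of parameter values. The actual predictions come from the fitted brms model; this sketch only mirrors the logic, and all names and the dictionary layout are hypothetical. Effects of observed levels are looked up, an unseen level gets an effect drawn from \(\mathcal{N}(0, \sigma)\) for its grouping, and the result is exponentiated because the model is on the log scale.

```python
import math
import random
from typing import Dict


def predictive_draw(mu: float,
                    effects: Dict[str, Dict[str, float]],
                    sigmas: Dict[str, float],
                    sigma_eps: float,
                    levels: Dict[str, str],
                    rng: random.Random) -> float:
    """One predictive draw of y (DA or DF) for one record and one set of
    parameter values.

    effects[g][level] is the estimated effect for an observed level of
    grouping g; sigmas[g] is that grouping's standard deviation.
    """
    eta = mu
    for grouping, level in levels.items():
        fitted = effects[grouping]
        if level in fitted:
            eta += fitted[level]
        else:
            # Level not seen when fitting: draw its effect from N(0, sigma_g)
            eta += rng.gauss(0.0, sigmas[grouping])
    # log(y) ~ N(eta, sigma_eps), so exponentiate a draw on the log scale
    return math.exp(rng.gauss(eta, sigma_eps))
```

Repeating this over every posterior draw of \(\mu\), the effects, and the standard deviations yields the predictive distribution summarized in the next step.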

A benefit of the Bayesian approach is that posterior probabilities are calculated, so uncertainty can be understood in a probabilistic manner regardless of the size of the stratification or grouping.

We examined modeling the correlation between DA and DF in a joint model of the two (multivariate lognormal) but found that it greatly slowed model fitting without improving predictive accuracy or precision.

  3. Imputation of Missing Records

Following the modeling, any missing values for DA or DF were imputed using the resulting equations. Imputed values were calculated from each iteration of the Monte Carlo estimation and therefore capture a full posterior distribution for each imputed value. From these predictions, the median is used as the estimate of DA or DF. We also report the 25th and 75th percentiles (the bounds of the interquartile range) for each variable as DF_Q25, DF_Q75, DA_Q25, and DA_Q75. These values provide measures of uncertainty for analysts.3
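Given the posterior draws for one imputed record, the reported summaries reduce to a median and two quartiles. A Python sketch using only the standard library; `summarize_draws` is a hypothetical helper, and the `method="inclusive"` interpolation choice is an assumption (other quantile conventions give slightly different values for small samples).

```python
import statistics
from typing import Dict, Sequence


def summarize_draws(draws: Sequence[float], name: str) -> Dict[str, float]:
    """Median and quartiles of posterior draws for one imputed value.

    `name` is 'DA' or 'DF'; keys follow the DA_Q25 / DA_Q75 pattern.
    """
    q25, _, q75 = statistics.quantiles(draws, n=4, method="inclusive")
    return {name: statistics.median(draws),
            f"{name}_Q25": q25,
            f"{name}_Q75": q75}
```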

Last modified: 2025-06-06


  1. Refer to the matching documentation for information about the assignment of official sail and land dates for trips associated with each CAMSID.

  2. We are currently implementing the mean-field algorithm for automatic differentiation variational inference (ADVI) to approximate the posterior distribution. This is much faster than full Bayesian Monte Carlo sampling. Full sampling could be implemented for end-of-year data freezes.

  3. Any probability values from the posterior can be reported as desired.