Dataset History

 

Mosquito Information Management Project (MIMP)

(taken from Foley et al. 2007. Ecological Entomology. 33:12-23)

The database reported here began with the Mosquito Information Management Project (MIMP), initiated in 1979 to develop a computer-based system for storing and retrieving data on mosquitoes (Faran et al., 1984). MIMP was tasked with digitizing the records of paper collection forms for vouchered specimens housed at the National Museum of Natural History, Smithsonian Institution. Notable among these sources was the Mosquitoes of Middle America (MOMA) collection from the University of California, Los Angeles. The MOMA records are a testament to the foresight and taxonomic skills of John Belkin, who saw the importance to future mosquito researchers of accessible collection details in a standardized format (Belkin & Heinemann, 1973; Zavortink, 1990). Belkin & Heinemann (1973, 1975a,b, 1976a,b,c), Heinemann & Belkin (1977a,b,c, 1978a,b,c, 1979), Heinemann et al. (1980) and Heinemann (1980) contain records for the MOMA collections, some of which date back to 1899. In the early 1980s, information pertaining to about 402,000 specimens on over 15,500 paper collection forms, including details on the identification, collection location and ecological characteristics of the collection site, was entered into a computer (Faran et al., 1984) using the SELGEM software (Creighton & Crockett, 1971). Although many paper records remained to be digitized, MIMP was terminated in 1983, after which these digital records languished. In 1999 the Smithsonian Institution decided to convert all its electronic specimen data, including the mosquito database, into a single database. On September 10, 2001, the only digital copies of these records arrived at a company a couple of blocks from the World Trade Center in New York, where they were to be converted from magnetic tape to a modern digital format. Despite their proximity to “ground zero”, these records survived the terrorist attacks the following day and were returned to the Smithsonian to be converted to a modern object-relational hybrid data structure.

Faran et al. (1984) describe the composition of the 78 categories and subcategories of information within the original MIMP database, which included locality description, collection code, collection date, latitude and longitude, species identification, a unique record identification number, and collector. The database was divided into those records that had geographic coordinate data (i.e. degrees longitude and latitude) and those that did not. Entries were further divided into those of questionable taxonomy (i.e. species entries followed by “group”, “complex” or “aff.”, meaning affinity), apparent identification failures (i.e. entries where species identification was not given or was followed by “uncertain” or “?”), and those with unequivocal species identification. Those without geographic coordinates were divided into those that had Military Grid Reference System (MGRS) coordinates and those that did not. MGRS coordinates were converted to geographic coordinates for WGS-84 using the batch options in GEOTRANS V2.2.6 (US Army Topographic Engineering Center, Geospatial Information Division). The appropriate horizontal datum and ellipsoid were determined by inspection of maps housed at the Walter Reed Biosystematics Unit (WRBU) that were originally used to arrive at MGRS coordinates. Where MGRS data entry errors were suspected, such as transposed letters and digits or incomplete coordinates, the Universal Transverse Mercator (UTM) zone number and designator were first confirmed by cross-checking against a world map of UTM grids. Error detection at this stage was helped by use of the electronic gazetteer eGAZ and BIOLINK Map Assistant V2.1.309 (Shattuck, 1997), which located collection site names on a map. Many MGRS readings were to 1 km precision, but for a number of Caribbean islands precision could be improved to 100 m by re-georeferencing collection locations where these points were obvious on original maps. 
When geographic coordinates were already present in the database in degrees-minutes format, these were converted to decimal degrees and checked to ensure they had the correct sign (+ or -) for their hemisphere of origin.
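
The degrees-minutes conversion and hemisphere sign check described above can be illustrated with a minimal Python sketch. The function name and the use of a hemisphere letter (N, S, E, W) are assumptions made for illustration; the original database fields and the tool actually used for the conversion are not described here.

```python
def dm_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes plus a hemisphere letter to signed decimal degrees.

    Hypothetical helper: southern and western hemispheres are returned as
    negative values, northern and eastern as positive values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in the range [0, 60)")
    limit = 90 if hemisphere in ("N", "S") else 180
    value = abs(degrees) + minutes / 60.0
    if value > limit:
        raise ValueError("coordinate is out of range for its axis")
    # Apply the sign that matches the hemisphere of origin.
    return -value if hemisphere in ("S", "W") else value


latitude = dm_to_decimal(9, 30, "N")     # 9.5
longitude = dm_to_decimal(84, 15, "W")   # -84.25
```

Deriving the sign from an explicit hemisphere letter, rather than trusting a sign already stored with the number, makes a wrong-hemisphere entry easier to spot.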

Specimens with unequivocal identifications and geocodes were filtered in Microsoft Excel for unique locations, and these point data were converted to shapefiles for mapping in DIVA-GIS 5.3 (http://www.diva-gis.org/). Further data cleaning was undertaken with the ‘check coordinates’ option of DIVA-GIS, a “point-in-polygon” method (Chapman, 2005a), which identifies points located outside all polygons (i.e. points that fell in the ocean) and points that did not match relations for the country names (i.e. points that fell in another country). Locations so identified (n=273) were rechecked and corrected by consulting original collection cards and maps housed at the WRBU or through the Alexandria Digital Library online Gazetteer (http://middleware.alexandria.ucsb.edu/client/gaz/adl/index.jsp). Data were imported into ARCVIEW GIS 3.3 for graphical display. Generic, subgeneric and species names were updated to follow the Systematic Catalog of Culicidae (SCC) on the WRBU website (http://www.mosquitocatalog.org/main.asp, accessed 23 May 2006). Mosquito species composition by country was obtained from the WRBU website (accessed 16 June 2006). 
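
As an illustration of the “point-in-polygon” idea, the following Python sketch classifies a record as falling outside all polygons, inside the wrong country, or inside the stated country. It uses a generic ray-casting test on simple polygons; it is not DIVA-GIS code, and the function names, record fields and toy square “countries” are all hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast eastward from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def check_coordinates(record, countries):
    """Return 'ok', 'outside_all' or 'wrong_country' for one record.

    record: dict with hypothetical keys 'lon', 'lat' and 'country'.
    countries: dict mapping country name to a polygon (list of vertices).
    """
    hits = [name for name, polygon in countries.items()
            if point_in_polygon(record["lon"], record["lat"], polygon)]
    if not hits:
        return "outside_all"      # e.g. the point fell in the ocean
    if record["country"] not in hits:
        return "wrong_country"    # the point fell in another country
    return "ok"


# Toy data: two adjacent square "countries".
countries = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
status = check_coordinates({"lon": 15, "lat": 5, "country": "A"}, countries)
```

Real country boundaries are multi-part polygons with holes and coastal detail, so a production check would normally rely on a GIS package rather than this simplified test.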

National Institute for Communicable Diseases (NICD)—Anopheles gambiae database for Africa

The database covers more than 36 years of work by many entomologists in Africa. It was initiated by the late Prof George Davidson and Dr Maureen Coetzee. According to Coetzee et al. (2000) “The collection of distributional data was initiated by the late George Davidson of the London School of Hygiene and Tropical Medicine, UK, who spent much of his retirement collecting and collating data for the distribution maps. Distributional records are based solely on identified samples, and come from scientific publications, reports of the Ross Institute of the London School of Hygiene and Tropical Medicine and unpublished records from recognized research institutions that carry out species identifications. The computer programs used were dBASE III Plus and Quattro Pro for the databases and MapInfo for the production of the maps. The database used to compile the maps includes the following categories: country, place name, map coordinates, species, method of collection, date of collection, method of identification, number of specimens identified and the references from which the data were obtained.

The reader is cautioned that some distributional points might be inaccurate because of incorrect identifications or incorrect locality data in the original publications. Indeed, original map coordinates placed some records in the wrong country, or in the ocean. In addition, at least one record of An. gambiae in southern Africa is known to be the result of laboratory contamination.”

“Visit the website of the Department of Medical Entomology, South African Institute for Medical Research/University of the Witwatersrand for more information or access to the database and bibliography: http://www.wits.ac.za/fac/med/entomology/medento.htm”

Dr Coetzee has published three papers on the distribution of the gambiae complex (see below); the 2000 paper is a synopsis of the database that she submitted to MosquitoMap in Sept 2006. Dr Coetzee provided the background to this database and suggested calling it NICD.

The NICD database was checked for errors such as sign of hemisphere, and decimal degrees were recalculated from the original coordinates. Questionable entries were set aside, and inconsistencies in the spelling of location names and species identifications were corrected. Further data cleaning was undertaken with the ‘check coordinates’ option of DIVA-GIS, a “point-in-polygon” method (Chapman, 2005a), which identifies points located outside all polygons (i.e. points that fell in the ocean) and points that did not match relations for the country names (i.e. points that fell in another country). Thirty-two points fell outside all polygons and 72 fell outside their country polygon. Some of these points were coastal or in border areas; considering the 1 minute precision of the coordinates, it is not surprising that some errors occur. These points were corrected by use of the Alexandria Digital Library online Gazetteer (http://middleware.alexandria.ucsb.edu/client/gaz/adl/index.jsp) and after consultation with Dr Maureen Coetzee. A total of 2536 records are included in the database hosted by MosquitoMap.
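
To give a sense of what 1 minute precision means on the ground, the following Python sketch estimates the size of one arc-minute at a given latitude. It assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification and not a value taken from the NICD database; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean spherical radius in meters


def arcminute_extent_m(latitude_deg):
    """Return (north-south, east-west) ground size in meters of one arc-minute."""
    one_minute_rad = math.radians(1.0 / 60.0)
    north_south = EARTH_RADIUS_M * one_minute_rad
    # Meridians converge toward the poles, so the east-west size shrinks
    # with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_m, ew_m = arcminute_extent_m(0.0)
```

Under this assumption one arc-minute spans somewhat less than 2 km at the equator, so a point recorded to the nearest minute near a coast or border can plausibly fall on the wrong side of a polygon boundary.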

By Desmond Foley, 19 Sept 2006

POSTSCRIPT: It has come to my attention (late 2007) that this database is also hosted on the MARA/ARMA Project site. The following text is taken from that website: “The database originated at the London School of Hygiene & Tropical Medicine through the efforts of Professor George Davidson and Mr Ron Page. Many entomologists who work or have worked in Africa contributed to the database, either by sending material to the Ross Institute in the 1960's and 70's for cross mating identification or by donation of reprints. We thank them for their assistance and hope that they find the database and species distribution useful. The World Health Organization (WHO/AFRO), South African Institute for Medical Research and South African Medical Research Council are thanked for financial support. Coetzee, M., M.H. Craig and D. le Sueur. 2000. Mapping the distribution of members of the Anopheles gambiae complex in Africa and adjacent islands. Parasitology Today 16: 74-77.” The database from the MARA site contains coordinates in decimal degrees, and its “Number of specimens” field frequently contains “a”, with the explanation “It is not known what “a” stands for in George Davidson's original database.” The records with “a” in the MARA database appear as zero in the database provided by Dr M. Coetzee. 

Instituto Nacional de Biodiversidad (INBio)—Mosquitoes of Costa Rica

The Instituto Nacional de Biodiversidad (INBio) dataset was submitted by Luis G. Chaverri during May-July 2006. According to Mr Chaverri “Originally I took the information from the maps with the coordinate system LCRN, scale 1:50000, ellipsoid Clarke 1866 and datum Ocotepeque, in order to provide the information with Long_decimal degrees and Lat_decimal degrees I did the convertion using our Atta system and the information that I will send for all the localities follow this format: coordinate system Lat/Long, scale 1:50000, ellipsoid WGS 84, datum WGS 84. Unfortunately I don´t have the information in Excel, I hope that the format in Word will be fine. In the locality information first is the country, followed by provincia, canton and distrito that correspond to state, city and county. In some cases there are some names in capital like ACOSA or ACLA-C that refers to conservation areas according with our national parks system. The information followed by the word "En" refers to the larvae habitat. We have almost 12000 specimens and more than 95% were reared from larvae. The codes LGCh and BHB refers to the collector and correspond to Luis Guillermo Chaverri and Braulio Hernández Bogantes. For each locality I included only the collecting codes (ex. LGCh 076) but not the rearing codes (ex LGCh 076.01, LGCh 076.02 .....) because we have a lot of specimens reared from the same event and I think that you only need the locality information and the specie.”

The Word documents for each species were transcribed into a single Excel file. The INBio database was checked for errors and questionable entries, and inconsistencies in spelling were corrected. Further data cleaning was undertaken with the ‘check coordinates’ option of DIVA-GIS, a “point-in-polygon” method (Chapman, 2005a), which identifies points located outside all polygons (i.e. points that fell in the ocean) and points that did not match relations for the country names (i.e. points that fell in another country). Very few points were problematic and, as Mr Chaverri explained, “The problem, as you mention, is that the points are very close to the coast, I did some collects just some meters from the tide. Also the point close to Nicaragua is right, we found some material just a few kilometers from the frontier.” The final database has 786 records for 97 species.

By Desmond Foley, April 2008  

Chiapas, Mexico - UTMB

Included here are collection data for ~35,000 mosquitoes from southern Mexico. The primary collector was Eleanor R. Deardorff. Help with traps and identification was provided by Williamns Ornelas and Aida-Luz Salgado (both of Chiapas, Mexico). Occasional assistance was provided by Jose-Guillermo Estrada Franco and Nicole Arrigo (both of UTMB, Galveston, Texas). The P.I. of the project is Scott C. Weaver of UTMB, Galveston, Texas. Identification was based on adult female morphology. All mosquitoes were collected in coastal Chiapas, Mexico, during 2006 and 2007. There were 8 field sites, each sampled 4 times for 2 days each time over the course of the year. Only adult females were counted; no males were counted and no larvae or pupae were collected. Collection methods included CO2-baited CDC traps, Trinidad traps with hamsters as bait, Malaise traps using calves as bait, and direct aspiration from horses. GPS coordinates for each site were obtained in February 2006, prior to collection. Nepuyo and Patois viruses were each isolated from two different mosquito pools, but no other viruses were found by CPE assay on Vero cells.

By Eleanor Deardorff, Aug 2010

 

CHPPM-W

Mosquito collection records from United States Department of Defense bases and related facilities in the CHPPM-West region (now US Armed forces Health Promotion – Western region) of the United States have been collected every year since 1947. Unfortunately, these records were not georeferenced, and georeferencing them presented numerous challenges due to base closures and the lack of detailed maps of the sites mentioned in the collection data. In early 2010, 294 files, comprising individual base files and yearly files from 1947-51 through to 2009, were provided to the author upon request by Mr. Francis A. Maloney of CHPPM-West.

In the absence of maps detailing individual sites, I opted to summarize their location by defining the centroid of the base and estimating uncertainty using published information about the area of the base. I was assisted in this early stage by Victoria Adeboye. Particularly useful sources of information were internet lists of bases (e.g. http://en.wikipedia.org/wiki/List_of_United_States_military_bases) that linked to information including coordinates. For estimating spatial uncertainty, we used data on area that were publicly available on the internet, and The Base Structure Report (BSR) FY 2007 Baseline, by the Office of the Deputy Undersecretary of Defense (Installations and Environment), Washington DC (http://www.acq.osd.mil/ie/download/bsr/BSR_FY2007_Baseline.pdf). When area was given in acres, uncertainty in meters was calculated as the radius of a circle of equal area, using the spreadsheet formula =SQRT((D2*4046.85642)/3.1416), where D2 is the cell holding the area in acres. Batch georeferences for street addresses were obtained using GPS Visualizer (http://www.gpsvisualizer.com/geocoder/), with Google set as the source. Input data were first edited to remove non-essential information and arranged in a standard order to minimize geocoding errors. Output addresses were checked against input to identify discrepancies, and results that had a low precision level (e.g. to Street or City) were flagged for further checking. Discrepancies were usually resolved through a combination of Internet searches for key terms and orientation with Google Earth. The historical imagery, altitude, distance along a path, street view, and link to Google Maps in Google Earth were particularly useful for resolving problematic collection sites. When georeferences could not be resolved to street level, the town of the collection was georeferenced using Biogeomancer 1.2.4 (http://bg.berkeley.edu/latest/). A small minority of records had georeference information, in either geodetic or MGRS coordinates. The MGRS location information was incomplete, so the approximate location as determined in Google Earth was used to obtain a first approximation in Geotrans, and the northing and easting information was then input to obtain the precise decimal degrees georeference. In most cases these coordinates were checked in Google Earth to see if the location corresponded with any text information that was recorded for the collection site. Uncertainty was estimated in the MaNIS Georeferencing Calculator, from Biogeomancer, by visual assessment of the extent in Google Earth, or as the radius of a circle described by the area.
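
The acres-to-uncertainty calculation given above as a spreadsheet formula can be restated as a short Python sketch. The function name is hypothetical; the conversion factor of 4046.85642 square meters per acre is the one in the formula, while math.pi is used in place of the rounded 3.1416, which changes the result only negligibly.

```python
import math

SQUARE_METERS_PER_ACRE = 4046.85642  # factor used in the spreadsheet formula


def uncertainty_radius_m(area_acres):
    """Radius in meters of a circle whose area equals the given area in acres."""
    if area_acres < 0:
        raise ValueError("area must be non-negative")
    area_m2 = area_acres * SQUARE_METERS_PER_ACRE
    # Area of a circle is pi * r**2, so r = sqrt(area / pi).
    return math.sqrt(area_m2 / math.pi)


radius = uncertainty_radius_m(1000)  # about 1,135 m for a 1,000 acre base
```

Treating a base as a circle centered on its centroid is a simplification: an elongated or irregular base would have points farther from the centroid than this radius suggests.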

I combined the yearly files into one Excel sheet, with a sequence number added so that the original order could be recreated. A new column was added with species_Genus_subgenus_author taken from the MosquitoMap collection form. The mosquito species recorded were checked for current taxonomic status (care of Jim Pecor). The geolocations of records were checked in DIVA-GIS for agreement with the Country and State of occurrence. The location of each species was mapped and checked against known records for these species (care of Jim Pecor). Records (n=18269) with trap catches labeled “Not operated”, “Negative”, “Misc. Culicidae”, “Aedini”, and one record labeled “Ochlerotatus atlanticus/tormentor” were placed in a separate Excel sheet. Those records (n=267) that could not be georeferenced, and anomalous records (n=15), the distribution of which did not agree with established knowledge, were also separated. This left n=100610 georeferenced and quality controlled records. Data fields were rendered into the MosquitoMap format, and records were uploaded into MosquitoMap in Nov 2010.
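
The separation of records described above was done in Excel; purely as an illustration, the same partitioning logic might look like the following Python sketch. The field names ('species', 'lat', 'lon', 'anomalous') are hypothetical, and the 'anomalous' flag stands in for the expert comparison against known distributions, which cannot be automated this simply.

```python
EXCLUDED_LABELS = {
    "Not operated",
    "Negative",
    "Misc. Culicidae",
    "Aedini",
    "Ochlerotatus atlanticus/tormentor",
}


def partition_records(records):
    """Split records into excluded-label, ungeoreferenced, anomalous and kept sets."""
    buckets = {"excluded_label": [], "not_georeferenced": [], "anomalous": [], "kept": []}
    for record in records:
        if record["species"] in EXCLUDED_LABELS:
            buckets["excluded_label"].append(record)
        elif record.get("lat") is None or record.get("lon") is None:
            buckets["not_georeferenced"].append(record)
        elif record.get("anomalous", False):
            # Flag set by a reviewer when the location disagrees with
            # established knowledge of the species' distribution.
            buckets["anomalous"].append(record)
        else:
            buckets["kept"].append(record)
    return buckets


# Made-up example records for illustration only.
sample = [
    {"seq": 1, "species": "Negative", "lat": 34.0, "lon": -117.0},
    {"seq": 2, "species": "Culex tarsalis", "lat": None, "lon": None},
    {"seq": 3, "species": "Culex tarsalis", "lat": 34.0, "lon": -117.0},
    {"seq": 4, "species": "Culex tarsalis", "lat": 34.0, "lon": -117.0, "anomalous": True},
]
result = partition_records(sample)
```

Keeping the separated records in their own buckets, rather than deleting them, mirrors the approach of placing them in separate sheets so that they remain available for later review.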

By Desmond Foley, Nov 2010

Copyright 2014 | Smithsonian Institution