To the Editor: PubMed “comprises over 20 million citations for biomedical literature from MEDLINE, life science journals, and online books” and can be queried using the Entrez search engine.1
Since January 1996, e-mail addresses for first authors, when available, have been added to the MEDLINE record as they appear in the journals.2
Search and retrieval of PubMed results may occur through the PubMed Web page and also with use of several Entrez Programming Utilities that “provide access to Entrez data outside of the regular web query interface.”3
With regard to these latter tools, corresponding documentation and an educational course with examples are available to the public online.4
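To make "outside of the regular web query interface" concrete, the following is a minimal sketch of how a request to one of these utilities (ESearch) can be composed as a plain URL. The base address and the `db`, `term`, and `retmax` parameters follow NCBI's public documentation. The helper name `build_esearch_url` and the example search term are assumptions introduced here for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base address of the Entrez Programming Utilities as publicly documented by
# NCBI; treat it as an assumption that may change over time.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Return an ESearch URL asking PubMed for IDs of citations matching `term`.

    `build_esearch_url` is a hypothetical helper, not part of any NCBI library.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Compose (but do not send) a query; the search term is illustrative only.
url = build_esearch_url("spam AND e-mail", retmax=5)
print(url)
```

The point of the sketch is only that a query is an ordinary URL, so any program able to issue HTTP requests can search PubMed without a browser.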
Electronic spam can be defined as unsolicited e-mail sent to a large number of addresses. Individuals sending spam harvest e-mail addresses from the Internet using a variety of techniques, including automated use of software to search Web pages for strings of text recognized as e-mail addresses, as well as manual efforts to gain access to large collections of addresses (eg, by subscribing to mailing lists to collect the addresses of other users). Techniques for avoiding spam are many and include avoiding the online publication of e-mail addresses in text form (as opposed to providing an image of the address) and preventing those with malicious intent from accessing large sources of addresses.
PubMed is extremely vulnerable to e-mail address harvesting. When available, e-mail addresses for first authors are included within citations in text form, making them easily retrieved by software in an automated fashion. More concerning, however, is the ability to quickly generate listings containing thousands of e-mail addresses using the Entrez Programming Utilities. With only basic computer programming knowledge, I was able to generate a listing of more than 7000 addresses within 30 minutes of discovering these utilities. More responsible handling of e-mail addresses in PubMed is therefore clearly needed. This may be accomplished by eliminating publication of e-mail addresses in text form and by restricting the return of e-mail addresses when results are fetched outside of the regular Web query interface.
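The second proposed safeguard, withholding addresses from results fetched programmatically, can be sketched in a few lines. The example below is illustrative only: the pattern is a deliberately simple approximation of an e-mail address, and the function name `redact_emails`, the placeholder text, and the sample affiliation string are assumptions introduced here, not part of PubMed or the Entrez utilities. It also shows why text-form addresses are so exposed: the same short pattern that lets a service mask them lets any program recognize them.

```python
import re

# Deliberately simple pattern for e-mail-like strings; an assumption for
# illustration, not a complete address grammar.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text, placeholder="[e-mail withheld]"):
    """Replace every e-mail-like substring of `text` with `placeholder`."""
    return EMAIL_PATTERN.sub(placeholder, text)


# Hypothetical affiliation field of a citation record.
affiliation = "Department of Medicine, Example University. jane.doe@example.org"
print(redact_emails(affiliation))
# prints: Department of Medicine, Example University. [e-mail withheld]
```

A service applying such a filter to programmatic responses, while showing addresses only as images on the Web page, would address both vulnerabilities described above.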
REFERENCES
- PubMed Help: FAQs. (Accessed March 3, 2011.)
- The NLM Technical Bulletin. No 287. November-December 1995. ftp://nlmpubs.nlm.nih.gov/nlminfo/newsletters/techbull/pdf_tb/novdec95.pdf (Accessed February 8, 2011.)
- Entrez programming utilities. NCBI Website. (Updated February 16, 2009. Accessed February 8, 2011.)
- Building customized data pipelines using the Entrez Programming Utilities (eUtils). (Accessed February 8, 2011.)
- Merriam-Webster Online Web site. (Accessed February 8, 2011.)
Copyright
© 2011 Mayo Foundation for Medical Education and Research. Published by Elsevier Inc. All rights reserved.