Publication:
You’ve Got Email: A Workflow Management Extraction System

Publication Date
2017
Publisher
Facultad de Ciencias Económicas y Empresariales. Instituto Complutense de Análisis Económico (ICAE)
Abstract
Email is one of the most powerful tools for communication. Many businesses use email as their main communication channel, so email content may contain substantial business data. To help businesses grow faster, a workflow management system may be required, and the data gathered from email content could be a robust source for such a system. This research proposes an email extraction system that extracts data from incoming emails into suitable database fields. The database created by the program is designed for the implementation of a workflow management system. The research is presented in three phases: (1) defining suitable criteria for extracting data; (2) implementing a program that extracts the data and stores it in a database; and (3) implementing a program that validates the data in the database. Four criteria are applied in the email extraction system. The first criterion selects contact information at the end of the email content; the second selects specified keywords, such as "tel", "email", and "mobile"; the third selects unique names that start with a capital letter, such as the names of people, places, and corporations; and the fourth selects special texts, such as "Co.", "Ltd.", "com", and "www". The empirical results suggest that when all four criteria are applied, the program's accuracy and the percentage of blank fields are at an acceptable level compared with the results obtained using other criteria. When the four criteria are applied to 7,340 English-language emails, the accuracy is approximately 68.66%, while the percentage of blank fields in the database is approximately 68.05%. The database created by the experiment can be applied in a workflow management system.
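The four extraction criteria can be illustrated with a minimal Python sketch. It is not the authors' program: the field names (`name`, `company`, `tel`, `mobile`, `email`, `website`), the eight-line signature window used for the first criterion, the regular expressions, and the definition of the blank-field percentage (empty cells divided by all cells) are all assumptions made for illustration.

```python
import re

# Hypothetical target fields; the paper's actual database schema is not given.
FIELDS = ("name", "company", "tel", "mobile", "email", "website")

# Criterion 2: lines labelled with keywords such as "Tel:", "Mobile:", "Email:".
KEYWORD_RE = re.compile(r"^(tel(?:ephone)?|mobile|e-?mail)\b\s*[:.]?\s*(.+)$", re.IGNORECASE)
KEYWORD_FIELD = {"tel": "tel", "telephone": "tel", "mobile": "mobile", "email": "email"}

# Criterion 3: two or more consecutive capitalised words (people, places, companies).
PROPER_NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

# Criterion 4: special texts marking a company ("Co.", "Ltd.") or a website ("www", "com").
COMPANY_RE = re.compile(r"\b(?:Co|Ltd)\.")
WEBSITE_RE = re.compile(r"\bwww\.[\w-]+(?:\.[\w-]+)+|\b[\w-]+\.com\b", re.IGNORECASE)


def extract_fields(body: str, tail_lines: int = 8) -> dict[str, str]:
    """Fill one database row from an email body using the four criteria."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    # Criterion 1: contact details usually sit in the signature at the end
    # (assumed here to be the last `tail_lines` non-empty lines).
    signature = lines[-tail_lines:]
    record = {field: "" for field in FIELDS}
    for line in signature:
        match = KEYWORD_RE.match(line)
        if match:
            field = KEYWORD_FIELD[match.group(1).lower().replace("-", "")]
            record[field] = record[field] or match.group(2).strip()
            continue
        site = WEBSITE_RE.search(line)
        if site and not record["website"]:
            record["website"] = site.group(0)
        elif COMPANY_RE.search(line):
            record["company"] = record["company"] or line
        elif not record["name"]:
            name = PROPER_NAME_RE.search(line)
            if name:
                record["name"] = name.group(0)
    return record


def blank_field_percentage(records: list[dict[str, str]]) -> float:
    """Share of empty cells across all rows (an assumed definition)."""
    total = len(records) * len(FIELDS)
    blanks = sum(1 for record in records for field in FIELDS if not record[field])
    return 100.0 * blanks / total if total else 0.0


SAMPLE = """Hi team,

Please find the quotation attached.

Best regards,
John Smith
Sales Manager
Acme Trading Co. Ltd.
Tel: 02-123-4567
Mobile: 081-234-5678
Email: john@acme.com
www.acme.com
"""

if __name__ == "__main__":
    row = extract_fields(SAMPLE)
    print(row)
    # One fully filled row plus one empty row gives 50.0
    print(blank_field_percentage([row, extract_fields("Thanks")]))
```

Checking keywords before the capital-letter rule keeps labelled values such as "Tel:" out of the name field, and testing for company markers before names stops a line like "Acme Trading Co. Ltd." from being taken as a person's name. A real system would need far more robust rules, which is consistent with the moderate accuracy reported above.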