UserNLP: User-centered Natural Language Processing Workshop

The UserNLP workshop will be hosted by The Web Conference 2022. The EasyChair submission link is https://easychair.org/cfp/UserNLP_2022.

Important Dates

  • February 3rd, 2022 - Submissions Due
  • March 3rd, 2022 - Notification of Acceptance
  • April 25th or 26th, 2022 - Workshop Date (online)

Motivation and Goals

Natural Language Processing (NLP) models are vital for analyzing, retrieving, and summarizing the vast amounts of digital information produced every day. However, models trained as "one-size-fits-all" do not explicitly consider diversity in language use and interpretation among individuals or groups. Such user-level variation (delineated by, e.g., demographics, culture, or user interests) can cause stylistic and even semantic disparities, which decrease dialogue coherence, harm fairness, and reduce model robustness. User-centered NLP can fill these gaps by explicitly taking these variations into account and focusing on user-level modeling tasks. Incorporating user-level elements into NLP models, for example by inferring user preferences, beliefs, and behavior, has an increasing impact on a broad range of downstream applications, including text understanding and generation, conversational information retrieval, and mental health care.

The UserNLP Workshop will provide a unique opportunity for researchers to survey and consolidate current developments, challenges, and opportunities across a multitude of related fields, including:
  • privacy and fairness concerns in user data collection and annotation
  • opportunities and stereotyping risks in modeling
  • challenges of personalized evaluation
We aim to create a platform where researchers can present emerging challenges in building user-centered NLP models, going beyond modeling individual documents, and explore various ways to capture stylistic variations and enhance model personalization.

Since annotating user-level data requires making non-trivial judgments based on a collection of documents, the complexity of user information can prevent researchers from developing precise annotations for user-level corpora. While data statement discussions and schemas exist for document-level annotations, few studies have explored data schemas and standards at the user level.

Moreover, conventional evaluation metrics and schemes for document-level models may not adequately capture the diversity and complexity of user-level information. The lack of ethically appropriate, standardized, and easily accessible evaluation data and metrics is perhaps the major hindrance to the development of this field, impeding the reproducibility of experimental results.

Finally, user-centered NLP raises a wide range of important ethical questions, such as algorithmic fairness and user privacy. An informed discussion on these timely topics requires gathering at one table researchers who encounter stylistic disparities and user-level tasks directly or indirectly in their work.

While an increasing number of NLP studies develop user-level models and datasets, there is no de facto venue that brings this topic to interdisciplinary researchers and communities. Some NLP workshops, such as those focusing on social media or dialog systems, have recently raised awareness of user-related issues. Our proposed workshop will provide a platform beyond those venues by focusing solely on the emerging challenges of user-level models and applications, such as privacy and fairness, human-level modeling, and personalization, for interdisciplinary communities. Furthermore, our venue offers a unique chance to present user-involved NLP topics, such as interactive and human-in-the-loop NLP.

The goals of our proposed workshop squarely align with the Web Conference’s mission of developing technologies, practices, and applications to create a Web ecosystem that is efficient, trustworthy, safe, open, and inclusive for everyone. Of particular interest are the topics of user modeling and personalization, which have been crucial to the success of web technologies such as search and recommendation systems. As we deploy systems that perform inferences over personal data, it is critical to ensure that these models can accurately represent individuals and account for their uniqueness. Better user models and representations can directly contribute to improvements in areas relevant to the Web Conference community, such as methods for Social Network Analysis and applications for Computational Social Sciences and Digital Epidemiology, to name a few. However, this also raises important questions related to the ethics, fairness, privacy, transparency, and accountability of Web technologies, which will be of interest to participants of the Web4Good: FATES workshop.

Call for Papers

The overarching questions that motivate this workshop are:
  1. To what extent do stylistic variations indirectly impact downstream applications which were historically treated as stylistically uniform?
  2. To what extent is it desirable to exploit individual variations to reduce demographic disparity, promote user-level models and personalize NLP applications?
  3. To what extent can recent advances in related areas, including representation learning, domain adaptation, and transfer learning, leverage individual variations, understand user intentions, customize NLP models, and deliver interpretable outputs for users’ specific needs?
  4. How can we better evaluate user-centric models to shed light on user-level disparities and the impact of personalized models?
  5. How can we achieve privacy-preserving, user-centric NLP models across a wide range of personalization and user-level tasks?
  6. How much user data is sufficient for system performance?
A non-exhaustive list of topics and applications of interest follows. Suggested topics include:
  • Effects of stylistic variation on downstream tasks
  • User-level distributional vector models
  • Personalization and user-aware natural language generation
  • Fairness and ethics in user-level tasks
  • User modeling and user behavior analysis
  • Effective approaches to evaluate user-level models
  • Interactive and personalized information retrieval
  • Challenges in user privacy and private user-centered models
Potential applications include:
  • User sociodemographic inference applications, together with their issues and risks
  • Personalized text generation
  • User modeling for health applications (e.g. mental health, preventive care)
  • Identifying trustworthiness and deception of users
  • Rhetoric and personalization (e.g. stylistic choices in political speeches, etc.)
Submission Guidelines:
  • Full research papers (up to 8 pages for main content)
  • Short research papers (up to 4 pages for main content)
  • Vision/Position papers (up to 4 pages for main content)
The workshop calls for full research papers (up to 8 pages + 2 pages of appendices + 2 pages of references) describing original work on the listed topics, and short papers (up to 4 pages + 2 pages of appendices + 2 pages of references) presenting early research results, new results on previously published works, demos, and projects. In accordance with Open Science principles, research papers may also take the form of data papers and software papers (short or long). The former present the motivation and methodology behind the creation of data sets that are of value to the community, e.g., annotated corpora, benchmark collections, and training sets. The latter present software functionality, its value for the community, and its application, addressed to a non-specialist reader. To enable reproducibility and peer review, authors will be requested to share the DOIs of the data sets and software products described in their articles and to thoroughly describe their construction and reuse.

The workshop will also call for vision/position papers (up to 4 pages + 2 pages of appendices + 2 pages of references) providing insights towards new or emerging areas, innovative or risky approaches, or emerging applications that will require extensions to the state of the art. These need not include results, but should carefully elaborate on the motivation for and ongoing challenges of the described area.

Submissions for review must be anonymous and in PDF format and must adhere to the ACM template and format. Submissions that do not follow these guidelines, or do not view or print properly, may be rejected without review.

The proceedings of the workshops will be published jointly with The Web Conference 2022 proceedings.

Submit your contributions via the following link: https://easychair.org/cfp/UserNLP_2022

The deadline for submission is 11:59pm GMT-12 on February 3rd, 2022.

Reviewing Procedure:

Submissions will be peer reviewed in a double-blind format and evaluated on their relevance to the community. The presentation format (talk or poster) will be decided based on scientific merit and potential interest to a broad audience.

Multiple-Submission Policy:

Papers appearing in the workshop proceedings must present creative and original work that has not been submitted elsewhere. A paper that is under submission elsewhere may be submitted only for a non-archival presentation of the work; this must be clearly indicated at submission time.

Organizers:

If you would like to contact the team, please email user-centered-nlp@googlegroups.com.
Xiaolei Huang (xiaolei.huang@memphis.edu) is an Assistant Professor at the University of Memphis. He received his Ph.D. in Information Science from the University of Colorado Boulder. His research interests lie in natural language processing, deep learning, and user modeling. He focuses on developing transfer learning methodologies to enhance model robustness and personalization. His research has broad applications in public health, including suicide prevention, alcoholism diagnosis, and vaccination surveillance.
Lucie Flek (lucie.flek@uni-marburg.de) is an Associate Professor at the Philipps-Universität Marburg, leading the research group on Conversational AI and Social Analytics (CAISA). Lucie's interests lie in the area of user representation learning for social NLP applications and for dialog systems. In her previous academic work, e.g., at the University of Pennsylvania and UCL, she focused on psychological and social insights into stylistic variation. She has served as Area Chair for Computational Social Sciences at multiple ACL/NAACL/EMNLP conferences, as a review editor in several NLP-oriented journals, and as a workshop chair at ECIR 2022. In the past, she co-organized the workshops on Stylistic Variation (NAACL) and Widening NLP.
Silvio Amir (s.amir@northeastern.edu) is an assistant professor in the Khoury College of Computer Sciences and a core faculty member of the Institute for Experiential AI and the NULab for Texts, Maps, and Networks at Northeastern University. His research develops methods for tasks involving subjective, personalized or user-level inferences (e.g. opinion mining and digital phenotyping) and aims to improve the reliability, interpretability and fairness of predictive models and analytics derived from personal and user generated data.
Diyi Yang (diyi.yang@cc.gatech.edu) is an assistant professor at the School of Interactive Computing, Georgia Tech. Her research focuses on computational social science, user-centric language generation, and learning with limited and noisy text data. She co-organized the Widening NLP (WiNLP) workshops at NAACL 2018 and ACL 2019, and has served as an area chair for NAACL, EMNLP, and ACL.
Charles Welch (welchc@staff.uni-marburg.de) is a postdoctoral researcher at the University of Marburg in Germany. He recently received his PhD from the University of Michigan where he studied personalization in the context of language modeling, dialog systems, and word embeddings. He is working on controllable generation and how to apply language models for mental health and misinformation applications.
Ramit Sawhney (ramitsawhney@sharechat.co) is a lead AI scientist at ShareChat AI, India, and a research collaborator at the University of Marburg, the Georgia Institute of Technology, and the University of Southern Carolina. His research focuses on NLP-based user representation learning, ethical and privacy-oriented aspects of modeling user-generated text for mental health, and computational social science. On the industrial front at ShareChat AI, Ramit leads the user personalization team, generating recommendations on ShareChat and Moj, Indian apps serving over 100 million users per day. Ramit has organized the NLP for Social Good workshops at IEEE BigMM, and has served as a program committee member for ACL, EMNLP, and NAACL, along with AI conferences such as AAAI, IJCAI, UAI, AISTATS, WWW, and WSDM.
Franck Dernoncourt (dernonco@adobe.com) is an NLP researcher at Adobe Research in San Jose. He received his PhD in machine learning from MIT, has co-authored over 100 peer-reviewed research publications, filed over 50 patents, and received 3 best paper awards. His research interests include neural networks and natural language processing.


Logistics

The goals and themes of this workshop stand at the intersection of various disciplines (e.g., NLP, HCI, algorithmic fairness, computational social sciences), and thus we anticipate that it will be of interest to researchers and practitioners from different communities. We believe that the best format for this workshop is a full-day mini-conference including oral presentations of selected papers, two Gather.Town poster sessions, 3-4 keynotes, and a virtual panel discussion. Below is our proposed schedule, including the already confirmed keynote speakers. Our aim is to distribute the program across participants' time zones in order to support truly global discussions on this topic.

Tentative Schedule

Schedule details are coming soon but will include keynotes from:
  • Prof. Khalid Al-Khatib (University of Groningen)
  • Dr. Shiran Dudy (University of Colorado, Boulder)
  • Dr. Shereen Oraby (Amazon Alexa)
  • Prof. Maarten Sap (Carnegie Mellon University)


Program Committee

  • Barbara Plank, IT University of Copenhagen, Denmark
  • Shiran Dudy, University of Colorado, USA
  • Steven R. Wilson, Oakland University, USA
  • Maarten Sap, CMU, USA
  • Shereen Oraby, Amazon Alexa, USA
  • Zeerak Talat, University of Sheffield, UK
  • Martin Potthast, Leipzig University, Germany
  • Silviu Oprea, University of Edinburgh, UK
  • Flora Sakketou, University of Marburg, Germany
  • Paolo Rosso, Universitat Politècnica de València, Spain
  • Federico Bianchi, Bocconi University, Italy
  • Allison Lahnala, University of Marburg, Germany
  • Daniel Preotiuc-Pietro, Bloomberg, USA
  • Rajiv Ratn Shah, IIIT Delhi, India
  • Sven Buechel, Jena University, Germany
  • Joan Plepi, University of Marburg, Germany
  • Hye Sun Yun, Northeastern University, USA
  • Michael Manzon, Northeastern University, USA
  • Shijia Liu, Northeastern University, USA
  • Monica Munnangi, Northeastern University, USA
  • Yuexin Wu, University of Memphis, USA
  • Md Muminul Hossain, University of Memphis, USA