Mining and Learning in the Legal Domain

International Workshop on Mining and Learning in the Legal Domain (MLLD-2021)

In conjunction with the 21st IEEE International Conference on Data Mining, December 7-10, 2021, Auckland, New Zealand

The increasing accessibility of large legal corpora and databases creates opportunities to develop data-driven techniques and more advanced tools that can facilitate many tasks of researchers and practitioners in the legal domain. While recent advances in data mining and machine learning have found many applications in domains such as biomedicine, healthcare, and finance, there is still a noticeable gap in how far state-of-the-art techniques have been incorporated into the legal domain. Closing this gap entails building a multidisciplinary community that can benefit from the competencies of both law and computer science experts. The goal of this workshop is to bring researchers and practitioners of both disciplines together and provide an opportunity to share the latest research findings and innovative approaches in applying data analytics and machine learning to the legal domain.


Following the success of the 1st MLLD workshop (MLLD 2020), the 2nd workshop on Mining and Learning in the Legal Domain (MLLD 2021) covers a broad variety of topics in analyzing legal data such as legislation, litigation, court cases, contracts, patents, non-disclosure agreements (NDAs), and bylaws. We encourage submissions on novel mining- and learning-based solutions in:

  • Applications of data mining techniques in the legal domain

    • case outcome prediction

    • classifying, clustering and identifying anomalies in big corpora of legal records

    • legal analytics

    • citation analysis for case law

    • eDiscovery

  • Applications of natural language processing and machine learning techniques for legal textual data

    • information extraction and entity extraction/resolution for legal document reviews

    • information retrieval and question answering in applications such as identifying relevant case law

    • summarization of legal documents

    • legal language modelling and legal document embedding and representation

    • recommender systems for legal applications

    • topic modelling in large amounts of legal documents

    • harnessing of deep learning approaches

  • Ethical issues in mining legal data

    • privacy and GDPR in legal analytics

    • bias in the applications of data mining

    • transparency in legal data mining

  • Training data for the legal domain

    • acquisition, representation, indexing, storage, and management of legal data

    • automatic annotation and learning with human in the loop

    • data augmentation techniques for legal data

    • semi-supervised learning, domain adaptation, distant supervision and transfer learning

  • Emerging topics in the intersection of data mining and law

    • digital lawyers and legal machines

    • smart contracts

    • future of law practice in the age of AI


You are invited to submit your original research and application papers to the workshop. As per ICDM instructions, papers are limited to a maximum of 8 pages, and must follow the IEEE ICDM format requirements. All accepted workshop papers will be published in the formal proceedings by the IEEE Computer Society Press. Each paper is reviewed by at least 3 reviewers from the program committee. Paper review is triple-blind. Manuscripts are to be submitted through CyberChair. Please forward your questions to the organizing committee.

Thomson Reuters Labs Best Paper Award

Thomson Reuters Labs will generously provide a total of $1,000 USD to the best paper(s) submitted (one $1,000 award or two $500 awards). The successful paper(s) must have at least one student author, and a student must be listed as the first author. The best paper recipient(s) will be selected by the program committee.

Thomson Reuters Labs is hiring! TR Labs is looking for experienced candidates across research, data science, engineering and more, in Toronto, Bangalore, Zurich, London, and Minneapolis St. Paul. Learn more about these opportunities here.

Important Dates

  • Paper submission due date: September 6, 2021

  • Notification of acceptance: September 24, 2021

  • Camera ready submission: October 1, 2021

  • MLLD Panel on "The Future of AI and Law": November 22, 2021 at 9-10:15am Eastern Standard Time (EST)

  • MLLD-2021 Workshop: December 6, 2021 at 8-11:30pm Eastern Standard Time (EST), which is December 7, 2021 at 2-5:30pm New Zealand Daylight Time (NZDT)

Program Committee

  • Wolfgang Alschner, University of Ottawa, Canada

  • Kevin Ashley, University of Pittsburgh, USA

  • Karl Branting, MITRE Corporation, USA

  • Jack Conrad, Thomson Reuters Labs, USA

  • Anna Farzindar, University of Southern California, USA

  • Randy Goebel, University of Alberta, Canada

  • Diana Inkpen, University of Ottawa, Canada

  • Daniel Martin Katz, Illinois Tech - Chicago Kent College of Law, USA

  • Sourav Mukherjee, Fairleigh Dickinson University, Canada

  • Isabelle Moulinier, Thomson Reuters Labs, USA

  • Aileen Nielsen, ETH Zurich, Switzerland

  • Adam Roegiest, Kira Systems, Canada

  • Ken Satoh, National Institute of Informatics, Japan

  • Jaromír Šavelka, University of Pittsburgh, USA

  • Frank Schilder, Thomson Reuters Labs, USA

  • Vasilis Tsolis, Cognitiv+, UK

  • Hannes Westermann, Université de Montréal, Canada

  • Adam Wyner, Swansea University, UK

  • Farhana Zulkernine, Queen's University, Canada

Accepted Papers

  • "Detection of Similar Legal Cases on Personal Injury", Jason Lam, Yuhao Chen, Farhana Zulkernine, and Samuel Dahan [Best Paper Award Winner]

  • "Simplify Your Law: Using Information Theory to Deduplicate Legal Documents", Corinna Coupette, Jyotsna Singh, and Holger Spamann

  • "Legal Entity Extraction using a Pointer Generator Network", Stavroula Skylaki, Ali Oskooei, Omar Bari, Nadja Herger, and Zac Kriegman

  • "Determining Standard Occupational Classification Codes from Job Descriptions in Immigration Petitions", Sourav Mukherjee, David Widmark, Vince DiMascio, and Tim Oates

Workshop Program

The workshop will be held on Dec 6th at 8-11:30 pm Eastern Time. The session is open to the public using the meeting details below.

Meeting ID: 975 7275 6906

Passcode: gZ3Bn_GdJb

Keynote Speaker

It is our great honor to announce that Prof. Sharad Goel of Harvard University will give the keynote talk at the 2nd MLLD workshop.

Sharad Goel is a Professor of Public Policy at the Harvard Kennedy School. He looks at public policy through the lens of computer science, bringing a computational perspective to a diverse range of contemporary social and political issues, including criminal justice reform, democratic governance, and the equitable design of algorithms. Prior to joining Harvard, Sharad was on the faculty at Stanford University, with appointments in Management Science & Engineering, Computer Science, Sociology, and the Law School. He holds a BS in mathematics from the University of Chicago, as well as an MS in Computer Science and a PhD in Applied Mathematics from Cornell University.

Title: Designing Equitable Algorithms for Criminal Justice and Beyond

Abstract: Machine learning methods are increasingly used to model risk in criminal justice, banking, healthcare, and other high-stakes domains. These new tools promise gains in accuracy, but also raise challenging statistical, legal, and ethical questions. In this talk, I’ll describe the dominant axiomatic approach to fairness in machine learning, and argue that common mathematical definitions of fairness can, perversely, lead to discriminatory outcomes in practice. I’ll then present an alternative, consequentialist perspective for designing equitable algorithms that foregrounds the inherent tension between competing concerns in many real-world problems.


MLLD will host a panel on "The Future of AI and Law" with a lineup of experienced industry practitioners, government personnel, and academics. The panelists will discuss topics such as emerging use cases of AI in law, issues relevant to the adoption of AI, and responsible AI. Here is the full list of our panelists.

Rawia Ashraf, J.D.

Vice President, Legal Practice and Productivity at Thomson Reuters

Samuel Dahan, Ph.D.

Director and Professor of Law at Conflict Analytics Lab, Queen's University

Date and Time

The panel session will be held on Nov 22nd at 9-10:15 am Eastern Time and is open to the public via this link. The passcode for the meeting is 2021. A recorded version of the panel session will be shared with ICDM attendees.

Recorded Session

A recorded version of the panel session is available here.

Panel Topics

Responsible AI:

    • Access to AI capabilities in the justice system

    • Ethics, bias, and fairness

    • Interpretability and explainability, and the trade-off between explainability and IP rights. Can we have both?

    • Privacy and GDPR

Emerging use cases: What are the most probable use cases of AI/ML in law over the next decade? Examples:

    • Legal language simplification: Creating AI models (e.g., GPT-3) to simplify legal language for users

    • Meaningful summarization of legal text

    • The role of data scientists in contract life-cycle management

    • Digital lawyers

Human in the loop: How will humans be involved in future AI-enabled decision making? Will future AI solutions be fully automated with zero human supervision, or will humans have the final say in a decision assisted by AI?

    • After proper human evaluations, AI systems can make decisions without human interaction/supervision

    • Humans review AI decisions and accept or reject them

    • Humans will be the primary decision makers, but they will utilize the AI systems for assistance (AI-augmented decision making)

Adoption of AI: What will convince/force law firms to adopt AI in their practice? What are the incentives and deterrents?

    • The promise of AI: Is AI hype, or are AI-enabled solutions performing as promised?

    • Where have we seen AI succeed in LegalTech?

    • What will it take for lawyers to adopt AI? How can we gain their trust?

    • AI cost-saving opportunities vs. the cost of adopting AI

    • Customer demand

Regulating body: Who should decide which AI-enabled solutions are allowed to make decisions that impact people?

    • Governments: may be slow and have historically shown a lack of expertise

    • Big tech: no incentive

    • A new advisory/consultancy board composed of academics, government, and big tech

Workshop Organizers