---
layout: default
---

<div class="header-container jumbotron">
    <div class="container">
        <h1>CDL Misinfo Datasets</h1>
        <p>A repository of misinformation datasets and detection benchmarks, created by the <a href="https://www.complexdatalab.com/">Complex Data Lab</a>.</p>
    </div>
</div>

<div class="container">
    <div class="row">
        <div class="col-md-6">
            <h6 class="lead">Misinformation is a challenging societal issue, and solutions to mitigate it are difficult to
                create because of deficiencies in the available data. To address this problem, we have curated a growing collection of
                (mis)information datasets from the literature. From these, we evaluated the quality of all
                36 datasets that consist of statements or claims.
                If you would like to contribute a novel dataset or report an issue,
                please <a href="mailto:misinfodataset@googlegroups.com">email us</a> or visit us on <a
                    href="https://huggingface.co/datasets/ComplexDataLab/Misinfo_Dataset">Hugging Face</a> or <a
                    href="https://github.com/ComplexData-MILA/misinfo-datasets">GitHub</a>.</h6>
        </div>
        <div class="col-md-6 text-center">
            <img src="{{ '/assets/img/misinfo_logo.png' | relative_url }}" alt="CDL Misinfo Datasets logo" class="img-responsive" style="width: 400px; height: auto;">
        </div>
    </div>
    <hr>
    <div class="row">
        <div class="col-sm-4">
            <h1 class="text-center"><i class="fa fa-cubes" aria-hidden="true"></i></h1>
            <!-- <h1 class="text-center"><img src="{{ "/assets/img/network.png" | relative_url }}" alt="Jekyll logo" class="img-responsive"></h1> -->
            <h3 class="text-center">A survey of (mis)information datasets</h3>
            <h5>A curated collection of <a href="https://huggingface.co/datasets/ComplexDataLab/Misinfo_Datasets">misinformation datasets</a>, and a
                unified setup for working with the claim and statement datasets, available <a
                    href="/docs/dataset_overview/">here</a>.</h5>
        </div>
        <div class="col-sm-4">
            <h1 class="text-center"><i class="fa fa-cogs" aria-hidden="true"></i></h1>
            <h3 class="text-center">Dataset Quality Assessment</h3>
            <h5>We evaluated the quality of the datasets in the survey, identifying potential flaws such as insufficient label quality
                and spurious correlations. This helps researchers select datasets that are suitable for their work.</h5>
        </div>
        <div class="col-sm-4">
            <h1 class="text-center"><i class="fa fa-wrench" aria-hidden="true"></i></h1>
            <h3 class="text-center">Evaluation of Detection Models</h3>
            <h5><a href="https://arxiv.org/abs/2411.05060">Our paper</a> provides state-of-the-art baselines for misinformation detection models on
                these datasets, demonstrating the limitations of categorical labels and suggesting alternative
                evaluation methods.</h5>
        </div>
    </div>
    <hr>
    <!-- <blockquote>
        Camille et al (2024). 
        <strong>A Guide to Misinformation Detection Datasets</strong>. arXiv. 
        <a href="https://arxiv.org/abs/2409.00009" target="_blank">https://arxiv.org/abs/2409.00009</a>
    </blockquote> -->


</div>