  1. splink · PyPI

    Mar 16, 2020 · Splink is a Python package for probabilistic record linkage (entity resolution) that allows you to deduplicate and link records from datasets that lack unique identifiers.

  2. Splink - GitHub Pages

    Splink is a Python package for probabilistic record linkage (entity resolution) that allows you to deduplicate and link records from datasets without unique identifiers.

  3. moj-analytical-services/splink - GitHub

    Splink is a Python package for probabilistic record linkage (entity resolution) that allows you to deduplicate and link records from datasets that lack unique identifiers.

  4. Splink: Fast, accurate and scalable record linkage

    Sep 23, 2022 · Splink is a free Python package that can be installed in the usual way - using ‘pip install splink’. We recommend users start by looking at our online tutorial, which is part of our main...

  5. Super-fast deduplication of large datasets using Splink and DuckDB

    Jan 18, 2024 · Splink is a free, open source Python library to address this problem. It's designed for use on very large datasets, so speed is imperative. It uses DuckDB as its default backend to achieve fast …

  6. May 29, 2024 · Splink is a Python package for probabilistic record linkage (entity resolution) that allows you to deduplicate and link records from datasets that lack unique identifiers.

  7. Getting Started - Splink - GitHub Pages

    To get a basic Splink model up and running, use the following code. It demonstrates how to: Use clustering to generate an estimated unique person ID. If you're using an LLM to suggest Splink code, …

  8. Deduplicating and linking large datasets using Splink

    Nov 22, 2023 · The result is Splink – which is a Python package that implements the Fellegi-Sunter model, and enables parameters to be estimated using the Expectation Maximisation algorithm.

  9. Splink 3: Fast, accurate and scalable linkage in Python

    Aug 5, 2022 · Splink 3 now offers support for Python and AWS Athena backends, in addition to Spark. It's now easier to use, faster and more flexible, and can be used for close to real time linkage.

  10. moj-analytical-services/splink_demos - GitHub

    This repo contains interactive notebooks with demonstrations and tutorials for version 3 of the Splink record linking library. You can run these notebooks in an interactive …