Python ML - Similarity with spaCy

Davide Santangelo ・2 min read

Intro

I recently started learning ML and Data Science, and I want to share with you a powerful Python library for semantic text analysis called spaCy.

Description

Similarity is determined by comparing word vectors or “word embeddings”, multi-dimensional meaning representations of a word.
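To make "comparing word vectors" concrete, here is a minimal sketch of cosine similarity, a common way to compare two vectors and the default measure spaCy uses. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", not taken from any spaCy model.
toy_vectors = {
    "developer": [0.9, 0.8, 0.1],
    "programmer": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

print(cosine_similarity(toy_vectors["developer"], toy_vectors["programmer"]))
print(cosine_similarity(toy_vectors["developer"], toy_vectors["banana"]))
```

Vectors that point in nearly the same direction score close to 1, while unrelated ones score close to 0.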

spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. We like to think of spaCy as the Ruby on Rails of Natural Language Processing.

spaCy is able to compare two objects, and make a prediction of how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates. For example, you can suggest a user content that’s similar to what they’re currently looking at, or label a support ticket as a duplicate if it’s very similar to an already existing one.
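As a rough sketch of the duplicate-ticket idea, the function below flags existing texts whose similarity to a new text reaches a threshold. The names find_duplicates and word_overlap and the threshold values are assumptions made for this example, not part of spaCy. The word_overlap function is a simple stand-in so the snippet runs without a model.

```python
def find_duplicates(new_text, existing_texts, similarity, threshold=0.9):
    """Return (text, score) pairs scoring at or above threshold, best first."""
    scored = [(text, similarity(new_text, text)) for text in existing_texts]
    matches = [(text, score) for text, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


def word_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of lowercase words (not a spaCy call)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


tickets = [
    "cannot log in to my account",
    "invoice shows the wrong amount",
]

print(find_duplicates("cannot log in to my account today", tickets, word_overlap, threshold=0.8))
```

With spaCy loaded as nlp, you could pass similarity=lambda a, b: nlp(a).similarity(nlp(b)) instead of word_overlap. A sensible threshold depends on the model and your data, so it is worth tuning on real examples.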

Installation instructions

pip

Using pip, spaCy releases are available as source packages and binary wheels (as of v2.0.13).

pip3 install spacy

Models

spaCy’s models can be installed as Python packages. This means that they’re a component of your application, just like any other module. They’re versioned and can be defined as a dependency in your requirements.txt. Models can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system.

python3 -m spacy download en_core_web_sm
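Because an installed model is a regular Python package, you can check for it and import it like any other module. The sketch below uses only the standard library for the check; the helper name model_installed is an assumption for this example.

```python
import importlib
import importlib.util


def model_installed(package_name):
    """True if a top-level Python package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


MODEL = "en_core_web_sm"

if model_installed(MODEL):
    # spaCy model packages expose a load() function that returns the pipeline.
    model_package = importlib.import_module(MODEL)
    nlp = model_package.load()
    print(f"loaded {MODEL}")
else:
    print(f"{MODEL} is not installed; run: python3 -m spacy download {MODEL}")
```

A check like this lets a script fail with a helpful message rather than an import error when the model is missing.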

spaCy provides models for many languages besides English; the complete list is in the models section of the spaCy documentation.

Example

Here is a short Python script that compares two strings.

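import spacy

# Load the small English model installed above.
nlp = spacy.load("en_core_web_sm")

# Parse both inputs into Doc objects.
first_text = nlp(input("insert first text: "))
second_text = nlp(input("insert second text: "))

# Doc.similarity returns a float; higher means more similar.
# The small model ships without full word vectors, so treat the score as approximate.
print(f"similarity: {first_text.similarity(second_text)}")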

Result

insert first text: i'm a software developer
insert second text: i'm a software web developer

similarity: 0.9302790237853475

Enjoy!!

Posted on Feb 13 by:

Davide Santangelo

@daviducolo

developer - dad. APIs specialist, In love with #ruby, #rails and #python - developer at @getchorally
