
Words to Vectors

Manually tagging millions of transaction rows with the correct merchant group is extremely time-consuming and costly, which is why we train models to do the job for us. But how does a computer understand text? Enter the world of word embeddings, which revolve around representing words as vectors.


Table shows 'petrol' in two different transaction string patterns

Most word embedding techniques rely on a core assumption – words that appear in the same contexts share semantic meaning. In this case we’re less concerned with semantic meaning and more with how the word is spelt; nevertheless, the same reasoning applies. One of these techniques is called bag of words, which represents a text as a vector of counts over a fixed vocabulary.
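To make this concrete, here is a minimal bag-of-words sketch. The transaction strings below are made up for illustration; a real pipeline would build the vocabulary from the full corpus.

```python
# A minimal bag-of-words sketch on hypothetical transaction strings.
from collections import Counter

transactions = [
    "petrol tesco pfs 1234",
    "tesco stores 5678",
    "shell petrol station",
]

# Build a fixed vocabulary from every token seen in the corpus.
vocab = sorted({token for text in transactions for token in text.split()})

def to_vector(text):
    """Count each vocabulary token's occurrences in the text."""
    counts = Counter(text.split())
    return [counts[token] for token in vocab]

# Every transaction becomes a vector of the same length as the vocabulary.
vectors = [to_vector(t) for t in transactions]
```

Each transaction now maps to a fixed-length vector, so strings of different lengths become directly comparable.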


To visualise these vectors further, we can apply dimensionality reduction techniques such as t-SNE to project them down to three dimensions. Here we focus on representing asda, tesco, amazon, link/ATM and uber.
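A sketch of this step using scikit-learn's t-SNE implementation; the count vectors below are toy stand-ins for the real bag-of-words features, and the merchant list mirrors the five names above.

```python
# Hedged sketch: 3-D t-SNE over toy bag-of-words vectors (not real data).
import numpy as np
from sklearn.manifold import TSNE

merchants = ["asda", "tesco", "amazon", "link/ATM", "uber"]

# One illustrative count vector per merchant string.
X = np.array([
    [2, 0, 1, 0, 0],
    [2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1],
    [1, 0, 0, 0, 2],
    [0, 2, 0, 1, 1],
], dtype=float)

# perplexity must be smaller than the number of samples (5 here).
tsne = TSNE(n_components=3, perplexity=2, random_state=0)
X_3d = tsne.fit_transform(X)  # shape (5, 3): one 3-D point per merchant
```

The resulting `X_3d` coordinates are what gets plotted in the visualisation below.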





We can see in this visualisation that similar pieces of text cluster closely together, which allows us to calculate distances between vectors and classify each transaction into its corresponding merchant group.
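One simple way to turn those distances into a classifier is nearest-centroid matching by cosine distance, sketched below. The group names and reference vectors are hypothetical placeholders, not the production model.

```python
# Nearest-centroid classification by cosine distance (illustrative sketch).
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Hypothetical centroid vector for each merchant group.
centroids = {
    "fuel": [1.0, 0.0, 0.2],
    "grocery": [0.0, 1.0, 0.1],
}

def classify(vector):
    """Assign the group whose centroid is closest to the vector."""
    return min(centroids, key=lambda g: cosine_distance(vector, centroids[g]))
```

A new transaction vector is simply assigned to whichever group's centroid it sits nearest to in the embedding space.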

©2022 Fable Data All Rights Reserved
