Note: This article is based on another written by Jean Villedieu on Neo4J: https://graphgist-portal.neo4j.org/graph_gists/credit-card-fraud-detection
Fraud detection is the process of identifying and preventing fraudulent activities in various industries, such as finance, e-commerce, and healthcare. It involves using techniques such as data analysis, machine learning, and artificial intelligence to detect patterns that indicate fraudulent behavior. Fraud can come in many forms, including identity theft, credit card fraud, and insurance fraud, among others. Detecting and preventing fraud is crucial for businesses and organizations to maintain financial stability, protect their reputation, and ensure the safety and security of their customers.
On this post, we'll cover a typical situation for detecting credit card theft with Apache AGE and visualize it with AGE Viewer.
Crime Scheme
Criminals can steal credit card information in a number of ways. The information could be taken in a major retailer's bulk data breach or the perpetrator can just be the cashier worker at the grocery store, where the victim's card is secretly swiped or secretly written down. Let's think of a case where a pair of criminals try to steal credit cards from stores. One would collect the credit card information, locate where the owner of the credit card lives and place orders online to be delivered at their victims' home address. The other criminal would then wait at the address to intercept the deliveries.
The Solution With Apache AGE
Graph databases may make it easier to track down credit card fraudsters. We can search for the common factor in fraud instances and identify the scam's starting point by depicting interactions with a graph.
Graphs can be used to visualize the credit card activities. Both client and vendor are elements in each transaction, which links these two. The properties of the transaction contains the timestamp and a state of it. The state of legitimate deals is "Undisputed" whils transactions that are fraudulent are "Disputed".
-- Create graph "FraudDetection"
SELECT create_graph('FraudDetection');
-- Create customers
SELECT * FROM cypher('FraudDetection', $$
CREATE (Paul:Person {id:'1', name:'Paul', gender:'man', age:'50'})
CREATE (Jean:Person {id:'2', name:'Jean', gender:'man', age:'48'})
CREATE (Dan:Person {id:'3', name:'Dan', gender:'man', age:'23'})
CREATE (Marc:Person {id:'4', name:'Marc', gender:'man', age:'30'})
CREATE (John:Person {id:'5', name:'John', gender:'man', age:'31'})
CREATE (Zoey:Person {id:'6', name:'Zoey', gender:'woman', age:'52'})
CREATE (Ava:Person {id:'7', name:'Ava', gender:'woman', age:'23'})
CREATE (Olivia:Person {id:'8', name:'Olivia', gender:'woman', age:'58'})
CREATE (Mia:Person {id:'9', name:'Mia', gender:'woman', age:'51'})
CREATE (Madison:Person {id:'10', name:'Madison', gender:'woman', age:'37'})
-- Create merchants
CREATE (Amazon:Merchant {id:'11', name:'Amazon', street:'2626 Wilkinson Court', address:'San Bernardino, CA 92410'})
CREATE (Abercrombie:Merchant {id:'12', name:'Abercrombie', street:'4355 Walnut Street', age:'San Bernardino, CA 92410'})
CREATE (Wallmart:Merchant {id:'13', name:'Wallmart', street:'2092 Larry Street', age:'San Bernardino, CA 92410'})
CREATE (MacDonalds:Merchant {id:'14', name:'MacDonalds', street:'1870 Caynor Circle', age:'San Bernardino, CA 92410'})
CREATE (American_Apparel:Merchant {id:'15', name:'American Apparel', street:'1381 Spruce Drive', age:'San Bernardino, CA 92410'})
CREATE (Just_Brew_It:Merchant {id:'16', name:'Just Brew It', street:'826 Anmoore Road', age:'San Bernardino, CA 92410'})
CREATE (Justice:Merchant {id:'17', name:'Justice', street:'1925 Spring Street', age:'San Bernardino, CA 92410'})
CREATE (Sears:Merchant {id:'18', name:'Sears', street:'4209 Elsie Drive', age:'San Bernardino, CA 92410'})
CREATE (Soccer_for_the_City:Merchant {id:'19', name:'Soccer for the City', street:'86 D Street', age:'San Bernardino, CA 92410'})
CREATE (Sprint:Merchant {id:'20', name:'Sprint', street:'945 Kinney Street', age:'San Bernardino, CA 92410'})
CREATE (Starbucks:Merchant {id:'21', name:'Starbucks', street:'3810 Apple Lane', age:'San Bernardino, CA 92410'})
CREATE (Subway:Merchant {id:'22', name:'Subway', street:'3778 Tenmile Road', age:'San Bernardino, CA 92410'})
CREATE (Apple_Store:Merchant {id:'23', name:'Apple Store', street:'349 Bel Meadow Drive', age:'Kansas City, MO 64105'})
CREATE (Urban_Outfitters:Merchant {id:'24', name:'Urban Outfitters', street:'99 Strother Street', age:'Kansas City, MO 64105'})
CREATE (RadioShack:Merchant {id:'25', name:'RadioShack', street:'3306 Douglas Dairy Road', age:'Kansas City, MO 64105'})
CREATE (Macys:Merchant {id:'26', name:'Macys', street:'2912 Nutter Street', age:'Kansas City, MO 64105'})
-- Create transaction history
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'986', time:'4/17/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'239', time:'5/15/2014', status:'Undisputed'}]->(Starbucks)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'475', time:'3/28/2014', status:'Undisputed'}]->(Sears)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'654', time:'3/20/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'196', time:'7/24/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'502', time:'4/9/2014', status:'Undisputed'}]->(Abercrombie)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'848', time:'5/29/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'802', time:'3/11/2014', status:'Undisputed'}]->(Amazon)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'203', time:'3/27/2014', status:'Undisputed'}]->(Subway)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'35', time:'1/23/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'605', time:'1/27/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'62', time:'9/17/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'141', time:'11/14/2014', status:'Undisputed'}]->(Amazon)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'134', time:'4/14/2014', status:'Undisputed'}]->(Amazon)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'336', time:'4/3/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'964', time:'3/22/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'430', time:'8/10/2014', status:'Undisputed'}]->(Sears)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'11', time:'9/4/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'545', time:'10/6/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'457', time:'10/15/2014', status:'Undisputed'}]->(Sprint)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'468', time:'7/29/2014', status:'Undisputed'}]->(Justice)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'768', time:'11/28/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'921', time:'3/12/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'740', time:'12/15/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'510', time:'11/27/2014', status:'Undisputed'}]->(Abercrombie)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'414', time:'1/20/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'721', time:'7/17/2014', status:'Undisputed'}]->(Amazon)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'353', time:'10/25/2014', status:'Undisputed'}]->(Subway)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'681', time:'12/28/2014', status:'Undisputed'}]->(Sears)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'87', time:'2/19/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'533', time:'8/6/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'723', time:'1/8/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'627', time:'5/20/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'74', time:'9/4/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'231', time:'7/12/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'924', time:'10/4/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'742', time:'8/12/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'276', time:'12/24/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'66', time:'4/16/2014', status:'Undisputed'}]->(Starbucks)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'467', time:'12/23/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'830', time:'3/13/2014', status:'Undisputed'}]->(Sears)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'240', time:'7/9/2014', status:'Undisputed'}]->(Amazon)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'164', time:'12/26/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'630', time:'10/6/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'19', time:'7/29/2014', status:'Undisputed'}]->(Abercrombie)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'352', time:'12/16/2014', status:'Undisputed'}]->(Subway)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'147', time:'8/3/2014', status:'Undisputed'}]->(Amazon)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'91', time:'6/29/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1021', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1732', time:'5/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1415', time:'4/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1849', time:'12/20/2014', status:'Disputed'}]->(Macys)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1914', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1424', time:'5/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1721', time:'4/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1003', time:'12/20/2014', status:'Disputed'}]->(Macys)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1149', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1152', time:'8/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1884', time:'8/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1790', time:'12/20/2014', status:'Disputed'}]->(Macys)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1925', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1374', time:'7/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1368', time:'7/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1816', time:'12/20/2014', status:'Disputed'}]->(Macys)
$$) AS (a agtype);
In AGE Viewer we can see how our graph is displayed:
SELECT * FROM cypher('FraudDetection', $$
MATCH (V)-[R]-(V2)
RETURN V,R,V2
$$) AS (V agtype, R agtype, V2 agtype);
Now, with the query bellow, we can retrieve the fraudulent transactions:
SELECT * FROM cypher('FraudDetection', $$
MATCH (victim :Person)-[r :HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
RETURN victim.name, merchant.name, r.amount, r.time
$$) AS (Customer agtype, Store agtype, Amount agtype, Transaction_Time agtype);
customer | store | amount | transaction_time
-----------+--------------------+--------+------------------
"Madison" | "Apple Store" | "1925" | "7/18/2014"
"Olivia" | "Apple Store" | "1149" | "7/18/2014"
"Marc" | "Apple Store" | "1914" | "7/18/2014"
"Paul" | "Apple Store" | "1021" | "7/18/2014"
"Madison" | "Urban Outfitters" | "1374" | "7/10/2014"
"Olivia" | "Urban Outfitters" | "1152" | "8/10/2014"
"Marc" | "Urban Outfitters" | "1424" | "5/10/2014"
"Paul" | "Urban Outfitters" | "1732" | "5/10/2014"
"Madison" | "RadioShack" | "1368" | "7/1/2014"
"Olivia" | "RadioShack" | "1884" | "8/1/2014"
"Marc" | "RadioShack" | "1721" | "4/1/2014"
"Paul" | "RadioShack" | "1415" | "4/1/2014"
"Madison" | "Macys" | "1816" | "12/20/2014"
"Olivia" | "Macys" | "1790" | "12/20/2014"
"Marc" | "Macys" | "1003" | "12/20/2014"
"Paul" | "Macys" | "1849" | "12/20/2014"
(16 rows)
Knowing which customers and stores are involved, we must now locate our criminal. In this case, we must consider both legitimate and illegitimate transactions due to how the criminals operate. So we need to find both these cases and also the transactions that were made before the theft.
SELECT * FROM cypher('FraudDetection', $$
MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->(merchant)
MATCH (victim)-[r :HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
WITH victim, merchant, r,
toInteger(split(r.time, '/')[0]) + "-" +
toInteger(split(r.time, '/')[1]) + "-" +
toInteger(split(r.time, '/')[2]) as date_r
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WITH victim, merchant, othermerchants, r, t, date_r,
toInteger(split(t.time, '/')[0]) + "-" +
toInteger(split(t.time, '/')[1]) + "-" +
toInteger(split(t.time, '/')[2]) as date_t
WHERE t.status = "Undisputed" AND date_t < date_r
RETURN victim.name, othermerchants.name, t.amount, date_t
ORDER BY date_t DESC
$$) AS (customer_name agtype, store_name agtype, amount agtype, transaction_time agtype);
customer_name | store_name | amount | transaction_time
---------------+-----------------------+--------+------------------
"Olivia" | "Wallmart" | "231" | "7-12-2014"
"Olivia" | "Wallmart" | "231" | "7-12-2014"
"Olivia" | "Wallmart" | "231" | "7-12-2014"
"Madison" | "Wallmart" | "91" | "6-29-2014"
"Madison" | "Wallmart" | "91" | "6-29-2014"
"Madison" | "Wallmart" | "91" | "6-29-2014"
"Paul" | "Starbucks" | "239" | "5-15-2014"
"Marc" | "American Apparel" | "336" | "4-3-2014"
"Marc" | "American Apparel" | "336" | "4-3-2014"
"Paul" | "Just Brew It" | "986" | "4-17-2014"
"Paul" | "Just Brew It" | "986" | "4-17-2014"
"Marc" | "Amazon" | "134" | "4-14-2014"
"Marc" | "Amazon" | "134" | "4-14-2014"
"Paul" | "Sears" | "475" | "3-28-2014"
"Paul" | "Sears" | "475" | "3-28-2014"
"Paul" | "Sears" | "475" | "3-28-2014"
"Marc" | "Wallmart" | "964" | "3-22-2014"
"Marc" | "Wallmart" | "964" | "3-22-2014"
"Marc" | "Wallmart" | "964" | "3-22-2014"
"Paul" | "Wallmart" | "654" | "3-20-2014"
"Paul" | "Wallmart" | "654" | "3-20-2014"
"Paul" | "Wallmart" | "654" | "3-20-2014"
"Madison" | "Subway" | "352" | "12-16-2014"
"Madison" | "Subway" | "352" | "12-16-2014"
"Madison" | "Subway" | "352" | "12-16-2014"
"Madison" | "Subway" | "352" | "12-16-2014"
"Madison" | "MacDonalds" | "630" | "10-6-2014"
"Madison" | "MacDonalds" | "630" | "10-6-2014"
"Madison" | "MacDonalds" | "630" | "10-6-2014"
"Madison" | "MacDonalds" | "630" | "10-6-2014"
"Olivia" | "Soccer for the City" | "924" | "10-4-2014"
"Olivia" | "Soccer for the City" | "924" | "10-4-2014"
"Olivia" | "Soccer for the City" | "924" | "10-4-2014"
"Olivia" | "Soccer for the City" | "924" | "10-4-2014"
(34 rows)
Now it is necessary to change the Cypher query to arrange the prior findings by the number of times we've seen each merchant.
SELECT * FROM cypher('FraudDetection', $$
MATCH (victim:Person)-[r:HAS_BOUGHT_AT]->(merchant)
MATCH (victim)-[r :HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
WITH victim, merchant, r,
toInteger(split(r.time, '/')[0]) + "-" +
toInteger(split(r.time, '/')[1]) + "-" +
toInteger(split(r.time, '/')[2]) as date_r
MATCH (victim)-[t:HAS_BOUGHT_AT]->(othermerchants)
WITH victim, merchant, othermerchants, r, t, date_r,
toInteger(split(t.time, '/')[0]) + "-" +
toInteger(split(t.time, '/')[1]) + "-" +
toInteger(split(t.time, '/')[2]) as date_t
WHERE t.status = "Undisputed" AND date_t < date_r
RETURN DISTINCT othermerchants.name, count(DISTINCT victim.name)
ORDER BY count(DISTINCT victim.name) DESC
$$) AS (suspicious_store agtype, num_victims agtype);
suspicious_store | num_victims
-----------------------+-------------
"Wallmart" | 4
"Amazon" | 1
"American Apparel" | 1
"Just Brew It" | 1
"MacDonalds" | 1
"Sears" | 1
"Soccer for the City" | 1
"Starbucks" | 1
"Subway" | 1
(9 rows)
We now know the location and the date on which the customer's credit card numbers were stolen. With a graph visualization solution like AGE Viewer, we could inspect the data to confirm our intuition. Now we can notify the authorities and the merchant on the situation. They should be able to take it from there! Each fraudulent transaction involved a credit card holder who had visited Walmart in the days just before.
The use of graph databases and visualization solutions like AGE Viewer can greatly improve fraud detection and help maintain financial stability, protect reputations, and ensure the safety and security of customers.
For more information about age, checkout these links:
Top comments (0)