πŸ’³

Payment Fraud Detection

β€œ3 simple flags β€” foreign IP, new device, after midnight β€” predicted 85% of fraud. The pattern was already in the data.”

Risk Score ModellingPrecision/RecallRegional Fraud AnalysisFalse Positive Trade-offFeature Engineering
5,000
Transactions
312
Fraud Cases
6.2%
Fraud Rate
85%
Detection Rate
2
High-Risk Regions
1

Step 1 β€” Data Extraction from Payment Gateway Logs

5,000 payment transaction records were exported for Jan–Dec 2024. Each record includes metadata, device signals, IP geolocation, and a confirmed fraud label (0 = legitimate, 1 = fraud) for risk model validation.

Sample raw extract β€” payment transaction log

Payment IDDateRegionTypeAmountHourNew DeviceForeign IPIs Fraud
PAY-000012024-03-14Kuala LumpurCredit CardRM 1,24014NoNo0
PAY-000022024-03-14KedahE-WalletRM 3,80002YesYes1
PAY-000032024-03-15PenangDebit CardRM 48011NoNo0
PAY-000042024-03-15SelangorCredit CardRM 58023YesNo1
PAY-000052024-03-16Johor BahruBank TransferRM 2,10009NoNo0
PAY-000062024-03-16KedahBNPLRM 92003YesYes1
PAY-000072024-03-17KLCredit CardRM 29016NoNo0
PAY-000082024-03-17SabahE-WalletRM 4,50001YesYes1
PAY-000092024-03-18PenangDebit CardRM 18010NoNo0
PAY-000102024-03-18SelangorCredit CardRM 75022NoYes1
2

Step 2 β€” Data Cleaning

Issues found and resolved

IssueCountActionResult
Duplicate Transaction IDs18Removed4,982 unique records
Amount = RM 0.00 (test transactions)12RemovedExcluded
Fraud label missing34Excluded from model training onlyFlagged
Hour field blank7Derived from full timestampImputed
Region field inconsistent casing28Standardised (e.g. kl β†’ Kuala Lumpur)Unified
3

Step 3 β€” Transform: Risk Feature Engineering + VLOOKUP

A risk score was built by combining 7 binary flags into a weighted total per transaction. VLOOKUP joined the regional risk weighting table to apply location-based score additions.

Risk score components β€” each flag adds points to the transaction total

Risk FlagConditionPoints AddedActual Fraud Rate When True
After-Midnight TransactionHour 00:00–04:59+69.1%
High AmountAmount > RM 2,000+811.4%
New DeviceDevice not seen before for this customer+58.2%
Foreign IP AddressIP geolocated outside Malaysia+1014.8%
Prior Fraud HistoryCustomer flagged in last 90 days+1219.3%
BNPL Payment TypeBuy Now Pay Later selected+47.6%
High-Risk RegionKedah or Kuala Lumpur+38.1%

Pivot Output β€” Fraud Rate by Region

Grouped by region β€” shows geographic concentration of fraud

RegionTotal TransactionsFraud CasesFraud Rate %Risk Level
Kedah342288.2%πŸ”΄ HIGH
Kuala Lumpur1,248927.4%πŸ”΄ HIGH
Sabah398287.0%πŸ”΄ HIGH
Selangor998626.2%🟑 ELEVATED
Johor Bahru892444.9%🟑 MODERATE
Penang748385.1%🟑 MODERATE
Sarawak374205.3%🟑 MODERATE
πŸ” KEY FINDING
Kedah (8.2%) and KL (7.4%) show fraud rates more than 2Γ— the national average. Transactions combining Foreign IP + New Device + After Midnight have an 18% fraud rate β€” nearly 3Γ— the overall 6.2%. These 3 flags alone, as a detection rule, catch 85% of all confirmed fraud cases in this dataset.
4

Step 4 β€” Analysis: Model Performance + Precision/Recall

312
Confirmed Fraud Cases
6.2% of all transactions
85%
Detection Rate
At threshold score 30
8.2%
Kedah Fraud Rate
Highest region
8%
False Positive Rate
Acceptable for 2FA trigger

Precision/Recall at different thresholds β€” finding the optimal cut-off

ThresholdFraud CaughtFalse PositivesPrecisionRecallRecommendation
Score β‰₯ 2092%22%71%92%Too many false positives β€” bad UX
Score β‰₯ 3085%8%86%85%βœ… Recommended β€” best balance
Score β‰₯ 4071%3%94%71%Misses too much fraud
Score β‰₯ 5054%1%97%54%Too conservative
πŸ” KEY FINDING
A threshold of 30 is optimal. It catches 85% of fraud while triggering a 2FA prompt for only 8% of legitimate customers β€” roughly 18 real people per week experiencing a minor delay. That is the trade-off to present to management: minor inconvenience for 18 people per week vs RM 477k in preventable fraud per year.
5

Step 5 β€” Visualisation

Fraud Rate % by Region
Kedah and KL are 2Γ— the national average of 6.2%
Fraud Rate by Hour of Day (24h)
Red bars = 12am–4am window, 2Γ— daytime rate
Precision vs Recall Curve
Score β‰₯30 is the optimal balance point
Fraud Rate % by Region
Bar chart ranked β€” Kedah and KL isolated in red. No explanation needed after seeing this chart.
Risk Score Distribution: Fraud vs Legitimate
Overlapping histogram β€” legitimate clusters at 0–20, fraud at 30–80. Visual proof the model separates the two groups.
Precision vs Recall Curve
Line chart β€” score 30 marked as the optimal decision point. Management sees the trade-off being made.
Fraud Rate by Hour of Day
24-bar chart β€” 12am–4am has 2Γ— daytime rate. Directly supports the after-midnight transaction limit.
6

Step 6 β€” Report to Management

Actions submitted to Risk Management from this analysis

FindingAction RequiredOwnerDeadlineExpected Impact
Kedah & KL at 7–8% fraudDeploy 2FA for KL/Kedah transactions > RM 500Risk ManagerFeb 17Reduce regional fraud to < 4%
3-flag combo = 18% fraud rateAuto-flag for manual review when 2+ flags triggerRisk TeamFeb 17Catch 85% before payment clears
BNPL at 7.6% fraud rateExtra verification for BNPL > RM 300Product ManagerMar 1Reduce BNPL fraud by est. 60%
After-midnight elevatedCap transactions at RM 1,000 between 12am–4amRisk ManagerFeb 24Limit exposure in highest-risk window
βœ… WHY MANAGEMENT MUST ACT
  • Financial exposure: 312 fraud cases Γ— avg RM 1,800 = RM 561,600 total annual exposure. The model catches 85% = RM 477k recoverable per year.
  • Reputational risk: KL and Kedah concentration means affected customers are geographically clustered. One viral complaint about a stolen payment spreads fast locally.
  • Regulatory risk: Bank Negara Malaysia requires active fraud monitoring for payment processors. Non-compliance risks licence review β€” not just a fine.
  • False positive cost is minimal: ~18 real customers per week get a 2FA prompt. That is a 30-second inconvenience vs RM 477,000 in prevented fraud annually.