Samples: Multilingual Receipt and Invoice OCR with GenAI

Introduction and data format

The best way to understand how UnderDoc works is through samples. This page shows a few sample expense document images together with the structured data extracted from each.

The following is the structured data format:

  • Image Type (Receipt, Invoice, Others)
  • Shop Name
  • Shop Address
  • Date (in ISO 8601 format)
  • Expense Category (e.g. Food, Car Rental; inferred from the expense details)
  • Currency (e.g. USD, HKD; taken from the document or inferred from the shop name, address, etc.)
  • Total Amount
  • Expense Items. Each item has the following properties:
    • Name
    • Quantity
    • Unit Price
    • Subtotal
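
As a rough illustration of the format above, the fields can be modelled as Python dataclasses. This is only a sketch: the JSON key names (`receipt_data`, `image_type`, `expense`, `items`, and so on) follow the sample outputs on this page, while the class names and the `parse_receipt` helper are hypothetical and not part of UnderDoc.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpenseItem:
    name: str
    quantity: float
    unit_price: float
    subtotal: float


@dataclass
class Expense:
    shop_name: str
    shop_address: str
    date: str  # kept as the raw string returned in the JSON
    expense_category: str
    currency: str
    total_amount: float
    items: List[ExpenseItem] = field(default_factory=list)


@dataclass
class ReceiptData:
    image_type: str  # "Receipt", "Invoice" or "Others"
    expense: Expense


def parse_receipt(raw: str) -> ReceiptData:
    """Parse a JSON string shaped like the samples into dataclasses.

    Hypothetical helper: it assumes exactly the keys shown in the
    samples and raises TypeError if unexpected keys are present.
    """
    data = json.loads(raw)["receipt_data"]
    expense = dict(data["expense"])
    items = [ExpenseItem(**item) for item in expense.pop("items", [])]
    return ReceiptData(
        image_type=data["image_type"],
        expense=Expense(items=items, **expense),
    )
```

Typed objects make it easier to catch a missing or misspelled field early, at the cost of failing when the response contains keys that the sketch does not know about.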

1. Household Expenses

Sample 1.1 - Demand for Rates and Government Rent

  • Input is a demand note from the Government for rates and government rent on an apartment (mixed English and Chinese content)

Sample 1.1 Government Rent

  • Output structured data (in JSON format)
Expense Data

{
  "receipt_data": {
    "image_type": "Invoice",
    "expense": {
      "shop_name": "RATING AND VALUATION DEPARTMENT",
      "shop_address": "*****masked************",
      "date": "28/01/2025",
      "expense_category": "Government Rent",
      "currency": "HKD",
      "total_amount": 3130,
      "items": [
        {
          "name": "ODD CENTS B/F",
          "quantity": 0,
          "unit_price": 0.32,
          "subtotal": 0.32
        },
        {
          "name": "Rates",
          "quantity": 1,
          "unit_price": 1956,
          "subtotal": 1956
        },
        {
          "name": "ODD CENTS C/F",
          "quantity": 0,
          "unit_price": 0.32,
          "subtotal": -0.32
        },
        {
          "name": "ODD CENTS B/F",
          "quantity": 0,
          "unit_price": 0.89,
          "subtotal": 0.89
        },
        {
          "name": "Government Rent",
          "quantity": 1,
          "unit_price": 1173.6,
          "subtotal": 1173.6
        },
        {
          "name": "ODD CENTS C/F",
          "quantity": 0,
          "unit_price": 0.49,
          "subtotal": -0.49
        }
      ]
    }
  }
}
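
One simple sanity check on an output like this is to confirm that the item subtotals add up to `total_amount`. The sketch below does that for Sample 1.1, copying only the `subtotal` and `total_amount` values from the JSON above. The function name `subtotals_match_total` is hypothetical, and `Decimal` is used here (an implementation choice, not something UnderDoc prescribes) so that cent values such as 0.32 are not distorted by binary floating point.

```python
from decimal import Decimal


def subtotals_match_total(expense: dict) -> bool:
    """Return True if the item subtotals add up to total_amount."""
    # Go through str() so that 0.32 becomes Decimal("0.32") instead of
    # the long binary approximation of the float.
    total = Decimal(str(expense["total_amount"]))
    items_sum = sum(
        (Decimal(str(item["subtotal"])) for item in expense["items"]),
        Decimal("0"),
    )
    return items_sum == total


# Values copied from the Sample 1.1 output.
sample_1_1 = {
    "total_amount": 3130,
    "items": [
        {"name": "ODD CENTS B/F", "subtotal": 0.32},
        {"name": "Rates", "subtotal": 1956},
        {"name": "ODD CENTS C/F", "subtotal": -0.32},
        {"name": "ODD CENTS B/F", "subtotal": 0.89},
        {"name": "Government Rent", "subtotal": 1173.6},
        {"name": "ODD CENTS C/F", "subtotal": -0.49},
    ],
}

print(subtotals_match_total(sample_1_1))  # True
```

Worked by hand: 0.32 + 1956 - 0.32 + 0.89 + 1173.6 - 0.49 = 3130, which matches the total on the demand note.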

2. Personal Accessories

Sample 2.1 - A receipt for a pair of glasses

  • Input is a receipt for a pair of glasses. The content is in English

Sample 2.1 Receipt for a pair of glasses

  • Output structured data (in JSON format)
Expense Data

{
  "receipt_data": {
    "image_type": "Invoice",
    "expense": {
      "shop_name": "48.Lindberg by Puyi",
      "shop_address": "Shop OT 316, Level 3, Ocean Terminal, Harbour City, Tsimshatsui, Kowloon, Hong Kong",
      "date": "2024-09-07 18:02:08",
      "expense_category": "Optical",
      "currency": "HKD",
      "total_amount": 12498,
      "items": [
        {
          "name": "5802/48-850-140-PU13-P30-K263",
          "quantity": 1,
          "unit_price": 4998,
          "subtotal": 4998
        },
        {
          "name": "ZEISS PAL SmartLife Plus (Asiana) 1.60 PhotoFusion X (Pioneer/Grey/ExtraGrey/Brown/Blue /Black) with i. Scription (DAVP /DVP) (Order) (FF Value 0-6) (Order)",
          "quantity": 2,
          "unit_price": 3750,
          "subtotal": 7500
        },
        {
          "name": "EXTERNAL OCULAR HEALTH CHECK UP $180",
          "quantity": 1,
          "unit_price": 0,
          "subtotal": 0
        },
        {
          "name": "Zeiss Individual Eye Examination with Retinal Screening & Intraocular Pressure Measurement",
          "quantity": 1,
          "unit_price": 0,
          "subtotal": 0
        },
        {
          "name": "2pcs Mooncake gift box",
          "quantity": 1,
          "unit_price": 0,
          "subtotal": 0
        },
        {
          "name": "See It Differently 2024 Eyewear Cloth",
          "quantity": 1,
          "unit_price": 0,
          "subtotal": 0
        },
        {
          "name": "禮券折扣 COUPON DISCOUNT",
          "quantity": 1,
          "unit_price": 0,
          "subtotal": 0
        }
      ]
    }
  }
}
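
The `date` values in the samples do not all share one layout: Sample 1.1 returns "28/01/2025" and Sample 2.1 returns "2024-09-07 18:02:08". If downstream code needs a plain ISO 8601 date, a small normalization step can help. The following is a minimal sketch, assuming only the formats seen on this page and assuming that slash-separated dates are day-first. `to_iso_date` and `KNOWN_FORMATS` are hypothetical names.

```python
from datetime import datetime

# Formats observed in the samples on this page. The day-first reading
# of "28/01/2025" is an assumption; extend or reorder the tuple for
# other locales.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y")


def to_iso_date(value: str) -> str:
    """Return the date part of value as YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {value!r}")


print(to_iso_date("28/01/2025"))           # 2025-01-28
print(to_iso_date("2024-09-07 18:02:08"))  # 2024-09-07
```

Raising on an unknown format, rather than guessing, keeps ambiguous dates such as "01/02/2025" from being silently misread.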

3. Dining

Sample 3.1 - A receipt for breakfast

  • Input is a receipt for breakfast. The content is in Chinese

Sample 3.1 Receipt for breakfast

  • Output structured data (in JSON format)
Expense Data

{
  "receipt_data": {
    "image_type": "Receipt",
    "expense": {
      "shop_name": "深仔記",
      "shop_address": "尖沙咀亞厘道29-39號\n九龍中心地下A舖",
      "date": "2021-04-27",
      "expense_category": "Misc",
      "currency": "HKD",
      "total_amount": 39,
      "items": [
        {
          "name": "早餐拼\n(早優)\n叉燒/面",
          "quantity": 1,
          "unit_price": 36,
          "subtotal": 36
        },
        {
          "name": "早B餐跟\n煎旦/油多",
          "quantity": 1,
          "unit_price": 0,
          "subtotal": 0
        },
        {
          "name": "(跟)冬啡\n少甜",
          "quantity": 1,
          "unit_price": 3,
          "subtotal": 3
        }
      ]
    }
  }
}
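
Item names can span several printed lines on a receipt, which is why the `name` values above contain `\n`. When exporting items to a spreadsheet it is often convenient to collapse those line breaks first. Below is a minimal sketch, using the items from Sample 3.1. The `items_to_csv` helper is hypothetical and only illustrates one way of consuming the output.

```python
import csv
import io


def items_to_csv(expense: dict) -> str:
    """Render the expense items as CSV text, one row per item."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "quantity", "unit_price", "subtotal"])
    for item in expense["items"]:
        # Collapse line breaks and repeated whitespace into single spaces.
        name = " ".join(item["name"].split())
        writer.writerow(
            [name, item["quantity"], item["unit_price"], item["subtotal"]]
        )
    return buf.getvalue()


# Items copied from the Sample 3.1 output.
sample_3_1 = {
    "items": [
        {"name": "早餐拼\n(早優)\n叉燒/面", "quantity": 1, "unit_price": 36, "subtotal": 36},
        {"name": "早B餐跟\n煎旦/油多", "quantity": 1, "unit_price": 0, "subtotal": 0},
        {"name": "(跟)冬啡\n少甜", "quantity": 1, "unit_price": 3, "subtotal": 3},
    ]
}

print(items_to_csv(sample_3_1))
```

The `csv` module quotes any name that contains a comma or quote character, so the rows stay well-formed even for long product descriptions.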

Contact Us

If you have an expense document that could not be parsed correctly, please contact us at:

Our goal is to provide the best document understanding platform.