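This function checks whether a character is in the CJK Unified Ideographs block, which spans U+4E00 to U+9FFF (20,992 code points). This block holds nearly all the Chinese hanzi, Japanese kanji, and Korean hanja in common usage. Some rarer characters and variants are stored in other blocks, such as CJK Unified Ideographs Extension A (U+3400 to U+4DBF), the extension blocks in the supplementary planes, and the CJK Compatibility Ideographs (U+F900 to U+FAFF). Even so, it's a reliable test for most real-world applications.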
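Each Unicode block is a contiguous range of code points. To cover some of the rarer characters as well, you can test a character's code point against several ranges instead of one. The following is a minimal sketch that assumes you only want the main block plus Extension A, Extension B and the CJK Compatibility Ideographs. The names `CJK_RANGES` and `is_cjk_ideograph` are hypothetical, and the table deliberately leaves out the later extension blocks.

```python
# Hypothetical, non-exhaustive table of inclusive (start, end) code-point ranges.
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs (the main block)
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
)


def is_cjk_ideograph(char):
    """Return True if char's code point falls in any range in CJK_RANGES."""
    code_point = ord(char)
    return any(start <= code_point <= end for start, end in CJK_RANGES)


print(is_cjk_ideograph('一'))           # True  (U+4E00, main block)
print(is_cjk_ideograph('\u3400'))      # True  (Extension A)
print(is_cjk_ideograph('\U00020000'))  # True  (Extension B)
print(is_cjk_ideograph('a'))           # False
```

Because the ranges live in a table, you can add the remaining extension blocks if your data needs them. Keep in mind that block boundaries can include code points that are unassigned in a given Unicode version.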
```python
def is_hanzi(char):
    """Check for CJK Unified Ideograph."""
    return 0x4E00 <= ord(char) <= 0x9FFF
```
`ord` is a built-in function that returns the Unicode code point of a single character as an integer.
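The following calls show what `ord` returns, how to print a code point in the usual `U+XXXX` notation, and how the built-in `chr` reverses the mapping. `ord` only accepts a string of length one, so a longer string raises a `TypeError`.

```python
# ord maps a one-character string to its integer code point.
print(ord('a'))   # 97
print(ord('一'))  # 19968, i.e. 0x4E00, the first code point of the block

# Format a code point in the conventional U+XXXX notation.
print(f"U+{ord('见'):04X}")  # U+89C1

# chr is the inverse of ord.
print(chr(0x4E00) == '一')  # True

# ord rejects strings that are not exactly one character long.
try:
    ord('一见')
except TypeError as exc:
    print('TypeError:', exc)
```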
```python
for char in 'a. 一见钟情':
    print(is_hanzi(char))
```
```
False
False
False
True
True
True
True
```
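To find the hanzi in a longer string instead of testing one character at a time, you can write the same range as a regular-expression character class. This sketch uses Python's standard `re` module. The names `HANZI_RUN` and `find_hanzi_runs` are illustrative. Like `is_hanzi`, the pattern only covers U+4E00 to U+9FFF.

```python
import re

# One or more consecutive characters from the CJK Unified Ideographs block.
HANZI_RUN = re.compile('[\u4e00-\u9fff]+')


def find_hanzi_runs(text):
    """Return each run of consecutive U+4E00..U+9FFF characters in text."""
    return HANZI_RUN.findall(text)


print(find_hanzi_runs('a. 一见钟情'))           # ['一见钟情']
print(find_hanzi_runs('Tokyo 東京 and 北京!'))  # ['東京', '北京']
```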
If you spot any errors, please let me know on Twitter.