Last Updated on October 30, 2023 by Ankit Kochar
The ord function in Python is a versatile tool that allows you to convert a single character into its corresponding Unicode code point integer value. Unicode is a standard that represents characters from various writing systems, allowing Python to work with a wide range of characters and symbols beyond the ASCII character set.
In this article, we will delve into the ord) function in Python, exploring its functionality and use cases. We will discuss how it can be employed to obtain Unicode code points for characters, enabling you to work with text data that includes non-ASCII characters and special symbols. Understanding ord() is valuable for tasks such as text processing, internationalization, and character analysis within Python programs. Let’s deep dive and discuss what does ord do in Python.
What is Ord Function in Python?
ord() function in python is a built-in function that is used to return an integer representing the Unicode code point of a specified character. The Unicode character set is a universal encoding standard that can represent any character from any language in the world. It assigns a unique numerical value, or code point, to each character. When given a Unicode object as an argument, the ord function in python returns an integer corresponding to the character’s Unicode code point; if the parameter is an 8-bit string, it returns the value of the byte.
Syntax of Ord Function in Python
The syntax of the ord function in python is quite simple. It takes a single argument, which is a string of length 1 containing the character whose Unicode code point we want to find.
Here’s the syntax:
Parameter of the Ord Function in Python
The character argument can be a string, a variable that contains a string, or a single character enclosed in quotes.
Return Value of the Ord Function in Python
The ord function in python will return an integer that represents the Unicode of the given character.
Unicode is a character encoding standard that allows for the representation and processing of text in various writing systems and languages. It provides a unified, standardized way to represent characters from different scripts and languages in a digital format, regardless of the platform, device, or program being used.
Prior to Unicode, there were many different character encoding systems, which made it difficult to transfer or display text correctly across different systems. Unicode solved this problem by assigning a unique code point to each character, which allowed for consistent representation of text in different languages and scripts.
Unicode covers a vast range of characters from many different writing systems, including the Latin alphabet, Arabic, Cyrillic, Chinese, Japanese, and many others. It also includes a wide variety of special characters, such as punctuation, mathematical symbols, and emojis.
The most common encoding scheme used with Unicode is UTF-8, which is a variable-length encoding system that allows for efficient use of space while still supporting the full range of Unicode characters. Other encoding schemes, such as UTF-16 and UTF-32, are also available but are less commonly used.
- UTF-8 is the most widely used encoding scheme for Unicode. It uses a variable-length encoding system, which means that each character is represented by one to four bytes, depending on its Unicode code point. ASCII characters, which are the most commonly used characters in the English language, are represented by a single byte in UTF-8. This makes UTF-8 a space-efficient encoding scheme for text that includes mostly ASCII characters, while still allowing for the representation of characters from many different scripts and languages.
- UTF-16 is a fixed-length encoding scheme that uses two bytes to represent most characters, and four bytes to represent certain characters that require more space. It is used primarily in Windows systems and for certain languages that require a larger character set, such as Chinese and Japanese.
- UTF-32 is a fixed-length encoding scheme that uses four bytes to represent every character. It is less commonly used than UTF-8 and UTF-16 but provides a simple and efficient way to process and manipulate text, particularly for certain applications that require fixed-length characters, such as text editors and word processors.
Example 1 of Ord Function in Python: Using the ord function on a character
We will now see the code and implementation of the above-mentioned code.
s = 'Hello, world!' print(ord(s))
Explanation of the above example
In this code, we have defined a string variable s with the value ‘Hello, world!’. We then call the ord function with the argument s, which selects the first character of the string (in this case, ‘H’).
The ord function returns the Unicode code point of the character, which is an integer value representing the character in the Unicode standard. In this case, the code point of ‘H’ is 72.
Unicode is a standard for encoding characters from all the world’s writing systems. It assigns each character a unique code point, which is a non-negative integer in the range 0 to 1,114,111 (hexadecimal 0x10FFFF). The first 128 code points are reserved for ASCII characters, which have the same values as their ASCII codes.
The ord function in Python takes a string of length 1 and returns the Unicode code point of its first character. If you want to convert an entire string to a list of code points, you can use a list comprehension
Example 2 of the Ord Function in Python: Error Condition
We will see the code and implementation of the above-mentioned example.
ord(65) ord("ab") ord('\ud83d')
Traceback (most recent call last):
File "c:\Users\Hp\Downloads\input1.py", line 1, in
ord(65) # TypeError: ord() expected string of length 1, but int found
TypeError: ord() expected string of length 1, but int found
Explanation of the above code
Here above error is raised because the character \ud83d is only half of a surrogate pair and is not a valid Unicode character on its own.
It’s also worth noting that ord() can only handle characters in the BMP (Basic Multilingual Plane) range of Unicode, which includes characters with code points up to 0xFFFF. Characters outside this range, such as emoji characters, will require additional handling to properly convert them to integer values.
Chr Function in Python
There is no char() function in Python. However, the closest equivalent is the chr() function, which is used to convert an integer representing an ASCII code to its corresponding character.
The chr() function takes a single argument, an integer representing an ASCII code, and returns the corresponding character. For example, chr(65) would return the uppercase letter ‘A’, since 65 is the ASCII code for ‘A’.
The chr() function is often used in conjunction with the ord() function, which converts a character to its corresponding ASCII code. Together, these functions can be used to convert between characters and their ASCII codes.
It is important to note that the chr() function only works with ASCII codes in the range 0-127. If you try to pass an integer outside this range to chr(), a ValueError will be raised.
Example of Chr and Ord Function in Python
We will see the code with implementation and output of the above-mentioned example.
Explanation of the above example
In the above example we have used the ord function which will give the Unicode of the given character and after that with chr function when we put the Unicode value it will return the corresponding character.
The ord function in Python is a valuable utility for obtaining the Unicode code point of a character. It plays a crucial role in text processing, especially when dealing with non-ASCII characters, special symbols, or internationalization tasks. By using ord(), you can bridge the gap between textual data and numerical representations, facilitating a wide range of operations on strings.
In this article, we’ve explored the ord function in Python, its functionality, and common use cases. We’ve also discussed the importance of Unicode and its role in handling text data in a globally connected world.
As you continue to work with text and strings in Python, remember that ord() is a powerful tool that empowers you to manipulate and analyze characters in a meaningful way, transcending language and encoding barriers.
Frequently Asked Questions Related to Ord Function in Python
Here are some FAQs related to Ord Function in Python.
1. Are there any limitations to using ord() for Unicode handling?
While ord() is useful for most Unicode characters, it may not work correctly with certain rare or less commonly used characters. Additionally, it doesn’t handle multi-character strings, as it expects a single character as input..
2. How do I use the ord() function?
To use the ord() function, simply pass a single character (e.g., a letter or a symbol) as an argument. It will return the Unicode code point integer value for that character.
3. What is Unicode, and why is it important in Python?
Unicode is a character encoding standard that represents characters from various writing systems. It allows Python to work with a wide range of characters and symbols beyond the ASCII character set, making it essential for internationalization and text processing.
4. Can I use the ord() function with non-ASCII characters?
Yes, the ord() function can be used with non-ASCII characters, such as accented letters, Greek characters, or emoji. It will return the Unicode code point for the provided character.
5. Is there a reverse function for ord() to convert Unicode code points back to characters?
Yes, you can use the chr() function in Python to convert a Unicode code point integer value back into its corresponding character. For example, chr(65) returns the character ‘A’.