Data conversion is a common task in various fields, and one frequent scenario involves converting Excel files (both .xls and .xlsx formats) to CSV (Comma-Separated Values) files. The challenge arises when dealing with special characters or non-English alphabets, requiring UTF-8 encoding to ensure data integrity. This article provides a comprehensive guide on how to convert Excel files to CSV with UTF-8 encoding, ensuring that your data is accurately represented and ready for use in various applications.
UTF-8 is a character encoding capable of encoding all possible characters (called code points) in Unicode. It's crucial when your Excel files contain characters outside the standard ASCII range, such as accented letters (é, à, ü), characters from non-English languages (Hebrew, Chinese, etc.), or special symbols. Without UTF-8 encoding, these characters may be garbled or replaced with incorrect symbols during the conversion process, leading to data corruption.
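The garbling described above is easy to reproduce. The following is a minimal Python sketch; the sample word and the choice of Windows-1252 as the "wrong" encoding are illustrative assumptions, not something taken from a particular Excel export.

```python
# Illustrative sample text; any string with non-ASCII characters behaves similarly.
text = "café"

# Encode the text as UTF-8 bytes (the "é" becomes two bytes).
utf8_bytes = text.encode("utf-8")

# Decoding with the matching encoding round-trips cleanly.
assert utf8_bytes.decode("utf-8") == text

# Decoding the same bytes as Windows-1252 produces the familiar garbling.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # cafÃ©

# Plain ASCII cannot represent the character at all, so it is replaced.
lossy = text.encode("ascii", errors="replace")
print(lossy)  # b'caf?'
```

The same mismatch between the encoding used to write a file and the one used to read it is what typically corrupts accented or non-Latin text in a converted CSV.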
Microsoft Excel offers a built-in method to save files as CSV with UTF-8 encoding, although it requires a few extra steps:
1. Open the .xls or .xlsx file you want to convert.

Note: Some older versions of Excel might not have the option to save directly as UTF-8 CSV. If this is the case, see Method 2 for a workaround.
If the direct method doesn't work or isn't available, you can use a combination of Excel and Notepad to achieve the desired result:
1. Open the .xlsx file using Microsoft Excel.
2. Save it as a tab-delimited .txt file (for example, with the "Unicode Text (*.txt)" file type in the Save As dialog).
3. Open the .txt file with Microsoft Notepad.
4. Press Ctrl+H to open the "Find and Replace" window. Copy a tab character from the file (the space between two column headers) and paste it into the "Find what" field. Enter a comma (,) in the "Replace with" field. Click "Replace All".
5. Save the file with a .csv extension. Crucially, change the "Encoding" option to "UTF-8". Click "Save".
6. Open the .csv file in Excel to ensure your data is displayed correctly.

LibreOffice Calc, a free and open-source spreadsheet program, often handles UTF-8 encoding more reliably than Excel:
1. Open the .xls or .xlsx file with LibreOffice Calc.

For those who need to automate the conversion process, a VBScript can be used. This is particularly useful for developers or users who regularly convert files:
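'***********************************************************************
'* file: SaveAsUTF8.vbs
'* usage: cscript SaveAsUTF8.vbs "C:\full\path\to\workbook.xlsx"
'***********************************************************************
Option Explicit
Dim fso, ex, wb, sInputFile, sOutputFile

If WScript.Arguments.Count < 1 Then
    WScript.Echo "Usage: cscript SaveAsUTF8.vbs <ExcelFile>"
    WScript.Quit 1
End If

' Resolve to an absolute path: Excel resolves relative paths against its own folder
Set fso = CreateObject("Scripting.FileSystemObject")
sInputFile = fso.GetAbsolutePathName(WScript.Arguments(0))
WScript.Echo "Excel input file: " & sInputFile

Set ex = CreateObject("Excel.Application")
ex.DisplayAlerts = False
Set wb = ex.Workbooks.Open(sInputFile)

'https://learn.microsoft.com/en-us/office/vba/api/office.msoencoding
wb.WebOptions.Encoding = 65001 ' msoEncodingUTF8
ex.DefaultWebOptions.Encoding = 65001 ' msoEncodingUTF8

' Same folder and base name as the input, with a .txt extension (works for .xls and .xlsx)
sOutputFile = fso.BuildPath(fso.GetParentFolderName(sInputFile), fso.GetBaseName(sInputFile) & ".txt")

'https://learn.microsoft.com/en-us/office/vba/api/excel.xlfileformat
wb.Worksheets(1).SaveAs sOutputFile, 20 ' 20 = xlTextWindows (tab-delimited text)
wb.Close False ' the output is already written; do not prompt to save again
ex.Quit
WScript.Echo "Text file has been created: " & sOutputFile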
1. Save the script above as a .vbs file (e.g., SaveAsUTF8.vbs).
2. Create a .bat file with the following content, replacing the file path with your Excel file's path:

cscript SaveAsUTF8.vbs "D:\Documents\YourExcelFile.xlsx"
pause

3. Run the .bat file to execute the script. This will convert the first sheet of your Excel file to a tab-separated .txt file with UTF-8 encoding.

Note: This script saves the file as a TXT file with tab-separated values. You may need to manually change the file extension to .csv and replace tab characters with commas for full CSV compatibility.
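The tab-to-comma step mentioned in the note can also be scripted rather than done by hand. The sketch below is one possible approach in Python using the standard csv module; the function name tsv_to_csv_utf8 and its src_encoding parameter are hypothetical, and the source encoding is an assumption you may need to change (for example to "utf-16" for a file saved from Excel as Unicode Text). Unlike a plain find-and-replace, the csv writer quotes any field that itself contains a comma.

```python
import csv

def tsv_to_csv_utf8(src_path, dst_path, src_encoding="utf-8"):
    """Rewrite a tab-separated text file as a comma-separated UTF-8 file."""
    # newline="" lets the csv module manage line endings itself.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        # The default writer uses commas and quotes fields that contain
        # commas, quotes, or line breaks.
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
```

A call such as tsv_to_csv_utf8("YourExcelFile.txt", "YourExcelFile.csv") would write the converted file next to the original. The "utf-8-sig" codec writes a byte order mark at the start of the file, which helps Excel recognize the file as UTF-8 when it is opened directly.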
If cells contain unwanted extra spaces, use Excel's TRIM() function to remove these spaces before converting to CSV.

Converting Excel files to CSV with UTF-8 encoding can be tricky, but by following these methods and troubleshooting tips, you can ensure that your data is accurately converted and ready for use in various applications. Choose the method that best suits your needs and technical skills, and always verify the output to ensure data integrity.
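One way to verify the output is to check that the converted file actually decodes as UTF-8. The following Python sketch does that; the function name check_utf8 is hypothetical. A file that passes this check is not guaranteed to display correctly everywhere, but a file that fails was certainly not saved as UTF-8.

```python
import codecs

def check_utf8(path):
    """Return (is_valid_utf8, has_bom) for the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    # A UTF-8 byte order mark is the three bytes EF BB BF at the start.
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False, has_bom
    return True, has_bom
```

For example, check_utf8("YourExcelFile.csv") returning (False, False) would suggest the file was written in a legacy encoding and should be converted again with one of the methods above.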