Read from text file with multiple line formats using TextFieldParser

04 May

When using TextFieldParser to read fixed field width text file with optional fields, MalformedLineExceptions could happen. It happens because the imported line length is shorter than expected.

For example:

reader.SetFieldWidths(5, 8, -1)

' Exception will be thrown when the line is only 12 characters long or less
' because the 3rd field will be regarded as missing

We could prevent the exception by determining which format to use according to the line length.

' DataSetImport.DataTableImport is a strongly typed DataTable defined in DataSet Designer
Private _DataTableImport As New DataSetImport.DataTableImport

Private Sub _OpenFile()


    ' Pop a OpenFileDialog for file path
    Dim openFileDialog As New OpenFileDialog
    If openFileDialog.ShowDialog() = DialogResult.OK Then

        Dim filePath As String = Me.OpenFileDialog.FileName


    End If

    ' Display parsed data in DataGridView
    Me.DataGridViewImport.DataSource = Me._DataTableImport

End Sub

Private Sub _LoadTxt(ByVal filePath As String)

        Using reader As New Microsoft.VisualBasic.FileIO.TextFieldParser(filePath)
            reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.FixedWidth
            reader.TrimWhiteSpace = True

            ' currentLine holds the current line data to determine what format to use
            Dim currentLine As String
            ' currentFields array holds the parsed fields of the current line
            Dim currentFields As String()

            While Not reader.EndOfData
                    ' The format used to parse the current line
                    Dim format As Integer()

                    ' The expected line length which is the last field's index + 1
                    ' For example:
                    '    If the line contains 2 fields and the 2nd field's index is 5,
                    '    then the length here is 6
                    Dim length As Integer = 101

                    ' Using PeekChars() to get the line length when the line length < 101
                    ' Assume there are 4 fields at most which are {20, 40, 40, 10}
                    currentLine = reader.PeekChars(length)
                    ' Using currentLine.Length to get the line format
                    format = Me.GetLineFormat(currentLine.Length)

                    currentFields = reader.ReadFields()
                    Dim currentField As String
                    Dim newRow As DataRow = Me._DataTableImport.NewRow
                    Dim columnIndex As Integer = 0
                    For Each currentField In currentFields
                        newRow(columnIndex) = currentField
                        columnIndex = columnIndex + 1
                Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                    MsgBox("Line " & ex.Message & "is not valid and will be skipped.")
                End Try
            End While
        End Using

    Catch ex As Exception
    End Try

End Sub

Private Function GetLineFormat(ByVal lineLength As Integer) As Integer()
    Dim format As Integer()

    ' Assume there are 4 fields at most which are {20, 40, 40, 10}
    Select Case lineLength
        Case 21 To 60
            format = New Integer() {20, -1}
        Case 61 To 100
            format = New Integer() {20, 40, -1}
        Case 101
            format = New Integer() {20, 40, 40, -1}
    End Select

    Return format
End Function

Reference : How to: Read From Text Files with Multiple Formats in Visual Basic

Leave a comment

Posted by on May 4, 2012 in .NET, VB



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: