Methods to Load Data using Python TensorFlow

There are two common ways to load data for TensorFlow:

  1. Load Data using NumPy Array
  2. Load Data using TensorFlow Data Pipeline

Load Data using NumPy Array

We can hard-code data into a NumPy array, or load data from an Excel (xls or xlsx) or CSV file into a pandas DataFrame and then convert it into a NumPy array. This method suits datasets that are not too big (for example, less than 10 gigabytes), because the whole dataset must fit into your machine's memory.

import numpy as np
import pandas as pd

## Python list to pandas DataFrame
h = [[1,2],[3,4]]
df_h = pd.DataFrame(h)
print('Data Frame:', df_h)

## pandas DataFrame to NumPy array
df_h_n = np.array(df_h)
print('Numpy array:', df_h_n)

The output of the above code will be:

Data Frame:    0  1
0  1  2
1  3  4
Numpy array: [[1 2]
 [3 4]]
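The example above builds the DataFrame from a hard-coded list. For the file-based route (CSV into a pandas DataFrame, then into a NumPy array), a minimal sketch might look like the following. The CSV contents and the column names height and weight are hypothetical, and io.StringIO stands in for a real file path only so the snippet runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents; with a real file you would pass its path,
# e.g. pd.read_csv("data.csv"). For Excel, pd.read_excel works similarly
# (it needs an engine such as openpyxl for .xlsx files).
csv_text = """height,weight
170,65
182,80
158,52
"""

df = pd.read_csv(io.StringIO(csv_text))   # load into a pandas DataFrame
data = df.to_numpy(dtype=np.float32)      # convert to a NumPy array

print(df.columns.tolist())  # ['height', 'weight']
print(data.shape)           # (3, 2)
print(data.nbytes)          # 24: 3 rows x 2 columns x 4 bytes per float32
```

Checking nbytes on a sample of your data is one rough way to estimate whether the full dataset will fit into memory before choosing this approach.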

Load Data using TensorFlow Data Pipeline

TensorFlow has a built-in API, tf.data, that helps you load the data, perform operations on it and feed it to a machine learning algorithm. This method works best when you have a very large dataset. Image records, for instance, are often too large to fit into memory; the data pipeline manages memory by itself, reading the data piece by piece instead of holding all of it at once. For example, if you have a dataset of 50 gigabytes and your computer has only 16 gigabytes of memory, trying to load everything at once will exhaust the memory and the program will crash.
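To make the streaming idea concrete, here is a minimal sketch assuming TensorFlow 2.x with eager execution. tf.data.TextLineDataset reads a text file line by line as the pipeline is iterated, rather than loading the whole file up front. The file records.csv, its two numeric columns and the helper parse_line are hypothetical; the file is written to a temporary directory only so the example is self-contained.

```python
import os
import tempfile

import tensorflow as tf

# Write a small hypothetical CSV file; in practice this would be a file
# far larger than the available RAM.
path = os.path.join(tempfile.mkdtemp(), "records.csv")
with open(path, "w") as f:
    f.write("a,b\n")
    for i in range(6):
        f.write(f"{i},{i * 10}\n")

def parse_line(line):
    # Hypothetical parser: decode_csv turns one text line into column tensors.
    a, b = tf.io.decode_csv(line, record_defaults=[[0.0], [0.0]])
    return tf.stack([a, b])

dataset = (
    tf.data.TextLineDataset(path)   # reads the file lazily, line by line
    .skip(1)                        # skip the header row
    .map(parse_line)                # text line -> float tensor of shape (2,)
    .batch(4)                       # group rows into batches of 4
)

batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
```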

In these circumstances, you need to build a TensorFlow pipeline. The pipeline loads the data in batches, or small chunks. Each batch is pushed through the pipeline and is then ready for training. Building a pipeline is an excellent solution because it lets you use parallel computing: TensorFlow can prepare data on multiple CPU cores, for example preprocessing the next batches while the model trains on the current one. This speeds up computation and makes it practical to train powerful neural networks.
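Batching and parallel data preparation map onto a few tf.data methods. A minimal sketch, assuming TensorFlow 2.4 or later (for tf.data.AUTOTUNE); the doubling step is a hypothetical stand-in for real preprocessing:

```python
import numpy as np
import tensorflow as tf

features = np.arange(10, dtype=np.float32)

dataset = (
    tf.data.Dataset.from_tensor_slices(features)
    # Hypothetical preprocessing, run on several CPU cores in parallel.
    .map(lambda v: v * 2.0, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)                       # small chunks instead of the whole dataset
    .prefetch(tf.data.AUTOTUNE)     # prepare upcoming batches ahead of time
)

batches = [b.numpy().tolist() for b in dataset]
print(batches)  # [[0.0, 2.0, 4.0, 6.0], [8.0, 10.0, 12.0, 14.0], [16.0, 18.0]]
```

Here num_parallel_calls lets map run on several cores and prefetch overlaps data preparation with consumption; by default the output order is still preserved, and the last batch is simply smaller when the data does not divide evenly.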

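Steps to create a TensorFlow data pipeline (these steps use the TensorFlow 1.x graph API, with tf.placeholder, tf.Session and make_initializable_iterator; in TensorFlow 2.x the equivalents are only available under tf.compat.v1 and require eager execution to be disabled):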

  1. Create the Data:

    import numpy as np  
    import tensorflow as tf  
    x_input = np.random.sample((1,2))  
    print(x_input)

    In the above code, we generate two random numbers with NumPy's random number generator: np.random.sample((1,2)) returns a 1×2 array of floats in the range [0, 1).

  2. Create the Placeholder

    x = tf.placeholder(tf.float32, shape=[1,2], name = 'X')

    We create a placeholder x with tf.placeholder(). Its shape [1,2] matches x_input, and the actual values are supplied later, when the session runs.

  3. Define the Dataset Method

    dataset = tf.data.Dataset.from_tensor_slices(x)

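    We build the dataset with tf.data.Dataset.from_tensor_slices(), which slices x along its first dimension. Because x has shape [1,2], the dataset contains a single element of shape [2].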

  4. Create the Pipeline

    iterator = dataset.make_initializable_iterator()   
    get_next = iterator.get_next()

    In the above code, we create the pipeline through which the data will flow. make_initializable_iterator() creates an iterator, which we name iterator, that must be initialized before use. Calling iterator.get_next() returns the operation that supplies the next element of data; we name it get_next. Note that in this example there is only one element, holding two values.

  5. Execute the Operation

    with tf.Session() as sess:  
        # feed the placeholder with data  
        sess.run(iterator.initializer, feed_dict={ x: x_input })   
        print(sess.run(get_next)) # output [ 0.52374458  0.71968478]  

    In the above code, we start a session and run the iterator's initializer, passing the values generated by NumPy through feed_dict. These two values populate the placeholder x. Then we run get_next to print the result.
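Step 3 relies on from_tensor_slices splitting its input along the first dimension. A minimal sketch of that behaviour, assuming TensorFlow 2.x with eager execution and a hypothetical 3×2 array (three rows of two values each):

```python
import numpy as np
import tensorflow as tf

# Hypothetical 3x2 array: three rows of two values each.
rows = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)

dataset = tf.data.Dataset.from_tensor_slices(rows)
print(dataset.element_spec)   # each element is one row, of shape (2,)

elements = [e.numpy().tolist() for e in dataset]
print(elements)               # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
```

With the 1×2 placeholder from step 2, the same slicing yields exactly one element of shape (2,).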

TensorFlow_Pipeline.py

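import numpy as np
import tensorflow as tf

# Placeholders and sessions are TensorFlow 1.x graph-mode APIs; tf.compat.v1
# exposes them in TensorFlow 2.x, where eager execution must be turned off.
tf.compat.v1.disable_eager_execution()

x_input = np.random.sample((1, 2))
print(x_input)
# using a placeholder
x = tf.compat.v1.placeholder(tf.float32, shape=[1, 2], name='X')
dataset = tf.data.Dataset.from_tensor_slices(x)
iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
get_next = iterator.get_next()
with tf.compat.v1.Session() as sess:
    # feed the placeholder with data
    sess.run(iterator.initializer, feed_dict={x: x_input})
    print(sess.run(get_next))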

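The output will look like the following (the values are random, so yours will differ):

[[0.87908525 0.80727791]]
[0.87908524 0.8072779 ]

The first line is x_input printed by NumPy: an array of shape (1, 2) in float64. The second line is the single element returned by get_next: because from_tensor_slices slices along the first dimension, it has shape (2,), and because the placeholder is tf.float32, the values are rounded to single precision, which is why the last digits differ slightly.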
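For comparison, here is a minimal sketch of the same pipeline written for TensorFlow 2.x with eager execution (the default there). This is an assumed modern equivalent, not the method used above: the NumPy array is passed straight to from_tensor_slices, and a plain Python iterator takes the place of the placeholder, feed_dict, initializable iterator and session.

```python
import numpy as np
import tensorflow as tf

x_input = np.random.sample((1, 2))
print(x_input)

# In eager mode the data is passed directly: no placeholder, no feed_dict,
# no session. Casting to float32 mirrors the tf.float32 placeholder above.
dataset = tf.data.Dataset.from_tensor_slices(x_input.astype(np.float32))

# A Python iterator replaces make_initializable_iterator() / get_next().
iterator = iter(dataset)
first = next(iterator)
print(first.numpy())
```

In a training loop you would usually iterate with a for loop, or pass the dataset directly to a Keras model's fit method, rather than calling next by hand.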