Normal form conversion

03 Oct 2023

In the world of databases, one of the key concepts is achieving a normal form which helps in organizing and structuring data efficiently. Normal forms ensure that data is stored without redundancy and dependency issues. In this blog post, we will explore how to convert data into normal form using Python.

What is Normal Form?

A normal form is a set of rules that define the structure and organization of data in a database. There are various levels of normal forms, each with its own set of rules. The most commonly used normal forms are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF).

Converting to First Normal Form (1NF)

The first step in converting data into normal form is ensuring that it meets the requirements of 1NF. To achieve 1NF, we need to eliminate duplicate rows and make sure that each attribute contains only atomic values.

# Example code for converting to 1NF
import pandas as pd

# Read the data from a CSV file
df = pd.read_csv('data.csv')

# Remove duplicate rows
df = df.drop_duplicates()

# Split values in a column with multiple values
df = df.assign(categories=df['categories'].str.split(','))

# Normalize the data by creating a new table for categories
categories_df = pd.DataFrame(df['categories'].explode().unique(), columns=['category'])

# Merge the normalized table with the original data
merged_df = df.merge(categories_df, left_on='product_id', right_on='category', how='left')

# Drop the unnecessary columns
normalized_df = merged_df.drop(['categories', 'category'], axis=1)

# Save the normalized data to a new CSV file
normalized_df.to_csv('normalized_data.csv', index=False)

Converting to Second Normal Form (2NF)

Once the data is in 1NF, the next step is to achieve 2NF. In 2NF, we eliminate partial dependencies by ensuring that every non-key attribute depends on the entire key.

# Example code for converting to 2NF
import pandas as pd

# Read the data from a CSV file
df = pd.read_csv('1nf_data.csv')

# Split the table into separate tables based on the key attributes
keys_df = df[['product_id', 'product_name']]
dependencies_df = df[['product_id', 'supplier_id', 'supplier_name']]

# Save the normalized tables to new CSV files
keys_df.to_csv('2nf_keys.csv', index=False)
dependencies_df.to_csv('2nf_dependencies.csv', index=False)

Converting to Third Normal Form (3NF)

The final step is to achieve 3NF by eliminating transitive dependencies. In 3NF, every non-key attribute should depend only on the key attributes, not on other non-key attributes.

# Example code for converting to 3NF
import pandas as pd

# Read the data from a CSV file
df = pd.read_csv('2nf_keys.csv')

# Split the table into separate tables based on the relationships
products_df = df[['product_id', 'product_name']]
suppliers_df = df[['supplier_id', 'supplier_name']]

# Save the normalized tables to new CSV files
products_df.to_csv('3nf_products.csv', index=False)
suppliers_df.to_csv('3nf_suppliers.csv', index=False)

Summary

Converting data into normal form is crucial for efficient data storage and management. By following the steps mentioned above and using Python, we can easily convert data into First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). The normalization process helps in reducing redundancy, improving data integrity, and simplifying database queries.

#databasenormalization #pythonprogramming