Hello folks, and welcome back to our study time!
In our last article, we learned how to easily fetch cryptocurrency data from CoinMarketCap and integrate it into a Fabric Lakehouse in a notebook environment. Today, we're continuing our data journey by delving into the exciting world of medallion architecture.
Think of it as giving our data engine a turbo boost! We'll be using Microsoft Fabric Warehouses to create our Data Warehouse and Pipelines to move data between our three layers.
This journey unfolds across three posts, starting right here. We're diving deep into the concept of a medallion architecture, examining how it can improve our data operations, and finally loading data into our model in the bronze layer.
I hope you enjoy these three articles, and if you have any suggestions, questions, or comments, please feel free to share them.
So, let's dig in and understand what medallion architecture is all about.
Medallion Architecture
We can define a medallion architecture as a data management framework that consists of three main layers: bronze, silver, and gold. Each layer has a specific role in the data processing and analytics pipeline:
- Bronze Layer: The first step in the data processing journey. Here, data from different sources is brought into the system in its original form, without any modifications. This allows for a complete and untouched view of the organization's data. The bronze layer serves as a starting point for new information, making it easy to quickly access and use raw data.
- Silver Layer: Here the data from the bronze layer goes through transformation and cleansing to improve its quality and usability. Tasks such as data cleaning, normalization, and enrichment are performed in this layer, preparing the data for further analysis and insight generation. The silver layer acts as an intermediate stage where data is refined and standardized before being used for business intelligence and reporting purposes.
- Gold Layer: At the final stage of the data lifecycle, the gold layer holds carefully curated data that is optimized for advanced analytics and decision-making. This structured and aggregated data is stored for efficient querying and analysis, making it easy for business users and data analysts to explore and visualize. Typically stored in data warehouses or data lakes, the gold layer contains datasets that are ready for consumption.
Overall, medallion architecture provides a structured and scalable framework for managing data throughout its lifecycle, from ingestion to consumption. By incorporating multiple layers of processing and transformation, organizations can ensure the reliability, quality, and usability of their data assets, empowering informed decision-making and driving business value.
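To make the three roles concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how Fabric implements the layers: the sample records and the helper names `to_silver` and `to_gold` are made up for this example.

```python
from statistics import mean

# Bronze: raw records exactly as they arrived (hypothetical sample data).
bronze = [
    {"symbol": "btc", "price": "100.0"},
    {"symbol": "BTC", "price": "110.0"},
    {"symbol": "eth", "price": None},
    {"symbol": "ETH", "price": "20.0"},
]


def to_silver(raw_rows):
    """Clean and standardize: drop incomplete rows, fix casing and types."""
    return [
        {"symbol": row["symbol"].upper(), "price": float(row["price"])}
        for row in raw_rows
        if row["price"] is not None
    ]


def to_gold(clean_rows):
    """Aggregate into a consumption-ready view: average price per symbol."""
    prices_by_symbol = {}
    for row in clean_rows:
        prices_by_symbol.setdefault(row["symbol"], []).append(row["price"])
    return {symbol: mean(prices) for symbol, prices in prices_by_symbol.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Notice that the bronze list is never modified: each layer is derived from the one before it, so the raw data stays available if a later step has to be rerun.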
Now that we have a grasp of what a medallion architecture is, let's get ready to jump in and start working. It's time to turn our knowledge into action and start building our bronze layer.
The Bronze Layer
Let's start coding our Bronze Layer in a new Notebook. If you're unsure about how Fabric works or how to create a new Notebook, just refer to my last article.
This code snippet demonstrates the process of making an API request to fetch the latest cryptocurrency listings from CoinMarketCap and handling the response using Python.
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("Session").getOrCreate()

# This example uses the python-requests library.
from requests import Request, Session
from requests.exceptions import ConnectionError, Timeout, TooManyRedirects
import json

url = 'https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest'
parameters = {
    'start': '1',
    'convert': 'USD'
}
headers = {
    'Accepts': 'application/json',
    'X-CMC_PRO_API_KEY': '8f30839a-97a2-4e88-83fb-cc6b58825494',  # replace with your own API key
}

session = Session()
session.headers.update(headers)

try:
    response = session.get(url, params=parameters)
    data = json.loads(response.text)
    # data = response.json()
    print("JSON Loaded")
except (ConnectionError, Timeout, TooManyRedirects) as e:
    print(e)
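The exception handling above only covers network-level failures. A request that reaches the API but is rejected (for example, because of an invalid key) still comes back as JSON. The sketch below shows one way to check the parsed response before using it. It assumes the response carries a `status` object with `error_code` and `error_message` fields next to the `data` list, so please confirm that against the CoinMarketCap documentation. The helper name `extract_listings` and the sample payloads are hypothetical.

```python
def extract_listings(payload):
    """Return the list of coins from a parsed response, or raise on an API error.

    Assumes a 'status' object with 'error_code' and 'error_message' fields.
    """
    status = payload.get("status", {})
    error_code = status.get("error_code", 0)
    if error_code:
        raise RuntimeError(f"API error {error_code}: {status.get('error_message')}")
    listings = payload.get("data")
    if not isinstance(listings, list):
        raise ValueError("Response has no 'data' list")
    return listings


# Hypothetical payloads, used only to exercise the helper.
ok_payload = {
    "status": {"error_code": 0, "error_message": None},
    "data": [{"name": "Bitcoin", "symbol": "BTC"}],
}
bad_payload = {"status": {"error_code": 1001, "error_message": "This API Key is invalid."}}

print(extract_listings(ok_payload))
try:
    extract_listings(bad_payload)
except RuntimeError as err:
    print(err)
```

Failing early here keeps an error response from being written to the bronze layer as if it were real data.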
Now it's time to normalize our JSON data, add a timestamp column, and save the resulting DataFrame as a CSV file to Azure Data Lake Storage.
import pandas as pd
import pytz
from datetime import datetime
import os

# Normalize the data and add a timestamp
df = pd.json_normalize(data['data'])
#df['name'] = df['name'].astype("string")
#df['symbol'] = df['symbol'].astype("string")
#df['slug'] = df['slug'].astype("string")
#df['date_added'] = df['date_added'].astype("string")
df['tags'] = df['tags'].astype("string")
df['tvl_ratio'] = df['tvl_ratio'].astype("string")
df['quote.USD.tvl'] = df['quote.USD.tvl'].astype("string")
#df['platform.name'] = df['platform.name'].astype("string")
#df['platform.symbol'] = df['platform.symbol'].astype("string")
#df['platform.slug'] = df['platform.slug'].astype("string")
#df['platform.token_address'] = df['platform.token_address'].astype("string")

# Define New York timezone
ny_tz = pytz.timezone('America/New_York')
# Get current time in New York timezone
current_time_ny = datetime.now(ny_tz)
# Add current time in New York timezone as a new column
local_datetime = current_time_ny.strftime('%Y-%m-%d %H:%M:%S')
local_date = current_time_ny.strftime('%Y-%m-%d')
df['log_date'] = local_datetime
#print(df.dtypes)
display("JSON Normalized")

# Save current CSV file
file_name = "CoinMarketCap" + local_date + ".csv"
abfs = "abfss://a20e1f0c-e94e-4349-b995-c2479037a2e9@onelake.dfs.fabric.microsoft.com/156c7e07-d0d0-4cf1-8208-5b1dcb5dc729/Files/Bronze/"
file_path = abfs + file_name
# Check if the file exists
if os.path.exists(file_path):
    # If the file exists, delete it
    os.remove(file_path)
# Save CSV
df.to_csv(file_path)
display("CSV Saved")
After executing both code blocks, you'll find your CSV files saved in a folder named Bronze, located within the Files directory.
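If you're wondering where column names such as quote.USD.tvl come from: pd.json_normalize flattens nested dictionaries into columns whose names are joined with dots, while lists (like tags) are left as they are. The sketch below imitates that idea in plain Python. It is a simplified illustration rather than pandas' actual implementation, and the `flatten` helper and sample record are made up for this example.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are.
            flat[full_key] = value
    return flat


# Hypothetical record shaped like one entry of the API's 'data' list.
sample = {
    "name": "Bitcoin",
    "tags": ["mineable", "pow"],
    "quote": {"USD": {"price": 100.0, "tvl": None}},
}

flat = flatten(sample)
print(sorted(flat))
```

Because tags remains a list after flattening, casting it to a string before saving gives the CSV a single, predictable text column.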
Conclusion
In this first stage of our series on implementing a medallion architecture in Microsoft Fabric, we started by delving into the fundamental concepts and actively coding the bronze layer. We explored how Microsoft defines medallion architecture as a structured system for improving data processing and analytics.
Using Notebooks, we walked through the steps to retrieve data from an API, normalize it, add a timestamp, and save it in Azure Data Lake Storage. This practical approach has offered valuable perspectives on implementing a medallion architecture.
In the next articles, we'll explore the creation of the silver and gold layers, improving our data pipeline and discovering more valuable insights from our data. Keep an eye out for more hands-on demonstrations as we progress in our quest to master medallion architecture in Microsoft Fabric.