Data Pre-processing and text analytics using Orange
1. Discretization
class Orange.preprocess.Discretize
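Discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers. In data pre-processing, it means replacing each continuous attribute with a discrete one whose values are intervals of the original range.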
code:
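import Orange

# Load the iris data set that ships with Orange.
store = Orange.data.Table("iris.tab")

# Discretize every continuous attribute into three intervals
# that hold roughly the same number of instances.
discretizer = Orange.preprocess.Discretize()
discretizer.method = Orange.preprocess.discretize.EqualFreq(n=3)
d_store = discretizer(store)

print("Original dataset:")
for e in store[:3]:
    print(e)

print("Discretized dataset:")
for e in d_store[:3]:
    print(e)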
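To make the idea of equal-frequency discretization concrete, here is a minimal sketch in plain Python. It is not Orange's implementation: the helper names `equal_freq_cut_points` and `discretize` are hypothetical, the sample values are made up for illustration, and placing each threshold halfway between neighbouring values is an assumption.

```python
def equal_freq_cut_points(values, n):
    """Return n - 1 thresholds that split the values into n similar-sized groups.

    Assumes len(values) >= n.
    """
    ordered = sorted(values)
    size = len(ordered)
    cuts = []
    for k in range(1, n):
        i = k * size // n  # index of the first value in group k
        # Assumption: put the threshold halfway between the two neighbours.
        cuts.append((ordered[i - 1] + ordered[i]) / 2)
    return cuts


def discretize(value, cuts):
    """Map a number to the index of the interval it falls into."""
    return sum(value >= c for c in cuts)


# Illustrative values only, not read from iris.tab.
lengths = [5.1, 4.9, 4.7, 6.4, 6.9, 5.5, 7.1, 6.3, 5.8]
cuts = equal_freq_cut_points(lengths, 3)
bins = [discretize(v, cuts) for v in lengths]
print("Thresholds:", cuts)
print("Interval index per value:", bins)
```

With nine values and three intervals, each interval ends up holding three values, which is the property that separates equal-frequency from equal-width discretization.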
2. Continuization
class Orange.preprocess.Continuize
- binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending upon the argument zero_based.
- multinomial variables are treated according to the argument multinomial_treatment.
- a discrete attribute with only one possible value is removed.
code:
import Orange
titanic = Orange.data.Table("titanic")
continuizer = Orange.preprocess.Continuize()
titanic1 = continuizer(titanic)
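The three rules listed for Continuize can be sketched in plain Python as follows. This is an illustration of the idea under stated assumptions, not Orange's code: the function `continuize` and its arguments are hypothetical, the sample columns are made up, and the multinomial case shows only one possible treatment (one indicator column per value).

```python
def continuize(column, values, zero_based=True):
    """Turn one discrete column into indicator columns.

    column: the observed values, one per row.
    values: the attribute's possible values.
    Returns a dict mapping new column names to lists of floats.
    """
    low = 0.0 if zero_based else -1.0
    if len(values) < 2:
        # An attribute with a single possible value is removed.
        return {}
    if len(values) == 2:
        # Binary attribute: a single indicator for the second value.
        return {values[1]: [1.0 if v == values[1] else low for v in column]}
    # Multinomial attribute: one indicator column per value
    # (assumption: just one of several possible treatments).
    return {val: [1.0 if v == val else low for v in column] for val in values}


sex = ["male", "female", "female"]      # made-up sample rows
status = ["first", "crew", "third"]     # made-up sample rows

print(continuize(sex, ["female", "male"]))
print(continuize(sex, ["female", "male"], zero_based=False))
print(continuize(status, ["crew", "first", "second", "third"]))
print(continuize(["yes", "yes"], ["yes"]))
```

The second call shows the effect of zero_based=False: the "low" value becomes -1.0 instead of 0.0.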
3. Normalization
class Orange.preprocess.Normalize(zero_based=True, norm_type=Normalize.NormalizeBySD, transform_class=False, center=True, normalize_datetime=False)
Construct a preprocessor for normalization of features. Given a data table, the preprocessor returns a new table in which the continuous attributes are normalized.
code:
from Orange.data import Table
from Orange.preprocess import Normalize
data = Table("iris.tab")
normalizer = Normalize(norm_type=Normalize.NormalizeBySpan)
normalized_data = normalizer(data)
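The two normalization types named in the signature can be illustrated with a short plain-Python sketch. The function names are hypothetical and the sample values are made up. Treating zero_based=False as a rescaling to [-1, 1] and using the population standard deviation are assumptions, so Orange's own results may differ in detail.

```python
from statistics import fmean, pstdev


def normalize_by_span(values, zero_based=True):
    """Rescale values to [0, 1], or to [-1, 1] when zero_based is False."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    if zero_based:
        return [(v - lo) / span for v in values]
    return [2 * (v - lo) / span - 1 for v in values]


def normalize_by_sd(values):
    """Center values on their mean and divide by the standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


widths = [2.0, 4.0, 6.0, 10.0]  # made-up sample column
print(normalize_by_span(widths))
print(normalize_by_span(widths, zero_based=False))
print(normalize_by_sd(widths))
```

Normalizing by span keeps the shape of the distribution but fixes its range, while normalizing by standard deviation gives every attribute a mean of zero and unit spread.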
4. Randomization
class Orange.preprocess.Randomize(rand_type=Randomize.RandomizeClasses, rand_seed=None)
Construct a preprocessor for randomization of classes, attributes and/or metas. Given a data table, the preprocessor returns a new table in which the data is shuffled.
code:
from Orange.data import Table
from Orange.preprocess import Randomize
data = Table("iris")
randomizer = Randomize(Randomize.RandomizeClasses)
randomized_data = randomizer(data)
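What shuffling the classes means can be sketched in plain Python: the class labels are permuted among the rows while the attribute values stay where they are. The function `randomize_classes` and the sample rows are hypothetical, and the way the seed is handled here is an assumption rather than a description of Orange's internals.

```python
import random


def randomize_classes(rows, seed=None):
    """Return new rows with the class labels shuffled among them.

    rows: list of (features, label) pairs; the input list is not modified.
    """
    labels = [label for _, label in rows]
    # A dedicated generator keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(labels)
    return [(features, label) for (features, _), label in zip(rows, labels)]


# Made-up rows in the style of the iris data set.
rows = [
    ((5.1, 3.5), "setosa"),
    ((7.0, 3.2), "versicolor"),
    ((6.3, 3.3), "virginica"),
    ((4.9, 3.0), "setosa"),
]
shuffled = randomize_classes(rows, seed=42)
for features, label in shuffled:
    print(features, label)
```

Because only the labels move, any relationship between attributes and class is broken, which is useful for checking whether a model's accuracy is better than chance.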
How to work with Orange in Python and vice-versa?
Orange is an open-source data visualization and analysis platform in which data mining is done through visual programming or Python scripting. It has components for machine learning, add-ons for bioinformatics and text mining, and is packed with data analytics features. Orange is also available as a Python library.
The Python examples given above show how Orange is used from a script.
To use Orange in Python, we just need to import it with import Orange, as in the first line of the discretization example.