Sentiment Analysis

Sentiment analysis of the IMDb dataset in Python.

Import the required libraries

In [ ]:
import pandas as pd
import sklearn as sk

Import the dataset from your folder.

In [9]:
### Importing the data ###
imdb=pd.read_csv("D:ProjectsDATAVEDICasestudysCS31- Sentiment Labelled Sentences-pythonsentiment labelled sentencesimdb_labelled_1.csv",names = ["text", "Sentiment"])
imdb.shape
imdb.Sentiment.value_counts()
imdb.head(5)
Out[9]:
text Sentiment
0 But Storm Trooper” is not even bad enough to … 0
1 & That movie was bad. 0
2 (very serious spoilers) this movie was a huge … 0
3 ) Don’t waste your time. 0
4 ) very bad performance plays Angela Bennett, a… 0
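
The loading step above can be sketched on a small in-memory file. This is only an illustration: the four rows are made up, and `raw` and `reviews` are hypothetical names. It shows how `names=` supplies column labels when the file has no header row, and uses `print` so that every intermediate result is shown.

```python
import io

import pandas as pd

# Stand-in for a header-less CSV file on disk (made-up rows, not the IMDb data).
raw = io.StringIO(
    "That movie was bad.,0\n"
    "Loved every minute.,1\n"
    "Not worth the ticket.,0\n"
    "A wonderful story.,1\n"
)

# names= assigns the column labels because the file itself has no header line.
reviews = pd.read_csv(raw, names=["text", "Sentiment"])

print(reviews.shape)                     # (rows, columns)
print(reviews.Sentiment.value_counts())  # number of reviews per label
print(reviews.head(2))                   # first two rows
```

In a notebook only the last expression of a cell is echoed, so explicit `print` calls are needed to see the shape and the label counts as well as the first rows. For a file on disk, pass its path instead of the buffer; on Windows a raw string (`r"..."`) keeps the backslashes in the path intact.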

Splitting the dataset into train and test sets

In [10]:
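# sklearn.cross_validation was removed in scikit-learn 0.20; use sklearn.model_selection instead
from sklearn.model_selection import train_test_split
imdb_train, imdb_test = train_test_split(imdb, train_size=0.8)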
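
The split above is random, so each run produces different train and test rows. A minimal sketch of a reproducible, label-balanced split follows, assuming a toy frame with the same column names. The rows and the names `toy`, `train` and `test` are made up for illustration, and `random_state` and `stratify` are optional additions that the original cell does not use.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with the tutorial's column names (the rows are invented).
toy = pd.DataFrame({
    "text": [f"review {i}" for i in range(10)],
    "Sentiment": [0, 1] * 5,
})

train, test = train_test_split(
    toy,
    train_size=0.8,             # 80% of the rows go to the training set
    random_state=42,            # fixed seed so the split is repeatable
    stratify=toy["Sentiment"],  # keep the 0/1 label ratio in both parts
)

print(train.shape, test.shape)
print(test["Sentiment"].value_counts().to_dict())
```

Stratifying matters most when one label is much rarer than the other; with roughly balanced labels a plain random split is usually adequate.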

Bag of Words Model

To perform machine learning on text documents, we need to turn the text content into numerical feature vectors. We use a bag-of-words model to convert the text data into numerical data.

A bag-of-words model represents each document by its word counts (occurrences). We use the CountVectorizer in scikit-learn to build a dictionary (vocabulary) of the words in our text.
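
To make the idea concrete, here is a small pure-Python sketch of what a bag-of-words transform computes. It is not scikit-learn's implementation: the corpus is made up, `tokenize` and `vectorize` are hypothetical helper names, and the tokenisation (lowercasing, tokens of two or more word characters, alphabetical column order) only approximates CountVectorizer's default behaviour.

```python
import re
from collections import Counter

# Toy corpus (made-up reviews, not rows from the IMDb file).
docs = [
    "That movie was bad.",
    "A great movie, great cast!",
    "Bad plot, bad acting.",
]

# Lowercase the text and keep tokens of two or more word characters.
TOKEN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN.findall(text.lower())

# Vocabulary: every distinct token mapped to a column index, in alphabetical order.
all_tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
vocabulary = {word: idx for idx, word in enumerate(all_tokens)}

def vectorize(text):
    """Return one row of counts, with one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocabulary]

bag = [vectorize(doc) for doc in docs]

print(vocabulary)
# {'acting': 0, 'bad': 1, 'cast': 2, 'great': 3, 'movie': 4, 'plot': 5, 'that': 6, 'was': 7}
print(bag)
# [[0, 1, 0, 0, 1, 0, 1, 1], [0, 0, 1, 2, 1, 0, 0, 0], [1, 2, 0, 0, 0, 1, 0, 0]]
```

Each row has one column per vocabulary word, so the matrix has shape (number of documents, vocabulary size). The single-letter word "A" is dropped by the two-character rule, and word order within a review is lost, which is the defining trade-off of the model.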

In [11]:
from sklearn.feature_extraction.text import CountVectorizer
count = CountVectorizer()
bag_train = count.fit_transform(imdb_train['text'])
bag_train.shape
### To check the contents of the vocabulary in our bag of words ###
print(count.vocabulary_)
{'cheerless': 387, 'like': 1357, 'sounded': 2118, 'producer': 1785, 'empty': 741, 'era': 763, 'aired': 84, 'cult': 549, 'technically': 2265, 'incendiary': 1175, 'owns': 1649, 'choice': 400, 'sells': 2004, 'success': 2205, 'begin': 229, 'achille': 46, 'prompted': 1791, 'allow': 91, 'estate': 771, 'sum': 2213, 'rescue': 1889, 'values': 2479, 'manna': 1430, 'contract': 495, 'imagine': 1159, 'wasn': 2533, 'cheesiness': 388, 'undertone': 2430, 'content': 490, 'relaxing': 1871, 'ages': 78, 'judge': 1267, 'pap': 1662, 'forces': 920, 'escalating': 766, 'balls': 208, 'violence': 2501, 'documentaries': 657, 'exceptionally': 793, 'sundays': 2215, 'terrible': 2280, 'subtle': 2201, 'going': 1010, 'contributing': 497, 'wasting': 2537, 'somewhat': 2109, 'game': 966, 'instant': 1199, 'thug': 2324, 'subject': 2197, 'pieces': 1706, 'continuously': 494, 'backdrop': 198, 'ghibili': 992, 'rare': 1829, 'cost': 512, 'actress': 55, 'tsunami': 2397, 'nut': 1588, 'jobs': 1256, 'too': 2351, 'reporter': 1886, 'received': 1852, 'aged': 77, 'annoying': 128, 'thought': 2312, 'willie': 2587, 'would': 2622, 'guests': 1034, 'intrigued': 1220, 'drag': 674, 'became': 222, 'known': 1298, 'kieslowski': 1285, 'quiet': 1818, 'desert': 604, 'paint': 1657, 'clear': 415, 'phrase': 1701, 'charles': 377, 'war': 2524, 'foolish': 915, 'material': 1446, 'th': 2284, 'violin': 2502, 'agreed': 81, 'crime': 540, 'jet': 1252, 'bit': 254, 'sole': 2101, 'tranquillity': 2368, 'achievement': 45, 'involving': 1223, 'wanted': 2521, 'black': 255, 'horrible': 1123, 'completely': 455, '90': 28, 'latest': 1319, 'fish': 894, 'washing': 2532, 'thanks': 2286, 'today': 2340, 'using': 2474, 'animals': 121, 'mouth': 1513, 'underwater': 2431, 'unoriginal': 2452, 'sounds': 2119, 'songs': 2111, 'rocks': 1922, 'nc': 1545, 'captain': 339, 'thus': 2326, 'labute': 1304, 'telly': 2273, 'instead': 1200, 'ineptly': 1189, 'ever': 778, 'parker': 1667, 'steve': 2164, 'trilogy': 2383, 'weaving': 2554, 'rochon': 1921, 'wouldnt': 2624, 
'terminology': 2278, 'screened': 1982, 'question': 1816, 'operas': 1621, 'de': 569, 'plmer': 1726, 'chalkboard': 366, 'sinking': 2070, 'scare': 1964, 'iffy': 1154, 'mickey': 1475, 'considers': 483, 'schrader': 1973, 'screen': 1981, 'sound': 2117, 'getting': 991, '17': 5, 'blown': 261, 'accurately': 43, 'joy': 1265, 'talented': 2251, 'following': 913, 'childhood': 393, 'origins': 1627, 'unfunny': 2440, 'shirley': 2041, 'produce': 1783, 'sorry': 2114, 'articulated': 152, 'speed': 2131, 'beautifully': 221, 'unless': 2446, 'screenplay': 1983, 'totally': 2357, 'limitations': 1362, 'pseudo': 1799, 'skip': 2080, 'tears': 2264, 'kitchy': 1293, 'president': 1771, 'rent': 1882, 'enjoyable': 752, 'else': 728, 'otherwise': 1632, 'leap': 1330, 'actually': 57, 'feeling': 866, 'decay': 578, 'zombie': 2648, 'costumes': 514, 'configuration': 470, 'racial': 1823, 'full': 953, 'peter': 1693, 'delivers': 594, 'viewer': 2498, 'political': 1739, 'extraneous': 818, 'our': 1634, 'rendition': 1880, 'muddled': 1523, 'certainly': 365, 'took': 2352, 'details': 616, 'casting': 353, 'rita': 1914, 'stagy': 2146, 'duet': 689, 'boiling': 268, 'produced': 1784, 'ground': 1032, 'neil': 1553, 'hayao': 1064, 'marbles': 1433, 'spacey': 2126, 'we': 2550, 'turkey': 2399, 'forwarded': 932, 'juano': 1266, 'masterpiece': 1445, 'howe': 1135, 'stoic': 2170, 'fat': 855, 'fifties': 877, 'two': 2408, 'shatner': 2031, 'continuation': 492, 'ultra': 2412, 'chow': 403, 'popular': 1746, 'christopher': 405, 'establish': 770, '2005': 16, 'something': 2107, 'mishima': 1488, 'unconvincing': 2419, 'written': 2631, 'superbly': 2219, 'rubin': 1933, 'lady': 1308, 'yardley': 2633, 'commands': 445, 'badly': 201, 'scot': 1978, 'older': 1611, 'escapism': 767, 'nobody': 1569, 'im': 1155, 'cheap': 381, 'syrupy': 2243, 'elsewhere': 729, 'haven': 1061, 'connor': 477, 'dropped': 686, 'overt': 1643, 'hummh': 1141, 'columbo': 437, 'armageddon': 147, 'money': 1498, 'issues': 1232, 'deeply': 585, 'nature': 1543, 'touch': 2358, 'before': 
227, 'fit': 895, 'fear': 862, 'en': 742, 'since': 2063, 'miss': 1490, 'forgotten': 927, 'steele': 2159, 'try': 2394, 'babie': 194, 'main': 1416, 'murdering': 1527, 'knightley': 1295, 're': 1838, 'actor': 53, 'these': 2300, 'stocking': 2169, 'wedding': 2555, 'splendid': 2134, 'century': 363, 'adams': 58, 'ridiculous': 1909, 'indictment': 1184, 'candle': 337, 'accurate': 42, 'suggests': 2212, 'hide': 1091, 'keith': 1279, 'normally': 1575, 'smith': 2095, 'were': 2560, 'traffic': 2366, 'challenges': 367, 'checking': 384, 'happy': 1052, 'want': 2520, 'hollander': 1112, 'hopeless': 1121, 'psychological': 1800, 'also': 97, 'plot': 1727, 'revealing': 1900, 'masculine': 1441, 'revere': 1902, 'appearance': 139, 'pans': 1660, 'due': 688, 'early': 701, 'running': 1935, 'crash': 528, '2006': 17, 'eyes': 822, 'cuts': 556, 'thing': 2303, 'cartoons': 349, 'movie': 1518, 'precisely': 1761, 'assaulted': 164, 'other': 1630, 'george': 986, 'experiences': 809, 'conflict': 472, 'player': 1718, 'unpredictable': 2455, 'fascination': 851, 'understated': 2428, '10': 0, 'the': 2288, 'captured': 340, 'knocked': 1296, 'nerves': 1554, '30': 20, 'dysfunction': 698, 'trouble': 2388, 'gripping': 1029, 'taelons': 2246, 'unmitigated': 2449, 'unbearably': 2415, 'entertained': 757, 'filmed': 880, 'used': 2471, 'critical': 543, 'pay': 1676, 'wave': 2545, 'dozen': 671, 'survivors': 2235, 'joe': 1257, 'space': 2124, 'engaging': 749, 'creates': 533, 'lilli': 1360, 'need': 1547, 'ho': 1106, 'note': 1581, 'anatomist': 113, 'guess': 1033, 'nationalities': 1540, 'monstrous': 1501, 'that': 2287, 'pulls': 1804, 'shined': 2040, 'theatres': 2291, 'sincere': 2064, 'bonus': 272, 'barren': 213, 'mostly': 1508, 'flowed': 907, 'unaccompanied': 2413, 'nonsense': 1573, 'lighting': 1356, 'seeing': 1997, 'go': 1006, 'do': 655, 'woa': 2597, 'small': 2090, 'come': 438, 'baby': 195, 'house': 1131, 'longer': 1386, 'subjects': 2198, 'characters': 375, 'example': 789, 'dreary': 680, 'contrast': 496, 'rather': 1832, 'places': 
1711, 'ask': 159, 'bland': 257, 'custer': 552, 'renowned': 1881, 'relief': 1874, 'angel': 115, 'brings': 302, 'estevez': 772, 'recurring': 1857, 'guy': 1037, 'realized': 1847, 'rpger': 1931, 'll': 1376, 'selections': 2002, 'person': 1690, 'balance': 205, 'antithesis': 131, 'hair': 1042, 'master': 1443, 'james': 1240, 'personally': 1692, '1986': 13, 'cinematographers': 409, 'middle': 1476, 'turned': 2401, 'occupied': 1595, 'left': 1337, 'exceptional': 792, 'surroundings': 2234, 'jamie': 1241, 'anything': 134, 'served': 2018, 'vivid': 2509, 'anyone': 133, 'dark': 563, 'only': 1618, 'vey': 2493, 'lance': 1310, 'politics': 1741, 'idealogical': 1149, 'witty': 2596, 'plane': 1713, 'indoor': 1187, 'lestat': 1344, 'di': 620, 'keep': 1276, 'dvd': 696, 'breaking': 293, 'innocence': 1192, 'greatness': 1025, 'punishment': 1807, 'span': 2127, 'english': 750, 'gradually': 1018, 'expectations': 805, 'decisions': 582, 'as': 157, 'certain': 364, 'laughable': 1324, 'hip': 1102, 'heart': 1071, 'evil': 785, 'stanwyck': 2148, 'given': 1000, 'am': 100, 'casted': 352, 'after': 72, 'effective': 715, 'sort': 2115, 'silly': 2059, 'bop': 276, 'climax': 422, 'kevin': 1281, 'dracula': 673, 'positive': 1751, 'situation': 2077, 'terribly': 2281, 'simmering': 2060, 'awful': 189, 'knows': 1299, 'struggle': 2185, 'school': 1972, 'stories': 2172, 'accused': 44, 'appealing': 138, 'away': 187, 'makers': 1421, 'moment': 1495, 'smack': 2089, 'talk': 2253, 'practical': 1759, 'filmmaker': 882, 'documentary': 658, 'yun': 2647, 'dwight': 697, 'sweep': 2239, 'horror': 1125, 'mst3k': 1521, 'cover': 522, 'goth': 1015, 'unmatched': 2448, 'sophisticated': 2112, 'hour': 1129, 'photograph': 1699, 'punches': 1805, 'scripting': 1987, 'biographical': 252, 'under': 2420, 'explanation': 812, 'noteworthy': 1582, 'townsend': 2363, 'murdered': 1526, 'olde': 1610, 'death': 574, 'throughout': 2321, 'unrecommended': 2457, 'further': 960, 'garbage': 969, 'pitiful': 1709, 'odd': 1599, 'rips': 1912, 'preservation': 1770, 
'welsh': 2558, 'tremendous': 2380, 'started': 2153, 'reviewer': 1904, 'par': 1664, '54': 23, '1949': 10, 'wholesome': 2577, 'went': 2559, 'medical': 1458, 'weak': 2551, 'lino': 1367, 'continuity': 493, 'vibe': 2494, 'former': 930, 'trooper': 2387, 'spy': 2141, 'bother': 285, 'japanese': 1242, 'disparate': 648, 'convey': 503, 'dead': 570, 'jones': 1263, 'seen': 2001, 'time': 2330, 'unrecognizable': 2456, 'greenstreet': 1027, 'edge': 710, 'latin': 1321, 'little': 1372, 'painfully': 1656, 'buffet': 311, 'intelligent': 1207, 'laselva': 1314, 'cheerfull': 386, 'her': 1084, 'scrimm': 1985, 'spoilers': 2139, 'avoided': 183, 'sink': 2069, 'move': 1514, 'vehicles': 2484, 'lines': 1366, 'repeating': 1884, 'confusing': 474, 'murder': 1525, 'corn': 509, 'experience': 808, 'directors': 637, 'marine': 1435, 'warmth': 2525, 'whatever': 2563, 'audience': 175, 'gerardo': 987, 'should': 2046, 'fascinating': 850, 'four': 934, 'worse': 2616, 'adaptation': 59, 'worthless': 2619, 'grasp': 1021, 'pleasing': 1724, 'bond': 270, 'enjoy': 751, 'unrestrained': 2459, 'body': 266, 'old': 1609, 'twice': 2404, 'abandoned': 30, 'its': 1235, 'most': 1507, 'student': 2187, 'apart': 136, 'attractive': 174, 'composition': 459, 'dull': 690, 'beautiful': 220, 'or': 1623, '12': 1, 'tolerate': 2344, 'drifting': 683, 'fascinated': 849, 'allowing': 92, 'settings': 2022, 'raw': 1836, 'loads': 1377, 'marriage': 1439, 'austen': 178, 'chimp': 398, 'hours': 1130, 'humorous': 1142, 'glasses': 1004, 'others': 1631, 'initially': 1191, 'park': 1666, 'taxidermists': 2259, 'females': 871, 'mchattie': 1451, 'uneasy': 2432, '20': 15, 'utterly': 2477, 'meaning': 1456, 'faces': 824, 'sites': 2076, 'yelps': 2637, 'redeemed': 1858, 'granted': 1019, 'fox': 935, 'self': 2003, 'portrayed': 1749, 'hitchcock': 1105, 'march': 1434, 'of': 1600, 'verbatim': 2487, 'such': 2206, 'mediocre': 1459, 'mini': 1481, 'visual': 2506, 'sibling': 2052, 'honest': 1116, 'towers': 2362, 'bat': 215, 'idiotic': 1152, 'company': 451, 'trumpeter': 
2393, 'flashbacks': 900, 'fire': 892, 'season': 1992, 'wind': 2589, 'treatments': 2378, 'imdb': 1160, 'omit': 1614, 'insomniacs': 1197, 'probably': 1778, 'schultz': 1974, 'worthwhile': 2620, 'dealt': 573, 'spent': 2132, 'confirm': 471, 'famed': 839, 'detailing': 615, 'massive': 1442, 'deserved': 605, 'resume': 1897, 'along': 94, 'reviews': 1906, 'trashy': 2374, 'power': 1756, 'wife': 2582, 'living': 1375, 'fx': 962, 'movements': 1516, 'explain': 810, 'cartoon': 348, 'lived': 1373, 'having': 1063, 'timeless': 2331, 'judging': 1268, 'ticker': 2327, 'provoking': 1797, 'zombiez': 2649, 'correct': 511, 'america': 106, '50': 22, 'side': 2054, 'action': 51, 'patent': 1674, 'faster': 854, 'energetic': 747, 'ishioka': 1229, 'negulesco': 1551, 'particularly': 1671, 'later': 1318, '40': 21, 'channel': 372, 'coming': 444, 'whom': 2578, 'fine': 889, 'lane': 1312, 'lacks': 1306, 'are': 145, 'cold': 432, 'rice': 1907, 'non': 1571, 'intentions': 1209, 'youthful': 2645, 'award': 184, 'coppola': 506, 'hendrikson': 1083, 'naked': 1535, 'foreign': 922, 'evaluate': 775, 'robert': 1919, 'primary': 1777, 'indulgent': 1188, 'portraying': 1750, 'flaming': 899, 'poignant': 1733, 'terror': 2283, '13': 2, 'hernandez': 1086, 'super': 2216, 'than': 2285, 'truly': 2391, 'mystifying': 1534, 'offend': 1602, 'hell': 1077, 'la': 1303, 'absolutely': 35, 'reactions': 1839, 'judith': 1269, 'share': 2029, 'adventure': 65, 'carrell': 345, 'blue': 262, 'raver': 1835, 'turns': 2402, 'years': 2636, 'bullock': 316, 'steamboat': 2158, 'working': 2612, 'themselves': 2297, 'frances': 937, 'mature': 1448, 'courtroom': 521, 'father': 856, 'stinker': 2167, 'applauded': 141, 'your': 2643, 'oh': 1607, 'arts': 156, 'remember': 1878, 'insipid': 1196, 'charisma': 376, 'design': 608, 'anne': 126, 'francis': 938, 'duris': 695, 'dry': 687, 'nothing': 1583, 'ponyo': 1742, 'unforgettable': 2437, 'jokes': 1261, 'africa': 71, 'vomited': 2514, 'distant': 649, 'professionals': 1789, 'creative': 534, 'simply': 2062, 
'inexplicable': 1190, 'heaven': 1073, 'pitch': 1708, 'mansonites': 1431, 'bombardments': 269, 'hopefully': 1120, 'believable': 234, 'evidently': 784, 'handled': 1045, 'faux': 859, 'heroes': 1088, 'falls': 835, 'jessice': 1251, 'film': 879, 'must': 1531, 'wanting': 2522, 'canada': 335, 'ford': 921, 'late': 1317, 'leaves': 1334, 'trip': 2385, 'hatred': 1059, 'occurs': 1597, 'hate': 1057, 'episodes': 762, 'negative': 1550, 'among': 109, 'said': 1943, 'doomed': 668, 'strives': 2181, 'translate': 2371, 'describe': 602, 'watched': 2540, 'realised': 1844, 'proud': 1793, 'lies': 1352, 'spacek': 2125, 'times': 2333, 'sensitivities': 2009, 'generally': 979, 'researched': 1890, 'pulling': 1803, 'music': 1529, 'tedium': 2267, 'fare': 848, 'underrated': 2426, 'miserable': 1486, 'intelligence': 1206, 'oct': 1598, 'production': 1788, 'naughty': 1544, 'fausa': 858, 'possible': 1752, 'bailey': 203, 'competent': 453, 'boyle': 289, 'titta': 2338, 'veteran': 2492, 'perhaps': 1688, 'okay': 1608, 'puzzle': 1812, 'at': 167, 'fulfilling': 952, 'watkins': 2543, 'chilly': 397, 'place': 1710, 'garfield': 971, 'and': 114, 'youtube': 2646, 'developments': 619, 'unethical': 2434, 'mouse': 1512, 'poor': 1743, 'works': 2613, 'hard': 1053, 'teenagers': 2269, 'terrific': 2282, 'watching': 2541, 'huston': 1145, 'sucked': 2207, 'nurse': 1587, 'hero': 1087, 'terms': 2279, 'macbeth': 1411, 'previous': 1775, 'thrillers': 2319, 'ive': 1237, 'strange': 2177, 'artist': 154, 'awards': 186, 'powerhouse': 1758, 'higher': 1093, 'takes': 2248, 'elaborately': 722, 'ten': 2274, 'wants': 2523, 'lee': 1336, 'thousand': 2314, 'behold': 231, 'waste': 2534, 'secondary': 1994, 'surrounding': 2233, 'buy': 321, 'delight': 589, 'huge': 1138, 'emerge': 733, 'storyline': 2175, 'howell': 1136, 'miserably': 1487, 'showcasing': 2048, 'receive': 1851, 'bordered': 277, 'ben': 239, 'resounding': 1891, 'sequel': 2012, 'brilliance': 299, 'boring': 282, 'occur': 1596, 'italian': 1234, 'none': 1572, 'underlying': 2424, 'meredith': 
1471, 'half': 1043, 'bela': 233, 'clothes': 425, 'best': 242, 'row': 1929, 'proceedings': 1781, 'borrowed': 283, 'overly': 1641, 'dramatic': 677, 'touching': 2360, 'rickman': 1908, 'piece': 1705, 'buddy': 308, 'loyalty': 1405, 'created': 532, 'faultless': 857, 'palance': 1658, 'disappointing': 639, 'teeth': 2270, 'put': 1810, 'not': 1578, 'educational': 713, 'finest': 890, 'explorations': 813, 'connections': 475, 'lie': 1351, 'aye': 191, 'irritating': 1227, 'politically': 1740, 'heard': 1070, 'motion': 1510, 'jennifer': 1247, 'itself': 1236, 'mercy': 1470, 'thread': 2315, 'holes': 1111, 'easy': 704, 'colored': 435, 'often': 1606, 'drift': 682, 'provokes': 1796, 'waster': 2536, 'nervous': 1555, 'scene': 1968, 'bible': 248, 'hurt': 1144, 'rubbish': 1932, 'features': 864, 'intangibles': 1204, 'over': 1638, 'follow': 912, 'location': 1379, 'community': 450, 'situations': 2078, 'give': 999, 'edition': 712, 'earlier': 700, 'hype': 1146, 'him': 1100, 'noir': 1570, 'idiot': 1151, 'pearls': 1678, 'humour': 1143, 'version': 2489, 'crackles': 525, 'impact': 1162, 'sydney': 2241, 'cause': 357, 'errors': 765, 'trailer': 2367, 'stable': 2143, 'hellish': 1078, 'see': 1996, 'aside': 158, 'superlative': 2220, 'culture': 550, 'supposedly': 2224, 'surface': 2228, 'chemistry': 390, 'helms': 1079, 'collect': 433, 'whiny': 2571, 'church': 406, 'disgrace': 645, 'trond': 2386, 'comments': 448, 'warning': 2527, 'needlessly': 1549, 'includes': 1176, 'fair': 831, 'momentum': 1497, 'pi': 1702, 'exemplars': 800, 'words': 2609, 'appears': 140, 'friends': 945, 'sobering': 2100, 'cast': 351, 'air': 83, 'emotion': 737, 'script': 1986, 'stunning': 2191, 'always': 99, 'weariness': 2553, 'dedication': 583, 'contrived': 499, 'less': 1342, 'sacrifice': 1941, 'done': 666, 'ones': 1617, 'still': 2166, 'regrettably': 1866, 'foxx': 936, 'disliked': 647, 'boasts': 264, 'own': 1647, 'treachery': 2375, 'disappointed': 638, 'comical': 443, 'graphics': 1020, 'grace': 1017, 'atmosphere': 168, 'gonna': 1011, 
'tonight': 2348, 'gave': 974, 'sven': 2238, 'kinda': 1291, 'development': 618, 'tries': 2382, 'freedom': 941, 'sister': 2071, 'obliged': 1590, 'attempting': 172, 'jay': 1244, 'surprised': 2229, 'slightest': 2083, 'strong': 2182, 'man': 1427, 'leading': 1329, 'fi': 875, 'tons': 2349, 'budget': 309, 'flawed': 903, 'pure': 1809, 'seperate': 2011, 'artistic': 155, 'crashed': 529, 'nimoy': 1566, 'guilt': 1035, 'performances': 1687, 'female': 870, 'concerning': 465, 'around': 149, 'pray': 1760, 'actresses': 56, 'study': 2189, 'sick': 2053, 'alexander': 87, 'satanic': 1954, 'show': 2047, 'valentine': 2478, 'regret': 1864, 'doing': 663, 'dollars': 664, 'called': 327, 'single': 2067, 'unbelievable': 2416, 'worth': 2618, 'errol': 764, 'seuss': 2023, 'falwell': 837, 'jean': 1246, 'opening': 1620, 'unfaithful': 2435, 'watch': 2538, 'thinking': 2306, 'exteriors': 817, 'sculpture': 1989, 'managed': 1428, 'nevsky': 1561, 'underacting': 2421, 'structure': 2184, 'shallow': 2027, 'says': 1961, 'fun': 955, 'character': 373, 'astronaut': 166, 'doubt': 669, 'loose': 1391, 'slurs': 2088, 'wide': 2580, 'rest': 1893, 'hence': 1082, 'perfected': 1684, 'southern': 2123, 'friendship': 946, 'angus': 120, 'considered': 482, 'especially': 768, 'mind': 1478, 'semi': 2005, 'class': 412, 'virus': 2504, 'barcelona': 210, 'interview': 1217, 'bully': 317, 'so': 2098, 'racism': 1824, 'closed': 424, 'players': 1719, 'slackers': 2081, 'nor': 1574, 'destroy': 614, 'god': 1008, 'defined': 587, 'crowe': 546, 'exquisite': 815, 'type': 2410, 'breeders': 294, 'ebola': 707, 'coherent': 431, 'leave': 1333, 'lifetime': 1354, 'talents': 2252, 'view': 2497, 'girl': 996, 'individual': 1186, 'lives': 1374, 'hankies': 1048, 'hope': 1119, 'incomprehensible': 1178, 'mirrormask': 1485, 'dimensional': 630, 'agree': 80, 'hear': 1069, 'conception': 463, 'bad': 200, 'taped': 2255, 'rivalry': 1915, 'universe': 2445, 'succeeded': 2204, 'subplots': 2200, 'stratus': 2178, 'concerns': 466, 'rise': 1913, 'site': 2075, 'ladies': 
1307, 'fest': 873, 'does': 660, 'comfortable': 442, 'landscapes': 1311, 'laughs': 1325, 'accessible': 40, 'very': 2491, 'uplifting': 2465, 'baxendale': 217, 'usual': 2476, 'unfortunately': 2439, 'versatile': 2488, 'cliff': 421, 'wonderful': 2602, 'luv': 1410, 'pleaser': 1723, 'gotta': 1016, 'explains': 811, 'everything': 782, 'hypocrisy': 1147, 'portrayal': 1747, 'atrocity': 170, 'enjoyment': 754, 'blah': 256, 'close': 423, 'weren': 2561, 'lame': 1309, 'promote': 1790, 'inside': 1194, 'paolo': 1661, 'tale': 2249, '1928': 7, 'famous': 841, 'lead': 1328, 'out': 1635, 'get': 989, 'confidence': 469, 'sappiest': 1951, 'oscar': 1629, 'face': 823, 'choices': 401, 'strident': 2180, 'witticisms': 2595, 'embassy': 732, 'calls': 328, 'light': 1355, 'kidnapped': 1283, 'system': 2244, 'advise': 66, 'christmas': 404, 'amateurish': 101, 'bear': 219, 'heroism': 1089, 'genre': 983, 'work': 2610, 'solid': 2102, 'artiness': 153, 'teen': 2268, 'havilland': 1062, 'starts': 2154, 'celebrity': 359, 'up': 2463, 'score': 1977, 'list': 1369, 'harris': 1054, 'perfect': 1683, 'french': 943, 'cat': 354, 'punish': 1806, 'mother': 1509, 'proudly': 1794, 'crap': 527, 'rpg': 1930, 'interpretations': 1216, 'scripts': 1988, 'eccleston': 708, 'salesman': 1945, 'mean': 1454, 'presence': 1768, 'photography': 1700, 'knew': 1294, 'moments': 1496, 'brian': 296, 'mountain': 1511, 'latifa': 1320, 'brutal': 307, 'their': 2293, 'cute': 553, 'appreciate': 143, 'green': 1026, 'earth': 702, 'amazing': 104, 'heels': 1074, 'built': 315, 'angelina': 117, 'gives': 1001, 'gere': 988, 'ireland': 1225, 'national': 1539, 'important': 1164, 'ps': 1798, 'sabotages': 1939, 'played': 1717, 'there': 2299, 'interest': 1211, 'foreigner': 923, 'consequences': 479, 'embarrassing': 731, 'confuses': 473, 'crew': 539, 'narration': 1536, 'lesser': 1343, 'any': 132, 'koteas': 1300, 'owls': 1646, 'oy': 1650, 'science': 1976, 'amount': 110, 'cheaply': 382, 'tremendously': 2381, 'logic': 1382, 'ranks': 1828, 'into': 1218, 'atrocious': 
169, 'dislike': 646, 'incorrectness': 1179, 'both': 284, 'long': 1385, 'goremeister': 1013, 'doctor': 656, 'twist': 2406, 'story': 2174, 'fingernails': 891, 'high': 1092, 'recently': 1854, 'savor': 1958, 'effects': 716, 'jutland': 1274, 'ackerman': 47, 'rated': 1831, 'sit': 2073, '15': 3, 'delights': 591, 'ussr': 2475, 'showed': 2049, 'write': 2627, 'happen': 1049, 'tacky': 2245, 'disturbing': 653, 'underbite': 2423, 'barely': 211, 'paper': 1663, 'supposed': 2223, 'locations': 1380, 'emily': 735, 'relationships': 1870, 'sake': 1944, 'fort': 931, 'fill': 878, 'reality': 1845, 'trumbull': 2392, 'marred': 1438, 'ways': 2548, 'call': 326, 'theater': 2289, 'scenery': 1969, 'hadn': 1041, 'excellent': 790, 'giovanni': 995, 'shut': 2051, 'halfway': 1044, 'magnificent': 1415, 'treat': 2377, 'doesn': 661, 'extraordinary': 819, 'unpleasant': 2453, 'edward': 714, 'honestly': 1117, 'began': 228, 'virtue': 2503, 'fame': 838, 'beyond': 247, 'excerpts': 794, 'riveted': 1916, 'interim': 1214, 'girlfriend': 997, 'my': 1532, 'celluloid': 360, 'redeeming': 1859, 'deadly': 571, 'heads': 1068, 'stinks': 2168, 'secondly': 1995, 'trap': 2372, 'imaginable': 1156, 'flag': 897, 'insulin': 1202, 'tension': 2276, 'admins': 61, 'me': 1453, 'tell': 2272, 'repeats': 1885, 'games': 967, 'awkwardly': 190, 'awarded': 185, 'borderlines': 278, 'bates': 216, 'horrified': 1124, 'problem': 1779, 'trash': 2373, 'slightly': 2084, 'poised': 1737, 'wilkinson': 2584, 'treasure': 2376, 'cutting': 557, 'machine': 1412, 'lange': 1313, 'hang': 1047, 'peaking': 1677, 'stay': 2156, 'jerky': 1248, 'watson': 2544, 'indescribably': 1183, 'choked': 402, 'twists': 2407, 'conrad': 478, 'until': 2461, 'original': 1625, 'much': 1522, 'found': 933, 'summary': 2214, 'amazed': 103, 'outward': 1637, 'sequels': 2013, 'just': 1272, 'did': 626, 'parts': 1672, 'from': 948, 'saw': 1959, 'debbie': 576, 'plays': 1721, 'win': 2588, 'funny': 959, 'shenanigans': 2038, 'warn': 2526, 'glad': 1003, 'when': 2565, 'depicted': 597, 
'pointillistic': 1735, 'constant': 486, 'post': 1754, 'facial': 825, 'minute': 1483, 'blandly': 258, 'shame': 2028, 'credible': 536, 'feature': 863, 'americans': 108, 'dads': 558, 'quinn': 1819, 'aimless': 82, 'coastal': 430, 'people': 1681, 'monumental': 1502, 'impressed': 1166, 'falling': 834, 'past': 1673, 'mention': 1469, 'originality': 1626, 'merit': 1472, 'maybe': 1450, 'cool': 505, 'premise': 1767, 'few': 874, 'what': 2562, 'wondered': 2601, 'why': 2579, 'aurvåg': 177, 'trying': 2395, 'drive': 684, 'hugo': 1139, 'pacing': 1652, 'will': 2585, 'storm': 2173, 'forget': 924, 'uniqueness': 2443, 'ceases': 358, 'pedestal': 1680, 'repeated': 1883, 'limited': 1363, 'enter': 756, 'predictable': 1763, 'stephen': 2161, 'product': 1787, 'credit': 537, 'audio': 176, 'murky': 1528, 'camerawork': 332, 'free': 940, 'pile': 1707, 'actions': 52, 'decipher': 581, 'hes': 1090, 'prelude': 1766, 'where': 2567, 'dance': 560, 'make': 1419, 'views': 2500, 'writer': 2628, 'off': 1601, 'ann': 125, 'malta': 1426, 'imagination': 1157, 'producers': 1786, 'family': 840, 'asleep': 160, 'girolamo': 998, 'buildings': 314, 'whites': 2573, 'surf': 2227, 'cutest': 554, 'although': 98, 'things': 2304, 'wrap': 2626, 'back': 197, 'puppets': 1808, 'soap': 2099, 'plug': 1728, 'teaches': 2262, 'period': 1689, 'tony': 2350, 'thriller': 2318, 'oriented': 1624, 'enjoyed': 753, 'dancing': 561, 'far': 846, 'quicker': 1817, 'loewenhielm': 1381, 'flick': 905, 'presents': 1769, 'unneeded': 2451, 'delivered': 592, 'feel': 865, 'sad': 1942, 'above': 33, 'putting': 1811, 'overwrought': 1644, 'suspense': 2236, 'stylized': 2195, 'possibly': 1753, 'got': 1014, 'lousy': 1398, 'existential': 802, 'crafted': 526, 'related': 1867, 'every': 779, 'journey': 1264, 'sex': 2025, 'upa': 2464, 'frontier': 950, 'delightful': 590, 'grates': 1022, 'box': 288, 'nostalgia': 1577, 'reenactments': 1860, 'low': 1404, 'young': 2641, 'generates': 980, 'makes': 1422, 'number': 1585, 'sublimely': 2199, 'latter': 1322, 'card': 342, 
'poler': 1738, 'awesome': 188, 'joke': 1260, 'shed': 2033, 'let': 1345, 'by': 322, 'tone': 2347, 'suffering': 2210, 'least': 1332, 'member': 1462, 'poetry': 1732, 'comedy': 440, 'helps': 1081, 'explosion': 814, 'kid': 1282, 'home': 1115, 'garage': 968, 'extremely': 820, 'chodorov': 399, 'shots': 2045, 'state': 2155, 'can': 334, 'remotely': 1879, 'night': 1565, 'cable': 323, 'insane': 1193, 'kris': 1301, 'defensemen': 586, 'studio': 2188, 'genuine': 985, 'no': 1568, 'front': 949, 'prejudice': 1765, 'directed': 632, 'day': 567, 'aren': 146, 'appalling': 137, 'stereotypically': 2163, 'singing': 2066, 'function': 956, 'heist': 1075, 'reasonable': 1850, 'poorly': 1744, 'brother': 306, 'california': 325, 'spoiler': 2138, 'bakery': 204, 'deserving': 607, 'obvious': 1592, 'direction': 634, 'designer': 610, 'debated': 575, 'animation': 123, 'basic': 214, 'eye': 821, 'ball': 207, 'variation': 2482, 'houses': 1132, 'changes': 370, 'about': 32, 'lord': 1392, 'cruise': 548, 'jealousy': 1245, 'partaking': 1669, 'right': 1910, 'through': 2320, 'moves': 1517, 'expansive': 803, 'stuart': 2186, 'impulse': 1172, 'contained': 488, 'condescends': 468, 'superb': 2217, 'improved': 1169, 'depends': 596, 'brilliantly': 301, 'legal': 1338, 'didn': 627, 'someone': 2106, 'melodrama': 1460, 'create': 531, 'sing': 2065, 'teddy': 2266, 'finds': 888, 'sweet': 2240, 'quite': 1820, 'littered': 1371, 'memories': 1465, 'colours': 436, 'scamp': 1963, 'universal': 2444, 'bright': 298, 'miyazaki': 1492, 'contains': 489, 'sandra': 1950, 'central': 362, 'down': 670, 'decidely': 580, 'exchange': 796, 'sloppy': 2086, 'interesting': 1213, 'costs': 513, 'derivative': 601, 'daughter': 565, 'acting': 50, 'maker': 1420, 'academy': 38, 'another': 129, 'horrendous': 1122, 'expected': 806, 'great': 1023, 'essence': 769, 'video': 2496, 'riot': 1911, 'deal': 572, 'toons': 2353, 'accolades': 41, 'theme': 2295, 'say': 1960, 'serious': 2016, 'wise': 2590, 'kathy': 1275, 'sour': 2121, 'frankly': 939, 'short': 2042, 
'them': 2294, 'implausible': 1163, 'anita': 124, 'cailles': 324, 'extant': 816, 'becomes': 225, 'particular': 1670, 'male': 1424, 'borders': 279, 'cross': 544, 'well': 2557, 'more': 1504, 'slavic': 2082, 'vomit': 2513, 'generic': 981, 'co': 427, 'hasn': 1056, 'significant': 2057, 'exactly': 787, 'fumbling': 954, 'dogs': 662, 'task': 2257, 'form': 928, 'attempt': 171, 'cinematography': 410, 'way': 2546, 'easily': 703, 'duper': 693, 'farce': 847, 'starring': 2151, 'wonder': 2600, 'addition': 60, 'european': 774, 'groove': 1030, 'life': 1353, 'los': 1393, 'control': 500, 'understand': 2427, 'straw': 2179, 'attention': 173, 'morons': 1506, 'brat': 292, 'lost': 1395, 'perabo': 1682, 'walked': 2518, 'chick': 391, 'bunch': 318, 'predictably': 1764, 'stayed': 2157, 'tying': 2409, 'flaws': 904, 'camera': 331, '25': 19, 'drago': 675, 'she': 2032, 'released': 1873, 'shows': 2050, 'care': 344, 'between': 245, 'chills': 396, 'real': 1843, 'adrift': 64, 'rolls': 1925, 'everywhere': 783, 'pretentious': 1772, 'meanings': 1457, 'complex': 456, 'be': 218, 'follows': 914, 'core': 508, 'take': 2247, 'considerable': 481, 'author': 180, 'he': 1066, 'betty': 244, 'convention': 502, 'bipolarity': 253, 'picture': 1703, 'funniest': 958, 'water': 2542, 'mesmerising': 1473, 'gross': 1031, 'emotions': 739, 'sat': 1953, 'kind': 1290, 'thick': 2302, 'top': 2354, 'embarrassed': 730, 'empowerment': 740, 'etc': 773, 'finally': 886, 'interacting': 1210, 'computer': 461, 'cowardice': 523, 'native': 1541, 'sam': 1946, 'scale': 1962, 'kids': 1284, 'jim': 1253, 'ironically': 1226, 'entertaining': 758, 'opened': 1619, 'network': 1558, 'jaclyn': 1239, 'depressing': 599, 'masterful': 1444, 'leni': 1341, 'taylor': 2260, 'indie': 1185, 'beginning': 230, 'including': 1177, 'vocal': 2510, 'jerry': 1249, 'holding': 1109, 'neighbour': 1552, 'how': 1133, 'making': 1423, 'editing': 711, 'art': 151, 'ass': 162, 'may': 1449, 'classic': 413, 'woven': 2625, 'marion': 1436, 'slow': 2087, 'though': 2311, 'crazy': 530, 
'freeman': 942, 'wish': 2591, 'new': 1562, 'keira': 1278, 'narrative': 1537, 'hilt': 1099, 'general': 978, 'incredibly': 1181, 'aerial': 67, 'fan': 842, 'seemed': 1999, 'shell': 2036, 'riz': 1917, 'describes': 603, 'weight': 2556, 'clever': 417, '1948': 9, 'unmoving': 2450, 'bored': 281, 'depicts': 598, 'point': 1734, 'three': 2316, 'consolations': 485, 'difference': 628, 'buffalo': 310, 'band': 209, 'quaid': 1813, 'martin': 1440, 'worked': 2611, 'moving': 1520, 'if': 1153, 'ready': 1842, 'worry': 2615, 'improvement': 1170, 'parents': 1665, 'retreat': 1898, 'relationship': 1869, 'horse': 1126, 'room': 1926, 'builders': 313, 'imaginative': 1158, 'lid': 1350, 'ebay': 706, 'caught': 356, 'entire': 759, 'plenty': 1725, 'fans': 843, 'menacing': 1468, 'true': 2390, 'charming': 379, 'soundtrack': 2120, 'torture': 2355, 'males': 1425, 'expecting': 807, 'against': 75, 'don': 665, 'ventura': 2485, 'pictures': 1704, 'tract': 2365, 'savalas': 1955, 'towards': 2361, 'cotton': 515, 'actors': 54, 'brevity': 295, 'films': 884, 'shelves': 2037, 'clichés': 419, 'loved': 1401, 'wouldn': 2623, 'looking': 1389, 'whoever': 2575, 'gibberish': 994, 'lets': 1346, 'commercial': 449, 'plus': 1729, 'sucks': 2208, 'creature': 535, 'impossible': 1165, 'contributory': 498, 'overacting': 1639, 'speaking': 2129, 'hold': 1108, 'lilt': 1361, 'galley': 965, 'unfolds': 2436, 'check': 383, 'trek': 2379, 'clearly': 416, 'subversive': 2202, 'isn': 1230, '1971': 11, 'wonderfully': 2603, 'improvisation': 1171, 'vandiver': 2481, 'theatre': 2290, 'subverting': 2203, 'apt': 144, 'hollow': 1113, 'despised': 612, 'world': 2614, 'focus': 910, 'stupid': 2192, 'waiting': 2515, 'better': 243, 'brilliant': 300, 'credits': 538, 'lovable': 1399, 'melville': 1461, 'complete': 454, 'gloriously': 1005, 'whole': 2576, 'dumb': 691, 'pm': 1730, 'outlandish': 1636, 'act': 48, 'john': 1258, 'solving': 2104, 'babbling': 193, 'net': 1556, 'wooden': 2607, 'one': 1616, 'everyone': 781, 'stage': 2144, 'whenever': 2566, 'in': 1173, 
'good': 1012, 'process': 1782, 'mess': 1474, 'childrens': 395, 'represents': 1887, 'recommended': 1856, 'dribble': 681, 'have': 1060, 'ability': 31, 'inspiring': 1198, 'natural': 1542, 'holds': 1110, 'diabetic': 621, 'setting': 2021, 'plain': 1712, 'bell': 236, 'directorial': 636, 'return': 1899, 'facing': 826, 'members': 1463, 'teacher': 2261, 'tom': 2345, 'highest': 1094, 'existent': 801, 'cars': 347, 'biggest': 250, 'concentrate': 462, 'raging': 1825, 'seems': 2000, 'job': 1255, 'use': 2470, 'fall': 833, 'pack': 1653, 'leaving': 1335, 'babysitting': 196, 'efforts': 717, 'sidelined': 2055, 'themes': 2296, 'imitation': 1161, 'blush': 263, 'style': 2194, 'composed': 458, '18th': 6, 'curtain': 551, 'episode': 761, 'nasty': 1538, 'second': 1993, 'bohemian': 267, 'direct': 631, 'whether': 2568, 'kirk': 1292, 'sentiment': 2010, 'evokes': 786, 'florida': 906, 'suggest': 2211, 'speak': 2128, 'ray': 1837, 'fairly': 832, 'excellently': 791, 'instruments': 1201, 'continually': 491, 'volcano': 2512, 'wartime': 2528, 'amazingly': 105, 'timing': 2334, 'is': 1228, '15pm': 4, 'hackneyed': 1039, 'plants': 1715, 'boobs': 273, 'recent': 1853, 'expect': 804, 'lion': 1368, 'sharing': 2030, 'same': 1947, 'track': 2364, 'roles': 1924, 'angeles': 116, 'flat': 901, 'final': 885, 'total': 2356, 'waitress': 2516, 'distorted': 651, 'role': 1923, 'women': 2598, 'underappreciated': 2422, 'each': 699, 'dependant': 595, 'hosting': 1127, 'underneath': 2425, 'spock': 2135, 'younger': 2642, 'kept': 1280, 'clients': 420, 'liked': 1358, 'ortolani': 1628, 'alert': 86, 'unwatchable': 2462, 'five': 896, 'cords': 507, 'paid': 1654, 'while': 2570, 'favourite': 861, 'genius': 982, 'powerful': 1757, 'uncalled': 2417, 'monotonous': 1500, 'title': 2337, 'transcend': 2369, 'ue': 2411, 'directing': 633, 'flynn': 909, 'unemployed': 2433, 'sub': 2196, 'washed': 2531, 'planned': 1714, 'idea': 1148, 'some': 2105, 'cinematic': 408, 'feet': 868, 'uptight': 2468, 'sisters': 2072, 'voice': 2511, 'morgan': 1505, 
'revenge': 1901, 'tightly': 2329, 'build': 312, 'brainsucking': 291, 'alongside': 95, 'fact': 827, 'coal': 429, 'distinction': 650, 'egotism': 718, 'cancan': 336, 'being': 232, 'pointless': 1736, 'minutes': 1484, 'seamlessly': 1991, 'carry': 346, 'course': 519, 'constructed': 487, 'lots': 1397, 'local': 1378, 'upper': 2466, 'many': 1432, 'memorable': 1464, 'been': 226, 'cox': 524, 'club': 426, 'average': 181, 'occasionally': 1594, 'american': 107, 'angles': 119, 'goalies': 1007, 'moved': 1515, 'star': 2149, 'perfectly': 1685, 'smoothly': 2096, 'sci': 1975, 'predict': 1762, 'end': 743, 'worst': 2617, 'format': 929, 'run': 1934, 'keeps': 1277, 'uses': 2473, 'amaze': 102, 'disappointment': 640, 'gem': 977, 'dialog': 622, 'remaining': 1875, 'tickets': 2328, 'fast': 853, 'dumbest': 692, 'already': 96, 'respecting': 1892, 'save': 1957, 'struck': 2183, 'stowe': 2176, 'road': 1918, 'here': 1085, 'stand': 2147, 'you': 2640, 'result': 1895, 'joins': 1259, 'sphere': 2133, 'nearly': 1546, 'killing': 1288, 'gake': 964, 'performance': 1686, 'eloquently': 727, 'look': 1387, 'brooding': 305, 'intoning': 1219, 'comment': 446, 'allison': 90, 'connery': 476, 'tolerable': 2343, 'ryans': 1938, 'characterisation': 374, 'everybody': 780, 'fantasy': 845, 'iq': 1224, 'sea': 1990, 'ups': 2467, 'inappropriate': 1174, 'shepard': 2039, 'overall': 1640, 'judo': 1270, 'london': 1383, 'energy': 748, 'then': 2298, 'cheesy': 389, 'couldn': 517, 'widmark': 2581, 'prone': 1792, 'part': 1668, 'comedic': 439, 'writing': 2630, 'hoot': 1118, 'provided': 1795, 'human': 1140, 'moral': 1503, 'tv': 2403, 'conceptually': 464, 'yeah': 2634, 'help': 1080, 'centers': 361, 'stupidity': 2193, 'couple': 518, 'wrong': 2632, 'catchy': 355, 'references': 1861, 'discomfort': 643, 'highly': 1096, 'boogeyman': 274, 'user': 2472, 'kristoffersen': 1302, 'surely': 2226, 'find': 887, 'never': 1559, 'brain': 290, 'involved': 1222, 'chance': 368, 'set': 2019, 'next': 1563, 'cruel': 547, 'gung': 1036, 'legendary': 1339, 'owed': 
1645, 'crowd': 545, 'favorite': 860, 'complexity': 457, 'timely': 2332, 'himself': 1101, 'aspect': 161, 'surprisingly': 2232, 'seriously': 2017, 'readers': 1840, 'throwback': 2322, 'painful': 1655, 'damian': 559, 'ferry': 872, 'barney': 212, 'wasted': 2535, 'personalities': 1691, 'tardis': 2256, 'executed': 799, 'skilled': 2079, 'conclusion': 467, 'kill': 1286, 'amusing': 111, 'vision': 2505, 'jessica': 1250, 'sand': 1949, 'verbal': 2486, 'last': 1315, 'learn': 1331, 'loves': 1403, 'bought': 287, 'insult': 1203, 'roth': 1927, 'animated': 122, 'scared': 1965, 'jack': 1238, 'ending': 745, 'portrayals': 1748, 'highlights': 1095, 'issue': 1231, 'director': 635, 'sorrentino': 2113, 'below': 238, 'unremarkable': 2458, 'told': 2342, 'disbelief': 642, 'ole': 1612, 'tear': 2263, 'fields': 876, 'wong': 2604, 'round': 1928, 'visually': 2507, 'billy': 251, 'happened': 1050, 'jonah': 1262, 'unbearable': 2414, 'talent': 2250, 'warts': 2529, 'regardless': 1863, 'future': 961, 'filmiing': 881, 'sympathetic': 2242, 'robotic': 1920, 'giving': 1002, 'length': 1340, 'stewart': 2165, 'fodder': 911, 'sitcoms': 2074, 'whatsoever': 2564, 'belmondo': 237, 'ought': 1633, 'pull': 1802, '1947': 8, 'without': 2594, 'freshness': 944, 'afternoon': 73, 'qualities': 1814, 'pathetic': 1675, 'nine': 1567, 'ago': 79, 'uninteresting': 2442, 'business': 319, 'it': 1233, 'senior': 2006, 'lucy': 1407, 'forgot': 926, 'paced': 1651, 'schizophrenic': 1971, 'yes': 2638, 'believe': 235, 'anyway': 135, 'jason': 1243, 'abroad': 34, 'wall': 2519, 'beware': 246, 'abysmal': 37, 'olivia': 1613, 'jimmy': 1254, 'gaudi': 973, 'sometimes': 2108, 'owned': 1648, 'change': 369, 'thoroughly': 2308, 'filmography': 883, 'thumper': 2325, 'deep': 584, 'several': 2024, 'captures': 341, 'really': 1848, 'remake': 1876, 'lust': 1409, 'level': 1347, 'notable': 1579, 'entirely': 760, 'starlet': 2150, 'surprising': 2231, 'unconditional': 2418, 'restrained': 1894, 'hearts': 1072, 'to': 2339, 'hayworth': 1065, 'taste': 2258, 'came': 
329, 'emoting': 736, 'laugh': 1323, 'lasting': 1316, 'however': 1137, 've': 2483, 'know': 1297, 'insincere': 1195, 'array': 150, 'fantastic': 844, 'integration': 1205, 'armand': 148, 'akin': 85, 'turn': 2400, 'eighth': 719, 'mad': 1413, 'phenomenal': 1696, 'spoiled': 2137, 'daughters': 566, 'become': 224, 'angle': 118, 'greatest': 1024, 'bertolucci': 241, 'rate': 1830, 'changing': 371, 'develop': 617, 'lot': 1396, 'bag': 202, 'savant': 1956, 'series': 2015, 'mark': 1437, 'meanders': 1455, 'debut': 577, 'assante': 163, 'with': 2592, 'reason': 1849, 'comes': 441, 'assistant': 165, 'ruthless': 1936, 'mclaglen': 1452, 'pandering': 1659, 'compelling': 452, 'suspension': 2237, 'year': 2635, 'flakes': 898, 'events': 777, 'offer': 1604, 'movies': 1519, 'happiness': 1051, 'rating': 1833, 'love': 1400, 'stagey': 2145, 'step': 2160, 'weaker': 2552, 'incredible': 1180, 'now': 1584, 'lucio': 1406, 'gas': 972, 'tomorrow': 2346, 'dialogs': 623, 'gets': 990, 'has': 1055, 'inventive': 1221, '80s': 26, 'popcorn': 1745, 'falsely': 836, 'diaper': 625, 'dreams': 679, 'affected': 69, 'unsatisfactory': 2460, 'solidifying': 2103, 'decent': 579, 'broke': 304, 'thoughts': 2313, 'wild': 2583, 'austere': 179, 'element': 725, 'line': 1364, 'during': 694, 'shortlist': 2043, 'factory': 828, 'obsessed': 1591, 'child': 392, 'depth': 600, 'ends': 746, 'mindblowing': 1479, 'elderly': 723, 'hated': 1058, 'together': 2341, 'exciting': 797, 'snow': 2097, 'intense': 1208, 'goes': 1009, 'shelf': 2035, 'anthony': 130, 'smiling': 2094, 'song': 2110, 'critic': 542, 'crisp': 541, 'which': 2569, 'days': 568, 'dream': 678, 'stars': 2152, 'emotionally': 738, 'dr': 672, 'court': 520, 'layers': 1326, 'thrown': 2323, 'québec': 1821, 'viewing': 2499, '1980': 12, 'smells': 2092, 'commented': 447, 'phony': 1698, 'recommend': 1855, 'primal': 1776, 'desperation': 611, 'brief': 297, 'supernatural': 2221, 'cannot': 338, 'blare': 259, 'reflected': 1862, 'acted': 49, 'discovering': 644, 'sack': 1940, 'problems': 1780, 
'think': 2305, 'made': 1414, 'sarcophage': 1952, 'cinema': 407, 'accents': 39, 'south': 2122, 'an': 112, 'looked': 1388, 'poet': 1731, 'squibs': 2142, 'senses': 2008, 'spot': 2140, 'hot': 1128, 'gay': 975, 'cutie': 555, 'identify': 1150, 'guys': 1038, 'frightening': 947, 'flaw': 902, 'play': 1716, 'children': 394, 'misplace': 1489, 'menace': 1467, 'convincing': 504, 'remarkable': 1877, 'regrettable': 1865, 'tiny': 2335, 'different': 629, 'soul': 2116, 'donlevy': 667, 'watchable': 2539, 'killings': 1289, 'even': 776, 'sure': 2225, 'impressive': 1168, 'trysts': 2396, 'mollusk': 1494, 'bothersome': 286, 'bob': 265, 'scares': 1966, 'seem': 1998, 'alike': 88, 'us': 2469, 'cameo': 330, 'history': 1104, 'screamy': 1980, 'wayne': 2547, 'designed': 609, 'peculiarity': 1679, '20th': 18, 'special': 2130, 'touches': 2359, 'footage': 916, 'philippa': 1697, 'fails': 830, 'obviously': 1593, 'bonding': 271, 'opinion': 1622, 'playing': 1720, 'had': 1040, 'vitally': 2508, 'linear': 1365, 'could': 516, 'spoil': 2136, 'this': 2307, 'white': 2572, '8pm': 27, 'nuts': 1589, 'writers': 2629, 'scary': 1967, 'tuneful': 2398, 'interplay': 1215, 'fashioned': 852, 'ended': 744, 'reviewers': 1905, 'campy': 333, 'yet': 2639, 'june': 1271, 'age': 76, 'lugosi': 1408, 'numbers': 1586, 'silent': 2058, 'darren': 564, 'race': 1822, 'emilio': 734, 'classical': 414, 'likes': 1359, 'screenwriter': 1984, 'unpredictability': 2454, 'howdy': 1134, 'won': 2599, 'sinister': 2068, 'twirling': 2405, 'hill': 1098, 'lazy': 1327, 'within': 2593, 'truck': 2389, 'pleased': 1722, 'cheekbones': 385, 'anniversary': 127, 'admitted': 62, 'mainly': 1417, '70': 24, 'circumstances': 411, 'definitely': 588, 'nice': 1564, 'muppets': 1524, 'tired': 2336, 'enough': 755, 'walk': 2517, 'diving': 654, 'woo': 2606, 'controversy': 501, 'shakespear': 2026, 'for': 917, 'excessively': 795, 'shot': 2044, 'results': 1896, 'william': 2586, 'consistent': 484, 'hockey': 1107, 'excuses': 798, 'major': 1418, 'ratings': 1834, 'matrix': 1447, 
'smart': 2091, 'elias': 726, 'avoid': 182, 'tensions': 2277, 'wb': 2549, 'balanced': 206, 'dialogue': 624, 'impression': 1167, 'theatrical': 2292, 'coach': 428, 'versus': 2490, 'require': 1888, 'monolog': 1499, 'netflix': 1557, 'garbo': 970, 'unlockable': 2447, 'felt': 869, 'drooling': 685, 'helen': 1076, 'bore': 280, 'supporting': 2222, 'almost': 93, 'offers': 1605, 'because': 223, 'tender': 2275, 'gabriel': 963, 'word': 2608, 'who': 2574, 'lacked': 1305, 'relation': 1868, 'lovely': 1402, '80': 25, 'those': 2310, 'book': 275, 'gently': 984, 'killer': 1287, 'pg': 1694, 'geek': 976, 'myself': 1533, 'scream': 1979, 'backed': 199, 'fundamental': 957, 'offensive': 1603, 'forced': 919, 'fulci': 951, 'wont': 2605, 'worthy': 2621, '95': 29, 'baaaaaad': 192, 'cardboard': 343, 'review': 1903, 'manages': 1429, 'miner': 1480, 'scenes': 1970, 'thorsen': 2309, 'literally': 1370, 'chase': 380, 'looks': 1390, 'northern': 1576, 'charm': 378, 'all': 89, 'sequence': 2014, 'trinity': 2384, 'again': 74, 'head': 1067, 'his': 1103, 'missed': 1491, 'flying': 908, '1998': 14, 'deserves': 606, 'might': 1477, 'grew': 1028, 'indeed': 1182, 'eating': 705, 'affleck': 70, 'quality': 1815, 'losing': 1394, 'superbad': 2218, 'distressed': 652, 'random': 1826, 'failed': 829, 'on': 1615, 'drama': 676, 'release': 1872, 'either': 721, 'feelings': 867, 'applause': 142, 'sample': 1948, 'range': 1827, 'sense': 2007, 'corny': 510, 'yourself': 2644, 'dodge': 659, 'slimy': 2085, 'overs': 1642, 'broad': 303, 'bendingly': 240, 'but': 320, 'sign': 2056, 'elegant': 724, 'exaggerating': 788, 'levels': 1348, 'loneliness': 1384, 'reading': 1841, 'lewis': 1349, 'abstruse': 36, 'cases': 350, 'ryan': 1937, 'ed': 709, 'justice': 1273, 'simplifying': 2061, 'musician': 1530, 'psychotic': 1801, 'phantasm': 1695, 'giallo': 993, 'first': 893, 'blood': 260, 'notch': 1580, 'hilarious': 1097, 'vampire': 2480, 'hollywood': 1114, 'stereotypes': 2162, 'tanks': 2254, 'unintentionally': 2441, 'nevertheless': 1560, 'eiko': 720, 
'potted': 1755, 'minor': 1482, 'smile': 2093, 'consider': 480, 'needed': 1548, 'adorable': 63, 'pretty': 1774, 'cliche': 418, 'compromise': 460, 'pretext': 1773, 'interested': 1212, 'delivering': 593, 'victor': 2495, 'they': 2301, 'dangerous': 562, 'surprises': 2230, 'television': 2271, 'sheer': 2034, 'suffered': 2209, 'force': 918, 'big': 249, 'disaster': 641, 'unfortunate': 2438, 'forgettable': 925, 'understatement': 2429, 'realize': 1846, 'aesthetically': 68, 'was': 2530, 'collective': 434, 'thrilled': 2317, 'hands': 1046, 'despite': 613, 'store': 2171, 'transfers': 2370, 'sets': 2020, 'stuff': 2190, 'modest': 1493, 'memorized': 1466}
Calculating the Occurrence of Words in the Bag of Words Model

The vocabulary printed above maps each word to a column index. The count matrix built from it records how often each word occurs in each review, and these counts are the features for the analysis.
tf–idf (term frequency times inverse document frequency) reweights the raw counts. The tf part is how often a word occurs in a review; the idf part scales down the weight of words that occur in many reviews and therefore say little about any one of them. This is done with the TfidfTransformer class in sklearn.
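The reweighting can be checked by hand on a toy corpus. The sketch below is a plain-Python illustration, not the notebook's code: it assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation of each row, which is what scikit-learn documents as the TfidfTransformer default. The function name `tfidf`, the three sample sentences, and the whitespace tokenisation are all made up for the example (CountVectorizer tokenises differently).

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (smoothed idf, L2-normalised)."""
    tokenised = [doc.lower().split() for doc in docs]  # simplification: split on spaces
    n_docs = len(tokenised)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed inverse document frequency (assumed to match the sklearn default).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for tokens in tokenised:
        tf = Counter(tokens)                      # raw term counts in this document
        raw = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        rows.append({t: w / norm for t, w in raw.items()})
    return rows

# Hypothetical toy corpus: "the" and "was" occur everywhere, "great" only once.
docs = ["the movie was great", "the movie was awful", "the plot was thin"]
weights = tfidf(docs)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:6s} {weight:.3f}")
```

In the first sentence every word occurs once, so the raw counts are identical; after reweighting, "great" (one document) ends up above "movie" (two documents), which ends up above "the" and "was" (all three).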

In [12]:
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
## using tf_idf for training Data
imdb_train_tfidf = tfidf_transformer.fit_transform(bag_train)
imdb_train_tfidf.shape
## using tf_idf for test Data
docs_test =imdb_test['text'] 
bag_test = count.transform(docs_test)
imdb_test_tfidf = tfidf_transformer.transform(bag_test)
imdb_test_tfidf.shape
Out[12]:
(200, 2650)
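The test reviews go through `transform` rather than `fit_transform`, so they are encoded with the vocabulary learned from the training reviews. That is why the test matrix has one row per test review and the same 2650 columns as the training matrix. A minimal plain-Python sketch of the idea follows; the helper names `fit_vocabulary` and `transform` and the sample sentences are hypothetical, and the whitespace tokenisation is a simplification of what CountVectorizer does.

```python
def fit_vocabulary(docs):
    """Map each distinct training word to a column index (sorted for a stable order)."""
    words = sorted({w for doc in docs for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def transform(docs, vocabulary):
    """Count words per document; words missing from the vocabulary are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for w in doc.lower().split():
            if w in vocabulary:
                row[vocabulary[w]] += 1
        rows.append(row)
    return rows

train_docs = ["bad movie", "good movie"]   # hypothetical training reviews
test_docs = ["good good film"]             # "film" never appears in training

vocab = fit_vocabulary(train_docs)         # learned from the training data only
train_matrix = transform(train_docs, vocab)
test_matrix = transform(test_docs, vocab)  # reuses the training columns

print(vocab)
print(train_matrix)
print(test_matrix)
```

The unseen word "film" contributes nothing to the test row, and the test matrix keeps the three columns defined by the training data.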

Model Building and Prediction

Using Logistic Regression, SVM, and Naive Bayes classification

Logistic Regression for classification

Building a logistic regression model on the training data

In [13]:
from sklearn.linear_model import LogisticRegression
logistic= LogisticRegression()
logistic.fit(imdb_train_tfidf,imdb_train['Sentiment'])
Out[13]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

Predicting on the test data with the logistic model

In [17]:
test_log_pred = logistic.predict(imdb_test_tfidf)

Checking the accuracy of the logistic model

In [ ]:
from sklearn.metrics import confusion_matrix
# Rows are the true labels, columns are the predicted labels
cm_log = confusion_matrix(imdb_test['Sentiment'], test_log_pred)
print(cm_log)
total_log = cm_log.sum()
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_log = (cm_log[0, 0] + cm_log[1, 1]) / total_log
accuracy_log

Bagging on Logistic Regression
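Bagging trains several copies of a base model, each on a bootstrap sample (drawn with replacement) of the training data, and combines their predictions. The sketch below shows only that idea in plain Python. It is not how BaggingClassifier is implemented, and scikit-learn may combine estimators differently, for example by averaging predicted probabilities. The threshold "stump" base learner, the function names, and the toy data are all made up for the example.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Pick the threshold on a 1-D feature that makes the fewest training errors."""
    best_threshold, best_errors = None, None
    for threshold in sorted(set(xs)):
        # Predict class 1 when x >= threshold, class 0 otherwise.
        errors = sum((x >= threshold) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_predict(xs, ys, x_new, n_estimators=10, seed=0):
    """Train each stump on a bootstrap sample, then take a majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    votes = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        threshold = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes.append(int(x_new >= threshold))
    # With an even number of estimators a tie goes to the label voted first.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: small values are class 0, large values are class 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

print(bagged_predict(xs, ys, 0.95))
print(bagged_predict(xs, ys, 0.05))
```

Each stump sees a slightly different sample, so individual thresholds vary; the vote smooths out that variation.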

In [18]:
from sklearn.ensemble import BaggingClassifier
# The base estimator is passed positionally because its keyword name differs
# between scikit-learn releases (base_estimator in older ones, estimator in newer ones).
# Arguments that only repeated the defaults have been dropped.
Log_bagging = BaggingClassifier(LogisticRegression(), n_estimators=10,
                                max_samples=1.0, max_features=1.0, bootstrap=True)

Log_bagging.fit(imdb_train_tfidf, imdb_train['Sentiment'])
bagging_pred_test = Log_bagging.predict(imdb_test_tfidf)

from sklearn.metrics import confusion_matrix
cm_Log_bagging = confusion_matrix(imdb_test['Sentiment'], bagging_pred_test)
print(cm_Log_bagging)
total_Log_bagging = cm_Log_bagging.sum()
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_Log_bagging = (cm_Log_bagging[0, 0] + cm_Log_bagging[1, 1]) / total_Log_bagging
accuracy_Log_bagging
[[70 38]
 [ 8 84]]
Out[18]:
0.77000000000000002
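
Accuracy is only one number that can be read off the confusion matrix printed above. As a worked example, the sketch below recomputes the 0.77 accuracy and adds precision, recall, and specificity. It assumes that label 1 means a positive review and that rows are true labels and columns are predictions, which is the layout scikit-learn's confusion_matrix uses.

```python
# Confusion matrix from the bagged logistic regression output:
# rows are true labels (0, 1), columns are predicted labels (0, 1).
cm = [[70, 38],
      [8, 84]]

tn, fp = cm[0]   # true label 0: predicted 0 (correct), predicted 1 (wrong)
fn, tp = cm[1]   # true label 1: predicted 0 (wrong), predicted 1 (correct)
total = tn + fp + fn + tp

accuracy = (tn + tp) / total       # share of all reviews classified correctly
precision = tp / (tp + fp)         # of the reviews predicted 1, how many really are 1
recall = tp / (tp + fn)            # of the reviews that are 1, how many were found
specificity = tn / (tn + fp)       # of the reviews that are 0, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```

On these counts the model finds most of the label-1 reviews (84 of 92) but mislabels 38 of the 108 label-0 reviews as 1, which the single accuracy figure hides.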
SVM Classification
In [ ]:
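from sklearn import svm
svm_clf = svm.SVC(kernel='linear')
# Fit on the training data; predict() cannot run on an unfitted model
svm_clf.fit(imdb_train_tfidf, imdb_train['Sentiment'])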

Predicting on the test data with the SVM

In [ ]:
test_svm_pred=svm_clf.predict(imdb_test_tfidf)

Checking accuracy

In [22]:
from sklearn.metrics import confusion_matrix
# Requires test_svm_pred from the prediction cell above, so run that cell first
cm_svm = confusion_matrix(imdb_test['Sentiment'], test_svm_pred)
print(cm_svm)
total_svm = cm_svm.sum()
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_svm = (cm_svm[0, 0] + cm_svm[1, 1]) / total_svm
accuracy_svm

Bagging on SVM

In [23]:
from sklearn import svm  # imported here so the cell also runs on its own
from sklearn.ensemble import BaggingClassifier
# The base estimator is passed positionally because its keyword name differs
# between scikit-learn releases (base_estimator in older ones, estimator in newer ones).
svm_bagging = BaggingClassifier(svm.SVC(kernel='linear'), n_estimators=5,
                                max_samples=1.0, max_features=1.0, bootstrap=True)

svm_bagging.fit(imdb_train_tfidf, imdb_train['Sentiment'])
bagging_pred_test = svm_bagging.predict(imdb_test_tfidf)

from sklearn.metrics import confusion_matrix
cm_svm_bagging = confusion_matrix(imdb_test['Sentiment'], bagging_pred_test)
print(cm_svm_bagging)
total_svm_bagging = cm_svm_bagging.sum()
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_svm_bagging = (cm_svm_bagging[0, 0] + cm_svm_bagging[1, 1]) / total_svm_bagging
accuracy_svm_bagging

Naive-Bayes Classification

Building a Naive Bayes model on the training data

In [ ]:
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB(fit_prior=False).fit(imdb_train_tfidf,imdb_train['Sentiment'])

Predicting on the test data with the Naive Bayes model

In [ ]:
test_bayes_pred = clf.predict(imdb_test_tfidf)

Checking Accuracy

In [ ]:
from sklearn.metrics import confusion_matrix
# Rows are the true labels, columns are the predicted labels
cm_bayes = confusion_matrix(imdb_test['Sentiment'], test_bayes_pred)
print(cm_bayes)
total_bayes = cm_bayes.sum()
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_bayes = (cm_bayes[0, 0] + cm_bayes[1, 1]) / total_bayes
accuracy_bayes

Bagging on Naive Bayes Classification

In [19]:
from sklearn.naive_bayes import MultinomialNB  # imported here so the cell also runs on its own
from sklearn.ensemble import BaggingClassifier
# The base estimator is passed positionally because its keyword name differs
# between scikit-learn releases (base_estimator in older ones, estimator in newer ones).
Bayes_Bagging = BaggingClassifier(MultinomialNB(fit_prior=False), n_estimators=10,
                                  max_samples=1.0, max_features=1.0, bootstrap=True)

Bayes_Bagging.fit(imdb_train_tfidf, imdb_train['Sentiment'])
bagging_pred_test = Bayes_Bagging.predict(imdb_test_tfidf)

from sklearn.metrics import confusion_matrix
cm_bayes_bagging = confusion_matrix(imdb_test['Sentiment'], bagging_pred_test)
print(cm_bayes_bagging)
total_bayes_bagging = cm_bayes_bagging.sum()
# Accuracy = correct predictions (the diagonal) / all predictions
accuracy_bayes_bagging = (cm_bayes_bagging[0, 0] + cm_bayes_bagging[1, 1]) / total_bayes_bagging
accuracy_bayes_bagging

© 2020. All Rights Reserved.